% -*-Latex-*-
%if False
> module Introduction
%endif
% -*-Latex-*-
\section{Introduction}
\label{section:introduction}
Backward induction is a method introduced by \cite{bellman1957} that
is routinely used to solve \emph{finite-horizon sequential decision
problems (SDP)}. Such problems lie
at the core of many applications in economics, logistics,
and computer science \citep{finus+al2003, helm2003,
heitzig2012, gintis2007, botta+al2013b,
de_moor1995, de_moor1999}.
Examples include inventory, scheduling and shortest path
problems, but also the search for optimal strategies in
games~\citep{bertsekas1995, diederich01}.
%
Botta, Jansson and Ionescu (\citeyear{2017_Botta_Jansson_Ionescu})
propose a generic framework for \emph{monadic} finite-horizon SDPs as
generalisation of the deterministic, non-deterministic and stochastic SDPs
treated in control theory textbooks \citep{bertsekas1995,
puterman2014markov}. This framework allows to
specify such problems and to solve them with a generic version of
backward induction that we will refer to as \emph{monadic backward
induction}.
The Botta-Jansson-Ionescu-framework, subsequently referred to as
\emph{BJI-framework}, \emph{BJI-theory} or simply \emph{framework},
already includes a verification of monadic
backward induction with respect to a certain underlying \emph{value}
function (see Sec.~\ref{subsection:solution_components}). However,
in the literature on stochastic SDPs this formulation of the function
is itself part of the backward induction algorithm and needs to be
verified against an optimisation criterion, the \emph{expected total
reward} \citep[Ch.~4.2]{puterman2014markov}.
%
For stochastic SDPs semi-formal proofs can be found in textbooks -- but
monadic SDPs are substantially more general than the stochastic SDPs
for which these results are established.
This observation raises a number of questions:
%
\begin{itemize}
%
\item What exactly should ``correctness'' mean for a solution of
monadic SDPs?
%
\item Does monadic backward induction provide
correct solutions in this sense for monadic SDPs in their full
generality?
%
\item And if not, is there a class of monadic SDPs for which
monadic backward induction does provide provably correct
solutions?
%
\end{itemize}
%
In the present paper we address these questions and make the
following contributions to answering them:
%
\begin{itemize}
%
\item We put forward a formal specification that monadic backward
induction should meet in order
to be considered ``correct'' as solution method for monadic SDPs.
This specification uses an optimisation criterion that is
a generic version of the \emph{expected total reward} of standard control
theory textbooks.\footnote{Note that in control theory
backward
induction is often referred to as \emph{the dynamic programming
algorithm} where the term \emph{dynamic programming} is used in
the original sense of \cite{bellman1957}.} In analogy, we call this
criterion \emph{measured total reward}.
%
\item We consider the value function underlying monadic backward
induction as ``correct'' if it computes the \emph{measured total reward}.
%
\item If the value function is correct, monadic backward induction can
be proven to be correct in our sense by extending the result of
\cite {2017_Botta_Jansson_Ionescu}.
However, we show that this is not necessarily the case, i.e.
the value function does not compute the \emph{measured total reward}
for arbitrary monadic SDPs.
%
\item We therefore formulate conditions that identify a class of monadic SDPs
for which the value function and thus monadic backward induction
can be shown to be correct. The conditions are fairly simple and allow for
a neat description in category-theoretical terms using the notion of
Eilenberg-Moore-algebra.
%
\item We give a formalised proof that monadic backward induction fulfils
the correctness criterion if the conditions hold. This correctness result can
be seen as a generic version of correctness results for standard backward induction
like \citep[Prop.~1.3.1]{bertsekas1995} and
\citep[Th.~4.5.1.c]{puterman2014markov}.
%
\end{itemize}
Our results rule out the application of backward induction to certain monadic SDPs that
were previously considered as admissible in the BJI-framework. Thus, they complement
the verification result of \bottaetal and provide a necessary clarification.
Still, the new conditions are simple enough to be checked for
non-standard instantiations of the framework. This
allows to broaden the applicability of backward induction to settings
which are not commonly discussed in the literature and to obtain a
formalised proof of correctness with considerably less effort.
It is worth stressing that our conditions can
be useful for anyone interested in applying monadic backward induction in
non-standard situations -- completely independent of the BJI-framework.
Finally, the value function underlying monadic backward induction is
also interesting in itself. Given the conditions hold, it can be used to compute
the measured total reward efficiently, using a method reminiscent of a \emph{thinning}
algorithm \cite[Ch.~10]{adwh}.
For the reader unfamiliar with SDPs, we provide a brief informal
overview and two simple examples in the next section. We recap the
BJI-framework and its (partial) verification result for monadic
backward induction in Sec.~\ref{section:framework}.
In Sec.~\ref{section:preparation}
we specify correctness for monadic backward induction
and the BJI-value function. We also
show that in the general monadic case the value function does not
necessarily meet the specification. To resolve this problem, we
identify conditions under which the value function does meet the
specification. These conditions are stated and analysed in
Sec.~\ref{section:conditions}. In Sec.~\ref{section:valval} we
prove that, given the conditions hold, the BJI-value function and monadic
backward induction are correct in the sense defined in
Sec.~\ref{section:preparation}. We discuss the conditions from a more
abstract perspective in Sec.~\ref{section:discussion} and
conclude in Sec.~\ref{section:conclusion}.
Throughout the paper we use Idris as our host language
\citep{JFP:9060502,idrisbook}. We assume some familiarity
with Haskell-like syntax and notions like \emph{functor} and
\emph{monad} as used in functional programming. We tacitly consider
types as logical statements and programs as proofs, justified by the
propositions-as-types correspondence \citep[for an accessible
introduction see][]{DBLP:journals/cacm/Wadler15}.
%
\paragraph*{Source code.} \hspace{0.1cm}
Our development is
formalised in Idris as an extension of a lightweight version of the
BJI-framework. The proofs are machine-checked and the source code is
available as supplementary material attached to this paper.
The sources of this document have been written in literal Idris and are
available at \citep{IdrisLibsValVal}, together with some example code.
All source files can be type checked with Idris~1.3.2.
%\vfill
%\pagebreak