Introduction.lidr 6.94 KB
 Nuria Brede committed Feb 23, 2021 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 % -*-Latex-*- %if False > module Introduction %endif % -*-Latex-*- \section{Introduction} \label{section:introduction} Backward induction is a method introduced by \cite{bellman1957} that is routinely used to solve \emph{finite-horizon sequential decision problems (SDP)}. Such problems lie at the core of many applications in economics, logistics, and computer science \citep{finus+al2003, helm2003, heitzig2012, gintis2007, botta+al2013b, de_moor1995, de_moor1999}. Examples include inventory, scheduling and shortest path problems, but also the search for optimal strategies in games~\citep{bertsekas1995, diederich01}. % Nuria Brede committed Jul 23, 2021 27 Botta, Jansson and Ionescu (\citeyear{2017_Botta_Jansson_Ionescu}) Nuria Brede committed Feb 23, 2021 28 propose a generic framework for \emph{monadic} finite-horizon SDPs as Nuria Brede committed Jul 23, 2021 29 generalisation of the deterministic, non-deterministic and stochastic SDPs Nuria Brede committed Feb 23, 2021 30 31 32 33 34 35 36 37 38 39 treated in control theory textbooks \citep{bertsekas1995, puterman2014markov}. This framework allows to specify such problems and to solve them with a generic version of backward induction that we will refer to as \emph{monadic backward induction}. The Botta-Jansson-Ionescu-framework, subsequently referred to as \emph{BJI-framework}, \emph{BJI-theory} or simply \emph{framework}, already includes a verification of monadic backward induction with respect to a certain underlying \emph{value} Nuria Brede committed Jul 23, 2021 40 function (see Sec.~\ref{subsection:solution_components}). However, Nuria Brede committed Feb 23, 2021 41 42 in the literature on stochastic SDPs this formulation of the function is itself part of the backward induction algorithm and needs to be Nuria Brede committed Jul 23, 2021 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 verified against an optimisation criterion, the \emph{expected total reward} \citep[Ch.~4.2]{puterman2014markov}. % For stochastic SDPs semi-formal proofs can be found in textbooks -- but monadic SDPs are substantially more general than the stochastic SDPs for which these results are established. This observation raises a number of questions: % \begin{itemize} % \item What exactly should correctness'' mean for a solution of monadic SDPs? % \item Does monadic backward induction provide correct solutions in this sense for monadic SDPs in their full generality? % \item And if not, is there a class of monadic SDPs for which monadic backward induction does provide provably correct solutions? % \end{itemize} % In the present paper we address these questions and make the following contributions to answering them: % \begin{itemize} % \item We put forward a formal specification that monadic backward induction should meet in order to be considered correct'' as solution method for monadic SDPs. This specification uses an optimisation criterion that is a generic version of the \emph{expected total reward} of standard control theory textbooks.\footnote{Note that in control theory backward Nuria Brede committed Feb 23, 2021 78 79 induction is often referred to as \emph{the dynamic programming algorithm} where the term \emph{dynamic programming} is used in Nuria Brede committed Jul 23, 2021 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 the original sense of \cite{bellman1957}.} In analogy, we call this criterion \emph{measured total reward}. % \item We consider the value function underlying monadic backward induction as correct'' if it computes the \emph{measured total reward}. % \item If the value function is correct, monadic backward induction can be proven to be correct in our sense by extending the result of \cite {2017_Botta_Jansson_Ionescu}. However, we show that this is not necessarily the case, i.e. the value function does not compute the \emph{measured total reward} for arbitrary monadic SDPs. % \item We therefore formulate conditions that identify a class of monadic SDPs for which the value function and thus monadic backward induction can be shown to be correct. The conditions are fairly simple and allow for a neat description in category-theoretical terms using the notion of Eilenberg-Moore-algebra. % \item We give a formalised proof that monadic backward induction fulfils the correctness criterion if the conditions hold. This correctness result can be seen as a generic version of correctness results for standard backward induction like \citep[Prop.~1.3.1]{bertsekas1995} and Nuria Brede committed Feb 23, 2021 103 \citep[Th.~4.5.1.c]{puterman2014markov}. Nuria Brede committed Jul 23, 2021 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 % \end{itemize} Our results rule out the application of backward induction to certain monadic SDPs that were previously considered as admissible in the BJI-framework. Thus, they complement the verification result of \bottaetal and provide a necessary clarification. Still, the new conditions are simple enough to be checked for non-standard instantiations of the framework. This allows to broaden the applicability of backward induction to settings which are not commonly discussed in the literature and to obtain a formalised proof of correctness with considerably less effort. It is worth stressing that our conditions can be useful for anyone interested in applying monadic backward induction in non-standard situations -- completely independent of the BJI-framework. Finally, the value function underlying monadic backward induction is also interesting in itself. Given the conditions hold, it can be used to compute the measured total reward efficiently, using a method reminiscent of a \emph{thinning} algorithm \cite[Ch.~10]{adwh}. Nuria Brede committed Feb 23, 2021 123 124 125 126 For the reader unfamiliar with SDPs, we provide a brief informal overview and two simple examples in the next section. We recap the BJI-framework and its (partial) verification result for monadic Nuria Brede committed Jul 23, 2021 127 128 backward induction in Sec.~\ref{section:framework}. In Sec.~\ref{section:preparation} Nuria Brede committed Feb 23, 2021 129 130 131 132 133 we specify correctness for monadic backward induction and the BJI-value function. We also show that in the general monadic case the value function does not necessarily meet the specification. To resolve this problem, we identify conditions under which the value function does meet the Nuria Brede committed Jul 23, 2021 134 135 specification. These conditions are stated and analysed in Sec.~\ref{section:conditions}. In Sec.~\ref{section:valval} we Nuria Brede committed Feb 23, 2021 136 137 prove that, given the conditions hold, the BJI-value function and monadic backward induction are correct in the sense defined in Nuria Brede committed Jul 23, 2021 138 139 140 Sec.~\ref{section:preparation}. We discuss the conditions from a more abstract perspective in Sec.~\ref{section:discussion} and conclude in Sec.~\ref{section:conclusion}. Nuria Brede committed Feb 23, 2021 141 142 143 144 145 146 147 148 149 Throughout the paper we use Idris as our host language \citep{JFP:9060502,idrisbook}. We assume some familiarity with Haskell-like syntax and notions like \emph{functor} and \emph{monad} as used in functional programming. We tacitly consider types as logical statements and programs as proofs, justified by the propositions-as-types correspondence \citep[for an accessible introduction see][]{DBLP:journals/cacm/Wadler15}. % Nuria Brede committed Jul 23, 2021 150 \paragraph*{Source code.} \hspace{0.1cm} Nuria Brede committed Feb 23, 2021 151 Our development is Nuria Brede committed Jul 23, 2021 152 formalised in Idris as an extension of a lightweight version of the Nuria Brede committed Feb 23, 2021 153 154 155 156 BJI-framework. The proofs are machine-checked and the source code is available as supplementary material attached to this paper. The sources of this document have been written in literal Idris and are available at \citep{IdrisLibsValVal}, together with some example code. Nuria Brede committed Jul 23, 2021 157 158 All source files can be type checked with Idris~1.3.2. Nicola Botta committed Sep 23, 2021 159 160 %\vfill %\pagebreak