Conclusion.lidr 3.63 KB
 Nuria Brede committed Feb 23, 2021 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 % -*-Latex-*- \section{Conclusion} \label{section:conclusion} We have shown that, for measures of uncertainty that fulfill three general compatibility conditions, the monadic backward induction of the framework for specifying and solving finite-horizon, monadic SDPs proposed in \citep{2017_Botta_Jansson_Ionescu} is correct. The main result has been proved via the extensional equality of two value functions: 1) the value function of Bellman's dynamic programming \citep{bellman1957} and optimal control theory \citep{bertsekas1995, puterman2014markov} that is also at the core of the generic backward induction algorithm of \citep{2017_Botta_Jansson_Ionescu} and 2) the measured total reward function that specifies the objective of decision making in monadic SDPs: the maximization of a measure of the sum of the rewards along the trajectories rooted at the state associated with the first decision. Our contribution to verified optimal decision making is twofold: On the one hand, we have implemented a machine-checked generalization of the semi-formal results for deterministic and stochastic SDPs discussed in \citep[Prop.~1.3.1]{bertsekas1995} and \citep[Theorem~4.5.1.c]{puterman2014markov}. % As a consequence, we now have a provably correct method for solving deterministic and stochastic sequential decision problems with their canonical measure functions. % On the other hand, we have identified three general conditions that are sufficient for the equivalence between the two value functions to hold. The first two conditions are natural compatibility conditions between the measure of uncertainty |meas| and the monadic operations associated with the uncertainty monad |M|. The third condition is a relationship between |meas|, the functorial map associated with |M| and the rule for adding rewards |<+>|. All three conditions have a straightforward category-theoretical interpretation in terms of Eilenberg-Moore algebras \citep[ch.~VI.2]{maclane}. % As discussed in section~\ref{subsection:abstractCond}, the three conditions are independent and have non-trivial implications for the measure and the addition function that cannot be derived from the monotonicity condition on |meas| already imposed in \citep{ionescu2009, 2017_Botta_Jansson_Ionescu}. A consequence of this contribution is that we can now compute verified solutions of stochastic sequential decision problems in which the measure of uncertainty is different from the expected value measure. This is important for applications in which the goal of decision making is, for example, of maximizing the value of worst-case outcomes. % To the best of our knowledge, the formulation of the compatibility condition and the proof of the equivalence between the two value functions are novel results. The latter can be employed in a wider context than the one that has motivated our study: in many practical problems in science and engineering, the computation of optimal policies via backward induction (let apart brute-force or gradient methods) is simply not feasible. % In these problems one often still needs to generate, evaluate and compare different policies and our result shows under which conditions such evaluation can safely be done via the fast'' value function |val| of standard control theory. Finally, our contribution is an application of verified, literal programming to optimal decision making: the sources of this document have been written in literal Idris and are available at \citep{IdrisLibsValVal}, where the reader can also find the bare code and some examples. Although the development has been carried out in Idris, it should be readily reproducible in other implementations of type theory like Agda or Coq.