Conclusion.lidr 3.63 KB
Newer Older
Nuria Brede's avatar
Nuria Brede committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
% -*-Latex-*-

\section{Conclusion}
\label{section:conclusion}

We have shown that, for measures of uncertainty that fulfill three
general compatibility conditions, the monadic backward induction of the
framework for specifying and solving finite-horizon, monadic SDPs proposed
in \citep{2017_Botta_Jansson_Ionescu} is correct.

The main result has been proved via the extensional equality of
two value functions: 1) the value function of Bellman's dynamic programming
\citep{bellman1957} and optimal control theory \citep{bertsekas1995,
  puterman2014markov} that is also at the core of the generic
backward induction algorithm of \citep{2017_Botta_Jansson_Ionescu} and
2) the measured total reward function that specifies the objective of
decision making in monadic SDPs: the maximization of a measure of the
sum of the rewards along the trajectories rooted at the state
associated with the first decision.

Our contribution to verified optimal decision making is twofold: On the
one hand, we have implemented a machine-checked generalization of the
semi-formal results for deterministic and stochastic SDPs
discussed in \citep[Prop.~1.3.1]{bertsekas1995} and
\citep[Theorem~4.5.1.c]{puterman2014markov}.
%
As a consequence, we now have a provably correct method for solving
deterministic and stochastic sequential decision problems with their
canonical measure functions.
%
On the other hand, we have identified three general conditions that are
sufficient for the equivalence between the two value functions to
hold. The first two conditions are natural compatibility conditions
between the measure of uncertainty |meas| and the monadic operations
associated with the uncertainty monad |M|. The third condition is a
relationship between |meas|, the functorial map associated with |M| and
the rule for adding rewards |<+>|. All three conditions have a
straightforward category-theoretical interpretation in terms of
Eilenberg-Moore algebras \citep[ch.~VI.2]{maclane}.
%
As discussed in section~\ref{subsection:abstractCond}, the three
conditions are independent and have non-trivial implications for the
measure and the addition function that cannot be derived from the
monotonicity condition on |meas| already imposed in
\citep{ionescu2009, 2017_Botta_Jansson_Ionescu}.

A consequence of this contribution is that we can now compute verified
solutions of stochastic sequential decision problems in which the
measure of uncertainty is different from the expected value
measure. This is important for applications in which the goal of
decision making is, for example, of maximizing the value of
worst-case outcomes.
%
To the best of our knowledge, the formulation of the compatibility
condition and the proof of the equivalence between the two value
functions are novel results.

The latter can be employed in a wider context than the one that has
motivated our study: in many practical problems in science and
engineering, the computation of optimal policies via backward induction
(let apart brute-force or gradient methods) is simply not feasible.
%
In these problems one often still needs to generate, evaluate and
compare different policies and our result shows under which conditions
such evaluation can safely be done via the ``fast'' value function |val|
of standard control theory.

Finally, our contribution is an application of verified, literal
programming to optimal decision making: the sources of this document
have been written in literal Idris and are available at
\citep{IdrisLibsValVal}, where the reader can also find the bare code
and some examples. Although the development has been carried out in
Idris, it should be readily reproducible in other implementations of
type theory like Agda or Coq.