Commit 12723851 authored by Nuria Brede

Initial.

parent e0f847b0
% -*-Latex-*-
\section{Conclusion}
\label{section:conclusion}
We have shown that, for measures of uncertainty that fulfill three
general compatibility conditions, the monadic backward induction of the
framework for specifying and solving finite-horizon, monadic SDPs proposed
in \citep{2017_Botta_Jansson_Ionescu} is correct.
The main result has been proved via the extensional equality of
two value functions: 1) the value function of Bellman's dynamic programming
\citep{bellman1957} and optimal control theory \citep{bertsekas1995,
puterman2014markov} that is also at the core of the generic
backward induction algorithm of \citep{2017_Botta_Jansson_Ionescu} and
2) the measured total reward function that specifies the objective of
decision making in monadic SDPs: the maximization of a measure of the
sum of the rewards along the trajectories rooted at the state
associated with the first decision.
Our contribution to verified optimal decision making is twofold: On the
one hand, we have implemented a machine-checked generalization of the
semi-formal results for deterministic and stochastic SDPs
discussed in \citep[Prop.~1.3.1]{bertsekas1995} and
\citep[Theorem~4.5.1.c]{puterman2014markov}.
%
As a consequence, we now have a provably correct method for solving
deterministic and stochastic sequential decision problems with their
canonical measure functions.
%
On the other hand, we have identified three general conditions that are
sufficient for the equivalence between the two value functions to
hold. The first two conditions are natural compatibility conditions
between the measure of uncertainty |meas| and the monadic operations
associated with the uncertainty monad |M|. The third condition is a
relationship between |meas|, the functorial map associated with |M| and
the rule for adding rewards |<+>|. All three conditions have a
straightforward category-theoretical interpretation in terms of
Eilenberg-Moore algebras \citep[ch.~VI.2]{maclane}.
%
As discussed in section~\ref{subsection:abstractCond}, the three
conditions are independent and have non-trivial implications for the
measure and the addition function that cannot be derived from the
monotonicity condition on |meas| already imposed in
\citep{ionescu2009, 2017_Botta_Jansson_Ionescu}.
A consequence of this contribution is that we can now compute verified
solutions of stochastic sequential decision problems in which the
measure of uncertainty is different from the expected value
measure. This is important for applications in which the goal of
decision making is, for example, to maximize the value of
worst-case outcomes.
%
To the best of our knowledge, the formulation of the three compatibility
conditions and the proof of the equivalence between the two value
functions are novel results.
The latter can be employed in a wider context than the one that has
motivated our study: in many practical problems in science and
engineering, the computation of optimal policies via backward induction
(let alone via brute-force or gradient methods) is simply not feasible.
%
In these problems one often still needs to generate, evaluate and
compare different policies and our result shows under which conditions
such evaluation can safely be done via the ``fast'' value function |val|
of standard control theory.
Finally, our contribution is an application of verified, literate
programming to optimal decision making: the sources of this document
have been written in literate Idris and are available at
\citep{IdrisLibsValVal}, where the reader can also find the bare code
and some examples. Although the development has been carried out in
Idris, it should be readily reproducible in other implementations of
type theory like Agda or Coq.
% -*-Latex-*-
%if False
> module Conditions
> import Framework
> %default total
> %auto_implicits off
> %access public export
> Q : Type
> Prob : Type -> Type
> supp : {A : Type} -> Prob A -> List A
> prob : {A : Type} -> Prob A -> A -> Q
> neutr : Val
>
> odot : Val -> Val -> Val
%endif
\section{Correctness conditions}
\label{section:conditions}
%\subsection{Conditions on measures}
%\label{subsection:measConditions}
We now formulate three conditions on measure functions that imply the
extensional equality of |val| and |val'|:
% measPureSpec
\noindent The measure needs to be a left inverse of |pure|:\footnote{The
symbol |`ExtEq`| denotes \emph{extensional} equality,
see appendix~\ref{appendix:monadLaws}.}
< (1) measPureSpec : meas . pure `ExtEq` id
\begin{center}
\begin{tikzcd}[column sep=2cm]
%
Val \arrow[d, "pure", swap] \arrow[dr, "id"] \\
M Val \arrow[r, "meas", swap] & Val\\
%
\end{tikzcd}
\end{center}
% measJoinSpec
\noindent Applying the measure after |join| needs to be extensionally
equal to applying it after |map meas|:
< (2) measJoinSpec : meas . join `ExtEq` meas . map meas
\begin{center}
\begin{tikzcd}[column sep=3cm]
%
M (M Val) \arrow[r, "map\ meas"] \arrow[d, "join", swap] & M Val \arrow[d, "meas" ] \\
M Val \arrow[r, "meas", swap] & Val\\
%
\end{tikzcd}
\end{center}
% measPlusSpec
\noindent For arbitrary |v: Val| and non-empty |mv: M Val| applying
the measure after mapping |(v <+>)| onto |mv| needs to be equal to
applying |(v <+>)| after the measure:
< (3) measPlusSpec : (v : Val) -> (mv : M Val) -> (NotEmpty mv) ->
< (meas . map (v <+>)) mv = ((v <+>) . meas) mv
\begin{center}
\begin{tikzcd}[column sep=3cm]
%
M Val \arrow[r, "map (v\, \oplus)"] \arrow[d, "meas", swap] & M Val \arrow[d, "meas" ] \\
Val \arrow[r, "(v\, \oplus)", swap] & Val\\
%
\end{tikzcd}
\vspace{0.1cm}
\end{center}
Essentially, these conditions ensure that the measure is well-behaved
relative to the monad structure and the |<+>|-operation.
\subsection{Examples and counter-examples}
\label{subsection:exAndCounterEx}
%
To get a better intuition, let us consider a few measures that do or
do not fulfill the three conditions.
%
Simple examples of admissible measures are the minimum (|minList| as
defined in fig.~\ref{fig:example1Formal}) or maximum
(|maxList = foldr maximum 0|) of a list for |M = List| with |Nat| as type of values
and ordinary addition as |<+>|. It is straightforward to prove that
the conditions hold for these two measures and the proofs for
|maxList| are included in the supplementary material.
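
\noindent As a quick sanity check, and not as a replacement for the
machine-checked proofs, the three conditions can be evaluated by hand
for |maxList| on small inputs. The concrete values below are chosen
here purely for illustration and do not appear in the formal
development. For |measPureSpec|:
\[ \mathit{maxList}\ (\mathit{pure}\ 5) = \mathit{maxList}\ [5] = \max(5, 0) = 5 \]
For |measJoinSpec|, both sides evaluate to $3$:
\[ \mathit{maxList}\ (\mathit{join}\ [[1], [2, 3]]) = \mathit{maxList}\ [1, 2, 3] = 3 \]
\[ \mathit{maxList}\ (\mathit{map}\ \mathit{maxList}\ [[1], [2, 3]]) = \mathit{maxList}\ [1, 3] = 3 \]
For |measPlusSpec| with $v = 1$:
\[ \mathit{maxList}\ (\mathit{map}\ (1 +)\ [2, 3]) = \mathit{maxList}\ [3, 4] = 4 = 1 + 3 = 1 + \mathit{maxList}\ [2, 3] \]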
As to counter-examples, let us take another look at our counter-example
from the last section, the arithmetic sum of a list. It does fulfill
|measPureSpec| and |measJoinSpec|, the first by definition, the second
by structural induction using the associativity of |+| (the list
monad's |join| is |concat|). But it fails to fulfill
|measPlusSpec|. The premiss |NotEmpty mv| (see
appendix~\ref{appendix:monadLaws} and
section~\ref{subsection:abstractCond}) tells us that the list
must not be |[]| -- otherwise we can prove the equality from a
contradiction. But if the list has the form |a :: as| we would have to
show the equality:
< (sum . map (v +)) (a :: as) = ((v +) . sum) (a :: as)
Clearly, if |v != 0| and |as != []| this equality cannot hold.
This is why in the last section the equality of |val| and |val'|
failed for |meas = sum|.
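
\noindent The failure can be made explicit with a short calculation;
the element names $a_1, \dots, a_k$ and the concrete numbers are
introduced here only for illustration. For a list with $k \geq 1$
elements,
\[ \mathit{sum}\ (\mathit{map}\ (v +)\ [a_1, \dots, a_k]) = \sum_{i=1}^{k} (v + a_i) = k \cdot v + \sum_{i=1}^{k} a_i , \]
whereas
\[ v + \mathit{sum}\ [a_1, \dots, a_k] = v + \sum_{i=1}^{k} a_i . \]
The two sides agree only if $(k - 1) \cdot v = 0$, that is, for
singleton lists or for $v = 0$. For instance, with $v = 1$ and the list
$[2, 3]$ the left-hand side is $3 + 4 = 7$ while the right-hand side is
$1 + 5 = 6$.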
A similar failure would arise if we chose |meas = foldr (*) 1| instead,
as |+| does not distribute over |*|. But if we turned the situation
around by setting |<+> = *| and |meas = sum|, the condition
|measPlusSpec| would hold thanks to the usual arithmetic
distributivity law for |*| over |+|.
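
\noindent Both situations can be checked on small inputs. The numbers
are illustrative and not taken from the formal development, and
$\mathit{product}$ is used here as shorthand for |foldr (*) 1|. With
|<+> = +| and the product as measure, $v = 1$ and the list $[2, 3]$
give
\[ \mathit{product}\ (\mathit{map}\ (1 +)\ [2, 3]) = 3 \cdot 4 = 12 \neq 7 = 1 + 2 \cdot 3 = 1 + \mathit{product}\ [2, 3] . \]
With |<+> = *| and |meas = sum|, $v = 2$ and the list $[3, 4]$ give
\[ \mathit{sum}\ (\mathit{map}\ (2\, \cdot)\ [3, 4]) = 6 + 8 = 14 = 2 \cdot 7 = 2 \cdot \mathit{sum}\ [3, 4] . \]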
What about the other conditions? We remain in the setting of example~1
as above, and just vary the measure. Using a somewhat contrived
variation of |maxList|
< meas = foldr (\x, v => ((x + 1) `maximum` v)) 0
it suffices to consider for an arbitrary |n : Nat| that
< (meas . pure) n = meas [n] = (n + 1) `maximum` 0 = n + 1 != n = id n
to see that now the condition |measPureSpec| fails.
%
To exhibit a measure that fails the condition |measJoinSpec|, we
switch to |Double| as type of values (still with addition as binary
operator) and the arithmetic average as measure |meas = avg|. Taking
a list of lists of different lengths like |[[1], [2, 3]]| we have
< meas (join [[1], [2, 3]]) = avg [1, 2, 3] = 2
< !=
< meas (map meas [[1], [2, 3]]) = avg [1, 2.5] = 1.75
All of the measures considered in this subsection do fulfill the
|measMonSpec| condition imposed by the BJI-theory. This raises the
question how previously admissible measures are impacted by adding the
three new conditions to the framework.
\subsection{Impact on previously admissible measures}
\label{subsection:impactMeas}
%
As we have seen in section~\ref{subsection:solution_components}, the
BJI-framework already requires measures to fulfill the monotonicity
condition
< measMonSpec : {A : Type} -> (f, g : A -> Val) -> ((a : A) -> (f a) <= (g a)) ->
< (ma : M A) -> meas (map f ma) <= meas (map g ma)
\bottaetal show that the arithmetic average (for |M = List|), the
worst-case measure (for |M = List| and for |M = Prob|) and the
expected value measure (for |M = Prob|) all fulfill |measMonSpec|. Thus, a
natural question is whether these measures also fulfill the three
additional requirements.
\paragraph*{Expected value measure.} \hspace{0.1cm}
%
Most applications of backward induction aggregate possible rewards with the
expected value measure. In a nutshell, for a numerical type |Q|, the
expected value of a probability distribution on |Q| is
> expVal : Num Q => Prob Q -> Q
> expVal spq = sum [q * prob spq q | q <- supp spq]
where |prob| and |supp| are generic functions that encode the notion of
\emph{probability} and of \emph{support} associated with a finite
probability distribution:
< prob : {A : Type} -> Prob A -> A -> Q
<
< supp : {A : Type} -> Prob A -> List A
For |spa| and |a| of suitable types, |prob spa a| represents the
probability of |a| according to |spa|. Similarly, |supp spa| returns a
list of those values whose probability is not zero in |spa|.
%
The probability function |prob| has to fulfill the axioms of
probability theory. In particular,
< sum [prob spa a | a <- supp spa] = 1
This condition implies that probability distributions cannot be empty, a
precondition of |measPlusSpec|. Putting forward minimal specifications
for |prob| and |supp| is not completely trivial but if the |+|-operation
associated with |Q| is commutative and associative, if |*|
distributes over |+| and if the |map| and |join| associated with
|Prob| -- for |f|, |a|, |b|, |spa| and |spspa| of suitable types --
fulfill the conservation law
< prob (map f spa) b = sum [prob spa a | a <- supp spa, f a == b]
and the total probability law
< prob (join spspa) a = sum [prob spa a * prob spspa spa | spa <- supp spspa]
then the expected value measure fulfills |measPureSpec|, |measJoinSpec| and
|measPlusSpec|. This is not surprising as it is standard
to apply backward induction to SDPs with the expected value of possible
rewards as measure.
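
\noindent To see where the normalization of probabilities enters, the
following sketch spells out the argument for |measPlusSpec| informally.
It writes $p(q)$ for |prob spq q|, sums over the support of |spq|,
takes |<+>| to be ordinary addition, and assumes that |(v +)| is
injective so that the conservation law does not merge any
probabilities. It is an illustration, not the machine-checked proof:
\[ \mathit{expVal}\ (\mathit{map}\ (v +)\ \mathit{spq}) = \sum_{q} (v + q) \cdot p(q) = v \cdot \sum_{q} p(q) + \sum_{q} q \cdot p(q) \]
\[ = v \cdot 1 + \mathit{expVal}\ \mathit{spq} = v + \mathit{expVal}\ \mathit{spq} \]
The second equality uses distributivity of |*| over |+| together with
commutativity and associativity of |+|; the third uses the requirement
that the probabilities sum to $1$, which is also why an empty
distribution cannot occur.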
\paragraph*{Average and arithmetic sum.} \hspace{0.1cm}
%
As can already be concluded
from the corresponding counter-examples in the previous subsection,
neither the plain arithmetic average nor the arithmetic sum is
suitable as a measure when using the standard monad structure on
|List| to represent non-deterministic
uncertainty. However, the arithmetic average can be used as a measure
if |List| is endowed with another monad structure, corresponding to a
probability monad for uniform distributions.
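
\noindent A small calculation hints at why this works. It revisits the
list of lists |[[1], [2, 3]]| from the previous subsection and is meant
as an informal illustration, under the assumption that every list is
read as a uniform distribution over its elements. The outer list then
assigns probability $1/2$ to each inner list, and the total probability
law yields the weight $1/2$ for $1$ and $1/4$ each for $2$ and $3$:
\[ \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{4} \cdot 3 = 1.75 = \mathit{avg}\ [1, 2.5] \]
One list that represents these weights uniformly is $[1, 1, 2, 3]$,
whose plain average is again $1.75$, in contrast to
$\mathit{avg}\ [1, 2, 3] = 2$ obtained with |concat|.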
\paragraph*{Worst-case measures.} \hspace{0.1cm}
%
In many important applications in
climate impact research but also in portfolio management and sports,
decisions are taken so as to minimize the consequences of worst-case
outcomes. Depending on how ``worst'' is defined, the corresponding
measures might pick the maximum or
minimum from an |M|-structure of values. In the previous subsection we
considered an example in which the monad was |List|, the operation
|<+>| was plain addition and the measure was either |maxList| or
|minList|. And indeed we can prove that for both measures the three
requirements hold (the proofs for |maxList| can be found in the
supplementary material). This gives us a notion of worst-case measure
that is admissible for monadic backward induction.
\vspace{0.2cm}
We can thus conclude that the new requirements hold for certain
familiar measures, but also that they have non-trivial consequences for
the measures that can be used in the BJI-framework.
\subsection{The measure conditions from an abstract perspective}
\label{subsection:abstractCond}
%
Now that we have seen what the three conditions mean for concrete examples,
we can consider them from a more abstract point of view.
\paragraph*{Category-theoretical perspective.}\hspace{0.1cm}
Readers familiar with the theory of monads might have recognized that the
first two conditions ensure that |meas| is the structure map of a
monad algebra for |M| on |Val| and thus the pair |(Val, meas)| is an
object of
the Eilenberg-Moore category associated with the monad |M|. The third
condition requires the map |(v <+>)| to be an |M|-algebra homomorphism
-- a structure preserving map -- for arbitrary values |v|.
This perspective allows us to use existing knowledge about monad
algebras as a first criterion for choosing measures. For example, the
Eilenberg-Moore-algebras of the list monad are monoids -- this
implicitly played a role in the examples we considered
above. \cite{DBLP:journals/tcs/Jacobs11} shows that the algebras of
the distribution monad for probability distributions with finite
support correspond to convex sets. Interestingly, convex sets play an
important role in the theory of optimal control
\citep{bertsekas2003convex}.
\paragraph*{Measures for the list monad.} \hspace{0.1cm}
%
The knowledge that monoids are |List|-algebras suggests a generic
description of admissible measures for |M = List|:
Given a monoid |(Val, odot, b)|, we can prove that monoid
homomorphisms of the form |foldr odot b| fulfill the three conditions,
if |<+>| distributes over |odot| on the left. That is, for
|meas = foldr odot b| the three conditions can be proven from the
following properties, in which the neutral element |b| is called |neutr|:
> odotNeutrRight : (l : Val) -> l `odot` neutr = l
> odotNeutrLeft : (r : Val) -> neutr `odot` r = r
> odotAssociative : (l, v, r : Val) -> l `odot` (v `odot` r) = (l `odot` v) `odot` r
> oplusOdotDistrLeft : (n, l, r : Val) -> n <+> (l `odot` r) = (n <+> l) `odot` (n <+> r)
Neutrality of |b| on the right is needed for |measPureSpec|,
while |measJoinSpec| follows from neutrality on the left and
the associativity of |odot|. The algebra morphism condition on |(v <+>)|
is provable from the distributivity of |<+>| over |odot| and again
neutrality of |b| on the right.
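
\noindent The argument for |measPlusSpec| can be sketched as an
equational derivation. The notation is informal: $\oplus$ stands for
|<+>|, $\odot$ for |odot|, $m$ abbreviates |foldr odot b|, and the case
split on non-empty lists is our reading of the proof outline rather
than a transcript of the Idris code. For a singleton list,
\[ m\ (\mathit{map}\ (v\, \oplus)\ [a]) = (v \oplus a) \odot b = v \oplus a = v \oplus (a \odot b) = v \oplus m\ [a] , \]
using right neutrality of $b$ twice. For a list $a :: \mathit{as}$ with
non-empty $\mathit{as}$,
\[ m\ (\mathit{map}\ (v\, \oplus)\ (a :: \mathit{as})) = (v \oplus a) \odot m\ (\mathit{map}\ (v\, \oplus)\ \mathit{as}) = (v \oplus a) \odot (v \oplus m\ \mathit{as}) \]
\[ = v \oplus (a \odot m\ \mathit{as}) = v \oplus m\ (a :: \mathit{as}) , \]
by the induction hypothesis and the left distributivity of $\oplus$
over $\odot$.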
If moreover |odot| is monotone with respect to |<=|
> odotMon : {a, b, c, d : Val} -> a <= b -> c <= d -> (a `odot` c) <= (b `odot` d)
then we can also prove |measMonSpec| using the transitivity of |<=|.
%
The proofs are simple and can be found in the
supplementary material to this paper.
This also illustrates how the three abstract conditions follow from
more familiar algebraic properties.
\paragraph*{Mutual independence.}\hspace{0.1cm}
%
Although it does not seem surprising, it should be noted that the
three conditions are mutually independent. This can be concluded from
the counter-examples in section~\ref{subsection:exAndCounterEx}: The
sum, the modified list maximum and the arithmetic average each fail
exactly one of the three conditions.
\paragraph*{Sufficient vs.\ necessary.}\hspace{0.1cm}
%
The three conditions are sufficient to prove the extensional equality
of the functions |val| and |val'|. They are justified by their level
of generality and the fact that they hold for standard measures used
in control theory. However, we leave open the interesting question of
whether these conditions are also necessary for the correctness of
monadic backward induction.
\paragraph*{Non-emptiness requirement.}\hspace{0.1cm}
%
Note that |mv| in the premisses of |measPlusSpec| is required to be
non-empty.
This condition makes sense: If we use again the list monad with
|Val=Nat| and |<+> = +|, it is not hard to see that for any natural
number |n| greater than 0 the equality |meas (map (n +) []) = n + meas
[]| must fail. Thus, the only way to prove the base case of
|measPlusSpec| is by contradiction with the non-emptiness premiss.
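
\noindent Spelled out, with $m$ as a name introduced here for the value
|meas []|: mapping over the empty list leaves it unchanged, so
\[ \mathit{meas}\ (\mathit{map}\ (n +)\ []) = \mathit{meas}\ [] = m \neq n + m = n + \mathit{meas}\ [] \quad \mbox{for } n > 0 . \]
For |meas = maxList|, for instance, this reads $0 \neq n + 0$.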
However, omitting the premiss |NotEmpty mv| would not prevent us
from generically proving the correctness result in the next section --
it would even simplify matters as it would spare us reasoning about
preservation of non-emptiness.
But it would implicitly restrict the class of monads that can be used
to instantiate |M|. For example, we have seen above that
|measPlusSpec| is not provable for the empty list without the
non-emptiness premiss. We would therefore need to resort to a type of
non-empty lists instead.
The price to pay for including the non-emptiness premiss is
the additional condition |nextNotEmpty| on the transition function
|next| that was already stated in section~\ref{subsection:wrap-up}.
Moreover, we have to postulate non-emptiness preservation laws for the
monad operations (appendix~\ref{appendix:monadLaws}) and to prove an
additional lemma about the preservation of non-emptiness
(appendix~\ref{appendix:lemmas}).
%
Conceptually, it might seem cleaner to omit the non-emptiness
condition: In this case, the remaining conditions would only concern
the interaction between the monad, the measure, the type of values and the
binary operation |<+>|. However, the non-emptiness preservation laws seem
less restrictive with respect to the monad. In particular, for our
above example of ordinary lists they hold (the relevant proofs can be
found in the supplementary material).
Thus we have opted for explicitly restricting the |next| function
instead of implicitly restricting the class of monads for which the
results of section~\ref{section:valval} hold.
\vspace{0.2cm}
Given that the three conditions |measPureSpec|, |measJoinSpec| and
|measPlusSpec| on the measure function hold, we can prove the
extensional equality
of the functions |val| and |val'| generically. This is what we will do
in the next section.
% -*-Latex-*-
%if False
> module Introduction
%endif
% -*-Latex-*-
\section{Introduction}
\label{section:introduction}
Backward induction is a method introduced by \cite{bellman1957} that
is routinely used to solve \emph{finite-horizon sequential decision
problems} (SDPs). Such problems lie
at the core of many applications in economics, logistics,
and computer science \citep{finus+al2003, helm2003,
heitzig2012, gintis2007, botta+al2013b,
de_moor1995, de_moor1999}.
Examples include inventory, scheduling and shortest path
problems, but also the search for optimal strategies in
games~\citep{bertsekas1995, diederich01}.
%
In \citep{2017_Botta_Jansson_Ionescu}, Botta, Jansson and Ionescu
propose a generic framework for \emph{monadic} finite-horizon SDPs as
generalization of the deterministic, non-deterministic and stochastic SDPs
treated in control theory textbooks \citep{bertsekas1995,
puterman2014markov}. This framework allows us to
specify such problems and to solve them with a generic version of
backward induction that we will refer to as \emph{monadic backward
induction}.
The Botta-Jansson-Ionescu-framework, subsequently referred to as
\emph{BJI-framework}, \emph{BJI-theory} or simply \emph{framework},
already includes a verification of monadic
backward induction with respect to a certain underlying \emph{value}
function (see section~\ref{subsection:solution_components}). However,
in the literature on stochastic SDPs this formulation of the function
is itself part of the backward induction algorithm and needs to be
verified against an optimization criterion, the \emph{expected total
reward}. %, see \citep[ch.~4.2]{puterman2014markov}.
This raises the question whether monadic backward induction can be
considered correct as solution method for the substantially more
general monadic SDPs.
In the present paper, we address this question and extend the
\bottaetal verification result. To this end, we put forward a formal
specification that the BJI-value function has to meet. This
specification uses an optimization criterion for monadic SDPs that is
a generic version of the expected total reward of standard control
theory textbooks.\footnote{Note that in control theory backward
induction is often referred to as \emph{the dynamic programming
algorithm} where the term \emph{dynamic programming} is used in
the original sense of \citep{bellman1957}.} We prove that the value
function of the BJI-framework meets the specification if the monadic
SDP fulfills
certain natural conditions. We discuss these conditions and express
them in category-theoretical terms using the notion of
Eilenberg-Moore-algebra.
As a corollary we obtain a correctness result for monadic backward
induction that can be seen as a generic version of correctness results
for standard backward induction
like \citep[Prop.~1.3.1]{bertsekas1995} and
\citep[Theorem~4.5.1.c]{puterman2014markov}.
For the reader unfamiliar with SDPs, we provide a brief informal
overview and two simple examples in the next section. We recap the
BJI-framework and its (partial) verification result for monadic
backward induction in section~\ref{section:framework}.
In section~\ref{section:preparation}
we specify correctness for monadic backward induction
and the BJI-value function. We also
show that in the general monadic case the value function does not
necessarily meet the specification. To resolve this problem, we
identify conditions under which the value function does meet the
specification. These conditions are stated and discussed in
section~\ref{section:conditions}. In section~\ref{section:valval} we
prove that, given that the conditions hold, the BJI-value function and monadic
backward induction are correct in the sense defined in
section~\ref{section:preparation}.
We conclude in section~\ref{section:conclusion}.
Throughout the paper we use Idris as our host language
\citep{JFP:9060502,idrisbook}. We assume some familiarity
with Haskell-like syntax and notions like \emph{functor} and
\emph{monad} as used in functional programming. We tacitly consider
types as logical statements and programs as proofs, justified by the
propositions-as-types correspondence \citep[for an accessible
introduction see][]{DBLP:journals/cacm/Wadler15}.
%
Our development is
formalized in Idris as an extension of a lightweight version of the
BJI-framework. The proofs are machine-checked and the source code is
available as supplementary material attached to this paper.
The sources of this document have been written in literate Idris and are
available at \citep{IdrisLibsValVal}, together with some example code.
All source files can be type checked with Idris version 1.3.2.
main: main.tex
	latexmk -pdf main.tex

LHS2TEX = lhs2TeX

main.tex: main.lhs \
  main.fmt \
  Introduction.lidr \
  MonadicSDP.lidr \
  Framework.lidr \
  Preparation.lidr \
  Conditions.lidr \
  Theorem.lidr \
  Conclusion.lidr \
  Appendix.lidr \
  references.bib
	${LHS2TEX} --poly main.lhs > main.almost_tex
	./spacefix.bash main.almost_tex > main.tex

clean:
	-rm *.pre *.vrb *.fls *.out *fdb_latexmk *.ibc
	-rm main.tex \
	  main.pdf \
	  main-*.pdf \
	  main.ps \
	  main.dvi \
	  main.aux \
	  main.log \
	  main.toc \
	  main.ptb \
	  main.nav \
	  main.out \
	  main.snm \
	  main.bbl \
	  main.blg \
	  main.almost_tex \
	  NFAexample.tex.aux \
	  SchedulingExample.tex.aux
% -*-Latex-*-
%if False
> module MonadicSDP
%endif
\section{Finite-horizon Sequential Decision Problems}
\label{section:SDPs}
In deterministic, non-deterministic and stochastic finite-horizon