Nicola Botta / papers
Commit 12723851, authored Feb 23, 2021 by Nuria Brede
Initial.
Parent: e0f847b0
Changes: 22
2021.On the Correctness of Monadic Backward Induction/Brede,Botta2021On the Correctness of Monadic Backward Inductionrevision 202102.pdf
File added
2021.On the Correctness of Monadic Backward Induction/paper/Appendix.lidr
This diff is collapsed.
2021.On the Correctness of Monadic Backward Induction/paper/Conclusion.lidr
% *Latex*
\section{Conclusion}
\label{section:conclusion}
We have shown that, for measures of uncertainty that fulfill three
general compatibility conditions, the monadic backward induction of the
framework for specifying and solving finite-horizon, monadic SDPs proposed
in \citep{2017_Botta_Jansson_Ionescu} is correct.
The main result has been proved via the extensional equality of
two value functions: 1) the value function of Bellman's dynamic programming
\citep{bellman1957} and optimal control theory \citep{bertsekas1995,
puterman2014markov} that is also at the core of the generic
backward induction algorithm of \citep{2017_Botta_Jansson_Ionescu} and
2) the measured total reward function that specifies the objective of
decision making in monadic SDPs: the maximization of a measure of the
sum of the rewards along the trajectories rooted at the state
associated with the first decision.
Our contribution to verified optimal decision making is twofold: On the
one hand, we have implemented a machine-checked generalization of the
semi-formal results for deterministic and stochastic SDPs
discussed in \citep[Prop.~1.3.1]{bertsekas1995} and
\citep[Theorem~4.5.1.c]{puterman2014markov}.
%
As a consequence, we now have a provably correct method for solving
deterministic and stochastic sequential decision problems with their
canonical measure functions.
%
On the other hand, we have identified three general conditions that are
sufficient for the equivalence between the two value functions to
hold. The first two conditions are natural compatibility conditions
between the measure of uncertainty meas and the monadic operations
associated with the uncertainty monad M. The third condition is a
relationship between meas, the functorial map associated with M and
the rule for adding rewards <+>. All three conditions have a
straightforward category-theoretical interpretation in terms of
Eilenberg-Moore algebras \citep[ch.~VI.2]{maclane}.
%
As discussed in section~\ref{subsection:abstractCond}, the three
conditions are independent and have non-trivial implications for the
measure and the addition function that cannot be derived from the
monotonicity condition on meas already imposed in
\citep{ionescu2009, 2017_Botta_Jansson_Ionescu}.
A consequence of this contribution is that we can now compute verified
solutions of stochastic sequential decision problems in which the
measure of uncertainty is different from the expected value
measure. This is important for applications in which the goal of
decision making is, for example, to maximize the value of
worst-case outcomes.
%
To the best of our knowledge, the formulation of the compatibility
conditions and the proof of the equivalence between the two value
functions are novel results.
The latter can be employed in a wider context than the one that has
motivated our study: in many practical problems in science and
engineering, the computation of optimal policies via backward induction
(let alone brute-force or gradient methods) is simply not feasible.
%
In these problems one often still needs to generate, evaluate and
compare different policies and our result shows under which conditions
such evaluation can safely be done via the ``fast'' value function val
of standard control theory.
Finally, our contribution is an application of verified, literate
programming to optimal decision making: the sources of this document
have been written in literate Idris and are available at
\citep{IdrisLibsValVal}, where the reader can also find the bare code
and some examples. Although the development has been carried out in
Idris, it should be readily reproducible in other implementations of
type theory like Agda or Coq.
2021.On the Correctness of Monadic Backward Induction/paper/Conditions.lidr
% *Latex*
%if False
> module Conditions
> import Framework
> %default total
> %auto_implicits off
> %access public export
> Q : Type
> Prob : Type -> Type
> supp : {A : Type} -> Prob A -> List A
> prob : {A : Type} -> Prob A -> A -> Q
> neutr : Val
>
> odot : Val -> Val -> Val
%endif
\section{Correctness conditions}
\label{section:conditions}
%\subsection{Conditions on measures}
%\label{subsection:measConditions}
We now formulate three conditions on measure functions that imply the
extensional equality of val and val':
% measPureSpec
\noindent The measure needs to be left-inverse to pure:
\footnote{The symbol `ExtEq` denotes \emph{extensional} equality,
see appendix~\ref{appendix:monadLaws}.}
< (1) measPureSpec : meas . pure `ExtEq` id
\begin{center}
\begin{tikzcd}[column sep=2cm]
%
Val \arrow[d, "pure", swap] \arrow[dr, "id"] \\
M Val \arrow[r, "meas", swap] & Val\\
%
\end{tikzcd}
\end{center}
% measJoinSpec
\noindent Applying the measure after join needs to be extensionally
equal to applying it after map meas:
< (2) measJoinSpec : meas . join `ExtEq` meas . map meas
\begin{center}
\begin{tikzcd}[column sep=3cm]
%
M (M Val) \arrow[r, "map\ meas"] \arrow[d, "join", swap] & M Val \arrow[d, "meas" ] \\
M Val \arrow[r, "meas", swap] & Val\\
%
\end{tikzcd}
\end{center}
% measPlusSpec
\noindent For arbitrary v : Val and non-empty mv : M Val, applying
the measure after mapping (v <+>) onto mv needs to be equal to
applying (v <+>) after the measure:
< (3) measPlusSpec : (v : Val) -> (mv : M Val) -> (NotEmpty mv) ->
< (meas . map (v <+>)) mv = ((v <+>) . meas) mv
\begin{center}
\begin{tikzcd}[column sep=3cm]
%
M Val \arrow[r, "map (v\, \oplus)"] \arrow[d, "meas", swap] & M Val \arrow[d, "meas" ] \\
Val \arrow[r, "(v\, \oplus)", swap] & Val\\
%
\end{tikzcd}
\vspace{0.1cm}
\end{center}
Essentially, these conditions ensure that the measure is well-behaved
relative to the monad structure and the <+>-operation.
\subsection{Examples and counterexamples}
\label{subsection:exAndCounterEx}
%
To get a better intuition, let us consider a few measures that do or
do not fulfill the three conditions.
%
Simple examples of admissible measures are the minimum (minList as
defined in fig.~\ref{fig:example1Formal}) or maximum (maxList = foldr
`maximum` 0) of a list for M = List with Nat as type of values
and ordinary addition as <+>. It is straightforward to prove that
the conditions hold for these two measures and the proofs for
maxList are included in the supplementary material.
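
The three conditions can also be spot-checked numerically. The following
Python sketch is an illustration only and not part of the Idris development:
it models M = List with ordinary Python lists, takes ordinary addition as
<+>, and tests the conditions for maxList on a few hand-picked samples. The
helper names and the sample data are assumptions made for this example, and
passing the checks is of course no substitute for the machine-checked proofs.

```python
from functools import reduce
from itertools import chain

def max_list(xs):
    # maxList = foldr maximum 0; max is associative and commutative,
    # so a left fold gives the same result as the right fold.
    return reduce(max, xs, 0)

def pure(v):
    # pure of the list monad
    return [v]

def join(xss):
    # join of the list monad (concat)
    return list(chain.from_iterable(xss))

def meas_pure_ok(meas, v):
    # measPureSpec at a single value
    return meas(pure(v)) == v

def meas_join_ok(meas, xss):
    # measJoinSpec at a single list of lists
    return meas(join(xss)) == meas([meas(xs) for xs in xss])

def meas_plus_ok(meas, v, xs):
    # measPlusSpec at a single value and a non-empty list
    assert xs, "measPlusSpec only concerns non-empty structures"
    return meas([v + x for x in xs]) == v + meas(xs)

samples = [[1], [2, 3], [0, 7, 4]]
checks = (
    all(meas_pure_ok(max_list, v) for v in range(5)),
    meas_join_ok(max_list, samples),
    all(meas_plus_ok(max_list, v, xs) for v in range(5) for xs in samples),
)
print(checks)
```

All three entries of the printed tuple are expected to be true for these samples.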
As to counterexamples, let us take another look at our counterexample
from the last section, the arithmetic sum of a list. It does fulfill
measPureSpec and measJoinSpec, the first by definition, the second
by structural induction using the associativity of + (the list
monad's join is concat). But it fails to fulfill
measPlusSpec. The premiss NotEmpty mv (see appendix
\ref{appendix:monadLaws} and next subsection) tells us that the list
must not be []; otherwise we can prove the equality from a
contradiction. But if the list has the form a :: as we would have to
show the equality:
< (sum . map (v +)) (a :: as) = ((v +) . sum) (a :: as)
The left-hand side adds v once per list element, the right-hand side
adds it only once. Clearly, if v != 0 and as != [] this equality cannot hold.
This is why in the last section the equality of val and val'
failed for meas = sum.
A similar failure would arise if we chose meas = foldr (*) 1 instead,
as + does not distribute over *. But if we turned the situation
around by setting <+> = * and meas = sum, the condition
measPlusSpec would hold thanks to the usual arithmetic
distributivity law for * over +.
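
To make the role of distributivity concrete, the Python sketch below
evaluates both sides of measPlusSpec for the three combinations of measure
and <+> discussed above. It is an illustration under the assumption that
lists of integers stand in for M Val; the helper and variable names are
hypothetical.

```python
from math import prod
from operator import add, mul

def plus_sides(meas, oplus, v, xs):
    """Return both sides of measPlusSpec for a non-empty list xs."""
    assert xs, "measPlusSpec only concerns non-empty structures"
    lhs = meas([oplus(v, x) for x in xs])   # (meas . map (v <+>)) xs
    rhs = oplus(v, meas(xs))                # ((v <+>) . meas) xs
    return lhs, rhs

# meas = sum, <+> = +: v is added once per element on the left only.
sum_add = plus_sides(sum, add, 2, [1, 5])
# meas = product, <+> = +: addition does not distribute over multiplication.
prod_add = plus_sides(prod, add, 2, [1, 5])
# meas = sum, <+> = *: multiplication distributes over addition.
sum_mul = plus_sides(sum, mul, 2, [1, 5])

print(sum_add, prod_add, sum_mul)
```

For v = 2 and the list [1, 5] the two sides differ in the first two cases
(10 versus 8 and 21 versus 7) and agree in the third (12 on both sides).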
What about the other conditions? We remain in the setting of example~1
as above, and just vary the measure. Using a somewhat contrived
variation of maxList
< meas = foldr (\x, v => (x + 1 `maximum` v)) 0
it suffices to consider for an arbitrary n : Nat that
< (meas . pure) n = meas [n] = (n + 1) `maximum` 0 = n + 1 != n = id n
to see that now the condition measPureSpec fails.
%
To exhibit a measure that fails the condition measJoinSpec, we
switch to Double as type of values (still with addition as binary
operator) and the arithmetic average as measure meas = avg. Taking
a list of lists of different lengths like [[1], [2, 3]] we have
< meas (join [[1], [2, 3]]) = avg [1, 2, 3] = 2
< !=
< meas (map meas [[1], [2, 3]]) = avg [1, 2.5] = 1.75
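
Both failures can be reproduced with a few lines of Python. The sketch below
is only an illustration: lists model the list monad, the contrived maximum
is written as an explicit right fold, and the function names are
assumptions of this example.

```python
from functools import reduce
from itertools import chain

def shifted_max(xs):
    # Right fold: combine each element x with the result v for the rest,
    # taking the maximum of x + 1 and v, starting from 0.
    return reduce(lambda v, x: max(x + 1, v), reversed(xs), 0)

def avg(xs):
    return sum(xs) / len(xs)

def join(xss):
    # join of the list monad (concat)
    return list(chain.from_iterable(xss))

# measPureSpec: meas (pure n) versus id n, for n = 3
pure_sides = (shifted_max([3]), 3)

# measJoinSpec: meas (join xss) versus meas (map meas xss)
nested = [[1], [2, 3]]
join_sides = (avg(join(nested)), avg([avg(xs) for xs in nested]))

print(pure_sides, join_sides)
```

The first pair is (4, 3) and the second is (2.0, 1.75), matching the
calculations above.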
All of the measures considered in this subsection do fulfill the
measMonSpec condition imposed by the BJI-theory. This raises the
question of how previously admissible measures are impacted by adding the
three new conditions to the framework.
\subsection{Impact on previously admissible measures}
\label{subsection:impactMeas}
%
As we have seen in section~\ref{subsection:solution_components}, the
BJI-framework already requires measures to fulfill the monotonicity
condition
< measMonSpec : {A : Type} -> (f, g : A -> Val) -> ((a : A) -> (f a) <= (g a)) ->
< (ma : M A) -> meas (map f ma) <= meas (map g ma)
\bottaetal show that the arithmetic average (for M = List), the
worst-case measure (for M = List and for M = Prob) and the
expected value measure (for M = Prob) all fulfill measMonSpec. Thus, a
natural question is whether these measures also fulfill the three
additional requirements.
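
As a reminder of what measMonSpec demands, the following Python sketch
checks the inequality for three list measures on one sample. It is an
illustration only: the functions f and g, the sample list and all names are
assumptions of this example, and the premise is only verified on the
elements that actually occur in the list.

```python
def avg(xs):
    return sum(xs) / len(xs)

def meas_mon_ok(meas, f, g, ma):
    """Check meas (map f ma) <= meas (map g ma) for f <= g pointwise on ma."""
    assert all(f(a) <= g(a) for a in ma), "premise f a <= g a violated"
    return meas([f(a) for a in ma]) <= meas([g(a) for a in ma])

f = lambda a: a
g = lambda a: 2 * a + 1
ma = [0, 3, 5]

mon_results = {name: meas_mon_ok(meas, f, g, ma)
               for name, meas in [("avg", avg), ("max", max), ("min", min)]}
print(mon_results)
```

On this sample the inequality holds for the average, the maximum and the
minimum.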
\paragraph*{Expected value measure.} \hspace{0.1cm}
%
Most applications of backward induction aggregate possible rewards with the
expected value measure. In a nutshell, for a numerical type Q, the
expected value of a probability distribution on Q is
> expVal : Num Q => Prob Q -> Q
> expVal spq = sum [q * prob spq q | q <- supp spq]
where prob and supp are generic functions that encode the notion of
\emph{probability} and of \emph{support} associated with a finite
probability distribution:
< prob : {A : Type} -> Prob A -> A -> Q
<
< supp : {A : Type} -> Prob A -> List A
For spa and a of suitable types, prob spa a represents the
probability of a according to spa. Similarly, supp spa returns a
list of those values whose probability is not zero in spa.
%
The probability function prob has to fulfill the axioms of
probability theory. In particular,
< sum [prob spa a | a <- supp spa] = 1
This condition implies that probability distributions cannot be empty, a
precondition of measPlusSpec. Putting forward minimal specifications
for prob and supp is not completely trivial but if the +-operation
associated with Q is commutative and associative, if *
distributes over + and if the map and join associated with
Prob (for f, a, b, spa and spspa of suitable types)
fulfill the conservation law
< prob (map f spa) b = sum [prob spa a | a <- supp spa, f a == b]
and the total probability law
< prob (join spspa) a = sum [prob spa a * prob spspa spa | spa <- supp spspa]
then the expected value measure fulfills measPureSpec, measJoinSpec and
measPlusSpec. This is not surprising as it is standard
to apply backward induction to SDPs with the expected value of possible
rewards as measure.
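
A small numerical illustration of this claim is sketched below in Python.
It is not the Prob type of the Idris development: as an assumption of this
example, a finite distribution is represented as a list of (value,
probability) pairs with exact rational probabilities, and map and join are
defined so that the conservation law and the total probability law hold by
construction. All names and the sample distributions are hypothetical.

```python
from fractions import Fraction as F

def pure(a):
    # Point distribution: a with probability 1.
    return [(a, F(1))]

def pmap(f, spa):
    # Apply f to every outcome and keep its probability mass.
    return [(f(a), p) for a, p in spa]

def join(spspa):
    # Flatten a distribution of distributions by multiplying probabilities.
    return [(a, q * p) for spa, p in spspa for a, q in spa]

def exp_val(spq):
    # Expected value: probability-weighted sum of the outcomes.
    return sum(q * p for q, p in spq)

d1 = [(1, F(1, 2)), (3, F(1, 2))]
d2 = [(10, F(1, 4)), (20, F(3, 4))]
dd = [(d1, F(1, 3)), (d2, F(2, 3))]
v = 5

pure_ok = exp_val(pure(v)) == v
join_ok = exp_val(join(dd)) == exp_val(pmap(exp_val, dd))
plus_ok = exp_val(pmap(lambda q: v + q, d1)) == v + exp_val(d1)
print(pure_ok, join_ok, plus_ok)
```

The third check relies on the probabilities of d1 summing to one, which
mirrors the remark that distributions cannot be empty.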
\paragraph*{Average and arithmetic sum.} \hspace{0.1cm}
%
As can already be concluded
from the corresponding counterexamples in the previous subsection,
neither the plain arithmetic average nor the arithmetic sum is
suitable as a measure when using the standard monad structure on
List to represent non-deterministic
uncertainty. However, the arithmetic average can be used as a measure
if List is endowed with another monad structure, corresponding to a
probability monad for uniform distributions.
\paragraph*{Worst-case measures.} \hspace{0.1cm}
%
In many important applications in
climate impact research but also in portfolio management and sports,
decisions are taken so as to minimize the consequences of worst-case
outcomes. Depending on how ``worse'' is defined, the corresponding
measures might pick the maximum or
minimum from an M-structure of values. In the previous subsection we
considered an example in which the monad was List and the operation
<+> plain addition, together with either maxList or minList as
measure. And indeed we can prove that for both measures the three
requirements hold (the proofs for maxList can be found in the
supplementary material). This gives us a notion of worst-case measure
that is admissible for monadic backward induction.
\vspace{0.2cm}
We can thus conclude that the new requirements hold for certain
familiar measures, but also that they have non-trivial consequences on
measures that can be used in the BJI-framework.
\subsection{The measure conditions from an abstract perspective}
\label{subsection:abstractCond}
%
Now that we have seen what the three conditions mean for concrete examples,
we can consider them from a more abstract point of view.
\paragraph*{Category-theoretical perspective.}\hspace{0.1cm}
Readers familiar with the theory of monads might have recognized that the
first two conditions ensure that meas is the structure map of a
monad algebra for M on Val and thus the pair (Val, meas) is an
object of
the Eilenberg-Moore category associated with the monad M. The third
condition requires the map (v <+>) to be an M-algebra homomorphism,
that is, a structure-preserving map, for arbitrary values v.
This perspective allows us to use existing knowledge about monad
algebras as a first criterion for choosing measures. For example, the
Eilenberg-Moore algebras of the list monad are monoids; this
implicitly played a role in the examples we considered
above. \cite{DBLP:journals/tcs/Jacobs11} shows that the algebras of
the distribution monad for probability distributions with finite
support correspond to convex sets. Interestingly, convex sets play an
important role in the theory of optimal control
\citep{bertsekas2003convex}.
\paragraph*{Measures for the list monad.} \hspace{0.1cm}
%
The knowledge that monoids are List-algebras suggests a generic
description of admissible measures for M = List:
Given a monoid (Val, odot, b), we can prove that monoid
homomorphisms of the form foldr odot b fulfill the three conditions,
if <+> distributes over $\odot$ on the left. That is, for meas = foldr
odot b, with b written as neutr in the code, the three conditions can be proven from
> odotNeutrRight : (l : Val) -> l `odot` neutr = l
> odotNeutrLeft : (r : Val) -> neutr `odot` r = r
> odotAssociative : (l, v, r : Val) -> l `odot` (v `odot` r) = (l `odot` v) `odot` r
> oplusOdotDistrLeft : (n, l, r : Val) -> n <+> (l `odot` r) = (n <+> l) `odot` (n <+> r)
Neutrality of b on the right is needed for measPureSpec,
while measJoinSpec follows from neutrality on the left and
the associativity of odot. The algebra morphism condition on (v <+>)
is provable from the distributivity of <+> over odot and again
neutrality of b on the right.
If moreover odot is monotone with respect to <=
> odotMon : {a, b, c, d : Val} -> a <= b -> c <= d -> (a `odot` c) <= (b `odot` d)
then we can also prove measMonSpec using the transitivity of <=.
%
The proofs are simple and can be found in the
supplementary material to this paper.
This also illustrates how the three abstract conditions follow from
more familiar algebraic properties.
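
The generic description can be explored with a small Python sketch. It is an
illustration under stated assumptions, not a rendering of the Idris proofs:
values are natural numbers, a measure is built as a right fold from a
monoid operation and its neutral element, and the three conditions are
tested on a handful of samples only. All names are hypothetical.

```python
from functools import reduce
from itertools import chain
from operator import add, mul

def fold_measure(odot, neutr):
    """Build meas = foldr odot neutr for a given monoid (odot, neutr)."""
    def meas(xs):
        return reduce(lambda acc, x: odot(x, acc), reversed(xs), neutr)
    return meas

def conditions_hold(meas, oplus, values, lists):
    """Spot-check measPureSpec, measJoinSpec and measPlusSpec on samples."""
    pure_ok = all(meas([v]) == v for v in values)
    join_ok = (meas(list(chain.from_iterable(lists)))
               == meas([meas(xs) for xs in lists]))
    plus_ok = all(meas([oplus(v, x) for x in xs]) == oplus(v, meas(xs))
                  for v in values for xs in lists if xs)
    return pure_ok, join_ok, plus_ok

values = [0, 1, 2, 5]
lists = [[1], [2, 3], [4, 0, 6]]

# (max, 0) with <+> = +: addition distributes over max.
max_plus = conditions_hold(fold_measure(max, 0), add, values, lists)
# (+, 0) with <+> = *: multiplication distributes over addition.
sum_times = conditions_hold(fold_measure(add, 0), mul, values, lists)
# (+, 0) with <+> = +: addition does not distribute over itself.
sum_plus = conditions_hold(fold_measure(add, 0), add, values, lists)

print(max_plus, sum_times, sum_plus)
```

Only the last combination, which lacks the distributivity property, fails,
and it fails exactly in the measPlusSpec component.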
\paragraph*{Mutual independence.}\hspace{0.1cm}
%
Although it does not seem surprising, it should be noted that the
three conditions are mutually independent. This can be concluded from
the counterexamples in section~\ref{subsection:exAndCounterEx}: The
sum, the modified list maximum and the arithmetic average each fail
exactly one of the three conditions.
\paragraph*{Sufficient vs.\ necessary.}\hspace{0.1cm}
%
The three conditions are sufficient to prove the extensional equality
of the functions val and val'. They are justified by their level
of generality and the fact that they hold for standard measures used
in control theory. However, we leave open the interesting question
whether these conditions are also necessary for the correctness of
monadic backward induction.
\paragraph*{Non-emptiness requirement.}\hspace{0.1cm}
%
Note that mv in the premisses of measPlusSpec is required to be
non-empty.
This condition makes sense: If we use again the list monad with
Val = Nat and <+> = +, it is not hard to see that for any natural
number n greater than 0 the equality meas (map (n +) []) = n + meas
[] must fail. Thus, the only way to prove the base case of
measPlusSpec is by contradiction with the non-emptiness premiss.
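
The failure on the empty list does not depend on the particular measure:
mapping over [] yields [] again, so the left-hand side is meas [] while the
right-hand side is n + meas []. The Python sketch below illustrates this for
two list measures; the choice n = 3 and the names are assumptions of this
example.

```python
from functools import reduce

def max_list(xs):
    # maxList = foldr maximum 0
    return reduce(max, xs, 0)

n = 3
empty_sides = {
    name: (meas([n + x for x in []]), n + meas([]))
    for name, meas in [("maxList", max_list), ("sum", sum)]
}
print(empty_sides)
```

In both cases the left-hand side is 0 and the right-hand side is 3.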
However, omitting the premiss NotEmpty mv would not prevent us
from generically proving the correctness result in the next section;
it would even simplify matters as it would spare us reasoning about
preservation of non-emptiness.
But it would implicitly restrict the class of monads that can be used
to instantiate M. For example, we have seen above that
measPlusSpec is not provable for the empty list without the
non-emptiness premiss. We would therefore need to resort to a type of
non-empty lists instead.
The price to pay for including the non-emptiness premiss is
the additional condition nextNotEmpty on the transition function
next that was already stated in section~\ref{subsection:wrapup}.
Moreover, we have to postulate non-emptiness preservation laws for the
monad operations (appendix~\ref{appendix:monadLaws}) and to prove an
additional lemma about the preservation of non-emptiness
(appendix~\ref{appendix:lemmas}).
%
Conceptually, it might seem cleaner to omit the non-emptiness
condition: In this case, the remaining conditions would only concern
the interaction between the monad, the measure, the type of values and the
binary operation <+>. However, the non-emptiness preservation laws seem
less restrictive with respect to the monad. In particular, for our
above example of ordinary lists they hold (the relevant proofs can be
found in the supplementary material).
Thus we have opted for explicitly restricting the next function
instead of implicitly restricting the class of monads for which the
results of section~\ref{section:valval} hold.
\vspace{0.2cm}
Given that the three conditions measPureSpec, measJoinSpec and
measPlusSpec on the measure function hold, we can prove the
extensional equality
of the functions val and val' generically. This is what we will do
in the next section.
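
Before turning to the proof, the effect of the conditions on val and val'
can be observed on a toy problem. The Python sketch below is a loose,
simplified rendering and not the definitions of the BJI-framework: it
assumes a non-deterministic system whose transition function returns a
non-empty list of successor states, a reward that depends on the state
reached, ordinary addition as <+> and 0 as the value of the empty policy
sequence. The transition function, the reward, the policy and all names are
hypothetical.

```python
def next_states(x, y):
    """Hypothetical transition function: a non-empty list of successors."""
    return [x + y, x + y + 1]

def reward(x, y, x_next):
    """Hypothetical reward: the value of the state that is reached."""
    return x_next

def val(meas, policies, x):
    """Value computed step by step: meas is applied at every decision step."""
    if not policies:
        return 0
    p, rest = policies[0], policies[1:]
    y = p(x)
    return meas([reward(x, y, x2) + val(meas, rest, x2)
                 for x2 in next_states(x, y)])

def total_rewards(policies, x):
    """Sums of rewards along all trajectories rooted at x."""
    if not policies:
        return [0]
    p, rest = policies[0], policies[1:]
    y = p(x)
    return [reward(x, y, x2) + r
            for x2 in next_states(x, y)
            for r in total_rewards(rest, x2)]

def val_prime(meas, policies, x):
    """Measured total reward: meas is applied once to all trajectory sums."""
    return meas(total_rewards(policies, x))

stay = lambda x: 0          # a policy that always selects control 0
policies = [stay, stay]     # two decision steps

max_pair = (val(max, policies, 0), val_prime(max, policies, 0))
sum_pair = (val(sum, policies, 0), val_prime(sum, policies, 0))
print(max_pair, sum_pair)
```

With the maximum as measure both functions return 3, whereas with the sum
as measure, which violates measPlusSpec, they return 5 and 6 respectively.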
2021.On the Correctness of Monadic Backward Induction/paper/Framework.lidr
This diff is collapsed.
2021.On the Correctness of Monadic Backward Induction/paper/Introduction.lidr
% *Latex*
%if False
> module Introduction
%endif
% *Latex*
\section{Introduction}
\label{section:introduction}
Backward induction is a method introduced by \cite{bellman1957} that
is routinely used to solve \emph{finite-horizon sequential decision
problems (SDP)}. Such problems lie
at the core of many applications in economics, logistics,
and computer science \citep{finus+al2003, helm2003,
heitzig2012, gintis2007, botta+al2013b,
de_moor1995, de_moor1999}.
Examples include inventory, scheduling and shortest path
problems, but also the search for optimal strategies in
games~\citep{bertsekas1995, diederich01}.
%
In \citep{2017_Botta_Jansson_Ionescu}, Botta, Jansson and Ionescu
propose a generic framework for \emph{monadic} finite-horizon SDPs as a
generalization of the deterministic, non-deterministic and stochastic SDPs
treated in control theory textbooks \citep{bertsekas1995,
puterman2014markov}. This framework allows one to
specify such problems and to solve them with a generic version of
backward induction that we will refer to as \emph{monadic backward
induction}.
The Botta-Jansson-Ionescu-framework, subsequently referred to as
\emph{BJI-framework}, \emph{BJI-theory} or simply \emph{framework},
already includes a verification of monadic
backward induction with respect to a certain underlying \emph{value}
function (see section~\ref{subsection:solution_components}). However,
in the literature on stochastic SDPs this formulation of the function
is itself part of the backward induction algorithm and needs to be
verified against an optimization criterion, the \emph{expected total
reward}. %, see \citep[ch.~4.2]{puterman2014markov}.
This raises the question whether monadic backward induction can be
considered correct as a solution method for the substantially more
general monadic SDPs.
In the present paper, we address this question and extend the
\bottaetal verification result. To this end, we put forward a formal
specification that the BJI-value function has to meet. This
specification uses an optimization criterion for monadic SDPs that is
a generic version of the expected total reward of standard control
theory textbooks.\footnote{Note that in control theory backward
induction is often referred to as \emph{the dynamic programming
algorithm} where the term \emph{dynamic programming} is used in
the original sense of \citep{bellman1957}.} We prove that the value
function of the BJI-framework meets the specification if the monadic
SDP fulfills
certain natural conditions. We discuss these conditions and express
them in category-theoretical terms using the notion of
Eilenberg-Moore algebra.
As a corollary we obtain a correctness result for monadic backward
induction that can be seen as a generic version of correctness results
for standard backward induction
like \citep[prop.~1.3.1]{bertsekas1995} and
\citep[Th.~4.5.1.c]{puterman2014markov}.
For the reader unfamiliar with SDPs, we provide a brief informal
overview and two simple examples in the next section. We recap the
BJI-framework and its (partial) verification result for monadic
backward induction in section~\ref{section:framework}.
In section~\ref{section:preparation}
we specify correctness for monadic backward induction
and the BJI-value function. We also
show that in the general monadic case the value function does not
necessarily meet the specification. To resolve this problem, we
identify conditions under which the value function does meet the
specification. These conditions are stated and discussed in
section~\ref{section:conditions}. In section~\ref{section:valval} we
prove that, given the conditions hold, the BJI-value function and monadic
backward induction are correct in the sense defined in
section~\ref{section:preparation}.
We conclude in section~\ref{section:conclusion}.
Throughout the paper we use Idris as our host language
\citep{JFP:9060502,idrisbook}. We assume some familiarity
with Haskell-like syntax and notions like \emph{functor} and
\emph{monad} as used in functional programming. We tacitly consider
types as logical statements and programs as proofs, justified by the
propositions-as-types correspondence \citep[for an accessible
introduction see][]{DBLP:journals/cacm/Wadler15}.
%
Our development is
formalized in Idris as an extension of a lightweight version of the
BJI-framework. The proofs are machine-checked and the source code is
available as supplementary material attached to this paper.
The sources of this document have been written in literate Idris and are
available at \citep{IdrisLibsValVal}, together with some example code.
All source files can be type checked with Idris version 1.3.2.
2021.On the Correctness of Monadic Backward Induction/paper/Makefile
main: main.tex
	latexmk -pdf main.tex

LHS2TEX = lhs2TeX

main.tex: main.lhs \
          main.fmt \
          Introduction.lidr \
          MonadicSDP.lidr \
          Framework.lidr \
          Preparation.lidr \
          Conditions.lidr \
          Theorem.lidr \
          Conclusion.lidr \
          Appendix.lidr \
          references.bib
	${LHS2TEX} --poly main.lhs > main.almost_tex
	./spacefix.bash main.almost_tex > main.tex

clean:
	-rm *.pre *.vrb *.fls *.out *fdb_latexmk *.ibc
	-rm main.tex \
	    main.pdf \
	    main*.pdf \
	    main.ps \
	    main.dvi \
	    main.aux \
	    main.log \
	    main.toc \
	    main.ptb \
	    main.nav \
	    main.out \
	    main.snm \
	    main.bbl \
	    main.blg \
	    main.almost_tex \
	    NFAexample.tex.aux \
	    SchedulingExample.tex.aux
2021.On the Correctness of Monadic Backward Induction/paper/MonadicSDP.lidr
% *Latex*
%if False
> module MonadicSDP
%endif
\section{Finite-horizon Sequential Decision Problems}
\label{section:SDPs}
In deterministic, non-deterministic and stochastic finite-horizon
SDPs, a decision maker seeks
to control the evolution of a \emph{(dynamical) system} at a finite number of
\emph{decision steps} by selecting certain \emph{controls} in sequence,
one after the other. The controls
available to the decision maker at a given decision step typically
depend on the \emph{state} of the system at that step.
In \emph{deterministic} problems, selecting a control in a state at