Introduction.lidr 6.94 KB
Newer Older
Nuria Brede's avatar
Nuria Brede committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
% -*-Latex-*-

%if False

> module Introduction

%endif


% -*-Latex-*-
\section{Introduction}
\label{section:introduction}


Backward induction is a method introduced by \cite{bellman1957} that
is routinely used to solve \emph{finite-horizon sequential decision
  problems (SDP)}. Such problems lie
at the core of many applications in economics, logistics,
and computer science \citep{finus+al2003, helm2003,
  heitzig2012, gintis2007, botta+al2013b,
  de_moor1995, de_moor1999}.
Examples include inventory, scheduling and shortest path
problems, but also the search for optimal strategies in
games~\citep{bertsekas1995, diederich01}.
%

Nuria Brede's avatar
Nuria Brede committed
27
Botta, Jansson and Ionescu (\citeyear{2017_Botta_Jansson_Ionescu})
Nuria Brede's avatar
Nuria Brede committed
28
propose a generic framework for \emph{monadic} finite-horizon SDPs as
Nuria Brede's avatar
Nuria Brede committed
29
generalisation of the deterministic, non-deterministic and stochastic SDPs
Nuria Brede's avatar
Nuria Brede committed
30
31
32
33
34
35
36
37
38
39
treated in control theory textbooks \citep{bertsekas1995,
  puterman2014markov}. This framework allows to
specify such problems and to solve them with a generic version of
backward induction that we will refer to as \emph{monadic backward
  induction}.

The Botta-Jansson-Ionescu-framework, subsequently referred to as
\emph{BJI-framework}, \emph{BJI-theory} or simply \emph{framework},
already includes a verification of monadic
backward induction with respect to a certain underlying \emph{value}
Nuria Brede's avatar
Nuria Brede committed
40
function (see Sec.~\ref{subsection:solution_components}). However,
Nuria Brede's avatar
Nuria Brede committed
41
42
in the literature on stochastic SDPs this formulation of the function
is itself part of the backward induction algorithm and needs to be
Nuria Brede's avatar
Nuria Brede committed
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
verified against an optimisation criterion, the \emph{expected total
  reward} \citep[Ch.~4.2]{puterman2014markov}.
%
For stochastic SDPs semi-formal proofs can be found in textbooks -- but
monadic SDPs are substantially more general than the stochastic SDPs
for which these results are established.
This observation raises a number of questions:
%
\begin{itemize}
  % 
\item What exactly should ``correctness'' mean for a solution of
  monadic SDPs?
  % 
\item Does monadic backward induction provide
  correct solutions in this sense for monadic SDPs in their full
  generality?
  % 
\item And if not, is there a class of monadic SDPs for which
  monadic backward induction does provide provably correct
  solutions?
  % 
\end{itemize}
%
In the present paper we address these questions and make the
following contributions to answering them:
%
\begin{itemize}
%
\item We put forward a formal specification that monadic backward
  induction should meet in order
  to be considered ``correct'' as solution method for monadic SDPs.
  This specification uses an optimisation criterion that is
  a generic version of the \emph{expected total reward} of standard control
  theory textbooks.\footnote{Note that in control theory
  backward  
Nuria Brede's avatar
Nuria Brede committed
78
79
  induction is often referred to as \emph{the dynamic programming
    algorithm} where the term \emph{dynamic programming} is used in
Nuria Brede's avatar
Nuria Brede committed
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
  the original sense of \cite{bellman1957}.} In analogy, we call this
  criterion \emph{measured total reward}.
  %
\item We consider the value function underlying monadic backward
  induction as ``correct'' if it computes the \emph{measured total reward}.
  %
\item If the value function is correct, monadic backward induction can
  be proven to be correct in our sense by extending the result of
  \cite {2017_Botta_Jansson_Ionescu}.
  However, we show that this is not necessarily the case, i.e.
  the value function does not compute the \emph{measured total reward}
  for arbitrary monadic SDPs.
%
\item We therefore formulate conditions that identify a class of monadic SDPs
  for which the value function and thus monadic backward induction
  can be shown to be correct. The conditions are fairly simple and allow for
  a neat description in category-theoretical terms using the notion of
  Eilenberg-Moore-algebra.
%
\item We give a formalised proof that monadic backward induction fulfils
  the correctness criterion if the conditions hold. This correctness result can
  be seen as a generic version of correctness results for standard backward induction
like \citep[Prop.~1.3.1]{bertsekas1995} and
Nuria Brede's avatar
Nuria Brede committed
103
\citep[Th.~4.5.1.c]{puterman2014markov}.
Nuria Brede's avatar
Nuria Brede committed
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
  % 
\end{itemize}

Our results rule out the application of backward induction to certain monadic SDPs that
were previously considered as admissible in the BJI-framework. Thus, they complement
the verification result of \bottaetal and provide a necessary clarification. 
Still, the new conditions are simple enough to be checked for
non-standard instantiations of the framework. This
allows to broaden the applicability of backward induction to settings
which are not commonly discussed in the literature and to obtain a
formalised proof of correctness with considerably less effort.
It is worth stressing that our conditions can
be useful for anyone interested in applying monadic backward induction in
non-standard situations -- completely independent of the BJI-framework. 

Finally, the value function underlying monadic backward induction is
also interesting in itself. Given the conditions hold, it can be used to compute
the measured total reward efficiently, using a method reminiscent of a \emph{thinning}
algorithm \cite[Ch.~10]{adwh}.
Nuria Brede's avatar
Nuria Brede committed
123
124
125
126

For the reader unfamiliar with SDPs, we provide a brief informal
overview and two simple examples in the next section. We recap the
BJI-framework and its (partial) verification result for monadic
Nuria Brede's avatar
Nuria Brede committed
127
128
backward induction in Sec.~\ref{section:framework}.
In Sec.~\ref{section:preparation}
Nuria Brede's avatar
Nuria Brede committed
129
130
131
132
133
we specify correctness for monadic backward induction
and the BJI-value function. We also
show that in the general monadic case the value function does not
necessarily meet the specification. To resolve this problem, we
identify conditions under which the value function does meet the
Nuria Brede's avatar
Nuria Brede committed
134
135
specification. These conditions are stated and analysed in
Sec.~\ref{section:conditions}. In Sec.~\ref{section:valval} we
Nuria Brede's avatar
Nuria Brede committed
136
137
prove that, given the conditions hold, the BJI-value function and monadic
backward induction are correct in the sense defined in
Nuria Brede's avatar
Nuria Brede committed
138
139
140
Sec.~\ref{section:preparation}. We discuss the conditions from a more
abstract perspective in Sec.~\ref{section:discussion} and 
conclude in Sec.~\ref{section:conclusion}.
Nuria Brede's avatar
Nuria Brede committed
141
142
143
144
145
146
147
148
149

Throughout the paper we use Idris as our host language
\citep{JFP:9060502,idrisbook}. We assume some familiarity
with Haskell-like syntax and notions like \emph{functor} and
\emph{monad} as used in functional programming. We tacitly consider
types as logical statements and programs as proofs, justified by the
propositions-as-types correspondence \citep[for an accessible
introduction see][]{DBLP:journals/cacm/Wadler15}.
%
Nuria Brede's avatar
Nuria Brede committed
150
\paragraph*{Source code.} \hspace{0.1cm}
Nuria Brede's avatar
Nuria Brede committed
151
Our development is
Nuria Brede's avatar
Nuria Brede committed
152
formalised in Idris as an extension of a lightweight version of the
Nuria Brede's avatar
Nuria Brede committed
153
154
155
156
BJI-framework. The proofs are machine-checked and the source code is
available as supplementary material attached to this paper.
The sources of this document have been written in literal Idris and are
available at \citep{IdrisLibsValVal}, together with some example code.
Nuria Brede's avatar
Nuria Brede committed
157
158
All source files can be type checked with Idris~1.3.2.

Nicola Botta's avatar
Final.    
Nicola Botta committed
159
160
%\vfill
%\pagebreak