Commit 001d4f0b authored by Nicola Botta

Final.

parent c1f2186a
......@@ -621,7 +621,7 @@ Optimal extensions:
%endif
Together with the result of \bottaetal (|biOptVal|, see
appendix~\ref{appendix:bilemma} below) we can prove the correctness
Appendix~\ref{appendix:bilemma} below) we can prove the correctness
of monadic backward induction as a corollary, using a
generalised optimality of policy sequences predicate:
......
......@@ -44,18 +44,19 @@ extensional equality of |val| and |val'|:
\item[{\bf Condition 1.}]
The measure needs to be left-inverse to |pure|:
\footnote{The symbol |`ExtEq`| denotes \emph{extensional} equality,
see appendix~\ref{appendix:monadLaws}}
see Appendix~\ref{appendix:monadLaws}}
< measPureSpec : meas . pure `ExtEq` id
\begin{center}
\begin{tikzcd}[column sep=2cm]
%
Val \arrow[d, "pure", swap] \arrow[dr, "id"] \\
M Val \arrow[r, "meas", swap] & Val\\
%
\end{tikzcd}
\includegraphics{img/diag1.eps}
% \begin{tikzcd}[column sep=2cm]
% %
% Val \arrow[d, "pure", swap] \arrow[dr, "id"] \\
% M Val \arrow[r, "meas", swap] & Val\\
% %
% \end{tikzcd}
\end{center}
% measJoinSpec
......@@ -67,12 +68,13 @@ equal to applying it after |map meas|:
< measJoinSpec : meas . join `ExtEq` meas . map meas
\begin{center}
\begin{tikzcd}[column sep=3cm]
%
M (M Val) \arrow[r, "map\ meas"] \arrow[d, "join", swap] & M Val \arrow[d, "meas" ] \\
M Val \arrow[r, "meas", swap] & Val\\
%
\end{tikzcd}
\includegraphics{img/diag2.eps}
% \begin{tikzcd}[column sep=3cm]
% %
% M (M Val) \arrow[r, "map\ meas"] \arrow[d, "join", swap] & M Val \arrow[d, "meas" ] \\
% M Val \arrow[r, "meas", swap] & Val\\
% %
% \end{tikzcd}
\end{center}
% measPlusSpec
......@@ -87,14 +89,15 @@ applying |(v <+>)| after the measure:
< (meas . map (v <+>)) mv = ((v <+>) . meas) mv
\begin{center}
\begin{tikzcd}[column sep=3cm]
%
M Val \arrow[r, "map (v\, \oplus)"] \arrow[d, "meas", swap] & M Val \arrow[d, "meas" ] \\
Val \arrow[r, "(v\, \oplus)", swap] & Val\\
\includegraphics{img/diag3.eps}
% \begin{tikzcd}[column sep=3cm]
% %
% M Val \arrow[r, "map (v\, \oplus)"] \arrow[d, "meas", swap] & M Val \arrow[d, "meas" ] \\
% Val \arrow[r, "(v\, \oplus)", swap] & Val\\
% %
% \end{tikzcd}
%
\end{tikzcd}
\vspace{0.1cm}
%\vspace{0.1cm}
\end{center}
\end{itemize}
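
As a rough numerical illustration of the first and third condition, the following Python sketch instantiates |meas| with the expected value and |<+>| with ordinary addition. The list-of-pairs representation and the names `pure`, `dist_map` and `expected` are assumptions made for this sketch only; they are not the Idris definitions of the framework. The example distribution is non-empty and its probabilities sum to 1, which the third condition relies on.

```python
import math

# A finite distribution is modelled as a list of (value, probability)
# pairs. This representation is an assumption of the sketch.
def pure(v):
    """Embed a value as the distribution that yields it with probability 1."""
    return [(v, 1.0)]

def dist_map(f, dist):
    """Apply f to every value, leaving the probabilities untouched."""
    return [(f(x), p) for x, p in dist]

def expected(dist):
    """Expected value: probability-weighted sum of the values."""
    return sum(x * p for x, p in dist)

# Hypothetical example data; the probabilities sum to 1.
mv = [(2.0, 0.5), (4.0, 0.25), (8.0, 0.25)]
v = 3.0

# Condition 1: measuring a pure value gives back the value itself.
cond1 = math.isclose(expected(pure(v)), v)

# Condition 3: adding v inside the distribution and then measuring
# agrees with measuring first and then adding v.
lhs = expected(dist_map(lambda x: v + x, mv))
rhs = v + expected(mv)
cond3 = math.isclose(lhs, rhs)

print(cond1, cond3)
```

Such a check on sample data does not replace the proofs; it only makes the shape of the two equations concrete.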
......@@ -146,15 +149,19 @@ is neutral for |*|.
The second and the third condition require some arithmetic reasoning, so
let us just consider them for two examples.
Say we have distributions
Let |a, b, c, d| be variables of type |Double| and say we have distributions
%if False
> parameters (a, b, c, d, e : Double)
> parameters (a, b, c, d : Double)
%endif
> dps1 : Dist Double
> dps1 = [(a, 0.5), (b, 0.3), (c, 0.2)]
> dps2 : Dist Double
> dps2 = [(d, 0.4), (e, 0.6)]
> dps2 = [(a, 0.4), (d, 0.6)]
> dpdps : Dist (Dist Double)
> dpdps = [(dps1, 0.1), (dps2, 0.9)]
......@@ -170,13 +177,13 @@ addition and multiplication:
< (expected . distJoin) dpdps =
<
< expected [(a, 0.5 * 0.1), (b, 0.3 * 0.1), (c, 0.2 * 0.1), (d, 0.4 * 0.9), (e, 0.6 * 0.9)] =
< expected [(a, 0.5 * 0.1), (b, 0.3 * 0.1), (c, 0.2 * 0.1), (a, 0.4 * 0.9), (d, 0.6 * 0.9)] =
<
< (a * 0.5 * 0.1) + (b * 0.3 * 0.1) + (c * 0.2 * 0.1) + (d * 0.4 * 0.9) + (e * 0.6 * 0.9) =
< (a * 0.5 * 0.1) + (b * 0.3 * 0.1) + (c * 0.2 * 0.1) + (a * 0.4 * 0.9) + (d * 0.6 * 0.9) =
<
< (a * 0.5 + b * 0.3 + c * 0.2) * 0.1 + (d * 0.4 + e * 0.6) * 0.9 =
< (a * 0.5 + b * 0.3 + c * 0.2) * 0.1 + (a * 0.4 + d * 0.6) * 0.9 =
<
< expected [(a * 0.5 + b * 0.3 + c * 0.2, 0.1) , (d * 0.4 + e * 0.6, 0.9)] =
< expected [(a * 0.5 + b * 0.3 + c * 0.2, 0.1) , (a * 0.4 + d * 0.6, 0.9)] =
<
< (expected . distMap expected) dpdps
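
The chain of equalities above can also be checked numerically. The Python sketch below mirrors |dps1|, |dps2| and |dpdps| (in the variant where |dps2| uses |a| and |d|) with arbitrarily chosen values for `a`, `b`, `c`, `d`. The functions `expected`, `dist_map` and `dist_join` are hypothetical Python analogues of |expected|, |distMap| and |distJoin| written for this sketch, not the Idris implementations.

```python
import math

def expected(dist):
    """Expected value of a list of (value, probability) pairs."""
    return sum(x * p for x, p in dist)

def dist_map(f, dist):
    """Apply f to every value, leaving the probabilities untouched."""
    return [(f(x), p) for x, p in dist]

def dist_join(dd):
    """Flatten a distribution of distributions by multiplying each
    inner probability with the probability of its distribution."""
    return [(x, p * q) for inner, q in dd for x, p in inner]

# Arbitrary sample values for the variables of the example (assumption).
a, b, c, d = 1.0, 2.0, 3.0, 4.0

dps1 = [(a, 0.5), (b, 0.3), (c, 0.2)]
dps2 = [(a, 0.4), (d, 0.6)]
dpdps = [(dps1, 0.1), (dps2, 0.9)]

# Left-hand side: join first, then take the expected value.
lhs = expected(dist_join(dpdps))
# Right-hand side: take the expected value of each inner distribution,
# then the expected value of the resulting distribution of numbers.
rhs = expected(dist_map(expected, dpdps))

print(math.isclose(lhs, rhs))
```

The comparison uses `math.isclose` because the two sides group the floating-point multiplications differently, exactly as in the distributivity step of the derivation.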
......
......@@ -120,9 +120,9 @@ The price to pay for including the non-emptiness premise is
the additional condition |nextNotEmpty| on the transition function
|next| that was already stated in Sec.~\ref{subsection:wrap-up}.
Moreover, we have to postulate non-emptiness preservation laws for the
monad operations (appendix~\ref{appendix:monadLaws}) and to prove an
monad operations (Appendix~\ref{appendix:monadLaws}) and to prove an
additional lemma about the preservation of non-emptiness
(appendix~\ref{appendix:lemmas}).
(Appendix~\ref{appendix:lemmas}).
%
Conceptually, it might seem cleaner to omit the non-emptiness
condition: In this case, the remaining conditions would only concern
......
......@@ -43,7 +43,8 @@ control theory for finite-horizon, discrete-time SDPs. It extends
mathematical formulations for stochastic
SDPs \citep{bertsekas1995, bertsekasShreve96, puterman2014markov}
to the general problem of optimal decision making under \emph{monadic}
uncertainty.\\
uncertainty.
For monadic SDPs, the framework provides a
generic implementation of backward induction. It has been applied to
study the impact of uncertainties on optimal emission policies
......@@ -156,7 +157,8 @@ Note that for deterministic problems it is unnecessary to parameterise
the reward function over the next state as it is unique and can thus be
obtained from the current state and control. But for non-deterministic
problems it is useful to be able to assign rewards depending on the
(uncertain) outcome of a transition.\\
(uncertain) outcome of a transition.
A few remarks are in order here.
\begin{itemize}
......@@ -477,7 +479,7 @@ Like the reference value |zero| discussed above, |plusMonSpec| and
|measMonSpec| are specification components of the BJI-framework that
we have not discussed in Sec.~\ref{subsection:specification_components}.
%
We provide a proof of |Bellman| in appendix~\ref{appendix:Bellman}. As one
We provide a proof of |Bellman| in Appendix~\ref{appendix:Bellman}. As one
would expect, the proof makes essential use of the recursive definition of
the function |val| discussed above.
%
......@@ -597,7 +599,7 @@ orthogonal to the purpose of the current paper.
For the same reason we have not addressed the question of how to make |bi|
more efficient by tabulation.
We briefly discuss the specification and implementation of optimal extensions
in the BJI-framework in appendix~\ref{appendix:optimal_extension}.
in the BJI-framework in Appendix~\ref{appendix:optimal_extension}.
We refer the reader interested in tabulation of |bi| to
\href{https://gitlab.pik-potsdam.de/botta/IdrisLibs/-/blob/master/SequentialDecisionProblems/TabBackwardsInduction.lidr}
{SequentialDecisionProblems.TabBackwardsInduction}
......
......@@ -156,5 +156,5 @@ The sources of this document have been written in literal Idris and are
available at \citep{IdrisLibsValVal}, together with some example code.
All source files can be type checked with Idris~1.3.2.
\vfill
\pagebreak
\ No newline at end of file
%\vfill
%\pagebreak
\ No newline at end of file
main: main.tex
latexmk -pdf main.tex
final: main
latex main.tex
dvips main.dvi
ps2pdf main.ps
LHS2TEX = lhs2TeX
......
......@@ -58,8 +58,8 @@ on total rewards which is usually aggregated using the familiar
\emph{expected value} measure. The value thus obtained is called the
\emph{expected total reward} \citep[ch.~4.1.2]{puterman2014markov} and
its role is central: It is the quantity that is to be optimised in
an SDP.\\
%
an SDP.
In monadic SDPs, the measure is generic, i.e. it is not fixed in advance
but has to be given as part of the specification of a concrete problem.
Therefore we will generalise the notion of \emph{expected total reward} to
......@@ -76,8 +76,8 @@ Similarly, we define that \emph{solving a monadic SDP} consists in
This means that when starting from any initial state at decision step
|t|, following the computed list of rules for selecting controls will
result in a value that is maximal as measure of the sum of rewards
along all possible trajectories rooted in that initial state.\\
%
along all possible trajectories rooted in that initial state.
Equivalently, rewards can instead be considered as \emph{costs}
that need to be \emph{minimised}. This dual perspective is taken e.g.
in \citep{bertsekas1995}. In the subsequent sections we will follow
......@@ -141,22 +141,27 @@ Fig.~\ref{fig:examplesGraph}.
\begin{subfigure}[b]{.35\textwidth-0.55cm}
\centering
\scalebox{0.7}{
%\scalebox{0.7}{
\small
\input{NFAexample.tex}
\includegraphics[height=0.26\textheight]{img/fig1a.eps}
% \small
% \input{NFAexample.tex}
}
%}
\caption{Example~1}
\label{fig:example1}
\end{subfigure}
\begin{subfigure}[b]{.65\textwidth}
\centering
\scalebox{0.7}{
%\scalebox{0.7}{
\input{SchedulingExample.tex}
\vspace{0.3cm}
}
\includegraphics[height=0.25\textheight]{img/fig1b.eps}
% \input{SchedulingExample.tex}
%\vspace{0.3cm}
% }
\caption{Example~2}
\label{fig:example2}
......@@ -218,8 +223,8 @@ these lines we refer the interested reader to~\citep{esd-9-525-2018}.
Scheduling
problems serve as canonical examples in control theory textbooks. The
one we present here is a slightly modified version of
\citep[Example~1.1.2]{bertsekas1995}.\\
%
\citep[Example~1.1.2]{bertsekas1995}.
Think of some machine in a factory that can perform different
operations, say $A, B, C$ and $D$. Each of these operations is supposed
to be performed once. The machine can only perform one operation at a
......
......@@ -71,8 +71,8 @@ starting at that state:
where we use |StateCtrlSeq| as type of trajectories. Essentially it is
a non-empty list of (dependent) state/control pairs, with the exception of the base case
which is a singleton just containing the last state reached.\\
%
which is a singleton just containing the last state reached.
Furthermore, we can compute the \emph{total reward} for a single
trajectory, i.e. its sum of rewards:
......@@ -114,7 +114,7 @@ of the sum of rewards along all possible trajectories of length |n|
that are rooted in an initial state at step |t|.
Again by analogy to the stochastic case, we define monadic
backward induction to be correct, if for a given SDP the policy
backward induction to be correct if, for a given SDP, the policy
sequence computed by |bi| is the solution to the SDP.
I.e., we consider |bi| to be correct if it meets the specification
%
......@@ -230,7 +230,8 @@ and it is not ``obviously clear'' that |val| and |val'| are
extensionally equal without further knowledge about |meas|.
In the deterministic case, i.e. for |M = Id| and
|meas = id|, they are indeed equal without imposing any further
|meas = id|, |val ps x| and |val' ps x| are indeed equal for all |ps|
and |x|, without imposing any further
conditions (as we will see in Sec.~\ref{section:valval}).
For the stochastic case, \cite[Theorem
4.2.1]{puterman2014markov} suggests that the equality
......
......@@ -85,9 +85,9 @@ interaction of the measure with the monad structure and the
|<+>|-operator on |Val|.
%
Machine-checked proofs are given in the
appendices~\ref{appendix:theorem},~\ref{appendix:biCorrectness} and
Appendices~\ref{appendix:theorem},~\ref{appendix:biCorrectness} and
\ref{appendix:lemmas}. The monad laws we use are stated in
appendix~\ref{appendix:monadLaws}. In the remainder of this section,
Appendix~\ref{appendix:monadLaws}. In the remainder of this section,
we discuss semi-formal versions of the proofs.\\
\paragraph*{Monad algebras.} \hspace{0.1cm} The first lemma allows us
......@@ -229,8 +229,8 @@ that the equality holds.\\
The induction hypothesis (|IH|) is:
for all |x : X t|, |val' ps x = val ps x|. We have to show that
|IH| implies that for all |p : Policy t| and |x : X t|, the equality
|val' (p :: ps) x = val (p :: ps) x| holds.\\
%
|val' (p :: ps) x = val (p :: ps) x| holds.
For brevity (and to economise on brackets), let in the following
|y = p x|, |mx' = next t x y|, |r = reward t x y|, |trjps = trj
ps|, and |consxy = ((x ** y) :::)|.
......@@ -295,7 +295,7 @@ may be uninteresting for a pen and paper proof, but turn out to be
crucial in the setting of an intensional type theory -- like Idris --
where function extensionality does not hold in general.
In particular, we have to postulate that the functorial |map|
preserves extensional equality (see appendix~\ref{appendix:monadLaws}
preserves extensional equality (see Appendix~\ref{appendix:monadLaws}
and \citep{botta2020extensional}) for Idris to accept the proof.
In fact, most of the reasoning proceeds by replacing functions that are mapped
onto monadic values by other functions that are only extensionally
......
%!PS-Adobe-3.0 EPSF-3.0
%Produced by poppler pdftops version: 0.86.1 (http://poppler.freedesktop.org)
%%Creator: TeX
%%LanguageLevel: 2
%%DocumentSuppliedResources: (atend)
%%BoundingBox: 0 0 116 67
%%HiResBoundingBox: 0 0 115.147 66.165
%%DocumentSuppliedResources: (atend)
%%EndComments
%%BeginProlog
%%BeginResource: procset xpdf 3.00 0
%%Copyright: Copyright 1996-2011 Glyph & Cog, LLC
/xpdf 75 dict def xpdf begin
% PDF special state
/pdfDictSize 15 def
/pdfSetup {
/setpagedevice where {
pop 2 dict begin
/Policies 1 dict dup begin /PageSize 6 def end def
{ /Duplex true def } if
currentdict end setpagedevice
} {
pop
} ifelse
} def
/pdfSetupPaper {
% Change paper size, but only if different from previous paper size otherwise
% duplex fails. PLRM specifies a tolerance of 5 pts when matching paper size
% so we use the same when checking if the size changes.
/setpagedevice where {
pop currentpagedevice
/PageSize known {
2 copy
currentpagedevice /PageSize get aload pop
exch 4 1 roll
sub abs 5 gt
3 1 roll
sub abs 5 gt
or
} {
true
} ifelse
{
2 array astore
2 dict begin
/PageSize exch def
/ImagingBBox null def
currentdict end
setpagedevice
} {
pop pop
} ifelse
} {
pop
} ifelse
} def
/pdfStartPage {
pdfDictSize dict begin
/pdfFillCS [] def
/pdfFillXform {} def
/pdfStrokeCS [] def
/pdfStrokeXform {} def
/pdfFill [0] def
/pdfStroke [0] def
/pdfFillOP false def
/pdfStrokeOP false def
/pdfLastFill false def
/pdfLastStroke false def
/pdfTextMat [1 0 0 1 0 0] def
/pdfFontSize 0 def
/pdfCharSpacing 0 def
/pdfTextRender 0 def
/pdfPatternCS false def
/pdfTextRise 0 def
/pdfWordSpacing 0 def
/pdfHorizScaling 1 def
/pdfTextClipPath [] def
} def
/pdfEndPage { end } def
% PDF color state
/cs { /pdfFillXform exch def dup /pdfFillCS exch def
setcolorspace } def
/CS { /pdfStrokeXform exch def dup /pdfStrokeCS exch def
setcolorspace } def
/sc { pdfLastFill not { pdfFillCS setcolorspace } if
dup /pdfFill exch def aload pop pdfFillXform setcolor
/pdfLastFill true def /pdfLastStroke false def } def
/SC { pdfLastStroke not { pdfStrokeCS setcolorspace } if
dup /pdfStroke exch def aload pop pdfStrokeXform setcolor
/pdfLastStroke true def /pdfLastFill false def } def
/op { /pdfFillOP exch def
pdfLastFill { pdfFillOP setoverprint } if } def
/OP { /pdfStrokeOP exch def
pdfLastStroke { pdfStrokeOP setoverprint } if } def
/fCol {
pdfLastFill not {
pdfFillCS setcolorspace
pdfFill aload pop pdfFillXform setcolor
pdfFillOP setoverprint
/pdfLastFill true def /pdfLastStroke false def
} if
} def
/sCol {
pdfLastStroke not {
pdfStrokeCS setcolorspace
pdfStroke aload pop pdfStrokeXform setcolor
pdfStrokeOP setoverprint
/pdfLastStroke true def /pdfLastFill false def
} if
} def
% build a font
/pdfMakeFont {
4 3 roll findfont
4 2 roll matrix scale makefont
dup length dict begin
{ 1 index /FID ne { def } { pop pop } ifelse } forall
/Encoding exch def
currentdict
end
definefont pop
} def
/pdfMakeFont16 {
exch findfont
dup length dict begin
{ 1 index /FID ne { def } { pop pop } ifelse } forall
/WMode exch def
currentdict
end
definefont pop
} def
% graphics state operators
/q { gsave pdfDictSize dict begin } def
/Q {
end grestore
/pdfLastFill where {
pop
pdfLastFill {
pdfFillOP setoverprint
} {
pdfStrokeOP setoverprint
} ifelse
} if
} def
/cm { concat } def
/d { setdash } def
/i { setflat } def
/j { setlinejoin } def
/J { setlinecap } def
/M { setmiterlimit } def
/w { setlinewidth } def
% path segment operators
/m { moveto } def
/l { lineto } def
/c { curveto } def
/re { 4 2 roll moveto 1 index 0 rlineto 0 exch rlineto
neg 0 rlineto closepath } def
/h { closepath } def
% path painting operators
/S { sCol stroke } def
/Sf { fCol stroke } def
/f { fCol fill } def
/f* { fCol eofill } def
% clipping operators
/W { clip newpath } def
/W* { eoclip newpath } def
/Ws { strokepath clip newpath } def
% text state operators
/Tc { /pdfCharSpacing exch def } def
/Tf { dup /pdfFontSize exch def
dup pdfHorizScaling mul exch matrix scale
pdfTextMat matrix concatmatrix dup 4 0 put dup 5 0 put
exch findfont exch makefont setfont } def
/Tr { /pdfTextRender exch def } def
/Tp { /pdfPatternCS exch def } def
/Ts { /pdfTextRise exch def } def
/Tw { /pdfWordSpacing exch def } def
/Tz { /pdfHorizScaling exch def } def
% text positioning operators
/Td { pdfTextMat transform moveto } def
/Tm { /pdfTextMat exch def } def
% text string operators
/xyshow where {
pop
/xyshow2 {
dup length array
0 2 2 index length 1 sub {
2 index 1 index 2 copy get 3 1 roll 1 add get
pdfTextMat dtransform
4 2 roll 2 copy 6 5 roll put 1 add 3 1 roll dup 4 2 roll put
} for
exch pop
xyshow
} def
}{
/xyshow2 {
currentfont /FontType get 0 eq {
0 2 3 index length 1 sub {
currentpoint 4 index 3 index 2 getinterval show moveto
2 copy get 2 index 3 2 roll 1 add get
pdfTextMat dtransform rmoveto
} for
} {
0 1 3 index length 1 sub {
currentpoint 4 index 3 index 1 getinterval show moveto
2 copy 2 mul get 2 index 3 2 roll 2 mul 1 add get
pdfTextMat dtransform rmoveto
} for
} ifelse
pop pop
} def
} ifelse
/cshow where {
pop
/xycp {
0 3 2 roll
{
pop pop currentpoint 3 2 roll
1 string dup 0 4 3 roll put false charpath moveto
2 copy get 2 index 2 index 1 add get
pdfTextMat dtransform rmoveto
2 add
} exch cshow
pop pop
} def
}{
/xycp {
currentfont /FontType get 0 eq {
0 2 3 index length 1 sub {
currentpoint 4 index 3 index 2 getinterval false charpath moveto
2 copy get 2 index 3 2 roll 1 add get
pdfTextMat dtransform rmoveto
} for
} {
0 1 3 index length 1 sub {
currentpoint 4 index 3 index 1 getinterval false charpath moveto
2 copy 2 mul get 2 index 3 2 roll 2 mul 1 add get
pdfTextMat dtransform rmoveto
} for
} ifelse
pop pop
} def
} ifelse
/Tj {
fCol
0 pdfTextRise pdfTextMat dtransform rmoveto
currentpoint 4 2 roll
pdfTextRender 1 and 0 eq {
2 copy xyshow2
} if
pdfTextRender 3 and dup 1 eq exch 2 eq or {
3 index 3 index moveto
2 copy
currentfont /FontType get 3 eq { fCol } { sCol } ifelse
xycp currentpoint stroke moveto
} if
pdfTextRender 4 and 0 ne {
4 2 roll moveto xycp
/pdfTextClipPath [ pdfTextClipPath aload pop
{/moveto cvx}
{/lineto cvx}
{/curveto cvx}
{/closepath cvx}
pathforall ] def
currentpoint newpath moveto
} {
pop pop pop pop
} ifelse
0 pdfTextRise neg pdfTextMat dtransform rmoveto
} def
/TJm { 0.001 mul pdfFontSize mul pdfHorizScaling mul neg 0
pdfTextMat dtransform rmoveto } def
/TJmV { 0.001 mul pdfFontSize mul neg 0 exch
pdfTextMat dtransform rmoveto } def
/Tclip { pdfTextClipPath cvx exec clip newpath