Commit 001d4f0b authored by Nicola Botta

Final.

parent c1f2186a
@@ -621,7 +621,7 @@ Optimal extensions:
 %endif
 Together with the result of \bottaetal (|biOptVal|, see
-appendix~\ref{appendix:bilemma} below) we can prove the correctness
+Appendix~\ref{appendix:bilemma} below) we can prove the correctness
 of monadic backward induction as corollary, using a
 generalised optimality of policy sequences predicate:
@@ -44,18 +44,19 @@ extensional equality of |val| and |val'|:
 \item[{\bf Condition 1.}]
 The measure needs to be left-inverse to |pure|:
 \footnote{The symbol |`ExtEq`| denotes \emph{extensional} equality,
-see appendix~\ref{appendix:monadLaws}}
+see Appendix~\ref{appendix:monadLaws}}
 < measPureSpec : meas . pure `ExtEq` id
 \begin{center}
-\begin{tikzcd}[column sep=2cm]
-%
-Val \arrow[d, "pure", swap] \arrow[dr, "id"] \\
-M Val \arrow[r, "meas", swap] & Val\\
-%
-\end{tikzcd}
+\includegraphics{img/diag1.eps}
+% \begin{tikzcd}[column sep=2cm]
+% %
+% Val \arrow[d, "pure", swap] \arrow[dr, "id"] \\
+% M Val \arrow[r, "meas", swap] & Val\\
+% %
+% \end{tikzcd}
 \end{center}
 % measJoinSpec
@@ -67,12 +68,13 @@ equal to applying it after |map meas|:
 < measJoinSpec : meas . join `ExtEq` meas . map meas
 \begin{center}
-\begin{tikzcd}[column sep=3cm]
-%
-M (M Val) \arrow[r, "map\ meas"] \arrow[d, "join", swap] & M Val \arrow[d, "meas" ] \\
-M Val \arrow[r, "meas", swap] & Val\\
-%
-\end{tikzcd}
+\includegraphics{img/diag2.eps}
+% \begin{tikzcd}[column sep=3cm]
+% %
+% M (M Val) \arrow[r, "map\ meas"] \arrow[d, "join", swap] & M Val \arrow[d, "meas" ] \\
+% M Val \arrow[r, "meas", swap] & Val\\
+% %
+% \end{tikzcd}
 \end{center}
 % measPlusSpec
@@ -87,14 +89,15 @@ applying |(v <+>)| after the measure:
 < (meas . map (v <+>)) mv = ((v <+>) . meas) mv
 \begin{center}
-\begin{tikzcd}[column sep=3cm]
-%
-M Val \arrow[r, "map (v\, \oplus)"] \arrow[d, "meas", swap] & M Val \arrow[d, "meas" ] \\
-Val \arrow[r, "(v\, \oplus)", swap] & Val\\
-%
-\end{tikzcd}
-\vspace{0.1cm}
+\includegraphics{img/diag3.eps}
+% \begin{tikzcd}[column sep=3cm]
+% %
+% M Val \arrow[r, "map (v\, \oplus)"] \arrow[d, "meas", swap] & M Val \arrow[d, "meas" ] \\
+% Val \arrow[r, "(v\, \oplus)", swap] & Val\\
+% %
+% \end{tikzcd}
+%\vspace{0.1cm}
 \end{center}
 \end{itemize}
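The three measure conditions changed in the hunks above can be sanity-checked outside the Idris development. The following Python sketch is purely illustrative: `meas`, `pure`, `dist_map` and `join` are stand-ins for the paper's |meas|, |pure|, |map| and |join|, with |<+>| instantiated to ordinary addition and the measure to expected value on finite distributions represented as lists of (value, probability) pairs.

```python
def pure(v):
    # Embed a value as the point distribution.
    return [(v, 1.0)]

def meas(d):
    # The expected-value measure: sum of value * probability.
    return sum(v * p for (v, p) in d)

def dist_map(f, d):
    # Functorial map: transform values, keep probabilities.
    return [(f(v), p) for (v, p) in d]

def join(dd):
    # Monadic join: multiply outer and inner probabilities.
    return [(v, p * q) for (d, q) in dd for (v, p) in d]

mv = [(1.0, 0.25), (3.0, 0.75)]          # probabilities sum to 1
mmv = [(mv, 0.5), ([(2.0, 1.0)], 0.5)]   # a distribution of distributions
v = 10.0

# Condition 1 (measPureSpec): meas . pure == id
assert meas(pure(v)) == v

# Condition 2 (measJoinSpec): meas . join == meas . map meas
assert abs(meas(join(mmv)) - meas(dist_map(meas, mmv))) < 1e-12

# Condition 3 (measPlusSpec, with <+> as +): meas . map (v +) == (v +) . meas
# Note: for the expected-value measure this relies on the probabilities
# summing to one.
assert abs(meas(dist_map(lambda w: v + w, mv)) - (v + meas(mv))) < 1e-12
```

The third check makes visible why normalisation matters: if the weights of `mv` did not sum to one, adding `v` inside the distribution would contribute `v` scaled by the total weight rather than `v` itself.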
@@ -146,15 +149,19 @@ is neutral for |*|.
 The second and the third condition require some arithmetic reasoning, so
 let us just consider them for two examples.
-Say we have distributions
-> parameters (a, b, c, d, e : Double)
+Let |a, b, c, d| be variables of type |Double| and say we have distributions
+%if False
+> parameters (a, b, c, d : Double)
+%endif
 > dps1 : Dist Double
 > dps1 = [(a, 0.5), (b, 0.3), (c, 0.2)]
 > dps2 : Dist Double
-> dps2 = [(d, 0.4), (e, 0.6)]
+> dps2 = [(a, 0.4), (d, 0.6)]
 > dpdps : Dist (Dist Double)
 > dpdps = [(dps1, 0.1), (dps2, 0.9)]
@@ -170,13 +177,13 @@ addition and multiplication:
 < (expected . distJoin) dpdps =
 <
-< expected [(a, 0.5 * 0.1), (b, 0.3 * 0.1), (c, 0.2 * 0.1), (d, 0.4 * 0.9), (e, 0.6 * 0.9)] =
+< expected [(a, 0.5 * 0.1), (b, 0.3 * 0.1), (c, 0.2 * 0.1), (a, 0.4 * 0.9), (d, 0.6 * 0.9)] =
 <
-< (a * 0.5 * 0.1) + (b * 0.3 * 0.1) + (c * 0.2 * 0.1) + (d * 0.4 * 0.9) + (e * 0.6 * 0.9) =
+< (a * 0.5 * 0.1) + (b * 0.3 * 0.1) + (c * 0.2 * 0.1) + (a * 0.4 * 0.9) + (d * 0.6 * 0.9) =
 <
-< ((a * 0.5 + b * 0.3 + c * 0.2) * 0.1 + (d * 0.4 + e * 0.6) * 0.9 =
+< (a * 0.5 + b * 0.3 + c * 0.2) * 0.1 + (a * 0.4 + d * 0.6) * 0.9 =
 <
-< expected [(a * 0.5 + b * 0.3 + c * 0.2, 0.1) , (d * 0.4 + e * 0.6, 0.9)] =
+< expected [(a * 0.5 + b * 0.3 + c * 0.2, 0.1) , (a * 0.4 + d * 0.6, 0.9)] =
 <
 < (expected . distMap expected) dpdps
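The calculation in this hunk can be replayed concretely. Below is an illustrative Python translation (not part of the paper's sources): `expected`, `dist_join` and `dist_map` mirror the paper's |expected|, |distJoin| and |distMap|, and |a, b, c, d| are given arbitrary concrete values, so both sides of the chain of equalities can be evaluated and compared.

```python
def expected(d):
    # Expected value of a finite distribution, as in the text.
    return sum(v * p for (v, p) in d)

def dist_map(f, d):
    # Map over the values of a distribution.
    return [(f(v), p) for (v, p) in d]

def dist_join(dd):
    # Flatten a distribution of distributions by multiplying probabilities.
    return [(v, p * q) for (d, q) in dd for (v, p) in d]

# The example distributions from the diff, with arbitrary concrete values
# substituted for the variables a, b, c, d.
a, b, c, d = 1.0, 2.0, 3.0, 4.0
dps1 = [(a, 0.5), (b, 0.3), (c, 0.2)]
dps2 = [(a, 0.4), (d, 0.6)]
dpdps = [(dps1, 0.1), (dps2, 0.9)]

lhs = expected(dist_join(dpdps))
rhs = expected(dist_map(expected, dpdps))
assert abs(lhs - rhs) < 1e-12
```

With these values both sides evaluate to the same number, matching the step-by-step rewriting by distributivity of multiplication over addition shown above.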
@@ -120,9 +120,9 @@ The price to pay for including the non-emptiness premise is
 the additional condition |nextNotEmpty| on the transition function
 |next| that was already stated in Sec.~\ref{subsection:wrap-up}.
 Moreover, we have to postulate non-emptiness preservation laws for the
-monad operations (appendix~\ref{appendix:monadLaws}) and to prove an
+monad operations (Appendix~\ref{appendix:monadLaws}) and to prove an
 additional lemma about the preservation of non-emptiness
-(appendix~\ref{appendix:lemmas}).
+(Appendix~\ref{appendix:lemmas}).
 %
 Conceptually, it might seem cleaner to omit the non-emptiness
 condition: In this case, the remaining conditions would only concern
@@ -43,7 +43,8 @@ control theory for finite-horizon, discrete-time SDPs. It extends
 mathematical formulations for stochastic
 SDPs \citep{bertsekas1995, bertsekasShreve96, puterman2014markov}
 to the general problem of optimal decision making under \emph{monadic}
-uncertainty.\\
+uncertainty.
+
 For monadic SDPs, the framework provides a
 generic implementation of backward induction. It has been applied to
 study the impact of uncertainties on optimal emission policies
@@ -156,7 +157,8 @@ Note that for deterministic problems it is unnecessary to parameterise
 the reward function over the next state as it is unique and can thus be
 obtained from the current state and control. But for non-deterministic
 problems it is useful to be able to assign rewards depending on the
-(uncertain) outcome of a transition.\\
+(uncertain) outcome of a transition.
+
 A few remarks are at place here.
 \begin{itemize}
@@ -477,7 +479,7 @@ Like the reference value |zero| discussed above, |plusMonSpec| and
 |measMonSpec| are specification components of the BJI-framework that
 we have not discussed in Sec.~\ref{subsection:specification_components}.
 %
-We provide a proof of |Bellman| in appendix~\ref{appendix:Bellman}. As one
+We provide a proof of |Bellman| in Appendix~\ref{appendix:Bellman}. As one
 would expect, the proof makes essential use of the recursive definition of
 the function |val| discussed above.
 %
@@ -597,7 +599,7 @@ orthogonal to the purpose of the current paper.
 For the same reason we have not addressed the question of how to make |bi|
 more efficient by tabulation.
 We briefly discuss the specification and implementation of optimal extensions
-in the BJI-framework in appendix~\ref{appendix:optimal_extension}.
+in the BJI-framework in Appendix~\ref{appendix:optimal_extension}.
 We refer the reader interested in tabulation of |bi| to
 \href{https://gitlab.pik-potsdam.de/botta/IdrisLibs/-/blob/master/SequentialDecisionProblems/TabBackwardsInduction.lidr}
 {SequentialDecisionProblems.TabBackwardsInduction}
@@ -156,5 +156,5 @@ The sources of this document have been written in literal Idris and are
 available at \citep{IdrisLibsValVal}, together with some example code.
 All source files can be type checked with Idris~1.3.2.
-\vfill
-\pagebreak
+%\vfill
+%\pagebreak
\ No newline at end of file
 main: main.tex
 	latexmk -pdf main.tex
+final: main
+	latex main.tex
+	dvips main.dvi
+	ps2pdf main.ps
 LHS2TEX = lhs2TeX
@@ -58,8 +58,8 @@ on total rewards which is usually aggregated using the familiar
 \emph{expected value} measure. The value thus obtained is called the
 \emph{expected total reward} \citep[ch.~4.1.2]{puterman2014markov} and
 its role is central: It is the quantity that is to be optimised in
-an SDP.\\
+an SDP.
+%
 In monadic SDPs, the measure is generic, i.e. it is not fixed in advance
 but has to be given as part of the specification of a concrete problem.
 Therefore we will generalise the notion of \emph{expected total reward} to
@@ -76,8 +76,8 @@ Similarly, we define that \emph{solving a monadic SDP} consists in
 This means that when starting from any initial state at decision step
 |t|, following the computed list of rules for selecting controls will
 result in a value that is maximal as measure of the sum of rewards
-along all possible trajectories rooted in that initial state.\\
+along all possible trajectories rooted in that initial state.
+%
 Equivalently, rewards can instead be considered as \emph{costs}
 that need to be \emph{minimised}. This dual perspective is taken e.g.
 in \citep{bertsekas1995}. In the subsequent sections we will follow
@@ -141,22 +141,27 @@ Fig.~\ref{fig:examplesGraph}.
 \begin{subfigure}[b]{.35\textwidth-0.55cm}
 \centering
-\scalebox{0.7}{
-\small
-\input{NFAexample.tex}
-}
+%\scalebox{0.7}{
+\includegraphics[height=0.26\textheight]{img/fig1a.eps}
+% \small
+% \input{NFAexample.tex}
+%}
 \caption{Example~1}
 \label{fig:example1}
 \end{subfigure}
 \begin{subfigure}[b]{.65\textwidth}
 \centering
-\scalebox{0.7}{
-\input{SchedulingExample.tex}
-\vspace{0.3cm}
-}
+%\scalebox{0.7}{
+\includegraphics[height=0.25\textheight]{img/fig1b.eps}
+% \input{SchedulingExample.tex}
+%\vspace{0.3cm}
+% }
 \caption{Example~2}
 \label{fig:example2}
@@ -218,8 +223,8 @@ these lines we refer the interested reader to~\citep{esd-9-525-2018}.
 Scheduling
 problems serve as canonical examples in control theory textbooks. The
 one we present here is a slightly modified version of
-\citep[Example~1.1.2]{bertsekas1995}.\\
+\citep[Example~1.1.2]{bertsekas1995}.
+%
 Think of some machine in a factory that can perform different
 operations, say $A, B, C$ and $D$. Each of these operations is supposed
 to be performed once. The machine can only perform one operation at a
@@ -71,8 +71,8 @@ starting at that state:
 where we use |StateCtrlSeq| as type of trajectories. Essentially it is
 a non-empty list of (dependent) state/control pairs, with the exception of the base case
-which is a singleton just containing the last state reached.\\
+which is a singleton just containing the last state reached.
+%
 Furthermore, we can compute the \emph{total reward} for a single
 trajectory, i.e. its sum of rewards:
@@ -114,7 +114,7 @@ of the sum of rewards along all possible trajectories of length |n|
 that are rooted in an initial state at step |t|.
 Again by analogy to the stochastic case, we define monadic
-backward induction to be correct, if for a given SDP the policy
+backward induction to be correct if, for a given SDP, the policy
 sequence computed by |bi| is the solution to the SDP.
 I.e., we consider |bi| to be correct if it meets the specification
 %
@@ -230,7 +230,8 @@ and it is not ``obviously clear'' that |val| and |val'| are
 extensionally equal without further knowledge about |meas|.
 In the deterministic case, i.e. for |M = Id| and
-|meas = id|, they are indeed equal without imposing any further
+|meas = id|, |val ps x| and |val' ps x| are indeed equal for all |ps|
+and |x|, without imposing any further
 conditions (as we will see in Sec.~\ref{section:valval}).
 For the stochastic case, \cite[Theorem
 4.2.1]{puterman2014markov} suggests that the equality
@@ -85,9 +85,9 @@ interaction of the measure with the monad structure and the
 |<+>|-operator on |Val|.
 %
 Machine-checked proofs are given in the
-appendices~\ref{appendix:theorem},~\ref{appendix:biCorrectness} and
+Appendices~\ref{appendix:theorem},~\ref{appendix:biCorrectness} and
 \ref{appendix:lemmas}. The monad laws we use are stated in
-appendix~\ref{appendix:monadLaws}. In the remainder of this section,
+Appendix~\ref{appendix:monadLaws}. In the remainder of this section,
 we discuss semi-formal versions of the proofs.\\
 \paragraph*{Monad algebras.} \hspace{0.1cm} The first lemma allows us

@@ -229,8 +229,8 @@ that the equality holds.\\
 The induction hypothesis (|IH|) is:
 for all |x : X t|, |val' ps x = val ps x|. We have to show that
 |IH| implies that for all |p : Policy t| and |x : X t|, the equality
-|val' (p :: ps) x = val (p :: ps) x| holds.\\
+|val' (p :: ps) x = val (p :: ps) x| holds.
+%
 For brevity (and to economise on brackets), let in the following
 |y = p x|, |mx' = next t x y|, |r = reward t x y|, |trjps = trj
 ps|, and |consxy = ((x ** y) :::)|.

@@ -295,7 +295,7 @@ may be uninteresting for a pen and paper proof, but turn out to be
 crucial in the setting of an intensional type theory -- like Idris --
 where function extensionality does not hold in general.
 In particular, we have to postulate that the functorial |map|
-preserves extensional equality (see appendix~\ref{appendix:monadLaws}
+preserves extensional equality (see Appendix~\ref{appendix:monadLaws}
 and \citep{botta2020extensional}) for Idris to accept the proof.
 In fact, most of the reasoning proceeds by replacing functions that are mapped
 onto monadic values by other functions that are only extensionally