Commit 001d4f0b authored by Nicola Botta

Final.

parent c1f2186a
@@ -621,7 +621,7 @@ Optimal extensions:
%endif
Together with the result of \bottaetal (|biOptVal|, see
appendix~\ref{appendix:bilemma} below) we can prove the correctness
Appendix~\ref{appendix:bilemma} below) we can prove the correctness
of monadic backward induction as a corollary, using a
generalised optimality of policy sequences predicate:
@@ -44,18 +44,19 @@ extensional equality of |val| and |val'|:
\item[{\bf Condition 1.}]
The measure needs to be left-inverse to |pure|:
\footnote{The symbol |`ExtEq`| denotes \emph{extensional} equality,
see appendix~\ref{appendix:monadLaws}}
see Appendix~\ref{appendix:monadLaws}}
< measPureSpec : meas . pure `ExtEq` id
\begin{center}
\begin{tikzcd}[column sep=2cm]
%
Val \arrow[d, "pure", swap] \arrow[dr, "id"] \\
M Val \arrow[r, "meas", swap] & Val\\
%
\end{tikzcd}
\includegraphics{img/diag1.eps}
% \begin{tikzcd}[column sep=2cm]
% %
% Val \arrow[d, "pure", swap] \arrow[dr, "id"] \\
% M Val \arrow[r, "meas", swap] & Val\\
% %
% \end{tikzcd}
\end{center}
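For the expected value measure used in the examples below, this
condition amounts to a one-step computation (a sketch, assuming that
|pure v = [(v, 1.0)]| in the finitary distribution monad):

< (expected . pure) v = expected [(v, 1.0)] = v * 1.0 = v
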
% measJoinSpec
@@ -67,12 +68,13 @@ equal to applying it after |map meas|:
< measJoinSpec : meas . join `ExtEq` meas . map meas
\begin{center}
\begin{tikzcd}[column sep=3cm]
%
M (M Val) \arrow[r, "map\ meas"] \arrow[d, "join", swap] & M Val \arrow[d, "meas" ] \\
M Val \arrow[r, "meas", swap] & Val\\
%
\end{tikzcd}
\includegraphics{img/diag2.eps}
% \begin{tikzcd}[column sep=3cm]
% %
% M (M Val) \arrow[r, "map\ meas"] \arrow[d, "join", swap] & M Val \arrow[d, "meas" ] \\
% M Val \arrow[r, "meas", swap] & Val\\
% %
% \end{tikzcd}
\end{center}
% measPlusSpec
@@ -87,14 +89,15 @@ applying |(v <+>)| after the measure:
< (meas . map (v <+>)) mv = ((v <+>) . meas) mv
\begin{center}
\begin{tikzcd}[column sep=3cm]
%
M Val \arrow[r, "map (v\, \oplus)"] \arrow[d, "meas", swap] & M Val \arrow[d, "meas" ] \\
Val \arrow[r, "(v\, \oplus)", swap] & Val\\
\includegraphics{img/diag3.eps}
% \begin{tikzcd}[column sep=3cm]
% %
% M Val \arrow[r, "map (v\, \oplus)"] \arrow[d, "meas", swap] & M Val \arrow[d, "meas" ] \\
% Val \arrow[r, "(v\, \oplus)", swap] & Val\\
% %
% \end{tikzcd}
%
\end{tikzcd}
\vspace{0.1cm}
%\vspace{0.1cm}
\end{center}
\end{itemize}
@@ -146,15 +149,19 @@ is neutral for |*|.
The second and the third conditions require some arithmetic reasoning, so
let us just consider them for two examples.
Say we have distributions
Let |a, b, c, d| be variables of type |Double| and say we have distributions
%if False
> parameters (a, b, c, d, e : Double)
> parameters (a, b, c, d : Double)
%endif
> dps1 : Dist Double
> dps1 = [(a, 0.5), (b, 0.3), (c, 0.2)]
> dps2 : Dist Double
> dps2 = [(d, 0.4), (e, 0.6)]
> dps2 = [(a, 0.4), (d, 0.6)]
> dpdps : Dist (Dist Double)
> dpdps = [(dps1, 0.1), (dps2, 0.9)]
@@ -170,13 +177,13 @@ addition and multiplication:
< (expected . distJoin) dpdps =
<
< expected [(a, 0.5 * 0.1), (b, 0.3 * 0.1), (c, 0.2 * 0.1), (d, 0.4 * 0.9), (e, 0.6 * 0.9)] =
< expected [(a, 0.5 * 0.1), (b, 0.3 * 0.1), (c, 0.2 * 0.1), (a, 0.4 * 0.9), (d, 0.6 * 0.9)] =
<
< (a * 0.5 * 0.1) + (b * 0.3 * 0.1) + (c * 0.2 * 0.1) + (d * 0.4 * 0.9) + (e * 0.6 * 0.9) =
< (a * 0.5 * 0.1) + (b * 0.3 * 0.1) + (c * 0.2 * 0.1) + (a * 0.4 * 0.9) + (d * 0.6 * 0.9) =
<
< ((a * 0.5 + b * 0.3 + c * 0.2) * 0.1 + (d * 0.4 + e * 0.6) * 0.9 =
< ((a * 0.5 + b * 0.3 + c * 0.2) * 0.1 + (a * 0.4 + d * 0.6) * 0.9 =
<
< expected [(a * 0.5 + b * 0.3 + c * 0.2, 0.1) , (d * 0.4 + e * 0.6, 0.9)] =
< expected [(a * 0.5 + b * 0.3 + c * 0.2, 0.1) , (a * 0.4 + d * 0.6, 0.9)] =
<
< (expected . distMap expected) dpdps
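
An analogous computation illustrates the third condition; here is a
sketch for |dps1| only, with an arbitrary |v : Double|, using that the
probabilities of |dps1| sum to one:

< (expected . distMap (v +)) dps1 =
<
< expected [(v + a, 0.5), (v + b, 0.3), (v + c, 0.2)] =
<
< (v + a) * 0.5 + (v + b) * 0.3 + (v + c) * 0.2 =
<
< v * (0.5 + 0.3 + 0.2) + (a * 0.5 + b * 0.3 + c * 0.2) =
<
< v + expected dps1 =
<
< ((v +) . expected) dps1
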
@@ -120,9 +120,9 @@ The price to pay for including the non-emptiness premise is
the additional condition |nextNotEmpty| on the transition function
|next| that was already stated in Sec.~\ref{subsection:wrap-up}.
Moreover, we have to postulate non-emptiness preservation laws for the
monad operations (appendix~\ref{appendix:monadLaws}) and to prove an
monad operations (Appendix~\ref{appendix:monadLaws}) and to prove an
additional lemma about the preservation of non-emptiness
(appendix~\ref{appendix:lemmas}).
(Appendix~\ref{appendix:lemmas}).
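Such a preservation law could take roughly the following shape (a
sketch; the name and formulation here are hypothetical, the actual
postulates are stated in Appendix~\ref{appendix:monadLaws}):

< -- hypothetical name and formulation
< mapPresNotEmpty : (f : a -> b) -> (ma : M a) -> NotEmpty ma -> NotEmpty (map f ma)
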
%
Conceptually, it might seem cleaner to omit the non-emptiness
condition: In this case, the remaining conditions would only concern
@@ -43,7 +43,8 @@ control theory for finite-horizon, discrete-time SDPs. It extends
mathematical formulations for stochastic
SDPs \citep{bertsekas1995, bertsekasShreve96, puterman2014markov}
to the general problem of optimal decision making under \emph{monadic}
uncertainty.\\
uncertainty.
For monadic SDPs, the framework provides a
generic implementation of backward induction. It has been applied to
study the impact of uncertainties on optimal emission policies
@@ -156,7 +157,8 @@ Note that for deterministic problems it is unnecessary to parameterise
the reward function over the next state as it is unique and can thus be
obtained from the current state and control. But for non-deterministic
problems it is useful to be able to assign rewards depending on the
(uncertain) outcome of a transition.\\
(uncertain) outcome of a transition.
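In the framework, the reward function therefore also takes the next
state as an argument; its signature is roughly of the form (a sketch,
the precise declaration is part of the problem specification):

< reward : (t : Nat) -> (x : X t) -> Y t x -> X (S t) -> Val
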
A few remarks are in order here.
\begin{itemize}
@@ -477,7 +479,7 @@ Like the reference value |zero| discussed above, |plusMonSpec| and
|measMonSpec| are specification components of the BJI-framework that
we have not discussed in Sec.~\ref{subsection:specification_components}.
%
We provide a proof of |Bellman| in appendix~\ref{appendix:Bellman}. As one
We provide a proof of |Bellman| in Appendix~\ref{appendix:Bellman}. As one
would expect, the proof makes essential use of the recursive definition of
the function |val| discussed above.
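For reference, |val| computes the value of a policy sequence roughly
along the following lines (a sketch assembled from the components used
above; the precise definition is the one given earlier):

< val              :  PolicySeq t n -> X t -> Val
< val Nil x        =  zero
< val (p :: ps) x  =  let y = p x in
<                     meas (map (\ x' => reward t x y x' <+> val ps x') (next t x y))
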
%
@@ -597,7 +599,7 @@ orthogonal to the purpose of the current paper.
For the same reason we have not addressed the question of how to make |bi|
more efficient by tabulation.
We briefly discuss the specification and implementation of optimal extensions
in the BJI-framework in appendix~\ref{appendix:optimal_extension}.
in the BJI-framework in Appendix~\ref{appendix:optimal_extension}.
We refer the reader interested in tabulation of |bi| to
\href{https://gitlab.pik-potsdam.de/botta/IdrisLibs/-/blob/master/SequentialDecisionProblems/TabBackwardsInduction.lidr}
{SequentialDecisionProblems.TabBackwardsInduction}
@@ -156,5 +156,5 @@ The sources of this document have been written in literate Idris and are
available at \citep{IdrisLibsValVal}, together with some example code.
All source files can be type checked with Idris~1.3.2.
\vfill
\pagebreak
\ No newline at end of file
%\vfill
%\pagebreak
\ No newline at end of file
main: main.tex
	latexmk -pdf main.tex
final: main
	latex main.tex
	dvips main.dvi
	ps2pdf main.ps
LHS2TEX = lhs2TeX
@@ -58,8 +58,8 @@ on total rewards which is usually aggregated using the familiar
\emph{expected value} measure. The value thus obtained is called the
\emph{expected total reward} \citep[ch.~4.1.2]{puterman2014markov} and
its role is central: It is the quantity that is to be optimised in
an SDP.\\
%
an SDP.
In monadic SDPs, the measure is generic, i.e. it is not fixed in advance
but has to be given as part of the specification of a concrete problem.
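Concretely, the measure is a function

< meas : M Val -> Val

where |M| is the monad of the SDP; for stochastic problems it is
typically the expected value measure, while for deterministic problems
(|M = Id|) it is just the identity (cf. Sec.~\ref{section:valval}).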
Therefore we will generalise the notion of \emph{expected total reward} to
@@ -76,8 +76,8 @@ Similarly, we define that \emph{solving a monadic SDP} consists in
This means that when starting from any initial state at decision step
|t|, following the computed list of rules for selecting controls will
result in a value that is maximal as a measure of the sum of rewards
along all possible trajectories rooted in that initial state.\\
%
along all possible trajectories rooted in that initial state.
Equivalently, rewards can instead be considered as \emph{costs}
that need to be \emph{minimised}. This dual perspective is taken e.g.
in \citep{bertsekas1995}. In the subsequent sections we will follow
@@ -141,22 +141,27 @@ Fig.~\ref{fig:examplesGraph}.
\begin{subfigure}[b]{.35\textwidth-0.55cm}
\centering
\scalebox{0.7}{
%\scalebox{0.7}{
\small
\input{NFAexample.tex}
\includegraphics[height=0.26\textheight]{img/fig1a.eps}
% \small
% \input{NFAexample.tex}
}
%}
\caption{Example~1}
\label{fig:example1}
\end{subfigure}
\begin{subfigure}[b]{.65\textwidth}
\centering
\scalebox{0.7}{
%\scalebox{0.7}{
\input{SchedulingExample.tex}
\vspace{0.3cm}
}
\includegraphics[height=0.25\textheight]{img/fig1b.eps}
% \input{SchedulingExample.tex}
%\vspace{0.3cm}
% }
\caption{Example~2}
\label{fig:example2}
@@ -218,8 +223,8 @@ these lines we refer the interested reader to~\citep{esd-9-525-2018}.
Scheduling
problems serve as canonical examples in control theory textbooks. The
one we present here is a slightly modified version of
\citep[Example~1.1.2]{bertsekas1995}.\\
%
\citep[Example~1.1.2]{bertsekas1995}.
Think of some machine in a factory that can perform different
operations, say $A, B, C$ and $D$. Each of these operations is supposed
to be performed once. The machine can only perform one operation at a
@@ -71,8 +71,8 @@ starting at that state:
where we use |StateCtrlSeq| as the type of trajectories. Essentially it is
a non-empty list of (dependent) state/control pairs, with the exception of the base case
which is a singleton just containing the last state reached.\\
%
which is a singleton just containing the last state reached.
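A definition of roughly the following shape fits this description (a
sketch; the name of the base-case constructor is hypothetical, |:::|
is the cons operator used in the proofs below):

< -- constructor names partly hypothetical
< data StateCtrlSeq : (t, n : Nat) -> Type where
<   Last   :  X t -> StateCtrlSeq t (S Z)
<   (:::)  :  (x : X t ** Y t x) -> StateCtrlSeq (S t) n -> StateCtrlSeq t (S n)
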
Furthermore, we can compute the \emph{total reward} for a single
trajectory, i.e. its sum of rewards:
@@ -114,7 +114,7 @@ of the sum of rewards along all possible trajectories of length |n|
that are rooted in an initial state at step |t|.
Again by analogy to the stochastic case, we define monadic
backward induction to be correct, if for a given SDP the policy
backward induction to be correct if, for a given SDP, the policy
sequence computed by |bi| is the solution to the SDP.
I.e., we consider |bi| to be correct if it meets the specification
%
@@ -230,7 +230,8 @@ and it is not ``obviously clear'' that |val| and |val'| are
extensionally equal without further knowledge about |meas|.
In the deterministic case, i.e. for |M = Id| and
|meas = id|, they are indeed equal without imposing any further
|meas = id|, |val ps x| and |val' ps x| are indeed equal for all |ps|
and |x|, without imposing any further
conditions (as we will see in Sec.~\ref{section:valval}).
For the stochastic case, \cite[Theorem
4.2.1]{puterman2014markov} suggests that the equality
@@ -85,9 +85,9 @@ interaction of the measure with the monad structure and the
|<+>|-operator on |Val|.
%
Machine-checked proofs are given in the
appendices~\ref{appendix:theorem},~\ref{appendix:biCorrectness} and
Appendices~\ref{appendix:theorem},~\ref{appendix:biCorrectness} and
\ref{appendix:lemmas}. The monad laws we use are stated in
appendix~\ref{appendix:monadLaws}. In the remainder of this section,
Appendix~\ref{appendix:monadLaws}. In the remainder of this section,
we discuss semi-formal versions of the proofs.\\
\paragraph*{Monad algebras.} \hspace{0.1cm} The first lemma allows us
@@ -229,8 +229,8 @@ that the equality holds.\\
The induction hypothesis (|IH|) is:
for all |x : X t|, |val' ps x = val ps x|. We have to show that
|IH| implies that for all |p : Policy t| and |x : X t|, the equality
|val' (p :: ps) x = val (p :: ps) x| holds.\\
%
|val' (p :: ps) x = val (p :: ps) x| holds.
For brevity (and to economise on brackets), let in the following
|y = p x|, |mx' = next t x y|, |r = reward t x y|, |trjps = trj
ps|, and |consxy = ((x ** y) :::)|.
@@ -295,7 +295,7 @@ may be uninteresting for a pen and paper proof, but turn out to be
crucial in the setting of an intensional type theory -- like Idris --
where function extensionality does not hold in general.
In particular, we have to postulate that the functorial |map|
preserves extensional equality (see appendix~\ref{appendix:monadLaws}
preserves extensional equality (see Appendix~\ref{appendix:monadLaws}
and \citep{botta2020extensional}) for Idris to accept the proof.
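Stated as a postulate, this preservation property has roughly the
following shape (a sketch with a hypothetical name; the formulation we
actually use is the one in Appendix~\ref{appendix:monadLaws}):

< -- hypothetical name
< mapPresExtEq : {f, g : a -> b} -> f `ExtEq` g -> map f `ExtEq` map g
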
In fact, most of the reasoning proceeds by replacing functions that are mapped
onto monadic values by other functions that are only extensionally