Commit 001d4f0b by Nicola Botta

### Final.

parent c1f2186a
... @@ -621,7 +621,7 @@ Optimal extensions:
 %endif
 Together with the result of \bottaetal (|biOptVal|, see
-appendix~\ref{appendix:bilemma} below) we can prove the correctness
+Appendix~\ref{appendix:bilemma} below) we can prove the correctness
 of monadic backward induction as corollary, using a
 generalised optimality of policy sequences predicate:
...
... @@ -44,18 +44,19 @@ extensional equality of |val| and |val'|:
 \item[{\bf Condition 1.}]
 The measure needs to be left-inverse to |pure|:
 \footnote{The symbol |ExtEq| denotes \emph{extensional} equality,
-see appendix~\ref{appendix:monadLaws}}
+see Appendix~\ref{appendix:monadLaws}}
 < measPureSpec : meas . pure ExtEq id
 \begin{center}
-\begin{tikzcd}[column sep=2cm]
-Val \arrow[d, "pure", swap] \arrow[dr, "id"] \\
-M Val \arrow[r, "meas", swap] & Val\\
-\end{tikzcd}
+\includegraphics{img/diag1.eps}
+% \begin{tikzcd}[column sep=2cm]
+% Val \arrow[d, "pure", swap] \arrow[dr, "id"] \\
+% M Val \arrow[r, "meas", swap] & Val\\
+% \end{tikzcd}
 \end{center}
 % measJoinSpec
... @@ -67,12 +68,13 @@ equal to applying it after |map meas|:
 < measJoinSpec : meas . join ExtEq meas . map meas
 \begin{center}
-\begin{tikzcd}[column sep=3cm]
-M (M Val) \arrow[r, "map\ meas"] \arrow[d, "join", swap] & M Val \arrow[d, "meas" ] \\
-M Val \arrow[r, "meas", swap] & Val\\
-\end{tikzcd}
+\includegraphics{img/diag2.eps}
+% \begin{tikzcd}[column sep=3cm]
+% M (M Val) \arrow[r, "map\ meas"] \arrow[d, "join", swap] & M Val \arrow[d, "meas" ] \\
+% M Val \arrow[r, "meas", swap] & Val\\
+% \end{tikzcd}
 \end{center}
 % measPlusSpec
... @@ -87,14 +89,15 @@ applying |(v <+>)| after the measure:
 < (meas . map (v <+>)) mv = ((v <+>) . meas) mv
 \begin{center}
-\begin{tikzcd}[column sep=3cm]
-M Val \arrow[r, "map (v\, \oplus)"] \arrow[d, "meas", swap] & M Val \arrow[d, "meas" ] \\
-Val \arrow[r, "(v\, \oplus)", swap] & Val\\
-\end{tikzcd}
-%\vspace{0.1cm}
+\includegraphics{img/diag3.eps}
+% \begin{tikzcd}[column sep=3cm]
+% M Val \arrow[r, "map (v\, \oplus)"] \arrow[d, "meas", swap] & M Val \arrow[d, "meas" ] \\
+% Val \arrow[r, "(v\, \oplus)", swap] & Val\\
+% \end{tikzcd}
+\vspace{0.1cm}
 \end{center}
 \end{itemize}
... @@ -146,15 +149,19 @@ is neutral for |*|.
 The second and the third condition require some arithmetic reasoning, so
 let us just consider them for two examples.
-Say we have distributions
+Let |a, b, c, d| be variables of type |Double| and say we have distributions
 %if False
-> parameters (a, b, c, d, e : Double)
+> parameters (a, b, c, d : Double)
 %endif
 > dps1 : Dist Double
 > dps1 = [(a, 0.5), (b, 0.3), (c, 0.2)]
 > dps2 : Dist Double
-> dps2 = [(d, 0.4), (e, 0.6)]
+> dps2 = [(a, 0.4), (d, 0.6)]
 > dpdps : Dist (Dist Double)
 > dpdps = [(dps1, 0.1), (dps2, 0.9)]
... @@ -170,13 +177,13 @@ addition and multiplication:
 < (expected . distJoin) dpdps =
 <
-< expected [(a, 0.5 * 0.1), (b, 0.3 * 0.1), (c, 0.2 * 0.1), (d, 0.4 * 0.9), (e, 0.6 * 0.9)] =
+< expected [(a, 0.5 * 0.1), (b, 0.3 * 0.1), (c, 0.2 * 0.1), (a, 0.4 * 0.9), (d, 0.6 * 0.9)] =
 <
-< (a * 0.5 * 0.1) + (b * 0.3 * 0.1) + (c * 0.2 * 0.1) + (d * 0.4 * 0.9) + (e * 0.6 * 0.9) =
+< (a * 0.5 * 0.1) + (b * 0.3 * 0.1) + (c * 0.2 * 0.1) + (a * 0.4 * 0.9) + (d * 0.6 * 0.9) =
 <
-< ((a * 0.5 + b * 0.3 + c * 0.2) * 0.1 + (d * 0.4 + e * 0.6) * 0.9 =
+< ((a * 0.5 + b * 0.3 + c * 0.2) * 0.1 + (a * 0.4 + d * 0.6) * 0.9 =
 <
-< expected [(a * 0.5 + b * 0.3 + c * 0.2, 0.1) , (d * 0.4 + e * 0.6, 0.9)] =
+< expected [(a * 0.5 + b * 0.3 + c * 0.2, 0.1) , (a * 0.4 + d * 0.6, 0.9)] =
 <
 < (expected . distMap expected) dpdps
...
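The equational reasoning in this hunk (showing |expected . distJoin| and |expected . distMap expected| agree on |dpdps|, i.e. that the measure commutes with |join|) can be checked numerically. The paper's code is Idris; the sketch below transliterates |Dist|, |expected|, |distMap| and |distJoin| into Python, and the concrete values for the parameters |a|–|d| are arbitrary stand-ins, not from the source:

```python
# Hedged sketch: finite probability distributions as (value, probability)
# lists, mirroring the paper's Dist type; names follow the Idris originals.

def expected(dist):
    """Expected value of a finite distribution [(x, p), ...]."""
    return sum(x * p for x, p in dist)

def dist_map(f, dist):
    """Map a function over the values of a distribution (distMap)."""
    return [(f(x), p) for x, p in dist]

def dist_join(ddist):
    """Flatten a distribution of distributions by multiplying
    probabilities (distJoin)."""
    return [(x, p * q) for d, q in ddist for x, p in d]

# Arbitrary stand-ins for the paper's parameters a, b, c, d.
a, b, c, d = 1.0, 2.0, 3.0, 4.0

dps1 = [(a, 0.5), (b, 0.3), (c, 0.2)]
dps2 = [(a, 0.4), (d, 0.6)]          # the updated dps2 from the diff
dpdps = [(dps1, 0.1), (dps2, 0.9)]

lhs = expected(dist_join(dpdps))              # measure after join
rhs = expected(dist_map(expected, dpdps))     # measure after map measure
print(abs(lhs - rhs) < 1e-12)                 # prints True
```

With these stand-in values both sides evaluate to 2.69, exactly the distributivity step carried out symbolically in the hunk above.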
... @@ -43,7 +43,8 @@ control theory for finite-horizon, discrete-time SDPs. It extends
 mathematical formulations for stochastic
 SDPs \citep{bertsekas1995, bertsekasShreve96, puterman2014markov}
 to the general problem of optimal decision making under \emph{monadic}
-uncertainty.\\
+uncertainty.
+%
 For monadic SDPs, the framework provides a
 generic implementation of backward induction. It has been applied to
 study the impact of uncertainties on optimal emission policies
... @@ -156,7 +157,8 @@ Note that for deterministic problems it is unnecessary to parameterise
 the reward function over the next state as it is unique and can thus be
 obtained from the current state and control. But for non-deterministic
 problems it is useful to be able to assign rewards depending on the
-(uncertain) outcome of a transition.\\
+(uncertain) outcome of a transition.
+%
 A few remarks are at place here.
 \begin{itemize}
... @@ -477,7 +479,7 @@ Like the reference value |zero| discussed above, |plusMonSpec| and
 |measMonSpec| are specification components of the BJI-framework that
 we have not discussed in Sec.~\ref{subsection:specification_components}.
 %
-We provide a proof of |Bellman| in appendix~\ref{appendix:Bellman}. As one
+We provide a proof of |Bellman| in Appendix~\ref{appendix:Bellman}. As one
 would expect, the proof makes essential use of the recursive definition of
 the function |val| discussed above.
 %
... @@ -597,7 +599,7 @@ orthogonal to the purpose of the current paper.
 For the same reason we have not addressed the question of how to make |bi|
 more efficient by tabulation.
 We briefly discuss the specification and implementation of optimal extensions
-in the BJI-framework in appendix~\ref{appendix:optimal_extension}.
+in the BJI-framework in Appendix~\ref{appendix:optimal_extension}.
 We refer the reader interested in tabulation of |bi| to
 \href{https://gitlab.pik-potsdam.de/botta/IdrisLibs/-/blob/master/SequentialDecisionProblems/TabBackwardsInduction.lidr}
 {SequentialDecisionProblems.TabBackwardsInduction}
...
... @@ -156,5 +156,5 @@ The sources of this document have been written in literal Idris and are
 available at \citep{IdrisLibsValVal}, together with some example code.
 All source files can be type checked with Idris~1.3.2.
-\vfill
-\pagebreak
\ No newline at end of file
+%\vfill
+%\pagebreak
\ No newline at end of file
 main: main.tex
 	latexmk -pdf main.tex

+final: main
+	latex main.tex
+	dvips main.dvi
+	ps2pdf main.ps
+
 LHS2TEX = lhs2TeX
...
... @@ -58,8 +58,8 @@ on total rewards which is usually aggregated using the familiar
 \emph{expected value} measure. The value thus obtained is called the
 \emph{expected total reward} \citep[ch.~4.1.2]{puterman2014markov} and
 its role is central: It is the quantity that is to be optimised in
-an SDP.\\
+an SDP.
 %
 In monadic SDPs, the measure is generic, i.e. it is not fixed in advance
 but has to be given as part of the specification of a concrete problem.
 Therefore we will generalise the notion of \emph{expected total reward} to
... @@ -76,8 +76,8 @@ Similarly, we define that \emph{solving a monadic SDP} consists in
 This means that when starting from any initial state at decision step
 |t|, following the computed list of rules for selecting controls will
 result in a value that is maximal as measure of the sum of rewards
-along all possible trajectories rooted in that initial state.\\
+along all possible trajectories rooted in that initial state.
 %
 Equivalently, rewards can instead be considered as \emph{costs}
 that need to be \emph{minimised}. This dual perspective is taken e.g.
 in \citep{bertsekas1995}. In the subsequent sections we will follow
... @@ -141,22 +141,27 @@ Fig.~\ref{fig:examplesGraph}.
 \begin{subfigure}[b]{.35\textwidth-0.55cm}
 \centering
-\scalebox{0.7}{
-\small
-\input{NFAexample.tex}
-}
+\includegraphics[height=0.26\textheight]{img/fig1a.eps}
+%\scalebox{0.7}{
+% \small
+% \input{NFAexample.tex}
+%}
 \caption{Example~1}
 \label{fig:example1}
 \end{subfigure}
 \begin{subfigure}[b]{.65\textwidth}
 \centering
-\scalebox{0.7}{
-\input{SchedulingExample.tex}
-\vspace{0.3cm}
-}
+\includegraphics[height=0.25\textheight]{img/fig1b.eps}
+%\scalebox{0.7}{
+% \input{SchedulingExample.tex}
+%\vspace{0.3cm}
+% }
 \caption{Example~2}
 \label{fig:example2}
... @@ -218,8 +223,8 @@ these lines we refer the interested reader to~\citep{esd-9-525-2018}.
 Scheduling
 problems serve as canonical examples in control theory textbooks. The
 one we present here is a slightly modified version of
-\citep[Example~1.1.2]{bertsekas1995}.\\
+\citep[Example~1.1.2]{bertsekas1995}.
 %
 Think of some machine in a factory that can perform different
 operations, say $A, B, C$ and $D$. Each of these operations is supposed
 to be performed once. The machine can only perform one operation at a
...
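The scheduling example in the context above (a machine performing operations $A, B, C, D$, each exactly once, one at a time) is small enough to solve by exhaustive search over schedules. The sketch below is Python rather than the paper's Idris, and the `setup_cost` table is purely illustrative; the actual costs of \citep[Example~1.1.2]{bertsekas1995} are not reproduced in the diff:

```python
from itertools import permutations

# Hypothetical cost of performing operation `op` right after `prev`
# (None = the machine starts idle). These numbers are made up.
setup_cost = {
    (None, 'A'): 5, (None, 'B'): 4, (None, 'C'): 3, (None, 'D'): 6,
    ('A', 'B'): 2, ('A', 'C'): 4, ('A', 'D'): 3,
    ('B', 'A'): 1, ('B', 'C'): 2, ('B', 'D'): 5,
    ('C', 'A'): 3, ('C', 'B'): 1, ('C', 'D'): 2,
    ('D', 'A'): 4, ('D', 'B'): 3, ('D', 'C'): 1,
}

def total_cost(schedule):
    """Sum of setup costs along one schedule (one trajectory of the SDP)."""
    cost, prev = 0, None
    for op in schedule:
        cost += setup_cost[(prev, op)]
        prev = op
    return cost

# Brute force over all 4! schedules; backward induction computes the
# same optimum without enumerating whole trajectories.
best = min(permutations('ABCD'), key=total_cost)
print(best, total_cost(best))
```

Rewards here appear in their dual guise as costs to be minimised, matching the remark above about \citep{bertsekas1995}.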
... @@ -71,8 +71,8 @@ starting at that state:
 where we use |StateCtrlSeq| as type of trajectories. Essentially it is
 a non-empty list of (dependent) state/control pairs, with the exception of the base case
-which is a singleton just containing the last state reached.\\
+which is a singleton just containing the last state reached.
 %
 Furthermore, we can compute the \emph{total reward} for a single
 trajectory, i.e. its sum of rewards:
... @@ -114,7 +114,7 @@ of the sum of rewards along all possible trajectories of length |n|
 that are rooted in an initial state at step |t|.
 Again by analogy to the stochastic case, we define monadic
-backward induction to be correct, if for a given SDP the policy
+backward induction to be correct if, for a given SDP, the policy
 sequence computed by |bi| is the solution to the SDP.
 I.e., we consider |bi| to be correct if it meets the specification
 %
... @@ -230,7 +230,8 @@ and it is not obviously clear'' that |val| and |val'| are
 extensionally equal without further knowledge about |meas|.
 In the deterministic case, i.e. for |M = Id| and
-|meas = id|, they are indeed equal without imposing any further
+|meas = id|, |val ps x| and |val' ps x| are indeed equal for all |ps|
+and |x|, without imposing any further
 conditions (as we will see in Sec.~\ref{section:valval}).
 For the stochastic case, \cite[Theorem
 4.2.1]{puterman2014markov} suggests that the equality
...
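The |val| / |val'| distinction in this hunk (measure the rewards summed along whole trajectories, versus accumulate values by backward recursion) can be illustrated numerically for the stochastic case, where the measure is the expected value. The sketch below is Python, not the paper's Idris, and the toy two-state transition function and policy sequence are invented for illustration:

```python
# Hedged sketch: for the expected-value measure, the value of a policy
# sequence computed by enumerating whole trajectories (val) agrees with
# the value computed by backward-induction-style recursion (val').
# trans(x, y) returns [(next_state, probability, reward), ...].

def trajectories(trans, x, ps):
    """All (probability, total_reward) pairs of trajectories from x under ps."""
    if not ps:
        return [(1.0, 0.0)]
    y = ps[0][x]
    return [(p * q, r + tot)
            for x1, p, r in trans(x, y)
            for q, tot in trajectories(trans, x1, ps[1:])]

def val(trans, x, ps):
    """Expected total reward, measured over whole trajectories."""
    return sum(p * tot for p, tot in trajectories(trans, x, ps))

def val_bi(trans, x, ps):
    """The same quantity via backward recursion (the val' shape)."""
    if not ps:
        return 0.0
    y = ps[0][x]
    return sum(p * (r + val_bi(trans, x1, ps[1:])) for x1, p, r in trans(x, y))

# Toy two-state problem: control 0 stays put, control 1 flips the state
# with probability 0.7; the reward is the state reached.
def trans(x, y):
    if y == 0:
        return [(x, 1.0, float(x))]
    return [(1 - x, 0.7, float(1 - x)), (x, 0.3, float(x))]

ps = [{0: 1, 1: 0}, {0: 1, 1: 0}]   # a fixed two-step policy sequence
print(abs(val(trans, 0, ps) - val_bi(trans, 0, ps)) < 1e-9)  # prints True
```

The agreement is no accident: it rests on the linearity of the expected value, which is exactly what the measure conditions (|measJoinSpec|, |measPlusSpec|) capture for a generic |meas|.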