% -*-Latex-*-
%if False
> module Theorem
Remark: The statements and proofs in this file are just "pseudo-code".
The whole theory including the proofs is contained in the
file Appendix.lidr (and in the supplementary material).
%endif
%\pagebreak
\section{Correctness Proofs}
\label{section:valval}
In this section we show that |val|
(Sec.~\ref{subsection:solution_components}) and
|val'| (Sec.~\ref{section:preparation}) are extensionally equal
< valMeasTotalReward : {t, n : Nat} -> (ps : PolicySeq t n) -> (x : X t) -> val' ps x = val ps x
given the three conditions from the previous section hold. As a
corollary we then obtain our correctness result for monadic backward
induction.
We can understand the proof of |valMeasTotalReward| as an optimising
program transformation from the less efficient but ``obviously
correct'' implementation |val'| to the more efficient implementation
|val|. Therefore the equational reasoning proofs in this section will
proceed from |val'| to |val|. In Sec.~\ref{section:conditions} we
have stated sufficient conditions for this transformation to be
possible: |measPureSpec|, |measJoinSpec|, |measPlusSpec|. We also have
seen the different computational patterns that the two implementations
exhibit: While |val'| first computes all possible trajectories for the
given policy sequence and initial state, then computes their
individual sum of rewards and finally applies the measure once, |val|
computes its final result by adding the current reward to an
intermediate outcome and applying the measure locally at each decision
step. This suggests that a transformation from |val'| to |val| will
essentially have to push the application of the measure into the
recursive computation of the sum of rewards. The proof will be carried
out by induction on the structure of policy sequences.
\subsection{Deterministic Case}
\label{subsection:detCase}
%
To get a first intuition, let's have a look at what
the induction step looks like in the deterministic case
(i.e. if we fix monad and measure to be identities):
{\linespread{1.5}
< valMeasTotalReward (p :: ps) x =
< ( val' (p :: ps) x ) ={ definitionprf val' }=
< ( sumR ((x ** y) ## trj ps x')) ={ definitionprf sumR }=
< ( r (head (trj ps x')) <+> val' ps x' ) ={ useprf headLemma }=
< ( r x' <+> val' ps x' ) ={ useprf IHprf }=
< ( r x' <+> val ps x' ) ={ definitionprf val }=
< ( val (p :: ps) x ) QED
}
\noindent where |y = p x|, |x' = next t x y| and |r = reward t x y|. In the
proof sketch, we have first applied the definitions of |val'| and
|sumR|. Using the fact that in the deterministic case |trj| returns
exactly one state-control sequence and that the |head| of any
trajectory starting in |x'| is just |x'| (let us call the latter
|headLemma|), the left-hand side of the sum simplifies to |r x'|. Its
right-hand side amounts to |val' ps x'| so that we can apply the
induction hypothesis. The rest of the proof
only relies on definitional equalities. Thus in the deterministic case
|val| and |val'| are unconditionally extensionally equal -- or rather,
the conditions of Sec.~\ref{section:conditions} are trivially fulfilled.
\subsection{Lemmas}
\label{subsection:lemmas}
%
To prove the general, monadic case, we proceed similarly.
This time, however, the situation is complicated by the presence of
the abstract monad |M|. Instead of being able to use the type structure
of some concrete monad, we need to leverage on the properties of |M|,
|meas| and |<+>| postulated in Sec.~\ref{section:conditions}. To
facilitate the main proof, we first prove three lemmas about the
interaction of the measure with the monad structure and the
|<+>|-operator on |Val|.
%
Machine-checked proofs are given in the
Appendices~\ref{appendix:theorem},~\ref{appendix:biCorrectness} and
\ref{appendix:lemmas}. The monad laws we use are stated in
Appendix~\ref{appendix:monadLaws}. In the remainder of this section,
we discuss semi-formal versions of the proofs.\\
\paragraph*{Monad algebras.} \hspace{0.1cm} The first lemma allows us
to lift and eliminate an application of the monad's |join| operation:
< measAlgLemma : {A, B: Type} -> (f : B -> Val ) -> (g : A -> M B) ->
< (meas . map (meas . map f . g)) `ExtEq` (meas . map f . join . map g)
The proof of this lemma hinges on the condition |measJoinSpec|. It
allows to trade the application of |join| against an application of
|map meas|. The rest is just standard reasoning with monad and functor
laws, i.e. we use that the functorial map for |M| preserves
composition and that |join| is a natural transformation:
< measAlgLemma f g ma =
<
< ( (meas . map (meas . map f . g)) ma ) ={ mapPresCompPrf }=
<
< ( (meas . map (meas . map f) . map g) ma ) ={ mapPresCompPrf }=
<
< ( (meas . map meas . map (map f) . map g) ma ) ={ useprf measJoinSpec }=
<
< ( (meas . join . map (map f) . map g) ma ) ={ joinNatTransPrf }=
<
< ( (meas . map f . join . map g) ma ) QED
This lemma is generic in the sense that it holds for arbitrary
Eilenberg-Moore algebras of a monad. Here we prove it for the
framework's measure |meas|, but note that in the appendix we prove a
generic version that is then appropriately instantiated.
\paragraph*{Head/trajectory interaction.} \hspace{0.1cm}
The second lemma amounts to a lifted version of |headLemma| in the
deterministic case. Mapping |head| onto an |M|-structure of
trajectories computed with |trj| results in an |M|-structure filled
with the initial states of these trajectories; similarly, mapping |(r
. head <++> s)| onto |trj ps x| for functions |r| and |s| of
appropriate type is the same as mapping |(const (r x) <++> s)| onto
|trj ps x| (where |const| is the constant function). We can prove
< headTrjLemma : {t, n : Nat} -> (ps : PolicySeq t n) -> (r : X t -> Val) ->
< (s : StateCtrlSeq t (S n) -> Val) -> (x : X t) ->
< (map (r . head <++> s) . trj ps) x =
< (map (const (r x) <++> s) . trj ps) x
by doing a case split on |ps|. In case |ps = Nil|, the equality holds
because the monad's |pure| is a natural transformation and in case |ps
= p :: ps'| because |M|'s functorial |map| preserves composition.
\paragraph*{Measure/sum interaction.} \hspace{0.1cm}
The third lemma allows us to both commute the measure into the right
summand of an |<++>|-sum and to perform the head/trajectory
simplification. It lies at the core of the relationship between |val| and |val'|.
< measSumLemma : {t, n : Nat} -> (ps : PolicySeq t n) ->
< (r : X t -> Val) ->
< (s : StateCtrlSeq t (S n) -> Val) ->
< (meas . map (r . head <++> s) . trj ps) `ExtEq`
< (r <++> meas . map s . trj ps)
Recall that our third condition from
Sec.~\ref{section:conditions}, |measPlusSpec|, plays the role of a
distributive law and allows us to ``factor out'' a partially applied
sum |(v <+>)| for arbitrary |v : Val|.
Given that |measPlusSpec| holds, the lemma is provable by simple
equational reasoning using the above head-trajectory lemma and the fact
that map preserves composition:
< measSumLemma ps r s x' =
<
< ( (meas . map (r . head <++> s) . trj ps) x' ) ={ useprf headTrjLemma }=
<
< ( (meas . map (const (r x') <++> s) . trj ps) x' ) ={ definitionprf <++>, . }=
<
< ( (meas . map ((const (r x') <++> id) . s) . trj ps) x' ) ={ mapPresCompPrf }=
<
< ( (meas . map (const (r x') <++> id) . map s . trj ps) x' ) ={ definitionprf <++> }=
<
< ( (meas . map ((r x') <+>) . map s . trj ps) x' ) ={ useprf measPlusSpec }=
<
< ( (((r x') <+>) . meas . map s . trj ps) x' ) ={ definitionprf <++> }=
<
< ( (r <++> meas . map s . trj ps) x' ) QED
Notice how |measPlusSpec| is used to transform an application of
|meas . map ((r x') <+>)| into an application of |((r x') <+>) . meas|.
This is essential to simplify the computation of the measured total reward:
instead of first adding the current reward to the intermediate outcome of each
individual trajectory and then measuring the outcomes, one can first measure the
intermediate outcomes of the trajectories and then add the current reward.
\subsection{Correctness of the BJI-value function}
\label{subsection:valvalTh}
%
With the above lemmas in place, we now prove that |val| is
extensionally equal to |val'|.
\vspace{0.15cm}
Let |t, n : Nat|, |ps : PolicySeq t n|. We prove |valMeasTotalReward|
by induction on |ps|.
\vspace{0.15cm}
\paragraph*{Base case.}\hspace{0.1cm}
We need to show that for all |x : X t|, |val'
Nil x| = |val Nil x|. The right hand side of this equation
reduces to |zero| by definition.
The left hand side can be simplified to |meas (pure zero)|
since |pure| is a natural transformation. At this point,
our first condition, |measPureSpec|, comes into play: Using that
|meas| is inverse to |pure| on the left, we can conclude
that the equality holds.\\
\noindent In equational reasoning style: For all |x : X t|,
{\linespread{1.2}
< valMeasTotalReward Nil x =
<
< ( val' Nil x ) ={ definitionprf val' }=
<
< ( meas (map sumR (trj Nil x)) ) ={ definitionprf trj }=
<
< ( meas (map sumR (pure (Last x))) ) ={ pureNatTransPrf }=
<
< ( meas (pure (sumR (Last x))) ) ={ definitionprf sumR }=
<
< ( meas (pure zero) ) ={ useprf measPureSpec }=
<
< ( zero ) ={ definitionprf val }=
<
< ( val Nil x )
}
\paragraph*{Step case.}\hspace{0.1cm}
The induction hypothesis (|IH|) is:
for all |x : X t|, |val' ps x = val ps x|. We have to show that
|IH| implies that for all |p : Policy t| and |x : X t|, the equality
|val' (p :: ps) x = val (p :: ps) x| holds.
For brevity (and to economise on brackets), let in the following
|y = p x|, |mx' = next t x y|, |r = reward t x y|, |trjps = trj
ps|, and |consxy = ((x ** y) :::)|.
As in the base case, all that has to be done on the |val|-side of the
equation only depends on definitional equality. However it is more
involved to bring the |val'|-side into a form in which the induction
hypothesis can be applied. This is where we leverage on the lemmas
proved above.
By definition and because |map| preserves
composition, we know that |val' (p :: ps) x| is equal to
|(meas . map ((r . head) <++> sumR)) (mx' >>= trjps)|.
We use the relation between the monad's |bind| and |join| to eliminate
the |bind|-operator from the term.
Now we can apply the first lemma from above, |measAlgLemma|, to lift
and eliminate the |join| operation.
To commute the measure under the |<++>| and get rid of the application
of |head|, we use our third lemma, |measSumLemma|.
At this point we can apply the induction hypothesis and the resulting
term is equal to |val ps x| by definition.\\
\noindent The more detailed equational reasoning proof:
%
\footnote{We are very grateful to the anonymous reviewer who suggested
an alternative proof for the induction step. The proof presented
here is based on his proof, and his suggestions have lead to
significantly weaker conditions on the measure and thus a stronger
result.}
< valMeasTotalReward (p :: ps) x =
<
< ( val' (p :: ps) x ) ={ definitionprf val' }=
<
< ( meas (map sumR (trj (p :: ps) x)) ) ={ definitionprf trj }=
<
< ( meas (map sumR (map consxy (mx' >>= trjps))) ) ={ mapPresCompPrf }=
<
< ( meas (map (sumR . consxy) (mx' >>= trjps)) ) ={ definitionprf sumR}=
<
< ( meas (map ((r . head) <++> sumR) (mx' >>= trjps)) ) ={ bindJoinSpecPrf }=
<
< ( meas (map ((r . head) <++> sumR) (join (map trjps mx'))) ) ={ useprf measAlgLemma }=
<
< ( meas (map (meas . map (r . head <++> sumR) . trjps) mx') ) ={ useprf measSumLemma }=
<
< ( meas (map (r <++> meas . map sumR . trjps) mx') ) ={ definitionprf val' }=
<
< ( meas (map (r <++> val' ps) mx') ) ={ useprf IHprf }=
<
< ( meas (map (r <++> val ps) mx') ) ={ definitionprf val }=
<
< ( val (p :: ps) x )
\hfill $\Box$
\paragraph*{Technical remarks.} \hspace{0.1cm}
The above proof of |valMeasTotalReward| omits some technical details that
may be uninteresting for a pen and paper proof, but turn out to be
crucial in the setting of an intensional type theory -- like Idris --
where function extensionality does not hold in general.
In particular, we have to postulate that the functorial |map|
preserves extensional equality (see Appendix~\ref{appendix:monadLaws}
and \citep{botta2020extensional}) for Idris to accept the proof.
In fact, most of the reasoning proceeds by replacing functions that are mapped
onto monadic values by other functions that are only extensionally
equal. Using that |map| preserves extensional equality
allows to carry out such proofs generically without knowledge of
the concrete structure of the functor.
\subsection{Correctness of monadic backward induction}
\label{subsection:biCorrectness}
As corollary, we can now prove the correctness of monadic backward induction,
namely that the policy sequences computed by |bi| are optimal with
respect to the measured total reward computed by |val'|:
< biOptMeasTotalReward : (t, n : Nat) -> GenOptPolicySeq val' (bi t n)
<
< biOptMeasTotalReward t n ps' x =
< let vvEqL = sym (valMeasTotalReward ps' x) in
< let vvEqR = sym (valMeasTotalReward (bi t n) x) in
< let biOpt = biOptVal t n ps' x in
< replace vvEqR (replace vvEqL biOpt)