Theorem.lidr 13.4 KB
Newer Older
Nuria Brede's avatar
Nuria Brede committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
% -*-Latex-*-

%if False

> module Theorem

Remark: The statements and proofs in this file are just "pseudo-code".
        The whole theory including the proofs is contained in the
        file Appendix.lidr (and in the supplementary material).

%endif

%\pagebreak

\section{Correctness Proofs}
\label{section:valval}

In this section we show that |val|
Nuria Brede's avatar
Nuria Brede committed
19
20
(Sec.~\ref{subsection:solution_components}) and
|val'| (Sec.~\ref{section:preparation}) are extensionally equal
Nuria Brede's avatar
Nuria Brede committed
21
22
23

< valMeasTotalReward  :  {t, n : Nat} -> (ps : PolicySeq t n) -> (x : X t) -> val' ps x = val ps x

Nuria Brede's avatar
Nuria Brede committed
24
given the three conditions from the previous section hold. As a
Nuria Brede's avatar
Nuria Brede committed
25
26
27
corollary we then obtain our correctness result for monadic backward
induction.

Nuria Brede's avatar
Nuria Brede committed
28
We can understand the proof of |valMeasTotalReward| as an optimising
Nuria Brede's avatar
Nuria Brede committed
29
30
31
program transformation from the less efficient but ``obviously
correct'' implementation |val'| to the more efficient implementation
|val|. Therefore the equational reasoning proofs in this section will
Nuria Brede's avatar
Nuria Brede committed
32
proceed from |val'| to |val|. In Sec.~\ref{section:conditions} we
Nuria Brede's avatar
Nuria Brede committed
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
have stated sufficient conditions for this transformation to be
possible: |measPureSpec|, |measJoinSpec|, |measPlusSpec|. We also have
seen the different computational patterns that the two implementations
exhibit: While |val'| first computes all possible trajectories for the
given policy sequence and initial state, then computes their
individual sum of rewards and finally applies the measure once, |val|
computes its final result by adding the current reward to an
intermediate outcome and applying the measure locally at each decision
step. This suggests that a transformation from |val'| to |val| will
essentially have to push the application of the measure into the
recursive computation of the sum of rewards. The proof will be carried
out by induction on the structure of policy sequences.

\subsection{Deterministic Case}
\label{subsection:detCase}
%
To get a first intuition, let's have a look at what
the induction step looks like in the deterministic case
(i.e. if we fix monad and measure to be identities):

{\linespread{1.5}
< valMeasTotalReward (p :: ps) x =
<     ( val' (p :: ps) x )                     ={ definitionprf val' }=
Nuria Brede's avatar
Nuria Brede committed
56
<     ( sumR ((x ** y) ## trj ps x'))          ={ definitionprf sumR }=
Nuria Brede's avatar
Nuria Brede committed
57
58
59
60
61
62
63
64
65
66
67
<     ( r (head (trj ps x')) <+> val' ps x' )  ={ useprf headLemma }=
<     ( r x' <+> val' ps x' )                  ={ useprf IHprf }=
<     ( r x' <+> val ps x' )                   ={ definitionprf val }=
<     ( val (p :: ps) x )                      QED
}

\noindent where |y = p x|, |x' = next t x y| and |r = reward t x y|. In the
proof sketch, we have first applied the definitions of |val'| and
|sumR|. Using the fact that in the deterministic case |trj| returns
exactly one state-control sequence and that the |head| of any
trajectory starting in |x'| is just |x'| (let us call the latter
Nuria Brede's avatar
Nuria Brede committed
68
69
|headLemma|), the left-hand side of the sum simplifies to |r x'|. Its
right-hand side amounts to |val' ps x'| so that we can apply the
Nuria Brede's avatar
Nuria Brede committed
70
71
induction hypothesis. The rest of the proof
only relies on definitional equalities. Thus in the deterministic case
Nuria Brede's avatar
Nuria Brede committed
72
73
|val| and |val'| are unconditionally extensionally equal -- or rather,
the conditions of Sec.~\ref{section:conditions} are trivially fulfilled.
Nuria Brede's avatar
Nuria Brede committed
74
75
76
77
78
79
80
81

\subsection{Lemmas}
\label{subsection:lemmas}
%
To prove the general, monadic case, we proceed similarly.
This time, however, the situation is complicated by the presence of
the abstract monad |M|. Instead of being able to use the type structure
of some concrete monad, we need to leverage on the properties of |M|,
Nuria Brede's avatar
Nuria Brede committed
82
|meas| and |<+>| postulated in Sec.~\ref{section:conditions}. To
Nuria Brede's avatar
Nuria Brede committed
83
84
85
86
87
facilitate the main proof, we first prove three lemmas about the
interaction of the measure with the monad structure and the
|<+>|-operator on |Val|.
%
Machine-checked proofs are given in the
Nicola Botta's avatar
Final.    
Nicola Botta committed
88
Appendices~\ref{appendix:theorem},~\ref{appendix:biCorrectness} and
Nuria Brede's avatar
Nuria Brede committed
89
\ref{appendix:lemmas}. The monad laws we use are stated in
Nicola Botta's avatar
Final.    
Nicola Botta committed
90
Appendix~\ref{appendix:monadLaws}. In the remainder of this section,
Nuria Brede's avatar
Nuria Brede committed
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
we discuss semi-formal versions of the proofs.\\

\paragraph*{Monad algebras.} \hspace{0.1cm} The first lemma allows us
to lift and eliminate an application of the monad's |join| operation:

< measAlgLemma  :  {A, B: Type} -> (f : B -> Val ) -> (g : A -> M B) ->
<                  (meas . map (meas . map f . g)) `ExtEq` (meas . map f . join . map g)

The proof of this lemma hinges on the condition |measJoinSpec|. It
allows to trade the application of |join| against an application of
|map meas|. The rest is just standard reasoning with monad and functor
laws, i.e. we use that the functorial map for |M| preserves
composition and that |join| is a natural transformation:

< measAlgLemma f g ma =
<
Nuria Brede's avatar
Nuria Brede committed
107
<   ( (meas . map (meas . map f . g)) ma )          ={ mapPresCompPrf }=
Nuria Brede's avatar
Nuria Brede committed
108
<
Nuria Brede's avatar
Nuria Brede committed
109
<   ( (meas . map (meas . map f) . map g) ma )      ={ mapPresCompPrf }=
Nuria Brede's avatar
Nuria Brede committed
110
111
112
<
<   ( (meas . map meas . map (map f) . map g) ma )  ={ useprf measJoinSpec }=
<
Nuria Brede's avatar
Nuria Brede committed
113
<   ( (meas . join . map (map f) . map g) ma )      ={ joinNatTransPrf }=
Nuria Brede's avatar
Nuria Brede committed
114
115
116
117
118
119
120
121
122
123
124
125
126
<
<   ( (meas . map f . join . map g) ma )            QED


This lemma is generic in the sense that it holds for arbitrary
Eilenberg-Moore algebras of a monad. Here we prove it for the
framework's measure |meas|, but note that in the appendix we prove a
generic version that is then appropriately instantiated.

\paragraph*{Head/trajectory interaction.} \hspace{0.1cm}
The second lemma amounts to a lifted version of |headLemma| in the
deterministic case. Mapping |head| onto an |M|-structure of
trajectories computed with |trj| results in an |M|-structure filled
Nuria Brede's avatar
Nuria Brede committed
127
with the initial states of these trajectories; similarly, mapping |(r
Nuria Brede's avatar
Nuria Brede committed
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
. head <++> s)| onto |trj ps x| for functions |r| and |s| of
appropriate type is the same as mapping |(const (r x) <++> s)| onto
|trj ps x| (where |const| is the constant function). We can prove

< headTrjLemma  :  {t, n : Nat} -> (ps : PolicySeq t n) -> (r : X t -> Val) ->
<                  (s : StateCtrlSeq t (S n) -> Val) -> (x : X t) ->
<                  (map (r . head <++> s) . trj ps) x =
<                  (map (const (r x) <++> s) . trj ps) x

by doing a case split on |ps|. In case |ps = Nil|, the equality holds
because the monad's |pure| is a natural transformation and in case |ps
= p :: ps'| because |M|'s functorial |map| preserves composition.

\paragraph*{Measure/sum interaction.} \hspace{0.1cm}
 The third lemma allows us to both commute the measure into the right
 summand of an |<++>|-sum and to perform the head/trajectory
Nuria Brede's avatar
Nuria Brede committed
144
 simplification. It lies at the core of the relationship between |val| and |val'|.
Nuria Brede's avatar
Nuria Brede committed
145
146
147
148
149
150
151
152

< measSumLemma  :  {t, n : Nat} -> (ps : PolicySeq t n) ->
<                  (r : X t -> Val) ->
<                  (s : StateCtrlSeq t (S n) -> Val) ->
<                  (meas . map (r . head <++> s) . trj ps) `ExtEq`
<                  (r <++> meas . map s . trj ps)

Recall that our third condition from
Nuria Brede's avatar
Nuria Brede committed
153
Sec.~\ref{section:conditions}, |measPlusSpec|, plays the role of a
Nuria Brede's avatar
Nuria Brede committed
154
155
156
distributive law and allows us to ``factor out'' a partially applied
sum |(v <+>)| for arbitrary |v : Val|.
Given that |measPlusSpec| holds, the lemma is provable by simple
Nuria Brede's avatar
Nuria Brede committed
157
158
equational reasoning using the above head-trajectory lemma and the fact
that map preserves composition:
Nuria Brede's avatar
Nuria Brede committed
159
160
161
162
163
164
165

< measSumLemma ps r s x' =
<
<   ( (meas . map (r . head <++> s) . trj ps) x' )               ={ useprf headTrjLemma }=
<
<   ( (meas . map (const (r x') <++> s) . trj ps) x' )           ={ definitionprf <++>, . }=
<
Nuria Brede's avatar
Nuria Brede committed
166
<   ( (meas . map ((const (r x') <++> id) . s) . trj ps) x' )    ={ mapPresCompPrf }=
Nuria Brede's avatar
Nuria Brede committed
167
168
169
170
171
172
173
174
175
<
<   ( (meas . map (const (r x') <++> id) . map s . trj ps) x' )  ={ definitionprf <++> }=
<
<   ( (meas . map ((r x') <+>) . map s . trj ps) x' )            ={ useprf measPlusSpec }=
<
<   ( (((r x') <+>) . meas . map s . trj ps) x' )                ={ definitionprf <++> }=
<
<   ( (r <++> meas . map s . trj ps) x' )                        QED

Nuria Brede's avatar
Nuria Brede committed
176
177
178
179
180
181
182
Notice how |measPlusSpec| is used to transform an application of
|meas . map ((r x') <+>)| into an application of |((r x') <+>) . meas|.
This is essential to simplify the computation of the measured total reward:
instead of first adding the current reward to the intermediate outcome of each
individual trajectory and then measuring the outcomes, one can first measure the
intermediate outcomes of the trajectories and then add the current reward.  

Nuria Brede's avatar
Nuria Brede committed
183
184
185
186
187
188
189
190
191
192
193
194
195
196

\subsection{Correctness of the BJI-value function}
\label{subsection:valvalTh}
%
With the above lemmas in place, we now prove that |val| is
extensionally equal to |val'|.

\vspace{0.15cm}
Let |t, n : Nat|, |ps : PolicySeq t n|. We prove |valMeasTotalReward|
by induction on |ps|.

\vspace{0.15cm}
\paragraph*{Base case.}\hspace{0.1cm}
We need to show that for all |x : X t|, |val'
Nuria Brede's avatar
Nuria Brede committed
197
198
199
200
201
202
203
Nil x| = |val Nil x|. The right hand side of this equation
reduces to |zero| by definition.
The left hand side can be simplified to |meas (pure zero)|
since |pure| is a natural transformation. At this point,
our first condition, |measPureSpec|, comes into play: Using that
|meas| is inverse to |pure| on the left, we can conclude
that the equality holds.\\
Nuria Brede's avatar
Nuria Brede committed
204
205
206
207
208
209
210
211
212
213
214

\noindent In equational reasoning style: For all |x : X t|,

{\linespread{1.2}

< valMeasTotalReward Nil x =
<
<   ( val' Nil x )                         ={ definitionprf val' }=
<
<   ( meas (map sumR (trj Nil x)) )        ={ definitionprf trj }=
<
Nuria Brede's avatar
Nuria Brede committed
215
<   ( meas (map sumR (pure (Last x))) )    ={ pureNatTransPrf }=
Nuria Brede's avatar
Nuria Brede committed
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
<
<   ( meas (pure (sumR (Last x))) )        ={ definitionprf sumR }=
<
<   ( meas (pure zero) )                   ={ useprf measPureSpec }=
<
<   ( zero )                               ={ definitionprf val }=
<
<   ( val Nil x )

}


\paragraph*{Step case.}\hspace{0.1cm}
The induction hypothesis (|IH|) is:
for all |x : X t|, |val' ps x = val ps x|. We have to show that
|IH| implies that for all |p : Policy t| and |x : X t|, the equality
Nicola Botta's avatar
Final.    
Nicola Botta committed
232
233
|val' (p :: ps) x = val (p :: ps) x| holds.

Nuria Brede's avatar
Nuria Brede committed
234
For brevity (and to economise on brackets), let in the following
Nuria Brede's avatar
Nuria Brede committed
235
|y = p x|, |mx' = next t x y|, |r = reward t x y|, |trjps = trj
Nuria Brede's avatar
Nuria Brede committed
236
ps|, and |consxy = ((x ** y) :::)|.
Nuria Brede's avatar
Nuria Brede committed
237
238
239
240
241
242
243
244
245
246
247

As in the base case, all that has to be done on the |val|-side of the
equation only depends on definitional equality. However it is more
involved to bring the |val'|-side into a form in which the induction
hypothesis can be applied. This is where we leverage on the lemmas
proved above.

By definition and because |map| preserves
composition, we know that |val' (p :: ps) x| is equal to
|(meas . map ((r . head) <++> sumR)) (mx' >>= trjps)|.
We use the relation between the monad's |bind| and |join| to eliminate
Nuria Brede's avatar
Nuria Brede committed
248
the |bind|-operator from the term.
Nuria Brede's avatar
Nuria Brede committed
249
Now we can apply the first lemma from above, |measAlgLemma|, to lift
Nuria Brede's avatar
Nuria Brede committed
250
and eliminate the |join| operation.
Nuria Brede's avatar
Nuria Brede committed
251
252

To commute the measure under the |<++>| and get rid of the application
Nuria Brede's avatar
Nuria Brede committed
253
254
255
of |head|, we use our third lemma, |measSumLemma|.
At this point we can apply the induction hypothesis and the resulting
term is equal to |val ps x| by definition.\\
Nuria Brede's avatar
Nuria Brede committed
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270

\noindent The more detailed equational reasoning proof:
%
\footnote{We are very grateful to the anonymous reviewer who suggested
  an alternative proof for the induction step. The proof presented
  here is based on his proof, and his suggestions have lead to
  significantly weaker conditions on the measure and thus a stronger
  result.}

< valMeasTotalReward (p :: ps) x =
<
<  ( val' (p :: ps) x )                                          ={ definitionprf val' }=
<
<  ( meas (map sumR (trj (p :: ps) x)) )                         ={ definitionprf trj }=
<
Nuria Brede's avatar
Nuria Brede committed
271
<  ( meas (map sumR (map consxy (mx' >>= trjps))) )              ={ mapPresCompPrf }=
Nuria Brede's avatar
Nuria Brede committed
272
273
274
<
<  ( meas (map (sumR . consxy) (mx' >>= trjps)) )                ={ definitionprf sumR}=
<
Nuria Brede's avatar
Nuria Brede committed
275
<  ( meas (map ((r . head) <++> sumR) (mx' >>= trjps)) )         ={ bindJoinSpecPrf }=
Nuria Brede's avatar
Nuria Brede committed
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
<
<  ( meas (map ((r . head) <++> sumR) (join (map trjps mx'))) )  ={ useprf measAlgLemma }=
<
<  ( meas (map (meas . map (r . head <++> sumR) . trjps) mx') )  ={ useprf measSumLemma }=
<
<  ( meas (map (r <++> meas . map sumR . trjps) mx') )           ={ definitionprf val' }=
<
<  ( meas (map (r <++> val' ps) mx') )                           ={ useprf IHprf }=
<
<  ( meas (map (r <++> val ps) mx') )                            ={ definitionprf val }=
<
<  ( val (p :: ps) x )

\hfill $\Box$


\paragraph*{Technical remarks.} \hspace{0.1cm}
The above proof of |valMeasTotalReward| omits some technical details that
may be uninteresting for a pen and paper proof, but turn out to be
crucial in the setting of an intensional type theory -- like Idris --
where function extensionality does not hold in general.
In particular, we have to postulate that the functorial |map|
Nicola Botta's avatar
Final.    
Nicola Botta committed
298
preserves extensional equality (see Appendix~\ref{appendix:monadLaws}
Nuria Brede's avatar
Nuria Brede committed
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
and \citep{botta2020extensional}) for Idris to accept the proof.
In fact, most of the reasoning proceeds by replacing functions that are mapped
onto monadic values by other functions that are only extensionally
equal. Using that |map| preserves extensional equality
allows to carry out such proofs generically without knowledge of
the concrete structure of the functor.


\subsection{Correctness of monadic backward induction}
\label{subsection:biCorrectness}

As corollary, we can now prove the correctness of monadic backward induction,
namely that the policy sequences computed by |bi| are optimal with
respect to the measured total reward computed by |val'|:  

< biOptMeasTotalReward  :  (t, n : Nat) -> GenOptPolicySeq val' (bi t n)
<
< biOptMeasTotalReward t n ps' x =
<    let vvEqL   = sym (valMeasTotalReward ps' x)       in
<    let vvEqR   = sym (valMeasTotalReward (bi t n) x)  in
Nuria Brede's avatar
Nuria Brede committed
319
<    let biOpt   = biOptVal t n ps' x                   in
Nuria Brede's avatar
Nuria Brede committed
320
<    replace vvEqR (replace vvEqL biOpt)