Theorem.lidr 13.4 KB
Newer Older
 Nuria Brede committed Feb 23, 2021 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 % -*-Latex-*- %if False > module Theorem Remark: The statements and proofs in this file are just "pseudo-code". The whole theory including the proofs is contained in the file Appendix.lidr (and in the supplementary material). %endif %\pagebreak \section{Correctness Proofs} \label{section:valval} In this section we show that |val|  Nuria Brede committed Jul 23, 2021 19 20 (Sec.~\ref{subsection:solution_components}) and |val'| (Sec.~\ref{section:preparation}) are extensionally equal  Nuria Brede committed Feb 23, 2021 21 22 23  < valMeasTotalReward : {t, n : Nat} -> (ps : PolicySeq t n) -> (x : X t) -> val' ps x = val ps x  Nuria Brede committed Jul 23, 2021 24 given the three conditions from the previous section hold. As a  Nuria Brede committed Feb 23, 2021 25 26 27 corollary we then obtain our correctness result for monadic backward induction.  Nuria Brede committed Jul 23, 2021 28 We can understand the proof of |valMeasTotalReward| as an optimising  Nuria Brede committed Feb 23, 2021 29 30 31 program transformation from the less efficient but obviously correct'' implementation |val'| to the more efficient implementation |val|. Therefore the equational reasoning proofs in this section will  Nuria Brede committed Jul 23, 2021 32 proceed from |val'| to |val|. In Sec.~\ref{section:conditions} we  Nuria Brede committed Feb 23, 2021 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 have stated sufficient conditions for this transformation to be possible: |measPureSpec|, |measJoinSpec|, |measPlusSpec|. We also have seen the different computational patterns that the two implementations exhibit: While |val'| first computes all possible trajectories for the given policy sequence and initial state, then computes their individual sum of rewards and finally applies the measure once, |val| computes its final result by adding the current reward to an intermediate outcome and applying the measure locally at each decision step. This suggests that a transformation from |val'| to |val| will essentially have to push the application of the measure into the recursive computation of the sum of rewards. The proof will be carried out by induction on the structure of policy sequences. \subsection{Deterministic Case} \label{subsection:detCase} % To get a first intuition, let's have a look at what the induction step looks like in the deterministic case (i.e. if we fix monad and measure to be identities): {\linespread{1.5} < valMeasTotalReward (p :: ps) x = < ( val' (p :: ps) x ) ={ definitionprf val' }=  Nuria Brede committed Jul 23, 2021 56 < ( sumR ((x ** y) ## trj ps x')) ={ definitionprf sumR }=  Nuria Brede committed Feb 23, 2021 57 58 59 60 61 62 63 64 65 66 67 < ( r (head (trj ps x')) <+> val' ps x' ) ={ useprf headLemma }= < ( r x' <+> val' ps x' ) ={ useprf IHprf }= < ( r x' <+> val ps x' ) ={ definitionprf val }= < ( val (p :: ps) x ) QED } \noindent where |y = p x|, |x' = next t x y| and |r = reward t x y|. In the proof sketch, we have first applied the definitions of |val'| and |sumR|. Using the fact that in the deterministic case |trj| returns exactly one state-control sequence and that the |head| of any trajectory starting in |x'| is just |x'| (let us call the latter  Nuria Brede committed Jul 23, 2021 68 69 |headLemma|), the left-hand side of the sum simplifies to |r x'|. Its right-hand side amounts to |val' ps x'| so that we can apply the  Nuria Brede committed Feb 23, 2021 70 71 induction hypothesis. The rest of the proof only relies on definitional equalities. Thus in the deterministic case  Nuria Brede committed Jul 23, 2021 72 73 |val| and |val'| are unconditionally extensionally equal -- or rather, the conditions of Sec.~\ref{section:conditions} are trivially fulfilled.  Nuria Brede committed Feb 23, 2021 74 75 76 77 78 79 80 81  \subsection{Lemmas} \label{subsection:lemmas} % To prove the general, monadic case, we proceed similarly. This time, however, the situation is complicated by the presence of the abstract monad |M|. Instead of being able to use the type structure of some concrete monad, we need to leverage on the properties of |M|,  Nuria Brede committed Jul 23, 2021 82 |meas| and |<+>| postulated in Sec.~\ref{section:conditions}. To  Nuria Brede committed Feb 23, 2021 83 84 85 86 87 facilitate the main proof, we first prove three lemmas about the interaction of the measure with the monad structure and the |<+>|-operator on |Val|. % Machine-checked proofs are given in the  Nicola Botta committed Sep 23, 2021 88 Appendices~\ref{appendix:theorem},~\ref{appendix:biCorrectness} and  Nuria Brede committed Feb 23, 2021 89 \ref{appendix:lemmas}. The monad laws we use are stated in  Nicola Botta committed Sep 23, 2021 90 Appendix~\ref{appendix:monadLaws}. In the remainder of this section,  Nuria Brede committed Feb 23, 2021 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 we discuss semi-formal versions of the proofs.\\ \paragraph*{Monad algebras.} \hspace{0.1cm} The first lemma allows us to lift and eliminate an application of the monad's |join| operation: < measAlgLemma : {A, B: Type} -> (f : B -> Val ) -> (g : A -> M B) -> < (meas . map (meas . map f . g)) ExtEq (meas . map f . join . map g) The proof of this lemma hinges on the condition |measJoinSpec|. It allows to trade the application of |join| against an application of |map meas|. The rest is just standard reasoning with monad and functor laws, i.e. we use that the functorial map for |M| preserves composition and that |join| is a natural transformation: < measAlgLemma f g ma = <  Nuria Brede committed Jul 23, 2021 107 < ( (meas . map (meas . map f . g)) ma ) ={ mapPresCompPrf }=  Nuria Brede committed Feb 23, 2021 108 <  Nuria Brede committed Jul 23, 2021 109 < ( (meas . map (meas . map f) . map g) ma ) ={ mapPresCompPrf }=  Nuria Brede committed Feb 23, 2021 110 111 112 < < ( (meas . map meas . map (map f) . map g) ma ) ={ useprf measJoinSpec }= <  Nuria Brede committed Jul 23, 2021 113 < ( (meas . join . map (map f) . map g) ma ) ={ joinNatTransPrf }=  Nuria Brede committed Feb 23, 2021 114 115 116 117 118 119 120 121 122 123 124 125 126 < < ( (meas . map f . join . map g) ma ) QED This lemma is generic in the sense that it holds for arbitrary Eilenberg-Moore algebras of a monad. Here we prove it for the framework's measure |meas|, but note that in the appendix we prove a generic version that is then appropriately instantiated. \paragraph*{Head/trajectory interaction.} \hspace{0.1cm} The second lemma amounts to a lifted version of |headLemma| in the deterministic case. Mapping |head| onto an |M|-structure of trajectories computed with |trj| results in an |M|-structure filled  Nuria Brede committed Jul 23, 2021 127 with the initial states of these trajectories; similarly, mapping |(r  Nuria Brede committed Feb 23, 2021 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 . head <++> s)| onto |trj ps x| for functions |r| and |s| of appropriate type is the same as mapping |(const (r x) <++> s)| onto |trj ps x| (where |const| is the constant function). We can prove < headTrjLemma : {t, n : Nat} -> (ps : PolicySeq t n) -> (r : X t -> Val) -> < (s : StateCtrlSeq t (S n) -> Val) -> (x : X t) -> < (map (r . head <++> s) . trj ps) x = < (map (const (r x) <++> s) . trj ps) x by doing a case split on |ps|. In case |ps = Nil|, the equality holds because the monad's |pure| is a natural transformation and in case |ps = p :: ps'| because |M|'s functorial |map| preserves composition. \paragraph*{Measure/sum interaction.} \hspace{0.1cm} The third lemma allows us to both commute the measure into the right summand of an |<++>|-sum and to perform the head/trajectory  Nuria Brede committed Jul 23, 2021 144  simplification. It lies at the core of the relationship between |val| and |val'|.  Nuria Brede committed Feb 23, 2021 145 146 147 148 149 150 151 152  < measSumLemma : {t, n : Nat} -> (ps : PolicySeq t n) -> < (r : X t -> Val) -> < (s : StateCtrlSeq t (S n) -> Val) -> < (meas . map (r . head <++> s) . trj ps) ExtEq < (r <++> meas . map s . trj ps) Recall that our third condition from  Nuria Brede committed Jul 23, 2021 153 Sec.~\ref{section:conditions}, |measPlusSpec|, plays the role of a  Nuria Brede committed Feb 23, 2021 154 155 156 distributive law and allows us to factor out'' a partially applied sum |(v <+>)| for arbitrary |v : Val|. Given that |measPlusSpec| holds, the lemma is provable by simple  Nuria Brede committed Jul 23, 2021 157 158 equational reasoning using the above head-trajectory lemma and the fact that map preserves composition:  Nuria Brede committed Feb 23, 2021 159 160 161 162 163 164 165  < measSumLemma ps r s x' = < < ( (meas . map (r . head <++> s) . trj ps) x' ) ={ useprf headTrjLemma }= < < ( (meas . map (const (r x') <++> s) . trj ps) x' ) ={ definitionprf <++>, . }= <  Nuria Brede committed Jul 23, 2021 166 < ( (meas . map ((const (r x') <++> id) . s) . trj ps) x' ) ={ mapPresCompPrf }=  Nuria Brede committed Feb 23, 2021 167 168 169 170 171 172 173 174 175 < < ( (meas . map (const (r x') <++> id) . map s . trj ps) x' ) ={ definitionprf <++> }= < < ( (meas . map ((r x') <+>) . map s . trj ps) x' ) ={ useprf measPlusSpec }= < < ( (((r x') <+>) . meas . map s . trj ps) x' ) ={ definitionprf <++> }= < < ( (r <++> meas . map s . trj ps) x' ) QED  Nuria Brede committed Jul 23, 2021 176 177 178 179 180 181 182 Notice how |measPlusSpec| is used to transform an application of |meas . map ((r x') <+>)| into an application of |((r x') <+>) . meas|. This is essential to simplify the computation of the measured total reward: instead of first adding the current reward to the intermediate outcome of each individual trajectory and then measuring the outcomes, one can first measure the intermediate outcomes of the trajectories and then add the current reward.  Nuria Brede committed Feb 23, 2021 183 184 185 186 187 188 189 190 191 192 193 194 195 196  \subsection{Correctness of the BJI-value function} \label{subsection:valvalTh} % With the above lemmas in place, we now prove that |val| is extensionally equal to |val'|. \vspace{0.15cm} Let |t, n : Nat|, |ps : PolicySeq t n|. We prove |valMeasTotalReward| by induction on |ps|. \vspace{0.15cm} \paragraph*{Base case.}\hspace{0.1cm} We need to show that for all |x : X t|, |val'  Nuria Brede committed Jul 23, 2021 197 198 199 200 201 202 203 Nil x| = |val Nil x|. The right hand side of this equation reduces to |zero| by definition. The left hand side can be simplified to |meas (pure zero)| since |pure| is a natural transformation. At this point, our first condition, |measPureSpec|, comes into play: Using that |meas| is inverse to |pure| on the left, we can conclude that the equality holds.\\  Nuria Brede committed Feb 23, 2021 204 205 206 207 208 209 210 211 212 213 214  \noindent In equational reasoning style: For all |x : X t|, {\linespread{1.2} < valMeasTotalReward Nil x = < < ( val' Nil x ) ={ definitionprf val' }= < < ( meas (map sumR (trj Nil x)) ) ={ definitionprf trj }= <  Nuria Brede committed Jul 23, 2021 215 < ( meas (map sumR (pure (Last x))) ) ={ pureNatTransPrf }=  Nuria Brede committed Feb 23, 2021 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 < < ( meas (pure (sumR (Last x))) ) ={ definitionprf sumR }= < < ( meas (pure zero) ) ={ useprf measPureSpec }= < < ( zero ) ={ definitionprf val }= < < ( val Nil x ) } \paragraph*{Step case.}\hspace{0.1cm} The induction hypothesis (|IH|) is: for all |x : X t|, |val' ps x = val ps x|. We have to show that |IH| implies that for all |p : Policy t| and |x : X t|, the equality  Nicola Botta committed Sep 23, 2021 232 233 |val' (p :: ps) x = val (p :: ps) x| holds.  Nuria Brede committed Jul 23, 2021 234 For brevity (and to economise on brackets), let in the following  Nuria Brede committed Feb 23, 2021 235 |y = p x|, |mx' = next t x y|, |r = reward t x y|, |trjps = trj  Nuria Brede committed Jul 23, 2021 236 ps|, and |consxy = ((x ** y) :::)|.  Nuria Brede committed Feb 23, 2021 237 238 239 240 241 242 243 244 245 246 247  As in the base case, all that has to be done on the |val|-side of the equation only depends on definitional equality. However it is more involved to bring the |val'|-side into a form in which the induction hypothesis can be applied. This is where we leverage on the lemmas proved above. By definition and because |map| preserves composition, we know that |val' (p :: ps) x| is equal to |(meas . map ((r . head) <++> sumR)) (mx' >>= trjps)|. We use the relation between the monad's |bind| and |join| to eliminate  Nuria Brede committed Jul 23, 2021 248 the |bind|-operator from the term.  Nuria Brede committed Feb 23, 2021 249 Now we can apply the first lemma from above, |measAlgLemma|, to lift  Nuria Brede committed Jul 23, 2021 250 and eliminate the |join| operation.  Nuria Brede committed Feb 23, 2021 251 252  To commute the measure under the |<++>| and get rid of the application  Nuria Brede committed Jul 23, 2021 253 254 255 of |head|, we use our third lemma, |measSumLemma|. At this point we can apply the induction hypothesis and the resulting term is equal to |val ps x| by definition.\\  Nuria Brede committed Feb 23, 2021 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270  \noindent The more detailed equational reasoning proof: % \footnote{We are very grateful to the anonymous reviewer who suggested an alternative proof for the induction step. The proof presented here is based on his proof, and his suggestions have lead to significantly weaker conditions on the measure and thus a stronger result.} < valMeasTotalReward (p :: ps) x = < < ( val' (p :: ps) x ) ={ definitionprf val' }= < < ( meas (map sumR (trj (p :: ps) x)) ) ={ definitionprf trj }= <  Nuria Brede committed Jul 23, 2021 271 < ( meas (map sumR (map consxy (mx' >>= trjps))) ) ={ mapPresCompPrf }=  Nuria Brede committed Feb 23, 2021 272 273 274 < < ( meas (map (sumR . consxy) (mx' >>= trjps)) ) ={ definitionprf sumR}= <  Nuria Brede committed Jul 23, 2021 275 < ( meas (map ((r . head) <++> sumR) (mx' >>= trjps)) ) ={ bindJoinSpecPrf }=  Nuria Brede committed Feb 23, 2021 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 < < ( meas (map ((r . head) <++> sumR) (join (map trjps mx'))) ) ={ useprf measAlgLemma }= < < ( meas (map (meas . map (r . head <++> sumR) . trjps) mx') ) ={ useprf measSumLemma }= < < ( meas (map (r <++> meas . map sumR . trjps) mx') ) ={ definitionprf val' }= < < ( meas (map (r <++> val' ps) mx') ) ={ useprf IHprf }= < < ( meas (map (r <++> val ps) mx') ) ={ definitionprf val }= < < ( val (p :: ps) x ) \hfill $\Box$ \paragraph*{Technical remarks.} \hspace{0.1cm} The above proof of |valMeasTotalReward| omits some technical details that may be uninteresting for a pen and paper proof, but turn out to be crucial in the setting of an intensional type theory -- like Idris -- where function extensionality does not hold in general. In particular, we have to postulate that the functorial |map|  Nicola Botta committed Sep 23, 2021 298 preserves extensional equality (see Appendix~\ref{appendix:monadLaws}  Nuria Brede committed Feb 23, 2021 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 and \citep{botta2020extensional}) for Idris to accept the proof. In fact, most of the reasoning proceeds by replacing functions that are mapped onto monadic values by other functions that are only extensionally equal. Using that |map| preserves extensional equality allows to carry out such proofs generically without knowledge of the concrete structure of the functor. \subsection{Correctness of monadic backward induction} \label{subsection:biCorrectness} As corollary, we can now prove the correctness of monadic backward induction, namely that the policy sequences computed by |bi| are optimal with respect to the measured total reward computed by |val'|: < biOptMeasTotalReward : (t, n : Nat) -> GenOptPolicySeq val' (bi t n) < < biOptMeasTotalReward t n ps' x = < let vvEqL = sym (valMeasTotalReward ps' x) in < let vvEqR = sym (valMeasTotalReward (bi t n) x) in  Nuria Brede committed Jul 23, 2021 319 < let biOpt = biOptVal t n ps' x in  Nuria Brede committed Feb 23, 2021 320 < replace vvEqR (replace vvEqL biOpt)