Commit e6c7245c by Nuria Brede

### Initial.

parent 8e0120fa
 > module Equality > %default total > %auto_implicits off > %access public export > ExtEq : {A, B : Type} -> (f, g : A -> B) -> Type > ExtEq {A} f g = (a : A) -> f a = g a
 > module Framework > import Syntax.PreorderReasoning > import Equality > import Sigma > %default total > %auto_implicits off > %access public export > infixr 7 ::: > infixr 7 <++> > %hide map > %hide pure > %hide join > %hide (>>=) > %hide head > %hide lteRefl > %hide Prelude.Algebra.(<+>) %-----------------------------------------------------------------------------% %-----------------------------------------------------------------------------% Botta et al. framework for Monadic Sequential Decision Problems %-----------------------------------------------------------------------------% %-----------------------------------------------------------------------------% For readability this implementation is a simplified version of the general framework (notably omitting the *viability* checks), and uses a non-emptiness condition on the transition function instead. %-----------------------------------------------------------------------------% Monadic dynamical system %-----------------------------------------------------------------------------% The components of a monadic dynamical system: > M : Type -> Type |M| is a functor: > map : {A, B : Type} -> (A -> B) -> M A -> M B |M| is a container monad: > pure : {A : Type} -> A -> M A > (>>=) : {A, B : Type} -> M A -> (A -> M B) -> M B > join : {A : Type} -> M (M A) -> M A > NotEmpty : {A : Type} -> M A -> Type Families of state and control spaces indexed by decision steps: > X : (t : Nat) -> Type > Y : (t : Nat) -> X t -> Type The transition function: > next : (t : Nat) -> (x : X t) -> Y t x -> M (X (S t)) > nextNotEmpty : {t : Nat} -> (x : X t) -> (y : Y t x) -> > NotEmpty (next t x y) %-----------------------------------------------------------------------------% Monadic sequential decision problem %-----------------------------------------------------------------------------% The components of a decision problem: > Val : Type > reward : (t : Nat) -> (x : X t) -> Y t x -> X (S t) -> Val > (<+>) : Val -> Val -> Val > zero : Val > meas : M Val -> Val A helper function: > (<++>) : {A : Type} -> (f, g : A -> Val) -> A -> Val > f <++> g = \ a => f a <+> g a Policies and policy sequences: > Policy : (t : Nat) -> Type > Policy t = (x : X t) -> Y t x > data PolicySeq : (t, n : Nat) -> Type where > Nil : {t : Nat} -> PolicySeq t Z > (::) : {t, n : Nat} -> Policy t -> PolicySeq (S t) n -> PolicySeq t (S n) The value of policy sequences (Bellman equation): > val : {t, n : Nat} -> PolicySeq t n -> X t -> Val > val {t} Nil x = zero > val {t} (p :: ps) x = let y = p x in > let mx' = next t x y in > (meas . map (reward t x y <++> val ps)) mx' %-----------------------------------------------------------------------------% % Backward Induction %-----------------------------------------------------------------------------% Preliminairies -------------- Basic requirements for verified backward induction: > (<=) : Val -> Val -> Type > lteRefl : {a : Val} -> a <= a > lteTrans : {a, b, c : Val} -> a <= b -> b <= c -> a <= c > plusMonSpec : {a, b, c, d : Val} -> a <= b -> c <= d -> (a <+> c) <= (b <+> d) > measMonSpec : {A : Type} -> (f, g : A -> Val) -> ((a : A) -> (f a) <= (g a)) -> > (ma : M A) -> meas (map f ma) <= meas (map g ma) Optimality of policy sequences: > OptPolicySeq : {t, n : Nat} -> PolicySeq t n -> Type > OptPolicySeq {t} {n} ps = (ps' : PolicySeq t n) -> (x : X t) -> > val ps' x <= val ps x Optimality of extensions of policy sequences: > OptExt : {t, n : Nat} -> PolicySeq (S t) n -> Policy t -> Type > OptExt {t} ps p = (p' : Policy t) -> (x : X t) -> > val (p' :: ps) x <= val (p :: ps) x Bellman's principle of optimality: > Bellman : {t, n : Nat} -> > (ps : PolicySeq (S t) n) -> OptPolicySeq ps -> > (p : Policy t) -> OptExt ps p -> > OptPolicySeq (p :: ps) > > Bellman {t} ps ops p oep (p' :: ps') x = > ------------------------------------------------------- > let y' = p' x in > let mx' = next t x y' in > let f' = reward t x y' <++> val ps' in > let f = reward t x y' <++> val ps in > let s0 = \ x' => plusMonSpec lteRefl (ops ps' x') in > let s1 = measMonSpec f' f s0 mx' in -- |val (p' :: ps') x <= val (p' :: ps) x| > let s2 = oep p' x in -- |val (p' :: ps) x <= val (p :: ps) x| > ------------------------------------------------------- > lteTrans s1 s2 Verification of backward induction with respect to |val| --------------------------------------------------------- The empty policy sequence is optimal: > nilOptPolicySeq : OptPolicySeq Nil > nilOptPolicySeq Nil x = lteRefl Now, provided that we can implement > optExt : {t, n : Nat} -> PolicySeq (S t) n -> Policy t > optExtSpec : {t, n : Nat} -> (ps : PolicySeq (S t) n) -> > OptExt ps (optExt ps) then > bi : (t : Nat) -> (n : Nat) -> PolicySeq t n > bi t Z = Nil > bi t (S n) = let ps = bi (S t) n in > optExt ps :: ps is correct "by construction" > biOptVal : (t : Nat) -> (n : Nat) -> OptPolicySeq (bi t n) > biOptVal t Z = nilOptPolicySeq > biOptVal t (S n) = > ------------------------------ > let ps = bi (S t) n in > let ops = biOptVal (S t) n in > let p = optExt ps in > let oep = optExtSpec ps in > ------------------------------ > Bellman ps ops p oep Optimal Extensions ------------------ The generic implementation of backward induction |bi| naturally raises the question under which conditions one can implement < optExt : Functor M => {t, n : Nat} -> PolicySeq (S t) n -> Policy t such that < optExtSpec : Functor M => {t, n : Nat} -> < (ps : PolicySeq (S t) n) -> OptExt ps (optExt ps) To this end, consider the function > cval : {t, n : Nat} -> > PolicySeq (S t) n -> (x : X t) -> Y t x -> Val > cval {t} ps x y = let mx' = next t x y in > meas (map (reward t x y <++> val ps) mx') By definition of |val| and |cval|, one has val (p :: ps) x = meas (map (reward t x (p x) <++> val ps) (next t x (p x))) = cval ps x (p x) This suggests that, if we can maximize |cval| that is, implement > cvalmax : {t, n : Nat} -> > PolicySeq (S t) n -> (x : X t) -> Val > cvalargmax : {t, n : Nat} -> > PolicySeq (S t) n -> (x : X t) -> Y t x that fulfill > cvalmaxSpec : {t, n : Nat} -> > (ps : PolicySeq (S t) n) -> (x : X t) -> > (y : Y t x) -> cval ps x y <= cvalmax ps x > cvalargmaxSpec : {t, n : Nat} -> > (ps : PolicySeq (S t) n) -> (x : X t) -> > cvalmax ps x = cval ps x (cvalargmax ps x) then we can implement optimal extensions of arbitrary policy sequences. As it turns out, this intuition is correct. With > optExt = cvalargmax one has > optExtSpec {t} {n} ps p' x = s4 where > p : Policy t > p = optExt ps > y : Y t x > y = p x > y' : Y t x > y' = p' x > s1 : cval ps x y' <= cvalmax ps x > s1 = cvalmaxSpec ps x y' > s2 : cval ps x y' <= cval ps x (cvalargmax ps x) > s2 = replace {P = \ z => (cval ps x y' <= z)} (cvalargmaxSpec ps x) s1 > s3 : cval ps x y' <= cval ps x y > s3 = s2 > s4 : val (p' :: ps) x <= val (p :: ps) x > s4 = s3 |cvalmax| and |cvalargmax| -------------------------- The observation that functions |cvalmax| and |cvalargmax| that fulfill |cvalmaxSpec| and |cvalargmaxSpec| are sufficient to implement an optimal extension |optExt| that fulfills |optExtSpec| naturally raises the question of what are necessary and sufficient conditions for |cvalmax| and |cvalargmax|. Answering this question necessarily requires discussing properties of |cval| and goes well beyond the scope of formulating a theory of SDPs. Here, we limit ourselves to remark that if |Y t x| is finite and non-empty one can implement |cvalmax|, |cvalargmax|, |cvalmaxSpec| and |cvalargmaxSpec| by linear search. Concrete implementations can be found in the SequentialDecisionProblems folder of the IdrisLibs git repository [2], together with the original BJI-framework, tabulated backward induction and several examples. Backward induction for some examples and counter-examples from [1] can be tried out by compiling |LeightweightTheory.lidr| in said folder. %-----------------------------------------------------------------------------% %-----------------------------------------------------------------------------% Correctness of Monadic Backward Induction %-----------------------------------------------------------------------------% %-----------------------------------------------------------------------------% This part of the file contains the results of [1]. %-----------------------------------------------------------------------------% Extension of the BJI-framework %-----------------------------------------------------------------------------% The type of state-control sequences: > data StateCtrlSeq : (t, n : Nat) -> Type where > Last : {t : Nat} -> X t -> StateCtrlSeq t (S Z) > (:::) : {t, n : Nat} -> Sigma (X t) (Y t) -> > StateCtrlSeq (S t) (S n) -> StateCtrlSeq t (S (S n)) A function that extracts the head state from a state-control sequence: > head : {t, n : Nat} -> StateCtrlSeq t (S n) -> X t > head (Last x) = x > head (MkSigma x y ::: xys) = x The reward accumulated along a state-control sequence: > sumR : {t, n : Nat} -> StateCtrlSeq t n -> Val > sumR {t} (Last x) = zero > sumR {t} (MkSigma x y ::: xys) = reward t x y (head xys) <+> sumR xys Computation of an |M|-structure of trajectories from a given policy sequence: > trj : {t, n : Nat} -> PolicySeq t n -> X t -> M (StateCtrlSeq t (S n)) > trj {t} Nil x = (pure . Last) x > trj {t} (p :: ps) x = let y = p x in > let mx' = next t x y in > map ((MkSigma x y) :::) (mx' >>= trj ps) The function that computes the measured total reward: > val' : {t, n : Nat} -> PolicySeq t n -> X t -> Val > val' ps = meas . map sumR . trj ps %-----------------------------------------------------------------------------% Functor and monad properties %-----------------------------------------------------------------------------% Functor properties ------------------ |map| preserves composition: > mapPresComp : {A, B, C : Type} -> > (f : A -> B) -> (g : B -> C) -> > map (g . f) ExtEq (map g . map f) |map| preserves extensional (pointwise) equality: > mapPresExtEq : {A, B : Type} -> > (f, g : A -> B) -> f ExtEq g -> (map f) ExtEq (map g) Basic monad properties ---------------------- |pure| and |join| are natural transformations: > pureNatTrans : {A, B : Type} -> (f : A -> B) -> > map f . pure ExtEq pure . f > joinNatTrans : {A, B : Type} -> (f : A -> B) -> > map f . join ExtEq join . map (map f) |bind|, |join| and |map| need to be inter-related: > bindJoinSpec : {A, B : Type} -> (f : A -> M B) -> > (>>= f) ExtEq join . map f |NonEmpty| conditions --------------------- Condition on |pure|: > pureNotEmpty : {A : Type} -> (a : A) -> NotEmpty (pure a) > Conditions on |map| and |bind|: > mapPresNotEmpty : {A, B : Type} -> (f : A -> B) -> (ma : M A) -> > NotEmpty ma -> NotEmpty (map f ma) > bindPresNotEmpty : {A, B : Type} -> (f : A -> M B) -> (ma : M A) -> > NotEmpty ma -> ((a : A) -> NotEmpty (f a)) -> > NotEmpty (ma >>= f) Properties of monad algebras ---------------------------- Condition for a function |f| to be an |M|-algebra homomorphism: map f M A ---------------- > M B | | | alpha | beta | | v v A ------------------> B f > algMorSpec : {A, B : Type} -> > (alpha : M A -> A) -> (beta : M B -> B) -> (f : A -> B) -> Type > algMorSpec {A} {B} alpha beta f = (beta . map f) ExtEq (f . alpha) Structure maps of |M|-algebras are left inverses of |pure|: > algPureSpec : {A : Type} -> (alpha : M A -> A) -> Type > algPureSpec {A} alpha = alpha . pure ExtEq id Structure maps of |M|-algebras are themselves |M|-algebra homomorphisms: > -- (alpha . map alpha) ExtEq (alpha . join) > algJoinSpec : {A : Type} -> (alpha : M A -> A) -> Type > algJoinSpec {A} alpha = algMorSpec join alpha alpha > A lemma about computation with |M|-algebras: > algLemma : {A, B, C : Type} -> (alpha: M C -> C) -> (ee : algJoinSpec alpha) -> > (f : B -> C ) -> (g : A -> M B) -> > (alpha . map (alpha . map f . g)) ExtEq (alpha . map f . join . map g) > algLemma {A} {B} {C} alpha ee f g ma = > > ( (alpha . map (alpha . map f . g)) ma ) > > ={ cong (mapPresComp g (alpha . map f) ma) }= > > ( (alpha . map (alpha . map f) . map g) ma ) > > ={ cong (mapPresComp (map f) alpha (map g ma)) }= > > ( (alpha . map alpha . map (map f) . map g) ma ) > > ={ ee (map (map f) (map g ma)) }= > > ( (alpha . join . map (map f) . map g) ma ) > > ={ cong (sym (joinNatTrans f (map g ma))) }= > > ( (alpha . map f . join . map g) ma ) > > QED %-----------------------------------------------------------------------------% Measure requirements %-----------------------------------------------------------------------------% Three sufficient requirements the measure |meas| needs to fulfill for the theorem to hold: The measure needs to be the structure map of an |M|-algebra on |Val|. This means it has to fulfill two conditions. 1. It is a left inverse to |pure|: > measPureSpec : algPureSpec meas -- |meas . pure ExtEq id| 2. It is an |M|-algebra homomorphism from |join| to itself: > measJoinSpec : algJoinSpec meas -- |meas . join ExtEq meas . map meas| 3. For |v : Val|, the function |(v <+>)| is an |M|-algebra homomorphism: For all |v : Val|, map (v <+>) M Val ---------------> M Val | | | meas | meas | | v v Val ----------------> Val (v <+>) > measPlusSpec : (v : Val) -> (mv : M Val) -> (NotEmpty mv) -> > (meas . map (v <+>)) mv = ((v <+>) . meas) mv We can omit the non-emptiness condition but this means we implicitly restrict the class of monads for which the result applies |M| (because |measPlusSpec'| might not be provable). > -- ∀ v : Val, meas . map (v <+>) ExtEq (v <+>) . meas > measPlusSpec' : (v : Val) -> algMorSpec meas meas ((<+>) v) %-----------------------------------------------------------------------------% Lemmas %-----------------------------------------------------------------------------% The lemmas will be implemented below the proof of the main theorem. > oplusLiftEERight : {A : Type} -> (f, g, r : A -> Val) -> > ExtEq {A} f g -> ExtEq {A} (r <++> f) (r <++> g) > trjNotEmptyLemma : {t, n : Nat} -> (ps : PolicySeq t n) -> (x : X t) -> > NotEmpty (trj ps x) > hdTrjLemma : {t, n : Nat} -> (ps : PolicySeq t n) -> > (r : X t -> Val) -> > (s : StateCtrlSeq t (S n) -> Val) -> (x' : X t) -> > (map (r . head <++> s) . trj ps) x' = > (map (const (r x') <++> s) . trj ps) x' > measSumLemma : {t, n : Nat} -> (ps : PolicySeq t n) -> > (r : X t -> Val) -> > (s : StateCtrlSeq t (S n) -> Val) -> > (meas . map (r . head <++> s) . trj ps) ExtEq > (r <++> meas . map s . trj ps) %-----------------------------------------------------------------------------% Proof of the main theorem %-----------------------------------------------------------------------------% The proof of the main theorem from section \ref{section:valval} that |val| computes the measured total reward because it is extensionally equal to |val'|: > valMeasTotalReward : {t, n : Nat} -> (ps : PolicySeq t n) -> (x : X t) -> > val' ps x = val ps x