> module Equality
> %default total
> %auto_implicits off
> %access public export
> ExtEq : {A, B : Type} -> (f, g : A -> B) -> Type
> ExtEq {A} f g = (a : A) -> f a = g a
> module Framework
> import Syntax.PreorderReasoning
> import Equality
> import Sigma
> %default total
> %auto_implicits off
> %access public export
> infixr 7 :::
> infixr 7 <++>
> %hide map
> %hide pure
> %hide join
> %hide (>>=)
> %hide head
> %hide lteRefl
> %hide Prelude.Algebra.(<+>)
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
Botta et al. framework for Monadic Sequential Decision Problems
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
For readability, this implementation is a simplified version of the
general framework: it notably omits the *viability* checks and uses a
non-emptiness condition on the transition function instead.
%-----------------------------------------------------------------------------%
Monadic dynamical system
%-----------------------------------------------------------------------------%
The components of a monadic dynamical system:
> M : Type -> Type
|M| is a functor:
> map : {A, B : Type} -> (A -> B) -> M A -> M B
|M| is a container monad:
> pure : {A : Type} -> A -> M A
> (>>=) : {A, B : Type} -> M A -> (A -> M B) -> M B
> join : {A : Type} -> M (M A) -> M A
> NotEmpty : {A : Type} -> M A -> Type
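For intuition, one possible container monad (purely illustrative, not part of
the framework) is the list monad, with the Prelude's |NonEmpty| predicate
playing the role of |NotEmpty|; a minimal sketch:

< M : Type -> Type
< M = List
<
< map f []        = []
< map f (x :: xs) = f x :: map f xs
<
< pure a          = [a]
< join            = foldr (++) []
< ma >>= f        = join (map f ma)
<
< NotEmpty ma     = NonEmpty ma    -- |NonEmpty| from |Prelude.List|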
Families of state and control spaces indexed by decision steps:
> X : (t : Nat) -> Type
> Y : (t : Nat) -> X t -> Type
The transition function:
> next : (t : Nat) -> (x : X t) -> Y t x -> M (X (S t))
> nextNotEmpty : {t : Nat} -> (x : X t) -> (y : Y t x) ->
> NotEmpty (next t x y)
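As a toy instantiation of states, controls and transitions (hypothetical
names, assuming the list monad sketch above): states are integers, controls
pick a direction, and |next| is non-deterministic:

< data Direction = GoLeft | GoRight
<
< X t = Int
< Y t x = Direction
<
< next t x GoLeft  = [x - 1, x]    -- moving left may fail, leaving the state unchanged
< next t x GoRight = [x, x + 1]
<
< nextNotEmpty x GoLeft  = IsNonEmpty
< nextNotEmpty x GoRight = IsNonEmpty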
%-----------------------------------------------------------------------------%
Monadic sequential decision problem
%-----------------------------------------------------------------------------%
The components of a decision problem:
> Val : Type
> reward : (t : Nat) -> (x : X t) -> Y t x -> X (S t) -> Val
> (<+>) : Val -> Val -> Val
> zero : Val
> meas : M Val -> Val
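Continuing the toy instantiation (again purely illustrative): rewards are
natural numbers accumulated with addition, and |meas| aggregates the possible
total rewards optimistically by taking their maximum; whether such a choice
satisfies the measure requirements stated further below still has to be
checked:

< Val = Nat
<
< reward t x y x' = if x' > x then 2 else 1    -- hypothetical: reaching a larger state pays more
<
< (<+>) = (+)
< zero  = Z
<
< meas  = foldr max Z    -- best case over the possible outcomes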
A helper function:
> (<++>) : {A : Type} -> (f, g : A -> Val) -> A -> Val
> f <++> g = \ a => f a <+> g a
Policies and policy sequences:
> Policy : (t : Nat) -> Type
> Policy t = (x : X t) -> Y t x
> data PolicySeq : (t, n : Nat) -> Type where
> Nil : {t : Nat} -> PolicySeq t Z
> (::) : {t, n : Nat} -> Policy t -> PolicySeq (S t) n -> PolicySeq t (S n)
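For example (in terms of the hypothetical toy controls above), a sequence of
three constant policies for the first three decision steps would be:

< always : (t : Nat) -> Policy t
< always t = \ x => GoRight
<
< threeSteps : PolicySeq 0 3
< threeSteps = always 0 :: always 1 :: always 2 :: Nil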
The value of policy sequences (Bellman equation):
> val : {t, n : Nat} -> PolicySeq t n -> X t -> Val
> val {t} Nil x = zero
> val {t} (p :: ps) x = let y = p x in
> let mx' = next t x y in
> (meas . map (reward t x y <++> val ps)) mx'
%-----------------------------------------------------------------------------%
Backward Induction
%-----------------------------------------------------------------------------%
Preliminaries
-------------
Basic requirements for verified backward induction:
> (<=) : Val -> Val -> Type
> lteRefl : {a : Val} -> a <= a
> lteTrans : {a, b, c : Val} -> a <= b -> b <= c -> a <= c
> plusMonSpec : {a, b, c, d : Val} -> a <= b -> c <= d -> (a <+> c) <= (b <+> d)
> measMonSpec : {A : Type} -> (f, g : A -> Val) -> ((a : A) -> (f a) <= (g a)) ->
> (ma : M A) -> meas (map f ma) <= meas (map g ma)
Optimality of policy sequences:
> OptPolicySeq : {t, n : Nat} -> PolicySeq t n -> Type
> OptPolicySeq {t} {n} ps = (ps' : PolicySeq t n) -> (x : X t) ->
> val ps' x <= val ps x
Optimality of extensions of policy sequences:
> OptExt : {t, n : Nat} -> PolicySeq (S t) n -> Policy t -> Type
> OptExt {t} ps p = (p' : Policy t) -> (x : X t) ->
> val (p' :: ps) x <= val (p :: ps) x
Bellman's principle of optimality:
> Bellman : {t, n : Nat} ->
> (ps : PolicySeq (S t) n) -> OptPolicySeq ps ->
> (p : Policy t) -> OptExt ps p ->
> OptPolicySeq (p :: ps)
>
> Bellman {t} ps ops p oep (p' :: ps') x =
> -------------------------------------------------------
> let y' = p' x in
> let mx' = next t x y' in
> let f' = reward t x y' <++> val ps' in
> let f = reward t x y' <++> val ps in
> let s0 = \ x' => plusMonSpec lteRefl (ops ps' x') in
> let s1 = measMonSpec f' f s0 mx' in -- |val (p' :: ps') x <= val (p' :: ps) x|
> let s2 = oep p' x in -- |val (p' :: ps) x <= val (p :: ps) x|
> -------------------------------------------------------
> lteTrans s1 s2
Verification of backward induction with respect to |val|
---------------------------------------------------------
The empty policy sequence is optimal:
> nilOptPolicySeq : OptPolicySeq Nil
> nilOptPolicySeq Nil x = lteRefl
Now, provided that we can implement
> optExt : {t, n : Nat} -> PolicySeq (S t) n -> Policy t
> optExtSpec : {t, n : Nat} -> (ps : PolicySeq (S t) n) ->
> OptExt ps (optExt ps)
then
> bi : (t : Nat) -> (n : Nat) -> PolicySeq t n
> bi t Z = Nil
> bi t (S n) = let ps = bi (S t) n in
> optExt ps :: ps
is correct "by construction"
> biOptVal : (t : Nat) -> (n : Nat) -> OptPolicySeq (bi t n)
> biOptVal t Z = nilOptPolicySeq
> biOptVal t (S n) =
> ------------------------------
> let ps = bi (S t) n in
> let ops = biOptVal (S t) n in
> let p = optExt ps in
> let oep = optExtSpec ps in
> ------------------------------
> Bellman ps ops p oep
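Thus, once |optExt| is implemented (see the next section), computing a policy
sequence for, say, three decision steps starting at step zero, together with
a proof of its optimality, amounts to (illustrative usage):

< optimalSeq : PolicySeq 0 3
< optimalSeq = bi 0 3
<
< optimalSeqSpec : OptPolicySeq optimalSeq
< optimalSeqSpec = biOptVal 0 3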
Optimal Extensions
------------------
The generic implementation of backward induction |bi|
naturally raises the question under which conditions one can
implement
< optExt : Functor M => {t, n : Nat} ->
<           PolicySeq (S t) n -> Policy t
such that
< optExtSpec : Functor M => {t, n : Nat} ->
< (ps : PolicySeq (S t) n) -> OptExt ps (optExt ps)
To this end, consider the function
> cval : {t, n : Nat} ->
> PolicySeq (S t) n -> (x : X t) -> Y t x -> Val
> cval {t} ps x y = let mx' = next t x y in
> meas (map (reward t x y <++> val ps) mx')
By definition of |val| and |cval|, one has
val (p :: ps) x
=
meas (map (reward t x (p x) <++> val ps) (next t x (p x)))
=
cval ps x (p x)
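In Idris, this observation could be recorded as a lemma that is expected to
hold definitionally (a sketch; the name |valCvalLemma| is hypothetical and not
part of the framework):

< valCvalLemma : {t, n : Nat} -> (p : Policy t) -> (ps : PolicySeq (S t) n) ->
<                (x : X t) -> val (p :: ps) x = cval ps x (p x)
< valCvalLemma p ps x = Refl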
This suggests that, if we can maximize |cval|, that is, implement
> cvalmax : {t, n : Nat} ->
> PolicySeq (S t) n -> (x : X t) -> Val
> cvalargmax : {t, n : Nat} ->
> PolicySeq (S t) n -> (x : X t) -> Y t x
that fulfill
> cvalmaxSpec : {t, n : Nat} ->
> (ps : PolicySeq (S t) n) -> (x : X t) ->
> (y : Y t x) -> cval ps x y <= cvalmax ps x
> cvalargmaxSpec : {t, n : Nat} ->
> (ps : PolicySeq (S t) n) -> (x : X t) ->
> cvalmax ps x = cval ps x (cvalargmax ps x)
then we can implement optimal extensions of arbitrary policy
sequences. As it turns out, this intuition is correct. With
> optExt = cvalargmax
one has
> optExtSpec {t} {n} ps p' x = s4 where
> p : Policy t
> p = optExt ps
> y : Y t x
> y = p x
> y' : Y t x
> y' = p' x
> s1 : cval ps x y' <= cvalmax ps x
> s1 = cvalmaxSpec ps x y'
> s2 : cval ps x y' <= cval ps x (cvalargmax ps x)
> s2 = replace {P = \ z => (cval ps x y' <= z)} (cvalargmaxSpec ps x) s1
> s3 : cval ps x y' <= cval ps x y
> s3 = s2
> s4 : val (p' :: ps) x <= val (p :: ps) x
> s4 = s3
|cvalmax| and |cvalargmax|
--------------------------
The observation that functions |cvalmax| and |cvalargmax|
fulfilling |cvalmaxSpec| and |cvalargmaxSpec|
are sufficient to implement an optimal extension |optExt| that fulfills
|optExtSpec| naturally raises the question of which conditions are
necessary and sufficient for implementing |cvalmax| and |cvalargmax|.
Answering this question necessarily requires discussing properties of
|cval| and goes well beyond the scope of formulating a theory of SDPs.
Here, we limit ourselves to the remark that if |Y t x| is finite and
non-empty, one can implement |cvalmax|, |cvalargmax|, |cvalmaxSpec| and
|cvalargmaxSpec| by linear search, as sketched below.
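A sketch of this remark, assuming (hypothetically) a non-empty, finite
enumeration of the controls of every state and helper functions |maxV| and
|argmaxV| that perform the linear search over a vector; these names are
assumptions, not part of the framework, and for the search to be well defined
|(<=)| has to be decidable and total on |Val|:

< cardY : (t : Nat) -> (x : X t) -> Nat
< enumY : (t : Nat) -> (x : X t) -> Vect (S (cardY t x)) (Y t x)
<
< maxV    : {n : Nat} -> {A : Type} -> (f : A -> Val) -> Vect (S n) A -> Val
< argmaxV : {n : Nat} -> {A : Type} -> (f : A -> Val) -> Vect (S n) A -> A
<
< cvalmax    {t} ps x = maxV    (cval ps x) (enumY t x)
< cvalargmax {t} ps x = argmaxV (cval ps x) (enumY t x)

|cvalmaxSpec| and |cvalargmaxSpec| would then follow from the corresponding
specifications of |maxV| and |argmaxV|.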
Concrete implementations can be found in the SequentialDecisionProblems
folder of the IdrisLibs git repository [2],
together with the original BJI-framework, tabulated backward induction and
several examples.
Backward induction for some examples and counter-examples from [1]
can be tried out by compiling |LeightweightTheory.lidr| in said folder.
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
Correctness of Monadic Backward Induction
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
This part of the file contains the results of [1].
%-----------------------------------------------------------------------------%
Extension of the BJI-framework
%-----------------------------------------------------------------------------%
The type of state-control sequences:
> data StateCtrlSeq : (t, n : Nat) -> Type where
> Last : {t : Nat} -> X t -> StateCtrlSeq t (S Z)
> (:::) : {t, n : Nat} -> Sigma (X t) (Y t) ->
> StateCtrlSeq (S t) (S n) -> StateCtrlSeq t (S (S n))
A function that extracts the head state from a state-control sequence:
> head : {t, n : Nat} -> StateCtrlSeq t (S n) -> X t
> head (Last x) = x
> head (MkSigma x y ::: xys) = x
The reward accumulated along a state-control sequence:
> sumR : {t, n : Nat} -> StateCtrlSeq t n -> Val
> sumR {t} (Last x) = zero
> sumR {t} (MkSigma x y ::: xys) = reward t x y (head xys) <+> sumR xys
Computation of an |M|-structure of trajectories from a given policy sequence:
> trj : {t, n : Nat} -> PolicySeq t n -> X t -> M (StateCtrlSeq t (S n))
> trj {t} Nil x = (pure . Last) x
> trj {t} (p :: ps) x = let y = p x in
> let mx' = next t x y in
> map ((MkSigma x y) :::) (mx' >>= trj ps)
The function that computes the measured total reward:
> val' : {t, n : Nat} -> PolicySeq t n -> X t -> Val
> val' ps = meas . map sumR . trj ps
%-----------------------------------------------------------------------------%
Functor and monad properties
%-----------------------------------------------------------------------------%
Functor properties
------------------
|map| preserves composition:
> mapPresComp : {A, B, C : Type} ->
> (f : A -> B) -> (g : B -> C) ->
> map (g . f) `ExtEq` (map g . map f)
|map| preserves extensional (pointwise) equality:
> mapPresExtEq : {A, B : Type} ->
> (f, g : A -> B) -> f `ExtEq` g -> (map f) `ExtEq` (map g)
Basic monad properties
----------------------
|pure| and |join| are natural transformations:
> pureNatTrans : {A, B : Type} -> (f : A -> B) ->
> map f . pure `ExtEq` pure . f
> joinNatTrans : {A, B : Type} -> (f : A -> B) ->
> map f . join `ExtEq` join . map (map f)
|bind|, |join| and |map| have to be interrelated as follows:
> bindJoinSpec : {A, B : Type} -> (f : A -> M B) ->
> (>>= f) `ExtEq` join . map f
|NotEmpty| conditions
---------------------
Condition on |pure|:
> pureNotEmpty : {A : Type} -> (a : A) -> NotEmpty (pure a)
>
Conditions on |map| and |bind|:
> mapPresNotEmpty : {A, B : Type} -> (f : A -> B) -> (ma : M A) ->
> NotEmpty ma -> NotEmpty (map f ma)
> bindPresNotEmpty : {A, B : Type} -> (f : A -> M B) -> (ma : M A) ->
> NotEmpty ma -> ((a : A) -> NotEmpty (f a)) ->
> NotEmpty (ma >>= f)
Properties of monad algebras
----------------------------
Condition for a function |f| to be an |M|-algebra homomorphism:
                 map f
        M A ---------------> M B
         |                    |
         | alpha              | beta
         |                    |
         v                    v
         A -----------------> B
                   f
> algMorSpec : {A, B : Type} ->
> (alpha : M A -> A) -> (beta : M B -> B) -> (f : A -> B) -> Type
> algMorSpec {A} {B} alpha beta f = (beta . map f) `ExtEq` (f . alpha)
Structure maps of |M|-algebras are left inverses of |pure|:
> algPureSpec : {A : Type} -> (alpha : M A -> A) -> Type
> algPureSpec {A} alpha = alpha . pure `ExtEq` id
Structure maps of |M|-algebras are themselves |M|-algebra homomorphisms:
> -- (alpha . map alpha) `ExtEq` (alpha . join)
> algJoinSpec : {A : Type} -> (alpha : M A -> A) -> Type
> algJoinSpec {A} alpha = algMorSpec join alpha alpha
>
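For the list monad sketched earlier, structure maps are folds of monoid-like
operations; for instance (illustrative, with |Val = Nat|):

< alphaMax : M Nat -> Nat
< alphaMax = foldr max Z
<
< -- |algPureSpec alphaMax| : |max a Z = a| for every |a|
< -- |algJoinSpec alphaMax| : the maximum of a concatenation equals the
< --                          maximum of the maxima of its parts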
A lemma about computation with |M|-algebras:
> algLemma : {A, B, C : Type} -> (alpha: M C -> C) -> (ee : algJoinSpec alpha) ->
> (f : B -> C ) -> (g : A -> M B) ->
> (alpha . map (alpha . map f . g)) `ExtEq` (alpha . map f . join . map g)
> algLemma {A} {B} {C} alpha ee f g ma =
>
> ( (alpha . map (alpha . map f . g)) ma )
>
> ={ cong (mapPresComp g (alpha . map f) ma) }=
>
> ( (alpha . map (alpha . map f) . map g) ma )
>
> ={ cong (mapPresComp (map f) alpha (map g ma)) }=
>
> ( (alpha . map alpha . map (map f) . map g) ma )
>
> ={ ee (map (map f) (map g ma)) }=
>
> ( (alpha . join . map (map f) . map g) ma )
>
> ={ cong (sym (joinNatTrans f (map g ma))) }=
>
> ( (alpha . map f . join . map g) ma )
>
> QED
%-----------------------------------------------------------------------------%
Measure requirements
%-----------------------------------------------------------------------------%
Three requirements on the measure |meas| that are sufficient for the theorem
to hold. The first two state that |meas| has to be the structure map of an
|M|-algebra on |Val|.
1. It is a left inverse to |pure|:
> measPureSpec : algPureSpec meas -- |meas . pure `ExtEq` id|
2. It is an |M|-algebra homomorphism from the free algebra |(M Val, join)| to |(Val, meas)|:
> measJoinSpec : algJoinSpec meas -- |meas . join `ExtEq` meas . map meas|
3. For all |v : Val|, the function |(v <+>)| is an |M|-algebra
homomorphism (here required only on non-empty |M|-structures):
                map (v <+>)
        M Val ---------------> M Val
          |                      |
          | meas                 | meas
          |                      |
          v                      v
         Val -----------------> Val
                  (v <+>)
> measPlusSpec : (v : Val) -> (mv : M Val) -> (NotEmpty mv) ->
> (meas . map (v <+>)) mv = ((v <+>) . meas) mv
We could omit the non-emptiness condition, but this would implicitly
restrict the class of monads |M| for which the result applies
(because |measPlusSpec'| might not be provable).
> -- ∀ v : Val, meas . map (v <+>) `ExtEq` (v <+>) . meas
> measPlusSpec' : (v : Val) -> algMorSpec meas meas ((<+>) v)
%-----------------------------------------------------------------------------%
Lemmas
%-----------------------------------------------------------------------------%
The following lemmas are stated here; they are implemented below, after the proof of the main theorem.
> oplusLiftEERight : {A : Type} -> (f, g, r : A -> Val) ->
> ExtEq {A} f g -> ExtEq {A} (r <++> f) (r <++> g)
> trjNotEmptyLemma : {t, n : Nat} -> (ps : PolicySeq t n) -> (x : X t) ->
> NotEmpty (trj ps x)
> hdTrjLemma : {t, n : Nat} -> (ps : PolicySeq t n) ->
> (r : X t -> Val) ->
> (s : StateCtrlSeq t (S n) -> Val) -> (x' : X t) ->
> (map (r . head <++> s) . trj ps) x' =
> (map (const (r x') <++> s) . trj ps) x'
> measSumLemma : {t, n : Nat} -> (ps : PolicySeq t n) ->
> (r : X t -> Val) ->
> (s : StateCtrlSeq t (S n) -> Val) ->
> (meas . map (r . head <++> s) . trj ps) `ExtEq`
> (r <++> meas . map s . trj ps)
%-----------------------------------------------------------------------------%
Proof of the main theorem
%-----------------------------------------------------------------------------%
The main theorem from section \ref{section:valval}: |val| computes the
measured total reward because it is extensionally equal to |val'|:
> valMeasTotalReward : {t, n : Nat} -> (ps : PolicySeq t n) -> (x : X t) ->
> val' ps x = val ps x
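For intuition, a sketch of the base case (the empty policy sequence), using
only the specifications introduced above; the |Refl| steps stand for
definitional unfoldings of |val'|, |trj|, |sumR| and |val|, and the general
case can be expected to proceed by induction on |ps|, using the lemmas above:

< valMeasTotalReward Nil x =
<   ( val' Nil x )
<     ={ Refl }=
<   ( meas (map sumR (pure (Last x))) )
<     ={ cong (pureNatTrans sumR (Last x)) }=
<   ( meas (pure (sumR (Last x))) )
<     ={ Refl }=
<   ( meas (pure zero) )
<     ={ measPureSpec zero }=
<   ( zero )
<     ={ Refl }=
<   ( val Nil x )
<   QED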