Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
Menu
Open sidebar
Nicola Botta
papers
Commits
e6c7245c
Commit
e6c7245c
authored
Feb 23, 2021
by
Nuria Brede
Browse files
Initial.
parent
8e0120fa
Changes
12
Hide whitespace changes
Inline
Side-by-side
2021.On the Correctness of Monadic Backward Induction/code/Equality.lidr
0 → 100644
View file @
e6c7245c
> module Equality
> %default total
> %auto_implicits off
> %access public export
> ExtEq : {A, B : Type} -> (f, g : A -> B) -> Type
> ExtEq {A} f g = (a : A) -> f a = g a
2021.On the Correctness of Monadic Backward Induction/code/Framework.lidr
0 → 100644
View file @
e6c7245c
> module Framework
> import Syntax.PreorderReasoning
> import Equality
> import Sigma
> %default total
> %auto_implicits off
> %access public export
> infixr 7 :::
> infixr 7 <++>
> %hide map
> %hide pure
> %hide join
> %hide (>>=)
> %hide head
> %hide lteRefl
> %hide Prelude.Algebra.(<+>)
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
Botta et al. framework for Monadic Sequential Decision Problems
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
For readability this implementation is a simplified version of the
general framework (notably omitting the *viability* checks),
and uses a non-emptiness condition on the transition
function instead.
%-----------------------------------------------------------------------------%
Monadic dynamical system
%-----------------------------------------------------------------------------%
The components of a monadic dynamical system:
> M : Type -> Type
|M| is a functor:
> map : {A, B : Type} -> (A -> B) -> M A -> M B
|M| is a container monad:
> pure : {A : Type} -> A -> M A
> (>>=) : {A, B : Type} -> M A -> (A -> M B) -> M B
> join : {A : Type} -> M (M A) -> M A
> NotEmpty : {A : Type} -> M A -> Type
Families of state and control spaces indexed by decision steps:
> X : (t : Nat) -> Type
> Y : (t : Nat) -> X t -> Type
The transition function:
> next : (t : Nat) -> (x : X t) -> Y t x -> M (X (S t))
> nextNotEmpty : {t : Nat} -> (x : X t) -> (y : Y t x) ->
> NotEmpty (next t x y)
%-----------------------------------------------------------------------------%
Monadic sequential decision problem
%-----------------------------------------------------------------------------%
The components of a decision problem:
> Val : Type
> reward : (t : Nat) -> (x : X t) -> Y t x -> X (S t) -> Val
> (<+>) : Val -> Val -> Val
> zero : Val
> meas : M Val -> Val
A helper function:
> (<++>) : {A : Type} -> (f, g : A -> Val) -> A -> Val
> f <++> g = \ a => f a <+> g a
Policies and policy sequences:
> Policy : (t : Nat) -> Type
> Policy t = (x : X t) -> Y t x
> data PolicySeq : (t, n : Nat) -> Type where
> Nil : {t : Nat} -> PolicySeq t Z
> (::) : {t, n : Nat} -> Policy t -> PolicySeq (S t) n -> PolicySeq t (S n)
The value of policy sequences (Bellman equation):
> val : {t, n : Nat} -> PolicySeq t n -> X t -> Val
> val {t} Nil x = zero
> val {t} (p :: ps) x = let y = p x in
> let mx' = next t x y in
> (meas . map (reward t x y <++> val ps)) mx'
%-----------------------------------------------------------------------------%
% Backward Induction
%-----------------------------------------------------------------------------%
Preliminairies
--------------
Basic requirements for verified backward induction:
> (<=) : Val -> Val -> Type
> lteRefl : {a : Val} -> a <= a
> lteTrans : {a, b, c : Val} -> a <= b -> b <= c -> a <= c
> plusMonSpec : {a, b, c, d : Val} -> a <= b -> c <= d -> (a <+> c) <= (b <+> d)
> measMonSpec : {A : Type} -> (f, g : A -> Val) -> ((a : A) -> (f a) <= (g a)) ->
> (ma : M A) -> meas (map f ma) <= meas (map g ma)
Optimality of policy sequences:
> OptPolicySeq : {t, n : Nat} -> PolicySeq t n -> Type
> OptPolicySeq {t} {n} ps = (ps' : PolicySeq t n) -> (x : X t) ->
> val ps' x <= val ps x
Optimality of extensions of policy sequences:
> OptExt : {t, n : Nat} -> PolicySeq (S t) n -> Policy t -> Type
> OptExt {t} ps p = (p' : Policy t) -> (x : X t) ->
> val (p' :: ps) x <= val (p :: ps) x
Bellman's principle of optimality:
> Bellman : {t, n : Nat} ->
> (ps : PolicySeq (S t) n) -> OptPolicySeq ps ->
> (p : Policy t) -> OptExt ps p ->
> OptPolicySeq (p :: ps)
>
> Bellman {t} ps ops p oep (p' :: ps') x =
> -------------------------------------------------------
> let y' = p' x in
> let mx' = next t x y' in
> let f' = reward t x y' <++> val ps' in
> let f = reward t x y' <++> val ps in
> let s0 = \ x' => plusMonSpec lteRefl (ops ps' x') in
> let s1 = measMonSpec f' f s0 mx' in -- |val (p' :: ps') x <= val (p' :: ps) x|
> let s2 = oep p' x in -- |val (p' :: ps) x <= val (p :: ps) x|
> -------------------------------------------------------
> lteTrans s1 s2
Verification of backward induction with respect to |val|
---------------------------------------------------------
The empty policy sequence is optimal:
> nilOptPolicySeq : OptPolicySeq Nil
> nilOptPolicySeq Nil x = lteRefl
Now, provided that we can implement
> optExt : {t, n : Nat} -> PolicySeq (S t) n -> Policy t
> optExtSpec : {t, n : Nat} -> (ps : PolicySeq (S t) n) ->
> OptExt ps (optExt ps)
then
> bi : (t : Nat) -> (n : Nat) -> PolicySeq t n
> bi t Z = Nil
> bi t (S n) = let ps = bi (S t) n in
> optExt ps :: ps
is correct "by construction"
> biOptVal : (t : Nat) -> (n : Nat) -> OptPolicySeq (bi t n)
> biOptVal t Z = nilOptPolicySeq
> biOptVal t (S n) =
> ------------------------------
> let ps = bi (S t) n in
> let ops = biOptVal (S t) n in
> let p = optExt ps in
> let oep = optExtSpec ps in
> ------------------------------
> Bellman ps ops p oep
Optimal Extensions
------------------
The generic implementation of backward induction |bi|
naturally raises the question under which conditions one can
implement
< optExt : Functor M => {t, n : Nat} ->
PolicySeq (S t) n -> Policy t
such that
< optExtSpec : Functor M => {t, n : Nat} ->
< (ps : PolicySeq (S t) n) -> OptExt ps (optExt ps)
To this end, consider the function
> cval : {t, n : Nat} ->
> PolicySeq (S t) n -> (x : X t) -> Y t x -> Val
> cval {t} ps x y = let mx' = next t x y in
> meas (map (reward t x y <++> val ps) mx')
By definition of |val| and |cval|, one has
val (p :: ps) x
=
meas (map (reward t x (p x) <++> val ps) (next t x (p x)))
=
cval ps x (p x)
This suggests that, if we can maximize |cval| that is, implement
> cvalmax : {t, n : Nat} ->
> PolicySeq (S t) n -> (x : X t) -> Val
> cvalargmax : {t, n : Nat} ->
> PolicySeq (S t) n -> (x : X t) -> Y t x
that fulfill
> cvalmaxSpec : {t, n : Nat} ->
> (ps : PolicySeq (S t) n) -> (x : X t) ->
> (y : Y t x) -> cval ps x y <= cvalmax ps x
> cvalargmaxSpec : {t, n : Nat} ->
> (ps : PolicySeq (S t) n) -> (x : X t) ->
> cvalmax ps x = cval ps x (cvalargmax ps x)
then we can implement optimal extensions of arbitrary policy
sequences. As it turns out, this intuition is correct. With
> optExt = cvalargmax
one has
> optExtSpec {t} {n} ps p' x = s4 where
> p : Policy t
> p = optExt ps
> y : Y t x
> y = p x
> y' : Y t x
> y' = p' x
> s1 : cval ps x y' <= cvalmax ps x
> s1 = cvalmaxSpec ps x y'
> s2 : cval ps x y' <= cval ps x (cvalargmax ps x)
> s2 = replace {P = \ z => (cval ps x y' <= z)} (cvalargmaxSpec ps x) s1
> s3 : cval ps x y' <= cval ps x y
> s3 = s2
> s4 : val (p' :: ps) x <= val (p :: ps) x
> s4 = s3
|cvalmax| and |cvalargmax|
--------------------------
The observation that functions |cvalmax| and |cvalargmax|
that fulfill |cvalmaxSpec| and |cvalargmaxSpec|
are sufficient to implement an optimal extension |optExt| that fulfills
|optExtSpec| naturally raises the question of what are necessary and
sufficient conditions for |cvalmax| and |cvalargmax|.
Answering this question necessarily requires discussing properties of
|cval| and goes well beyond the scope of formulating a theory of SDPs.
Here, we limit ourselves to remark that if |Y t x| is finite and
non-empty one can implement |cvalmax|, |cvalargmax|, |cvalmaxSpec| and
|cvalargmaxSpec| by linear search.
Concrete implementations can be found in the SequentialDecisionProblems
folder of the IdrisLibs git repository [2],
together with the original BJI-framework, tabulated backward induction and
several examples.
Backward induction for some examples and counter-examples from [1]
can be tried out by compiling |LeightweightTheory.lidr| in said folder.
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
Correctness of Monadic Backward Induction
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
This part of the file contains the results of [1].
%-----------------------------------------------------------------------------%
Extension of the BJI-framework
%-----------------------------------------------------------------------------%
The type of state-control sequences:
> data StateCtrlSeq : (t, n : Nat) -> Type where
> Last : {t : Nat} -> X t -> StateCtrlSeq t (S Z)
> (:::) : {t, n : Nat} -> Sigma (X t) (Y t) ->
> StateCtrlSeq (S t) (S n) -> StateCtrlSeq t (S (S n))
A function that extracts the head state from a state-control sequence:
> head : {t, n : Nat} -> StateCtrlSeq t (S n) -> X t
> head (Last x) = x
> head (MkSigma x y ::: xys) = x
The reward accumulated along a state-control sequence:
> sumR : {t, n : Nat} -> StateCtrlSeq t n -> Val
> sumR {t} (Last x) = zero
> sumR {t} (MkSigma x y ::: xys) = reward t x y (head xys) <+> sumR xys
Computation of an |M|-structure of trajectories from a given policy sequence:
> trj : {t, n : Nat} -> PolicySeq t n -> X t -> M (StateCtrlSeq t (S n))
> trj {t} Nil x = (pure . Last) x
> trj {t} (p :: ps) x = let y = p x in
> let mx' = next t x y in
> map ((MkSigma x y) :::) (mx' >>= trj ps)
The function that computes the measured total reward:
> val' : {t, n : Nat} -> PolicySeq t n -> X t -> Val
> val' ps = meas . map sumR . trj ps
%-----------------------------------------------------------------------------%
Functor and monad properties
%-----------------------------------------------------------------------------%
Functor properties
------------------
|map| preserves composition:
> mapPresComp : {A, B, C : Type} ->
> (f : A -> B) -> (g : B -> C) ->
> map (g . f) `ExtEq` (map g . map f)
|map| preserves extensional (pointwise) equality:
> mapPresExtEq : {A, B : Type} ->
> (f, g : A -> B) -> f `ExtEq` g -> (map f) `ExtEq` (map g)
Basic monad properties
----------------------
|pure| and |join| are natural transformations:
> pureNatTrans : {A, B : Type} -> (f : A -> B) ->
> map f . pure `ExtEq` pure . f
> joinNatTrans : {A, B : Type} -> (f : A -> B) ->
> map f . join `ExtEq` join . map (map f)
|bind|, |join| and |map| need to be inter-related:
> bindJoinSpec : {A, B : Type} -> (f : A -> M B) ->
> (>>= f) `ExtEq` join . map f
|NonEmpty| conditions
---------------------
Condition on |pure|:
> pureNotEmpty : {A : Type} -> (a : A) -> NotEmpty (pure a)
>
Conditions on |map| and |bind|:
> mapPresNotEmpty : {A, B : Type} -> (f : A -> B) -> (ma : M A) ->
> NotEmpty ma -> NotEmpty (map f ma)
> bindPresNotEmpty : {A, B : Type} -> (f : A -> M B) -> (ma : M A) ->
> NotEmpty ma -> ((a : A) -> NotEmpty (f a)) ->
> NotEmpty (ma >>= f)
Properties of monad algebras
----------------------------
Condition for a function |f| to be an |M|-algebra homomorphism:
map f
M A ---------------- > M B
| |
| alpha | beta
| |
v v
A ------------------> B
f
> algMorSpec : {A, B : Type} ->
> (alpha : M A -> A) -> (beta : M B -> B) -> (f : A -> B) -> Type
> algMorSpec {A} {B} alpha beta f = (beta . map f) `ExtEq` (f . alpha)
Structure maps of |M|-algebras are left inverses of |pure|:
> algPureSpec : {A : Type} -> (alpha : M A -> A) -> Type
> algPureSpec {A} alpha = alpha . pure `ExtEq` id
Structure maps of |M|-algebras are themselves |M|-algebra homomorphisms:
> -- (alpha . map alpha) `ExtEq` (alpha . join)
> algJoinSpec : {A : Type} -> (alpha : M A -> A) -> Type
> algJoinSpec {A} alpha = algMorSpec join alpha alpha
>
A lemma about computation with |M|-algebras:
> algLemma : {A, B, C : Type} -> (alpha: M C -> C) -> (ee : algJoinSpec alpha) ->
> (f : B -> C ) -> (g : A -> M B) ->
> (alpha . map (alpha . map f . g)) `ExtEq` (alpha . map f . join . map g)
> algLemma {A} {B} {C} alpha ee f g ma =
>
> ( (alpha . map (alpha . map f . g)) ma )
>
> ={ cong (mapPresComp g (alpha . map f) ma) }=
>
> ( (alpha . map (alpha . map f) . map g) ma )
>
> ={ cong (mapPresComp (map f) alpha (map g ma)) }=
>
> ( (alpha . map alpha . map (map f) . map g) ma )
>
> ={ ee (map (map f) (map g ma)) }=
>
> ( (alpha . join . map (map f) . map g) ma )
>
> ={ cong (sym (joinNatTrans f (map g ma))) }=
>
> ( (alpha . map f . join . map g) ma )
>
> QED
%-----------------------------------------------------------------------------%
Measure requirements
%-----------------------------------------------------------------------------%
Three sufficient requirements the measure |meas| needs to fulfill for
the theorem to hold:
The measure needs to be the structure map of an |M|-algebra on |Val|.
This means it has to fulfill two conditions.
1. It is a left inverse to |pure|:
> measPureSpec : algPureSpec meas -- |meas . pure `ExtEq` id|
2. It is an |M|-algebra homomorphism from |join| to itself:
> measJoinSpec : algJoinSpec meas -- |meas . join `ExtEq` meas . map meas|
3. For |v : Val|, the function |(v <+>)| is an |M|-algebra homomorphism:
For all |v : Val|,
map (v <+>)
M Val ---------------> M Val
| |
| meas | meas
| |
v v
Val ----------------> Val
(v <+>)
> measPlusSpec : (v : Val) -> (mv : M Val) -> (NotEmpty mv) ->
> (meas . map (v <+>)) mv = ((v <+>) . meas) mv
We can omit the non-emptiness condition but this means we implicitly
restrict the class of monads for which the result applies |M|
(because |measPlusSpec'| might not be provable).
> -- ∀ v : Val, meas . map (v <+>) `ExtEq` (v <+>) . meas
> measPlusSpec' : (v : Val) -> algMorSpec meas meas ((<+>) v)
%-----------------------------------------------------------------------------%
Lemmas
%-----------------------------------------------------------------------------%
The lemmas will be implemented below the proof of the main theorem.
> oplusLiftEERight : {A : Type} -> (f, g, r : A -> Val) ->
> ExtEq {A} f g -> ExtEq {A} (r <++> f) (r <++> g)
> trjNotEmptyLemma : {t, n : Nat} -> (ps : PolicySeq t n) -> (x : X t) ->
> NotEmpty (trj ps x)
> hdTrjLemma : {t, n : Nat} -> (ps : PolicySeq t n) ->
> (r : X t -> Val) ->
> (s : StateCtrlSeq t (S n) -> Val) -> (x' : X t) ->
> (map (r . head <++> s) . trj ps) x' =
> (map (const (r x') <++> s) . trj ps) x'
> measSumLemma : {t, n : Nat} -> (ps : PolicySeq t n) ->
> (r : X t -> Val) ->
> (s : StateCtrlSeq t (S n) -> Val) ->
> (meas . map (r . head <++> s) . trj ps) `ExtEq`
> (r <++> meas . map s . trj ps)
%-----------------------------------------------------------------------------%
Proof of the main theorem
%-----------------------------------------------------------------------------%
The proof of the main theorem from section \ref{section:valval} that
|val| computes the measured total reward because it is
extensionally equal to |val'|:
> valMeasTotalReward : {t, n : Nat} -> (ps : PolicySeq t n) -> (x : X t) ->
> val' ps x = val ps x