> module LightweightTheory
> import Data.Fin
> import Data.Vect
> import Basic.Predicates
> import Finite.Predicates
> import Finite.Operations
> import Finite.Properties
> import Sigma.Sigma
> import Opt.Operations
> import Rel.TotalPreorder
> import Rel.TotalPreorderOperations
> import VeriMonad.VeriMonad
> import VeriMonad.NotEmpty
> import VeriMonad.Algebras
> %default total
> %access public export
> %auto_implicits off
> -- infixr 7 <+>
> infixr 7 <++>
> infixr 7 ##
* Monadic sequential decision process:
> M : Type -> Type
> -- fM : Functor M
> -- mM : Monad M
> X : (t : Nat) -> Type
> Y : (t : Nat) -> X t -> Type
> next : (t : Nat) -> (x : X t) -> Y t x -> M (X (S t))
> nextNotEmpty : VeriMonad M =>
>                {t : Nat} -> (x : X t) -> (y : Y t x) ->
>                NotEmpty (next t x y)
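For illustration only, a hypothetical deterministic process (display code, not
part of the theory, assuming a suitable |VeriMonad| structure on |Identity|)
could take natural numbers as states and Booleans as controls:

< M   = Identity
< X t = Nat
< Y t x = Bool
< next t x True  = pure (S x)   -- move up
< next t x False = pure x       -- stay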
* Decision problem:
> Val : Type
> zero : Val
> (<+>) : Val -> Val -> Val
> (<=) : Val -> Val -> Type
> lteTP : TotalPreorder (<=)
> plusMon : {v1, v2, v3, v4 : Val} -> v1 <= v2 -> v3 <= v4 -> (v1 <+> v3) <= (v2 <+> v4)
> reward : (t : Nat) -> (x : X t) -> Y t x -> X (S t) -> Val
> meas : M Val -> Val
> measMon : Functor M =>
>           {A : Type} -> (f, g : A -> Val) -> ((a : A) -> f a <= g a) ->
>           (ma : M A) -> meas (map f ma) <= meas (map g ma)
> measPureSpec : VeriMonad M => algPureSpec meas -- |meas . pure `ExtEq` id|
> measJoinSpec : VeriMonad M => algJoinSpec meas -- |meas . join `ExtEq` meas . map meas|
> measPlusSpec : VeriMonad M => (v : Val) -> (mv : M Val) -> (NotEmpty mv) ->
>                (meas . map (v <+>)) mv = ((v <+>) . meas) mv
For a fixed number of decision steps |n = S m|, the problem consists of
finding a sequence
[p0, p1, ..., pm]
of decision rules
p0 : (x0 : X 0) -> Y 0 x0
p1 : (x1 : X 1) -> Y 1 x1
...
pm : (xm : X m) -> Y m xm
that, for any |x0 : X 0|, maximizes the |meas|-measure of the |<+>|-sum
of the |reward|-rewards obtained along the trajectories starting in |x0|.
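Continuing the hypothetical deterministic sketch from above (display code
only; the remaining components, e.g. |lteTP|, |plusMon| and the |meas|
specifications, are omitted), one could for instance take:

< Val   = Nat
< zero  = Z
< (<+>) = plus
< (<=)  = LTE
< meas  = runIdentity
< reward t x y x' = x'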
* The theory:
** Policies and policy sequences:
> Policy : (t : Nat) -> Type
> Policy t = (x : X t) -> Y t x
> data PolicySeq : (t : Nat) -> (n : Nat) -> Type where
>   Nil  : {t : Nat} -> PolicySeq t Z
>   (::) : {t, n : Nat} -> Policy t -> PolicySeq (S t) n -> PolicySeq t (S n)
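For the hypothetical sketch, a policy that always selects |True| and a
two-step policy sequence could be written as (display code only, the names
|up| and |psUp| are hypothetical):

< up : {t : Nat} -> Policy t
< up x = True
<
< psUp : PolicySeq 0 2
< psUp = [up, up]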
** Value function:
> (<++>) : {A : Type} -> (f, g : A -> Val) -> A -> Val
> f <++> g = \ a => f a <+> g a
> val : Functor M => {t, n : Nat} -> PolicySeq t n -> X t -> Val
> val {t} Nil x = zero
> val {t} (p :: ps) x = let y = p x in
>                       let mx' = next t x y in
>                       meas (map (reward t x y <++> val ps) mx')
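For the hypothetical sketch, unfolding |val| on the one-element policy
sequence |[up]| and the start state |0| gives, for example,

<   val [up] 0
<   =
<   meas (map (reward 0 0 True <++> val Nil) (next 0 0 True))
<   =
<   reward 0 0 True 1 <+> val Nil 1
<   =
<   1 <+> zero
<   =
<   1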
** Optimality of policy sequences
> OptPolicySeq : Functor M => {t, n : Nat} -> PolicySeq t n -> Type
> OptPolicySeq {t} {n} ps = (ps' : PolicySeq t n) -> (x : X t) -> val ps' x <= val ps x
** Bellman's principle of optimality (1957):
> OptExt : Functor M => {t, n : Nat} -> PolicySeq (S t) n -> Policy t -> Type
> OptExt {t} ps p = (p' : Policy t) -> (x : X t) -> val (p' :: ps) x <= val (p :: ps) x
> Bellman : Functor M => {t, n : Nat} ->
>           (ps : PolicySeq (S t) n) -> OptPolicySeq ps ->
>           (p : Policy t) -> OptExt ps p ->
>           OptPolicySeq (p :: ps)
>
> Bellman {t} ps ops p oep (p' :: ps') x =
>   let v1 = val (p' :: ps') x in
>   let v2 = val (p' :: ps) x in
>   let v3 = val (p :: ps) x in
>   let y' = p' x in
>   let mx' = next t x y' in
>   let f' = \ x' => reward t x y' x' <+> val ps' x' in
>   let f  = \ x' => reward t x y' x' <+> val ps x' in
>   let s0 : ((x' : X (S t)) -> f' x' <= f x')
>          = \ x' => plusMon (reflexive lteTP (reward t x y' x')) (ops ps' x') in
>   let s1 : (v1 <= v2)
>          = measMon f' f s0 mx' in
>   let s2 : (v2 <= v3)
>          = oep p' x in
>   transitive lteTP v1 v2 v3 s1 s2
** Backwards induction
First, the empty policy sequence is optimal:
> nilOptPolicySeq : Functor M => OptPolicySeq Nil
> nilOptPolicySeq Nil x = reflexive lteTP zero
Provided that we can implement optimal extensions of arbitrary policy
sequences
> optExt : Functor M => {t, n : Nat} ->
>          PolicySeq (S t) n -> Policy t
> optExtSpec : Functor M => {t, n : Nat} ->
>              (ps : PolicySeq (S t) n) -> OptExt ps (optExt ps)
, we can compute optimal policy sequences backwards, starting from the
empty policy sequence:
> bi : Functor M => (t : Nat) -> (n : Nat) -> PolicySeq t n
> bi t Z = Nil
> bi t (S n) = let ps = bi (S t) n in optExt ps :: ps
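For example, for the hypothetical sketch, an optimal two-step policy sequence
for decision steps 0 and 1 would be obtained as (display code only, the name
|ps2| is hypothetical):

< ps2 : PolicySeq 0 2
< ps2 = bi 0 2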
This generic backwards induction computes optimal policy sequences w.r.t. |val|:
> biOptVal : Functor M => (t : Nat) -> (n : Nat) -> OptPolicySeq (bi t n)
> biOptVal t Z = nilOptPolicySeq
> biOptVal t (S n) = Bellman ps ops p oep
>   where ps  : PolicySeq (S t) n
>         ps  = bi (S t) n
>         ops : OptPolicySeq ps
>         ops = biOptVal (S t) n
>         p   : Policy t
>         p   = optExt ps
>         oep : OptExt ps p
>         oep = optExtSpec ps
** Optimal extensions
The generic implementation of backwards induction |bi|
naturally raises the question under which conditions one can
implement
< optExt : Functor M => {t, n : Nat} ->
<          PolicySeq (S t) n -> Policy t
such that
< optExtSpec : Functor M => {t, n : Nat} ->
<              (ps : PolicySeq (S t) n) -> OptExt ps (optExt ps)
To this end, consider the function
> cval : Functor M => {t, n : Nat} ->
>        PolicySeq (S t) n -> (x : X t) -> Y t x -> Val
> cval {t} ps x y = let mx' = next t x y in
>                   meas (map (reward t x y <++> val ps) mx')
By definition of |val| and |cval|, one has
<   val (p :: ps) x
<   =
<   meas (map (reward t x (p x) <++> val ps) (next t x (p x)))
<   =
<   cval ps x (p x)
This suggests that, if we can maximize |cval|, that is, implement
> cvalmax : Functor M => {t, n : Nat} ->
>           PolicySeq (S t) n -> (x : X t) -> Val
> cvalargmax : Functor M => {t, n : Nat} ->
>              PolicySeq (S t) n -> (x : X t) -> Y t x
that fulfill
> cvalmaxSpec : Functor M => {t, n : Nat} ->
>               (ps : PolicySeq (S t) n) -> (x : X t) ->
>               (y : Y t x) -> cval ps x y <= cvalmax ps x
> cvalargmaxSpec : Functor M => {t, n : Nat} ->
>                  (ps : PolicySeq (S t) n) -> (x : X t) ->
>                  cvalmax ps x = cval ps x (cvalargmax ps x)
then we can implement optimal extensions of arbitrary policy
sequences. As it turns out, this intuition is correct. With
> optExt = cvalargmax
, one has
> optExtSpec {t} {n} ps p' x = s4 where
>   p  : Policy t
>   p  = optExt ps
>   y  : Y t x
>   y  = p x
>   y' : Y t x
>   y' = p' x
>   s1 : cval ps x y' <= cvalmax ps x
>   s1 = cvalmaxSpec ps x y'
>   s2 : cval ps x y' <= cval ps x (cvalargmax ps x)
>   s2 = replace {P = \ z => (cval ps x y' <= z)} (cvalargmaxSpec ps x) s1
>   s3 : cval ps x y' <= cval ps x y
>   s3 = s2
>   s4 : val (p' :: ps) x <= val (p :: ps) x
>   s4 = s3
** |cvalmax| and |cvalargmax|
The observation that
< cvalmax : Functor M => {t, n : Nat} ->
<           PolicySeq (S t) n -> (x : X t) -> Val
< cvalargmax : Functor M => {t, n : Nat} ->
<              PolicySeq (S t) n -> (x : X t) -> Y t x
that fulfill
< cvalmaxSpec : Functor M => {t, n : Nat} ->
<               (ps : PolicySeq (S t) n) -> (x : X t) ->
<               (y : Y t x) -> cval ps x y <= cvalmax ps x
< cvalargmaxSpec : Functor M => {t, n : Nat} ->
<                  (ps : PolicySeq (S t) n) -> (x : X t) ->
<                  cvalmax ps x = cval ps x (cvalargmax ps x)
are sufficient to implement an optimal extension |optExt| that fulfills
|optExtSpec| naturally raises the question of which conditions are necessary
and sufficient for implementing |cvalmax| and |cvalargmax|.
Answering this question necessarily requires discussing properties of
|cval| and goes well beyond the scope of formulating a theory of SDPs.
Here, we limit ourselves to remarking that, if |Y t x| is finite and not
empty:
> finiteY : (t : Nat) -> (x : X t) -> Finite (Y t x)
> notEmptyY : (t : Nat) -> (x : X t) -> Y t x
one can implement |cvalmax|, |cvalargmax|, |cvalmaxSpec| and
|cvalargmaxSpec| by linear search:
> cardNotZero : (t : Nat) -> (x : X t) -> CardNotZ (finiteY t x)
> cardNotZero t x = cardNotZLemma (finiteY t x) (notEmptyY t x)
> cvalmax {t} ps x = Opt.Operations.max lteTP (finiteY t x) (cardNotZero t x) (cval ps x)
> cvalargmax {t} ps x = Opt.Operations.argmax lteTP (finiteY t x) (cardNotZero t x) (cval ps x)
> cvalmaxSpec {t} ps x = Opt.Operations.maxSpec lteTP (finiteY t x) (cardNotZero t x) (cval ps x)
> cvalargmaxSpec {t} ps x = Opt.Operations.argmaxSpec lteTP (finiteY t x) (cardNotZero t x) (cval ps x)
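For the Boolean controls of the hypothetical sketch, |notEmptyY| is trivial
and |finiteY| could be discharged by a finiteness proof for |Bool|, assuming
the library from [2] provides one (display code only, the name |finiteBool|
is hypothetical):

< finiteY   t x = finiteBool
< notEmptyY t x = True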
** State-control sequences, trajectories, ...
> data StateCtrlSeq : (t : Nat) -> (n : Nat) -> Type where
>   Last : {t : Nat} -> (x : X t) -> StateCtrlSeq t (S Z)
>   (##) : {t, n : Nat} -> DPair (X t) (Y t) -> StateCtrlSeq (S t) (S n) -> StateCtrlSeq t (S (S n))
> trj : Monad M => {t, n : Nat} -> (ps : PolicySeq t n) -> (x : X t) -> M (StateCtrlSeq t (S n))
> trj {t} Nil x = pure (Last x)
> trj {t} (p :: ps) x = let y = p x in
>                       let mx' = next t x y in
>                       map ((MkDPair x y) ##) (mx' >>= trj ps)
> head : {t, n : Nat} -> StateCtrlSeq t (S n) -> X t
> head (Last x) = x
> head (MkDPair x y ## xys) = x
> totalReward : {t, n : Nat} -> StateCtrlSeq t n -> Val
> totalReward {t} (Last x) = zero
> totalReward {t} (MkDPair x y ## xys) = reward t x y (head xys) <+> totalReward xys
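Together, |trj| and |totalReward| allow measuring the total reward along the
trajectories of a policy sequence; relating this quantity to |val| is the
topic of [1]. A minimal sketch (display code only, the name |valTrj| is not
part of the framework):

< valTrj : Monad M => {t, n : Nat} -> PolicySeq t n -> X t -> Val
< valTrj ps x = meas (map totalReward (trj ps x))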
> showX : {t : Nat} -> X t -> String
> showY : {t : Nat} -> {x : X t} -> Y t x -> String
> showXY : {t : Nat} -> DPair (X t) (Y t) -> String
> showXY (MkDPair x y) = "(" ++ showX x ++ " ** " ++ showY y ++ ")"
> showStateCtrlSeq : {t, n : Nat} -> StateCtrlSeq t n -> String
> showStateCtrlSeq xys = "[" ++ show'' "" xys ++ "]"
>   where show'' : {t, n : Nat} -> String -> StateCtrlSeq t n -> String
>         show'' acc (Last x) = acc ++ "(" ++ showX x ++ " ** " ++ " " ++ ")"
>         show'' acc (xy ## xys) = show'' (acc ++ showXY xy ++ ", ") xys
> using (t : Nat, n : Nat)
>   implementation Show (StateCtrlSeq t n) where
>     show = showStateCtrlSeq
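With this |Show| implementation, the trajectories of a computed policy
sequence can be rendered, e.g. for the hypothetical sketch (display code
only, the name |trjStrings| is hypothetical):

< trjStrings : M String
< trjStrings = map showStateCtrlSeq (trj (bi 0 2) 0)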
# Lightweight BJI-Framework
## Contents
- [LightweightTheory](LightweightTheory.lidr):
  This file contains the lightweight version of the BJI-framework discussed in [1], augmented with the necessary implementations for computing optimal extensions and using the framework's backward induction algorithm.
  This file needs the library from [2] to type-check.
## Type-checking
To type-check `LightweightTheory.lidr`:
- download IdrisLibs [2],
- enter `idris -i $IDRISLIBS --sourcepath $IDRISLIBS --allow-capitalized-pattern-variables LightweightTheory.lidr` to type-check the file and load it into the Idris REPL, or
- enter `idris -i $IDRISLIBS --sourcepath $IDRISLIBS --allow-capitalized-pattern-variables --check LightweightTheory.lidr` to just type-check the file,

where `IDRISLIBS` is the path to `IdrisLibs`.
## References
[1] Brede, N., Botta, N. (2021). On the Correctness of Monadic Backward Induction. Submitted to *J. Funct. Program.*

[2] Botta, N. (2016-2021). [IdrisLibs](https://gitlab.pik-potsdam.de/botta/IdrisLibs).

[3] Botta, N., Jansson, P., Ionescu, C. (2017). Contributions to a computational theory of policy advice and avoidability. *J. Funct. Program.* 27:e23.