Nicola Botta / IdrisLibs2

Commit dd7e4fb0, authored Nov 23, 2020 by Nicola Botta

Replaced with clauses.

parent aa85068a

1 changed file: SequentialDecisionProblems/NaiveTheory.lidr

@@ -3,7 +3,7 @@

> -- infixr 7 <+>
> infixr 7 <++>

* Monadic sequential decision process:

> M : Type -> Type
> fM : Functor M
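
The functor |M| accounts for the uncertainty of the decision process. As a
purely illustrative instance (not part of this file), a non-deterministic
process could take

< M = List

so that a transition yields the list of all states reachable in one step; a
deterministic process could take the identity functor instead.
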
@@ -13,7 +13,7 @@ Sequential decision process

> next : (t : Nat) -> (x : X t) -> Y t x -> M (X (S t))

* Monadic sequential decision problem:

> Val : Type
> zero : Val
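
Values are accumulated with |<+>| and measured with |meas|. As a minimal,
purely hypothetical instantiation (again, not part of this commit), one
could take

< Val   = Nat
< zero  = Z
< (<+>) = plus

with |meas| returning, say, the least value contained in a |List|-valued
computation, so that policies are assessed by their worst-case return.
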
@@ -29,12 +29,26 @@ Sequential decision problem

> measMon : {A : Type} -> Functor M => (f, g : A -> Val) -> ((a : A) -> f a <= g a) ->
>           (ma : M A) -> meas (map f ma) <= meas (map g ma)

For a fixed number of decision steps |n = S m|, the problem consists of
finding a sequence

< [p0, p1, ..., pm]

of decision rules

< p0 : (x : X 0) -> Y 0 x
< p1 : (x : X 1) -> Y 1 x
< ...
< pm : (x : X m) -> Y m x

that, for any |x : X 0|, maximizes the |meas|-measure of the |<+>|-sum
of the |reward|-rewards obtained along the trajectories starting in |x|.

> (<++>) : {A : Type} -> (f, g : A -> Val) -> A -> Val
> f <++> g = \ a => f a <+> g a
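
For |n = 1|, for example, a single decision rule |p0| has to maximize, for
every |x : X 0|, the measure

< meas (map (\ x' => reward 0 x (p0 x) x' <+> zero) (next 0 x (p0 x)))

that is, the |meas|-measure of the rewards obtainable in one step from |x|
(the |<+> zero| being the degenerate sum along trajectories of length one);
this is the value |val (p0 :: Nil) x| computed by the value function defined
below.
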
* The theory:

** Policies and policy sequences:

> Policy : (t : Nat) -> Type
> Policy t = (x : X t) -> Y t x

@@ -45,11 +59,17 @@ The theory

> (::) : {t, n : Nat} -> Policy t -> PolicySeq (S t) n -> PolicySeq t (S n)

> val : {t, n : Nat} -> Functor M => PolicySeq t n -> X t -> Val
> val {t} Nil x = zero
> val {t} (p :: ps) x = meas (map {f = M} (reward t x y <++> val ps) mx')
>   where y : Y t x
>         y = p x
>         mx' : M (X (S t))
>         mx' = next t x y
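
For a sequence of two decision rules, for instance, |val| unfolds to

< val (p :: q :: Nil) x
<   = meas (map (\ x' => reward t x y x' <+> val (q :: Nil) x') (next t x y))

with |y = p x|: the first-step reward at each possible next state |x'| is
|<+>|-added to the value of the remaining sequence at |x'|, and the
resulting |M|-structure of values is collapsed by |meas|.
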
> OptPolicySeq : {t, n : Nat} -> Functor M => PolicySeq t n -> Type

@@ -60,25 +80,86 @@ The theory

> OptExt {t} ps p = (p' : Policy t) -> (x : X t) -> val (p' :: ps) x <= val (p :: ps) x

Bellman's principle of optimality (1957):

> Bellman : {t, n : Nat} -> Functor M =>
>           (ps : PolicySeq (S t) n) -> OptPolicySeq ps ->
>           (p : Policy t) -> OptExt ps p ->
>           OptPolicySeq (p :: ps)
>
> Bellman {t} ps ops p oep (p' :: ps') x = lteTrans s1 s2 where
>   y' : Y t x
>   y' = p' x
>   mx' : M (X (S t))
>   mx' = next t x y'
>   f' : ((x' : X (S t)) -> Val)
>   f' = \ x' => reward t x y' x' <+> val ps' x'
>   f : ((x' : X (S t)) -> Val)
>   f = \ x' => reward t x y' x' <+> val ps x'
>   s0 : ((x' : X (S t)) -> f' x' <= f x')
>   s0 = \ x' => plusMon NaiveTheory.lteRefl (ops ps' x')
>   s1 : (val (p' :: ps') x <= val (p' :: ps) x)
>   s1 = measMon f' f s0 mx'
>   s2 : (val (p' :: ps) x <= val (p :: ps) x)
>   s2 = oep p' x
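
The proof is a single transitivity step: for any competing sequence
|p' :: ps'| and any |x : X t|, one has

< val (p' :: ps') x <= val (p' :: ps) x   -- s1, via plusMon, measMon and optimality of ps
< val (p' :: ps) x  <= val (p :: ps) x    -- s2, by optimality of the extension p

and |lteTrans| chains the two inequalities.
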
The empty policy sequence is optimal:

> nilOptPolicySeq : Functor M => OptPolicySeq Nil
> nilOptPolicySeq Nil x = NaiveTheory.lteRefl

Now, provided that we can implement

> optExt : {t, n : Nat} -> PolicySeq (S t) n -> Policy t

> optExtSpec : {t, n : Nat} -> Functor M =>
>              (ps : PolicySeq (S t) n) -> OptExt ps (optExt ps)

then

> bi : (t : Nat) -> (n : Nat) -> PolicySeq t n
> bi t Z = Nil
> bi t (S n) = optExt ps :: ps
>   where ps : PolicySeq (S t) n
>         ps = bi (S t) n

is correct "by construction"

> biLemma : Functor M => (t : Nat) -> (n : Nat) -> OptPolicySeq (bi t n)
> biLemma t Z = nilOptPolicySeq
> biLemma t (S n) = Bellman ps ops p oep
>   where ps : PolicySeq (S t) n
>         ps = bi (S t) n
>         ops : OptPolicySeq ps
>         ops = ?kika -- biLemma (S t) n
>         p : Policy t
>         p = optExt ps
>         oep : OptExt ps p
>         oep = ?lala -- optExtSpec ps
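
With |optExt| and |optExtSpec| implemented (and the holes |?kika| and
|?lala| filled, presumably with the terms indicated in the comments), an
optimal policy sequence for, say, three decision steps starting at step
zero is computed by backwards induction:

< ps : PolicySeq 0 3
< ps = bi 0 3

|biLemma 0 3| then certifies |OptPolicySeq ps|, and for an initial state
|x : X 0| the control to select is the one delivered by the first decision
rule of |ps|.
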
> {-
> Bellman {t} ps ops p oep (p' :: ps') x =
>   let y' = p' x in
>   let mx' = next t x y' in
>   let f' : ((x' : X (S t)) -> Val)
>          = \ x' => reward t x y' x' <+> val ps' x' in
>   let f : ((x' : X (S t)) -> Val)
>          = \ x' => reward t x y' x' <+> val ps x' in
>   let s0 : ((x' : X (S t)) -> f' x' <= f x')
>          = \ x' => plusMon (NaiveTheory.lteRefl {v = reward t x y' x'}) (ops ps' x') in
>   let s1 : (val (p' :: ps') x <= val (p' :: ps) x)
>          = ?kuka in -- measMon f' f s0 mx' in
>   let s2 : (val (p' :: ps) x <= val (p :: ps) x)
>          = oep p' x in
>   lteTrans s1 s2
> ---}