Commit 6e6f0136 authored by Nicola Botta

Added example.

parent 9f0f0c19
-> module SequentialDecisionProblems.NaiveTheory
+> module Main
> import Data.Fin
> import Data.Vect
-> import Isomorphism.Isomorphism
+> import Control.Isomorphism
> import Control.Monad.Identity
> %default total
> %access public export
> %auto_implicits off
> -- infixr 7 <+>
> infixr 7 <++>
@@ -101,7 +105,7 @@ of the |reward|-rewards obtained along the trajectories starting in |x|.
>   f : ((x' : X (S t)) -> Val)
>   f = \ x' => reward t x y' x' <+> val ps x'
>   s0 : ((x' : X (S t)) -> f' x' <= f x')
->   s0 = \ x' => plusMon NaiveTheory.lteRefl (ops ps' x')
+>   s0 = \ x' => plusMon Main.lteRefl (ops ps' x')
>   s1 : (val (p' :: ps') x <= val (p' :: ps) x)
>   s1 = measMon f' f s0 mx'
>   s2 : (val (p' :: ps) x <= val (p :: ps) x)
@@ -112,7 +116,7 @@ of the |reward|-rewards obtained along the trajectories starting in |x|.
First, the empty policy sequence is optimal:
> nilOptPolicySeq : Functor M => OptPolicySeq Nil
-> nilOptPolicySeq Nil x = NaiveTheory.lteRefl
+> nilOptPolicySeq Nil x = Main.lteRefl
Provided that we can implement optimal extensions of arbitrary policy
sequences
@@ -198,9 +202,7 @@ correct. With
, one has
> optExtLemma : {t, n : Nat} -> Functor M =>
> (ps : PolicySeq (S t) n) -> OptExt ps (optExt ps)
-> optExtSpec {t} {n} ps p' x = s4 where
+> optExtLemma {t} {n} ps p' x = s4 where
>   p : Policy t
>   p = optExt ps
>   y : Y t x
@@ -219,7 +221,132 @@ correct. With
* Example
The naive theory is simple and straightforward but has a major flaw.
What if |Y t x| is empty? In this case, we cannot compute a control for
|x|: nothing in our formulation so far prevents the set of controls
associated with a given state from being empty. As a result, we might
not be able to ``solve'' even very simple decision problems. The
following example makes this evident. Let
> head : {t, n : Nat} -> PolicySeq t (S n) -> Policy t
> head (p :: ps) = p

> tail : {t, n : Nat} -> PolicySeq t (S n) -> PolicySeq (S t) n
> tail (p :: ps) = ps
and
> data GoodOrBad = Good | Bad
> implementation Show GoodOrBad where
>   show Good = "Good"
>   show Bad  = "Bad"
> data UpOrDown = Up | Down
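One might also give |UpOrDown| a |Show| implementation, analogous to
the one for |GoodOrBad|; the |computation| below only ever prints
states, so this is optional and merely convenient when experimenting at
the REPL:

> implementation Show UpOrDown where
>   show Up   = "Up"
>   show Down = "Down"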
Consider the decision problem
> M = Identity
> unId : {A : Type} -> Identity A -> A
> unId (Id a) = a
> X t = GoodOrBad
> Y t Good = UpOrDown
> Y t Bad  = Void
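Note that |Y t Bad| computes to |Void|, so the absence of controls in
|Bad| can be stated and proved explicitly. A minimal sketch (the name
|yBadElim| is ours, not part of the original development):

> -- |Y t Bad| reduces to |Void|, thus the identity function already
> -- witnesses that |Bad| admits no controls:
> yBadElim : {t : Nat} -> Y t Bad -> Void
> yBadElim v = v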
In |Good| there are two options: |Up| and |Down|. But in |Bad| there are
no controls to select. We can define a transition function
> next t Good Up   = Id Good
> next t Good Down = Id Bad
> next t Bad  v    impossible
but we will not be able to apply |next| for the |Bad| case unless we
manage to construct a value |v : Void|. This is impossible. Still, we
can complete the specification of the problem:
> Val = Nat
> (<+>) = (+)
> zero = Z
> (<=) = Prelude.Nat.LTE

> reward t Good Up   x' = 1
> reward t Good Down x' = 3
> reward t Bad  v    x' impossible

> meas = unId
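As a quick sanity check (the name |checkUp| is ours, not part of the
specification): |next| and |reward| compute on |Good|, so the reward
for stepping |Up| out of |Good| is definitionally |1|:

> -- |next 0 Good Up| reduces to |Id Good| and |reward 0 Good Up Good|
> -- to 1, so the equation holds by |Refl|.
> checkUp : reward 0 Good Up (unId (next 0 Good Up)) = 1
> checkUp = Refl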
Notice that, again, we will not be able to compute an argument |v :
Void| to apply |reward|. This also implies that we cannot give a
complete implementation of |cvalargmax|. We can compute a best control
for |Good|:
> cvalargmax {t} ps Good =
>   let x'Up    = unId (next t Good Up) in
>   let x'Down  = unId (next t Good Down) in
>   let valUp   = reward t Good Up   x'Up   <+> val ps x'Up   in
>   let valDown = reward t Good Down x'Down <+> val ps x'Down in
>   if valUp >= valDown then Up else Down
But not for |Bad|:
> cvalargmax {t} ps Bad = ?whatNow
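For |Good|, by contrast, the selected control can be checked by
evaluation. A sketch (the name |lastStepBest| is ours and we rely on
|val Nil x = zero| from the theory above): at the last decision step
|valUp| reduces to |1| and |valDown| to |3|, hence

> -- With the empty policy sequence, |valUp = 1 + zero| and
> -- |valDown = 3 + zero|, so |cvalargmax| selects |Down|:
> lastStepBest : cvalargmax {t = 0} Nil Good = Down
> lastStepBest = Refl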
In spite of the missing |Bad| case, we can try to compute a policy
sequence for two decision steps and see what happens:
> computation : IO ()
> computation = let ps = bi 0 2 in
>               let x0 = Good in
>               let p0 = head ps in
>               let y0 = p0 x0 in
>               let x1 = unId (next Z x0 y0) in
>               let p1 = head (tail ps) in
>               let y1 = p1 x1 in
>               let x2 = unId (next 1 x1 y1) in
>               do putStrLn ("x0 = " ++ show x0)
>                  putStrLn ("x1 = " ++ show x1)
>                  putStrLn ("x2 = " ++ show x2)
> main : IO ()
> main = computation

Running |main| cannot succeed, though: computing |y0| forces |val| at
|Bad| and thereby |cvalargmax Nil Bad|, that is, the hole |?whatNow|,
so the evaluation gets stuck before it can deliver |x1| and |x2|.
> {-
> ---}
** Memoisation