Commit dff6bf7a authored by Nicola Botta's avatar Nicola Botta
Browse files

Initial revision.

parents
This diff is collapsed.
> module SequentialDecisionProblems.FullTheory
> import SequentialDecisionProblems.CoreTheory
> import Sigma.Sigma
> import Sigma.Operations
> %default total
> %access public export
> %auto_implicits off
> -- %hide Prelude.Nat.LTE
* Preliminaries
In order to prove the correctness of |backwardsInduction| (see
|CoreTheory|), we need a number of additional assumptions, which we
introduce here. These result in proof obligations for the user of the
framework. As in |CoreTheory|, they are represented as meta-variables
(or ``holes''). Once the user has discharged these obligations, type
checking this file will prove the optimality of |backwardsInduction|
(with some caveats related to the current Idris implementation).
* |LTE|
The binary relation introduced in |CoreTheory| for comparing values of
sequences of policies has to be a preorder:
> reflexiveLTE : (a : Val) -> a `LTE` a
> transitiveLTE : (a : Val) -> (b : Val) -> (c : Val) -> a `LTE` b -> b `LTE` c -> a `LTE` c
Additionally, |plus| is required to be monotonic with respect to |LTE|:
> monotonePlusLTE : {a, b, c, d : Val} -> a `LTE` b -> c `LTE` d -> (a `plus` c) `LTE` (b `plus` d)
* |meas|
The function |meas| introduced in |CoreTheory| to describe how a
decision maker measures the possible rewards entailed by (lists,
probability distributions, etc. of) possible next states is required to
fulfill a monotonicity condition:
> measMon : {A : Type} ->
> (f : A -> Val) -> (g : A -> Val) ->
> ((a : A) -> (f a) `LTE` (g a)) ->
> (ma : M A) -> meas (fmap f ma) `LTE` meas (fmap g ma)
In a nutshell, |measMon| says that, if |ma| and |mb| are similar
|M|-structures and |ma| is smaller or equal to |mb|, than it cannot be
the case that the measure of |ma| is greater than the measure of |mb|.
This conditions was originally formalized by Ionescu in [2] to give a
consistent meaning to harm measures in vulnerability studies.
* |cvalargmax|
The function |cvalargmax| introduced in |CoreTheory| must deliver
optimal controls. We need the function |cvalmax|, returning the value
of those optimal controls, in order to fully specify |cvalargmax|:
> cvalmax : {t, n : Nat} ->
> (x : State t) -> (r : Reachable x) -> (v : Viable (S n) x) ->
> (ps : PolicySeq (S t) n) -> Val
> cvalargmaxSpec : {t : Nat} -> {n : Nat} ->
> (x : State t) -> (r : Reachable x) ->
> (v : Viable (S n) x) -> (ps : PolicySeq (S t) n) ->
> cvalmax x r v ps = cval x r v ps (cvalargmax x r v ps)
> cvalmaxSpec : {t : Nat} -> {n : Nat} ->
> (x : State t) -> (r : Reachable x) ->
> (v : Viable (S n) x) -> (ps : PolicySeq (S t) n) ->
> (y : GoodCtrl t x n) ->
> (cval x r v ps y) `LTE` (cvalmax x r v ps)
The reason for using these very specific functions, instead of more
general |argmax|, |max|, |argmaxSpec| and |maxSpec| like for instance
< argmax : {t : Nat} -> {n : Nat} ->
< (x : State t) -> (Viable (S n) x) ->
< (f : GoodCtrl t x n -> Val) -> GoodCtrl t x n
< max : {t : Nat} -> {n : Nat} ->
< (x : State t) -> (Viable (S n) x) ->
< (f : GoodCtrl t x n -> Val) -> Val
< argmaxSpec : {t : Nat} -> {n : Nat} ->
< (x : State t) -> (v : Viable (S n) x) ->
< (f : GoodCtrl t x n -> Val) ->
< max x v f = f (argmax x v f)
< maxSpec : {t : Nat} -> {n : Nat} ->
< (x : State t) -> (v : Viable (S n) x) ->
< (f : GoodCtrl t x n -> Val) -> (y : GoodCtrl t x n) ->
< (f y) `LTE` (max x v f)
is that optimisation is, in most case, not computable. The assumptions
on |cvalmax| and |cvalargmax| are the minimal requirements for the
computability of optimal extensions. Anything more general risks being
non-implementable.
* The proof of correctness of |backwardsInduction|:
** Policy sequences of length zero are optimal
> |||
> nilOptPolicySeq : OptPolicySeq Nil
> nilOptPolicySeq ps' x r v = reflexiveLTE zero
** Bellman's principle of optimality:
> |||
> Bellman : {t, m : Nat} ->
> (ps : PolicySeq (S t) m) -> OptPolicySeq ps ->
> (p : Policy t (S m)) -> OptExt ps p ->
> OptPolicySeq (p :: ps)
> Bellman {t} {m} ps ops p oep = opps where
> opps : OptPolicySeq (p :: ps)
> opps (p' :: ps') x r v = transitiveLTE (val x r v (p' :: ps'))
> (val x r v (p' :: ps))
> (val x r v (p :: ps))
> s4 s5 where
> gy' : GoodCtrl t x m
> gy' = p' x r v
> y' : Ctrl t x
> y' = ctrl gy'
> mx' : M (State (S t))
> mx' = nexts t x y'
> av' : All (Viable m) mx'
> av' = allViable gy'
> f' : PossibleNextState x (ctrl gy') -> Val
> f' = sval x r v gy' ps'
> f : PossibleNextState x (ctrl gy') -> Val
> f = sval x r v gy' ps
> s1 : (x' : State (S t)) -> (r' : Reachable x') -> (v' : Viable m x') ->
> val x' r' v' ps' `LTE` val x' r' v' ps
> s1 x' r' v' = ops ps' x' r' v'
> s2 : (z : PossibleNextState x (ctrl gy')) -> (f' z) `LTE` (f z)
> s2 (MkSigma x' x'emx') =
> monotonePlusLTE (reflexiveLTE (reward t x y' x')) (s1 x' r' v') where
> ar' : All Reachable mx'
> ar' = reachableSpec1 x r y'
> r' : Reachable x'
> r' = allElemSpec0 x' mx' ar' x'emx'
> v' : Viable m x'
> v' = allElemSpec0 x' mx' av' x'emx'
> s3 : meas (fmap f' (tagElem mx')) `LTE` meas (fmap f (tagElem mx'))
> s3 = measMon f' f s2 (tagElem mx')
> s4 : val x r v (p' :: ps') `LTE` val x r v (p' :: ps)
> s4 = s3
> s5 : val x r v (p' :: ps) `LTE` val x r v (p :: ps)
> s5 = oep p' x r v
> |||
> optExtLemma : {t, n : Nat} ->
> (ps : PolicySeq (S t) n) -> OptExt ps (optExt ps)
> optExtLemma {t} {n} ps p' x r v = s6 where
> p : Policy t (S n)
> p = optExt ps
> gy : GoodCtrl t x n
> gy = p x r v
> y : Ctrl t x
> y = ctrl gy
> av : All (Viable n) (nexts t x y)
> av = allViable gy
> gy' : GoodCtrl t x n
> gy' = p' x r v
> y' : Ctrl t x
> y' = ctrl gy'
> av' : All (Viable n) (nexts t x y')
> av' = allViable gy'
> g : GoodCtrl t x n -> Val
> g = cval x r v ps
> f : PossibleNextState x (ctrl gy) -> Val
> f = sval x r v gy ps
> f' : PossibleNextState x (ctrl gy') -> Val
> f' = sval x r v gy' ps
> s1 : g gy' `LTE` cvalmax x r v ps
> s1 = cvalmaxSpec x r v ps gy'
> s2 : g gy' `LTE` g (cvalargmax x r v ps)
> s2 = replace {P = \ z => (g gy' `LTE` z)} (cvalargmaxSpec x r v ps) s1
> -- the rest of the steps are for the (sort of) human reader
> s3 : g gy' `LTE` g gy
> s3 = s2
> s4 : cval x r v ps gy' `LTE` cval x r v ps gy
> s4 = s3
> s5 : meas (fmap f' (tagElem (nexts t x y'))) `LTE` meas (fmap f (tagElem (nexts t x y)))
> s5 = s4
> s6 : val x r v (p' :: ps) `LTE` val x r v (p :: ps)
> s6 = s5
** Correctness of backwards induction
> backwardsInductionLemma : (t : Nat) -> (n : Nat) -> OptPolicySeq (backwardsInduction t n)
> backwardsInductionLemma t Z = nilOptPolicySeq
> backwardsInductionLemma t (S n) = Bellman ps ops p oep where
> ps : PolicySeq (S t) n
> ps = backwardsInduction (S t) n
> ops : OptPolicySeq ps
> ops = backwardsInductionLemma (S t) n
> p : Policy t (S n)
> p = optExt ps
> oep : OptExt ps p
> oep = optExtLemma ps
Thus, we can compute provably optimal sequences of policies for
arbitrary SDPs and number of decision steps.
> {-
> ---}
[1] Bellman, Richard; "Dynamic Programming", Princeton University Press,
1957
[2] Ionescu, Cezar; "Vulnerability Modelling and Monadic Dynamical
Systems", Freie Universitaet Berlin, 2009
> module SequentialDecisionProblems.Helpers
> import Data.Fin
> import Control.Isomorphism
> import SequentialDecisionProblems.CoreTheory
> import Finite.Predicates
> import Finite.Operations
> import Finite.Properties
> import Sigma.Sigma
> import Sigma.Operations
> import Sigma.Properties
> %default total
> %access public export
> %auto_implicits off
* Finiteness notions
> |||
> FiniteViable : Type
> FiniteViable = {t : Nat} ->
> (n : Nat) -> (x : State t) -> Finite (Viable {t} n x)
> |||
> FiniteAll : Type
> FiniteAll = {A : Type} -> {P : A -> Type} ->
> Finite1 P -> (ma : M A) -> Finite (All P ma)
> |||
> FiniteAllViable : Type
> FiniteAllViable = {t : Nat} -> {n : Nat} ->
> (x : State t) -> (y : Ctrl t x) ->
> Finite (All (Viable {t = S t} n) (nexts t x y))
> |||
> FiniteNotEmpty : Type
> FiniteNotEmpty = {t : Nat} -> {n : Nat} ->
> (x : State t) -> (y : Ctrl t x) ->
> Finite (NotEmpty (nexts t x y))
> |||
> FiniteGood : Type
> FiniteGood = {t : Nat} -> {n : Nat} ->
> (x : State t) -> (y : Ctrl t x) ->
> Finite (Good t x n y)
> |||
> FiniteCtrl : Type
> FiniteCtrl = {t : Nat} -> {n : Nat} ->
> (x : State t) -> Finite (Ctrl t x)
> |||
> FiniteGoodCtrl : Type
> FiniteGoodCtrl = {t : Nat} -> {n : Nat} ->
> (x : State t) -> Viable {t = t} (S n) x ->
> Finite (GoodCtrl t x n)
* Standard deduction patterns in the implementation of specific SDPs
We would like to provide users with a function
< finiteAllViable : FiniteAll -> FiniteViable -> FiniteAllViable
but, as it turns out, implementing this function is not trivial (see
issues 3135 and 3136). Thus, for the time being, we introduce 2
additional assumptions in the global context
> ||| If users can prove that All is finite ...
> finiteAll : {A : Type} -> {P : A -> Type} ->
> Finite1 P -> (ma : M A) -> Finite (All P ma)
> ||| ... and that Viable is finite,
> finiteViable : {t : Nat} -> {n : Nat} ->
> (x : State t) -> Finite (Viable {t} n x)
and apply them to deduce finiteness of |All Viable|.
> ||| we can deduce that All Viable is finite
> finiteAllViable : {t : Nat} -> {n : Nat} ->
> (x : State t) -> (y : Ctrl t x) ->
> Finite (All (Viable {t = S t} n) (nexts t x y))
> finiteAllViable {t} {n} x y = finiteAll (finiteViable {t = S t} {n}) (nexts t x y)
Similarly, we assume that users are able to prove that |NotEmpty| is finite
> finiteNotEmpty : {t : Nat} -> {n : Nat} ->
> (x : State t) -> (y : Ctrl t x) ->
> Finite (NotEmpty (nexts t x y))
Finiteness of |NotEmpty| and |All Viable| directly implies finiteness of
|Good|
> |||
> finiteGood : {t : Nat} -> {n : Nat} ->
> (x : State t) -> (y : Ctrl t x) ->
> Finite (Good t x n y)
> finiteGood {n} x y = finiteProduct (finiteNotEmpty {n} x y) (finiteAllViable x y)
and, assuming finiteness of controls, finiteness of |GoodCtrl|:
> |||
> finiteCtrl : {t : Nat} -> {n : Nat} -> (x : State t) -> Finite (Ctrl t x)
> |||
> finiteGoodCtrl : {t : Nat} -> {n : Nat} ->
> (x : State t) ->
> Finite (GoodCtrl t x n)
> finiteGoodCtrl {t} {n} x = finiteSigmaLemma0 (finiteCtrl {t} {n} x) (finiteGood {t} {n} x)
Finally, we can show
> |||
> cardNotZGoodCtrl : {t : Nat} -> {n : Nat} ->
> (x : State t) -> (v : Viable {t = t} (S n) x) ->
> CardNotZ (finiteGoodCtrl {t} {n} x)
> cardNotZGoodCtrl x v = cardNotZLemma (finiteGoodCtrl x) (viableSpec1 x v)
> {-
> ---}
> module SequentialDecisionProblems.Utils
> import Data.Fin
> import Control.Isomorphism
> import SequentialDecisionProblems.CoreTheory
> import Sigma.Sigma
> import Sigma.Operations
> import Finite.Predicates
> import Fun.Operations
> %default total
> %access public export
> %auto_implicits off
* ...
> |||
> FiniteViable : Type
> FiniteViable = {t : Nat} -> {n : Nat} -> (x : State t) -> Finite (Viable {t} n x)
> |||
> FiniteAll : Type
> FiniteAll = {A : Type} -> {P : A -> Type} -> Finite1 P -> (ma : M A) -> Finite (All P ma)
> |||
> FiniteAllViable : Type
> FiniteAllViable = {t : Nat} -> {n : Nat} ->
> (x : State t) -> (y : Ctrl t x) ->
> Finite (All (Viable {t = S t} n) (nexts t x y))
> |||
> FiniteNonEmpty : Type
> FiniteNonEmpty = {t : Nat} -> {n : Nat} ->
> (x : State t) -> (y : Ctrl t x) ->
> Finite (NotEmpty (nexts t x y))
> |||
> FiniteGoodCtrl : Type
> FiniteGoodCtrl = {t : Nat} -> {n : Nat} ->
> (x : State t) -> Viable {t = t} (S n) x ->
> Finite (GoodCtrl t x n)
> {-
> finiteAllViable : FiniteAll -> FiniteViable -> FiniteAllViable
> finiteAllViable fAll fViable = fAllViable where
> fAll' : {A : Type} -> {P : A -> Type} -> Finite1 P -> (ma : M A) -> Finite (All P ma)
> fAll' = fAll
> fViable' : {t : Nat} -> {n : Nat} -> (x : State t) -> Finite (Viable {t} n x)
> fViable' = fViable
> fAllViable : {t : Nat} -> {n : Nat} ->
> (x : State t) -> (y : Ctrl t x) ->
> Finite (All (Viable {t = S t} n) (nexts t x y))
> fAllViable {t} {n} x y = fAll' (fViable' {t = S t} {n}) (nexts t x y)
> -}
* Show states and controls
> |||
> showState : {t : Nat} -> State t -> String
> |||
> showCtrl : {t : Nat} -> {x : State t} -> Ctrl t x -> String
> |||
> showStateCtrl : {t : Nat} -> Sigma (State t) (Ctrl t) -> String
> showStateCtrl {t} (MkSigma x y) = "(" ++ showState {t} x ++ " ** " ++ showCtrl {t} {x} y ++ ")"
* Sequences of state-control pairs
> |||
> data StateCtrlSeq : (t : Nat) -> (n : Nat) -> Type where
> Nil : {t : Nat} -> (x : State t) -> StateCtrlSeq t Z
> (::) : {t, n : Nat} -> Sigma (State t) (Ctrl t) -> StateCtrlSeq (S t) n -> StateCtrlSeq t (S n)
> using (t : Nat, n : Nat)
> implementation Show (StateCtrlSeq t n) where
> show = show' where
> show' : {t : Nat} -> {n : Nat} -> StateCtrlSeq t n -> String
> show' xys = "[" ++ show'' "" xys ++ "]" where
> show'' : {t' : Nat} -> {n' : Nat} -> String -> StateCtrlSeq t' n' -> String
> show'' {t'} {n' = Z} acc (Nil x) =
> acc ++ "(" ++ showState x ++ " ** " ++ " " ++ ")"
> show'' {t'} {n' = S m'} acc (xy :: xys) =
> show'' {t' = S t'} {n' = m'} (acc ++ showStateCtrl xy ++ ", ") xys
> |||
> valStateCtrlSeq : (t : Nat) -> (n : Nat) -> StateCtrlSeq t n -> Val
> valStateCtrlSeq t Z (Nil x) =
> zero
> valStateCtrlSeq t (S Z) ((MkSigma x y) :: (Nil x')) =
> reward t x y x'
> valStateCtrlSeq t (S (S m)) ((MkSigma x y) :: (MkSigma x' y') :: xys) =
> reward t x y x' `plus` valStateCtrlSeq (S t) (S m) ((MkSigma x' y') :: xys)
* Trajectories
> ||| The monadic operations
> ret : {A : Type} -> A -> M A
> bind : {A, B : Type} -> M A -> (A -> M B) -> M B
> postulate monadSpec1 : {A, B : Type} -> {f : A -> B} ->
> (fmap f) . ret = ret . f
> postulate monadSpec21 : {A, B : Type} -> {f : A -> M B} -> {a : A} ->
> bind (ret a) f = f a
> postulate monadSpec22 : {A : Type} -> {ma : M A} ->
> bind ma ret = ma
> postulate monadSpec23 : {A, B, C : Type} -> {f : A -> M B} -> {g : B -> M C} -> {ma : M A} ->
> bind (bind ma f) g = bind ma (\ a => bind (f a) g)
> |||
> possibleStateCtrlSeqs : {t, n : Nat} -> (x : State t) -> (r : Reachable x) -> (v : Viable n x) ->
> (ps : PolicySeq t n) -> M (StateCtrlSeq t n)
> possibleStateCtrlSeqs {t} {n = Z} x r v Nil = ret (Nil x)
> possibleStateCtrlSeqs {t} {n = S m} x r v (p :: ps') =
> fmap g (bind (tagElem mx') f) where
> y : Ctrl t x
> y = ctrl (p x r v)
> mx' : M (State (S t))
> mx' = nexts t x y
> av : All (Viable m) mx'
> av = allViable (p x r v)
> g : StateCtrlSeq (S t) m -> StateCtrlSeq t (S m)
> g = ((MkSigma x y) ::)
> f : Sigma (State (S t)) (\ x' => x' `Elem` mx') -> M (StateCtrlSeq (S t) m)
> f (MkSigma x' x'emx') = possibleStateCtrlSeqs {n = m} x' r' v' ps' where
> ar : All Reachable mx'
> ar = reachableSpec1 x r y
> r' : Reachable x'
> r' = allElemSpec0 x' mx' ar x'emx'
> v' : Viable m x'
> v' = allElemSpec0 x' mx' av x'emx'
> |||
> morePossibleStateCtrlSeqs : {t, n : Nat} -> (mx : M (State t)) ->
> All Reachable mx -> All (Viable n) mx ->
> (ps : PolicySeq t n) -> M (StateCtrlSeq t n)
> morePossibleStateCtrlSeqs {t} {n} mx ar av ps = bind (tagElem mx) f where
> f : Sigma (State t) (\ x => x `Elem` mx) -> M (StateCtrlSeq t n)
> f (MkSigma x xemx) = possibleStateCtrlSeqs x r v ps where
> r : Reachable x
> r = allElemSpec0 x mx ar xemx
> v : Viable n x
> v = allElemSpec0 x mx av xemx
> |||
> possibleStateCtrlSeqsRewards : {t : Nat} -> {n : Nat} ->
> (x : State t) -> (r : Reachable x) -> (v : Viable n x) ->
> (ps : PolicySeq t n) -> M (StateCtrlSeq t n, Val)
> possibleStateCtrlSeqsRewards {t} {n} x r v ps =
> fmap (pair (id, valStateCtrlSeq t n)) (possibleStateCtrlSeqs {t} {n} x r v ps)
> |||
> possibleStateCtrlSeqsRewards' : {t : Nat} -> {n : Nat} ->
> M (StateCtrlSeq t n) -> M (StateCtrlSeq t n, Val)
> possibleStateCtrlSeqsRewards' {t} {n} xys = fmap (pair (id, valStateCtrlSeq t n)) xys
> |||
> possibleRewards : {t : Nat} -> {n : Nat} ->
> (x : State t) -> (r : Reachable x) -> (v : Viable n x) ->
> (ps : PolicySeq t n) -> M Val
> possibleRewards {t} {n} x r v ps =
> fmap (valStateCtrlSeq t n) (possibleStateCtrlSeqs {t} {n} x r v ps)
> |||
> possibleRewards' : {t : Nat} -> {n : Nat} -> M (StateCtrlSeq t n) -> M Val
> possibleRewards' {t} {n} xys = fmap (valStateCtrlSeq t n) xys
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment