Commit dff6bf7a by Nicola Botta

### Initial revision.

parents
This diff is collapsed.
 > module SequentialDecisionProblems.FullTheory > import SequentialDecisionProblems.CoreTheory > import Sigma.Sigma > import Sigma.Operations > %default total > %access public export > %auto_implicits off > -- %hide Prelude.Nat.LTE * Preliminaries In order to prove the correctness of |backwardsInduction| (see |CoreTheory|), we need a number of additional assumptions, which we introduce here. These result in proof obligations for the user of the framework. As in |CoreTheory|, they are represented as meta-variables (or ``holes''). Once the user has discharged these obligations, type checking this file will prove the optimality of |backwardsInduction| (with some caveats related to the current Idris implementation). * |LTE| The binary relation introduced in |CoreTheory| for comparing values of sequences of policies has to be a preorder: > reflexiveLTE : (a : Val) -> a `LTE` a > transitiveLTE : (a : Val) -> (b : Val) -> (c : Val) -> a `LTE` b -> b `LTE` c -> a `LTE` c Additionally, |plus| is required to be monotonic with respect to |LTE|: > monotonePlusLTE : {a, b, c, d : Val} -> a `LTE` b -> c `LTE` d -> (a `plus` c) `LTE` (b `plus` d) * |meas| The function |meas| introduced in |CoreTheory| to describe how a decision maker measures the possible rewards entailed by (lists, probability distributions, etc. of) possible next states is required to fulfill a monotonicity condition: > measMon : {A : Type} -> > (f : A -> Val) -> (g : A -> Val) -> > ((a : A) -> (f a) `LTE` (g a)) -> > (ma : M A) -> meas (fmap f ma) `LTE` meas (fmap g ma) In a nutshell, |measMon| says that, if |ma| and |mb| are similar |M|-structures and |ma| is smaller or equal to |mb|, than it cannot be the case that the measure of |ma| is greater than the measure of |mb|. This conditions was originally formalized by Ionescu in [2] to give a consistent meaning to harm measures in vulnerability studies. * |cvalargmax| The function |cvalargmax| introduced in |CoreTheory| must deliver optimal controls. We need the function |cvalmax|, returning the value of those optimal controls, in order to fully specify |cvalargmax|: > cvalmax : {t, n : Nat} -> > (x : State t) -> (r : Reachable x) -> (v : Viable (S n) x) -> > (ps : PolicySeq (S t) n) -> Val > cvalargmaxSpec : {t : Nat} -> {n : Nat} -> > (x : State t) -> (r : Reachable x) -> > (v : Viable (S n) x) -> (ps : PolicySeq (S t) n) -> > cvalmax x r v ps = cval x r v ps (cvalargmax x r v ps) > cvalmaxSpec : {t : Nat} -> {n : Nat} -> > (x : State t) -> (r : Reachable x) -> > (v : Viable (S n) x) -> (ps : PolicySeq (S t) n) -> > (y : GoodCtrl t x n) -> > (cval x r v ps y) `LTE` (cvalmax x r v ps) The reason for using these very specific functions, instead of more general |argmax|, |max|, |argmaxSpec| and |maxSpec| like for instance < argmax : {t : Nat} -> {n : Nat} -> < (x : State t) -> (Viable (S n) x) -> < (f : GoodCtrl t x n -> Val) -> GoodCtrl t x n < max : {t : Nat} -> {n : Nat} -> < (x : State t) -> (Viable (S n) x) -> < (f : GoodCtrl t x n -> Val) -> Val < argmaxSpec : {t : Nat} -> {n : Nat} -> < (x : State t) -> (v : Viable (S n) x) -> < (f : GoodCtrl t x n -> Val) -> < max x v f = f (argmax x v f) < maxSpec : {t : Nat} -> {n : Nat} -> < (x : State t) -> (v : Viable (S n) x) -> < (f : GoodCtrl t x n -> Val) -> (y : GoodCtrl t x n) -> < (f y) `LTE` (max x v f) is that optimisation is, in most case, not computable. The assumptions on |cvalmax| and |cvalargmax| are the minimal requirements for the computability of optimal extensions. Anything more general risks being non-implementable. * The proof of correctness of |backwardsInduction|: ** Policy sequences of length zero are optimal > ||| > nilOptPolicySeq : OptPolicySeq Nil > nilOptPolicySeq ps' x r v = reflexiveLTE zero ** Bellman's principle of optimality: > ||| > Bellman : {t, m : Nat} -> > (ps : PolicySeq (S t) m) -> OptPolicySeq ps -> > (p : Policy t (S m)) -> OptExt ps p -> > OptPolicySeq (p :: ps) > Bellman {t} {m} ps ops p oep = opps where > opps : OptPolicySeq (p :: ps) > opps (p' :: ps') x r v = transitiveLTE (val x r v (p' :: ps')) > (val x r v (p' :: ps)) > (val x r v (p :: ps)) > s4 s5 where > gy' : GoodCtrl t x m > gy' = p' x r v > y' : Ctrl t x > y' = ctrl gy' > mx' : M (State (S t)) > mx' = nexts t x y' > av' : All (Viable m) mx' > av' = allViable gy' > f' : PossibleNextState x (ctrl gy') -> Val > f' = sval x r v gy' ps' > f : PossibleNextState x (ctrl gy') -> Val > f = sval x r v gy' ps > s1 : (x' : State (S t)) -> (r' : Reachable x') -> (v' : Viable m x') -> > val x' r' v' ps' `LTE` val x' r' v' ps > s1 x' r' v' = ops ps' x' r' v' > s2 : (z : PossibleNextState x (ctrl gy')) -> (f' z) `LTE` (f z) > s2 (MkSigma x' x'emx') = > monotonePlusLTE (reflexiveLTE (reward t x y' x')) (s1 x' r' v') where > ar' : All Reachable mx' > ar' = reachableSpec1 x r y' > r' : Reachable x' > r' = allElemSpec0 x' mx' ar' x'emx' > v' : Viable m x' > v' = allElemSpec0 x' mx' av' x'emx' > s3 : meas (fmap f' (tagElem mx')) `LTE` meas (fmap f (tagElem mx')) > s3 = measMon f' f s2 (tagElem mx') > s4 : val x r v (p' :: ps') `LTE` val x r v (p' :: ps) > s4 = s3 > s5 : val x r v (p' :: ps) `LTE` val x r v (p :: ps) > s5 = oep p' x r v > ||| > optExtLemma : {t, n : Nat} -> > (ps : PolicySeq (S t) n) -> OptExt ps (optExt ps) > optExtLemma {t} {n} ps p' x r v = s6 where > p : Policy t (S n) > p = optExt ps > gy : GoodCtrl t x n > gy = p x r v > y : Ctrl t x > y = ctrl gy > av : All (Viable n) (nexts t x y) > av = allViable gy > gy' : GoodCtrl t x n > gy' = p' x r v > y' : Ctrl t x > y' = ctrl gy' > av' : All (Viable n) (nexts t x y') > av' = allViable gy' > g : GoodCtrl t x n -> Val > g = cval x r v ps > f : PossibleNextState x (ctrl gy) -> Val > f = sval x r v gy ps > f' : PossibleNextState x (ctrl gy') -> Val > f' = sval x r v gy' ps > s1 : g gy' `LTE` cvalmax x r v ps > s1 = cvalmaxSpec x r v ps gy' > s2 : g gy' `LTE` g (cvalargmax x r v ps) > s2 = replace {P = \ z => (g gy' `LTE` z)} (cvalargmaxSpec x r v ps) s1 > -- the rest of the steps are for the (sort of) human reader > s3 : g gy' `LTE` g gy > s3 = s2 > s4 : cval x r v ps gy' `LTE` cval x r v ps gy > s4 = s3 > s5 : meas (fmap f' (tagElem (nexts t x y'))) `LTE` meas (fmap f (tagElem (nexts t x y))) > s5 = s4 > s6 : val x r v (p' :: ps) `LTE` val x r v (p :: ps) > s6 = s5 ** Correctness of backwards induction > backwardsInductionLemma : (t : Nat) -> (n : Nat) -> OptPolicySeq (backwardsInduction t n) > backwardsInductionLemma t Z = nilOptPolicySeq > backwardsInductionLemma t (S n) = Bellman ps ops p oep where > ps : PolicySeq (S t) n > ps = backwardsInduction (S t) n > ops : OptPolicySeq ps > ops = backwardsInductionLemma (S t) n > p : Policy t (S n) > p = optExt ps > oep : OptExt ps p > oep = optExtLemma ps Thus, we can compute provably optimal sequences of policies for arbitrary SDPs and number of decision steps. > {- > ---} [1] Bellman, Richard; "Dynamic Programming", Princeton University Press, 1957 [2] Ionescu, Cezar; "Vulnerability Modelling and Monadic Dynamical Systems", Freie Universitaet Berlin, 2009
 > module SequentialDecisionProblems.Helpers > import Data.Fin > import Control.Isomorphism > import SequentialDecisionProblems.CoreTheory > import Finite.Predicates > import Finite.Operations > import Finite.Properties > import Sigma.Sigma > import Sigma.Operations > import Sigma.Properties > %default total > %access public export > %auto_implicits off * Finiteness notions > ||| > FiniteViable : Type > FiniteViable = {t : Nat} -> > (n : Nat) -> (x : State t) -> Finite (Viable {t} n x) > ||| > FiniteAll : Type > FiniteAll = {A : Type} -> {P : A -> Type} -> > Finite1 P -> (ma : M A) -> Finite (All P ma) > ||| > FiniteAllViable : Type > FiniteAllViable = {t : Nat} -> {n : Nat} -> > (x : State t) -> (y : Ctrl t x) -> > Finite (All (Viable {t = S t} n) (nexts t x y)) > ||| > FiniteNotEmpty : Type > FiniteNotEmpty = {t : Nat} -> {n : Nat} -> > (x : State t) -> (y : Ctrl t x) -> > Finite (NotEmpty (nexts t x y)) > ||| > FiniteGood : Type > FiniteGood = {t : Nat} -> {n : Nat} -> > (x : State t) -> (y : Ctrl t x) -> > Finite (Good t x n y) > ||| > FiniteCtrl : Type > FiniteCtrl = {t : Nat} -> {n : Nat} -> > (x : State t) -> Finite (Ctrl t x) > ||| > FiniteGoodCtrl : Type > FiniteGoodCtrl = {t : Nat} -> {n : Nat} -> > (x : State t) -> Viable {t = t} (S n) x -> > Finite (GoodCtrl t x n) * Standard deduction patterns in the implementation of specific SDPs We would like to provide users with a function < finiteAllViable : FiniteAll -> FiniteViable -> FiniteAllViable but, as it turns out, implementing this function is not trivial (see issues 3135 and 3136). Thus, for the time being, we introduce 2 additional assumptions in the global context > ||| If users can prove that All is finite ... > finiteAll : {A : Type} -> {P : A -> Type} -> > Finite1 P -> (ma : M A) -> Finite (All P ma) > ||| ... and that Viable is finite, > finiteViable : {t : Nat} -> {n : Nat} -> > (x : State t) -> Finite (Viable {t} n x) and apply them to deduce finiteness of |All Viable|. > ||| we can deduce that All Viable is finite > finiteAllViable : {t : Nat} -> {n : Nat} -> > (x : State t) -> (y : Ctrl t x) -> > Finite (All (Viable {t = S t} n) (nexts t x y)) > finiteAllViable {t} {n} x y = finiteAll (finiteViable {t = S t} {n}) (nexts t x y) Similarly, we assume that users are able to prove that |NotEmpty| is finite > finiteNotEmpty : {t : Nat} -> {n : Nat} -> > (x : State t) -> (y : Ctrl t x) -> > Finite (NotEmpty (nexts t x y)) Finiteness of |NotEmpty| and |All Viable| directly implies finiteness of |Good| > ||| > finiteGood : {t : Nat} -> {n : Nat} -> > (x : State t) -> (y : Ctrl t x) -> > Finite (Good t x n y) > finiteGood {n} x y = finiteProduct (finiteNotEmpty {n} x y) (finiteAllViable x y) and, assuming finiteness of controls, finiteness of |GoodCtrl|: > ||| > finiteCtrl : {t : Nat} -> {n : Nat} -> (x : State t) -> Finite (Ctrl t x) > ||| > finiteGoodCtrl : {t : Nat} -> {n : Nat} -> > (x : State t) -> > Finite (GoodCtrl t x n) > finiteGoodCtrl {t} {n} x = finiteSigmaLemma0 (finiteCtrl {t} {n} x) (finiteGood {t} {n} x) Finally, we can show > ||| > cardNotZGoodCtrl : {t : Nat} -> {n : Nat} -> > (x : State t) -> (v : Viable {t = t} (S n) x) -> > CardNotZ (finiteGoodCtrl {t} {n} x) > cardNotZGoodCtrl x v = cardNotZLemma (finiteGoodCtrl x) (viableSpec1 x v) > {- > ---}
 > module SequentialDecisionProblems.Utils > import Data.Fin > import Control.Isomorphism > import SequentialDecisionProblems.CoreTheory > import Sigma.Sigma > import Sigma.Operations > import Finite.Predicates > import Fun.Operations > %default total > %access public export > %auto_implicits off * ... > ||| > FiniteViable : Type > FiniteViable = {t : Nat} -> {n : Nat} -> (x : State t) -> Finite (Viable {t} n x) > ||| > FiniteAll : Type > FiniteAll = {A : Type} -> {P : A -> Type} -> Finite1 P -> (ma : M A) -> Finite (All P ma) > ||| > FiniteAllViable : Type > FiniteAllViable = {t : Nat} -> {n : Nat} -> > (x : State t) -> (y : Ctrl t x) -> > Finite (All (Viable {t = S t} n) (nexts t x y)) > ||| > FiniteNonEmpty : Type > FiniteNonEmpty = {t : Nat} -> {n : Nat} -> > (x : State t) -> (y : Ctrl t x) -> > Finite (NotEmpty (nexts t x y)) > ||| > FiniteGoodCtrl : Type > FiniteGoodCtrl = {t : Nat} -> {n : Nat} -> > (x : State t) -> Viable {t = t} (S n) x -> > Finite (GoodCtrl t x n) > {- > finiteAllViable : FiniteAll -> FiniteViable -> FiniteAllViable > finiteAllViable fAll fViable = fAllViable where > fAll' : {A : Type} -> {P : A -> Type} -> Finite1 P -> (ma : M A) -> Finite (All P ma) > fAll' = fAll > fViable' : {t : Nat} -> {n : Nat} -> (x : State t) -> Finite (Viable {t} n x) > fViable' = fViable > fAllViable : {t : Nat} -> {n : Nat} -> > (x : State t) -> (y : Ctrl t x) -> > Finite (All (Viable {t = S t} n) (nexts t x y)) > fAllViable {t} {n} x y = fAll' (fViable' {t = S t} {n}) (nexts t x y) > -} * Show states and controls > ||| > showState : {t : Nat} -> State t -> String > ||| > showCtrl : {t : Nat} -> {x : State t} -> Ctrl t x -> String > ||| > showStateCtrl : {t : Nat} -> Sigma (State t) (Ctrl t) -> String > showStateCtrl {t} (MkSigma x y) = "(" ++ showState {t} x ++ " ** " ++ showCtrl {t} {x} y ++ ")" * Sequences of state-control pairs > ||| > data StateCtrlSeq : (t : Nat) -> (n : Nat) -> Type where > Nil : {t : Nat} -> (x : State t) -> StateCtrlSeq t Z > (::) : {t, n : Nat} -> Sigma (State t) (Ctrl t) -> StateCtrlSeq (S t) n -> StateCtrlSeq t (S n) > using (t : Nat, n : Nat) > implementation Show (StateCtrlSeq t n) where > show = show' where > show' : {t : Nat} -> {n : Nat} -> StateCtrlSeq t n -> String > show' xys = "[" ++ show'' "" xys ++ "]" where > show'' : {t' : Nat} -> {n' : Nat} -> String -> StateCtrlSeq t' n' -> String > show'' {t'} {n' = Z} acc (Nil x) = > acc ++ "(" ++ showState x ++ " ** " ++ " " ++ ")" > show'' {t'} {n' = S m'} acc (xy :: xys) = > show'' {t' = S t'} {n' = m'} (acc ++ showStateCtrl xy ++ ", ") xys > ||| > valStateCtrlSeq : (t : Nat) -> (n : Nat) -> StateCtrlSeq t n -> Val > valStateCtrlSeq t Z (Nil x) = > zero > valStateCtrlSeq t (S Z) ((MkSigma x y) :: (Nil x')) = > reward t x y x' > valStateCtrlSeq t (S (S m)) ((MkSigma x y) :: (MkSigma x' y') :: xys) = > reward t x y x' `plus` valStateCtrlSeq (S t) (S m) ((MkSigma x' y') :: xys) * Trajectories > ||| The monadic operations > ret : {A : Type} -> A -> M A > bind : {A, B : Type} -> M A -> (A -> M B) -> M B > postulate monadSpec1 : {A, B : Type} -> {f : A -> B} -> > (fmap f) . ret = ret . f > postulate monadSpec21 : {A, B : Type} -> {f : A -> M B} -> {a : A} -> > bind (ret a) f = f a > postulate monadSpec22 : {A : Type} -> {ma : M A} -> > bind ma ret = ma > postulate monadSpec23 : {A, B, C : Type} -> {f : A -> M B} -> {g : B -> M C} -> {ma : M A} -> > bind (bind ma f) g = bind ma (\ a => bind (f a) g) > ||| > possibleStateCtrlSeqs : {t, n : Nat} -> (x : State t) -> (r : Reachable x) -> (v : Viable n x) -> > (ps : PolicySeq t n) -> M (StateCtrlSeq t n) > possibleStateCtrlSeqs {t} {n = Z} x r v Nil = ret (Nil x) > possibleStateCtrlSeqs {t} {n = S m} x r v (p :: ps') = > fmap g (bind (tagElem mx') f) where > y : Ctrl t x > y = ctrl (p x r v) > mx' : M (State (S t)) > mx' = nexts t x y > av : All (Viable m) mx' > av = allViable (p x r v) > g : StateCtrlSeq (S t) m -> StateCtrlSeq t (S m) > g = ((MkSigma x y) ::) > f : Sigma (State (S t)) (\ x' => x' `Elem` mx') -> M (StateCtrlSeq (S t) m) > f (MkSigma x' x'emx') = possibleStateCtrlSeqs {n = m} x' r' v' ps' where > ar : All Reachable mx' > ar = reachableSpec1 x r y > r' : Reachable x' > r' = allElemSpec0 x' mx' ar x'emx' > v' : Viable m x' > v' = allElemSpec0 x' mx' av x'emx' > ||| > morePossibleStateCtrlSeqs : {t, n : Nat} -> (mx : M (State t)) -> > All Reachable mx -> All (Viable n) mx -> > (ps : PolicySeq t n) -> M (StateCtrlSeq t n) > morePossibleStateCtrlSeqs {t} {n} mx ar av ps = bind (tagElem mx) f where > f : Sigma (State t) (\ x => x `Elem` mx) -> M (StateCtrlSeq t n) > f (MkSigma x xemx) = possibleStateCtrlSeqs x r v ps where > r : Reachable x > r = allElemSpec0 x mx ar xemx > v : Viable n x > v = allElemSpec0 x mx av xemx > ||| > possibleStateCtrlSeqsRewards : {t : Nat} -> {n : Nat} -> > (x : State t) -> (r : Reachable x) -> (v : Viable n x) -> > (ps : PolicySeq t n) -> M (StateCtrlSeq t n, Val) > possibleStateCtrlSeqsRewards {t} {n} x r v ps = > fmap (pair (id, valStateCtrlSeq t n)) (possibleStateCtrlSeqs {t} {n} x r v ps) > ||| > possibleStateCtrlSeqsRewards' : {t : Nat} -> {n : Nat} -> > M (StateCtrlSeq t n) -> M (StateCtrlSeq t n, Val) > possibleStateCtrlSeqsRewards' {t} {n} xys = fmap (pair (id, valStateCtrlSeq t n)) xys > ||| > possibleRewards : {t : Nat} -> {n : Nat} -> > (x : State t) -> (r : Reachable x) -> (v : Viable n x) -> > (ps : PolicySeq t n) -> M Val > possibleRewards {t} {n} x r v ps = > fmap (valStateCtrlSeq t n) (possibleStateCtrlSeqs {t} {n} x r v ps) > ||| > possibleRewards' : {t : Nat} -> {n : Nat} -> M (StateCtrlSeq t n) -> M Val > possibleRewards' {t} {n} xys = fmap (valStateCtrlSeq t n) xys
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!