Commit c1f2186a authored by Nicola Botta's avatar Nicola Botta
Browse files

Merge branch 'master' of gitlab.pik-potsdam.de:botta/papers

parents 11a4dd0c a09e0e8f
......@@ -15,7 +15,7 @@ supplementary material to the paper, see [README](code/README.md).
discussed in [1], together with code for computing optimal
extensions. Requires [2] to type-check, see [README](lwtheory/README.md).
The files have been type-checked with Idris 1.3.3-git:ae98085b8 and
The files have been type-checked with Idris 1.3.2, Idris 1.3.3-git:ae98085b8 and
IdrisLibs v1.3.3. If you encounter any issues, please get in touch with
Nuria Brede (nubrede@pik-potsdam.de) or Nicola Botta
(botta@pik-potsdam.de).
......@@ -33,7 +33,9 @@ under grant agreement No 820970.
* 2020-07-14: Submitted to the Journal of Functional Programming (JFP)
* 2020-11-02: Decision: Major revision
* 2020-02-23: Submitted revised manuscript
* 2021-02-23: Submitted revised manuscript (R1)
* 2021-05-26: Decision: Major revision
* 2021-07-22: Submitted revised manuscript (R2)
---
......@@ -45,4 +47,4 @@ under grant agreement No 820970.
[3] Botta, N., Jansson, P., Ionescu, C. (2017).
Contributions to a computational theory of policy advice and avoidability. J. Funct. Program. 27:e23.
[Contributions to a computational theory of policy advice and avoidability](https://doi.org/10.1017/S0956796817000156). J. Funct. Program. 27:e23.
......@@ -3,7 +3,6 @@
> import Syntax.PreorderReasoning
> import Equality
> import Sigma
> %default total
> %auto_implicits off
......@@ -340,7 +339,7 @@ The type of state-control sequences:
> data StateCtrlSeq : (t, n : Nat) -> Type where
> Last : {t : Nat} -> X t -> StateCtrlSeq t (S Z)
> (:::) : {t, n : Nat} -> Sigma (X t) (Y t) ->
> (:::) : {t, n : Nat} -> (x : X t ** Y t x) ->
> StateCtrlSeq (S t) (S n) -> StateCtrlSeq t (S (S n))
......@@ -348,14 +347,14 @@ A function that extracts the head state from a state-control sequence:
> head : {t, n : Nat} -> StateCtrlSeq t (S n) -> X t
> head (Last x) = x
> head (MkSigma x y ::: xys) = x
> head ((x ** y) ::: xys) = x
The reward accumulated along a state-control sequence:
> sumR : {t, n : Nat} -> StateCtrlSeq t n -> Val
> sumR {t} (Last x) = zero
> sumR {t} (MkSigma x y ::: xys) = reward t x y (head xys) <+> sumR xys
> sumR {t} ((x ** y) ::: xys) = reward t x y (head xys) <+> sumR xys
Computation of an |M|-structure of trajectories from a given policy sequence:
......@@ -364,7 +363,7 @@ Computation of an |M|-structure of trajectories from a given policy sequence:
> trj {t} Nil x = (pure . Last) x
> trj {t} (p :: ps) x = let y = p x in
> let mx' = next t x y in
> map ((MkSigma x y) :::) (mx' >>= trj ps)
> map ((x ** y) :::) (mx' >>= trj ps)
The function that computes the measured total reward:
......@@ -632,7 +631,7 @@ The base case:
> let mx' : M (X (S t)) = next t x y in
> let r : (X (S t) -> Val) = reward t x y in
> let trjps : (X (S t) -> M SCS) = trj ps in
> let consxy : (SCS -> SCS') = ((MkSigma x y) :::) in
> let consxy : (SCS -> SCS') = ((x ** y) :::) in
> let mx'trjps : M SCS = (mx' >>= trjps) in
> let sR : (SCS -> Val) = sumR {t=S t} {n=S m} in
> let hd : (SCS -> X (S t)) = head {t=S t} {n=m} in
......@@ -681,7 +680,7 @@ The base case:
Now we can prove the soundness of monadic backward induction as corollary, using a
Now we can prove the correctness of monadic backward induction as corollary, using a
generalized optimality of policy sequences predicate:
> GenOptPolicySeq : {t, n : Nat} ->
......@@ -733,7 +732,7 @@ generalized optimality of policy sequences predicate:
> let y = p x in
> let trjps = trj ps in
> let nxpx = next t x y in
> let consxy = ((MkSigma x y) :::) in
> let consxy = ((x ** y) :::) in
> let nne = nextNotEmpty x y in
> let netrjps = trjNotEmptyLemma ps in
> let bne = bindPresNotEmpty trjps nxpx nne netrjps in
......@@ -772,7 +771,7 @@ generalized optimality of policy sequences predicate:
> let y' = p x' in
> let mx'' = next t x' y' in
> let SCS = (StateCtrlSeq (S t) (S n)) in
> let consx'y' = (:::) {t} {n} (MkSigma x' y') in
> let consx'y' = (:::) {t} {n} (x' ** y') in
> let mx''trjps = (>>=) {B=SCS} mx'' (trj ps) in
> let traj = trj {t} in
> let useMapPresComp = mapPresComp consx'y' (const (r x') <++> s) mx''trjps in
......
......@@ -8,42 +8,55 @@ type-checking can be found below.
###### Framework and proofs
* [Framework](Framework.lidr): Contains the lightweight version of the BJI-framework and its extension (as described in section 3 and 4), the proof of the correctness result from (section 6), and proofs of the auxiliary results (section 6 and appendix).
* [Framework](Framework.lidr): Contains the lightweight version of the
BJI-framework and its extension (as described in section 3 and 4),
the proof of the correctness result from (section 6), and proofs of the
auxiliary results (section 6 and appendix).
###### Examples and proofs for specific instances
* [List](List.lidr): Contains proofs for a non-deterministic instance of the framework in which the monad `M` is instantiated with `List`.
* [List](List.lidr): Contains proofs for a non-deterministic
instance of the framework in which the monad `M` is instantiated
with `List`.
* [Id](Id.lidr): Contains proofs for a deterministic
instance of the framework in which the monad `M` is instantiated as identity functor.
instance of the framework in which the monad `M` is instantiated
as identity functor.
* [MaxList](MaxList.lidr): Contains the proofs for an example measure that fulfills all three requirements.
* [MaxList](MaxList.lidr): Contains the proofs for an example measure
that fulfills all three requirements.
* [ListMeasures](ListMeasures.lidr): Contains the proofs for the abstract `List` measures discussed in section 5.3.
* [ListMeasures](ListMeasures.lidr): Contains the proofs for the
abstract |List| measures discussed in section 5.3.
* [GoodBadExample](GoodBadExample.lidr): Contains the formalization of the example of section 2.1.
* [GoodBadExample](GoodBadExample.lidr): Contains the formalization of
the example of section 2.1.
* [SchedulingExample](SchedulingExample.lidr): Contains the formalization of the example of section 2.2.
* [SchedulingExampleVariant1](SchedulingExampleVariant1.lidr): Contains a
simple possible formalization of the example of section 2.2.
* [GoodBadCounterExamples](GoodBadCounterExamples.lidr): Contains some of the counter-examples to `valMeasTotalReward` of section 5.2.
* [SchedulingExampleVariant2](SchedulingExampleVariant2.lidr): Contains a
possible formalization of the example of section 2.2 that makes more use
of type dependency.
* [GoodBadCounterExamples](GoodBadCounterExamples.lidr): Contains some of the
counter-examples to valMeasTotalReward of section 5.2.
###### Auxiliary Definitions
* [Equality](Equality.lidr): Definition of extensional equality as in [2].
* [Sigma](Sigma.lidr): Definition of Sigma types as in [2].
#### Type-Checking Instructions
The following has been successfully tested using Idris version 1.3.2:
* The source files can be typechecked all at once with
`idris --checkpkg code.ipkg`
`idris --checkpkg valval.ipkg`
or built with
`idris --build code.ipkg`
`idris --build valval.ipkg`
* Individual files can be loaded into the Idris REPL using
......@@ -51,7 +64,7 @@ The following has been successfully tested using Idris version 1.3.2:
* Generated intermediate code and executables can be deleted with
`idris --clean code.ipkg`
`idris --clean valval.ipkg`
## References
......@@ -61,5 +74,5 @@ Backward Induction. *submitted to J. Funct. Program*.
[2] Botta, N. (2016-2021). [IdrisLibs](https://gitlab.pik-potsdam.de/botta/IdrisLibs).
[3] Botta, N., Jansson, P., Ionescu, C. (2017). Contributions to a computational theory of policy
advice and avoidability. J. Funct. Program. 27:e23.
[3] Botta, N., Jansson, P., Ionescu, C. (2017). [Contributions to a computational theory of policy
advice and avoidability](https://doi.org/10.1017/S0956796817000156). J. Funct. Program. 27:e23.
> module SchedulingExampleVariant1
> import Data.Fin
> import Equality
> import Framework
> import Id
> %default total
> %auto_implicits off
> %access public export
The following example is the deterministic SDP from [Bertsekas1995, ch.1]
described in section 2.2 of [1].
For this example we make use of the possible dependencies
for state and control spaces in the BJI-framework.
As type of states at time steps $0 < t < 4$
we use could the canonical finite set of size |2 * t|:
< Framework.X (S (S (S (S _)))) = Void
< Framework.X Z = Unit
< Framework.X t = Fin (2 * (S t))
but we stay more closely to the diagram in figure 1b and define
> data State1 = A | C
> data State2 = AB | AC | CA | CD
> data State3 = ABC | ACB | ACD | CAB | CAD | CDA
> Framework.X Z = Unit
> Framework.X (S Z) = State1
> Framework.X (S (S Z)) = State2
> Framework.X (S (S (S Z))) = State3
> Framework.X _ = Void
As |A| and |C| are already the names of constructors,
we use |AA| etc. for possible actions:
> data Act = AA | BB | CC | DD
> data Ctrl : Act -> Type where
> Do : (a : Act) -> Ctrl a
The type of controls at each decision step and state is defined by
> Framework.Y Z _ = Either (Ctrl AA) (Ctrl CC)
> Framework.Y (S Z) A = Either (Ctrl BB) (Ctrl CC)
> Framework.Y (S Z) C = Either (Ctrl AA) (Ctrl DD)
> Framework.Y (S (S Z)) AB = Ctrl CC
> Framework.Y (S (S Z)) AC = Either (Ctrl BB) (Ctrl DD)
> Framework.Y (S (S Z)) CA = Either (Ctrl BB) (Ctrl DD)
> Framework.Y (S (S Z)) CD = Ctrl AA
> Framework.Y _ _ = Void
The SDP is deterministic, thus
< Framework.M = Id
As values we use natural numbers with their familiar structure
> Framework.Val = Nat
> Framework.zero = 0
> Framework.(<+>) = (+)
> Framework.meas = id
The transition function following fig. 1b is then defined by:
> Framework.next Z () (Left (Do AA)) = A
> Framework.next Z () (Right (Do CC)) = C
> Framework.next (S Z) A (Left (Do BB)) = AB
> Framework.next (S Z) A (Right (Do CC)) = AC
> Framework.next (S Z) C (Left (Do AA)) = CA
> Framework.next (S Z) C (Right (Do DD)) = CD
> Framework.next (S (S Z)) AB (Do CC) = ABC
> Framework.next (S (S Z)) AC (Left (Do BB)) = ACB
> Framework.next (S (S Z)) AC (Right (Do DD)) = ACD
> Framework.next (S (S Z)) CA (Left (Do BB)) = CAB
> Framework.next (S (S Z)) CA (Right (Do DD)) = CAD
> Framework.next (S (S Z)) CD (Do AA) = CDA
>
> Framework.next (S (S (S _))) _ _ impossible
and the reward (or cost) function by:
> Framework.reward Z () (Left (Do AA)) A = 5
> Framework.reward Z () (Right (Do CC)) C = 3
> Framework.reward (S Z) A (Left (Do BB)) AB = 2
> Framework.reward (S Z) A (Right (Do CC)) AC = 3
> Framework.reward (S Z) C (Left (Do AA)) CA = 4
> Framework.reward (S Z) C (Right (Do DD)) CD = 6
> Framework.reward (S (S Z)) AB (Do CC) ABC = 3
> Framework.reward (S (S Z)) AC (Left (Do BB)) ACB = 4
> Framework.reward (S (S Z)) AC (Right (Do DD)) ACD = 6
> Framework.reward (S (S Z)) CA (Left (Do BB)) CAB = 2
> Framework.reward (S (S Z)) CA (Right (Do DD)) CAD = 4
> Framework.reward (S (S Z)) CD (Do AA) CDA = 3
>
> Framework.reward Z _ _ _ = 0
> Framework.reward (S Z) _ _ _ = 0
> Framework.reward (S (S Z)) _ _ _ = 0
> Framework.reward (S (S (S _))) _ _ _ impossible
Some example policies:
> p0A : Framework.Policy 0
> p0A () = Left (Do AA)
> p0C : Framework.Policy 0
> p0C () = Right (Do CC)
> p1 : Framework.Policy 1
> p1 A = Left (Do BB)
> p1 C = Left (Do AA)
> p2 : Framework.Policy 2
> p2 AB = Do CC
> p2 AC = Left (Do BB)
> p2 CA = Left (Do BB)
> p2 CD = Do AA
> psA : Framework.PolicySeq 0 3
> psA = (p0A :: (p1 :: (p2 :: Nil)))
> psC : Framework.PolicySeq 0 3
> psC = (p0C :: (p1 :: (p2 :: Nil)))
> testA : Type
> testA = val psA () = val' psA () -- 10 = 10
> testC : Type
> testC = val psC () = val' psC () -- 9 = 9
> module SchedulingExampleVariant2
> import Data.Fin
> import Data.Vect
> import Equality
> import Framework
> import Id
> %default total
> %auto_implicits off
> %access public export
The following example is the deterministic SDP from [Bertsekas1995, Ch.1]
described in Section 2.2 of [1].
For this example we make use of the possible dependencies
for state and control spaces in the BJI-framework.
The implementation leverages on the fact that controls represent
possible actions and states admissible sequences of actions.
We first define a type of actions
> data Act = A | B | C | D
> Eq Act where
> (==) A A = True
> (==) B B = True
> (==) C C = True
> (==) D D = True
> (==) _x _y = False
To capture the order in which actions can be applied, we first define the
dependencies of each action, i.e. |A| and |C| have no dependencies while
|B| and |D| can only be performed when |A| or |C|, respectively, have
already been done.
> dependencies : Act -> List Act
> dependencies A = []
> dependencies B = [A]
> dependencies C = []
> dependencies D = [C]
To check whether all depdencies of a certain action have been fulfilled, we
use a more generic test which checks whether all occurrences from a list
occur in some vector:
> hasAll : {A : Type} -> Eq A => List A -> {n : Nat} -> Vect n A -> Bool
> hasAll [] _l = True
> hasAll _l [] = False
> hasAll (a :: as) l = elem a l && hasAll as l
Whether some vector represents an admissible state is captured by a type family
that evaluates to the one element type |Unit| if the vector is admissible and
to |Void| if not.
> isAdmissible : {t : Nat} -> Vect t Act -> Bool
> isAdmissible [] = True
> isAdmissible {t=S (S (S (S t)))} _ = False
> isAdmissible (a :: as) = hasAll (dependencies a) as &&
> not (elem a as) &&
> isAdmissible as
We can lift Boolean predicates to type valued predicates
> toType : {A : Type} -> (A -> Bool) -> A -> Type
> toType P a = P a = True
and define
> AdmissibleState : {t : Nat} -> Vect t Act -> Type
> AdmissibleState = toType isAdmissible
Similarly we could use the type |So| from the standard library to lift
Boolean predicates. We choose to use lifted Boolean-valued predicates
instead of proof-relevant type-valued predicates as we do not need
computational content, but just a decision whether a state is admissible
or not.
Now a state is simply a pair of a vector of actions and a proof that
it represents an admissible state.
> State : (t : Nat) -> Type
> State t = (as : Vect t Act ** AdmissibleState as)
Admissible controls for a given state are actions that lead to an
admissible next state:
> AdmissibleControl : {t : Nat} -> State t -> Act -> Type
> AdmissibleControl (as ** prf) a = AdmissibleState (a :: as)
And controls are dpendent pairs of an action and a proof that it is
admissible.
> Control : (t : Nat) -> State t -> Type
> Control t x = (a : Act ** AdmissibleControl x a)
We instantiate the framework components accordingly:
> Framework.X t = State t
> Framework.Y t x = Control t x
The SDP is deterministic, thus
< Framework.M = Id
As values we use natural numbers with their familiar structure
> Framework.Val = Nat
> Framework.zero = 0
> Framework.(<+>) = (+)
> Framework.meas = id
The transition function following Fig. 1b can then simply be defined by:
> Framework.next t (as ** _pf) (act ** pf) = MkDPair (act :: as) pf
The reward (or cost) function uses a helper function that defines the setup
costs for certain combinations of actions. Here we omit restricting to the
possible combinations of actions via type dependencies and just use a
dummy value for combinations that will not occur.
> setupCost : Act -> Act -> Nat
> setupCost A B = 2
> setupCost A C = 3
> setupCost A D = 4
> setupCost B C = 3
> setupCost C A = 4
> setupCost C B = 4
> setupCost C D = 6
> setupCost D A = 3
> setupCost _ _ = 100
> Framework.reward Z _x (A ** _pf) _x' = 5
> Framework.reward Z _x (C ** _pf) _x' = 3
> Framework.reward Z ([] ** Refl) (B ** pf) _x' = absurd pf
> Framework.reward Z ([] ** Refl) (D ** pf) _x' = absurd pf
> Framework.reward (S t) x y _x' = setupCost (head (fst x)) (fst y)
Now we can define some example policies.
Note that ideally Idris' totality checker would know the absurd cases by itself
so that they could be omitted. However this is not yet the case - thus we have to
construct contradictions explicitly using |absurd|.
> p0A : Framework.Policy 0
> p0A ([] ** Refl) = MkDPair A Refl
> p0C : Framework.Policy 0
> p0C ([] ** Refl) = MkDPair C Refl
> p1 : Framework.Policy 1
> p1 ([A] ** Refl) = (B ** Refl)
> p1 ([C] ** Refl) = (A ** Refl)
> p1 ([B] ** pf) = absurd pf
> p1 ([D] ** pf) = absurd pf
> p2 : Framework.Policy 2
> p2 ([B, A] ** Refl) = (C ** Refl)
> p2 ([A, C] ** Refl) = (B ** Refl)
> p2 ([C, A] ** Refl) = (B ** Refl)
> p2 ([D, C] ** Refl) = (A ** Refl)
> p2 ([A, A] ** pf) = absurd pf
> p2 ([A, B] ** pf) = absurd pf
> p2 ([A, D] ** pf) = absurd pf
> p2 ([B, B] ** pf) = absurd pf
> p2 ([B, C] ** pf) = absurd pf
> p2 ([B, D] ** pf) = absurd pf
> p2 ([C, B] ** pf) = absurd pf
> p2 ([C, C] ** pf) = absurd pf
> p2 ([C, D] ** pf) = absurd pf
> p2 ([D, A] ** pf) = absurd pf
> p2 ([D, B] ** pf) = absurd pf
> p2 ([D, D] ** pf) = absurd pf
.. and policy sequences
> psA : Framework.PolicySeq 0 3
> psA = (p0A :: (p1 :: (p2 :: Nil)))
> psC : Framework.PolicySeq 0 3
> psC = (p0C :: (p1 :: (p2 :: Nil)))
The initial state is
> initial : State 0
> initial = ([] ** Refl)
And we can compare the behaviour of |val| and |val'| for
our example policy sequences:
> testA : Type
> testA = val psA initial = val' psA initial -- 10 = 10
> testC : Type
> testC = val psC initial = val' psC initial -- 9 = 9
> module Sigma
> %default total
> %auto_implicits off
> %access public export
> data Sigma : (A : Type) -> (P : A -> Type) -> Type where
> MkSigma : {A : Type} -> .{P : A -> Type} -> (x : A) -> (pf : P x) -> Sigma A P
package valval
modules= Equality
, Framework
, List
, Id
, ListMeasures
, MaxList
, GoodBadExample
, SchedulingExampleVariant1
, SchedulingExampleVariant2
, GoodBadCounterExamples
opts="--allow-capitalized-pattern-variables"
......@@ -4,8 +4,6 @@
> import Syntax.PreorderReasoning
> import Sigma.Sigma
> %default total
> %auto_implicits off
> %access public export
......@@ -132,22 +130,22 @@ The type of state-control sequences:
> data StateCtrlSeq : (t, n : Nat) -> Type where
> Last : {t : Nat} -> X t -> StateCtrlSeq t (S Z)
> (:::) : {t, n : Nat} -> Sigma (X t) (Y t) ->
> (:::) : {t, n : Nat} -> (x : X t ** Y t x) ->
> StateCtrlSeq (S t) (S n) -> StateCtrlSeq t (S (S n))
A function that extracts the head state from a state-control sequence:
> head : {t, n : Nat} -> StateCtrlSeq t (S n) -> X t
> head (Last x) = x
> head (MkSigma x y ::: xys) = x
> head (Last x) = x
> head ((x ** y) ::: xys) = x
The reward accumulated along a state-control sequence:
> sumR : {t, n : Nat} -> StateCtrlSeq t n -> Val
> sumR {t} (Last x) = zero
> sumR {t} (MkSigma x y ::: xys) = reward t x y (head xys) <+> sumR xys
> sumR {t} (Last x) = zero
> sumR {t} ((x ** y) ::: xys) = reward t x y (head xys) <+> sumR xys