Class MarkovDecisionProcess<S extends State,A extends Action>

java.lang.Object
org.tweetyproject.machinelearning.rl.mdp.MarkovDecisionProcess<S,A>
Type Parameters:
S - The type of states this MDP uses
A - The type of actions this MDP uses

public class MarkovDecisionProcess<S extends State,A extends Action> extends Object
This class models a Markov Decision Process (MDP) with a fixed initial state and a fixed set of terminal states, which can be used to represent reinforcement learning scenarios.
Author:
Matthias Thimm
  • Constructor Summary

    Constructors
    Constructor
    Description
    MarkovDecisionProcess(Collection<S> states, S initial_state, Collection<S> terminal_states, Collection<A> actions)
    Creates a new Markov Decision Process with the given states and actions
  • Method Summary

    Modifier and Type
    Method
    Description
    double
    expectedUtility(Policy<S,A> pi, int num_episodes, double gamma)
    Approximates the expected utility of the given policy within this MDP using Monte Carlo search (which uses the given number of episodes)
    Collection<A>
    getActions()
    Returns the actions of this MDP
    double
    getProb(S s, A a, S sp)
    Returns the probability of the given transition.
    double
    getProbability(Episode<S,A> ep)
    Returns the probability of the given episode
    double
    getReward(S s, A a, S sp)
    Returns the reward of the given transition.
    Collection<S>
    getStates()
    Returns the states of this MDP
    double
    getUtility(Episode<S,A> ep, double gamma)
    Returns the utility of the given episode with the given discount factor
    boolean
    isTerminal(S s)
    Checks whether the given state is terminal
    boolean
    isWellFormed()
    Checks whether this MDP is well-formed, i.e. whether for every state and action, the probabilities of all successor states sum up to one.
    void
    putProb(S s, A a, S sp, double p)
    Sets the transition probability from s to sp via a to p.
    void
    putReward(S s, A a, S sp, double r)
    Sets the reward of the transition from s to sp via a to r.
    S
    sample(S s, A a)
    Samples the next state for executing a in s (given the corresponding probabilities)
    Episode<S,A>
    sample(S s, Policy<S,A> pi)
    Samples an episode wrt. the given policy
    void
    setSeed(long seed)
    Sets the seed for the used random number generator.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • MarkovDecisionProcess

      public MarkovDecisionProcess(Collection<S> states, S initial_state, Collection<S> terminal_states, Collection<A> actions)
      Creates a new Markov Decision Process with the given states and actions
      Parameters:
      states - some states
      initial_state - initial state
      terminal_states - some terminal states
      actions - some actions
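      A minimal construction sketch (the MyState/MyAction stubs are hypothetical illustrations; if the State and Action interfaces declare methods in your version, the stubs must implement them):

        import java.util.Arrays;
        import java.util.Collection;

        import org.tweetyproject.machinelearning.rl.mdp.MarkovDecisionProcess;
        // State and Action are assumed to reside in the same package; adjust if not.
        import org.tweetyproject.machinelearning.rl.mdp.State;
        import org.tweetyproject.machinelearning.rl.mdp.Action;

        // Hypothetical minimal State/Action implementations, for illustration only.
        class MyState implements State {
            private final String name;
            MyState(String name) { this.name = name; }
            @Override public String toString() { return this.name; }
        }
        class MyAction implements Action {
            private final String name;
            MyAction(String name) { this.name = name; }
            @Override public String toString() { return this.name; }
        }

        public class MdpExample {
            public static void main(String[] args) {
                MyState s0 = new MyState("s0"), s1 = new MyState("s1");
                MyAction go = new MyAction("go");
                Collection<MyState> states = Arrays.asList(s0, s1);
                // states, initial state s0, terminal state s1, single action "go"
                MarkovDecisionProcess<MyState, MyAction> mdp =
                        new MarkovDecisionProcess<>(states, s0, Arrays.asList(s1), Arrays.asList(go));
                mdp.setSeed(42); // makes all subsequent sampling reproducible
            }
        }

      The snippets in the Method Details below continue this sketch and assume s0, s1, go and mdp are in scope.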
  • Method Details

    • setSeed

      public void setSeed(long seed)
      Sets the seed for the used random number generator.
      Parameters:
      seed - some seed.
    • getStates

      public Collection<S> getStates()
      Returns the states of this MDP
      Returns:
      the states of this MDP
    • getActions

      public Collection<A> getActions()
      Returns the actions of this MDP
      Returns:
      the actions of this MDP
    • isTerminal

      public boolean isTerminal(S s)
      Checks whether the given state is terminal
      Parameters:
      s - some state
      Returns:
      true iff the state is terminal
    • isWellFormed

      public boolean isWellFormed()
      Checks whether this MDP is well-formed, i.e. whether for every state and action, the probabilities of all successor states sum up to one.
      Returns:
      true iff this MDP is well-formed
    • putProb

      public void putProb(S s, A a, S sp, double p)
      Sets the transition probability from s to sp via a to p.
      Parameters:
      s - some state
      a - some action
      sp - some state
      p - the probability
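      Continuing the construction sketch from the constructor section, the transition function is populated per (s, a, sp) triple; the MDP is well-formed once the successor probabilities of each state/action pair sum up to one:

        // From s0, action "go" reaches terminal state s1 with probability
        // 0.9 and stays in s0 with probability 0.1.
        mdp.putProb(s0, go, s1, 0.9);
        mdp.putProb(s0, go, s0, 0.1);
        // Expected to print true once all probabilities are in place
        // (whether terminal states also need outgoing probabilities is
        // not stated on this page).
        System.out.println(mdp.isWellFormed());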
    • getReward

      public double getReward(S s, A a, S sp)
      Returns the reward of the given transition.
      Parameters:
      s - some state
      a - some action
      sp - some state
      Returns:
      the reward of the transition s,a,sp
    • getProb

      public double getProb(S s, A a, S sp)
      Returns the probability of the given transition.
      Parameters:
      s - some state
      a - some action
      sp - some state
      Returns:
      the probability of the transition s,a,sp
    • putReward

      public void putReward(S s, A a, S sp, double r)
      Sets the reward of the transition from s to sp via a to r.
      Parameters:
      s - some state
      a - some action
      sp - some state
      r - the reward
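      Rewards are set analogously, continuing the same sketch (that unset rewards default to 0 is an assumption):

        // Reaching the terminal state s1 yields reward 1; staying in s0 yields 0.
        mdp.putReward(s0, go, s1, 1.0);
        mdp.putReward(s0, go, s0, 0.0);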
    • sample

      public S sample(S s, A a)
      Samples the next state for executing a in s (given the corresponding probabilities)
      Parameters:
      s - some state
      a - some action
      Returns:
      the sampled next state
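      A quick sanity check of single-step sampling under the probabilities set above:

        // Roughly 90% of sampled successors of (s0, go) should be s1.
        int hits = 0;
        for (int i = 0; i < 1000; i++)
            if (mdp.sample(s0, go) == s1) // reference comparison suffices here
                hits++;
        System.out.println(hits); // expect a value near 900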
    • sample

      public Episode<S,A> sample(S s, Policy<S,A> pi)
      Samples an episode wrt. the given policy
      Parameters:
      s - some initial state
      pi - a policy
      Returns:
      an episode
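      A sketch of episode sampling. The Policy interface is not documented on this page; the lambda below assumes it is a functional interface from states to actions and must be replaced by an explicit implementation if it is not:

        // Hypothetical policy that always chooses "go".
        Policy<MyState, MyAction> pi = s -> go;
        Episode<MyState, MyAction> ep = mdp.sample(s0, pi);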
    • getProbability

      public double getProbability(Episode<S,A> ep)
      Returns the probability of the given episode
      Parameters:
      ep - some episode
      Returns:
      the probability of the episode
    • getUtility

      public double getUtility(Episode<S,A> ep, double gamma)
      Returns the utility of the given episode with the given discount factor
      Parameters:
      ep - some episode
      gamma - some discount factor
      Returns:
      the utility of the episode
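      Evaluating a sampled episode, continuing the sketch. The standard readings (utility as the discounted sum of rewards along the episode, probability as the product of its transition probabilities) are assumptions here, not confirmed by this page:

        Episode<MyState, MyAction> ep = mdp.sample(s0, pi);
        System.out.println(mdp.getProbability(ep)); // chance of this exact episode
        System.out.println(mdp.getUtility(ep, 0.9)); // discounted return with gamma = 0.9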
    • expectedUtility

      public double expectedUtility(Policy<S,A> pi, int num_episodes, double gamma)
      Approximates the expected utility of the given policy within this MDP using Monte Carlo search (which uses the given number of episodes)
      Parameters:
      pi - some policy
      num_episodes - number of episodes
      gamma - the discount factor for the utility
      Returns:
      the expected utility of the policy (approximated)
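      Putting the sketch together, the expected utility of a policy can be estimated without enumerating episodes explicitly:

        // Average discounted utility over 10000 sampled episodes;
        // a larger num_episodes yields a tighter estimate.
        double eu = mdp.expectedUtility(pi, 10000, 0.9);
        System.out.println(eu);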