java.lang.Object

org.tweetyproject.machinelearning.rl.mdp.MarkovDecisionProcess<S,A>

Type Parameters: S - the type of states this MDP uses; A - the type of actions this MDP uses

public class MarkovDecisionProcess<S extends State, A extends Action> extends Object

This class models a Markov Decision Process (MDP) with a fixed initial state and fixed terminal states. It can be used to represent reinforcement learning scenarios.

Author: Matthias Thimm

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| MarkovDecisionProcess(Collection<S> states, S initial_state, Collection<S> terminal_states, Collection<A> actions) | Creates a new Markov Decision Process with the given states and actions |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| double | expectedUtility(Policy<S,A> pi, int num_episodes, double gamma) | Approximates the expected utility of the given policy within this MDP using Monte Carlo search (which uses the given number of episodes) |
| Collection<A> | getActions() | Returns the actions of this MDP |
| double | getProb(S s, A a, S sp) | Returns the probability of the given transition. |
| double | getProbability(Episode<S,A> ep) | Returns the probability of the given episode |
| double | getReward(S s, A a, S sp) | Returns the reward of the given transition. |
| Collection<S> | getStates() | Returns the states of this MDP |
| double | getUtility(Episode<S,A> ep, double gamma) | Returns the utility of the given episode with the given discount factor |
| boolean | isTerminal(S s) | Checks whether the given state is terminal |
| boolean | isWellFormed() | Checks whether this MDP is well-formed, i.e. whether for every state and action, the probabilities of all successor states sum up to one. |
| void | putProb(S s, A a, S sp, double p) | Sets the transition probability from s to sp via a to p. |
| void | putReward(S s, A a, S sp, double r) | Sets the reward from s to sp via a to r. |
| S | sample(S s, A a) | Samples the next state for executing a in s (given the corresponding probabilities) |
| Episode<S,A> | sample(S s, Policy<S,A> pi) | Samples an episode with respect to the given policy |
| void | setSeed(long seed) | Sets the seed for the used random number generator. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- MarkovDecisionProcess
  
  public MarkovDecisionProcess(Collection<S> states, S initial_state, Collection<S> terminal_states, Collection<A> actions)
  
  Creates a new Markov Decision Process with the given states and actions
  
  Parameters:
  
  states - some states
  
  initial_state - initial state
  
  terminal_states - the terminal states
  
  actions - some actions
Method Details
- setSeed
  
  public void setSeed(long seed)
  
  Sets the seed for the used random number generator.
  
  Parameters:
  
  seed - some seed.
- getStates
  
  public Collection<S> getStates()
  
  Returns the states of this MDP
  
  Returns:
  
  the states of this MDP
- getActions
  
  public Collection<A> getActions()
  
  Returns the actions of this MDP
  
  Returns:
  
  the actions of this MDP
- isTerminal
  
  public boolean isTerminal(S s)
  
  Checks whether the given state is terminal
  
  Parameters:
  
  s - some state
  
  Returns:
  
  true iff the state is terminal
- isWellFormed
  
  public boolean isWellFormed()
  
  Checks whether this MDP is well-formed, i.e. whether for every state and action, the probabilities of all successor states sum up to one.
  
  Returns:
  
  true iff this MDP is well-formed
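  
  The well-formedness condition can be illustrated with a minimal, self-contained sketch. It does not use the TweetyProject classes: `WellFormedSketch`, its string-keyed map, and the tolerance of 1e-9 are assumptions made for illustration only, and the library's actual check may differ (for example in how it compares floating-point sums).
  
  ```java
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  // Hypothetical stand-in, not the TweetyProject class: transition probabilities
  // are stored as "s|a" -> (successor -> probability), with states and actions as strings.
  class WellFormedSketch {
      private final Map<String, Map<String, Double>> prob = new HashMap<>();

      void putProb(String s, String a, String sp, double p) {
          prob.computeIfAbsent(s + "|" + a, k -> new HashMap<>()).put(sp, p);
      }

      // True iff, for every state/action pair, the successor probabilities sum to one
      // (up to an assumed tolerance, since doubles rarely add up exactly).
      boolean isWellFormed(List<String> states, List<String> actions) {
          for (String s : states) {
              for (String a : actions) {
                  double sum = 0.0;
                  for (double p : prob.getOrDefault(s + "|" + a, Map.of()).values()) {
                      sum += p;
                  }
                  if (Math.abs(sum - 1.0) > 1e-9) {
                      return false;
                  }
              }
          }
          return true;
      }

      public static void main(String[] args) {
          WellFormedSketch m = new WellFormedSketch();
          List<String> states = List.of("s0", "s1");
          List<String> actions = List.of("go");
          m.putProb("s0", "go", "s0", 0.3);
          m.putProb("s0", "go", "s1", 0.7);
          m.putProb("s1", "go", "s1", 1.0);
          System.out.println(m.isWellFormed(states, actions)); // true
          m.putProb("s0", "go", "s1", 0.5); // row for (s0, go) now sums to 0.8
          System.out.println(m.isWellFormed(states, actions)); // false
      }
  }
  ```
  
  With the real class, the analogous workflow would be to fill in all transitions with putProb and then call isWellFormed() before sampling.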
- putProb
  
  public void putProb(S s, A a, S sp, double p)
  
  Sets the transition probability from s to sp via a to p.
  
  Parameters:
  
  s - some state
  
  a - some action
  
  sp - some state
  
  p - the probability
- getReward
  
  public double getReward(S s, A a, S sp)
  
  Returns the reward of the given transition.
  
  Parameters:
  
  s - some state
  
  a - some action
  
  sp - some state
  
  Returns:
  
  the reward of the transition s,a,sp
- getProb
  
  public double getProb(S s, A a, S sp)
  
  Returns the probability of the given transition.
  
  Parameters:
  
  s - some state
  
  a - some action
  
  sp - some state
  
  Returns:
  
  the probability of the transition s,a,sp
- putReward
  
  public void putReward(S s, A a, S sp, double r)
  
  Sets the reward from s to sp via a to r.
  
  Parameters:
  
  s - some state
  
  a - some action
  
  sp - some state
  
  r - the reward
- sample
  
  public S sample(S s, A a)
  
  Samples the next state for executing a in s (given the corresponding probabilities)
  
  Parameters:
  
  s - some state
  
  a - some action
  
  Returns:
  
  the sampled next state
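  
  One common way to sample a successor from given probabilities is to walk the cumulative distribution with a single uniform draw. The sketch below shows that technique under stated assumptions: `NextStateSamplerSketch` and its map-based representation are hypothetical, and the source does not say which sampling method the library uses internally. The seeded `java.util.Random` mirrors the role of setSeed.
  
  ```java
  import java.util.LinkedHashMap;
  import java.util.Map;
  import java.util.Random;

  // Hypothetical helper, not part of TweetyProject.
  class NextStateSamplerSketch {
      // Draws one key from a successor -> probability map by walking the cumulative distribution.
      static String sample(Map<String, Double> successors, Random rng) {
          double u = rng.nextDouble(); // uniform in [0, 1)
          double cumulative = 0.0;
          String last = null;
          for (Map.Entry<String, Double> e : successors.entrySet()) {
              cumulative += e.getValue();
              last = e.getKey();
              if (u < cumulative) {
                  return last;
              }
          }
          // Fallback for rounding: the probabilities may sum to slightly less than 1.
          return last;
      }

      public static void main(String[] args) {
          Map<String, Double> successors = new LinkedHashMap<>();
          successors.put("s1", 0.25);
          successors.put("s2", 0.75);
          // Two generators with the same seed yield the same sample.
          String first = sample(successors, new Random(42L));
          String second = sample(successors, new Random(42L));
          System.out.println(first.equals(second)); // true
      }
  }
  ```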
- sample
  
  public Episode<S,A> sample(S s, Policy<S,A> pi)
  
  Samples an episode with respect to the given policy
  
  Parameters:
  
  s - some initial state
  
  pi - a policy
  
  Returns:
  
  an episode
- getProbability
  
  public double getProbability(Episode<S,A> ep)
  
  Returns the probability of the given episode
  
  Parameters:
  
  ep - some episode
  
  Returns:
  
  the probability of the episode
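  
  A natural reading of "the probability of an episode" is the product of the transition probabilities along it. The sketch below computes that product. This is an assumption: the source does not spell out the formula (for instance, whether policy probabilities enter), and `EpisodeProbabilitySketch`, its `key` helper, and the triple-based episode format are hypothetical stand-ins for the library's Episode type.
  
  ```java
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  // Hypothetical helper, not part of TweetyProject.
  class EpisodeProbabilitySketch {
      static String key(String s, String a, String sp) {
          return s + "|" + a + "|" + sp;
      }

      // episode: list of {s, a, sp} triples; unknown transitions count as probability 0.
      static double probability(List<String[]> episode, Map<String, Double> prob) {
          double p = 1.0;
          for (String[] step : episode) {
              p *= prob.getOrDefault(key(step[0], step[1], step[2]), 0.0);
          }
          return p;
      }

      public static void main(String[] args) {
          Map<String, Double> prob = new HashMap<>();
          prob.put(key("s0", "go", "s1"), 0.5);
          prob.put(key("s1", "go", "s2"), 0.4);
          List<String[]> episode = List.of(
              new String[] {"s0", "go", "s1"},
              new String[] {"s1", "go", "s2"});
          System.out.println(probability(episode, prob)); // prints 0.2
      }
  }
  ```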
- getUtility
  
  public double getUtility(Episode<S,A> ep, double gamma)
  
  Returns the utility of the given episode with the given discount factor
  
  Parameters:
  
  ep - some episode
  
  gamma - some discount factor
  
  Returns:
  
  the utility of the episode
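  
  Under the standard definition of discounted return, the utility of an episode with rewards r_0, r_1, ..., r_T is the sum of gamma^t * r_t. The sketch below implements that definition. It is an assumption that the library follows the same convention (for example, that the first reward is undiscounted), and `DiscountedUtilitySketch` is a hypothetical name.
  
  ```java
  import java.util.List;

  // Hypothetical helper, not part of TweetyProject.
  class DiscountedUtilitySketch {
      // Sums the rewards, weighting the t-th reward by gamma^t.
      static double utility(List<Double> rewards, double gamma) {
          double total = 0.0;
          double discount = 1.0; // gamma^0
          for (double r : rewards) {
              total += discount * r;
              discount *= gamma;
          }
          return total;
      }

      public static void main(String[] args) {
          // 1.0 * 1 + 0.5 * 2 + 0.25 * 4 = 3.0
          System.out.println(utility(List.of(1.0, 2.0, 4.0), 0.5)); // prints 3.0
      }
  }
  ```
  
  With gamma = 1 the utility is the plain sum of rewards, and smaller values of gamma weight early rewards more heavily.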
- expectedUtility
  
  public double expectedUtility(Policy<S,A> pi, int num_episodes, double gamma)
  
  Approximates the expected utility of the given policy within this MDP using Monte Carlo search (which uses the given number of episodes)
  
  Parameters:
  
  pi - some policy
  
  num_episodes - the number of episodes
  
  gamma - the discount factor for the utility
  
  Returns:
  
  the expected utility of the policy (approximated)
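  
  The Monte Carlo idea is to sample many episodes, compute the discounted utility of each, and average the results. The sketch below does this for a toy process rather than for a real MarkovDecisionProcess instance. The toy process, the class `MonteCarloUtilitySketch`, and the explicit seed parameter are all assumptions for illustration: each step reaches the terminal state with probability pGoal and reward 1, and otherwise stays put with reward 0, so the exact expected utility is pGoal / (1 - (1 - pGoal) * gamma).
  
  ```java
  import java.util.Random;

  // Hypothetical toy example, not part of TweetyProject.
  class MonteCarloUtilitySketch {
      // Samples one episode and returns its discounted utility.
      // Assumes pGoal > 0, otherwise the episode would never terminate.
      static double sampleEpisodeUtility(double pGoal, double gamma, Random rng) {
          double discount = 1.0;
          while (rng.nextDouble() >= pGoal) {
              discount *= gamma; // a step with reward 0 that does not reach the goal
          }
          return discount; // reward 1 on the final step, weighted by gamma^t
      }

      // Averages the utilities of numEpisodes sampled episodes.
      static double expectedUtility(double pGoal, int numEpisodes, double gamma, long seed) {
          Random rng = new Random(seed);
          double sum = 0.0;
          for (int i = 0; i < numEpisodes; i++) {
              sum += sampleEpisodeUtility(pGoal, gamma, rng);
          }
          return sum / numEpisodes;
      }

      public static void main(String[] args) {
          double estimate = expectedUtility(0.5, 100_000, 0.9, 42L);
          double exact = 0.5 / (1.0 - 0.5 * 0.9);
          System.out.println("estimate = " + estimate + ", exact = " + exact);
      }
  }
  ```
  
  The estimate gets closer to the exact value as the number of episodes grows, which is the trade-off behind the num_episodes parameter.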
