OmniSafe Model-based Planner#

ARC Planner#

Documentation

class omnisafe.algorithms.model_based.planner.ARCPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Actor Regularized Control (ARC) algorithm.

References

  • Title: Learning Off-Policy with Online Planning

  • Authors: Harshit Sikchi, Wenxuan Zhou, David Held.

  • URL: ARC

Initialize the planner of the Actor Regularized Control (ARC) algorithm.
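
The constructor signature above is shared by every planner in this module. As a rough illustration of what the documented arguments represent, here is a minimal, hypothetical skeleton; it is not the OmniSafe class, and the planner_cfgs keys shown are illustrative placeholders rather than the real config names.

```python
import torch


# Illustrative skeleton only -- it mirrors the documented constructor arguments
# so their roles are easy to see, but it is not omnisafe's ARCPlanner.
class ToyPlanner:
    def __init__(self, dynamics, planner_cfgs, gamma, cost_gamma,
                 dynamics_state_shape, action_shape, action_max, action_min, device):
        self._dynamics = dynamics              # learned model used to roll out candidate plans
        self._cfgs = planner_cfgs              # horizon, sample counts, iterations, ...
        self._gamma = gamma                    # reward discount for imagined returns
        self._cost_gamma = cost_gamma          # cost discount for imagined cost returns
        self._state_shape = dynamics_state_shape
        self._action_shape = action_shape
        self._action_max = action_max          # per-dimension upper bound on sampled actions
        self._action_min = action_min          # per-dimension lower bound on sampled actions
        self._device = torch.device(device)


planner = ToyPlanner(
    dynamics=None,  # stands in for a learned (ensemble) dynamics model
    planner_cfgs={"plan_horizon": 7, "num_samples": 100, "num_elites": 10},  # hypothetical keys
    gamma=0.99,
    cost_gamma=0.99,
    dynamics_state_shape=(60,),
    action_shape=(2,),
    action_max=1.0,
    action_min=-1.0,
    device="cpu",
)
```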

_act_from_actor(state)[source]#

Sample actions from the actor.

Parameters:

state (torch.Tensor) – The current state.

Returns:

sampled actions – Sampled actions from the actor.

Return type:

Tensor
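
As a minimal sketch of actor-guided candidate generation, the snippet below draws a batch of action sequences from a simple Gaussian actor queried on the current state. The actor class, its architecture, and the fact that it is queried only on the current state (rather than along the imagined trajectory) are simplifying assumptions, not OmniSafe's implementation.

```python
import torch
import torch.nn as nn


class GaussianActor(nn.Module):
    """Hypothetical Gaussian actor used only to make the sketch runnable."""

    def __init__(self, obs_dim: int, act_dim: int) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 2 * act_dim))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        mean, log_std = self.net(obs).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())


def act_from_actor(actor: GaussianActor, state: torch.Tensor, horizon: int, num_samples: int) -> torch.Tensor:
    """Draw candidate action sequences by querying the actor on the current state."""
    obs = state.unsqueeze(0).expand(num_samples, -1)          # (num_samples, obs_dim)
    plans = [actor(obs).sample() for _ in range(horizon)]     # one draw per planning step
    return torch.stack(plans, dim=0)                          # (horizon, num_samples, act_dim)


actor = GaussianActor(obs_dim=4, act_dim=2)
print(act_from_actor(actor, torch.zeros(4), horizon=7, num_samples=16).shape)  # torch.Size([7, 16, 2])
```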

_act_from_last_gaus(last_mean, last_var)[source]#

Sample actions from the last Gaussian distribution.

Parameters:
  • last_mean (torch.Tensor) – Last mean of the Gaussian distribution.

  • last_var (torch.Tensor) – Last variance of the Gaussian distribution.

Returns:

sampled actions – Sampled actions from the last Gaussian distribution.

Return type:

Tensor
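
A minimal sketch of this kind of Gaussian sampling, assuming the planner keeps a per-step mean and variance over the planning horizon and clips samples to the action bounds (the clipping strategy is an assumption):

```python
import torch


def act_from_gaussian(last_mean, last_var, num_samples, action_min=-1.0, action_max=1.0):
    """Sample candidate action sequences from N(last_mean, last_var), clipped to the bounds.

    last_mean / last_var have shape (horizon, act_dim); the result has shape
    (num_samples, horizon, act_dim).
    """
    noise = torch.randn(num_samples, *last_mean.shape)
    samples = last_mean.unsqueeze(0) + last_var.sqrt().unsqueeze(0) * noise
    return samples.clamp(action_min, action_max)


mean = torch.zeros(7, 2)           # horizon = 7, action dim = 2
var = 0.25 * torch.ones(7, 2)
print(act_from_gaussian(mean, var, num_samples=100).shape)  # torch.Size([100, 7, 2])
```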

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]
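
A minimal sketch of elite selection by predicted return. In the documented method the values come from the traj dictionary produced by rolling the candidates through the learned dynamics; here they are passed in directly, and the info keys are illustrative only.

```python
import torch


def select_elites(actions, values, num_elites):
    """Keep the candidate plans with the highest predicted discounted return.

    actions: (num_samples, horizon, act_dim) candidate plans.
    values:  (num_samples,) predicted return of each plan.
    """
    elite_values, idx = values.topk(num_elites)
    info = {
        "Plan/elite_value_mean": elite_values.mean().item(),  # illustrative logging keys
        "Plan/elite_value_max": elite_values.max().item(),
    }
    return elite_values, actions[idx], info


actions = torch.rand(100, 7, 2)
values = torch.randn(100)
elite_values, elite_actions, info = select_elites(actions, values, num_elites=10)
print(elite_actions.shape, info)
```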

_state_action_repeat(state, action)[source]#

Repeat the state num_repeat * action.shape[0] times and the action num_repeat times.

Parameters:
  • state (torch.Tensor) – The current state.

  • action (torch.Tensor) – The sampled actions.

Returns:
  • states – The repeated states.

  • actions – The repeated actions.

Return type:

tuple[Tensor, Tensor]
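
A minimal sketch of the tiling this helper describes, assuming num_repeat is the number of particles (for example, dynamics-ensemble members) each candidate action is evaluated with. Whether the repeats are interleaved or blocked in OmniSafe is not specified here.

```python
import torch


def state_action_repeat(state, action, num_repeat):
    """Tile the current state and candidate actions so every candidate is rolled out num_repeat times.

    state:  (state_dim,)            the single current state.
    action: (num_samples, act_dim)  one action per candidate plan at this step.
    """
    num_samples = action.shape[0]
    states = state.unsqueeze(0).repeat(num_samples * num_repeat, 1)   # (num_samples * num_repeat, state_dim)
    actions = action.repeat_interleave(num_repeat, dim=0)             # (num_samples * num_repeat, act_dim)
    return states, actions


states, actions = state_action_repeat(torch.zeros(6), torch.rand(100, 2), num_repeat=5)
print(states.shape, actions.shape)  # torch.Size([500, 6]) torch.Size([500, 2])
```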

_update_mean_var(elite_actions, elite_values, info)[source]#

Update the mean and variance of the elite actions.

Parameters:
  • elite_actions (torch.Tensor) – The elite actions.

  • elite_values (torch.Tensor) – The elite values.

  • info (dict[str, float]) – The dictionary containing information about the elite values and actions.

Returns:
  • new_mean – The new mean of the elite actions.

  • new_var – The new variance of the elite actions.

Return type:

tuple[Tensor, Tensor]
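
A minimal sketch of the CEM-style refit this helper performs, assuming an unweighted fit to the elites plus a soft update toward the previous statistics. The momentum coefficient and whether the elite values weight the fit are assumptions; planners differ on both.

```python
import torch


def update_mean_var(elite_actions, old_mean, old_var, momentum=0.1):
    """Refit the sampling Gaussian to the elites, smoothed with the previous statistics.

    elite_actions: (num_elites, horizon, act_dim) plans kept by _select_elites.
    """
    new_mean = elite_actions.mean(dim=0)
    new_var = elite_actions.var(dim=0, unbiased=False)
    # The soft update keeps the sampling distribution from collapsing in one iteration.
    new_mean = momentum * old_mean + (1.0 - momentum) * new_mean
    new_var = momentum * old_var + (1.0 - momentum) * new_var
    return new_mean, new_var


elites = torch.rand(10, 7, 2)
new_mean, new_var = update_mean_var(elites, torch.zeros(7, 2), torch.ones(7, 2))
print(new_mean.shape, new_var.shape)  # torch.Size([7, 2]) torch.Size([7, 2])
```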

output_action(state)[source]#

Output the action given the state.

Parameters:

state (torch.Tensor) – State of the environment.

Returns:
  • action – The action of the agent.

  • info – The dictionary containing information about the action.

Return type:

tuple[Tensor, dict[str, float]]
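
Putting the pieces above together, here is a rough sketch of one actor-regularized planning call: a fraction of the candidates comes from the actor, the rest from the running Gaussian, and only the first action of the best elite is returned. The mixture ratio, iteration count, and the stand-in callables are all assumptions, not OmniSafe internals.

```python
import torch


def plan_step(actor_sampler, gaussian_sampler, evaluate, select, update,
              mean, var, num_iterations=8, num_samples=100, actor_fraction=0.25):
    """One planning call in the style of actor-regularized CEM.

    The callables stand in for the private helpers documented above: actor_sampler(n)
    and gaussian_sampler(mean, var, n) return (n, horizon, act_dim) candidate plans,
    evaluate scores them, select picks elites, and update refits the Gaussian.
    """
    num_actor = int(actor_fraction * num_samples)
    for _ in range(num_iterations):
        candidates = torch.cat(
            [actor_sampler(num_actor), gaussian_sampler(mean, var, num_samples - num_actor)], dim=0
        )
        values = evaluate(candidates)
        elite_values, elite_actions, info = select(candidates, values)
        mean, var = update(elite_actions, elite_values, info)
    # Receding horizon: execute only the first action of the best surviving plan.
    return elite_actions[elite_values.argmax(), 0], info


horizon, act_dim = 7, 2
action, info = plan_step(
    actor_sampler=lambda n: torch.tanh(torch.randn(n, horizon, act_dim)),        # stands in for the actor
    gaussian_sampler=lambda m, v, n: (m + v.sqrt() * torch.randn(n, *m.shape)).clamp(-1, 1),
    evaluate=lambda a: -a.pow(2).sum(dim=(1, 2)),                                 # toy objective: stay near zero
    select=lambda a, v: (v.topk(10).values, a[v.topk(10).indices], {}),
    update=lambda elites, values, info: (elites.mean(0), elites.var(0, unbiased=False)),
    mean=torch.zeros(horizon, act_dim),
    var=torch.ones(horizon, act_dim),
)
print(action.shape)  # torch.Size([2])
```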

CAP Planner#

Documentation

class omnisafe.algorithms.model_based.planner.CAPPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Conservative and Adaptive Penalty (CAP) algorithm.

References

  • Title: Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning

  • Authors: Yecheng Jason Ma, Andrew Shen, Osbert Bastani, Dinesh Jayaraman.

  • URL: CAP

Initializes the planner of the Conservative and Adaptive Penalty (CAP) algorithm.

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]
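
As a sketch of how CAP-style selection differs from the plain version shown earlier, candidates can be ranked by predicted return minus a penalty on predicted cost, with the penalty coefficient adapted online in the actual algorithm and the cost estimate additionally made conservative via a model-uncertainty term. The function below illustrates only the penalized ranking under those assumptions; it is not the OmniSafe implementation.

```python
import torch


def select_elites_penalized(actions, returns, cost_returns, kappa, num_elites):
    """Rank candidate plans by reward return minus a penalty on predicted cost.

    returns / cost_returns: (num_samples,) predicted discounted reward / cost per plan.
    kappa: penalty coefficient; CAP adapts it from the observed constraint violation.
    """
    penalized = returns - kappa * cost_returns
    elite_values, idx = penalized.topk(num_elites)
    return elite_values, actions[idx], {"Plan/penalized_value_mean": elite_values.mean().item()}


actions = torch.rand(100, 7, 2)
elite_values, elite_actions, info = select_elites_penalized(
    actions, returns=torch.randn(100), cost_returns=torch.rand(100), kappa=1.0, num_elites=10
)
print(elite_actions.shape, info)
```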

CCE Planner#

Documentation

class omnisafe.algorithms.model_based.planner.CCEPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Constrained Cross-Entropy (CCE) algorithm.

References

  • Title: Constrained Cross-Entropy Method for Safe Reinforcement Learning

  • Authors: Min Wen, Ufuk Topcu.

  • URL: CCE

Initializes the planner of the Constrained Cross-Entropy (CCE) algorithm.

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]
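
A sketch of constrained elite selection in the spirit of CCE: candidates whose predicted cost stays under the limit are preferred, and when too few are feasible the planner falls back to minimizing cost. The feasibility-fallback logic and the info key are illustrative assumptions, not a copy of the OmniSafe code.

```python
import torch


def select_elites_constrained(actions, returns, cost_returns, cost_limit, num_elites):
    """Prefer feasible plans (cost under the limit); otherwise pick the lowest-cost plans."""
    feasible = cost_returns <= cost_limit
    if int(feasible.sum()) >= num_elites:
        masked_returns = returns.clone()
        masked_returns[~feasible] = float("-inf")        # infeasible plans can never be elites
        elite_values, idx = masked_returns.topk(num_elites)
    else:
        # Not enough feasible plans: minimize predicted cost instead of maximizing return.
        _, idx = cost_returns.topk(num_elites, largest=False)
        elite_values = returns[idx]
    return elite_values, actions[idx], {"Plan/feasible_ratio": feasible.float().mean().item()}


actions = torch.rand(100, 7, 2)
elite_values, elite_actions, info = select_elites_constrained(
    actions, returns=torch.randn(100), cost_returns=5.0 * torch.rand(100), cost_limit=1.0, num_elites=10
)
print(elite_actions.shape, info)
```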

CEM Planner#

Documentation

class omnisafe.algorithms.model_based.planner.CEMPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Cross-Entropy Method (CEM) optimization algorithm.

References

  • Title: Sample-efficient Cross-Entropy Method for Real-time Planning

  • Authors: Cristina Pinneri, Shambhuraj Sawant, Sebastian Blaes, Jan Achterhold, Joerg Stueckler, Michal Rolinek, Georg Martius.

  • URL: CEM

Initializes the planner of the Cross-Entropy Method (CEM) optimization algorithm.

_act_from_last_gaus(last_mean, last_var)[source]#

Sample actions from the last Gaussian distribution.

Parameters:
  • last_mean (torch.Tensor) – Last mean of the Gaussian distribution.

  • last_var (torch.Tensor) – Last variance of the Gaussian distribution.

Returns:

sampled actions – Sampled actions from the last Gaussian distribution.

Return type:

Tensor

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]

_state_action_repeat(state, action)[source]#

Repeat the state num_repeat * action.shape[0] times and the action num_repeat times.

Parameters:
  • state (torch.Tensor) – The current state.

  • action (torch.Tensor) – The sampled actions.

Returns:
  • states – The repeated states.

  • actions – The repeated actions.

Return type:

tuple[Tensor, Tensor]

_update_mean_var(elite_actions, elite_values, info)[source]#

Update the mean and variance of the elite actions.

Parameters:
  • elite_actions (torch.Tensor) – The elite actions.

  • elite_values (torch.Tensor) – The elite values.

  • info (dict[str, float]) – The dictionary containing information about the elite values and actions.

Returns:
  • new_mean – The new mean of the elite actions.

  • new_var – The new variance of the elite actions.

Return type:

tuple[Tensor, Tensor]

output_action(state)[source]#

Output the action given the state.

Parameters:

state (torch.Tensor) – State of the environment.

Returns:
  • action – The action of the agent.

  • info – The dictionary containing information about the action.

Return type:

tuple[Tensor, dict]
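
In use, output_action is called once per environment step in a receding-horizon loop: the planner re-plans from the current state and only the first action of the chosen plan is executed. The sketch below illustrates that pattern with a stand-in planner and a toy environment; in OmniSafe the planner is constructed inside the model-based algorithm and the environment comes from the adapter, so every name here is hypothetical.

```python
import torch


class RandomPlanner:
    """Stand-in with the same output_action interface as the planners documented here."""

    def output_action(self, state: torch.Tensor):
        return torch.empty(2).uniform_(-1.0, 1.0), {"Plan/iterations": 0}


def rollout(planner, env_step, state, max_steps=10):
    """Receding-horizon control: re-plan at every step, execute only the first action."""
    total_reward, total_cost = 0.0, 0.0
    for _ in range(max_steps):
        action, info = planner.output_action(state)
        state, reward, cost, terminated = env_step(state, action)
        total_reward += reward
        total_cost += cost
        if terminated:
            break
    return total_reward, total_cost


def toy_step(state, action):
    """Toy transition: quadratic reward around zero, unit cost for leaving [-1, 1]."""
    next_state = state + 0.1 * action.sum()
    return next_state, float(-next_state ** 2), float(abs(next_state) > 1.0), False


print(rollout(RandomPlanner(), toy_step, torch.tensor(0.0)))
```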

RCE Planner#

Documentation

class omnisafe.algorithms.model_based.planner.RCEPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Robust Cross Entropy (RCE) algorithm.

References

  • Title: Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method

  • Authors: Zuxin Liu, Hongyi Zhou, Baiming Chen, Sicheng Zhong, Martial Hebert, Ding Zhao.

  • URL: RCE

Initializes the planner of the Robust Cross Entropy (RCE) algorithm.

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]
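
The distinguishing ingredient here is the robust (pessimistic) treatment of the cost predictions before elites are selected. A minimal sketch under one common reading, taking the worst case over dynamics-ensemble members, is shown below; the aggregation actually used by OmniSafe (for example a mean-plus-deviation estimate) may differ.

```python
import torch


def robust_cost_estimate(ensemble_costs: torch.Tensor) -> torch.Tensor:
    """Judge each candidate plan by its worst predicted cost across the ensemble.

    ensemble_costs: (num_ensemble, num_samples) predicted discounted cost of every
    candidate plan under every dynamics-ensemble member.
    """
    return ensemble_costs.max(dim=0).values


ensemble_costs = torch.rand(5, 100)                 # 5 ensemble members, 100 candidate plans
print(robust_cost_estimate(ensemble_costs).shape)   # torch.Size([100])
```

Such a pessimistic estimate would then feed into a constrained selection step like the one sketched under the CCE planner above.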

SafeARC Planner#

Documentation

class omnisafe.algorithms.model_based.planner.SafeARCPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Safe Actor Regularized Control (ARC) algorithm.

References

  • Title: Learning Off-Policy with Online Planning

  • Authors: Harshit Sikchi, Wenxuan Zhou, David Held.

  • URL: Safe ARC

Initializes the planner of the Safe Actor Regularized Control (ARC) algorithm.

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]

_update_mean_var(elite_actions, elite_values, info)[source]#

Update the mean and variance of the elite actions.

Parameters:
  • elite_actions (torch.Tensor) – The elite actions.

  • elite_values (torch.Tensor) – The elite values.

  • info (dict[str, float]) – The dictionary containing information about the elite values and actions.

Returns:
  • new_mean – The new mean of the elite actions.

  • new_var – The new variance of the elite actions.

Return type:

tuple[Tensor, Tensor]

output_action(state)[source]#

Output the action given the state.

Parameters:

state (torch.Tensor) – State of the environment.

Returns:
  • action – The action of the agent.

  • info – The dictionary containing information about the action.

Return type:

tuple[Tensor, dict[str, float]]