OmniSafe Model-based Planner#

ARC Planner#

Documentation

class omnisafe.algorithms.model_based.planner.ARCPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Actor Regularized Control (ARC) algorithm.

References

  • Title: Learning Off-Policy with Online Planning

  • Authors: Harshit Sikchi, Wenxuan Zhou, David Held.

  • URL: ARC

Initialize the planner of the Actor Regularized Control (ARC) algorithm.
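
The constructor signature above is shared by every planner in this module. As a rough illustration of what the documented arguments represent, here is a minimal, hypothetical skeleton; it is not the OmniSafe class, and the planner_cfgs keys shown are illustrative placeholders rather than the real config names.

```python
import torch


# Illustrative skeleton only -- it mirrors the documented constructor arguments
# so their roles are easy to see, but it is not omnisafe's ARCPlanner.
class ToyPlanner:
    def __init__(self, dynamics, planner_cfgs, gamma, cost_gamma,
                 dynamics_state_shape, action_shape, action_max, action_min, device):
        self._dynamics = dynamics              # learned model used to roll out candidate plans
        self._cfgs = planner_cfgs              # horizon, sample counts, iterations, ...
        self._gamma = gamma                    # reward discount for imagined returns
        self._cost_gamma = cost_gamma          # cost discount for imagined cost returns
        self._state_shape = dynamics_state_shape
        self._action_shape = action_shape
        self._action_max = action_max          # per-dimension upper bound on sampled actions
        self._action_min = action_min          # per-dimension lower bound on sampled actions
        self._device = torch.device(device)


planner = ToyPlanner(
    dynamics=None,  # stands in for a learned (ensemble) dynamics model
    planner_cfgs={"plan_horizon": 7, "num_samples": 100, "num_elites": 10},  # hypothetical keys
    gamma=0.99,
    cost_gamma=0.99,
    dynamics_state_shape=(60,),
    action_shape=(2,),
    action_max=1.0,
    action_min=-1.0,
    device="cpu",
)
```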

_act_from_actor(state)[source]#

Sample actions from the actor.

Parameters:

state (torch.Tensor) – The current state.

Returns:

sampled actions – Sampled actions from the actor.

Return type:

Tensor
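
As a minimal sketch of actor-guided candidate generation, the snippet below draws a batch of action sequences from a simple Gaussian actor queried on the current state. The actor class, its architecture, and the fact that it is queried only on the current state (rather than along the imagined trajectory) are simplifying assumptions, not OmniSafe's implementation.

```python
import torch
import torch.nn as nn


class GaussianActor(nn.Module):
    """Hypothetical Gaussian actor used only to make the sketch runnable."""

    def __init__(self, obs_dim: int, act_dim: int) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 2 * act_dim))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        mean, log_std = self.net(obs).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())


def act_from_actor(actor: GaussianActor, state: torch.Tensor, horizon: int, num_samples: int) -> torch.Tensor:
    """Draw candidate action sequences by querying the actor on the current state."""
    obs = state.unsqueeze(0).expand(num_samples, -1)          # (num_samples, obs_dim)
    plans = [actor(obs).sample() for _ in range(horizon)]     # one draw per planning step
    return torch.stack(plans, dim=0)                          # (horizon, num_samples, act_dim)


actor = GaussianActor(obs_dim=4, act_dim=2)
print(act_from_actor(actor, torch.zeros(4), horizon=7, num_samples=16).shape)  # torch.Size([7, 16, 2])
```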

_act_from_last_gaus(last_mean, last_var)[source]#

Sample actions from the last Gaussian distribution.

Parameters:
  • last_mean (torch.Tensor) – Last mean of the Gaussian distribution.

  • last_var (torch.Tensor) – Last variance of the Gaussian distribution.

Returns:

sampled actions – Sampled actions from the last Gaussian distribution.

Return type:

Tensor
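
A minimal sketch of this kind of Gaussian sampling, assuming the planner keeps a per-step mean and variance over the planning horizon and clips samples to the action bounds (the clipping strategy is an assumption):

```python
import torch


def act_from_gaussian(last_mean, last_var, num_samples, action_min=-1.0, action_max=1.0):
    """Sample candidate action sequences from N(last_mean, last_var), clipped to the bounds.

    last_mean / last_var have shape (horizon, act_dim); the result has shape
    (num_samples, horizon, act_dim).
    """
    noise = torch.randn(num_samples, *last_mean.shape)
    samples = last_mean.unsqueeze(0) + last_var.sqrt().unsqueeze(0) * noise
    return samples.clamp(action_min, action_max)


mean = torch.zeros(7, 2)           # horizon = 7, action dim = 2
var = 0.25 * torch.ones(7, 2)
print(act_from_gaussian(mean, var, num_samples=100).shape)  # torch.Size([100, 7, 2])
```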

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]
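
A minimal sketch of elite selection by predicted return. In the documented method the values come from the traj dictionary produced by rolling the candidates through the learned dynamics; here they are passed in directly, and the info keys are illustrative only.

```python
import torch


def select_elites(actions, values, num_elites):
    """Keep the candidate plans with the highest predicted discounted return.

    actions: (num_samples, horizon, act_dim) candidate plans.
    values:  (num_samples,) predicted return of each plan.
    """
    elite_values, idx = values.topk(num_elites)
    info = {
        "Plan/elite_value_mean": elite_values.mean().item(),  # illustrative logging keys
        "Plan/elite_value_max": elite_values.max().item(),
    }
    return elite_values, actions[idx], info


actions = torch.rand(100, 7, 2)
values = torch.randn(100)
elite_values, elite_actions, info = select_elites(actions, values, num_elites=10)
print(elite_actions.shape, info)
```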

_state_action_repeat(state, action)[source]#

Repeat the state num_repeat * action.shape[0] times and the action num_repeat times.

Parameters:
  • state (torch.Tensor) – The current state.

  • action (torch.Tensor) – The sampled actions.

Returns:
  • states – The repeated states.

  • actions – The repeated actions.

Return type:

tuple[Tensor, Tensor]
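
A minimal sketch of the tiling this helper describes, assuming num_repeat is the number of particles (for example, dynamics-ensemble members) each candidate action is evaluated with. Whether the repeats are interleaved or blocked in OmniSafe is not specified here.

```python
import torch


def state_action_repeat(state, action, num_repeat):
    """Tile the current state and candidate actions so every candidate is rolled out num_repeat times.

    state:  (state_dim,)            the single current state.
    action: (num_samples, act_dim)  one action per candidate plan at this step.
    """
    num_samples = action.shape[0]
    states = state.unsqueeze(0).repeat(num_samples * num_repeat, 1)   # (num_samples * num_repeat, state_dim)
    actions = action.repeat_interleave(num_repeat, dim=0)             # (num_samples * num_repeat, act_dim)
    return states, actions


states, actions = state_action_repeat(torch.zeros(6), torch.rand(100, 2), num_repeat=5)
print(states.shape, actions.shape)  # torch.Size([500, 6]) torch.Size([500, 2])
```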

_update_mean_var(elite_actions, elite_values, info)[source]#

Update the mean and variance of the elite actions.

Parameters:
  • elite_actions (torch.Tensor) – The elite actions.

  • elite_values (torch.Tensor) – The elite values.

  • info (dict[str, float]) – The dictionary containing information about the elite values and actions.

Returns:
  • new_mean – The new mean of the elite actions.

  • new_var – The new variance of the elite actions.

Return type:

tuple[Tensor, Tensor]
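
A minimal sketch of the CEM-style refit this helper performs, assuming an unweighted fit to the elites plus a soft update toward the previous statistics. The momentum coefficient and whether the elite values weight the fit are assumptions; planners differ on both.

```python
import torch


def update_mean_var(elite_actions, old_mean, old_var, momentum=0.1):
    """Refit the sampling Gaussian to the elites, smoothed with the previous statistics.

    elite_actions: (num_elites, horizon, act_dim) plans kept by _select_elites.
    """
    new_mean = elite_actions.mean(dim=0)
    new_var = elite_actions.var(dim=0, unbiased=False)
    # The soft update keeps the sampling distribution from collapsing in one iteration.
    new_mean = momentum * old_mean + (1.0 - momentum) * new_mean
    new_var = momentum * old_var + (1.0 - momentum) * new_var
    return new_mean, new_var


elites = torch.rand(10, 7, 2)
new_mean, new_var = update_mean_var(elites, torch.zeros(7, 2), torch.ones(7, 2))
print(new_mean.shape, new_var.shape)  # torch.Size([7, 2]) torch.Size([7, 2])
```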

output_action(state)[source]#

Output the action given the state.

Parameters:

state (torch.Tensor) – State of the environment.

Returns:
  • action – The action of the agent.

  • info – The dictionary containing information about the action.

Return type:

tuple[Tensor, dict[str, float]]
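
Putting the pieces above together, here is a rough sketch of one actor-regularized planning call: a fraction of the candidates comes from the actor, the rest from the running Gaussian, and only the first action of the best elite is returned. The mixture ratio, iteration count, and the stand-in callables are all assumptions, not OmniSafe internals.

```python
import torch


def plan_step(actor_sampler, gaussian_sampler, evaluate, select, update,
              mean, var, num_iterations=8, num_samples=100, actor_fraction=0.25):
    """One planning call in the style of actor-regularized CEM.

    The callables stand in for the private helpers documented above: actor_sampler(n)
    and gaussian_sampler(mean, var, n) return (n, horizon, act_dim) candidate plans,
    evaluate scores them, select picks elites, and update refits the Gaussian.
    """
    num_actor = int(actor_fraction * num_samples)
    for _ in range(num_iterations):
        candidates = torch.cat(
            [actor_sampler(num_actor), gaussian_sampler(mean, var, num_samples - num_actor)], dim=0
        )
        values = evaluate(candidates)
        elite_values, elite_actions, info = select(candidates, values)
        mean, var = update(elite_actions, elite_values, info)
    # Receding horizon: execute only the first action of the best surviving plan.
    return elite_actions[elite_values.argmax(), 0], info


horizon, act_dim = 7, 2
action, info = plan_step(
    actor_sampler=lambda n: torch.tanh(torch.randn(n, horizon, act_dim)),        # stands in for the actor
    gaussian_sampler=lambda m, v, n: (m + v.sqrt() * torch.randn(n, *m.shape)).clamp(-1, 1),
    evaluate=lambda a: -a.pow(2).sum(dim=(1, 2)),                                 # toy objective: stay near zero
    select=lambda a, v: (v.topk(10).values, a[v.topk(10).indices], {}),
    update=lambda elites, values, info: (elites.mean(0), elites.var(0, unbiased=False)),
    mean=torch.zeros(horizon, act_dim),
    var=torch.ones(horizon, act_dim),
)
print(action.shape)  # torch.Size([2])
```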

CAP Planner#

Documentation

class omnisafe.algorithms.model_based.planner.CAPPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Conservative and Adaptive Penalty (CAP) algorithm.

References

  • Title: Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning

  • Authors: Yecheng Jason Ma, Andrew Shen, Osbert Bastani, Dinesh Jayaraman.

  • URL: CAP

Initializes the planner of the Conservative and Adaptive Penalty (CAP) algorithm.

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]
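
As a sketch of how CAP-style selection differs from the plain version shown earlier, candidates can be ranked by predicted return minus a penalty on predicted cost, with the penalty coefficient adapted online in the actual algorithm and the cost estimate additionally made conservative via a model-uncertainty term. The function below illustrates only the penalized ranking under those assumptions; it is not the OmniSafe implementation.

```python
import torch


def select_elites_penalized(actions, returns, cost_returns, kappa, num_elites):
    """Rank candidate plans by reward return minus a penalty on predicted cost.

    returns / cost_returns: (num_samples,) predicted discounted reward / cost per plan.
    kappa: penalty coefficient; CAP adapts it from the observed constraint violation.
    """
    penalized = returns - kappa * cost_returns
    elite_values, idx = penalized.topk(num_elites)
    return elite_values, actions[idx], {"Plan/penalized_value_mean": elite_values.mean().item()}


actions = torch.rand(100, 7, 2)
elite_values, elite_actions, info = select_elites_penalized(
    actions, returns=torch.randn(100), cost_returns=torch.rand(100), kappa=1.0, num_elites=10
)
print(elite_actions.shape, info)
```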

CCE Planner#

Documentation

class omnisafe.algorithms.model_based.planner.CCEPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Constrained Cross-Entropy (CCE) algorithm.

References

  • Title: Constrained Cross-Entropy Method for Safe Reinforcement Learning

  • Authors: Min Wen, Ufuk Topcu.

  • URL: CCE

Initializes the planner of the Constrained Cross-Entropy (CCE) algorithm.

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]
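
A sketch of constrained elite selection in the spirit of CCE: candidates whose predicted cost stays under the limit are preferred, and when too few are feasible the planner falls back to minimizing cost. The feasibility-fallback logic and the info key are illustrative assumptions, not a copy of the OmniSafe code.

```python
import torch


def select_elites_constrained(actions, returns, cost_returns, cost_limit, num_elites):
    """Prefer feasible plans (cost under the limit); otherwise pick the lowest-cost plans."""
    feasible = cost_returns <= cost_limit
    if int(feasible.sum()) >= num_elites:
        masked_returns = returns.clone()
        masked_returns[~feasible] = float("-inf")        # infeasible plans can never be elites
        elite_values, idx = masked_returns.topk(num_elites)
    else:
        # Not enough feasible plans: minimize predicted cost instead of maximizing return.
        _, idx = cost_returns.topk(num_elites, largest=False)
        elite_values = returns[idx]
    return elite_values, actions[idx], {"Plan/feasible_ratio": feasible.float().mean().item()}


actions = torch.rand(100, 7, 2)
elite_values, elite_actions, info = select_elites_constrained(
    actions, returns=torch.randn(100), cost_returns=5.0 * torch.rand(100), cost_limit=1.0, num_elites=10
)
print(elite_actions.shape, info)
```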

CEM Planner#

Documentation

class omnisafe.algorithms.model_based.planner.CEMPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Cross-Entropy Method (CEM) optimization algorithm.

References

  • Title: Sample-efficient Cross-Entropy Method for Real-time Planning

  • Authors: Cristina Pinneri, Shambhuraj Sawant, Sebastian Blaes, Jan Achterhold, Joerg Stueckler, Michal Rolinek, Georg Martius.

  • URL: CEM

Initializes the planner of the Cross-Entropy Method (CEM) optimization algorithm.

_act_from_last_gaus(last_mean, last_var)[source]#

Sample actions from the last Gaussian distribution.

Parameters:
  • last_mean (torch.Tensor) – Last mean of the Gaussian distribution.

  • last_var (torch.Tensor) – Last variance of the Gaussian distribution.

Returns:

sampled actions – Sampled actions from the last Gaussian distribution.

Return type:

Tensor

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]

_state_action_repeat(state, action)[source]#

Repeat the state num_repeat * action.shape[0] times and the action num_repeat times.

Parameters:
  • state (torch.Tensor) – The current state.

  • action (torch.Tensor) – The sampled actions.

Returns:
  • states – The repeated states.

  • actions – The repeated actions.

Return type:

tuple[Tensor, Tensor]

_update_mean_var(elite_actions, elite_values, info)[source]#

Update the mean and variance of the elite actions.

Parameters:
  • elite_actions (torch.Tensor) – The elite actions.

  • elite_values (torch.Tensor) – The elite values.

  • info (dict[str, float]) – The dictionary containing information about the elite values and actions.

Returns:
  • new_mean – The new mean of the elite actions.

  • new_var – The new variance of the elite actions.

Return type:

tuple[Tensor, Tensor]

output_action(state)[source]#

Output the action given the state.

Parameters:

state (torch.Tensor) – State of the environment.

Returns:
  • action – The action of the agent.

  • info – The dictionary containing information about the action.

Return type:

tuple[Tensor, dict]
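
In use, output_action is called once per environment step in a receding-horizon loop: the planner re-plans from the current state and only the first action of the chosen plan is executed. The sketch below illustrates that pattern with a stand-in planner and a toy environment; in OmniSafe the planner is constructed inside the model-based algorithm and the environment comes from the adapter, so every name here is hypothetical.

```python
import torch


class RandomPlanner:
    """Stand-in with the same output_action interface as the planners documented here."""

    def output_action(self, state: torch.Tensor):
        return torch.empty(2).uniform_(-1.0, 1.0), {"Plan/iterations": 0}


def rollout(planner, env_step, state, max_steps=10):
    """Receding-horizon control: re-plan at every step, execute only the first action."""
    total_reward, total_cost = 0.0, 0.0
    for _ in range(max_steps):
        action, info = planner.output_action(state)
        state, reward, cost, terminated = env_step(state, action)
        total_reward += reward
        total_cost += cost
        if terminated:
            break
    return total_reward, total_cost


def toy_step(state, action):
    """Toy transition: quadratic reward around zero, unit cost for leaving [-1, 1]."""
    next_state = state + 0.1 * action.sum()
    return next_state, float(-next_state ** 2), float(abs(next_state) > 1.0), False


print(rollout(RandomPlanner(), toy_step, torch.tensor(0.0)))
```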

RCE Planner#

Documentation

class omnisafe.algorithms.model_based.planner.RCEPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Robust Cross Entropy (RCE) algorithm.

References

  • Title: Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method

  • Authors: Zuxin Liu, Hongyi Zhou, Baiming Chen, Sicheng Zhong, Martial Hebert, Ding Zhao.

  • URL: RCE

Initializes the planner of the Robust Cross Entropy (RCE) algorithm.

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]
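
The distinguishing ingredient here is the robust (pessimistic) treatment of the cost predictions before elites are selected. A minimal sketch under one common reading, taking the worst case over dynamics-ensemble members, is shown below; the aggregation actually used by OmniSafe (for example a mean-plus-deviation estimate) may differ.

```python
import torch


def robust_cost_estimate(ensemble_costs: torch.Tensor) -> torch.Tensor:
    """Judge each candidate plan by its worst predicted cost across the ensemble.

    ensemble_costs: (num_ensemble, num_samples) predicted discounted cost of every
    candidate plan under every dynamics-ensemble member.
    """
    return ensemble_costs.max(dim=0).values


ensemble_costs = torch.rand(5, 100)                 # 5 ensemble members, 100 candidate plans
print(robust_cost_estimate(ensemble_costs).shape)   # torch.Size([100])
```

Such a pessimistic estimate would then feed into a constrained selection step like the one sketched under the CCE planner above.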

SafeARC Planner#

Documentation

class omnisafe.algorithms.model_based.planner.SafeARCPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#

The planner of the Safe Actor Regularized Control (ARC) algorithm.

References

  • Title: Learning Off-Policy with Online Planning

  • Authors: Harshit Sikchi, Wenxuan Zhou, David Held.

  • URL: Safe ARC

Initializes the planner of the Safe Actor Regularized Control (ARC) algorithm.

_select_elites(actions, traj)[source]#

Select elites from the sampled actions.

Parameters:
  • actions (torch.Tensor) – Sampled actions.

  • traj (dict[str, torch.Tensor]) – Trajectory dictionary.

Returns:
  • elites_value – The value of the elites.

  • elites_action – The action of the elites.

  • info – The dictionary containing information about the elite values and actions.

Return type:

tuple[Tensor, Tensor, dict[str, float]]

_update_mean_var(elite_actions, elite_values, info)[source]#

Update the mean and variance of the elite actions.

Parameters:
  • elite_actions (torch.Tensor) – The elite actions.

  • elite_values (torch.Tensor) – The elite values.

  • info (dict[str, float]) – The dictionary containing information about the elite values and actions.

Returns:
  • new_mean – The new mean of the elite actions.

  • new_var – The new variance of the elite actions.

Return type:

tuple[Tensor, Tensor]

output_action(state)[source]#

Output the action given the state.

Parameters:

state (torch.Tensor) – State of the environment.

Returns:
  • action – The action of the agent.

  • info – The dictionary containing information about the action.

Return type:

tuple[Tensor, dict[str, float]]