OmniSafe Model-based Planner#
ARC Planner#
Documentation
- class omnisafe.algorithms.model_based.planner.ARCPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#
The planner of the Actor Regularized Control (ARC) algorithm.
References
Title: Learning Off-Policy with Online Planning
Authors: Harshit Sikchi, Wenxuan Zhou, David Held.
URL: ARC
Initialize the planner of the Actor Regularized Control (ARC) algorithm.
- _act_from_actor(state)[source]#
Sample actions from the actor.
- Parameters:
state (torch.Tensor) – The current state.
- Returns:
sampled actions – Sampled actions from the actor.
- Return type:
Tensor
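As an illustration, actor-guided sampling can be sketched as follows. The function name, the `actor_mean` callable, and the noise parameters are hypothetical and not the actual OmniSafe API; the real method queries the algorithm's actor network.

```python
import numpy as np

def act_from_actor(actor_mean, state, num_samples, noise_std, act_low, act_high):
    # Query the (hypothetical) actor for its mean action at this state,
    # then perturb it with Gaussian noise to get candidate actions,
    # clipped to the action bounds.
    mu = actor_mean(state)                                  # shape: (action_dim,)
    eps = np.random.randn(num_samples, mu.shape[0]) * noise_std
    return np.clip(mu + eps, act_low, act_high)
```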
- _act_from_last_gaus(last_mean, last_var)[source]#
Sample actions from the last Gaussian distribution.
- Parameters:
last_mean (torch.Tensor) – Last mean of the Gaussian distribution.
last_var (torch.Tensor) – Last variance of the Gaussian distribution.
- Returns:
sampled actions – Sampled actions from the last Gaussian distribution.
- Return type:
Tensor
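A minimal sketch of sampling from the previous iteration's Gaussian, in a NumPy setting with hypothetical bound parameters `act_low`/`act_high` (the real method works on `torch.Tensor` inputs):

```python
import numpy as np

def act_from_last_gaus(last_mean, last_var, num_samples, act_low, act_high):
    # Draw candidate actions from N(last_mean, last_var) and
    # clip them to the feasible action range.
    std = np.sqrt(last_var)
    samples = last_mean + std * np.random.randn(num_samples, *last_mean.shape)
    return np.clip(samples, act_low, act_high)
```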
- _select_elites(actions, traj)[source]#
Select elites from the sampled actions.
- Parameters:
actions (torch.Tensor) – Sampled actions.
traj (dict[str, torch.Tensor]) – Trajectory dictionary.
- Returns:
elites_value – The value of the elites.
elites_action – The action of the elites.
info – A dictionary containing information about the elite values and actions.
- Return type:
tuple[Tensor, Tensor, dict[str, float]]
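The elite-selection step above amounts to a top-k over predicted values. A minimal sketch; the signature is illustrative, since the real method derives the values from the trajectory dictionary rather than taking them as an argument:

```python
import numpy as np

def select_elites(actions, values, num_elites):
    # Rank sampled action sequences by predicted value and keep
    # the indices of the `num_elites` highest-valued candidates.
    idx = np.argsort(values)[-num_elites:]
    return values[idx], actions[idx]
```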
- _state_action_repeat(state, action)[source]#
Repeat the state num_repeat * action.shape[0] times and the action num_repeat times.
- Parameters:
state (torch.Tensor) – The current state.
action (torch.Tensor) – The sampled actions.
- Returns:
states – The repeated states.
actions – The repeated actions.
- Return type:
tuple[Tensor, Tensor]
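The repetition pattern described above can be sketched with `np.tile`; the shapes and the `num_repeat` parameter are illustrative. Each candidate action is paired with every model particle so that its return can be averaged over the dynamics ensemble:

```python
import numpy as np

def state_action_repeat(state, action, num_repeat):
    # Tile the current state once per (particle, candidate) pair and
    # stack the candidate action set once per particle.
    states = np.tile(state, (num_repeat * action.shape[0], 1))
    actions = np.tile(action, (num_repeat, 1))
    return states, actions
```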
- _update_mean_var(elite_actions, elite_values, info)[source]#
Update the mean and variance of the elite actions.
- Parameters:
elite_actions (torch.Tensor) – The elite actions.
elite_values (torch.Tensor) – The elite values.
info (dict[str, float]) – A dictionary containing information about the elite values and actions.
- Returns:
new_mean – The new mean of the elite actions.
new_var – The new variance of the elite actions.
- Return type:
tuple[Tensor, Tensor]
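ARC-style planners typically refit the sampling distribution using an exponential weighting of the elites rather than a plain average. A hedged sketch; the `temperature` parameter and the exact weighting are assumptions, not the precise OmniSafe update:

```python
import numpy as np

def update_mean_var(elite_actions, elite_values, temperature=1.0):
    # Weight each elite by a softmax over its value (subtracting the max
    # for numerical stability), then compute the weighted mean and
    # variance used to sample the next iteration's candidates.
    w = np.exp((elite_values - elite_values.max()) / temperature)
    w = w / w.sum()
    new_mean = (w[:, None] * elite_actions).sum(axis=0)
    new_var = (w[:, None] * (elite_actions - new_mean) ** 2).sum(axis=0)
    return new_mean, new_var
```

With equal elite values the update reduces to the plain empirical mean and variance.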
CAP Planner#
Documentation
- class omnisafe.algorithms.model_based.planner.CAPPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#
The planner of the Conservative and Adaptive Penalty (CAP) algorithm.
References
Title: Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning
Authors: Yecheng Jason Ma, Andrew Shen, Osbert Bastani, Dinesh Jayaraman.
URL: CAP
Initialize the planner of the Conservative and Adaptive Penalty (CAP) algorithm.
- _select_elites(actions, traj)[source]#
Select elites from the sampled actions.
- Parameters:
actions (torch.Tensor) – Sampled actions.
traj (dict[str, torch.Tensor]) – Trajectory dictionary.
- Returns:
elites_value – The value of the elites.
elites_action – The action of the elites.
info – A dictionary containing information about the elite values and actions.
- Return type:
tuple[Tensor, Tensor, dict[str, float]]
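CAP's elite selection differs from a plain top-k in that predicted costs enter the score through an adaptive penalty multiplier. A rough sketch under that assumption; the `kappa` multiplier is illustrative and its adaptation rule is not shown:

```python
import numpy as np

def select_elites_cap(actions, rewards, costs, kappa, num_elites):
    # Penalize each candidate's predicted return by its predicted cost,
    # scaled by the (adaptive) multiplier kappa, then keep the top-k.
    scores = rewards - kappa * costs
    idx = np.argsort(scores)[-num_elites:]
    return scores[idx], actions[idx]
```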
CCE Planner#
Documentation
- class omnisafe.algorithms.model_based.planner.CCEPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#
The planner of the Constrained Cross-Entropy (CCE) algorithm.
References
Title: Constrained Cross-Entropy Method for Safe Reinforcement Learning
Authors: Min Wen, Ufuk Topcu.
URL: CCE
Initialize the planner of the Constrained Cross-Entropy (CCE) algorithm.
- _select_elites(actions, traj)[source]#
Select elites from the sampled actions.
- Parameters:
actions (torch.Tensor) – Sampled actions.
traj (dict[str, torch.Tensor]) – Trajectory dictionary.
- Returns:
elites_value – The value of the elites.
elites_action – The action of the elites.
info – A dictionary containing information about the elite values and actions.
- Return type:
tuple[Tensor, Tensor, dict[str, float]]
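Constrained cross-entropy selects elites feasibility-first: candidates whose predicted cost is within the limit are ranked by reward, and if too few are feasible the least-violating candidates are used instead. A sketch under those assumptions; names and signature are illustrative:

```python
import numpy as np

def select_elites_cce(actions, rewards, costs, cost_limit, num_elites):
    # Prefer feasible candidates (cost under the limit) ranked by reward;
    # fall back to the lowest-cost candidates when too few are feasible.
    feasible = np.flatnonzero(costs <= cost_limit)
    if feasible.size >= num_elites:
        order = feasible[np.argsort(rewards[feasible])[-num_elites:]]
    else:
        order = np.argsort(costs)[:num_elites]  # least-violating candidates
    return rewards[order], actions[order]
```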
CEM Planner#
Documentation
- class omnisafe.algorithms.model_based.planner.CEMPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#
The planner of the Cross-Entropy Method (CEM) optimization algorithm.
References
Title: Sample-efficient Cross-Entropy Method for Real-time Planning
Authors: Cristina Pinneri, Shambhuraj Sawant, Sebastian Blaes, Jan Achterhold, Joerg Stueckler, Michal Rolinek, Georg Martius.
URL: CEM
Initialize the planner of the Cross-Entropy Method (CEM) optimization algorithm.
- _act_from_last_gaus(last_mean, last_var)[source]#
Sample actions from the last Gaussian distribution.
- Parameters:
last_mean (torch.Tensor) – Last mean of the Gaussian distribution.
last_var (torch.Tensor) – Last variance of the Gaussian distribution.
- Returns:
sampled actions – Sampled actions from the last Gaussian distribution.
- Return type:
Tensor
- _select_elites(actions, traj)[source]#
Select elites from the sampled actions.
- Parameters:
actions (torch.Tensor) – Sampled actions.
traj (dict[str, torch.Tensor]) – Trajectory dictionary.
- Returns:
elites_value – The value of the elites.
elites_action – The action of the elites.
info – A dictionary containing information about the elite values and actions.
- Return type:
tuple[Tensor, Tensor, dict[str, float]]
- _state_action_repeat(state, action)[source]#
Repeat the state num_repeat * action.shape[0] times and the action num_repeat times.
- Parameters:
state (torch.Tensor) – The current state.
action (torch.Tensor) – The sampled actions.
- Returns:
states – The repeated states.
actions – The repeated actions.
- Return type:
tuple[Tensor, Tensor]
- _update_mean_var(elite_actions, elite_values, info)[source]#
Update the mean and variance of the elite actions.
- Parameters:
elite_actions (torch.Tensor) – The elite actions.
elite_values (torch.Tensor) – The elite values.
info (dict[str, float]) – A dictionary containing information about the elite values and actions.
- Returns:
new_mean – The new mean of the elite actions.
new_var – The new variance of the elite actions.
- Return type:
tuple[Tensor, Tensor]
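A plain CEM refit takes the elites' empirical mean and variance and smooths them with the previous iteration's parameters. A sketch with an assumed `momentum` hyperparameter (the exact smoothing used by OmniSafe may differ):

```python
import numpy as np

def update_mean_var_cem(elite_actions, prev_mean, prev_var, momentum=0.1):
    # Refit the sampling Gaussian to the elites, blending with the
    # previous parameters via a momentum term for stability.
    elite_mean = elite_actions.mean(axis=0)
    elite_var = elite_actions.var(axis=0)
    new_mean = momentum * prev_mean + (1 - momentum) * elite_mean
    new_var = momentum * prev_var + (1 - momentum) * elite_var
    return new_mean, new_var
```

With `momentum=0` the update reduces to the pure elite statistics.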
RCE Planner#
Documentation
- class omnisafe.algorithms.model_based.planner.RCEPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#
The planner of the Robust Cross Entropy (RCE) algorithm.
References
Title: Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method
Authors: Zuxin Liu, Hongyi Zhou, Baiming Chen, Sicheng Zhong, Martial Hebert, Ding Zhao.
URL: RCE
Initialize the planner of the Robust Cross Entropy (RCE) algorithm.
- _select_elites(actions, traj)[source]#
Select elites from the sampled actions.
- Parameters:
actions (torch.Tensor) – Sampled actions.
traj (dict[str, torch.Tensor]) – Trajectory dictionary.
- Returns:
elites_value – The value of the elites.
elites_action – The action of the elites.
info – A dictionary containing information about the elite values and actions.
- Return type:
tuple[Tensor, Tensor, dict[str, float]]
SafeARC Planner#
Documentation
- class omnisafe.algorithms.model_based.planner.SafeARCPlanner(dynamics, planner_cfgs, gamma, cost_gamma, dynamics_state_shape, action_shape, action_max, action_min, device, **kwargs)[source]#
The planner of the Safe Actor Regularized Control (ARC) algorithm.
References
Title: Learning Off-Policy with Online Planning
Authors: Harshit Sikchi, Wenxuan Zhou, David Held.
URL: Safe ARC
Initialize the planner of the Safe Actor Regularized Control (ARC) algorithm.
- _select_elites(actions, traj)[source]#
Select elites from the sampled actions.
- Parameters:
actions (torch.Tensor) – Sampled actions.
traj (dict[str, torch.Tensor]) – Trajectory dictionary.
- Returns:
elites_value – The value of the elites.
elites_action – The action of the elites.
info – A dictionary containing information about the elite values and actions.
- Return type:
tuple[Tensor, Tensor, dict[str, float]]
- _update_mean_var(elite_actions, elite_values, info)[source]#
Update the mean and variance of the elite actions.
- Parameters:
elite_actions (torch.Tensor) – The elite actions.
elite_values (torch.Tensor) – The elite values.
info (dict[str, float]) – A dictionary containing information about the elite values and actions.
- Returns:
new_mean – The new mean of the elite actions.
new_var – The new variance of the elite actions.
- Return type:
tuple[Tensor, Tensor]