Model-based Algorithms

CAPPETS

Documentation

class omnisafe.algorithms.model_based.CAPPETS(env_id, cfgs)

The Conservative and Adaptive Penalty (CAP) algorithm implementation based on PETS.

References

  • Title: Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning

  • Authors: Yecheng Jason Ma, Andrew Shen, Osbert Bastani, Dinesh Jayaraman.

  • URL: CAP

Initialize an instance of the algorithm.

_init_log()

Initialize the logger.

Things to log               Description
--------------------------  ------------------------------
Plan/feasible_num           The number of feasible plans.
Plan/episode_costs_max      The maximum planning cost.
Plan/episode_costs_mean     The mean planning cost.
Plan/episode_costs_min      The minimum planning cost.
Metrics/LagrangeMultiplier  The Lagrange multiplier.
Plan/var_penalty_max        The maximum planning penalty.
Plan/var_penalty_mean       The mean planning penalty.
Plan/var_penalty_min        The minimum planning penalty.

Return type:

None
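To make the Plan/* keys concrete, here is a minimal sketch of how such summary statistics could be computed from a batch of candidate plans. It assumes a plan counts as feasible when its predicted cumulative cost stays within a budget. The function name `summarize_plans` and the `cost_limit` parameter are hypothetical and are not part of OmniSafe's API.

```python
from statistics import mean


def summarize_plans(episode_costs, cost_limit):
    """Summarize predicted plan costs under the logger keys listed above.

    episode_costs: one predicted cumulative cost per candidate plan.
    cost_limit: hypothetical safety budget used to decide feasibility.
    """
    feasible_num = sum(1 for cost in episode_costs if cost <= cost_limit)
    return {
        "Plan/feasible_num": feasible_num,
        "Plan/episode_costs_max": max(episode_costs),
        "Plan/episode_costs_mean": mean(episode_costs),
        "Plan/episode_costs_min": min(episode_costs),
    }


# Four candidate plans, two of which stay within the assumed budget of 3.0.
stats = summarize_plans([0.0, 2.0, 4.0, 10.0], cost_limit=3.0)
```

A low feasible count next to a high mean cost would suggest that the planner is struggling to find safe action sequences.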

_init_model()

Initialize the dynamics model and the planner.

CAP uses the following models:

  • dynamics model: to predict the next state and the cost.

  • lagrange multiplier: to trade off between the cost and the reward.

  • planner: to generate the action.

Return type:

None
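As a rough illustration of how a Lagrange multiplier can trade cost off against reward, the sketch below follows the general idea of the CAP paper: inflate the predicted cost by an uncertainty penalty, then adapt the multiplier from the costs observed in the environment. The function names, the learning rate `lr`, and the cost limit are assumptions made for illustration and do not reflect OmniSafe's actual implementation.

```python
def conservative_cost(predicted_cost, var_penalty, multiplier):
    """Inflate the model's predicted cost by a scaled uncertainty penalty."""
    return predicted_cost + multiplier * var_penalty


def update_multiplier(multiplier, episode_cost, cost_limit, lr=0.1):
    """Raise the multiplier when the observed cost exceeds the limit,
    lower it otherwise, and keep it non-negative."""
    return max(0.0, multiplier + lr * (episode_cost - cost_limit))


# The episode cost (30) exceeded the assumed limit (25), so the penalty grows.
multiplier = update_multiplier(1.0, episode_cost=30.0, cost_limit=25.0)
penalized = conservative_cost(2.0, var_penalty=0.5, multiplier=multiplier)

# A cost far below the limit drives the multiplier down to its floor of zero.
relaxed = update_multiplier(0.2, episode_cost=0.0, cost_limit=25.0)
```

The planner would then rank candidate plans using the penalized cost, so a larger multiplier makes it more conservative.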

_save_model()

Save the model.

Return type:

None

_update_epoch()

Update function per epoch.

Return type:

None

CCEPETS

Documentation

class omnisafe.algorithms.model_based.CCEPETS(env_id, cfgs)

The Constrained Cross-Entropy (CCE) algorithm implementation based on PETS.

References

  • Title: Constrained Cross-Entropy Method for Safe Reinforcement Learning

  • Authors: Min Wen, Ufuk Topcu.

  • URL: CCE

Initialize an instance of the algorithm.

_init_log()

Initialize the logger keys for the CCE algorithm.

Things to log            Description
-----------------------  ------------------------------
Plan/feasible_num        The number of feasible plans.
Plan/episode_costs_max   The maximum planning cost.
Plan/episode_costs_mean  The mean planning cost.
Plan/episode_costs_min   The minimum planning cost.

Return type:

None

_init_model()

Initialize the dynamics model and the planner.

CCEPETS uses the following models:

  • dynamics model: to predict the next state and the cost.

  • planner: to generate the action.

Return type:

None
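The elite-selection step at the heart of a constrained cross-entropy planner can be sketched as follows. When enough sampled plans satisfy the cost constraint, the elites are the highest-reward feasible plans. Otherwise the elites are the lowest-cost plans, which pushes the sampling distribution towards feasibility. The function name `select_elites` and its parameters are hypothetical, and this is a simplified sketch rather than OmniSafe's planner.

```python
def select_elites(rewards, costs, cost_limit, num_elites):
    """Return the indices of the elite plans for one CEM iteration."""
    indices = range(len(rewards))
    feasible = [i for i in indices if costs[i] <= cost_limit]
    if len(feasible) >= num_elites:
        # Enough feasible plans: keep the ones with the highest reward.
        ranked = sorted(feasible, key=lambda i: rewards[i], reverse=True)
    else:
        # Too few feasible plans: keep the ones with the lowest cost.
        ranked = sorted(indices, key=lambda i: costs[i])
    return ranked[:num_elites]


rewards = [5.0, 9.0, 1.0, 7.0]
costs = [0.0, 8.0, 1.0, 2.0]

# Three plans are feasible under a limit of 3.0, so reward decides.
elites_by_reward = select_elites(rewards, costs, cost_limit=3.0, num_elites=2)

# Only one plan is feasible under a limit of 0.5, so cost decides.
elites_by_cost = select_elites(rewards, costs, cost_limit=0.5, num_elites=2)
```

In a full planner, the sampling distribution over action sequences would be refitted to the elites after each iteration.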

RCEPETS

Documentation

class omnisafe.algorithms.model_based.RCEPETS(env_id, cfgs)

The Robust Cross Entropy (RCE) algorithm implementation based on PETS.

References

  • Title: Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method

  • Authors: Zuxin Liu, Hongyi Zhou, Baiming Chen, Sicheng Zhong, Martial Hebert, Ding Zhao.

  • URL: RCE

Initialize an instance of the algorithm.

_init_log()

Initialize the logger.

Things to log               Description
--------------------------  ------------------------------
Plan/feasible_num           The number of feasible plans.
Plan/episode_costs_max      The maximum planning cost.
Plan/episode_costs_mean     The mean planning cost.
Plan/episode_costs_min      The minimum planning cost.
Metrics/LagrangeMultiplier  The Lagrange multiplier.

Return type:

None

_init_model()

Initialize the dynamics model and the planner.

RCEPETS uses the following models:

  • dynamics model: to predict the next state and the cost.

  • planner: to generate the action.

Return type:

None
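One way a planner can use a learned dynamics model robustly is to evaluate each plan on every member of a model ensemble (PETS uses probabilistic ensembles) and judge safety by the most pessimistic cost prediction. The toy sketch below illustrates that idea only. The one-dimensional models, the `drift` parameter, and all function names are invented for the example and are not taken from OmniSafe.

```python
def make_toy_model(drift):
    """Build a toy 1-D dynamics model that predicts state, reward, and cost."""
    def model(state, action):
        next_state = state + action + drift
        reward = next_state                        # moving right is rewarded
        cost = 1.0 if next_state > 2.5 else 0.0    # crossing 2.5 is unsafe
        return next_state, reward, cost
    return model


def rollout(model, state, actions):
    """Accumulate predicted reward and cost along one action sequence."""
    total_reward, total_cost = 0.0, 0.0
    for action in actions:
        state, reward, cost = model(state, action)
        total_reward += reward
        total_cost += cost
    return total_reward, total_cost


def evaluate_plan(ensemble, state, actions):
    """Average the reward over ensemble members, but take the worst-case cost."""
    results = [rollout(model, state, actions) for model in ensemble]
    mean_reward = sum(reward for reward, _ in results) / len(results)
    worst_cost = max(cost for _, cost in results)
    return mean_reward, worst_cost


ensemble = [make_toy_model(0.0), make_toy_model(0.5)]
mean_reward, worst_cost = evaluate_plan(ensemble, state=0.0, actions=[1.0, 1.0])
```

Here one ensemble member predicts a safe trajectory and the other predicts a violation, so the worst-case estimate marks the plan as costly.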

Safe LOOP

Documentation

class omnisafe.algorithms.model_based.SafeLOOP(env_id, cfgs)

The Safe Learning Off-Policy with Online Planning (SafeLOOP) algorithm.

References

  • Title: Learning Off-Policy with Online Planning

  • Authors: Harshit Sikchi, Wenxuan Zhou, David Held.

  • URL: SafeLOOP

Initialize an instance of the algorithm.

_init_log()

Initialize the logger keys for the algorithm.

Things to log            Description
-----------------------  ------------------------------
Plan/feasible_num        The number of feasible plans.
Plan/episode_costs_max   The maximum planning cost.
Plan/episode_costs_mean  The mean planning cost.
Plan/episode_costs_min   The minimum planning cost.

Return type:

None

_init_model()

Initialize the dynamics model and the planner.

SafeLOOP uses the following models:

  • dynamics model: to predict the next state and the cost.

  • planner: to generate the action.

Return type:

None
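The LOOP paper combines short-horizon planning through a learned model with a value function learned off-policy, which scores the state reached at the end of the plan. A minimal sketch of that scoring rule is shown below. It assumes a deterministic toy model and a constant value function; the names `lookahead_return`, `toy_model`, and `toy_value` are hypothetical and are not OmniSafe's API.

```python
def lookahead_return(model, value_fn, state, actions, gamma=0.99):
    """Score a plan as discounted model rewards plus a terminal value estimate."""
    total, discount = 0.0, 1.0
    for action in actions:
        state, reward = model(state, action)
        total += discount * reward
        discount *= gamma
    # Bootstrap beyond the planning horizon with the learned value function.
    return total + discount * value_fn(state)


def toy_model(state, action):
    """Toy dynamics: the state shifts by the action, and the reward is the new state."""
    next_state = state + action
    return next_state, next_state


def toy_value(state):
    """Stand-in for a critic: a constant terminal value."""
    return 10.0


score = lookahead_return(toy_model, toy_value, state=0.0, actions=[1.0, 1.0], gamma=0.5)
```

A safe variant would additionally accumulate predicted costs along the same rollout and discard or penalize plans that exceed the cost budget.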