OmniSafe Adapter#

OmniSafe provides a set of adapters to adapt the environment to the framework.

  • OnlineAdapter(env_id, num_envs, seed, cfgs) – Online Adapter for OmniSafe.

  • OfflineAdapter(env_id, seed, cfgs) – Offline Adapter for OmniSafe.

  • OnPolicyAdapter(env_id, num_envs, seed, cfgs) – OnPolicy Adapter for OmniSafe.

  • OffPolicyAdapter(env_id, num_envs, seed, cfgs) – OffPolicy Adapter for OmniSafe.

  • SauteAdapter(env_id, num_envs, seed, cfgs) – Saute Adapter for OmniSafe.

  • SimmerAdapter(env_id, num_envs, seed, cfgs) – Simmer Adapter for OmniSafe.

  • ModelBasedAdapter(env_id, num_envs, seed, ...) – Model Based Adapter for OmniSafe.

Online Adapter#


class omnisafe.adapter.OnlineAdapter(env_id, num_envs, seed, cfgs)[source]#

Online Adapter for OmniSafe.

OmniSafe is a framework for safe reinforcement learning. It is designed to be compatible with any existing RL algorithm. The online adapter adapts the environment to the framework.

Parameters:
  • env_id (str) – The environment id.

  • num_envs (int) – The number of parallel environments.

  • seed (int) – The random seed.

  • cfgs (Config) – The configuration.

Initialize an instance of OnlineAdapter.
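A minimal usage sketch follows. It only exercises the documented interface: the cfgs argument is assumed to be a fully populated omnisafe Config (normally assembled by the algorithm wrapper from its YAML defaults), and the randomly sampled action is a stand-in for a trained actor.

    import torch

    from omnisafe.adapter import OnlineAdapter


    def run_random_rollout(cfgs, env_id: str = 'SafetyPointGoal1-v0') -> None:
        """Sketch of the OnlineAdapter interface; `cfgs` must be supplied by the caller."""
        adapter = OnlineAdapter(env_id=env_id, num_envs=1, seed=0, cfgs=cfgs)
        obs, info = adapter.reset(seed=0)
        for _ in range(8):
            # Placeholder action sampled from the action space; a trained actor
            # would produce this tensor instead.
            action = torch.as_tensor(
                adapter.action_space.sample(), dtype=torch.float32
            ).unsqueeze(0)
            obs, reward, cost, terminated, truncated, info = adapter.step(action)
        adapter.close()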

_wrapper(obs_normalize=True, reward_normalize=True, cost_normalize=True)[source]#

Wrap the environment.

Hint

OmniSafe supports the following wrappers:

  • TimeLimit – Limit the time steps of the environment.

  • AutoReset – Reset the environment when the episode is done.

  • ObsNormalize – Normalize the observation.

  • RewardNormalize – Normalize the reward.

  • CostNormalize – Normalize the cost.

  • ActionScale – Scale the action.

  • Unsqueeze – Unsqueeze the step result for the single-environment case.

Parameters:
  • obs_normalize (bool, optional) – Whether to normalize the observation. Defaults to True.

  • reward_normalize (bool, optional) – Whether to normalize the reward. Defaults to True.

  • cost_normalize (bool, optional) – Whether to normalize the cost. Defaults to True.

Return type:

None

property action_space: Box | Discrete#

The action space of the environment.

close()[source]#

Close the environment after training.

Return type:

None

property env_spec_keys: list[str]#

Return the environment specification log.

property observation_space: Box | Discrete#

The observation space of the environment.

reset(seed=None, options=None)[source]#

Reset the environment and return an initial observation.

Parameters:
  • seed (int, optional) – The random seed. Defaults to None.

  • options (dict[str, Any], optional) – The options for the environment. Defaults to None.

Returns:
  • observation – The initial observation of the space.

  • info – Some information logged by the environment.

Return type:

tuple[torch.Tensor, dict[str, Any]]

save()[source]#

Save the important components of the environment.

Note

The saved components will be stored in the wrapped environment. If the environment is not wrapped, the saved components will be an empty dict. Common wrappers are obs_normalize, reward_normalize, and cost_normalize.

Returns:

The saved components of the environment, e.g., obs_normalizer.

Return type:

dict[str, Module]
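For illustration, the returned modules can be checkpointed like any other torch modules. The helper below is a hypothetical sketch: the adapter argument and the file name are assumptions, not part of the API.

    import torch
    from torch import nn


    def checkpoint_env_components(adapter, path: str = 'env_components.pt') -> None:
        # `adapter` is assumed to be a wrapped OnlineAdapter; save() returns the
        # normalizer modules, or an empty dict if the environment is unwrapped.
        saved: dict[str, nn.Module] = adapter.save()
        torch.save({name: module.state_dict() for name, module in saved.items()}, path)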

step(action)[source]#

Run one timestep of the environment’s dynamics using the agent actions.

Parameters:

action (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after previous action.

  • cost – The amount of cost returned after previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]

Offline Adapter#


class omnisafe.adapter.OfflineAdapter(env_id, seed, cfgs)[source]#

Offline Adapter for OmniSafe.

OfflineAdapter is used to adapt the environment to offline training.

Note

Technically, offline training does not require an environment for the agent to interact with. However, to visualize the agent's performance during training, we still need to instantiate an environment to evaluate the agent. OfflineAdapter provides an important interface, evaluate, to test the agent.

Parameters:
  • env_id (str) – The environment id.

  • seed (int) – The random seed.

  • cfgs (Config) – The configuration.

Initialize an instance of OfflineAdapter.

property action_space: Box | Discrete#

The action space of the environment.

evaluate(evaluate_epoisodes, agent, logger)[source]#

Evaluate the agent in the environment.

Parameters:
  • evaluate_epoisodes (int) – The number of episodes for evaluation.

  • agent (Actor) – The agent to be evaluated.

  • logger (Logger) – The logger for logging the evaluation results.

Return type:

None
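A hedged usage sketch: the actor and logger are assumed to be constructed elsewhere (e.g., by the offline algorithm), and the parameter name evaluate_epoisodes is reproduced verbatim from the API above.

    def evaluate_policy(adapter, actor, logger, episodes: int = 10) -> None:
        # Sketch only: `adapter` is an OfflineAdapter, `actor` an omnisafe Actor,
        # and `logger` an omnisafe Logger, all built elsewhere.
        adapter.evaluate(evaluate_epoisodes=episodes, agent=actor, logger=logger)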

property observation_space: Box | Discrete#

The observation space of the environment.

reset(seed=None, options=None)[source]#

Reset the environment and return an initial observation.

Parameters:
  • seed (int, optional) – The random seed. Defaults to None.

  • options (dict[str, Any], optional) – The options for the environment. Defaults to None.

Returns:
  • observation – The initial observation of the space.

  • info – Some information logged by the environment.

Return type:

tuple[torch.Tensor, dict[str, Any]]

step(actions)[source]#

Run one timestep of the environment’s dynamics using the agent actions.

Parameters:

actions (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after previous action.

  • cost – The amount of cost returned after previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict]

On Policy Adapter#


class omnisafe.adapter.OnPolicyAdapter(env_id, num_envs, seed, cfgs)[source]#

OnPolicy Adapter for OmniSafe.

OnPolicyAdapter is used to adapt the environment to on-policy training.

Parameters:
  • env_id (str) – The environment id.

  • num_envs (int) – The number of environments.

  • seed (int) – The random seed.

  • cfgs (Config) – The configuration.

Initialize an instance of OnPolicyAdapter.

_log_metrics(logger, idx)[source]#

Log metrics, including EpRet, EpCost, EpLen.

Parameters:
  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

  • idx (int) – The index of the environment.

Return type:

None

_log_value(reward, cost, info)[source]#

Log value.

Note

OmniSafe uses the RewardNormalize and CostNormalize wrappers, so the original reward and cost will be stored in info['original_reward'] and info['original_cost'].

Parameters:
  • reward (torch.Tensor) – The immediate step reward.

  • cost (torch.Tensor) – The immediate step cost.

  • info (dict[str, Any]) – Some information logged by the environment.

Return type:

None

_reset_log(idx=None)[source]#

Reset the episode return, episode cost and episode length.

Parameters:

idx (int or None, optional) – The index of the environment. Defaults to None (single environment).

Return type:

None

rollout(steps_per_epoch, agent, buffer, logger)[source]#

Roll out the environment and store the data in the buffer.

Warning

As OmniSafe uses the AutoReset wrapper, the environment is reset automatically when an episode ends, so the final observation of that episode is stored in info['final_observation'].

Parameters:
  • steps_per_epoch (int) – Number of steps per epoch.

  • agent (ConstraintActorCritic) – Constraint actor-critic, including actor, reward critic, and cost critic.

  • buffer (VectorOnPolicyBuffer) – Vector on-policy buffer.

  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

Return type:

None
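The loop below is a hand-rolled sketch of what such a rollout has to handle, not OmniSafe's implementation: it shows where info['final_observation'] becomes available and how the unnormalized reward and cost can be read back from info. The adapter and actor objects are assumed to exist.

    import torch


    def collect_sketch(adapter, actor, steps: int = 64) -> None:
        """Schematic on-policy collection loop; the real rollout() also fills a
        VectorOnPolicyBuffer and logs through a Logger."""
        obs, _ = adapter.reset()
        ep_ret, ep_cost = 0.0, 0.0
        for _ in range(steps):
            action = actor(obs)  # assumed: any callable mapping observations to actions
            obs, reward, cost, terminated, truncated, info = adapter.step(action)

            # With RewardNormalize/CostNormalize active, log the unnormalized values.
            ep_ret += float(info.get('original_reward', reward).sum())
            ep_cost += float(info.get('original_cost', cost).sum())

            if bool(torch.logical_or(terminated, truncated).any()):
                # AutoReset has already reset the environment; the terminal observation
                # of the finished episode is only available here, e.g. for bootstrapping
                # the value function at truncation.
                final_obs = info['final_observation']
                del final_obs  # placeholder: a real loop would bootstrap from it
                ep_ret, ep_cost = 0.0, 0.0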

Off Policy Adapter#


class omnisafe.adapter.OffPolicyAdapter(env_id, num_envs, seed, cfgs)[source]#

OffPolicy Adapter for OmniSafe.

OffPolicyAdapter is used to adapt the environment to off-policy training.

Note

Off-policy training needs to update the policy before the episode finishes, so OffPolicyAdapter stores the current observation in _current_obs. After updating the policy, the agent remembers the current observation and uses it to continue interacting with the environment.

Parameters:
  • env_id (str) – The environment id.

  • num_envs (int) – The number of environments.

  • seed (int) – The random seed.

  • cfgs (Config) – The configuration.

Initialize an instance of OffPolicyAdapter.

_log_metrics(logger, idx)[source]#

Log metrics, including EpRet, EpCost, EpLen.

Parameters:
  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

  • idx (int) – The index of the environment.

Return type:

None

_log_value(reward, cost, info)[source]#

Log value.

Note

OmniSafe uses the RewardNormalize and CostNormalize wrappers, so the original reward and cost will be stored in info['original_reward'] and info['original_cost'].

Parameters:
  • reward (torch.Tensor) – The immediate step reward.

  • cost (torch.Tensor) – The immediate step cost.

  • info (dict[str, Any]) – Some information logged by the environment.

Return type:

None

_reset_log(idx=None)[source]#

Reset the episode return, episode cost and episode length.

Parameters:

idx (int or None, optional) – The index of the environment. Defaults to None (single environment).

Return type:

None

eval_policy(episode, agent, logger)[source]#

Roll out the environment with deterministic agent actions.

Parameters:
  • episode (int) – Number of episodes.

  • agent (ConstraintActorCritic) – Agent.

  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

Return type:

None

rollout(rollout_step, agent, buffer, logger, use_rand_action)[source]#

Roll out the environment and store the data in the buffer.

Warning

As OmniSafe uses the AutoReset wrapper, the environment is reset automatically when an episode ends, so the final observation of that episode is stored in info['final_observation'].

Parameters:
  • rollout_step (int) – Number of rollout steps.

  • agent (ConstraintActorCritic) – Constraint actor-critic, including actor, reward critic, and cost critic.

  • buffer (VectorOffPolicyBuffer) – Vector off-policy buffer.

  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

  • use_rand_action (bool) – Whether to use random action.

Return type:

None
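The outer loop below sketches how rollout() is typically interleaved with policy updates. agent.update() is a hypothetical stand-in for whatever learner owns the adapter, and all counts are illustrative.

    def off_policy_training_sketch(adapter, agent, buffer, logger,
                                   epochs: int = 10,
                                   samples_per_epoch: int = 2000,
                                   warmup_epochs: int = 1) -> None:
        """Schematic outer loop around OffPolicyAdapter.rollout()."""
        for epoch in range(epochs):
            # The adapter keeps _current_obs internally, so consecutive rollout()
            # calls resume the episode exactly where the previous call stopped.
            adapter.rollout(
                rollout_step=samples_per_epoch,
                agent=agent,
                buffer=buffer,
                logger=logger,
                use_rand_action=epoch < warmup_epochs,  # random exploration at first
            )
            agent.update(buffer)  # hypothetical update hook, not part of the adapter API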

Saute Adapter#


class omnisafe.adapter.SauteAdapter(env_id, num_envs, seed, cfgs)[source]#

Saute Adapter for OmniSafe.

Saute is a safe RL algorithm that uses state augmentation to ensure safety. The augmented state is the concatenation of the original state and the safety state, where the safety state is the safety budget minus the cost, divided by the safety budget.

Note

  • If the safety state is greater than 0, the reward is the original reward.

  • If the safety state is less than 0, the reward is the unsafe reward (always less than or equal to 0).

OmniSafe provides two implementations of Saute RL: PPOSaute and TRPOSaute.
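The class below is a self-contained sketch of these mechanics. The budget, the unsafe reward, and the per-step update are simplified assumptions (the real SauteAdapter also handles discounting and vectorized environments); it only mirrors the roles of _safety_step(), _safety_reward(), and _augment_obs() documented further down.

    import torch


    class SauteStateSketch:
        """Simplified illustration of Saute's safety-state bookkeeping."""

        def __init__(self, safety_budget: float = 25.0, unsafe_reward: float = -1.0):
            self.safety_budget = torch.tensor(safety_budget)  # assumed episode budget
            self.unsafe_reward = torch.tensor(unsafe_reward)  # reward once the budget is spent
            self.safety_obs = torch.ones(1)                   # normalized remaining budget

        def safety_step(self, cost: torch.Tensor) -> None:
            # Spend part of the normalized budget in proportion to the step cost.
            self.safety_obs = self.safety_obs - cost / self.safety_budget

        def safety_reward(self, reward: torch.Tensor) -> torch.Tensor:
            # Keep the original reward while the budget is positive; otherwise
            # return the unsafe reward.
            return torch.where(self.safety_obs > 0, reward, self.unsafe_reward)

        def augment_obs(self, obs: torch.Tensor) -> torch.Tensor:
            # The augmented observation is the concatenation [obs, safety_obs].
            return torch.cat([obs, self.safety_obs], dim=-1)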

References

  • Title: Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

  • Authors: Aivar Sootla, Alexander I. Cowen-Rivers, Taher Jafferjee, Ziyan Wang, David Mguni, Jun Wang, Haitham Bou-Ammar.

  • URL: Saute

Parameters:
  • env_id (str) – The environment id.

  • num_envs (int) – The number of parallel environments.

  • seed (int) – The random seed.

  • cfgs (Config) – The configuration passed from the YAML file.

Initialize an instance of SauteAdapter.

_augment_obs(obs)[source]#

Augment the observation with the safety observation.

The augmented observation is the concatenation of the original observation and the safety observation. The safety observation is the safety budget minus the cost, divided by the safety budget.

Parameters:

obs (torch.Tensor) – The original observation.

Returns:

The augmented observation.

Return type:

Tensor

_log_metrics(logger, idx)[source]#

Log metrics, including EpRet, EpCost, EpLen and EpBudget.

Parameters:
  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

  • idx (int) – The index of the environment.

Return type:

None

_log_value(reward, cost, info)[source]#

Log value.

Note

Additionally, the safety observation will be updated and logged.

Parameters:
  • reward (torch.Tensor) – The immediate step reward.

  • cost (torch.Tensor) – The immediate step cost.

  • info (dict[str, Any]) – Some information logged by the environment.

Return type:

None

_reset_log(idx=None)[source]#

Reset the episode return, episode cost, episode length and episode budget.

Parameters:

idx (int or None, optional) – The index of the environment. Defaults to None (single environment).

Return type:

None

_safety_reward(reward)[source]#

Update the reward with the safety observation.

Note

If the safety observation is greater than 0, the reward will be the original reward. Otherwise, the reward will be the unsafe reward.

Parameters:

reward (torch.Tensor) – The reward of the current step.

Returns:

The final reward determined by the safety observation.

Return type:

Tensor

_safety_step(cost)[source]#

Update the safety observation.

Parameters:

cost (torch.Tensor) – The cost of the current step.

Return type:

None

_wrapper(obs_normalize=True, reward_normalize=False, cost_normalize=False)[source]#

Wrap the environment.

Warning

Reward and cost normalization are not supported by SauteAdapter.

Parameters:
  • obs_normalize (bool, optional) – Whether to normalize the observation. Defaults to True.

  • reward_normalize (bool, optional) – Whether to normalize the reward. Defaults to False.

  • cost_normalize (bool, optional) – Whether to normalize the cost. Defaults to False.

Return type:

None

property observation_space: Box#

The observation space of the environment.

reset(seed=None, options=None)[source]#

Reset the environment and return an initial observation.

Note

Additionally, the safety observation will be reset.

Parameters:
  • seed (int, optional) – The random seed. Defaults to None.

  • options (dict[str, Any], optional) – The options for the environment. Defaults to None.

Returns:
  • observation – The initial observation of the space.

  • info – Some information logged by the environment.

Return type:

tuple[torch.Tensor, dict[str, Any]]

step(action)[source]#

Run one timestep of the environment’s dynamics using the agent actions.

Note

_safety_step() will be called to update the safety observation, and then the reward will be updated by _safety_reward().

Parameters:

action (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after previous action.

  • cost – The amount of cost returned after previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]

Simmer Adapter#


class omnisafe.adapter.SimmerAdapter(env_id, num_envs, seed, cfgs)[source]#

Simmer Adapter for OmniSafe.

Simmer is a safe RL algorithm that uses a safety budget to control the exploration of the RL agent. Similar to SauteAdapter, Simmer uses state augmentation to ensure safety. Additionally, Simmer uses a controller to adjust the safety budget.

Note

  • If the safety state is greater than 0, the reward is the original reward.

  • If the safety state is less than 0, the reward is the unsafe reward (always less than or equal to 0).

OmniSafe provides two implementations of Simmer RL: PPOSimmer and TRPOSimmer.
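The rule below is a toy stand-in for the budget controller (compare control_budget() documented below): it relaxes the budget when the observed episode costs stay within the current budget and tightens it otherwise. The step size, bounds, and the rule itself are illustrative assumptions; the actual controller follows the Simmer paper.

    import torch


    def update_safety_budget(current_budget: float,
                             target_budget: float,
                             ep_costs: torch.Tensor,
                             step: float = 1.0) -> float:
        """Toy budget controller: move toward the target budget when the agent stays
        within the current budget, back off otherwise, and never leave [0, target]."""
        within_budget = float(ep_costs.mean()) <= current_budget
        delta = step if within_budget else -step
        return min(max(current_budget + delta, 0.0), target_budget)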

References

  • Title: Effects of Safety State Augmentation on Safe Exploration.

  • Authors: Aivar Sootla, Alexander I. Cowen-Rivers, Taher Jafferjee, Ziyan Wang, David Mguni, Jun Wang, Haitham Bou-Ammar.

  • URL: Simmer

Parameters:
  • env_id (str) – The environment id.

  • num_envs (int) – The number of parallel environments.

  • seed (int) – The random seed.

  • cfgs (Config) – The configuration passed from the YAML file.

Initialize an instance of SimmerAdapter.

control_budget(ep_costs)[source]#

Control the safety budget.

Parameters:

ep_costs (torch.Tensor) – The episode costs.

Return type:

None

reset(seed=None, options=None)[source]#

Reset the environment and return an initial observation.

Note

Additionally, the safety observation will be reset, and the safety budget will be reset to the current value of rel_safety_budget.

Parameters:
  • seed (int, optional) – The random seed. Defaults to None.

  • options (dict[str, Any], optional) – The options for the environment. Defaults to None.

Returns:
  • observation – The initial observation of the space.

  • info – Some information logged by the environment.

Return type:

tuple[torch.Tensor, dict[str, Any]]

Model-based Adapter#


class omnisafe.adapter.ModelBasedAdapter(env_id, num_envs, seed, cfgs, **env_kwargs)[source]#

Model Based Adapter for OmniSafe.

ModelBasedAdapter is used to adapt the environment to model-based training. It trains a world model to provide data for algorithm training.

Parameters:
  • env_id (str) – The environment id.

  • num_envs (int) – The number of environments.

  • seed (int) – The random seed.

  • cfgs (Config) – The configuration.

Keyword Arguments:
  • render_mode (str, optional) – The render mode, one of ‘human’, ‘rgb_array’, or ‘rgb_array_list’. Defaults to ‘rgb_array’.

  • camera_name (str, optional) – The camera name.

  • camera_id (int, optional) – The camera id.

  • width (int, optional) – The width of the rendered image. Defaults to 256.

  • height (int, optional) – The height of the rendered image. Defaults to 256.

Variables:
  • coordinate_observation_space (OmnisafeSpace) – The coordinate observation space.

  • lidar_observation_space (OmnisafeSpace) – The lidar observation space.

  • task (str) – The task, e.g., the task of SafetyPointGoal-v0 is ‘goal’.

Initialize the model-based adapter.

_log_metrics(logger)[source]#

Log metrics.

Parameters:

logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

Return type:

None

_log_value(reward, cost, info)[source]#

Log value.

Note

OmniSafe uses the RewardNormalize and CostNormalize wrappers, so the original reward and cost will be stored in info['original_reward'] and info['original_cost'].

Parameters:
  • reward (torch.Tensor) – The immediate step reward.

  • cost (torch.Tensor) – The immediate step cost.

  • info (dict[str, Any]) – Some information logged by the environment.

Return type:

None

_reset_log()[source]#

Reset log.

Return type:

None

_wrapper(obs_normalize=True, reward_normalize=True, cost_normalize=True, action_repeat=1)[source]#

Wrap the environment.

Hint

OmniSafe supports the following wrappers:

  • TimeLimit – Limit the time steps of the environment.

  • AutoReset – Reset the environment when the episode is done.

  • ObsNormalize – Normalize the observation.

  • RewardNormalize – Normalize the reward.

  • CostNormalize – Normalize the cost.

  • ActionScale – Scale the action.

  • ActionRepeat – Repeat the action.

  • Unsqueeze – Unsqueeze the step result for the single-environment case.

Parameters:
  • obs_normalize (bool) – Whether to normalize the observation.

  • reward_normalize (bool) – Whether to normalize the reward.

  • cost_normalize (bool) – Whether to normalize the cost.

  • action_repeat (int) – The action repeat times.

Return type:

None

get_cost_from_obs_tensor(obs)[source]#

Get cost from tensor observation.

Parameters:

obs (torch.Tensor) – The tensor version of observation.

Return type:

Tensor

get_lidar_from_coordinate(obs)[source]#

Get lidar from numpy coordinate.

Parameters:

obs (np.ndarray) – The observation.

Return type:

torch.Tensor | None

render(*args, **kwargs)[source]#

Render the environment.

Parameters:

args (str) – The arguments.

Keyword Arguments:
  • render_mode (str, optional) – The render mode, one of human, rgb_array, or rgb_array_list. Defaults to rgb_array.

  • camera_name (str, optional) – The camera name.

  • camera_id (int, optional) – The camera id.

  • width (int, optional) – The width of the rendered image. Defaults to 256.

  • height (int, optional) – The height of the rendered image. Defaults to 256.

Return type:

Any

rollout(current_step, rollout_step, use_actor_critic, act_func, store_data_func, update_dynamics_func, logger, use_eval, eval_func, algo_reset_func, update_actor_func)[source]#

Roll out the environment and store the data in the buffer.

Parameters:
  • current_step (int) – Current training step.

  • rollout_step (int) – Number of steps to roll out.

  • use_actor_critic (bool) – Whether to use actor-critic.

  • act_func (Callable[[int, torch.Tensor], torch.Tensor]) – Function to get action.

  • store_data_func (Callable[[torch.Tensor, ..., dict[str, Any]], None]) – Function to store data.

  • update_dynamics_func (Callable[[], None]) – Function to update dynamics.

  • logger (Logger) – Logger, to log EpRet, EpCost, EpLen.

  • use_eval (bool) – Whether to use evaluation.

  • eval_func (Callable[[int, bool], None]) – Function to evaluate the agent.

  • algo_reset_func (Callable[[], None]) – Function to reset the algorithm.

  • update_actor_func (Callable[[int], None]) – Function to update the actor.

Return type:

int
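To make the expected call pattern concrete, the sketch below wires stub callables with the documented signatures into rollout(). Every stub body is a placeholder for the algorithm's own logic (acting, storing data, fitting the dynamics model, evaluation, resets, and actor updates); the adapter and logger are assumed to be constructed elsewhere.

    import torch


    def run_model_based_epoch(adapter, logger, current_step: int, rollout_step: int) -> int:
        """Sketch of how the documented callables plug into ModelBasedAdapter.rollout()."""

        def act(step: int, obs: torch.Tensor) -> torch.Tensor:
            return torch.zeros(adapter.action_space.shape)  # placeholder action

        def store_data(*transition, **extra) -> None:
            pass  # placeholder: push the transition into the algorithm's buffer

        def update_dynamics() -> None:
            pass  # placeholder: fit the world model on the collected data

        def evaluate(step: int, use_eval: bool) -> None:
            pass  # placeholder: periodic evaluation hook

        def algo_reset() -> None:
            pass  # placeholder: reset per-episode algorithm state

        def update_actor(step: int) -> None:
            pass  # placeholder: policy update hook

        return adapter.rollout(
            current_step=current_step,
            rollout_step=rollout_step,
            use_actor_critic=False,
            act_func=act,
            store_data_func=store_data,
            update_dynamics_func=update_dynamics,
            logger=logger,
            use_eval=False,
            eval_func=evaluate,
            algo_reset_func=algo_reset,
            update_actor_func=update_actor,
        )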