OmniSafe Adapter#
OmniSafe provides a set of adapters to adapt the environment to the framework.
| Adapter | Description |
| --- | --- |
| OnlineAdapter | Online Adapter for OmniSafe. |
| OfflineAdapter | Offline Adapter for OmniSafe. |
| OnPolicyAdapter | OnPolicy Adapter for OmniSafe. |
| OffPolicyAdapter | OffPolicy Adapter for OmniSafe. |
| SauteAdapter | Saute Adapter for OmniSafe. |
| SimmerAdapter | Simmer Adapter for OmniSafe. |
| ModelBasedAdapter | Model Based Adapter for OmniSafe. |
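In normal use these adapters are constructed internally: instantiating an algorithm through the public API builds the matching adapter (an on-policy algorithm such as PPOLag, for example, runs on ``OnPolicyAdapter``). A minimal sketch of that entry point, assuming OmniSafe and its Safety-Gymnasium environments are installed:

```python
# Sketch: the adapter is created internally when an algorithm is instantiated.
import omnisafe

# PPOLag is on-policy, so the trainer builds an OnPolicyAdapter under the hood.
agent = omnisafe.Agent('PPOLag', 'SafetyPointGoal1-v0')
agent.learn()
```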
Online Adapter#
Documentation
- class omnisafe.adapter.OnlineAdapter(env_id, num_envs, seed, cfgs)[source]#
Online Adapter for OmniSafe.
OmniSafe is a framework for safe reinforcement learning. It is designed to be compatible with any existing RL algorithms. The online adapter is used to adapt the environment to the framework.
- Parameters:
env_id (str) – The environment id.
num_envs (int) – The number of parallel environments.
seed (int) – The random seed.
cfgs (Config) – The configuration.
Initialize an instance of ``OnlineAdapter``.
- _wrapper(obs_normalize=True, reward_normalize=True, cost_normalize=True)[source]#
Wrap the environment.
Hint
OmniSafe supports the following wrappers:
| Wrapper | Description |
| --- | --- |
| TimeLimit | Limit the time steps of the environment. |
| AutoReset | Reset the environment when the episode is done. |
| ObsNormalize | Normalize the observation. |
| RewardNormalize | Normalize the reward. |
| CostNormalize | Normalize the cost. |
| ActionScale | Scale the action. |
| Unsqueeze | Unsqueeze the step result for the single-environment case. |
- Parameters:
obs_normalize (bool, optional) – Whether to normalize the observation. Defaults to True.
reward_normalize (bool, optional) – Whether to normalize the reward. Defaults to True.
cost_normalize (bool, optional) – Whether to normalize the cost. Defaults to True.
- Return type:
None
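Subclasses adjust this stack by overriding ``_wrapper`` with different defaults; ``SauteAdapter`` below, for instance, disables reward and cost normalization. A toy sketch of the conditional-composition pattern the method implements (the classes here are stand-ins, not OmniSafe's actual wrappers or their constructor signatures):

```python
# Toy illustration of conditional wrapper composition; stand-in classes only.
class BaseEnv:
    def step(self, action):
        return action

class ObsNormalize:
    """Stand-in wrapper: forwards step(); the real one would normalize obs."""
    def __init__(self, env):
        self._env = env
    def step(self, action):
        return self._env.step(action)

def build_env(obs_normalize=True):
    env = BaseEnv()
    if obs_normalize:  # each boolean flag conditionally adds one layer
        env = ObsNormalize(env)
    return env

env = build_env(obs_normalize=True)
```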
- property action_space: Box | Discrete#
The action space of the environment.
- property env_spec_keys: list[str]#
Return the environment specification log.
- property observation_space: Box | Discrete#
The observation space of the environment.
- reset(seed=None, options=None)[source]#
Reset the environment and return an initial observation.
- Parameters:
seed (int, optional) – The random seed. Defaults to None.
options (dict[str, Any], optional) – The options for the environment. Defaults to None.
- Returns:
observation – The initial observation of the space.
info – Some information logged by the environment.
- Return type:
tuple[torch.Tensor, dict[str, Any]]
- save()[source]#
Save the important components of the environment.
Note
The saved components are stored by the wrapping environment. If the environment is not wrapped, the saved components will be an empty dict. Common wrappers are ``obs_normalize``, ``reward_normalize``, and ``cost_normalize``.
- Returns:
The saved components of the environment, e.g., ``obs_normalizer``.
- Return type:
dict[str, Module]
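A short usage sketch; ``adapter`` is assumed to be a constructed ``OnlineAdapter``, and the checkpoint file name is illustrative:

```python
# Sketch: persist the wrapped environment's normalizers with a checkpoint.
import torch

components = adapter.save()  # e.g. {'obs_normalizer': Module, ...}, or {} if unwrapped
torch.save({k: m.state_dict() for k, m in components.items()}, 'env_state.pt')
```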
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after previous action.
cost – The amount of cost returned after previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
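The documented ``reset`` and ``step`` signatures are enough to drive the environment by hand. A minimal sketch, assuming ``adapter`` is a constructed single-environment ``OnlineAdapter`` and sampling random actions:

```python
# Sketch: one manual episode through the adapter's reset/step API.
import torch

obs, info = adapter.reset(seed=0)
done = False
while not done:
    # step expects a torch.Tensor, so convert the sampled gymnasium action.
    action = torch.as_tensor(adapter.action_space.sample())
    obs, reward, cost, terminated, truncated, info = adapter.step(action)
    done = bool(terminated.any() or truncated.any())
```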
Offline Adapter#
Documentation
- class omnisafe.adapter.OfflineAdapter(env_id, seed, cfgs)[source]#
Offline Adapter for OmniSafe.
``OfflineAdapter`` is used to adapt the environment to offline training.
Note
Technically, offline training does not need an environment for the agent to interact with. However, to monitor the performance of the agent during training, we still need to instantiate an environment to evaluate the agent. ``OfflineAdapter`` provides an important interface, ``evaluate``, to test the agent.
- Parameters:
env_id (str) – The environment id.
seed (int) – The random seed.
cfgs (Config) – The configuration.
Initialize an instance of ``OfflineAdapter``.
- property action_space: Box | Discrete#
The action space of the environment.
- property observation_space: Box | Discrete#
The observation space of the environment.
- reset(seed=None, options=None)[source]#
Reset the environment and return an initial observation.
- Parameters:
seed (int, optional) – The random seed. Defaults to None.
options (dict[str, Any], optional) – The options for the environment. Defaults to None.
- Returns:
observation – The initial observation of the space.
info – Some information logged by the environment.
- Return type:
tuple[torch.Tensor, dict[str, Any]]
- step(actions)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
- Parameters:
actions (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after previous action.
cost – The amount of cost returned after previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict]
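The ``evaluate`` interface mentioned above is not documented on this page, so the sketch below instead runs a deterministic evaluation through the generic ``reset``/``step`` API; ``policy`` is a hypothetical callable mapping an observation tensor to an action tensor:

```python
# Sketch: deterministic evaluation loop over the documented reset/step API.
# `policy` is a hypothetical obs -> action callable (e.g. a trained actor).
import torch

def evaluate_policy(adapter, policy, episodes=10):
    avg_return, avg_cost = 0.0, 0.0
    for ep in range(episodes):
        obs, info = adapter.reset(seed=ep)
        done = False
        while not done:
            with torch.no_grad():
                obs, reward, cost, terminated, truncated, info = adapter.step(policy(obs))
            avg_return += float(reward) / episodes
            avg_cost += float(cost) / episodes
            done = bool(terminated) or bool(truncated)
    return avg_return, avg_cost
```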
On Policy Adapter#
Documentation
- class omnisafe.adapter.OnPolicyAdapter(env_id, num_envs, seed, cfgs)[source]#
OnPolicy Adapter for OmniSafe.
``OnPolicyAdapter`` is used to adapt the environment to on-policy training.
- Parameters:
env_id (str) – The environment id.
num_envs (int) – The number of environments.
seed (int) – The random seed.
cfgs (Config) – The configuration.
Initialize an instance of ``OnPolicyAdapter``.
- _log_metrics(logger, idx)[source]#
Log metrics, including ``EpRet``, ``EpCost``, and ``EpLen``.
- Parameters:
logger (Logger) – Logger, to log ``EpRet``, ``EpCost``, and ``EpLen``.
idx (int) – The index of the environment.
- Return type:
None
- _log_value(reward, cost, info)[source]#
Log value.
Note
OmniSafe uses the ``RewardNormalizer`` wrapper, so the original reward and cost are stored in ``info['original_reward']`` and ``info['original_cost']``.
- Parameters:
reward (torch.Tensor) – The immediate step reward.
cost (torch.Tensor) – The immediate step cost.
info (dict[str, Any]) – Some information logged by the environment.
- Return type:
None
- _reset_log(idx=None)[source]#
Reset the episode return, episode cost and episode length.
- Parameters:
idx (int or None, optional) – The index of the environment. Defaults to None (single environment).
- Return type:
None
- rollout(steps_per_epoch, agent, buffer, logger)[source]#
Rollout the environment and store the data in the buffer.
Warning
As OmniSafe uses the ``AutoReset`` wrapper, the environment is reset automatically, so the final observation is stored in ``info['final_observation']``.
- Parameters:
steps_per_epoch (int) – Number of steps per epoch.
agent (ConstraintActorCritic) – Constraint actor-critic, including the actor, reward critic, and cost critic.
buffer (VectorOnPolicyBuffer) – Vector on-policy buffer.
logger (Logger) – Logger, to log ``EpRet``, ``EpCost``, and ``EpLen``.
- Return type:
None
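A schematic epoch loop around ``rollout``; ``agent``, ``buffer``, and ``logger`` are assumed to be constructed elsewhere, ``buffer.get()`` is an assumed accessor, and ``update_policy`` is a placeholder for the algorithm's actual update step:

```python
# Schematic on-policy training loop around OnPolicyAdapter.rollout.
for epoch in range(epochs):
    # One call collects steps_per_epoch transitions; the AutoReset wrapper
    # handles episode boundaries inside the rollout.
    adapter.rollout(steps_per_epoch=steps_per_epoch, agent=agent,
                    buffer=buffer, logger=logger)
    data = buffer.get()         # assumed accessor that drains the buffer
    update_policy(agent, data)  # placeholder for the PPO/TRPO-style update
```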
Off Policy Adapter#
Documentation
- class omnisafe.adapter.OffPolicyAdapter(env_id, num_envs, seed, cfgs)[source]#
OffPolicy Adapter for OmniSafe.
``OffPolicyAdapter`` is used to adapt the environment to off-policy training.
Note
Off-policy training needs to update the policy before the episode finishes, so the ``OffPolicyAdapter`` stores the current observation in ``_current_obs``. After the policy update, the agent resumes from the stored observation to interact with the environment.
- Parameters:
env_id (str) – The environment id.
num_envs (int) – The number of environments.
seed (int) – The random seed.
cfgs (Config) – The configuration.
Initialize an instance of ``OffPolicyAdapter``.
- _log_metrics(logger, idx)[source]#
Log metrics, including ``EpRet``, ``EpCost``, and ``EpLen``.
- Parameters:
logger (Logger) – Logger, to log ``EpRet``, ``EpCost``, and ``EpLen``.
idx (int) – The index of the environment.
- Return type:
None
- _log_value(reward, cost, info)[source]#
Log value.
Note
OmniSafe uses the ``RewardNormalizer`` wrapper, so the original reward and cost are stored in ``info['original_reward']`` and ``info['original_cost']``.
- Parameters:
reward (torch.Tensor) – The immediate step reward.
cost (torch.Tensor) – The immediate step cost.
info (dict[str, Any]) – Some information logged by the environment.
- Return type:
None
- _reset_log(idx=None)[source]#
Reset the episode return, episode cost and episode length.
- Parameters:
idx (int or None, optional) – The index of the environment. Defaults to None (single environment).
- Return type:
None
- eval_policy(episode, agent, logger)[source]#
Roll out the environment with deterministic agent actions.
- Parameters:
episode (int) – Number of episodes.
agent (ConstraintActorCritic) – Agent.
logger (Logger) – Logger, to log ``EpRet``, ``EpCost``, and ``EpLen``.
- Return type:
None
- rollout(rollout_step, agent, buffer, logger, use_rand_action)[source]#
Rollout the environment and store the data in the buffer.
Warning
As OmniSafe uses the ``AutoReset`` wrapper, the environment is reset automatically, so the final observation is stored in ``info['final_observation']``.
- Parameters:
rollout_step (int) – Number of rollout steps.
agent (ConstraintActorCritic) – Constraint actor-critic, including actor, reward critic, and cost critic.
buffer (VectorOffPolicyBuffer) – Vector off-policy buffer.
logger (Logger) – Logger, to log ``EpRet``, ``EpCost``, and ``EpLen``.
use_rand_action (bool) – Whether to use random actions.
- Return type:
None
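A schematic version of the interleaved collect-and-update pattern described in the class note; every name except the documented ``rollout``/``eval_policy`` parameters is a placeholder:

```python
# Schematic off-policy loop: short rollouts interleaved with policy updates.
total_steps = 0
while total_steps < max_steps:
    adapter.rollout(
        rollout_step=update_every,                     # a few steps, not a full episode
        agent=agent,
        buffer=buffer,
        logger=logger,
        use_rand_action=(total_steps < warmup_steps),  # random warm-up phase
    )
    total_steps += update_every
    update_policy(agent, buffer)                       # placeholder update step
adapter.eval_policy(episode=10, agent=agent, logger=logger)
```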
Saute Adapter#
Documentation
- class omnisafe.adapter.SauteAdapter(env_id, num_envs, seed, cfgs)[source]#
Saute Adapter for OmniSafe.
Saute is a safe RL algorithm that uses state augmentation to ensure safety. The augmented state is the concatenation of the original state and the safety state, where the safety state is the remaining budget fraction: the safety budget minus the accumulated cost, divided by the safety budget.
Note
If the safety state is greater than 0, the reward is the original reward.
If the safety state is less than 0, the reward is the unsafe reward (always 0 or less than 0).
OmniSafe provides two implementations of Saute RL: ``PPOSaute`` and ``TRPOSaute``.
References
Title: Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation
- Authors: Aivar Sootla, Alexander I. Cowen-Rivers, Taher Jafferjee, Ziyan Wang, David Mguni, Jun Wang, Haitham Bou-Ammar.
URL: Saute
- Parameters:
env_id (str) – The environment id.
num_envs (int) – The number of parallel environments.
seed (int) – The random seed.
cfgs (Config) – The configuration passed from yaml file.
Initialize an instance of ``SauteAdapter``.
- _augment_obs(obs)[source]#
Augment the observation with the safety observation.
The augmented observation is the concatenation of the original observation and the safety observation. The safety observation is the remaining budget fraction: the safety budget minus the accumulated cost, divided by the safety budget.
- Parameters:
obs (torch.Tensor) – The original observation.
- Returns:
The augmented observation.
- Return type:
Tensor
- _log_metrics(logger, idx)[source]#
Log metrics, including ``EpRet``, ``EpCost``, ``EpLen``, and ``EpBudget``.
- Parameters:
logger (Logger) – Logger, to log ``EpRet``, ``EpCost``, and ``EpLen``.
idx (int) – The index of the environment.
- Return type:
None
- _log_value(reward, cost, info)[source]#
Log value.
Note
Additionally, the safety observation will be updated and logged.
- Parameters:
reward (torch.Tensor) – The immediate step reward.
cost (torch.Tensor) – The immediate step cost.
info (dict[str, Any]) – Some information logged by the environment.
- Return type:
None
- _reset_log(idx=None)[source]#
Reset the episode return, episode cost, episode length and episode budget.
- Parameters:
idx (int or None, optional) – The index of the environment. Defaults to None (single environment).
- Return type:
None
- _safety_reward(reward)[source]#
Update the reward with the safety observation.
Note
If the safety observation is greater than 0, the reward will be the original reward. Otherwise, the reward will be the unsafe reward.
- Parameters:
reward (torch.Tensor) – The reward of the current step.
- Returns:
The final reward determined by the safety observation.
- Return type:
Tensor
- _safety_step(cost)[source]#
Update the safety observation.
- Parameters:
cost (torch.Tensor) – The cost of the current step.
- Return type:
None
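Taken together, ``_safety_step``, ``_safety_reward``, and ``_augment_obs`` implement the Saute transformation. A self-contained numerical sketch of that logic in plain torch (not OmniSafe's internals; the budget value, the zero unsafe reward, and the per-step update rule are assumptions based on the descriptions above):

```python
# Standalone sketch of the Saute state update, reward gating, and augmentation.
import torch

SAFETY_BUDGET = 25.0               # assumed per-episode cost budget
UNSAFE_REWARD = torch.tensor(0.0)  # reward once the budget is exhausted

def saute_step(obs, reward, cost, safety_obs):
    safety_obs = safety_obs - cost / SAFETY_BUDGET               # _safety_step
    reward = torch.where(safety_obs > 0, reward, UNSAFE_REWARD)  # _safety_reward
    aug_obs = torch.cat([obs, safety_obs], dim=-1)               # _augment_obs
    return aug_obs, reward, safety_obs

obs, reward, cost = torch.zeros(3), torch.tensor([1.0]), torch.tensor([0.5])
safety_obs = torch.ones(1)  # full budget remaining at reset
aug_obs, reward, safety_obs = saute_step(obs, reward, cost, safety_obs)
```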
- _wrapper(obs_normalize=True, reward_normalize=False, cost_normalize=False)[source]#
Wrap the environment.
Warning
The reward or cost normalization is not supported in Saute Adapter.
- Parameters:
obs_normalize (bool, optional) – Whether to normalize the observation. Defaults to True.
reward_normalize (bool, optional) – Whether to normalize the reward. Defaults to False.
cost_normalize (bool, optional) – Whether to normalize the cost. Defaults to False.
- Return type:
None
- property observation_space: Box#
The observation space of the environment.
- reset(seed=None, options=None)[source]#
Reset the environment and return an initial observation.
Note
Additionally, the safety observation will be reset.
- Parameters:
seed (int, optional) – The random seed. Defaults to None.
options (dict[str, Any], optional) – The options for the environment. Defaults to None.
- Returns:
observation – The initial observation of the space.
info – Some information logged by the environment.
- Return type:
tuple[torch.Tensor, dict[str, Any]]
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
Note
``_safety_step()`` is called to update the safety observation; the reward is then updated by ``_safety_reward()``.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after previous action.
cost – The amount of cost returned after previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
Simmer Adapter#
Documentation
- class omnisafe.adapter.SimmerAdapter(env_id, num_envs, seed, cfgs)[source]#
Simmer Adapter for OmniSafe.
Simmer is a safe RL algorithm that uses a safety budget to control the exploration of the RL agent. Similar to ``SauteEnvWrapper``, Simmer uses state augmentation to ensure safety. Additionally, Simmer uses a controller to adjust the safety budget.
Note
If the safety state is greater than 0, the reward is the original reward.
If the safety state is less than 0, the reward is the unsafe reward (always 0 or less than 0).
OmniSafe provides two implementations of Simmer RL: ``PPOSimmer`` and ``TRPOSimmer``.
References
Title: Effects of Safety State Augmentation on Safe Exploration.
- Authors: Aivar Sootla, Alexander I. Cowen-Rivers, Taher Jafferjee, Ziyan Wang, David Mguni, Jun Wang, Haitham Bou-Ammar.
URL: Simmer
- Parameters:
env_id (str) – The environment id.
num_envs (int) – The number of parallel environments.
seed (int) – The random seed.
cfgs (Config) – The configuration passed from yaml file.
Initialize an instance of ``SimmerAdapter``.
- control_budget(ep_costs)[source]#
Control the safety budget.
- Parameters:
ep_costs (torch.Tensor) – The episode costs.
- Return type:
None
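The control law itself is not documented here; purely to illustrate what ``control_budget`` is responsible for, the sketch below uses a simple proportional rule that nudges the budget toward a target episode cost (an assumption, not OmniSafe's actual controller):

```python
# Illustrative proportional budget controller -- NOT omnisafe's exact law.
import torch

def control_budget(safety_budget, ep_costs, target_cost, lr=0.1,
                   lower=0.0, upper=25.0):
    # Grow the budget when episodes underspend it, shrink it when they overspend.
    error = target_cost - ep_costs.mean()
    budget = torch.clamp(torch.as_tensor(safety_budget + lr * error), lower, upper)
    return float(budget)

budget = control_budget(10.0, ep_costs=torch.tensor([12.0, 9.0]), target_cost=8.0)
```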
- reset(seed=None, options=None)[source]#
Reset the environment and return an initial observation.
Note
Additionally, the safety observation is reset, and the safety budget is reset to the current value of ``rel_safety_budget``.
- Parameters:
seed (int, optional) – The random seed. Defaults to None.
options (dict[str, Any], optional) – The options for the environment. Defaults to None.
- Returns:
observation – The initial observation of the space.
info – Some information logged by the environment.
- Return type:
tuple[torch.Tensor, dict[str, Any]]
Model-based Adapter#
Documentation
- class omnisafe.adapter.ModelBasedAdapter(env_id, num_envs, seed, cfgs, **env_kwargs)[source]#
Model Based Adapter for OmniSafe.
``ModelBasedAdapter`` is used to adapt the environment to model-based training. It trains a world model to provide data for algorithm training.
- Parameters:
env_id (str) – The environment id.
num_envs (int) – The number of environments.
seed (int) – The random seed.
cfgs (Config) – The configuration.
- Keyword Arguments:
render_mode (str, optional) – The render mode, one of ‘human’, ‘rgb_array’, or ‘rgb_array_list’. Defaults to ‘rgb_array’.
camera_name (str, optional) – The camera name.
camera_id (int, optional) – The camera id.
width (int, optional) – The width of the rendered image. Defaults to 256.
height (int, optional) – The height of the rendered image. Defaults to 256.
- Variables:
coordinate_observation_space (OmnisafeSpace) – The coordinate observation space.
lidar_observation_space (OmnisafeSpace) – The lidar observation space.
task (str) – The task type, e.g., the task of SafetyPointGoal-v0 is ‘goal’.
Initialize the model-based adapter.
- _log_metrics(logger)[source]#
Log metrics.
- Parameters:
logger (Logger) – Logger, to log ``EpRet``, ``EpCost``, and ``EpLen``.
- Return type:
None
- _log_value(reward, cost, info)[source]#
Log value.
Note
OmniSafe uses the ``RewardNormalizer`` wrapper, so the original reward and cost are stored in ``info['original_reward']`` and ``info['original_cost']``.
- Parameters:
reward (torch.Tensor) – The immediate step reward.
cost (torch.Tensor) – The immediate step cost.
info (dict[str, Any]) – Some information logged by the environment.
- Return type:
None
- _wrapper(obs_normalize=True, reward_normalize=True, cost_normalize=True, action_repeat=1)[source]#
Wrap the environment.
Hint
OmniSafe supports the following wrappers:
| Wrapper | Description |
| --- | --- |
| TimeLimit | Limit the time steps of the environment. |
| AutoReset | Reset the environment when the episode is done. |
| ObsNormalize | Normalize the observation. |
| RewardNormalize | Normalize the reward. |
| CostNormalize | Normalize the cost. |
| ActionScale | Scale the action. |
| ActionRepeat | Repeat the action. |
| Unsqueeze | Unsqueeze the step result for the single-environment case. |
- Parameters:
obs_normalize (bool) – Whether to normalize the observation.
reward_normalize (bool) – Whether to normalize the reward.
cost_normalize (bool) – Whether to normalize the cost.
action_repeat (int) – The action repeat times.
- Return type:
None
- get_cost_from_obs_tensor(obs)[source]#
Get cost from tensor observation.
- Parameters:
obs (torch.Tensor) – The tensor version of observation.
- Return type:
Tensor
- get_lidar_from_coordinate(obs)[source]#
Get the lidar observation from a NumPy coordinate observation.
- Parameters:
obs (np.ndarray) – The observation.
- Return type:
torch.Tensor | None
- render(*args, **kwargs)[source]#
Render the environment.
- Parameters:
args (str) – The arguments.
- Keyword Arguments:
render_mode (str, optional) – The render mode, one of ``human``, ``rgb_array``, or ``rgb_array_list``. Defaults to ``rgb_array``.
camera_name (str, optional) – The camera name.
camera_id (int, optional) – The camera id.
width (int, optional) – The width of the rendered image. Defaults to 256.
height (int, optional) – The height of the rendered image. Defaults to 256.
- Return type:
Any
- rollout(current_step, rollout_step, use_actor_critic, act_func, store_data_func, update_dynamics_func, logger, use_eval, eval_func, algo_reset_func, update_actor_func)[source]#
Roll out the environment and store the data in the buffer.
- Parameters:
current_step (int) – Current training step.
rollout_step (int) – Number of steps to roll out.
use_actor_critic (bool) – Whether to use actor-critic.
act_func (Callable[[int, torch.Tensor], torch.Tensor]) – Function to get action.
store_data_func (Callable[[torch.Tensor, ...], None]) – Function to store data.
update_dynamics_func (Callable[[], None]) – Function to update dynamics.
logger (Logger) – Logger, to log ``EpRet``, ``EpCost``, and ``EpLen``.
use_eval (bool) – Whether to use evaluation.
eval_func (Callable[[int, bool], None]) – Function to evaluate the agent.
algo_reset_func (Callable[[], None]) – Function to reset the algorithm.
update_actor_func (Callable[[int], None]) – Function to update the actor.
- Return type:
int
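The callable-heavy signature is easiest to read with concrete stand-ins. A schematic wiring of ``rollout``; every helper below is a placeholder for the algorithm's real callbacks, and only the parameter names come from the signature above:

```python
# Schematic wiring of ModelBasedAdapter.rollout; all helpers are placeholders.
current_step = 0
while current_step < total_steps:
    current_step = adapter.rollout(                  # returns the updated step count
        current_step=current_step,
        rollout_step=steps_per_epoch,
        use_actor_critic=True,
        act_func=lambda step, obs: agent.step(obs),  # action selection
        store_data_func=store_real_data,             # push transitions to a buffer
        update_dynamics_func=train_world_model,      # fit the dynamics model
        logger=logger,
        use_eval=True,
        eval_func=evaluate_agent,
        algo_reset_func=reset_algorithm_state,
        update_actor_func=update_policy,
    )
```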