OmniSafe Wrapper#

TimeLimit(env, time_limit, device)

Time limit wrapper for the environment.

AutoReset(env, device)

Auto reset the environment when the episode is terminated.

ObsNormalize(env, device[, norm])

Normalize the observation.

RewardNormalize(env, device[, norm])

Normalize the reward.

CostNormalize(env, device[, norm])

Normalize the cost.

ActionScale(env, device, low, high)

Scale the action space to a given range.

ActionRepeat(env, times, device)

Repeat the action a given number of times.

Unsqueeze(env, device)

Unsqueeze the observation, reward, cost, terminated, truncated and info.
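
These wrappers are designed to be stacked. Below is a minimal composition sketch, assuming env is an existing CMDP instance, a CPU device, and that the wrapper order shown is one reasonable choice rather than a prescribed one:

>>> import torch
>>> from omnisafe.envs.wrapper import (
...     ActionScale, AutoReset, ObsNormalize, TimeLimit, Unsqueeze,
... )
>>> device = torch.device('cpu')
>>> env = TimeLimit(env, time_limit=1000, device=device)       # truncate episodes after 1000 steps
>>> env = AutoReset(env, device=device)                        # reset automatically on termination
>>> env = ObsNormalize(env, device=device)                     # running observation normalization
>>> env = ActionScale(env, device=device, low=-1.0, high=1.0)  # expose actions in [-1, 1]
>>> env = Unsqueeze(env, device=device)                        # add a leading batch dimension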

Time Limit Wrapper#

Documentation

class omnisafe.envs.wrapper.TimeLimit(env, time_limit, device)[source]#

Time limit wrapper for the environment.

Warning

The time limit wrapper only supports a single environment.

Examples

>>> env = TimeLimit(env, time_limit=100)
Parameters:
  • env (CMDP) – The environment to wrap.

  • time_limit (int) – The time limit for each episode.

  • device (torch.device) – The torch device to use.

Variables:
  • _time_limit (int) – The time limit for each episode.

  • _time (int) – The current time step.

Initialize an instance of TimeLimit.

reset(seed=None, options=None)[source]#

Reset the environment.

Note

Additionally, the time step will be reset to 0.

Parameters:
  • seed (int, optional) – The random seed. Defaults to None.

  • options (dict[str, Any], optional) – The options for the environment. Defaults to None.

Returns:
  • observation – The initial observation of the space.

  • info – Some information logged by the environment.

Return type:

tuple[torch.Tensor, dict[str, Any]]

step(action)[source]#

Run one timestep of the environment’s dynamics using the agent actions.

Note

Additionally, the time step will be increased by 1.

Parameters:

action (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after the previous action.

  • cost – The amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
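
A usage sketch of the truncation behaviour, assuming env is a CMDP instance, action comes from the agent, and torch is imported (the device below is also an assumption): after time_limit steps, truncated is raised even if the task itself has not terminated.

>>> env = TimeLimit(env, time_limit=100, device=torch.device('cpu'))
>>> obs, info = env.reset()                    # internal time counter reset to 0
>>> for _ in range(100):
...     obs, reward, cost, terminated, truncated, info = env.step(action)
...     if terminated or truncated:            # truncated becomes True at step 100
...         break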

Auto Reset Wrapper#

Documentation

class omnisafe.envs.wrapper.AutoReset(env, device)[source]#

Auto reset the environment when the episode is terminated.

Examples

>>> env = AutoReset(env)
Parameters:
  • env (CMDP) – The environment to wrap.

  • device (torch.device) – The torch device to use.

Initialize an instance of AutoReset.

step(action)[source]#

Run one timestep of the environment’s dynamics using the agent actions.

Note

If the episode is terminated, the environment will be reset automatically. The returned observation will be the first observation of the new episode, and the true final observation of the finished episode will be stored in info['final_observation'].

Parameters:

action (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after the previous action.

  • cost – The amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
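
A short sketch of retrieving the true final observation after an automatic reset, assuming env, action, and torch are available:

>>> env = AutoReset(env, device=torch.device('cpu'))
>>> obs, reward, cost, terminated, truncated, info = env.step(action)
>>> if terminated:
...     final_obs = info['final_observation']  # last observation of the finished episode
...     first_obs = obs                        # obs is already the start of the new episode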

Observation Normalization Wrapper#

Documentation

class omnisafe.envs.wrapper.ObsNormalize(env, device, norm=None)[source]#

Normalize the observation.

Examples

>>> env = ObsNormalize(env)
>>> norm = Normalizer(env.observation_space.shape)  # load saved normalizer
>>> env = ObsNormalize(env, norm)
Parameters:
  • env (CMDP) – The environment to wrap.

  • device (torch.device) – The torch device to use.

  • norm (Normalizer or None, optional) – The normalizer to use. Defaults to None.

Initialize an instance of ObsNormalize.

reset(seed=None, options=None)[source]#

Reset the environment and return an initial observation.

Parameters:
  • seed (int, optional) – The random seed. Defaults to None.

  • options (dict[str, Any], optional) – The options for the environment. Defaults to None.

Returns:
  • observation – The initial observation of the space.

  • info – Some information logged by the environment.

Return type:

tuple[torch.Tensor, dict[str, Any]]

save()[source]#

Save the observation normalizer.

Note

The saved components will be stored in the wrapped environment. If the environment is not wrapped, the saved components will be an empty dict. Common wrappers are obs_normalize, reward_normalize, and cost_normalize. When evaluating a saved model, the normalizer should be loaded as well.

Returns:

The saved components, that is, the observation normalizer.

Return type:

dict[str, Module]
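
A hedged sketch of persisting and restoring the normalizer for evaluation, using the obs_normalize key mentioned in the note above (the file name and eval_env variable are assumptions):

>>> saved = env.save()                          # e.g. {'obs_normalize': Normalizer(...)}
>>> torch.save(saved, 'normalizer.pt')
>>> norm = torch.load('normalizer.pt')['obs_normalize']
>>> eval_env = ObsNormalize(eval_env, device=torch.device('cpu'), norm=norm)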

step(action)[source]#

Run one timestep of the environment’s dynamics using the agent actions.

Note

The observation and the info['final_observation'] will be normalized.

Parameters:

action (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after the previous action.

  • cost – The amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]

Reward Normalization Wrapper#

Documentation

class omnisafe.envs.wrapper.RewardNormalize(env, device, norm=None)[source]#

Normalize the reward.

Examples

>>> env = RewardNormalize(env)
>>> norm = Normalizer(()) # load saved normalizer
>>> env = RewardNormalize(env, norm)
Parameters:
  • env (CMDP) – The environment to wrap.

  • device (torch.device) – The torch device to use.

  • norm (Normalizer or None, optional) – The normalizer to use. Defaults to None.

Initialize an instance of RewardNormalize.

save()[source]#

Save the reward normalizer.

Note

The saved components will be stored in the wrapped environment. If the environment is not wrapped, the saved components will be an empty dict. Common wrappers are obs_normalize, reward_normalize, and cost_normalize.

Returns:

The saved components, that is, the reward normalizer.

Return type:

dict[str, Module]

step(action)[source]#

Run one timestep of the environment’s dynamics using the agent actions.

Note

The reward will be normalized for agent training, and the original reward will be stored in info['original_reward'] for logging.

Parameters:

action (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after the previous action.

  • cost – The amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
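
A brief sketch of keeping the raw reward for logging while training on the normalized one, assuming env, action, and torch are available:

>>> env = RewardNormalize(env, device=torch.device('cpu'))
>>> obs, reward, cost, terminated, truncated, info = env.step(action)
>>> train_reward = reward                   # normalized reward used by the agent
>>> log_reward = info['original_reward']    # raw environment reward for logging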

Cost Normalization Wrapper#

Documentation

class omnisafe.envs.wrapper.CostNormalize(env, device, norm=None)[source]#

Normalize the cost.

Examples

>>> env = CostNormalize(env)
>>> norm = Normalizer(()) # load saved normalizer
>>> env = CostNormalize(env, norm)
Parameters:
  • env (CMDP) – The environment to wrap.

  • device (torch.device) – The torch device to use.

  • norm (Normalizer or None, optional) – The normalizer to use. Defaults to None.

Initialize an instance of CostNormalize.

save()[source]#

Save the cost normalizer.

Note

The saved components will be stored in the wrapped environment. If the environment is not wrapped, the saved components will be an empty dict. Common wrappers are obs_normalize, reward_normalize, and cost_normalize.

Returns:

The saved components, that is, the cost normalizer.

Return type:

dict[str, Module]

step(action)[source]#

Run one timestep of the environment’s dynamics using the agent actions.

Note

The cost will be normalized for agent training, and the original cost will be stored in info['original_cost'] for logging.

Parameters:

action (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after the previous action.

  • cost – The amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
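
Analogously for costs, a sketch that accumulates the raw episode cost for constraint evaluation while the agent trains on the normalized signal (env, action, the device, and the step budget are assumptions):

>>> env = CostNormalize(env, device=torch.device('cpu'))
>>> ep_cost = 0.0
>>> for _ in range(1000):
...     obs, reward, cost, terminated, truncated, info = env.step(action)
...     ep_cost += float(info['original_cost'])  # raw cost for constraint checking
...     if terminated or truncated:
...         break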

Action Scale#

Documentation

class omnisafe.envs.wrapper.ActionScale(env, device, low, high)[source]#

Scale the action space to a given range.

Examples

>>> env = ActionScale(env, low=-1, high=1)
>>> env.action_space
Box(-1.0, 1.0, (1,), float32)
Parameters:
  • env (CMDP) – The environment to wrap.

  • device (torch.device) – The device to use.

  • low (int or float) – The lower bound of the action space.

  • high (int or float) – The upper bound of the action space.

Initialize an instance of ActionScale.

step(action)[source]#

Run one timestep of the environment’s dynamics using the agent actions.

Note

The action from the agent, given in the scaled range, will be mapped back to the environment's original action range before being executed.

Parameters:

action (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after the previous action.

  • cost – The amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
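
A hedged sketch of the affine map such a rescaling typically applies when converting an agent action in [low, high] back to the environment's original bounds (old_low, old_high, and the variable names are illustrative, not the wrapper's attributes):

>>> low, high = -1.0, 1.0                       # scaled range exposed to the agent
>>> old_low, old_high = 0.0, 5.0                # assumed original action bounds
>>> action = 0.2                                # agent action in [-1, 1]
>>> old_low + (old_high - old_low) * (action - low) / (high - low)
3.0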

Action Repeat#

Documentation

class omnisafe.envs.wrapper.ActionRepeat(env, times, device)[source]#

Repeat the action a given number of times.

Example

>>> env = ActionRepeat(env, times=3)

Initialize the wrapper.

Parameters:
  • env (CMDP) – The environment to wrap.

  • times (int) – The number of times to repeat the action.

  • device (torch.device) – The torch device to use.

step(action)[source]#

Run self._times timesteps of the environment’s dynamics using the agent actions.

Parameters:

action (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after the previous action.

  • cost – The amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
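
A brief usage sketch, assuming env, action, and torch are available: a single call to step() advances the underlying environment up to three times with the same action.

>>> env = ActionRepeat(env, times=3, device=torch.device('cpu'))
>>> obs, reward, cost, terminated, truncated, info = env.step(action)  # up to 3 inner steps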

Unsqueeze Wrapper#

Documentation

class omnisafe.envs.wrapper.Unsqueeze(env, device)[source]#

Unsqueeze the observation, reward, cost, terminated, truncated and info.

Examples

>>> env = Unsqueeze(env)

Initialize an instance of Unsqueeze.

reset(seed=None, options=None)[source]#

Reset the environment and return a new observation.

Note

The vector information will be unsqueezed to (1, dim) for agent training.

Parameters:
  • seed (int, optional) – The random seed. Defaults to None.

  • options (dict[str, Any], optional) – The options for the environment. Defaults to None.

Returns:
  • observation – The initial observation of the space.

  • info – Some information logged by the environment.

Return type:

tuple[torch.Tensor, dict[str, Any]]

step(action)[source]#

Run one timestep of the environment’s dynamics using the agent actions.

Note

The vector information will be unsqueezed to (1, dim) for agent training.

Parameters:

action (torch.Tensor) – The action from the agent or random.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after the previous action.

  • cost – The amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
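
A sketch of the resulting shapes, assuming env is a single (non-vectorized) CMDP instance with a flat observation of size obs_dim and that torch is imported:

>>> env = Unsqueeze(env, device=torch.device('cpu'))
>>> obs, info = env.reset()
>>> obs.shape                                   # torch.Size([1, obs_dim])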