OmniSafe Wrapper#
- Time limit wrapper for the environment.
- Auto reset the environment when the episode is terminated.
- Normalize the observation.
- Normalize the reward.
- Normalize the cost.
- Scale the action space to a given range.
- Unsqueeze the observation, reward, cost, terminated, truncated, and info.
Time Limit Wrapper#
- class omnisafe.envs.wrapper.TimeLimit(env, time_limit, device)[source]#
Time limit wrapper for the environment.
Warning
The time limit wrapper only supports a single environment.
Examples
>>> env = TimeLimit(env, time_limit=100)
- Parameters:
env (CMDP) – The environment to wrap.
time_limit (int) – The time limit for each episode.
device (torch.device) – The torch device to use.
- Variables:
_time_limit (int) – The time limit for each episode.
_time (int) – The current time step.
Initialize an instance of TimeLimit.
- reset(seed=None, options=None)[source]#
Reset the environment.
Note
Additionally, the time step will be reset to 0.
- Parameters:
seed (int, optional) – The random seed. Defaults to None.
options (dict[str, Any], optional) – The options for the environment. Defaults to None.
- Returns:
observation – The initial observation of the space.
info – Some information logged by the environment.
- Return type:
tuple[torch.Tensor, dict[str, Any]]
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
Note
Additionally, the time step will be increased by 1.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after the previous action.
cost – The amount of cost returned after the previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
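The counting behavior described above can be sketched in plain Python. The classes below are illustrative stand-ins, not the OmniSafe implementations, and plain Python values stand in for torch tensors:

```python
# Minimal sketch of the time-limit pattern (hypothetical ToyEnv/ToyTimeLimit).
class ToyEnv:
    """Environment that never terminates on its own."""

    def reset(self):
        return 0.0, {}

    def step(self, action):
        # obs, reward, cost, terminated, truncated, info
        return 0.0, 1.0, 0.0, False, False, {}


class ToyTimeLimit:
    """Truncate an episode once `time_limit` steps have elapsed."""

    def __init__(self, env, time_limit):
        self._env = env
        self._time_limit = time_limit
        self._time = 0

    def reset(self):
        self._time = 0  # the step counter is reset alongside the env
        return self._env.reset()

    def step(self, action):
        self._time += 1
        obs, reward, cost, terminated, truncated, info = self._env.step(action)
        truncated = truncated or self._time >= self._time_limit
        return obs, reward, cost, terminated, truncated, info


env = ToyTimeLimit(ToyEnv(), time_limit=3)
env.reset()
flags = [env.step(None)[4] for _ in range(3)]  # truncated flags per step
print(flags)  # [False, False, True]
```

On the third step the internal counter reaches the limit and `truncated` flips to True even though the wrapped environment never ends the episode itself.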
Auto Reset Wrapper#
- class omnisafe.envs.wrapper.AutoReset(env, device)[source]#
Auto reset the environment when the episode is terminated.
Examples
>>> env = AutoReset(env)
- Parameters:
env (CMDP) – The environment to wrap.
device (torch.device) – The torch device to use.
Initialize an instance of AutoReset.
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
Note
If the episode is terminated, the environment will be reset. The obs will be the first observation of the new episode, and the true final observation will be stored in info['final_observation'].
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after the previous action.
cost – The amount of cost returned after the previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
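The reset-and-stash behavior can be sketched as follows (hypothetical classes, plain Python values in place of torch tensors):

```python
# Illustrative sketch of the auto-reset pattern.
class ToyEpisodicEnv:
    """Terminates after every step, to exercise the auto-reset path."""

    def __init__(self):
        self.resets = 0

    def reset(self):
        self.resets += 1
        return float(self.resets), {}

    def step(self, action):
        return 99.0, 1.0, 0.0, True, False, {}


class ToyAutoReset:
    """On termination, reset and return the new episode's first obs;
    keep the true final obs under info['final_observation']."""

    def __init__(self, env):
        self._env = env

    def reset(self):
        return self._env.reset()

    def step(self, action):
        obs, reward, cost, terminated, truncated, info = self._env.step(action)
        if terminated or truncated:
            info['final_observation'] = obs
            obs, _ = self._env.reset()
        return obs, reward, cost, terminated, truncated, info


env = ToyAutoReset(ToyEpisodicEnv())
env.reset()
obs, _, _, _, _, info = env.step(None)
print(obs, info['final_observation'])  # 2.0 99.0
```

The returned obs already belongs to the fresh episode, so value bootstrapping at episode boundaries must use info['final_observation'] instead.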
Observation Normalization Wrapper#
- class omnisafe.envs.wrapper.ObsNormalize(env, device, norm=None)[source]#
Normalize the observation.
Examples
>>> env = ObsNormalize(env)
>>> norm = Normalizer(env.observation_space.shape)  # load saved normalizer
>>> env = ObsNormalize(env, norm)
- Parameters:
env (CMDP) – The environment to wrap.
device (torch.device) – The torch device to use.
norm (Normalizer or None, optional) – The normalizer to use. Defaults to None.
Initialize an instance of ObsNormalize.
- reset(seed=None, options=None)[source]#
Reset the environment and return an initial observation.
- Parameters:
seed (int, optional) – The random seed. Defaults to None.
options (dict[str, Any], optional) – The options for the environment. Defaults to None.
- Returns:
observation – The initial observation of the space.
info – Some information logged by the environment.
- Return type:
tuple[torch.Tensor, dict[str, Any]]
- save()[source]#
Save the observation normalizer.
- Return type:
dict[str, Module]
Note
The saved components will be stored in the wrapped environment. If the environment is not wrapped, the saved components will be an empty dict. Common wrappers are obs_normalize, reward_normalize, and cost_normalize. When evaluating the saved model, the normalizer should be loaded.
- Returns:
The saved components, i.e., the observation normalizer.
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
Note
The observation and the info['final_observation'] will be normalized.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after the previous action.
cost – The amount of cost returned after the previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
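The core of observation normalization is a running-statistics tracker. The scalar sketch below uses Welford's online algorithm; it is illustrative only, since the actual Normalizer operates on torch tensors of a given shape:

```python
import math


class ToyRunningNormalizer:
    """Running mean/variance via Welford's algorithm, scalar version."""

    def __init__(self):
        self._count = 0
        self._mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def normalize(self, x):
        # Update the running statistics, then standardize the value.
        self._count += 1
        delta = x - self._mean
        self._mean += delta / self._count
        self._m2 += delta * (x - self._mean)
        var = self._m2 / self._count if self._count > 1 else 1.0
        return (x - self._mean) / math.sqrt(var + 1e-8)


norm = ToyRunningNormalizer()
outs = [norm.normalize(x) for x in (1.0, 2.0, 3.0)]
print(round(norm._mean, 6))  # 2.0
```

Because the statistics are updated online, the same normalizer state must be saved with the model (via save()) and reloaded at evaluation time, otherwise the policy sees observations on a different scale than during training.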
Reward Normalization Wrapper#
- class omnisafe.envs.wrapper.RewardNormalize(env, device, norm=None)[source]#
Normalize the reward.
Examples
>>> env = RewardNormalize(env)
>>> norm = Normalizer(())  # load saved normalizer
>>> env = RewardNormalize(env, norm)
- Parameters:
env (CMDP) – The environment to wrap.
device (torch.device) – The torch device to use.
norm (Normalizer or None, optional) – The normalizer to use. Defaults to None.
Initialize an instance of RewardNormalize.
- save()[source]#
Save the reward normalizer.
- Return type:
dict[str, Module]
Note
The saved components will be stored in the wrapped environment. If the environment is not wrapped, the saved components will be an empty dict. Common wrappers are obs_normalize, reward_normalize, and cost_normalize.
- Returns:
The saved components, i.e., the reward normalizer.
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
Note
The reward will be normalized for agent training, and the original reward will be stored in info['original_reward'] for logging.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after the previous action.
cost – The amount of cost returned after the previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
Cost Normalization Wrapper#
- class omnisafe.envs.wrapper.CostNormalize(env, device, norm=None)[source]#
Normalize the cost.
Examples
>>> env = CostNormalize(env)
>>> norm = Normalizer(())  # load saved normalizer
>>> env = CostNormalize(env, norm)
- Parameters:
env (CMDP) – The environment to wrap.
device (torch.device) – The torch device to use.
norm (Normalizer or None, optional) – The normalizer to use. Defaults to None.
Initialize an instance of CostNormalize.
- save()[source]#
Save the cost normalizer.
- Return type:
dict[str, Module]
Note
The saved components will be stored in the wrapped environment. If the environment is not wrapped, the saved components will be an empty dict. Common wrappers are obs_normalize, reward_normalize, and cost_normalize.
- Returns:
The saved components, i.e., the cost normalizer.
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
Note
The cost will be normalized for agent training, and the original cost will be stored in info['original_cost'] for logging.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after the previous action.
cost – The amount of cost returned after the previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
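The shared contract of the reward and cost normalization wrappers is that the agent trains on normalized values while the originals stay in info for logging. The sketch below uses fixed scales in place of the running statistics used in practice; all class names are illustrative:

```python
# Sketch of the reward/cost normalization contract (hypothetical classes).
class ToyConstEnv:
    def step(self, action):
        # obs, reward, cost, terminated, truncated, info
        return 0.0, 10.0, 4.0, False, False, {}


class ToyRewardCostNormalize:
    def __init__(self, env, reward_scale, cost_scale):
        self._env = env
        self._reward_scale = reward_scale
        self._cost_scale = cost_scale

    def step(self, action):
        obs, reward, cost, terminated, truncated, info = self._env.step(action)
        info['original_reward'] = reward  # preserved for logging
        info['original_cost'] = cost
        return (obs, reward / self._reward_scale, cost / self._cost_scale,
                terminated, truncated, info)


env = ToyRewardCostNormalize(ToyConstEnv(), reward_scale=10.0, cost_scale=4.0)
_, reward, cost, _, _, info = env.step(None)
print(reward, cost, info['original_reward'], info['original_cost'])
# 1.0 1.0 10.0 4.0
```

Logging the original values matters especially for the cost signal, since safety constraints are stated in the environment's unnormalized units.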
Action Scale#
- class omnisafe.envs.wrapper.ActionScale(env, device, low, high)[source]#
Scale the action space to a given range.
Examples
>>> env = ActionScale(env, low=-1, high=1)
>>> env.action_space
Box(-1.0, 1.0, (1,), float32)
- Parameters:
env (CMDP) – The environment to wrap.
device (torch.device) – The device to use.
low (int or float) – The lower bound of the action space.
high (int or float) – The upper bound of the action space.
Initialize an instance of ActionScale.
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
Note
The action will be scaled back to the environment's original action range before being passed to the wrapped environment.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after the previous action.
cost – The amount of cost returned after the previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
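The underlying affine map can be written down directly. In this sketch, low/high are the wrapper's target bounds and orig_low/orig_high are the environment's native bounds; the scalar function name is hypothetical, and the real wrapper applies the same map elementwise to torch tensors:

```python
def scale_action(action, low, high, orig_low, orig_high):
    """Map an action from [low, high] back to the environment's native
    [orig_low, orig_high] range (illustrative, scalar version)."""
    frac = (action - low) / (high - low)
    return orig_low + frac * (orig_high - orig_low)


# An agent acting in [-1, 1] drives an environment whose actions live in [0, 10].
print(scale_action(0.0, -1.0, 1.0, 0.0, 10.0))  # 5.0
```

Scaling the action space to a symmetric range such as [-1, 1] is a common choice because tanh-squashed policy outputs already live there.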
Action Repeat#
- class omnisafe.envs.wrapper.ActionRepeat(env, times, device)[source]#
Repeat the action a given number of times.
Example
>>> env = ActionRepeat(env, times=3)
Initialize the wrapper.
- Parameters:
env (CMDP) – The environment to wrap.
times (int) – The number of times to repeat the action.
device (torch.device) – The torch device to use.
- step(action)[source]#
Run self._times timesteps of the environment’s dynamics using the agent actions.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after the previous action.
cost – The amount of cost returned after the previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
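The repeat loop can be sketched as follows: the same action is applied for up to `times` inner steps, rewards and costs are accumulated, and the loop stops early if the episode ends. The classes are illustrative stand-ins:

```python
# Sketch of the action-repeat pattern (hypothetical classes).
class ToyStepEnv:
    def __init__(self):
        self.steps = 0

    def step(self, action):
        self.steps += 1
        # obs, reward, cost, terminated, truncated, info
        return 0.0, 1.0, 0.5, False, False, {}


class ToyActionRepeat:
    def __init__(self, env, times):
        self._env = env
        self._times = times

    def step(self, action):
        total_reward = 0.0
        total_cost = 0.0
        for _ in range(self._times):
            obs, reward, cost, terminated, truncated, info = self._env.step(action)
            total_reward += reward
            total_cost += cost
            if terminated or truncated:
                break  # stop repeating once the episode ends
        return obs, total_reward, total_cost, terminated, truncated, info


inner = ToyStepEnv()
env = ToyActionRepeat(inner, times=3)
_, reward, cost, _, _, _ = env.step(None)
print(reward, cost, inner.steps)  # 3.0 1.5 3
```

One outer step thus advances the wrapped environment by three inner steps and returns the summed reward and cost.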
Unsqueeze Wrapper#
- class omnisafe.envs.wrapper.Unsqueeze(env, device)[source]#
Unsqueeze the observation, reward, cost, terminated, truncated and info.
Examples
>>> env = Unsqueeze(env)
Initialize an instance of Unsqueeze.
- reset(seed=None, options=None)[source]#
Reset the environment and return a new observation.
Note
The vector information will be unsqueezed to (1, dim) for agent training.
- Parameters:
seed (int, optional) – The random seed. Defaults to None.
options (dict[str, Any], optional) – The options for the environment. Defaults to None.
- Returns:
observation – The initial observation of the space.
info – Some information logged by the environment.
- Return type:
tuple[torch.Tensor, dict[str, Any]]
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
Note
The vector information will be unsqueezed to (1, dim) for agent training.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after the previous action.
cost – The amount of cost returned after the previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
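The unsqueeze contract can be sketched in plain Python: each per-step value gains a leading batch dimension of size 1, mirroring what torch.Tensor.unsqueeze(0) does for tensors. The function name is hypothetical and Python lists stand in for tensors:

```python
def unsqueeze_step(obs, reward, cost, terminated, truncated, info):
    """Wrap each step output in a leading batch dimension of size 1."""
    return [obs], [reward], [cost], [terminated], [truncated], info


# An obs of shape (dim,) becomes shape (1, dim); scalars become shape (1,).
out = unsqueeze_step([0.5, -0.5], 1.0, 0.0, False, False, {})
print(out[0])  # [[0.5, -0.5]]
```

This keeps single-environment outputs shape-compatible with batched training code that expects a leading environment dimension.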