OmniSafe Customization Interface of Environments#
CustomEnv#
Documentation
- class omnisafe.envs.custom_env.CustomEnv(env_id, **kwargs)[source]#
Simplest environment for the example and template for environment customization.
If you wish for your environment to become part of the officially supported environments by OmniSafe, please refer to this document to implement environment embedding. We will welcome your GitHub pull request.
Customizing the environment in OmniSafe requires specifying the following parameters:
- Variables:
_support_envs (ClassVar[list[str]]) – A list composed of strings, used to display all task names supported by the customized environment. For example: ['Simple-v0'].
_action_space – The action space of the task. It can be defined by directly passing an OmniSafeSpace object, or specified in __init__() based on the characteristics of the customized environment.
_observation_space – The observation space of the task. It can be defined by directly passing an OmniSafeSpace object, or specified in __init__() based on the characteristics of the customized environment.
metadata (ClassVar[dict[str, int]]) – A class variable containing environment metadata, such as render FPS.
need_time_limit_wrapper (bool) – Whether the environment needs a time limit wrapper.
need_auto_reset_wrapper (bool) – Whether the environment needs an auto-reset wrapper.
_num_envs (int) – The number of parallel environments.
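The class-level configuration above can be sketched as follows. This is a minimal, self-contained illustration: SimpleEnv, FakeBoxSpace, and all shapes here are invented stand-ins, and a real environment would subclass OmniSafe's environment base class and use actual OmniSafeSpace objects.

```python
from typing import Any, ClassVar


class FakeBoxSpace:
    """Stand-in for an OmniSafeSpace; illustrative only."""

    def __init__(self, low: float, high: float, shape: tuple) -> None:
        self.low, self.high, self.shape = low, high, shape


class SimpleEnv:  # a real env would subclass OmniSafe's environment base class
    # All task names this customized environment supports.
    _support_envs: ClassVar[list] = ['Simple-v0']
    # Environment metadata, such as render FPS.
    metadata: ClassVar[dict] = {'render_fps': 30}
    # Let OmniSafe wrap the env with TimeLimit / AutoReset (recommended).
    need_time_limit_wrapper: bool = True
    need_auto_reset_wrapper: bool = True

    def __init__(self, env_id: str, num_envs: int = 1, **kwargs: Any) -> None:
        assert env_id in self._support_envs
        self._num_envs = num_envs
        # Spaces may also be class-level constants if they never vary per task.
        self._observation_space = FakeBoxSpace(-1.0, 1.0, shape=(3,))
        self._action_space = FakeBoxSpace(-1.0, 1.0, shape=(1,))


env = SimpleEnv('Simple-v0')
```

The assertion in `__init__()` mirrors how OmniSafe resolves an `env_id` against `_support_envs` when constructing the environment.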
Warning
The omnisafe.adapter.OnlineAdapter, omnisafe.adapter.OfflineAdapter, and omnisafe.adapter.ModelBasedAdapter implemented by OmniSafe use omnisafe.envs.wrapper.AutoReset and omnisafe.envs.wrapper.TimeLimit in algorithm updates. We recommend setting need_auto_reset_wrapper and need_time_limit_wrapper to True. If you do not want to use these wrappers, you can add customized logic in the step() function of the customized environment.

Initialize CustomEnv with the given environment ID and optional keyword arguments.
Note
Optionally, you can specify some environment-specific information that needs to be logged. You need to complete this operation in two steps:
- Define the environment information in dictionary format in __init__().
- Log the environment information in spec_log(). Please note that the logging in OmniSafe will occur at the end of each episode, so you need to consider how to reset the logging values for each episode.
Example
>>> # First, define the environment information in dictionary format in __init__.
>>> def __init__(self, env_id: str, **kwargs: Any) -> None:
>>>     self.env_spec_log = {'Env/Interaction': 0}
>>>
>>> # Then, log and reset the environment information in spec_log.
>>> def spec_log(self, logger: Logger) -> dict[str, Any]:
>>>     logger.store({'Env/Interaction': self.env_spec_log['Env/Interaction']})
>>>     self.env_spec_log['Env/Interaction'] = 0
- Parameters:
env_id (str) – The environment ID.
**kwargs (Any) – Additional keyword arguments.
- property max_episode_steps: int#
The maximum number of steps per episode.
- render()[source]#
Render the environment.
- Returns:
Any – An array representing the rendered environment.
- Return type:
Any
- reset(seed=None, options=None)[source]#
Reset the environment.
- Parameters:
seed (int, optional) – The random seed to use for the environment. Defaults to None.
options (dict[str, Any], optional) – Additional options. Defaults to None.
- Returns:
tuple[torch.Tensor, dict] – A tuple containing:
- obs (torch.Tensor): The initial observation.
- info (dict): Additional information.
- Return type:
tuple[torch.Tensor, dict]
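A minimal sketch of what such a reset() might look like. ResettableEnv is hypothetical, and the plain-Python list observation stands in for the torch.Tensor a real environment returns; the (-0.05, 0.05) initial-state range is also illustrative.

```python
import random
from typing import Any, Optional


class ResettableEnv:
    """Toy env whose reset() honors the seed/options signature above."""

    def __init__(self) -> None:
        self._rng = random.Random()
        self._state = [0.0, 0.0, 0.0]

    def reset(self, seed: Optional[int] = None,
              options: Optional[dict] = None):
        if seed is not None:
            # Reseed only when a seed is explicitly provided, so repeated
            # unseeded resets keep drawing fresh initial states.
            self._rng.seed(seed)
        self._state = [self._rng.uniform(-0.05, 0.05) for _ in range(3)]
        info: dict = {'options': options or {}}
        return self._state, info


env = ResettableEnv()
obs1, _ = env.reset(seed=42)
obs2, _ = env.reset(seed=42)  # same seed -> identical initial observation
```

Resetting twice with the same seed reproduces the same initial observation, which is the reproducibility property OmniSafe relies on when seeding environments.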
- set_seed(seed)[source]#
Set the random seed for the environment.
- Parameters:
seed (int) – The random seed.
- Return type:
None
- spec_log(logger)[source]#
Log environment-specific information into the logger.
Note
This function will be called after each episode.
- Parameters:
logger (Logger) – The logger to use for logging.
- Return type:
None
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
Note
You need to implement dynamic features related to environment interaction here. That is:
- Update the environment state based on the action;
- Calculate reward and cost based on the environment state;
- Determine whether to terminate based on the environment state;
- Record the information you need.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after the previous action.
cost – The amount of cost returned after the previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict]
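The four responsibilities listed in the note above can be sketched as follows. CountingEnv, its dynamics, and its reward and cost functions are invented for illustration, and plain floats stand in for the torch.Tensor values a real environment returns; the 'Env/Interaction' counter ties back to the spec_log() example earlier on this page.

```python
class CountingEnv:
    """Toy env whose step() walks through the four documented steps."""

    def __init__(self) -> None:
        self._state = 0.0
        self.env_spec_log = {'Env/Interaction': 0}

    def step(self, action: float):
        # 1. Update the environment state based on the action.
        self._state += action
        # 2. Calculate reward and cost based on the environment state.
        reward = 1.0 - abs(self._state)
        cost = float(abs(self._state) > 0.5)  # safety-violation indicator
        # 3. Determine whether to terminate based on the environment state.
        terminated = abs(self._state) > 2.0
        truncated = False  # left to the TimeLimit wrapper here
        # 4. Record the information you need.
        self.env_spec_log['Env/Interaction'] += 1
        info = {'state': self._state}
        return self._state, reward, cost, terminated, truncated, info


env = CountingEnv()
obs, reward, cost, terminated, truncated, info = env.step(0.3)
```

Because truncation is delegated to the TimeLimit wrapper, this sketch matches the recommended setup where need_time_limit_wrapper is True.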