OmniSafe Customization Interface of Environments#
CustomEnv#
Documentation
- class omnisafe.envs.custom_env.CustomEnv(env_id, **kwargs)[source]#
Simplest environment for the example and template for environment customization.
If you wish for your environment to become part of the officially supported environments by OmniSafe, please refer to this document to implement environment embedding. We will welcome your GitHub pull request.
Customizing the environment in OmniSafe requires specifying the following parameters:
- Variables:
_support_envs (ClassVar[list[str]]) – A list composed of strings, used to display all task names supported by the customized environment. For example: ['Simple-v0'].
_action_space – The action space of the task. It can be defined by directly passing an OmniSafeSpace object, or specified in __init__() based on the characteristics of the customized environment.
_observation_space – The observation space of the task. It can be defined by directly passing an OmniSafeSpace object, or specified in __init__() based on the characteristics of the customized environment.
metadata (ClassVar[dict[str, int]]) – A class variable containing environment metadata, such as render FPS.
need_time_limit_wrapper (bool) – Whether the environment needs a time limit wrapper.
need_auto_reset_wrapper (bool) – Whether the environment needs an auto-reset wrapper.
_num_envs (int) – The number of parallel environments.
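The class-level configuration above can be sketched as follows. This is a minimal, self-contained illustration: SimpleEnv, FakeBoxSpace, and all shapes here are invented stand-ins, and a real environment would subclass OmniSafe's environment base class and use actual OmniSafeSpace objects.

```python
from typing import Any, ClassVar


class FakeBoxSpace:
    """Stand-in for an OmniSafeSpace; illustrative only."""

    def __init__(self, low: float, high: float, shape: tuple) -> None:
        self.low, self.high, self.shape = low, high, shape


class SimpleEnv:  # a real env would subclass OmniSafe's environment base class
    # All task names this customized environment supports.
    _support_envs: ClassVar[list] = ['Simple-v0']
    # Environment metadata, such as render FPS.
    metadata: ClassVar[dict] = {'render_fps': 30}
    # Let OmniSafe wrap the env with TimeLimit / AutoReset (recommended).
    need_time_limit_wrapper: bool = True
    need_auto_reset_wrapper: bool = True

    def __init__(self, env_id: str, num_envs: int = 1, **kwargs: Any) -> None:
        assert env_id in self._support_envs
        self._num_envs = num_envs
        # Spaces may also be class-level constants if they never vary per task.
        self._observation_space = FakeBoxSpace(-1.0, 1.0, shape=(3,))
        self._action_space = FakeBoxSpace(-1.0, 1.0, shape=(1,))


env = SimpleEnv('Simple-v0')
```

The assertion in `__init__()` mirrors how OmniSafe resolves an `env_id` against `_support_envs` when constructing the environment.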
Warning
The omnisafe.adapter.OnlineAdapter, omnisafe.adapter.OfflineAdapter, and omnisafe.adapter.ModelBasedAdapter implemented by OmniSafe use omnisafe.envs.wrapper.AutoReset and omnisafe.envs.wrapper.TimeLimit in algorithm updates. We recommend setting need_auto_reset_wrapper and need_time_limit_wrapper to True. If you do not want to use these wrappers, you can add customized logic in the step() function of the customized environment.

Initialize CustomEnv with the given environment ID and optional keyword arguments.
Note
Optionally, you can specify some environment-specific information that needs to be logged. You need to complete this operation in two steps:
- Define the environment information in dictionary format in __init__().
- Log the environment information in spec_log(). Please note that the logging in OmniSafe will occur at the end of each episode, so you need to consider how to reset the logging values for each episode.
Example
>>> # First, define the environment information in dictionary format in __init__.
>>> def __init__(self, env_id: str, **kwargs: Any) -> None:
>>>     self.env_spec_log = {'Env/Interaction': 0}
>>>
>>> # Then, log and reset the environment information in spec_log.
>>> def spec_log(self, logger: Logger) -> dict[str, Any]:
>>>     logger.store({'Env/Interaction': self.env_spec_log['Env/Interaction']})
>>>     self.env_spec_log['Env/Interaction'] = 0
- Parameters:
env_id (str) – The environment ID.
**kwargs (Any) – Additional keyword arguments.
- property max_episode_steps: int#
The maximum number of steps per episode.
- render()[source]#
Render the environment.
- Returns:
Any – An array representing the rendered environment.
- Return type:
Any
- reset(seed=None, options=None)[source]#
Reset the environment.
- Parameters:
seed (int, optional) – The random seed to use for the environment. Defaults to None.
options (dict[str, Any], optional) – Additional options. Defaults to None.
- Returns:
tuple[torch.Tensor, dict] – A tuple containing:
- obs (torch.Tensor): The initial observation.
- info (dict): Additional information.
- Return type:
tuple[torch.Tensor, dict]
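A minimal sketch of what such a reset() might look like. ResettableEnv is hypothetical, and the plain-Python list observation stands in for the torch.Tensor a real environment returns; the (-0.05, 0.05) initial-state range is also illustrative.

```python
import random
from typing import Any, Optional


class ResettableEnv:
    """Toy env whose reset() honors the seed/options signature above."""

    def __init__(self) -> None:
        self._rng = random.Random()
        self._state = [0.0, 0.0, 0.0]

    def reset(self, seed: Optional[int] = None,
              options: Optional[dict] = None):
        if seed is not None:
            # Reseed only when a seed is explicitly provided, so repeated
            # unseeded resets keep drawing fresh initial states.
            self._rng.seed(seed)
        self._state = [self._rng.uniform(-0.05, 0.05) for _ in range(3)]
        info: dict = {'options': options or {}}
        return self._state, info


env = ResettableEnv()
obs1, _ = env.reset(seed=42)
obs2, _ = env.reset(seed=42)  # same seed -> identical initial observation
```

Resetting twice with the same seed reproduces the same initial observation, which is the reproducibility property OmniSafe relies on when seeding environments.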
- set_seed(seed)[source]#
Set the random seed for the environment.
- Parameters:
seed (int) – The random seed.
- Return type:
None
- spec_log(logger)[source]#
Log environment-specific information into the logger.
Note
This function will be called after each episode.
- Parameters:
logger (Logger) – The logger to use for logging.
- Return type:
None
- step(action)[source]#
Run one timestep of the environment’s dynamics using the agent actions.
Note
You need to implement dynamic features related to environment interaction here. That is:
- Update the environment state based on the action;
- Calculate reward and cost based on the environment state;
- Determine whether to terminate based on the environment state;
- Record the information you need.
- Parameters:
action (torch.Tensor) – The action from the agent or random.
- Returns:
observation – The agent’s observation of the current environment.
reward – The amount of reward returned after the previous action.
cost – The amount of cost returned after the previous action.
terminated – Whether the episode has ended.
truncated – Whether the episode has been truncated due to a time limit.
info – Some information logged by the environment.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict]
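The four responsibilities listed in the note above can be sketched as follows. CountingEnv, its dynamics, and its reward and cost functions are invented for illustration, and plain floats stand in for the torch.Tensor values a real environment returns; the 'Env/Interaction' counter ties back to the spec_log() example earlier on this page.

```python
class CountingEnv:
    """Toy env whose step() walks through the four documented steps."""

    def __init__(self) -> None:
        self._state = 0.0
        self.env_spec_log = {'Env/Interaction': 0}

    def step(self, action: float):
        # 1. Update the environment state based on the action.
        self._state += action
        # 2. Calculate reward and cost based on the environment state.
        reward = 1.0 - abs(self._state)
        cost = float(abs(self._state) > 0.5)  # safety-violation indicator
        # 3. Determine whether to terminate based on the environment state.
        terminated = abs(self._state) > 2.0
        truncated = False  # left to the TimeLimit wrapper here
        # 4. Record the information you need.
        self.env_spec_log['Env/Interaction'] += 1
        info = {'state': self._state}
        return self._state, reward, cost, terminated, truncated, info


env = CountingEnv()
obs, reward, cost, terminated, truncated, info = env.step(0.3)
```

Because truncation is delegated to the TimeLimit wrapper, this sketch matches the recommended setup where need_time_limit_wrapper is True.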