MuJoCo Environment#

MujocoEnv Interface#


class omnisafe.envs.mujoco_env.MujocoEnv(env_id, num_envs=1, device='cpu', **kwargs)[source]#

Gymnasium MuJoCo environment.

Variables:
  • need_auto_reset_wrapper (bool) – Whether to use auto reset wrapper.

  • need_time_limit_wrapper (bool) – Whether to use time limit wrapper.

Initialize the environment.

Parameters:
  • env_id (str) – Environment id.

  • num_envs (int, optional) – Number of environments. Defaults to 1.

  • device (torch.device, optional) – Device to store the data. Defaults to ‘cpu’.

Keyword Arguments:
  • render_mode (str, optional) – The render mode, one of human, rgb_array, or rgb_array_list. Defaults to rgb_array.

  • camera_name (str, optional) – The camera name.

  • camera_id (int, optional) – The camera id.

  • width (int, optional) – The width of the rendered image. Defaults to 256.

  • height (int, optional) – The height of the rendered image. Defaults to 256.

close()[source]#

Close the environment.

Return type:

None

property max_episode_steps: int#

The max steps per episode.

render()[source]#

Render the environment.

Returns:

Rendered environment.

Return type:

Any

reset(seed=None, options=None)[source]#

Reset the environment.

Parameters:
  • seed (int, optional) – The random seed. Defaults to None.

  • options (dict[str, Any], optional) – The options for the environment. Defaults to None.

Returns:
  • observation – Agent’s observation of the current environment.

  • info – Auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type:

tuple[torch.Tensor, dict]

set_seed(seed)[source]#

Set the seed for the environment.

Parameters:

seed (int) – Seed to set.

Return type:

None

step(action)[source]#

Step the environment.

Note

OmniSafe uses an auto-reset wrapper to reset the environment when an episode terminates, so the returned observation is the first observation of the next episode. The true final observation of the finished episode is stored under the final_observation key of info.

Parameters:

action (torch.Tensor) – Action to take.

Returns:
  • observation – Agent’s observation of the current environment.

  • reward – Amount of reward returned after the previous action.

  • cost – Amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Auxiliary diagnostic information (helpful for debugging, and sometimes learning).

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, dict[str, Any]]