Safety Gymnasium Environment#

Safety Gymnasium Interface#

Documentation

class omnisafe.envs.safety_gymnasium_env.SafetyGymnasiumEnv(env_id, num_envs=1, device=DEVICE_CPU, **kwargs)[source]#

Safety Gymnasium Environment.

Parameters:
  • env_id (str) – Environment id.

  • num_envs (int, optional) – Number of environments. Defaults to 1.

  • device (torch.device, optional) – Device to store the data. Defaults to torch.device('cpu').

Keyword Arguments:
  • render_mode (str, optional) – The render mode, one of ‘human’, ‘rgb_array’, or ‘rgb_array_list’. Defaults to ‘rgb_array’.

  • camera_name (str, optional) – The camera name.

  • camera_id (int, optional) – The camera id.

  • width (int, optional) – The width of the rendered image. Defaults to 256.

  • height (int, optional) – The height of the rendered image. Defaults to 256.

Variables:
  • need_auto_reset_wrapper (bool) – Whether to use auto reset wrapper.

  • need_time_limit_wrapper (bool) – Whether to use time limit wrapper.

Initialize an instance of SafetyGymnasiumEnv.

close()[source]#

Close the environment.

Return type:

None

property max_episode_steps: int#

The max steps per episode.

render()[source]#

Compute the render frames as specified by render_mode during the initialization of the environment.

Returns:

The render frames – we recommend returning np.ndarray frames, which can be assembled into a video with moviepy.

Return type:

Any

reset(seed=None, options=None)[source]#

Reset the environment.

Parameters:
  • seed (int, optional) – The random seed. Defaults to None.

  • options (dict[str, Any], optional) – The options for the environment. Defaults to None.

Returns:
  • observation – Agent’s observation of the current environment.

  • info – Some information logged by the environment.

Return type:

tuple[torch.Tensor, dict[str, Any]]

set_seed(seed)[source]#

Set the seed for the environment.

Parameters:

seed (int) – Seed to set.

Return type:

None

step(action)[source]#

Step the environment.

Note

OmniSafe uses an auto-reset wrapper to reset the environment when an episode terminates. The returned obs is therefore the first observation of the next episode, while the true final observation is stored under the final_observation key of info.

Parameters:

action (torch.Tensor) – Action to take.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after the previous action.

  • cost – The amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]
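The auto-reset contract described in the note above can be illustrated with a toy stand-in (this is a pure-Python sketch for illustration, not the library's wrapper; ToyEnv and AutoReset are hypothetical names):

```python
from typing import Any


class ToyEnv:
    """A 3-step toy environment whose observation is just a step counter."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action: Any):
        self.t += 1
        terminated = self.t >= 3
        # obs, reward, cost, terminated, truncated, info
        return self.t, 1.0, 0.0, terminated, False, {}


class AutoReset:
    """Mimics the documented behavior: on termination, return the first
    observation of the next episode and stash the true last observation
    under info['final_observation']."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset(), {}

    def step(self, action):
        obs, reward, cost, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            info['final_observation'] = obs
            obs = self.env.reset()  # obs is now the next episode's first obs
        return obs, reward, cost, terminated, truncated, info
```

At the terminal step, `obs` comes back as the fresh episode's first observation while `info['final_observation']` holds the episode's true last observation.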

Safety Gymnasium World Model#

Documentation

class omnisafe.envs.safety_gymnasium_modelbased.SafetyGymnasiumModelBased(env_id, num_envs=1, device='cpu', use_lidar=False, **kwargs)[source]#

Safety Gymnasium environment for Model-based algorithms.

Variables:
  • _support_envs (list[str]) – List of supported environments.

  • need_auto_reset_wrapper (bool) – Whether to use auto reset wrapper.

  • need_time_limit_wrapper (bool) – Whether to use time limit wrapper.

Initialize the environment.

Parameters:
  • env_id (str) – Environment id.

  • num_envs (int, optional) – Number of environments. Defaults to 1.

  • device (torch.device, optional) – Device to store the data. Defaults to ‘cpu’.

  • use_lidar (bool, optional) – Whether to use lidar observation. Defaults to False.

Keyword Arguments:
  • render_mode (str, optional) – The render mode, one of human, rgb_array, or rgb_array_list. Defaults to rgb_array.

  • camera_name (str, optional) – The camera name.

  • camera_id (int, optional) – The camera id.

  • width (int, optional) – The width of the rendered image. Defaults to 256.

  • height (int, optional) – The height of the rendered image. Defaults to 256.

_dist_xy(pos1, pos2)[source]#

Return the XY-plane distance between two positions.

Parameters:
  • pos1 (np.ndarray | list[np.ndarray]) – The first position.

  • pos2 (np.ndarray | list[np.ndarray]) – The second position.

Returns:

distance – The distance between the two positions.

Return type:

float

_ego_xy(robot_matrix, robot_pos, pos)[source]#

Return the egocentric XY vector to a position from the robot.

Parameters:
  • robot_matrix (list[list[float]]) – 3x3 rotation matrix.

  • robot_pos (np.ndarray) – 2D robot position.

  • pos (np.ndarray) – 2D position.

Returns:

2D_egocentric_vector – The 2D egocentric vector.

Return type:

ndarray
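The egocentric projection can be sketched as follows (a re-implementation from the description, not the library's code): the world-frame offset to the target is rotated into the robot's frame with the transpose of the robot's rotation matrix.

```python
import numpy as np


def ego_xy(robot_matrix, robot_pos, pos):
    """Project a world-frame 2D position into the robot's egocentric frame.

    robot_matrix: 3x3 rotation matrix (only the upper-left 2x2 block is used).
    robot_pos, pos: 2D positions in the world frame.
    """
    rot_2d = np.asarray(robot_matrix)[:2, :2]          # planar rotation
    world_offset = np.asarray(pos) - np.asarray(robot_pos)
    # Rotating by the inverse (transpose, since the matrix is orthonormal)
    # maps world coordinates into robot coordinates.
    return rot_2d.T @ world_offset
```

For example, for a robot rotated 90° counter-clockwise, a target one unit along the world y-axis lies straight ahead in the robot's frame.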

_get_coordinate_sensor()[source]#

Return the coordinate observation and sensor observation.

The z-axis coordinate is ignored in every pose. All returned coordinates are expressed in the robot's frame.

Returns:

coordinate_obs – The coordinate observation.

Return type:

dict[str, Any]

_get_flat_coordinate(coordinate_obs)[source]#

Get the flattened obs.

Parameters:

coordinate_obs (dict[str, Any]) – The dict of coordinate and sensor observations.

Returns:

flat_obs – The flattened observation.

Return type:

ndarray
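Flattening the dict observation amounts to concatenating its arrays in a deterministic order. A minimal sketch (the library's actual key ordering and layout may differ):

```python
import numpy as np


def flatten_coordinate_obs(coordinate_obs):
    """Concatenate a dict of coordinate/sensor arrays into one flat vector.

    Keys are sorted so the layout is stable across calls.
    """
    parts = [np.ravel(np.asarray(coordinate_obs[key])) for key in sorted(coordinate_obs)]
    return np.concatenate(parts)
```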

_obs_lidar_pseudo(robot_matrix, robot_pos, positions)[source]#

Return a robot-centric lidar observation of a list of positions.

Lidar is a set of bins around the robot (divided evenly in a circle). The detection directions are exclusive and exhaustive for a full 360-degree view. Each bin reads 0 if there is no object in that direction; otherwise it reads the fraction of the distance towards the robot, and if multiple objects fall in a bin, the closest one is used.

E.g. if the object is 90% of lidar_max_dist away, the bin will read 0.1, and if the object is 10% of lidar_max_dist away, the bin will read 0.9. (The reading can be thought of as “closeness” or inverse distance)

This encoding has some desirable properties:
  • bins read 0 when empty

  • bins smoothly increase as objects get close

  • maximum reading is 1.0 (where the object overlaps the robot)

  • close objects occlude far objects

  • constant size observation with variable numbers of objects

Parameters:
  • robot_matrix (list[list[float]]) – 3x3 rotation matrix.

  • robot_pos (np.ndarray) – 2D robot position.

  • positions (list[np.ndarray]) – 2D positions.

Returns:

lidar_observation – The lidar observation.

Return type:

ndarray
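The bin encoding described above can be sketched as follows. This is a simplified re-implementation based on the description: `num_bins` and `max_dist` are assumed parameter names, and unlike the real method, the robot's heading (`robot_matrix`) is not applied, so positions are treated as already egocentric.

```python
import numpy as np


def pseudo_lidar(robot_pos, positions, num_bins=16, max_dist=3.0):
    """Per-bin 'closeness' readings: 0 when the bin is empty, rising to 1.0
    as an object reaches the robot; the closest object wins within a bin."""
    readings = np.zeros(num_bins)
    for pos in positions:
        dx, dy = np.asarray(pos) - np.asarray(robot_pos)
        dist = np.hypot(dx, dy)
        angle = np.arctan2(dy, dx) % (2 * np.pi)
        bin_idx = int(angle / (2 * np.pi) * num_bins) % num_bins
        # Inverse-distance encoding: 90% of max_dist away reads 0.1.
        closeness = max(0.0, 1.0 - dist / max_dist)
        # Close objects occlude far objects in the same bin.
        readings[bin_idx] = max(readings[bin_idx], closeness)
    return readings
```

This reproduces the worked example in the docstring: an object at 90% of the maximum range yields a reading of 0.1, and one at 10% of the range yields 0.9.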

close()[source]#

Close the environment.

Return type:

None

get_cost_from_obs_tensor(obs, is_binary=True)[source]#

Get batch cost from batch observation.

Parameters:
  • obs (torch.Tensor) – Batch observation.

  • is_binary (bool, optional) – Whether to use binary cost. Defaults to True.

Returns:

cost – Batch cost.

Return type:

Tensor
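The binary versus continuous cost distinction can be illustrated with a distance-based hazard cost. This is a hypothetical sketch in NumPy rather than torch, with made-up names (`hazard_xy`, `hazard_radius`); the real cost depends on the task's constraint definition.

```python
import numpy as np


def cost_from_obs(obs_xy, hazard_xy, hazard_radius=0.5, is_binary=True):
    """Per-sample cost from a batch of 2D robot positions.

    Binary: 1.0 if the robot is inside any hazard, else 0.0.
    Continuous: summed penetration depth into each hazard.
    """
    # (batch, num_hazards) distances from each robot position to each hazard
    dists = np.linalg.norm(obs_xy[:, None, :] - hazard_xy[None, :, :], axis=-1)
    if is_binary:
        return (dists < hazard_radius).any(axis=1).astype(np.float64)
    return np.clip(hazard_radius - dists, 0.0, None).sum(axis=1)
```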

get_lidar_from_coordinate(obs)[source]#

Get lidar observation.

Parameters:

obs (np.ndarray) – The observation.

Returns:

lidar_obs – The lidar observation.

Return type:

Tensor

property max_episode_steps: int#

The max steps per episode.

render()[source]#

Render the environment.

Returns:

The rendered frames – we recommend returning np.ndarray frames, which can be assembled into a video with moviepy.

Return type:

Any

reset(seed=None, options=None)[source]#

Reset the environment.

Parameters:
  • seed (int, optional) – The random seed. Defaults to None.

  • options (dict[str, Any], optional) – The options for the environment. Defaults to None.

Returns:
  • observation – The initial observation of the space.

  • info – Some information logged by the environment.

Return type:

tuple[torch.Tensor, dict[str, Any]]

set_seed(seed)[source]#

Set the seed for the environment.

Parameters:

seed (int) – Seed to set.

Return type:

None

step(action)[source]#

Step the environment.

Note

OmniSafe uses an auto-reset wrapper to reset the environment when an episode terminates. The returned obs is therefore the first observation of the next episode, while the true final observation is stored under the final_observation key of info.

Parameters:

action (torch.Tensor) – Action to take.

Returns:
  • observation – The agent’s observation of the current environment.

  • reward – The amount of reward returned after the previous action.

  • cost – The amount of cost returned after the previous action.

  • terminated – Whether the episode has ended.

  • truncated – Whether the episode has been truncated due to a time limit.

  • info – Some information logged by the environment.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor, dict[str, Any]]

property task: str#

The name of the task.