OmniSafe Critic

Critic(obs_space, act_space, hidden_sizes[, ...])

An abstract class for critic.

Base Critic

class omnisafe.models.base.Critic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)

An abstract class for critic.

A critic approximates the value function, which maps observations to values. It is parameterized by a neural network that takes observations as input (a Q critic also takes actions as input) and outputs the estimated value.

Note

OmniSafe provides two types of critic: Q critic (input = observation + action, output = value) and V critic (input = observation, output = value). You can also implement your own critic by inheriting from this class.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

  • num_critics (int, optional) – Number of critics. Defaults to 1.

  • use_obs_encoder (bool, optional) – Whether to use observation encoder, only used in q critic. Defaults to False.

Initialize an instance of Critic.
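
The difference between the two critic types described in the note above comes down to what forward consumes. The following is a minimal, framework-free sketch of that idea and not OmniSafe's implementation: SketchCritic, SketchVCritic, and SketchQCritic are hypothetical names, plain Python lists stand in for tensors, and a fixed weighted sum stands in for the neural network.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class SketchCritic(ABC):
    """Hypothetical stand-in for an abstract critic; not OmniSafe's Critic."""

    def __init__(self, weights: Sequence[float]) -> None:
        # A fixed weight vector replaces the neural network in this sketch.
        self.weights = list(weights)

    def _score(self, features: Sequence[float]) -> float:
        # Weighted sum of the features, i.e. one linear layer without bias.
        return sum(w * x for w, x in zip(self.weights, features))

    @abstractmethod
    def forward(self, *inputs: Sequence[float]) -> float:
        """Each subclass decides which inputs it consumes."""


class SketchVCritic(SketchCritic):
    def forward(self, obs: Sequence[float]) -> float:
        # V critic: input = observation, output = value.
        return self._score(list(obs))


class SketchQCritic(SketchCritic):
    def forward(self, obs: Sequence[float], act: Sequence[float]) -> float:
        # Q critic: input = observation + action, output = value.
        return self._score(list(obs) + list(act))


obs, act = [1.0, 2.0], [0.5]
v_value = SketchVCritic([0.1, 0.2]).forward(obs)
q_value = SketchQCritic([0.1, 0.2, 0.3]).forward(obs, act)
print(v_value, q_value)
```

A custom critic would follow the same pattern: inherit from the abstract base and override forward with the inputs it needs.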

CriticBuilder(obs_space, act_space, hidden_sizes)

Implementation of CriticBuilder.

QCritic(obs_space, act_space, hidden_sizes)

Implementation of Q Critic.

VCritic(obs_space, act_space, hidden_sizes)

Implementation of VCritic.

Critic Builder

class omnisafe.models.critic.CriticBuilder(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)

Implementation of CriticBuilder.

Note

A CriticBuilder is a class for building a critic network. In OmniSafe, a critic network is not constructed directly; instead, the various critic types are integrated into the CriticBuilder, which builds the requested one. The advantage is that every critic type receives its parameters in the same way. This makes existing critics easy to use and new critic types easy to add.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

  • num_critics (int, optional) – Number of critics. Defaults to 1.

  • use_obs_encoder (bool, optional) – Whether to use observation encoder, only used in q critic. Defaults to False.

Initialize an instance of CriticBuilder.

build_critic(critic_type)

Build critic.

Currently, we support two types of critics: q and v. If you want to add a new critic type, you can simply add it here.

Parameters:

critic_type (str) – Critic type, either 'q' or 'v'.

Returns:

An instance of V-Critic or Q-Critic

Raises:

NotImplementedError – If the critic type is not q or v.

Return type:

Critic
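
The dispatch described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the library's code: SketchCriticBuilder, SketchQCritic, and SketchVCritic are hypothetical placeholders, and only two of the constructor parameters are carried through to keep the example short.

```python
class _SketchBase:
    """Hypothetical placeholder that only records its settings."""

    def __init__(self, hidden_sizes, num_critics=1):
        self.hidden_sizes = hidden_sizes
        self.num_critics = num_critics


class SketchQCritic(_SketchBase):
    """Hypothetical placeholder for a Q critic."""


class SketchVCritic(_SketchBase):
    """Hypothetical placeholder for a V critic."""


class SketchCriticBuilder:
    def __init__(self, hidden_sizes, num_critics=1):
        # Store the shared settings once; every critic type receives the same ones.
        self._kwargs = {'hidden_sizes': list(hidden_sizes), 'num_critics': num_critics}

    def build_critic(self, critic_type):
        # Dispatch on the requested type; a new critic type would add a branch here.
        if critic_type == 'q':
            return SketchQCritic(**self._kwargs)
        if critic_type == 'v':
            return SketchVCritic(**self._kwargs)
        raise NotImplementedError(
            f"critic_type {critic_type!r} is not supported; use 'q' or 'v'."
        )


builder = SketchCriticBuilder(hidden_sizes=[64, 64], num_critics=2)
q_critic = builder.build_critic('q')
v_critic = builder.build_critic('v')
```

Because the builder holds the shared settings, both critics are created from identical parameters, which is the uniformity the note on CriticBuilder refers to.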

Q Critic

class omnisafe.models.critic.QCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)

Implementation of Q Critic.

A Q-function approximator that uses a multi-layer perceptron (MLP) to map observation-action pairs to Q-values. This class inherits from Critic. You can design your own Q-function approximator by inheriting from this class or from Critic.

The Q critic network has two modes:

Hint

  • use_obs_encoder = False: The input of the network is the concatenation of the observation and action.

  • use_obs_encoder = True: The input of the network is the concatenation of the output of the observation encoder and action.

For example, in DDPG, the action is not directly concatenated with the observation, but is concatenated with the output of the observation encoder.
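
The two modes differ only in how the network input is assembled. The sketch below illustrates this with plain Python lists in place of tensors. The helper names encode_obs and q_network_input are hypothetical, and the doubling "encoder" is a stand-in assumption for what would be a learned layer in a real network.

```python
from typing import List, Sequence


def encode_obs(obs: Sequence[float]) -> List[float]:
    # Hypothetical encoder: a real one would be a learned network layer.
    return [2.0 * x for x in obs]


def q_network_input(
    obs: Sequence[float],
    act: Sequence[float],
    use_obs_encoder: bool = False,
) -> List[float]:
    # Mode 1 (False): concatenate the raw observation with the action.
    # Mode 2 (True): concatenate the encoder output with the action.
    features = encode_obs(obs) if use_obs_encoder else list(obs)
    return features + list(act)


obs, act = [1.0, 2.0], [0.5]
plain_input = q_network_input(obs, act)
encoded_input = q_network_input(obs, act, use_obs_encoder=True)
print(plain_input)
print(encoded_input)
```

In both modes the action enters the input unchanged; only the observation part is transformed when the encoder is enabled.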

Note

The Q critic network contains multiple critics, and the output of forward() is a list of Q-values. To get the Q-value of a specific critic, index into that list.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

  • num_critics (int, optional) – Number of critics. Defaults to 1.

  • use_obs_encoder (bool, optional) – Whether to use observation encoder, only used in q critic. Defaults to False.

Initialize an instance of QCritic.

forward(obs, act)

Forward function.

As a multi-critic network, the output of the network is a list of Q-values. If you want to use it as a single-critic network, you only need to set the num_critics parameter to 1 when initializing the network, and then use the index 0 to get the Q-value.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • act (torch.Tensor) – Action from the actor.

Returns:

A list of Q-values for the observation-action pair, one per critic.

Return type:

list[Tensor]
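
The list-valued return convention can be illustrated with a small framework-free sketch. SketchMultiQ and make_critic are hypothetical names, floats stand in for tensors, and each "critic" is a fixed scaling function rather than a trained network.

```python
from typing import Callable, List, Sequence


def make_critic(scale: float) -> Callable[[Sequence[float], Sequence[float]], float]:
    # Hypothetical critic: scales the sum of all observation and action entries.
    def critic(obs: Sequence[float], act: Sequence[float]) -> float:
        return scale * (sum(obs) + sum(act))

    return critic


class SketchMultiQ:
    """Hypothetical multi-critic container; forward always returns a list."""

    def __init__(self, num_critics: int = 1) -> None:
        self.critics = [make_critic(float(i + 1)) for i in range(num_critics)]

    def forward(self, obs: Sequence[float], act: Sequence[float]) -> List[float]:
        # One Q-value per critic, in construction order.
        return [critic(obs, act) for critic in self.critics]


obs, act = [1.0, 2.0], [0.5]

# Single-critic use: the result is still a list, so take index 0.
single_q = SketchMultiQ(num_critics=1).forward(obs, act)[0]

# Multi-critic use: pick a critic's value by its index.
q_values = SketchMultiQ(num_critics=2).forward(obs, act)
second_q = q_values[1]
```

Returning a list even when num_critics is 1 keeps the calling code identical for single-critic and multi-critic setups.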

V Critic

class omnisafe.models.critic.VCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1)

Implementation of VCritic.

A V-function approximator that uses a multi-layer perceptron (MLP) to map observations to V-values. This class inherits from Critic. You can design your own V-function approximator by inheriting from this class or from Critic.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

  • num_critics (int, optional) – Number of critics. Defaults to 1.

Initialize an instance of VCritic.

forward(obs)

Forward function.

Specifically, the V-function approximator maps observations to V-values.

Parameters:

obs (torch.Tensor) – Observations from environments.

Returns:

A list of V critic values of the observation, one per critic.

Return type:

list[Tensor]