OmniSafe Critic#
Base Critic#
Documentation
- class omnisafe.models.base.Critic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
An abstract class for critic.
A critic approximates the value function that maps observations to values. It is parameterized by a neural network that takes observations as input (a Q critic also takes actions as input) and outputs the estimated value.
Note
OmniSafe provides two types of critic: Q critic (Input = observation + action, Output = value) and V critic (Input = observation, Output = value). You can also use this class to implement your own critic by inheriting from it.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
num_critics (int, optional) – Number of critics. Defaults to 1.
use_obs_encoder (bool, optional) – Whether to use an observation encoder; only used in the Q critic. Defaults to False.
Initialize an instance of Critic.
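The difference between the two critic types described in the note can be sketched without any deep-learning framework. The classes below (ToyCritic, ToyQCritic, ToyVCritic) are hypothetical stand-ins, not OmniSafe classes, and a plain sum replaces the neural network. The sketch only illustrates which inputs each critic type consumes.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class ToyCritic(ABC):
    """Hypothetical stand-in for an abstract critic (not the OmniSafe class)."""

    @abstractmethod
    def value(self, *inputs: Sequence[float]) -> float:
        """Return a scalar value estimate for the given inputs."""


class ToyQCritic(ToyCritic):
    """Q critic: input = observation + action, output = value."""

    def value(self, obs: Sequence[float], act: Sequence[float]) -> float:
        # A plain sum stands in for the network's forward pass.
        return sum(obs) + sum(act)


class ToyVCritic(ToyCritic):
    """V critic: input = observation, output = value."""

    def value(self, obs: Sequence[float]) -> float:
        return sum(obs)


obs, act = [0.5, -1.0], [2.0]
q_value = ToyQCritic().value(obs, act)  # depends on observation and action
v_value = ToyVCritic().value(obs)       # depends on the observation only
```

Because ToyCritic declares value as abstract, it cannot be instantiated directly; a custom critic is written by subclassing it, which mirrors the inheritance pattern the note describes.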
CriticBuilder – Implementation of CriticBuilder.
QCritic – Implementation of Q Critic.
VCritic – Implementation of VCritic.
Critic Builder#
Documentation
- class omnisafe.models.critic.CriticBuilder(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
Implementation of CriticBuilder.
Note
A CriticBuilder is a class for building a critic network. In OmniSafe, instead of building the critic network directly, we build it by integrating the various types of critic networks into the CriticBuilder. The advantage is that every critic type receives its parameters in a uniform way, which makes existing critics easy to use and new critic types easy to add.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
num_critics (int, optional) – Number of critics. Defaults to 1.
use_obs_encoder (bool, optional) – Whether to use an observation encoder; only used in the Q critic. Defaults to False.
Initialize an instance of CriticBuilder.
- build_critic(critic_type)
Build critic.
Currently, we support two types of critics: q and v. If you want to add a new critic type, you can simply add it here.
- Parameters:
critic_type (str) – Critic type.
- Returns:
An instance of V-Critic or Q-Critic
- Raises:
NotImplementedError – If the critic type is not q or v.
- Return type:
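The dispatch behaviour of build_critic can be illustrated with a minimal sketch. ToyCriticBuilder, ToyQCritic and ToyVCritic are hypothetical names introduced for illustration only, and the reduced constructor arguments are an assumption; this is not the OmniSafe implementation.

```python
class ToyQCritic:
    """Hypothetical placeholder for a Q critic."""

    def __init__(self, hidden_sizes, num_critics=1):
        self.hidden_sizes = hidden_sizes
        self.num_critics = num_critics


class ToyVCritic:
    """Hypothetical placeholder for a V critic."""

    def __init__(self, hidden_sizes, num_critics=1):
        self.hidden_sizes = hidden_sizes
        self.num_critics = num_critics


class ToyCriticBuilder:
    """Stores the shared settings once and builds a critic from a type string."""

    def __init__(self, hidden_sizes, num_critics=1):
        self._hidden_sizes = hidden_sizes
        self._num_critics = num_critics

    def build_critic(self, critic_type):
        # Every critic type receives the same stored parameters.
        if critic_type == 'q':
            return ToyQCritic(self._hidden_sizes, self._num_critics)
        if critic_type == 'v':
            return ToyVCritic(self._hidden_sizes, self._num_critics)
        raise NotImplementedError(
            f"critic_type {critic_type!r} is not supported; expected 'q' or 'v'."
        )


builder = ToyCriticBuilder(hidden_sizes=[64, 64], num_critics=2)
q_critic = builder.build_critic('q')
v_critic = builder.build_critic('v')
```

In this sketch, supporting a new critic type means adding one more branch to build_critic, while callers keep passing the same constructor arguments to the builder.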
Q Critic#
Documentation
- class omnisafe.models.critic.QCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1, use_obs_encoder=False)
Implementation of Q Critic.
A Q-function approximator that uses a multi-layer perceptron (MLP) to map observation-action pairs to Q-values. This class inherits from Critic. You can design your own Q-function approximator by inheriting from this class or from Critic.
The Q critic network has two modes:
Hint
use_obs_encoder = False: The input of the network is the concatenation of the observation and the action.
use_obs_encoder = True: The input of the network is the concatenation of the output of the observation encoder and the action.
For example, in DDPG, the action is not directly concatenated with the observation, but is concatenated with the output of the observation encoder.
Note
The Q critic network contains multiple critics, and the output of forward() is a list of Q-values. To get the Q-value of a specific critic, index into that list.
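The two input modes can be sketched as a small helper that assembles the vector fed to the Q network. The function build_q_input and the encoder toy_encoder are hypothetical and use plain Python lists instead of tensors; they only illustrate what gets concatenated in each mode.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def build_q_input(
    obs: Sequence[float],
    act: Sequence[float],
    obs_encoder: Optional[Callable[[Sequence[float]], Vector]] = None,
) -> Vector:
    """Assemble the Q-network input for the two modes.

    obs_encoder=None mirrors use_obs_encoder=False: concat(obs, act).
    Otherwise it mirrors use_obs_encoder=True: concat(encoder(obs), act).
    """
    features = list(obs) if obs_encoder is None else obs_encoder(obs)
    return features + list(act)


def toy_encoder(obs: Sequence[float]) -> Vector:
    """Hypothetical encoder: compresses the observation to two features."""
    return [sum(obs), max(obs)]


obs, act = [0.1, 0.2, 0.3, 0.4], [1.0, -1.0]
plain_input = build_q_input(obs, act)                 # 4 obs dims + 2 act dims
encoded_input = build_q_input(obs, act, toy_encoder)  # 2 features + 2 act dims
```

In both modes the action enters the network unchanged; only the observation part of the input differs.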
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
num_critics (int, optional) – Number of critics. Defaults to 1.
use_obs_encoder (bool, optional) – Whether to use an observation encoder; only used in the Q critic. Defaults to False.
Initialize an instance of QCritic.
- forward(obs, act)
Forward function.
Because this is a multi-critic network, its output is a list of Q-values. To use it as a single-critic network, set the num_critics parameter to 1 when initializing the network, then use index 0 to get the Q-value.
- Parameters:
obs (torch.Tensor) – Observation from the environment.
act (torch.Tensor) – Action from the actor.
- Returns:
A list of Q critic values of action and observation pair.
- Return type:
list[Tensor]
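The list-valued return of forward can be illustrated with a framework-free sketch. ToyMultiQCritic is a hypothetical class in which each critic is a single random linear scorer over the concatenated observation and action. It stands in for the MLPs of the real network and uses floats instead of tensors.

```python
import random
from typing import List, Sequence


class ToyMultiQCritic:
    """Hypothetical multi-critic: one linear scorer per critic, no framework."""

    def __init__(self, obs_dim: int, act_dim: int, num_critics: int = 1, seed: int = 0):
        rng = random.Random(seed)
        # One independent weight vector per critic over concat(obs, act).
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(obs_dim + act_dim)]
            for _ in range(num_critics)
        ]

    def forward(self, obs: Sequence[float], act: Sequence[float]) -> List[float]:
        x = list(obs) + list(act)
        # Always return a list, even when num_critics == 1.
        return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in self.weights]


obs, act = [0.5, -0.5], [1.0]

single = ToyMultiQCritic(obs_dim=2, act_dim=1, num_critics=1)
q_value = single.forward(obs, act)[0]  # index 0 selects the only critic

double = ToyMultiQCritic(obs_dim=2, act_dim=1, num_critics=2)
q1, q2 = double.forward(obs, act)      # one Q-value per critic
```

Returning a list in every case keeps the calling code identical for one critic or several; only the indexing changes.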
V Critic#
Documentation
- class omnisafe.models.critic.VCritic(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform', num_critics=1)
Implementation of VCritic.
A V-function approximator that uses a multi-layer perceptron (MLP) to map observations to V-values. This class inherits from Critic. You can design your own V-function approximator by inheriting from this class or from Critic.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
num_critics (int, optional) – Number of critics. Defaults to 1.
Initialize an instance of VCritic.
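To make the observation-to-value mapping concrete, here is a minimal pure-Python MLP sketch. ToyVCritic is a hypothetical class, not the OmniSafe VCritic: it takes an integer obs_dim instead of an observation space, uses a simple uniform initialization rather than 'kaiming_uniform', and returns a single float rather than tensors.

```python
import random
from typing import Sequence


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


class ToyVCritic:
    """Hypothetical pure-Python MLP mapping an observation to one V-value."""

    def __init__(self, obs_dim: int, hidden_sizes: Sequence[int], seed: int = 0):
        rng = random.Random(seed)
        sizes = [obs_dim, *hidden_sizes, 1]  # the final layer outputs a scalar
        # Each layer is (weights, biases); weights[j] feeds output unit j.
        self.layers = [
            (
                [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
                [0.0] * n_out,
            )
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def forward(self, obs: Sequence[float]) -> float:
        x = list(obs)
        last = len(self.layers) - 1
        for i, (weights, biases) in enumerate(self.layers):
            x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
            if i < last:  # ReLU on hidden layers only; the output stays linear
                x = [relu(v) for v in x]
        return x[0]


critic = ToyVCritic(obs_dim=3, hidden_sizes=[8, 8])
value = critic.forward([0.1, -0.2, 0.3])
```

Unlike the Q critic sketches, the forward pass here takes no action argument, which is the defining difference of a V critic.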