OmniSafe Actor
Base Actor
Documentation
- class omnisafe.models.base.Actor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
An abstract class for actor.
An actor approximates the policy function that maps observations to actions. The actor is parameterized by a neural network that takes observations as input and outputs the mean and standard deviation of the action distribution.
Note
You can use this class to implement your own actor by inheriting it.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
Initialize an instance of Actor.
- abstract _distribution(obs)
Return the action distribution.
An actor generates a distribution from which actions are sampled during training. When training, the mean and variance of the distribution are used to calculate the loss. When testing, the mean of the distribution is used directly as the action.
For example, if the action space is continuous, the actor can generate a Gaussian distribution:
\[p(a \mid s) = \mathcal{N}(\mu(s), \sigma(s)) \tag{3}\]
where \(\mu(s)\) and \(\sigma(s)\) are the mean and standard deviation of the distribution.
Warning
_distribution() is a private method that is only used to sample actions during training. Do not call it directly in your code; use the public method predict() to sample actions.
- Parameters:
obs (torch.Tensor) – Observation from environments.
- Returns:
The distribution of action.
- Return type:
Distribution
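A framework-free sketch of equation (3) may help. It uses plain Python in place of torch distributions and assumes a hypothetical one-dimensional actor with a linear mean \(\mu(s) = 0.5 s + 0.1\) and a constant \(\sigma(s) = 0.2\). The function names are illustrative and are not part of the OmniSafe API.

```python
import math
import random


def gaussian_policy(obs: float, w: float = 0.5, b: float = 0.1, sigma: float = 0.2):
    """Return (mu, sigma) of p(a|s) = N(mu(s), sigma(s)) for a toy 1-D actor.

    The linear mean and the constant sigma are illustrative assumptions.
    """
    mu = w * obs + b
    return mu, sigma


def gaussian_log_prob(act: float, mu: float, sigma: float) -> float:
    """Log density of act under the univariate normal N(mu, sigma)."""
    return -((act - mu) ** 2) / (2 * sigma**2) - math.log(sigma) - 0.5 * math.log(2 * math.pi)


mu, sigma = gaussian_policy(obs=1.0)
train_action = random.gauss(mu, sigma)  # training: sample a stochastic action
test_action = mu                        # testing: use the mean directly as the action
print(train_action, test_action, gaussian_log_prob(train_action, mu, sigma))
```

In a real actor, \(\mu(s)\) and \(\sigma(s)\) come from the network, and a library distribution object replaces the hand-written log density.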
- abstract forward(obs)
Return the distribution of action.
- Parameters:
obs (torch.Tensor) – Observation from environments.
- Return type:
Distribution
- abstract log_prob(act)
Return the log probability of the action under the distribution.
log_prob() can only be called after calling predict() or forward().
- Parameters:
act (torch.Tensor) – The action.
- Returns:
The log probability of action under the distribution.
- Return type:
Tensor
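The call-order contract of log_prob() can be illustrated with a toy stand-in. ToyActor below is hypothetical: it caches the distribution produced by forward() or predict() and refuses to compute a log probability before one exists. It only sketches the documented contract and is not OmniSafe's implementation.

```python
import math
import random
from statistics import NormalDist


class ToyActor:
    """Hypothetical 1-D actor that caches the distribution from its last inference.

    It mirrors the documented contract: log_prob() is only valid after
    predict() or forward() has produced a distribution.
    """

    def __init__(self, sigma: float = 0.3) -> None:
        self.sigma = sigma
        self._current_dist = None  # set by forward()

    def forward(self, obs: float) -> NormalDist:
        # Toy mean mu(s) = s; a real actor would run its network here.
        self._current_dist = NormalDist(mu=obs, sigma=self.sigma)
        return self._current_dist

    def predict(self, obs: float, deterministic: bool = False) -> float:
        dist = self.forward(obs)
        return dist.mean if deterministic else random.gauss(dist.mean, dist.stdev)

    def log_prob(self, act: float) -> float:
        if self._current_dist is None:
            raise RuntimeError("call predict() or forward() before log_prob()")
        return math.log(self._current_dist.pdf(act))


actor = ToyActor()
try:
    actor.log_prob(0.0)  # no distribution cached yet
except RuntimeError as err:
    print("error:", err)
action = actor.predict(0.5)
print(action, actor.log_prob(action))
```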
- abstract predict(obs, deterministic=False)
Predict a deterministic or stochastic action based on the observation.
deterministic=True or False
When training the actor, one important trick to avoid local minima is to use stochastic actions, which can simply be achieved by sampling actions from the distribution (set deterministic=False). When testing the actor, we want to know the actual action that the agent will take, so we should use deterministic actions (set deterministic=True).
A typical policy-gradient loss for training the actor is
\[L = -\underset{s \sim p(s)}{\mathbb{E}}[\log p(a \mid s) A^R(s, a)] \tag{4}\]
where \(p(s)\) is the distribution of observations, \(p(a \mid s)\) is the distribution of actions, \(\log p(a \mid s)\) is the log probability of the action under the distribution, and \(A^R(s, a)\) is the advantage function.
- Parameters:
obs (torch.Tensor) – Observation from environments.
deterministic (bool, optional) – Whether to predict deterministic action. Defaults to False.
- Return type:
Tensor
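In practice, equation (4) is estimated with a batch average. The sketch below assumes the log probabilities and advantages have already been computed for a batch of sampled pairs \((s, a)\). policy_gradient_loss is a hypothetical helper, written in plain Python for clarity.

```python
from typing import Sequence


def policy_gradient_loss(log_probs: Sequence[float], advantages: Sequence[float]) -> float:
    """Batch estimate of L = -E[log p(a|s) * A^R(s, a)] from equation (4).

    Each index i pairs the log probability of a sampled action a_i in state s_i
    with its advantage estimate A^R(s_i, a_i).
    """
    if len(log_probs) != len(advantages) or not log_probs:
        raise ValueError("log_probs and advantages must be non-empty and equally long")
    total = sum(lp * adv for lp, adv in zip(log_probs, advantages))
    return -total / len(log_probs)


loss = policy_gradient_loss([-0.5, -1.0, -2.0], [1.0, -0.5, 2.0])
print(loss)
```

In an autograd framework, the advantages are treated as constants, so the gradient of this loss flows only through \(\log p(a \mid s)\).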
Actor Builder
Documentation
- class omnisafe.models.actor.ActorBuilder(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Class for building actor networks.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
Initialize an instance of ActorBuilder.
- build_actor(actor_type)
Build an actor network.
- Currently, we support the following actor types:
gaussian_learning: Gaussian actor with learnable standard deviation parameters.
gaussian_sac: Gaussian actor with a learnable standard deviation network.
mlp: Multi-layer perceptron actor, used in DDPG and TD3.
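Dispatching on the actor type string can be sketched as a lookup table that raises NotImplementedError for unknown keys, matching the behavior documented for build_actor(). The stub classes and integer dimensions below are hypothetical placeholders for the real actor networks and OmnisafeSpace arguments.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StubActor:
    """Hypothetical placeholder for a real actor network."""

    obs_dim: int
    act_dim: int
    hidden_sizes: tuple


class GaussianLearningActorStub(StubActor):
    """Stands in for a Gaussian actor with learnable std parameters."""


class GaussianSACActorStub(StubActor):
    """Stands in for a Gaussian actor with a learnable std network."""


class MLPActorStub(StubActor):
    """Stands in for a multi-layer perceptron actor (DDPG, TD3)."""


ACTOR_REGISTRY = {
    "gaussian_learning": GaussianLearningActorStub,
    "gaussian_sac": GaussianSACActorStub,
    "mlp": MLPActorStub,
}


def build_actor(
    actor_type: str,
    obs_dim: int,
    act_dim: int,
    hidden_sizes: Sequence[int] = (64, 64),
) -> StubActor:
    """Look up the actor class registered for actor_type and instantiate it."""
    if actor_type not in ACTOR_REGISTRY:
        raise NotImplementedError(f"Actor type {actor_type} is not implemented.")
    return ACTOR_REGISTRY[actor_type](obs_dim, act_dim, tuple(hidden_sizes))


actor = build_actor("gaussian_sac", obs_dim=8, act_dim=2)
print(type(actor).__name__, actor)
```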
- Parameters:
actor_type (ActorType) – Type of actor network, e.g. gaussian_learning.
- Returns:
An actor network: a GaussianLearningActor, a GaussianSACActor, or an MLPActor.
- Raises:
NotImplementedError – If the actor type is not implemented.
- Return type:
Gaussian Actor
Documentation
- class omnisafe.models.actor.GaussianActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
An abstract class for normal distribution actor.
A GaussianActor inherits from Actor and uses a normal distribution to approximate the policy function.
Note
You can use this class to implement your own actor by inheriting it.
Initialize an instance of Actor.
- abstract property std: float
Get the standard deviation of the normal distribution.
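Because std is an abstract property, a subclass must implement it before it can be instantiated. The sketch below shows that pattern with Python's abc module. GaussianActorBase and FixedStdActor are hypothetical stand-ins rather than OmniSafe classes, and storing the standard deviation through its logarithm is an assumption made for illustration.

```python
import math
from abc import ABC, abstractmethod


class GaussianActorBase(ABC):
    """Hypothetical stand-in for an abstract Gaussian actor."""

    @property
    @abstractmethod
    def std(self) -> float:
        """Standard deviation of the normal distribution."""


class FixedStdActor(GaussianActorBase):
    """Concrete subclass that stores its std as a log value (an assumption)."""

    def __init__(self, log_std: float) -> None:
        self._log_std = log_std

    @property
    def std(self) -> float:
        return math.exp(self._log_std)


print(FixedStdActor(log_std=math.log(0.5)).std)
```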
Gaussian Learning Actor
Documentation
- class omnisafe.models.actor.GaussianLearningActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Implementation of GaussianLearningActor.
GaussianLearningActor is a Gaussian actor with a learnable standard deviation. It is used in on-policy algorithms such as PPO and TRPO.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
Initialize an instance of GaussianLearningActor.
- _distribution(obs)
Get the distribution of the actor.
Warning
This method is not supposed to be called by users. You should call forward() instead.
- Parameters:
obs (torch.Tensor) – Observation from environments.
- Returns:
A normal distribution parameterized by the mean and standard deviation produced by the actor.
- Return type:
Normal
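One common way to realize a learnable standard deviation that does not depend on the observation is a per-dimension log-std parameter, as sketched below. The linear mean, the log-std parameterization, and the class name are assumptions for illustration. The analytic gradient shows how such a parameter can be learned from the log-likelihood.

```python
import math
from typing import List, Sequence, Tuple


class LearnableStdGaussian:
    """Toy diagonal Gaussian actor with a state-independent, learnable log std.

    Illustrative assumptions: a linear mean mu(s) = W s and one log-std
    parameter per action dimension, shared by every observation.
    """

    def __init__(self, weights: List[List[float]], log_std: Sequence[float]) -> None:
        self.weights = weights        # act_dim x obs_dim matrix
        self.log_std = list(log_std)  # trainable, one entry per action dimension

    def distribution(self, obs: Sequence[float]) -> Tuple[List[float], List[float]]:
        mean = [sum(w * o for w, o in zip(row, obs)) for row in self.weights]
        std = [math.exp(ls) for ls in self.log_std]  # identical for every observation
        return mean, std

    def log_prob(self, obs: Sequence[float], act: Sequence[float]) -> float:
        mean, std = self.distribution(obs)
        return sum(
            -((a - m) ** 2) / (2 * s**2) - math.log(s) - 0.5 * math.log(2 * math.pi)
            for a, m, s in zip(act, mean, std)
        )

    def grad_log_std(self, obs: Sequence[float], act: Sequence[float]) -> List[float]:
        """d log_prob / d log_std_i = ((a_i - mu_i) / sigma_i)^2 - 1."""
        mean, std = self.distribution(obs)
        return [((a - m) / s) ** 2 - 1 for a, m, s in zip(act, mean, std)]


actor = LearnableStdGaussian(weights=[[1.0, 0.0], [0.0, 1.0]], log_std=[0.0, -1.0])
obs, act = [0.3, -0.2], [0.5, 0.1]
# One gradient-ascent step on the log-likelihood of act, updating only log_std.
lr = 0.1
actor.log_std = [ls + lr * g for ls, g in zip(actor.log_std, actor.grad_log_std(obs, act))]
print(actor.distribution(obs))
```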
- forward(obs)
Forward method.
- Parameters:
obs (torch.Tensor) – Observation from environments.
- Returns:
The current distribution.
- Return type:
Distribution
- predict(obs, deterministic=False)
Predict the action given the observation.
The predicted action depends on the deterministic flag.
If deterministic is True, the predicted action is the mean of the distribution.
If deterministic is False, the predicted action is sampled from the distribution.
- Parameters:
obs (torch.Tensor) – Observation from environments.
deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.
- Returns:
The mean of the distribution if deterministic is True, otherwise the sampled action.
- Return type:
Tensor
- property std: float
Standard deviation of the distribution.
Gaussian SAC Actor
Documentation
- class omnisafe.models.actor.GaussianSACActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Implementation of GaussianSACActor.
GaussianSACActor is a Gaussian actor with a learnable standard deviation network. It is used in SAC and in other offline or model-based algorithms related to SAC.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list of int) – List of hidden layer sizes.
activation (Activation, optional) – Activation function. Defaults to 'relu'.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
Initialize an instance of GaussianSACActor.
- _distribution(obs)
Get the distribution of the actor.
Warning
This method is not supposed to be called by users. You should call forward() instead.
Specifically, this method clips the log standard deviation to the range [-20, 2].
- Parameters:
obs (torch.Tensor) – Observation from environments.
- Returns:
A normal distribution parameterized by the mean and standard deviation produced by the actor.
- Return type:
Normal
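Because a standard deviation must be positive, a clipping range such as [-20, 2] only makes sense for the log standard deviation, which is then exponentiated. The sketch below assumes a hypothetical linear network that outputs both a mean and a raw log std for each observation. Unlike a state-independent parameter, the standard deviation therefore changes with the observation.

```python
import math
from typing import Tuple

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0


def sac_distribution(obs: float, w_mean: float = 1.0, w_log_std: float = 3.0) -> Tuple[float, float]:
    """Toy 1-D state-dependent Gaussian; both heads are linear in obs (an assumption)."""
    mean = w_mean * obs
    raw_log_std = w_log_std * obs  # unbounded network output
    log_std = min(max(raw_log_std, LOG_STD_MIN), LOG_STD_MAX)  # clip the log std
    return mean, math.exp(log_std)  # exponentiating keeps the std positive


for obs in (-10.0, 0.0, 0.5, 10.0):
    print(obs, sac_distribution(obs))
```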
- forward(obs)
Forward method.
- Parameters:
obs (torch.Tensor) – Observation from environments.
- Returns:
The current distribution.
- Return type:
- log_prob(act)
Compute the log probability of the action given the current distribution.
Note
In this method, the log probability of the action is corrected for the tanh squashing applied to the action. The correction is as follows:
\[\log \text{prob} = \log \pi(a \mid s) - \sum_{i=1}^n 2\left(\log 2 - a_i - \log(1 + e^{-2 a_i})\right) \tag{6}\]
where \(a\) is the action before tanh squashing, \(s\) is the observation, and \(n\) is the dimension of the action.
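The sum in equation (6) is meant to be the change-of-variables term for a tanh-squashed Gaussian. If \(a = \tanh(u)\), then each dimension contributes \(\log(1 - \tanh^2(u)) = 2(\log 2 - u - \log(1 + e^{-2u}))\). The plain-Python sketch below assumes u is the raw Gaussian sample before squashing and checks that the stable softplus form matches the direct expression. The function names are hypothetical.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def tanh_correction(u: float) -> float:
    """log(1 - tanh(u)^2), written as 2 * (log 2 - u - softplus(-2u))."""
    return 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))


def squashed_log_prob(gaussian_log_prob: float, raw_action: Sequence[float]) -> float:
    """log pi(tanh(u) | s) from the Gaussian log density of the raw sample u."""
    return gaussian_log_prob - sum(tanh_correction(u) for u in raw_action)


for u in (-2.0, 0.0, 0.7):
    print(u, tanh_correction(u), math.log(1.0 - math.tanh(u) ** 2))
print(tanh_correction(20.0))  # stays finite even where tanh(20.0) rounds to 1.0
```

The softplus form stays finite for large \(|u|\), where \(1 - \tanh^2(u)\) can round to exactly 0 in floating point and its logarithm is undefined.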
- predict(obs, deterministic=False)
Predict the action given the observation.
The predicted action depends on the deterministic flag.
If deterministic is True, the predicted action is the mean of the distribution.
If deterministic is False, the predicted action is sampled from the distribution.
- Parameters:
obs (torch.Tensor) – Observation from environments.
deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.
- Returns:
The mean of the distribution if deterministic is True, otherwise the sampled action.
- Return type:
Tensor
- property std: float
Standard deviation of the distribution.
Perturbation Actor
Documentation
- class omnisafe.models.actor.PerturbationActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Class for Perturbation Actor.
PerturbationActor is used in offline algorithms such as BCQ. It combines a VAE with a perturbation network: BCQ uses the perturbation network to perturb the action predicted by the VAE, which is trained in the manner of behavior cloning.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list) – List of hidden layer sizes.
latent_dim (Optional[int]) – Latent dimension. If None, latent_dim = act_dim * 2.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
Initialize an instance of PerturbationActor.
- _distribution(obs)
Return the action distribution.
An actor generates a distribution from which actions are sampled during training. When training, the mean and variance of the distribution are used to calculate the loss. When testing, the mean of the distribution is used directly as the action.
For example, if the action space is continuous, the actor can generate a Gaussian distribution:
\[p(a \mid s) = \mathcal{N}(\mu(s), \sigma(s)) \tag{8}\]
where \(\mu(s)\) and \(\sigma(s)\) are the mean and standard deviation of the distribution.
Warning
_distribution() is a private method that is only used to sample actions during training. Do not call it directly in your code; use the public method predict() to sample actions.
- Parameters:
obs (torch.Tensor) – Observation from environments.
- Returns:
The distribution of action.
- Return type:
Distribution
- forward(obs)
forward() is not used in this class; it exists only for API compatibility.
- Return type:
Distribution
- log_prob(act)
log_prob() is not used in this class; it exists only for API compatibility.
- Return type:
Tensor
- property phi: float
Return phi, which is the maximum perturbation.
- predict(obs, deterministic=False)
Predict an action from the observation.
deterministic is not used in this method; it exists only for API compatibility.
- Parameters:
obs (torch.Tensor) – Observation.
deterministic (bool, optional) – Whether to return deterministic action. Defaults to False.
- Returns:
torch.Tensor – Action.
- Return type:
Tensor
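The role of phi can be sketched with the perturbation scheme from the original BCQ work. The network's raw output is squashed with tanh, scaled by phi and the maximum action, and added to the VAE's proposed action, and the result is then clipped to the action bounds. The function and its arguments are hypothetical, and OmniSafe's PerturbationActor may differ in detail.

```python
import math
from typing import List, Sequence


def perturb(
    vae_action: Sequence[float],
    raw_output: Sequence[float],
    phi: float = 0.05,
    max_action: float = 1.0,
) -> List[float]:
    """Shift a VAE-proposed action by a bounded perturbation, BCQ style.

    raw_output stands in for the perturbation network's unbounded output;
    tanh maps it into [-1, 1], so each shift lies in [-phi * max_action, phi * max_action].
    """
    perturbed = []
    for a, r in zip(vae_action, raw_output):
        shift = phi * max_action * math.tanh(r)
        # Keep the perturbed action inside the valid action bounds.
        perturbed.append(min(max(a + shift, -max_action), max_action))
    return perturbed


print(perturb([0.2, 0.99, -0.5], [0.3, 10.0, -100.0]))
```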
VAE Actor
Documentation
- class omnisafe.models.actor.VAE(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')
Class for VAE.
VAE is a variational auto-encoder. It is used in offline algorithms such as BCQ.
- Parameters:
obs_space (OmnisafeSpace) – Observation space.
act_space (OmnisafeSpace) – Action space.
hidden_sizes (list) – List of hidden layer sizes.
latent_dim (Optional[int]) – Latent dimension. If None, latent_dim = act_dim * 2.
activation (Activation) – Activation function.
weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.
Initialize an instance of VAE.
- _distribution(obs)
Return the action distribution.
An actor generates a distribution from which actions are sampled during training. When training, the mean and variance of the distribution are used to calculate the loss. When testing, the mean of the distribution is used directly as the action.
For example, if the action space is continuous, the actor can generate a Gaussian distribution:
\[p(a \mid s) = \mathcal{N}(\mu(s), \sigma(s)) \tag{10}\]
where \(\mu(s)\) and \(\sigma(s)\) are the mean and standard deviation of the distribution.
Warning
_distribution() is a private method that is only used to sample actions during training. Do not call it directly in your code; use the public method predict() to sample actions.
- Parameters:
obs (torch.Tensor) – Observation from environments.
- Returns:
The distribution of action.
- Return type:
Distribution
- decode(obs, latent=None)
Decode a latent vector into an action.
When latent is None, a latent vector is sampled from the standard normal distribution.
- Parameters:
obs (torch.Tensor) – Observation.
latent (Optional[torch.Tensor], optional) – Latent vector. Defaults to None.
- Returns:
torch.Tensor – Action.
- Return type:
Tensor
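The latent sampling in decode() can be sketched as follows. When no latent vector is passed, one is drawn from \(\mathcal{N}(0, I)\) with the documented default size act_dim * 2. The way the observation and latent are mixed is a toy decoder assumed for illustration, not a trained network, and the function signature is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def decode(
    obs: Sequence[float],
    act_dim: int,
    latent: Optional[Sequence[float]] = None,
    latent_dim: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Map (obs, latent) to an action in (-1, 1) with a toy decoder.

    Mirrors the documented behavior: when latent is None, a latent vector
    is drawn from the standard normal distribution.
    """
    if latent_dim is None:
        latent_dim = act_dim * 2  # documented default for latent_dim
    if latent is None:
        rng = rng or random.Random()
        latent = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    obs_summary = sum(obs) / max(len(obs), 1)
    # Toy decoder (an assumption, not a trained network): fold the latent into
    # act_dim groups, add an observation summary, and squash with tanh.
    return [math.tanh(obs_summary + sum(latent[i::act_dim])) for i in range(act_dim)]


print(decode([0.2, -0.1], act_dim=2))                    # latent sampled from N(0, I)
print(decode([0.2, -0.1], act_dim=2, latent=[0.0] * 4))  # latent supplied explicitly
```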
- forward(obs)
forward() is not used in this class; it exists only for API compatibility.
- Return type:
Distribution
- log_prob(act)
log_prob() is not used in this class; it exists only for API compatibility.
- Return type:
Tensor
- predict(obs, deterministic=False)
Predict the action given the observation.
deterministic is not used in the VAE model. The VAE actor's behavior is stochastic: it samples from the latent standard normal distribution.
- Parameters:
obs (torch.Tensor) – Observation from environments.
deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.
- Returns:
torch.Tensor – Predicted action.
- Return type:
Tensor