OmniSafe Actor#

Base Actor#

Documentation

class omnisafe.models.base.Actor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

An abstract class for actors.

An actor approximates the policy function that maps observations to actions. The actor is parameterized by a neural network that takes observations as input and outputs the mean and standard deviation of the action distribution.

Note

You can implement your own actor by inheriting from this class.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

Initialize an instance of Actor.

abstract _distribution(obs)[source]#

Return the distribution of action.

An actor generates a distribution that is used to sample actions during training. During training, the mean and variance of the distribution are used to calculate the loss; during evaluation, the mean of the distribution is used directly as the action.

For example, if the action is continuous, the actor can generate a Gaussian distribution.

(3)#\[p (a | s) = N (\mu (s), \sigma (s))\]

where \(\mu (s)\) and \(\sigma (s)\) are the mean and standard deviation of the distribution.

Warning

_distribution() is a private method used only to sample actions internally during training. You should not call it directly in your code; instead, use the public method predict() to sample actions.

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

The distribution of action.

Return type:

Distribution

abstract forward(obs)[source]#

Return the distribution of action.

Parameters:

obs (torch.Tensor) – Observation from environments.

Return type:

Distribution

abstract log_prob(act)[source]#

Return the log probability of action under the distribution.

log_prob() can only be called after predict() or forward() has been called.

Parameters:

act (torch.Tensor) – The action.

Returns:

The log probability of action under the distribution.

Return type:

Tensor

abstract predict(obs, deterministic=False)[source]#

Predict deterministic or stochastic action based on observation.

The deterministic flag controls whether the action is sampled from the distribution or taken as its mean.

When training the actor, one important trick to avoid local minima is to use stochastic actions, which can be achieved simply by sampling actions from the distribution (set deterministic=False).

When evaluating the actor, we want to know the actual action that the agent will take, so we use deterministic actions (set deterministic=True).

During training, the log probability of the sampled action enters the policy-gradient loss

(4)#\[L = -\underset{s \sim p(s)}{\mathbb{E}}[ \log p (a | s) A^R (s, a) ]\]

where \(p (s)\) is the distribution of observations, \(p (a | s)\) is the distribution of actions, \(\log p (a | s)\) is the log probability of the action under the distribution, and \(A^R (s, a)\) is the advantage function.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • deterministic (bool, optional) – Whether to predict deterministic action. Defaults to False.

Return type:

Tensor
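
As the note above suggests, a custom actor can be implemented by inheriting from Actor and filling in the four abstract methods. The following is a minimal sketch only: it assumes Box observation and action spaces, and the network layout and the _current_dist attribute are illustrative choices, not OmniSafe internals.

    import torch
    from torch import nn
    from torch.distributions import Normal

    from omnisafe.models.base import Actor

    class MyGaussianActor(Actor):
        """Minimal illustrative Gaussian actor built on the abstract Actor interface."""

        def __init__(self, obs_space, act_space, hidden_sizes,
                     activation='relu', weight_initialization_mode='kaiming_uniform'):
            super().__init__(obs_space, act_space, hidden_sizes,
                             activation, weight_initialization_mode)
            obs_dim, act_dim = obs_space.shape[0], act_space.shape[0]
            # A small mean network; hidden_sizes is assumed to be non-empty.
            self.mean_net = nn.Sequential(
                nn.Linear(obs_dim, hidden_sizes[0]), nn.ReLU(),
                nn.Linear(hidden_sizes[0], act_dim),
            )
            self.log_std = nn.Parameter(torch.zeros(act_dim))
            self._current_dist = None

        def _distribution(self, obs):
            # Build N(mu(s), sigma) as in Eq. (3); called internally, not by users.
            return Normal(self.mean_net(obs), self.log_std.exp())

        def forward(self, obs):
            self._current_dist = self._distribution(obs)
            return self._current_dist

        def predict(self, obs, deterministic=False):
            self._current_dist = self._distribution(obs)
            if deterministic:
                return self._current_dist.mean
            return self._current_dist.rsample()

        def log_prob(self, act):
            # Only valid after forward() or predict() has set the distribution.
            return self._current_dist.log_prob(act).sum(dim=-1)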

  • ActorBuilder(obs_space, act_space, hidden_sizes) – Class for building actor networks.

  • GaussianActor(obs_space, act_space, hidden_sizes) – An abstract class for normal distribution actors.

  • GaussianLearningActor(obs_space, act_space, ...) – Implementation of GaussianLearningActor.

  • GaussianSACActor(obs_space, act_space, ...) – Implementation of GaussianSACActor.

Actor Builder#

Documentation

class omnisafe.models.actor.ActorBuilder(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

Class for building actor networks.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

Initialize an instance of ActorBuilder.

build_actor(actor_type)[source]#

Build actor network.

Currently, we support the following actor types:
  • gaussian_learning: Gaussian actor with learnable standard deviation parameters.

  • gaussian_sac: Gaussian actor with learnable standard deviation network.

  • mlp: Multi-layer perceptron actor, used in DDPG and TD3.

Parameters:

actor_type (ActorType) – Type of actor network, e.g. gaussian_learning.

Returns:

The requested actor network: GaussianLearningActor, GaussianSACActor, or MLPActor.

Raises:

NotImplementedError – If the actor type is not implemented.

Return type:

Actor
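
A minimal usage sketch for the builder, assuming gymnasium Box spaces are accepted as OmnisafeSpace; the space shapes and hidden sizes here are illustrative.

    import torch
    from gymnasium.spaces import Box

    from omnisafe.models.actor import ActorBuilder

    obs_space = Box(low=-1.0, high=1.0, shape=(8,))
    act_space = Box(low=-1.0, high=1.0, shape=(2,))

    builder = ActorBuilder(
        obs_space,
        act_space,
        hidden_sizes=[64, 64],
        activation='relu',
        weight_initialization_mode='kaiming_uniform',
    )
    actor = builder.build_actor('gaussian_learning')   # or 'gaussian_sac', 'mlp'

    obs = torch.as_tensor(obs_space.sample(), dtype=torch.float32).unsqueeze(0)  # batch of one
    action = actor.predict(obs, deterministic=False)   # stochastic action for training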

Gaussian Actor#

Documentation

class omnisafe.models.actor.GaussianActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

An abstract class for normal distribution actor.

GaussianActor inherits from Actor and uses a Normal distribution to approximate the policy function.

Note

You can implement your own actor by inheriting from this class.

Initialize an instance of Actor.

abstract property std: float#

Get the standard deviation of the normal distribution.

Gaussian Learning Actor#

Documentation

class omnisafe.models.actor.GaussianLearningActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

Implementation of GaussianLearningActor.

GaussianLearningActor is a Gaussian actor with a learnable standard deviation. It is used in on-policy algorithms such as PPO and TRPO.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

Initialize an instance of GaussianLearningActor.

_distribution(obs)[source]#

Get the distribution of the actor.

Warning

This method is not supposed to be called by users. You should call forward() instead.

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

A Normal distribution parameterized by the mean and standard deviation output by the actor.

Return type:

Normal

forward(obs)[source]#

Forward method.

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

The current distribution.

Return type:

Distribution

log_prob(act)[source]#

Compute the log probability of the action given the current distribution.

Warning

You must call forward() or predict() before calling this method.

Parameters:

act (torch.Tensor) – Action from predict() or forward().

Returns:

Log probability of the action.

Return type:

Tensor

predict(obs, deterministic=False)[source]#

Predict the action given observation.

The predicted action depends on the deterministic flag.

  • If deterministic is True, the predicted action is the mean of the distribution.

  • If deterministic is False, the predicted action is sampled from the distribution.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.

Returns:

The mean of the distribution if deterministic is True, otherwise the sampled action.

Return type:

Tensor

property std: float#

Standard deviation of the distribution.
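
A sketch of the call order required by the warning on log_prob(): the distribution is created by forward() or predict(), and only afterwards can log_prob() be evaluated. The spaces, batch shapes, and placeholder advantages are illustrative; the surrogate loss at the end mirrors Eq. (4).

    import torch
    from gymnasium.spaces import Box

    from omnisafe.models.actor import GaussianLearningActor

    actor = GaussianLearningActor(
        Box(low=-1.0, high=1.0, shape=(8,)),
        Box(low=-1.0, high=1.0, shape=(2,)),
        hidden_sizes=[64, 64],
    )

    obs = torch.randn(16, 8)                      # a batch of observations
    act = actor.predict(obs)                      # stochastic actions (deterministic=False)
    logp = actor.log_prob(act)                    # valid only because predict() was called first;
                                                  # assumed to be summed over action dims, shape (16,)

    advantage = torch.randn(16)                   # placeholder advantage estimates
    loss = -(logp * advantage).mean()             # policy-gradient surrogate, Eq. (4)

    eval_act = actor.predict(obs, deterministic=True)   # mean action for evaluation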

Gaussian SAC Actor#

Documentation

class omnisafe.models.actor.GaussianSACActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

Implementation of GaussianSACActor.

GaussianSACActor is a Gaussian actor with a learnable standard deviation network. It is used in SAC and in offline or model-based algorithms derived from SAC.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list of int) – List of hidden layer sizes.

  • activation (Activation, optional) – Activation function. Defaults to 'relu'.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

Initialize an instance of GaussianSACActor.

_distribution(obs)[source]#

Get the distribution of the actor.

Warning

This method is not supposed to be called by users. You should call forward() instead.

Specifically, this method clips the logarithm of the standard deviation to the range [-20, 2].

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

A Normal distribution parameterized by the mean and standard deviation output by the actor.

Return type:

Normal
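
A minimal illustration of the clamping just described; the bounds follow the docstring, while the example mean and log-std values are placeholders rather than OmniSafe internals.

    import torch
    from torch.distributions import Normal

    LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0

    mean = torch.zeros(2)                                      # example network output
    log_std = torch.tensor([3.5, -25.0])                       # example network output
    log_std = torch.clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)   # clip to [-20, 2]
    dist = Normal(mean, log_std.exp())                         # standard deviation stays in a safe range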

forward(obs)[source]#

Forward method.

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

The current distribution.

Return type:

TanhNormal

log_prob(act)[source]#

Compute the log probability of the action given the current distribution.

Warning

You must call forward() or predict() before calling this method.

Note

In this method, the log probability of the action is regularized to account for the tanh squashing. The regularization is as follows:

(6)#\[\log prob = \log \pi (a|s) - \sum_{i=1}^n 2 \left( \log 2 - a_i - \log (1 + e^{-2 a_i}) \right)\]

where \(a\) is the pre-squash action, \(s\) is the observation, and \(n\) is the dimension of the action.

Parameters:

act (torch.Tensor) – Action from predict() or forward().

Returns:

Log probability of the action.

Return type:

Tensor
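
A numerical check of the correction term in Eq. (6), written with softplus for numerical stability. For a pre-squash action a, the summand equals log(1 - tanh(a)^2), the change-of-variables term for a tanh-squashed Gaussian; this sketch only demonstrates the formula and does not reproduce OmniSafe's internal implementation.

    import math

    import torch
    import torch.nn.functional as F

    pre_tanh = torch.tensor([0.3, -1.2, 2.0])       # pre-squash action a
    # 2 * (log 2 - a - softplus(-2a)) == log(1 - tanh(a)^2), evaluated stably
    correction = 2.0 * (math.log(2.0) - pre_tanh - F.softplus(-2.0 * pre_tanh))
    assert torch.allclose(correction, torch.log(1.0 - torch.tanh(pre_tanh) ** 2), atol=1e-5)

    base_logp = torch.tensor(-1.5)                   # log pi(a|s) under the base Gaussian
    squashed_logp = base_logp - correction.sum()     # regularized log probability of Eq. (6)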

predict(obs, deterministic=False)[source]#

Predict the action given observation.

The predicted action depends on the deterministic flag.

  • If deterministic is True, the predicted action is the mean of the distribution.

  • If deterministic is False, the predicted action is sampled from the distribution.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.

Returns:

The mean of the distribution if deterministic is True, otherwise the sampled action.

Return type:

Tensor

property std: float#

Standard deviation of the distribution.

Perturbation Actor#

Documentation

class omnisafe.models.actor.PerturbationActor(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

Class for Perturbation Actor.

The perturbation actor is used in offline algorithms such as BCQ. It combines a VAE with a perturbation network: the VAE, trained in a behavior-cloning fashion, proposes an action, and the perturbation network applies a small learned perturbation to that action.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list) – List of hidden layer sizes.

  • latent_dim (Optional[int]) – Latent dimension. If None, latent_dim = act_dim * 2.

  • activation (Activation) – Activation function.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

Initialize an instance of PerturbationActor.

_distribution(obs)[source]#

Return the distribution of action.

An actor generates a distribution that is used to sample actions during training. During training, the mean and variance of the distribution are used to calculate the loss; during evaluation, the mean of the distribution is used directly as the action.

For example, if the action is continuous, the actor can generate a Gaussian distribution.

(8)#\[p (a | s) = N (\mu (s), \sigma (s))\]

where \(\mu (s)\) and \(\sigma (s)\) are the mean and standard deviation of the distribution.

Warning

_distribution() is a private method used only to sample actions internally during training. You should not call it directly in your code; instead, use the public method predict() to sample actions.

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

The distribution of action.

Return type:

Distribution

forward(obs)[source]#

forward() is not used by this class; it is kept only for interface compatibility.

Return type:

Distribution

log_prob(act)[source]#

log_prob() is not used by this class; it is kept only for interface compatibility.

Return type:

Tensor

property phi: float#

Return phi, which is the maximum perturbation.

predict(obs, deterministic=False)[source]#

Predict action from observation.

deterministic is not used in this method; it is kept only for interface compatibility.

Parameters:
  • obs (torch.Tensor) – Observation.

  • deterministic (bool, optional) – Whether to return deterministic action. Defaults to False.

Returns:

torch.Tensor – Action.

Return type:

Tensor
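
The BCQ-style combination described above can be sketched as follows. The perturbation network, its call signature, the hidden width, and the clipping bounds are all illustrative assumptions; only the overall recipe (the VAE proposes an action, a small bounded perturbation refines it) follows the text.

    import torch
    from torch import nn

    class TinyPerturbation(nn.Module):
        """Hypothetical perturbation network: maps (obs, act) to a bounded correction."""

        def __init__(self, obs_dim, act_dim, phi=0.05):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                nn.Linear(64, act_dim), nn.Tanh(),
            )
            self.phi = phi                                   # maximum perturbation magnitude

        def forward(self, obs, act):
            delta = self.phi * self.net(torch.cat([obs, act], dim=-1))
            return torch.clamp(act + delta, -1.0, 1.0)       # keep the action in bounds

    perturb = TinyPerturbation(obs_dim=8, act_dim=2, phi=0.05)
    obs = torch.randn(4, 8)
    candidate = torch.rand(4, 2) * 2 - 1                     # e.g. an action proposed by the VAE
    refined = perturb(obs, candidate)                        # perturbed action, as in predict()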

VAE Actor#

Documentation

class omnisafe.models.actor.VAE(obs_space, act_space, hidden_sizes, activation='relu', weight_initialization_mode='kaiming_uniform')[source]#

Class for VAE.

VAE is a variational auto-encoder. It is used in offline algorithms such as BCQ.

Parameters:
  • obs_space (OmnisafeSpace) – Observation space.

  • act_space (OmnisafeSpace) – Action space.

  • hidden_sizes (list) – List of hidden layer sizes.

  • latent_dim (Optional[int]) – Latent dimension. If None, latent_dim = act_dim * 2.

  • activation (Activation) – Activation function.

  • weight_initialization_mode (InitFunction, optional) – Weight initialization mode. Defaults to 'kaiming_uniform'.

Initialize an instance of VAE.

_distribution(obs)[source]#

Return the distribution of action.

An actor generates a distribution that is used to sample actions during training. During training, the mean and variance of the distribution are used to calculate the loss; during evaluation, the mean of the distribution is used directly as the action.

For example, if the action is continuous, the actor can generate a Gaussian distribution.

(10)#\[p (a | s) = N (\mu (s), \sigma (s))\]

where \(\mu (s)\) and \(\sigma (s)\) are the mean and standard deviation of the distribution.

Warning

_distribution() is a private method used only to sample actions internally during training. You should not call it directly in your code; instead, use the public method predict() to sample actions.

Parameters:

obs (torch.Tensor) – Observation from environments.

Returns:

The distribution of action.

Return type:

Distribution

decode(obs, latent=None)[source]#

Decode latent vector to action.

When latent is None, a latent vector is sampled from the standard normal distribution.

Parameters:
  • obs (torch.Tensor) – Observation.

  • latent (Optional[torch.Tensor], optional) – Latent vector. Defaults to None.

Returns:

torch.Tensor – Action.

Return type:

Tensor

encode(obs, act)[source]#

Encode an observation-action pair into a latent distribution.

Parameters:
  • obs (torch.Tensor) – Observation.

  • act (torch.Tensor) – Action from predict() or forward().

Returns:

Normal – Latent distribution.

Return type:

Normal

forward(obs)[source]#

forward() is not used by this class; it is kept only for interface compatibility.

Return type:

Distribution

log_prob(act)[source]#

log_prob() is not used by this class; it is kept only for interface compatibility.

Return type:

Tensor

loss(obs, act)[source]#

Compute loss for VAE.

Parameters:
  • obs (torch.Tensor) – Observation.

  • act (torch.Tensor) – Action from predict() or forward().

Return type:

Tuple[Tensor, Tensor]
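
A sketch of how the pieces above might be used in a behavior-cloning training step. The docstring only specifies that loss() returns a pair of tensors; treating them as a reconstruction term and a KL term, and weighting the KL term by 0.5, are assumptions. The spaces and the synthetic batch are illustrative.

    import torch
    from gymnasium.spaces import Box

    from omnisafe.models.actor import VAE

    vae = VAE(
        Box(low=-1.0, high=1.0, shape=(8,)),
        Box(low=-1.0, high=1.0, shape=(2,)),
        hidden_sizes=[64, 64],
    )
    optimizer = torch.optim.Adam(vae.parameters(), lr=1e-3)

    obs = torch.randn(32, 8)                    # batch of observations from an offline dataset
    act = torch.rand(32, 2) * 2 - 1             # corresponding dataset actions in [-1, 1]

    recon_loss, kl_loss = vae.loss(obs, act)    # assumed split of the returned pair
    total_loss = recon_loss + 0.5 * kl_loss     # assumed scalar terms and KL weight
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # Reconstruction / sampling round trip using encode() and decode():
    latent_dist = vae.encode(obs, act)          # Normal over the latent space
    recon_act = vae.decode(obs, latent_dist.rsample())
    prior_act = vae.decode(obs)                 # latent sampled from N(0, I) when latent is None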

predict(obs, deterministic=False)[source]#

Predict the action given observation.

deterministic is not used in the VAE model. The VAE actor's behavior is always stochastic: the latent vector is sampled from the standard normal distribution and decoded into an action.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.

Returns:

torch.Tensor – Predicted action.

Return type:

Tensor