OmniSafe Actor Critic#

ActorCritic(obs_space, act_space, ...)

Class for ActorCritic.

ActorQCritic(obs_space, act_space, ...)

Class for ActorQCritic.

ConstraintActorCritic(obs_space, act_space, ...)

ConstraintActorCritic is a wrapper around ActorCritic that adds a cost critic to the model.

ConstraintActorQCritic(obs_space, act_space, ...)

ConstraintActorQCritic is a wrapper around ActorQCritic that adds a cost critic to the model.

Actor Critic#

Documentation

class omnisafe.models.actor_critic.ActorCritic(obs_space, act_space, model_cfgs, epochs)[source]#

Class for ActorCritic.

In OmniSafe, we combine the actor and critic into this single class.

Models:
  • Actor – Input is observation. Output is action.

  • Reward V Critic – Input is observation. Output is reward value.

Parameters:
  • obs_space (OmnisafeSpace) – The observation space.

  • act_space (OmnisafeSpace) – The action space.

  • model_cfgs (ModelConfig) – The model configurations.

  • epochs (int) – The number of epochs.

Variables:
  • actor (Actor) – The actor network.

  • reward_critic (Critic) – The reward critic network.

  • std_schedule (Schedule) – The schedule for the standard deviation of the Gaussian distribution.

Initialize an instance of ActorCritic.
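
A minimal construction sketch; the model_cfgs object is assumed to come from an algorithm's configuration and is only a placeholder here:

>>> # Hypothetical example: `model_cfgs` is assumed to be the ModelConfig
>>> # taken from an algorithm's configuration (not shown here).
>>> from gymnasium.spaces import Box
>>> from omnisafe.models.actor_critic import ActorCritic
>>> obs_space = Box(low=-1.0, high=1.0, shape=(8,))
>>> act_space = Box(low=-1.0, high=1.0, shape=(2,))
>>> actor_critic = ActorCritic(obs_space, act_space, model_cfgs, epochs=100)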

annealing(epoch)[source]#

Update the actor's exploration standard deviation for the current epoch according to the annealing schedule.

Parameters:

epoch (int) – The current epoch.

Return type:

None

forward(obs, deterministic=False)[source]#

Choose the action based on the observation. Used in training, with gradient tracking.

Parameters:
  • obs (torch.Tensor) – The observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.

Returns:
  • action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.

  • value_r – The reward value of the observation.

  • log_prob – The log probability of the action.

Return type:

tuple[Tensor, ...]

set_annealing(epochs, std)[source]#

Set up the annealing schedule for the actor's standard deviation.

Parameters:
  • epochs (list of int) – The epochs defining the annealing schedule.

  • std (list of float) – The standard deviation at each of those epochs.

Return type:

None
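
A rough sketch of how the schedule is used together with annealing(); actor_critic is assumed to be an already constructed ActorCritic:

>>> # Anneal the exploration std from 0.5 to 0.1 over 100 epochs.
>>> actor_critic.set_annealing(epochs=[0, 100], std=[0.5, 0.1])
>>> for epoch in range(100):
...     actor_critic.annealing(epoch)  # update the actor's std for this epoch
...     # ... collect rollouts and update the networks ...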

step(obs, deterministic=False)[source]#

Choose the action based on the observation. Used in rollout, without gradient tracking.

Parameters:
  • obs (torch.Tensor) – The observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.

Returns:
  • action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.

  • value_r – The reward value of the observation.

  • log_prob – The log probability of the action.

Return type:

tuple[Tensor, ...]
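
A rollout sketch, assuming actor_critic is a constructed instance and env is a Gymnasium-style environment:

>>> import torch
>>> obs, _ = env.reset()
>>> obs = torch.as_tensor(obs, dtype=torch.float32)
>>> action, value_r, log_prob = actor_critic.step(obs, deterministic=False)
>>> next_obs, reward, terminated, truncated, info = env.step(action.numpy())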

Actor Q Critic#

Documentation

class omnisafe.models.actor_critic.ActorQCritic(obs_space, act_space, model_cfgs, epochs)[source]#

Class for ActorQCritic.

In OmniSafe, we combine the actor and critic into this single class.

Models:
  • Actor – Input is observation. Output is action.

  • Reward Q Critic – Input is obs-action pair. Output is reward value.

Parameters:
  • obs_space (OmnisafeSpace) – The observation space.

  • act_space (OmnisafeSpace) – The action space.

  • model_cfgs (ModelConfig) – The model configurations.

  • epochs (int) – The number of epochs.

Variables:
  • actor (Actor) – The actor network.

  • target_actor (Actor) – The target actor network.

  • reward_critic (Critic) – The reward critic network.

  • target_reward_critic (Critic) – The target reward critic network.

  • actor_optimizer (Optimizer) – The optimizer for the actor network.

  • reward_critic_optimizer (Optimizer) – The optimizer for the critic network.

  • std_schedule (Schedule) – The schedule for the standard deviation of the Gaussian distribution.

Initialize an instance of ActorQCritic.

forward(obs, deterministic=False)[source]#

Choose the action based on the observation. Used in training, with gradient tracking.

Parameters:
  • obs (torch.Tensor) – The observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.

Returns:
  • The deterministic action if deterministic is True, otherwise the action with exploration noise.

Return type:

Tensor

polyak_update(tau)[source]#

Update the target network with polyak averaging.

Parameters:

tau (float) – The polyak averaging factor.

Return type:

None
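
A sketch of the usual off-policy update pattern; actor_q_critic is assumed to be a constructed ActorQCritic and the loss computation is omitted:

>>> tau = 0.005  # fraction of the online weights mixed into the targets
>>> # ... compute critic and actor losses, step the optimizers ...
>>> actor_q_critic.polyak_update(tau)  # softly move the targets toward the online networks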

step(obs, deterministic=False)[source]#

Choose the action based on the observation. Used in rollout, without gradient tracking.

Parameters:
  • obs (torch.Tensor) – The observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.

Returns:
  • The deterministic action if deterministic is True, otherwise the action with exploration noise.

Return type:

Tensor
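
Unlike ActorCritic.step(), only the action tensor is returned; a deterministic evaluation sketch, with actor_q_critic and obs assumed to exist:

>>> # `obs` is assumed to be a torch.Tensor observation.
>>> action = actor_q_critic.step(obs, deterministic=True)  # returns only the action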

Constraint Actor Critic#

Documentation

class omnisafe.models.actor_critic.ConstraintActorCritic(obs_space, act_space, model_cfgs, epochs)[source]#

ConstraintActorCritic is a wrapper around ActorCritic that adds a cost critic to the model.

In OmniSafe, we combine the actor and critic into this single class.

Models:
  • Actor – Input is observation. Output is action.

  • Reward V Critic – Input is observation. Output is reward value.

  • Cost V Critic – Input is observation. Output is cost value.

Parameters:
  • obs_space (OmnisafeSpace) – The observation space.

  • act_space (OmnisafeSpace) – The action space.

  • model_cfgs (ModelConfig) – The model configurations.

  • epochs (int) – The number of epochs.

Variables:
  • actor (Actor) – The actor network.

  • reward_critic (Critic) – The reward critic network.

  • cost_critic (Critic) – The cost critic network.

  • std_schedule (Schedule) – The schedule for the standard deviation of the Gaussian distribution.

Initialize an instance of ConstraintActorCritic.

forward(obs, deterministic=False)[source]#

Choose action based on observation.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.

Returns:
  • action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.

  • value_r – The reward value of the observation.

  • value_c – The cost value of the observation.

  • log_prob – The log probability of the action.

Return type:

tuple[Tensor, ...]

step(obs, deterministic=False)[source]#

Choose action based on observation.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.

Returns:
  • action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.

  • value_r – The reward value of the observation.

  • value_c – The cost value of the observation.

  • log_prob – The log probability of the action.

Return type:

tuple[Tensor, ...]
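
The only difference from ActorCritic.step() is the extra cost value; a sketch assuming constraint_ac is a constructed ConstraintActorCritic and obs is a torch.Tensor observation:

>>> action, value_r, value_c, log_prob = constraint_ac.step(obs)
>>> # value_r estimates the expected return; value_c estimates the expected cost.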

Constraint Actor Q Critic#

Documentation

class omnisafe.models.actor_critic.ConstraintActorQCritic(obs_space, act_space, model_cfgs, epochs)[source]#

ConstraintActorQCritic is a wrapper around ActorQCritic that adds a cost critic to the model.

In OmniSafe, we combine the actor and critic into this single class.

Models:
  • Actor – Input is observation. Output is action.

  • Reward Q Critic – Input is obs-action pair. Output is reward value.

  • Cost Q Critic – Input is obs-action pair. Output is cost value.

Parameters:
  • obs_space (OmnisafeSpace) – The observation space.

  • act_space (OmnisafeSpace) – The action space.

  • model_cfgs (ModelConfig) – The model configurations.

  • epochs (int) – The number of epochs.

Variables:
  • actor (Actor) – The actor network.

  • target_actor (Actor) – The target actor network.

  • reward_critic (Critic) – The reward critic network.

  • target_reward_critic (Critic) – The target reward critic network.

  • cost_critic (Critic) – The cost critic network.

  • target_cost_critic (Critic) – The target cost critic network.

  • actor_optimizer (Optimizer) – The optimizer for the actor network.

  • reward_critic_optimizer (Optimizer) – The optimizer for the critic network.

  • std_schedule (Schedule) – The schedule for the standard deviation of the Gaussian distribution.

Initialize an instance of ConstraintActorQCritic.

polyak_update(tau)[source]#

Update the target network with polyak averaging.

Parameters:

tau (float) – The polyak averaging factor.

Return type:

None