OmniSafe Actor Critic#

ActorCritic(obs_space, act_space, ...)

Class for ActorCritic.

ActorQCritic(obs_space, act_space, ...)

Class for ActorQCritic.

ConstraintActorCritic(obs_space, act_space, ...)

ConstraintActorCritic is a wrapper around ActorCritic that adds a cost critic to the model.

ConstraintActorQCritic(obs_space, act_space, ...)

ConstraintActorQCritic is a wrapper around ActorQCritic that adds a cost critic to the model.

Actor Critic#

Documentation

class omnisafe.models.actor_critic.ActorCritic(obs_space, act_space, model_cfgs, epochs)[source]#

Class for ActorCritic.

In OmniSafe, we combine the actor and critic into this single class.

Models:
  • Actor – Input is observation. Output is action.

  • Reward V Critic – Input is observation. Output is reward value.

Parameters:
  • obs_space (OmnisafeSpace) – The observation space.

  • act_space (OmnisafeSpace) – The action space.

  • model_cfgs (ModelConfig) – The model configurations.

  • epochs (int) – The number of epochs.

Variables:
  • actor (Actor) – The actor network.

  • reward_critic (Critic) – The reward critic network.

  • std_schedule (Schedule) – The schedule for the standard deviation of the Gaussian distribution.

Initialize an instance of ActorCritic.
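
A minimal construction sketch; the model_cfgs object is assumed to come from an algorithm's configuration and is only a placeholder here:

>>> # Hypothetical example: `model_cfgs` is assumed to be the ModelConfig
>>> # taken from an algorithm's configuration (not shown here).
>>> from gymnasium.spaces import Box
>>> from omnisafe.models.actor_critic import ActorCritic
>>> obs_space = Box(low=-1.0, high=1.0, shape=(8,))
>>> act_space = Box(low=-1.0, high=1.0, shape=(2,))
>>> actor_critic = ActorCritic(obs_space, act_space, model_cfgs, epochs=100)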

annealing(epoch)[source]#

Update the actor's exploration standard deviation for the current epoch according to the annealing schedule.

Parameters:

epoch (int) – The current epoch.

Return type:

None

forward(obs, deterministic=False)[source]#

Choose the action based on the observation. Used in training, with gradient tracking.

Parameters:
  • obs (torch.Tensor) – The observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.

Returns:
  • action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.

  • value_r – The reward value of the observation.

  • log_prob – The log probability of the action.

Return type:

tuple[Tensor, ...]

set_annealing(epochs, std)[source]#

Set up the annealing schedule for the actor's standard deviation.

Parameters:
  • epochs (list of int) – The epochs defining the annealing schedule.

  • std (list of float) – The standard deviation at each of those epochs.

Return type:

None
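
A rough sketch of how the schedule is used together with annealing(); actor_critic is assumed to be an already constructed ActorCritic:

>>> # Anneal the exploration std from 0.5 to 0.1 over 100 epochs.
>>> actor_critic.set_annealing(epochs=[0, 100], std=[0.5, 0.1])
>>> for epoch in range(100):
...     actor_critic.annealing(epoch)  # update the actor's std for this epoch
...     # ... collect rollouts and update the networks ...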

step(obs, deterministic=False)[source]#

Choose the action based on the observation. Used in rollout, without gradient tracking.

Parameters:
  • obs (torch.Tensor) – The observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.

Returns:
  • action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.

  • value_r – The reward value of the observation.

  • log_prob – The log probability of the action.

Return type:

tuple[Tensor, ...]
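
A rollout sketch, assuming actor_critic is a constructed instance and env is a Gymnasium-style environment:

>>> import torch
>>> obs, _ = env.reset()
>>> obs = torch.as_tensor(obs, dtype=torch.float32)
>>> action, value_r, log_prob = actor_critic.step(obs, deterministic=False)
>>> next_obs, reward, terminated, truncated, info = env.step(action.numpy())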

Actor Q Critic#

Documentation

class omnisafe.models.actor_critic.ActorQCritic(obs_space, act_space, model_cfgs, epochs)[source]#

Class for ActorQCritic.

In OmniSafe, we combine the actor and critic into this single class.

Models:
  • Actor – Input is observation. Output is action.

  • Reward Q Critic – Input is obs-action pair. Output is reward value.

Parameters:
  • obs_space (OmnisafeSpace) – The observation space.

  • act_space (OmnisafeSpace) – The action space.

  • model_cfgs (ModelConfig) – The model configurations.

  • epochs (int) – The number of epochs.

Variables:
  • actor (Actor) – The actor network.

  • target_actor (Actor) – The target actor network.

  • reward_critic (Critic) – The reward critic network.

  • target_reward_critic (Critic) – The target reward critic network.

  • actor_optimizer (Optimizer) – The optimizer for the actor network.

  • reward_critic_optimizer (Optimizer) – The optimizer for the critic network.

  • std_schedule (Schedule) – The schedule for the standard deviation of the Gaussian distribution.

Initialize an instance of ActorQCritic.

forward(obs, deterministic=False)[source]#

Choose the action based on the observation. Used in training, with gradient tracking.

Parameters:
  • obs (torch.Tensor) – The observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.

Returns:
  • The deterministic action if deterministic is True, otherwise the action with exploration noise.

Return type:

Tensor

polyak_update(tau)[source]#

Update the target network with polyak averaging.

Parameters:

tau (float) – The polyak averaging factor.

Return type:

None
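
A sketch of the usual off-policy update pattern; actor_q_critic is assumed to be a constructed ActorQCritic and the loss computation is omitted:

>>> tau = 0.005  # fraction of the online weights mixed into the targets
>>> # ... compute critic and actor losses, step the optimizers ...
>>> actor_q_critic.polyak_update(tau)  # softly move the targets toward the online networks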

step(obs, deterministic=False)[source]#

Choose the action based on the observation. Used in rollout, without gradient tracking.

Parameters:
  • obs (torch.Tensor) – The observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.

Returns:
  • The deterministic action if deterministic is True, otherwise the action with exploration noise.

Return type:

Tensor
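
Unlike ActorCritic.step(), only the action tensor is returned; a deterministic evaluation sketch, with actor_q_critic and obs assumed to exist:

>>> # `obs` is assumed to be a torch.Tensor observation.
>>> action = actor_q_critic.step(obs, deterministic=True)  # returns only the action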

Constraint Actor Critic#

Documentation

class omnisafe.models.actor_critic.ConstraintActorCritic(obs_space, act_space, model_cfgs, epochs)[source]#

ConstraintActorCritic is a wrapper around ActorCritic that adds a cost critic to the model.

In OmniSafe, we combine the actor and critic into this single class.

Models:
  • Actor – Input is observation. Output is action.

  • Reward V Critic – Input is observation. Output is reward value.

  • Cost V Critic – Input is observation. Output is cost value.

Parameters:
  • obs_space (OmnisafeSpace) – The observation space.

  • act_space (OmnisafeSpace) – The action space.

  • model_cfgs (ModelConfig) – The model configurations.

  • epochs (int) – The number of epochs.

Variables:
  • actor (Actor) – The actor network.

  • reward_critic (Critic) – The reward critic network.

  • cost_critic (Critic) – The cost critic network.

  • std_schedule (Schedule) – The schedule for the standard deviation of the Gaussian distribution.

Initialize an instance of ConstraintActorCritic.

forward(obs, deterministic=False)[source]#

Choose action based on observation.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.

Returns:
  • action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.

  • value_r – The reward value of the observation.

  • value_c – The cost value of the observation.

  • log_prob – The log probability of the action.

Return type:

tuple[Tensor, ...]

step(obs, deterministic=False)[source]#

Choose action based on observation.

Parameters:
  • obs (torch.Tensor) – Observation from environments.

  • deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.

Returns:
  • action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.

  • value_r – The reward value of the observation.

  • value_c – The cost value of the observation.

  • log_prob – The log probability of the action.

Return type:

tuple[Tensor, ...]
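
The only difference from ActorCritic.step() is the extra cost value; a sketch assuming constraint_ac is a constructed ConstraintActorCritic and obs is a torch.Tensor observation:

>>> action, value_r, value_c, log_prob = constraint_ac.step(obs)
>>> # value_r estimates the expected return; value_c estimates the expected cost.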

Constraint Actor Q Critic#

Documentation

class omnisafe.models.actor_critic.ConstraintActorQCritic(obs_space, act_space, model_cfgs, epochs)[source]#

ConstraintActorQCritic is a wrapper around ActorQCritic that adds a cost critic to the model.

In OmniSafe, we combine the actor and critic into this single class.

Models:
  • Actor – Input is observation. Output is action.

  • Reward Q Critic – Input is obs-action pair. Output is reward value.

  • Cost Q Critic – Input is obs-action pair. Output is cost value.

Parameters:
  • obs_space (OmnisafeSpace) – The observation space.

  • act_space (OmnisafeSpace) – The action space.

  • model_cfgs (ModelConfig) – The model configurations.

  • epochs (int) – The number of epochs.

Variables:
  • actor (Actor) – The actor network.

  • target_actor (Actor) – The target actor network.

  • reward_critic (Critic) – The reward critic network.

  • target_reward_critic (Critic) – The target reward critic network.

  • cost_critic (Critic) – The cost critic network.

  • target_cost_critic (Critic) – The target cost critic network.

  • actor_optimizer (Optimizer) – The optimizer for the actor network.

  • reward_critic_optimizer (Optimizer) – The optimizer for the critic network.

  • std_schedule (Schedule) – The schedule for the standard deviation of the Gaussian distribution.

Initialize an instance of ConstraintActorQCritic.

polyak_update(tau)[source]#

Update the target network with polyak averaging.

Parameters:

tau (float) – The polyak averaging factor.

Return type:

None