OmniSafe Actor Critic#
ActorCritic – Class for ActorCritic.
ActorQCritic – Class for ActorQCritic.
ConstraintActorCritic – A wrapper around ActorCritic that adds a cost critic to the model.
ConstraintActorQCritic – A wrapper around ActorQCritic that adds a cost critic to the model.
Actor Critic#
Documentation
- class omnisafe.models.actor_critic.ActorCritic(obs_space, act_space, model_cfgs, epochs)[source]#
Class for ActorCritic.
In OmniSafe, we combine the actor and critic into this single class.
Model
Description
Actor
Input is observation. Output is action.
Reward V Critic
Input is observation. Output is reward value.
- Parameters:
obs_space (OmnisafeSpace) – The observation space.
act_space (OmnisafeSpace) – The action space.
model_cfgs (ModelConfig) – The model configurations.
epochs (int) – The number of epochs.
Initialize an instance of ActorCritic.
- annealing(epoch)[source]#
Set the annealing mode for the actor.
- Parameters:
epoch (int) – The current epoch.
- Return type:
None
- forward(obs, deterministic=False)[source]#
Choose the action based on the observation. Used in training with gradient.
- Parameters:
obs (torch.Tensor) – The observation from environments.
deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.
- Returns:
action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.
value_r – The reward value of the observation.
log_prob – The log probability of the action.
- Return type:
tuple[Tensor, ...]
- set_annealing(epochs, std)[source]#
Set the annealing mode for the actor.
- Parameters:
epochs (list of int) – The list of epochs.
std (list of float) – The list of standard deviations.
- Return type:
None
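As a rough illustration of what a std-annealing schedule like this can look like, the sketch below interpolates the policy's Gaussian standard deviation piecewise-linearly between the given epoch breakpoints. The function name and interpolation scheme are assumptions for illustration, not OmniSafe's internals.

```python
def linear_std_schedule(epochs, stds):
    """Return a function mapping an epoch to a piecewise-linearly
    interpolated standard deviation.

    ``epochs`` are breakpoints (ascending); ``stds`` are the values the
    schedule should take at each breakpoint.
    """
    def std_at(epoch):
        # Clamp outside the breakpoint range.
        if epoch <= epochs[0]:
            return stds[0]
        if epoch >= epochs[-1]:
            return stds[-1]
        # Interpolate within the surrounding segment.
        for e0, s0, e1, s1 in zip(epochs, stds, epochs[1:], stds[1:]):
            if e0 <= epoch <= e1:
                frac = (epoch - e0) / (e1 - e0)
                return s0 + frac * (s1 - s0)
    return std_at

# Anneal the std from 0.5 at epoch 0 down to 0.1 at epoch 100.
schedule = linear_std_schedule([0, 100], [0.5, 0.1])
```

Calling `schedule(50)` then yields the midpoint value 0.3, and epochs past the last breakpoint stay clamped at 0.1.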
- step(obs, deterministic=False)[source]#
Choose the action based on the observation. Used in rollout without gradient.
- Parameters:
obs (torch.Tensor) – The observation from environments.
deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.
- Returns:
action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.
value_r – The reward value of the observation.
log_prob – The log probability of the action.
- Return type:
tuple[Tensor, ...]
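To make the step() contract above concrete, here is a minimal PyTorch stand-in: a Gaussian actor maps observations to actions and a reward V-critic maps observations to values, returning the (action, value_r, log_prob) tuple documented above. The class name and layer sizes are illustrative assumptions, not OmniSafe's implementation.

```python
import torch
import torch.nn as nn

class TinyActorCritic(nn.Module):
    """Minimal actor-critic sketch mirroring the step() interface."""

    def __init__(self, obs_dim, act_dim):
        super().__init__()
        # Gaussian actor: mean network plus a learned state-independent log-std.
        self.actor_mean = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        # Reward V-critic: observation in, scalar value out.
        self.reward_critic = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1)
        )

    @torch.no_grad()  # step() is used in rollout without gradient
    def step(self, obs, deterministic=False):
        mean = self.actor_mean(obs)
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        action = mean if deterministic else dist.sample()
        value_r = self.reward_critic(obs).squeeze(-1)
        log_prob = dist.log_prob(action).sum(-1)
        return action, value_r, log_prob

ac = TinyActorCritic(obs_dim=4, act_dim=2)
action, value_r, log_prob = ac.step(torch.randn(8, 4))
```

With a batch of 8 observations, `action` has shape (8, 2) while `value_r` and `log_prob` are per-sample scalars of shape (8,).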
Actor Q Critic#
Documentation
- class omnisafe.models.actor_critic.ActorQCritic(obs_space, act_space, model_cfgs, epochs)[source]#
Class for ActorQCritic.
In OmniSafe, we combine the actor and critic into this single class.
Model
Description
Actor
Input is observation. Output is action.
Reward Q Critic
Input is obs-action pair. Output is reward value.
- Parameters:
obs_space (OmnisafeSpace) – The observation space.
act_space (OmnisafeSpace) – The action space.
model_cfgs (ModelConfig) – The model configurations.
epochs (int) – The number of epochs.
- Variables:
actor (Actor) – The actor network.
target_actor (Actor) – The target actor network.
reward_critic (Critic) – The critic network.
target_reward_critic (Critic) – The target critic network.
actor_optimizer (Optimizer) – The optimizer for the actor network.
reward_critic_optimizer (Optimizer) – The optimizer for the critic network.
std_schedule (Schedule) – The schedule for the standard deviation of the Gaussian distribution.
Initialize an instance of ActorQCritic.
- forward(obs, deterministic=False)[source]#
Choose the action based on the observation. Used in training with gradient.
- Parameters:
obs (torch.Tensor) – The observation from environments.
deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.
- Returns:
The deterministic action if deterministic is True, otherwise the action with noise.
- Return type:
Tensor
- polyak_update(tau)[source]#
Update the target network with polyak averaging.
- Parameters:
tau (float) – The polyak averaging factor.
- Return type:
None
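Polyak averaging, as performed by polyak_update above, softly blends the online network's parameters into the target copy. The sketch below follows the common convention target ← (1 − tau) · target + tau · online with small tau; this is an illustration of the technique, and OmniSafe may define tau's role differently.

```python
import torch
import torch.nn as nn

def polyak_update(online: nn.Module, target: nn.Module, tau: float) -> None:
    """Soft-update: target <- (1 - tau) * target + tau * online."""
    with torch.no_grad():
        for p, p_targ in zip(online.parameters(), target.parameters()):
            p_targ.mul_(1.0 - tau).add_(tau * p)

# Usage: nudge a target copy slightly toward the online network.
online_net, target_net = nn.Linear(4, 4), nn.Linear(4, 4)
polyak_update(online_net, target_net, tau=0.005)
```

With a small tau the target network trails the online network slowly, which stabilizes bootstrapped value targets.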
- step(obs, deterministic=False)[source]#
Choose the action based on the observation. Used in rollout without gradient.
- Parameters:
obs (torch.Tensor) – The observation from environments.
deterministic (bool, optional) – Whether to use deterministic action. Defaults to False.
- Returns:
The deterministic action if deterministic is True, otherwise the action with noise.
- Return type:
Tensor
Constraint Actor Critic#
Documentation
- class omnisafe.models.actor_critic.ConstraintActorCritic(obs_space, act_space, model_cfgs, epochs)[source]#
ConstraintActorCritic is a wrapper around ActorCritic that adds a cost critic to the model.
In OmniSafe, we combine the actor and critic into this single class.
Model
Description
Actor
Input is observation. Output is action.
Reward V Critic
Input is observation. Output is reward value.
Cost V Critic
Input is observation. Output is cost value.
- Parameters:
obs_space (OmnisafeSpace) – The observation space.
act_space (OmnisafeSpace) – The action space.
model_cfgs (ModelConfig) – The model configurations.
epochs (int) – The number of epochs.
Initialize an instance of ConstraintActorCritic.
- forward(obs, deterministic=False)[source]#
Choose action based on observation.
- Parameters:
obs (torch.Tensor) – Observation from environments.
deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.
- Returns:
action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.
value_r – The reward value of the observation.
value_c – The cost value of the observation.
log_prob – The log probability of the action.
- Return type:
tuple[Tensor, ...]
- step(obs, deterministic=False)[source]#
Choose action based on observation.
- Parameters:
obs (torch.Tensor) – Observation from environments.
deterministic (bool, optional) – Whether to use deterministic policy. Defaults to False.
- Returns:
action – The deterministic action if deterministic is True, otherwise the action with Gaussian noise.
value_r – The reward value of the observation.
value_c – The cost value of the observation.
log_prob – The log probability of the action.
- Return type:
tuple[Tensor, ...]
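The extra cost pathway that ConstraintActorCritic adds can be sketched as follows: alongside the reward V-critic, a cost V-critic maps the same observation to a cost value, so step() returns the four-tuple (action, value_r, value_c, log_prob) documented above. Names and sizes are illustrative assumptions, not OmniSafe's architecture.

```python
import torch
import torch.nn as nn

class TinyConstraintActorCritic(nn.Module):
    """Actor-critic sketch with an added cost V-critic."""

    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.actor_mean = nn.Linear(obs_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.reward_critic = nn.Linear(obs_dim, 1)
        self.cost_critic = nn.Linear(obs_dim, 1)  # the added cost critic

    @torch.no_grad()
    def step(self, obs, deterministic=False):
        dist = torch.distributions.Normal(self.actor_mean(obs), self.log_std.exp())
        action = dist.mean if deterministic else dist.sample()
        return (
            action,
            self.reward_critic(obs).squeeze(-1),  # value_r
            self.cost_critic(obs).squeeze(-1),    # value_c
            dist.log_prob(action).sum(-1),        # log_prob
        )

cac = TinyConstraintActorCritic(obs_dim=4, act_dim=2)
action, value_r, value_c, log_prob = cac.step(torch.randn(8, 4))
```

The cost value lets a safe-RL algorithm estimate expected constraint violation from the same observation the reward critic evaluates.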
Constraint Actor Q Critic#
Documentation
- class omnisafe.models.actor_critic.ConstraintActorQCritic(obs_space, act_space, model_cfgs, epochs)[source]#
ConstraintActorQCritic is a wrapper around ActorQCritic that adds a cost critic to the model.
In OmniSafe, we combine the actor and critic into this single class.
Model
Description
Actor
Input is observation. Output is action.
Reward Q Critic
Input is obs-action pair. Output is reward value.
Cost Q Critic
Input is obs-action pair. Output is cost value.
- Parameters:
obs_space (OmnisafeSpace) – The observation space.
act_space (OmnisafeSpace) – The action space.
model_cfgs (ModelConfig) – The model configurations.
epochs (int) – The number of epochs.
- Variables:
actor (Actor) – The actor network.
target_actor (Actor) – The target actor network.
reward_critic (Critic) – The critic network.
target_reward_critic (Critic) – The target critic network.
cost_critic (Critic) – The cost critic network.
target_cost_critic (Critic) – The target cost critic network.
actor_optimizer (Optimizer) – The optimizer for the actor network.
reward_critic_optimizer (Optimizer) – The optimizer for the critic network.
std_schedule (Schedule) – The schedule for the standard deviation of the Gaussian distribution.
Initialize an instance of ConstraintActorQCritic.
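Unlike the V-critics above, the reward and cost Q-critics in this class consume an obs-action pair, and each keeps a target copy for polyak averaging. The sketch below illustrates that interface with a concatenated input; it is a minimal stand-in under assumed names and sizes, not OmniSafe's architecture.

```python
import copy

import torch
import torch.nn as nn

class TinyQCritic(nn.Module):
    """Q-critic sketch: obs-action pair in, scalar value out."""

    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Linear(obs_dim + act_dim, 1)

    def forward(self, obs, act):
        # Concatenate the pair along the feature dimension.
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

obs_dim, act_dim = 4, 2
reward_critic = TinyQCritic(obs_dim, act_dim)
cost_critic = TinyQCritic(obs_dim, act_dim)
# Target copies start as exact clones and are later soft-updated.
target_reward_critic = copy.deepcopy(reward_critic)
target_cost_critic = copy.deepcopy(cost_critic)

obs, act = torch.randn(8, obs_dim), torch.randn(8, act_dim)
q_r = reward_critic(obs, act)  # reward value of the obs-action pair
q_c = cost_critic(obs, act)    # cost value of the obs-action pair
```

Immediately after construction the target critics agree with their online counterparts; they then drift apart slowly under polyak updates.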