OmniSafe Lagrange Multiplier#
Lagrange Multiplier#
- class omnisafe.common.lagrange.Lagrange(cost_limit, lagrangian_multiplier_init, lambda_lr, lambda_optimizer, lagrangian_upper_bound=None)[source]#
Base class for Lagrangian-based algorithms.
This class implements the Lagrange multiplier update and the Lagrange loss.
Note
Any traditional policy gradient algorithm can be converted to a Lagrangian-based algorithm by inheriting from this class and implementing the _loss_pi() method.
Examples
>>> from omnisafe.common.lagrange import Lagrange
>>> def loss_pi(self, data):
...     # implement your own loss function here
...     return loss
You can also inherit from this class to implement your own Lagrangian-based algorithm, using any policy gradient method available in OmniSafe.
Examples
>>> from omnisafe.common.lagrange import Lagrange
>>> class CustomAlgo:
...     def __init__(self) -> None:
...         # initialize your own algorithm here
...         super().__init__()
...         # initialize the Lagrange multiplier
...         self.lagrange = Lagrange(**self._cfgs.lagrange_cfgs)
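To make the inheritance pattern concrete, here is a minimal sketch of a Lagrangian-augmented policy loss. The inputs ratio (importance-sampling ratio), adv_r (reward advantage), and adv_c (cost advantage) are assumed names for illustration, not OmniSafe identifiers:

import torch

def loss_pi_sketch(ratio, adv_r, adv_c, lagrangian_multiplier):
    # Penalize the cost advantage with the current multiplier, then
    # rescale so the loss magnitude stays comparable as lambda grows.
    penalty = lagrangian_multiplier.item()
    adv = (adv_r - penalty * adv_c) / (1.0 + penalty)
    return -(ratio * adv).mean()

Minimizing this loss maximizes the reward advantage while the multiplier term steers the policy toward constraint satisfaction.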
- Parameters:
cost_limit (float) – The cost limit.
lagrangian_multiplier_init (float) – The initial value of the Lagrange multiplier.
lambda_lr (float) – The learning rate of the Lagrange multiplier.
lambda_optimizer (str) – The optimizer for the Lagrange multiplier.
lagrangian_upper_bound (float or None, optional) – The upper bound of the Lagrange multiplier. Defaults to None.
- Variables:
cost_limit (float) – The cost limit.
lambda_lr (float) – The learning rate of the Lagrange multiplier.
lagrangian_upper_bound (float, optional) – The upper bound of the Lagrange multiplier. Defaults to None.
lagrangian_multiplier (torch.nn.Parameter) – The Lagrange multiplier.
lambda_range_projection (torch.nn.ReLU) – The projection function for the Lagrange multiplier.
Initialize an instance of Lagrange.
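A minimal construction sketch; the keyword values below are illustrative, not OmniSafe defaults:

>>> from omnisafe.common.lagrange import Lagrange
>>> lagrange = Lagrange(
...     cost_limit=25.0,
...     lagrangian_multiplier_init=0.001,
...     lambda_lr=0.035,
...     lambda_optimizer='Adam',
... )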
- compute_lambda_loss(mean_ep_cost)[source]#
Penalty loss for Lagrange multiplier.
Note
mean_ep_cost is obtained from self.logger.get_stats('EpCosts')[0], which is already averaged across MPI processes.
- Parameters:
mean_ep_cost (float) – The mean episode cost.
- Returns:
Penalty loss for Lagrange multiplier.
- Return type:
Tensor
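For intuition, the penalty loss is the negative multiplier times the constraint violation. A standalone sketch (the names lam and lambda_loss_sketch are illustrative, not OmniSafe internals):

import torch

lam = torch.nn.Parameter(torch.tensor(1.0))

def lambda_loss_sketch(mean_ep_cost: float, cost_limit: float):
    # Minimizing -lambda * (Jc - Jc*) raises lambda while the constraint
    # is violated (Jc > Jc*) and lowers it otherwise.
    return -lam * (mean_ep_cost - cost_limit)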
- update_lagrange_multiplier(Jc)[source]#
Update Lagrange multiplier (lambda).
We update the Lagrange multiplier by minimizing the penalty loss; a single gradient step yields the update:
(2)#
\[\lambda' = \lambda + \eta \cdot (J_C - J_C^*)\]
where \(\lambda\) is the Lagrange multiplier, \(\eta\) is the learning rate, \(J_C\) is the mean episode cost, and \(J_C^*\) is the cost limit.
- Parameters:
Jc (float) – The mean episode cost.
- Return type:
None
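A standalone sketch of one full multiplier update, putting the penalty loss and the non-negativity projection together; the names and hyperparameter values are illustrative, not OmniSafe internals:

import torch

lam = torch.nn.Parameter(torch.tensor(0.001))
optimizer = torch.optim.Adam([lam], lr=0.035)

def update_lambda(jc: float, cost_limit: float = 25.0) -> None:
    optimizer.zero_grad()
    loss = -lam * (jc - cost_limit)  # penalty loss from above
    loss.backward()
    optimizer.step()  # one gradient step realizes the update in (2)
    with torch.no_grad():
        lam.clamp_(min=0.0)  # ReLU projection keeps lambda non-negative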
PIDLagrange#
- class omnisafe.common.pid_lagrange.PIDLagrangian(pid_kp, pid_ki, pid_kd, pid_d_delay, pid_delta_p_ema_alpha, pid_delta_d_ema_alpha, sum_norm, diff_norm, penalty_max, lagrangian_multiplier_init, cost_limit)[source]#
PID version of Lagrangian.
Similar to the Lagrange module, this module implements the PID version of the Lagrangian method.
Note
PID-Lagrange is more general than Lagrange and can be used with any policy gradient algorithm. Because PID-Lagrange uses a PID controller to control the Lagrangian multiplier, it is more stable than the naive Lagrange method.
- Parameters:
pid_kp (float) – The proportional gain of the PID controller.
pid_ki (float) – The integral gain of the PID controller.
pid_kd (float) – The derivative gain of the PID controller.
pid_d_delay (int) – The delay of the derivative term.
pid_delta_p_ema_alpha (float) – The exponential moving average alpha of the delta_p.
pid_delta_d_ema_alpha (float) – The exponential moving average alpha of the delta_d.
sum_norm (bool) – Whether to use the sum norm.
diff_norm (bool) – Whether to use the diff norm.
penalty_max (int) – The maximum penalty.
lagrangian_multiplier_init (float) – The initial value of the lagrangian multiplier.
cost_limit (float) – The cost limit.
References
Title: Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
Authors: Adam Stooke, Joshua Achiam, Pieter Abbeel.
URL: https://arxiv.org/abs/2007.03964
Initialize an instance of PIDLagrangian.
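A minimal construction sketch; the keyword values below are illustrative, not OmniSafe defaults:

>>> from omnisafe.common.pid_lagrange import PIDLagrangian
>>> pid_lagrange = PIDLagrangian(
...     pid_kp=0.1,
...     pid_ki=0.01,
...     pid_kd=0.01,
...     pid_d_delay=10,
...     pid_delta_p_ema_alpha=0.95,
...     pid_delta_d_ema_alpha=0.95,
...     sum_norm=True,
...     diff_norm=False,
...     penalty_max=100,
...     lagrangian_multiplier_init=0.001,
...     cost_limit=25.0,
... )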
- property lagrangian_multiplier: float#
The current value of the Lagrangian multiplier.
- pid_update(ep_cost_avg)[source]#
Update the PID controller.
The PID controller updates the Lagrangian multiplier according to the following equation:
(4)#
\[\lambda_{t+1} = \lambda_t + \left(K_p e_p + K_i \int e_p \, dt + K_d \frac{d e_p}{d t}\right) \eta\]
where \(e_p\) is the error between the current episode cost and the cost limit, \(K_p\), \(K_i\), and \(K_d\) are the PID parameters, and \(\eta\) is the learning rate.
- Parameters:
ep_cost_avg (float) – The average cost of the current episode.
- Return type:
None
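To illustrate equation (4), here is a simplified standalone PID step. It omits the delay buffer, EMA smoothing, and normalization options that the real module exposes, so treat it as a sketch of the control law rather than the OmniSafe implementation:

class PIDSketch:
    def __init__(self, kp: float, ki: float, kd: float, cost_limit: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0    # accumulated error (I term)
        self.prev_cost = 0.0   # previous cost for the derivative (D term)
        self.multiplier = 0.0

    def update(self, ep_cost_avg: float) -> float:
        error = ep_cost_avg - self.cost_limit
        # Clip the integral at zero so the multiplier can fully relax
        # once the constraint is satisfied.
        self.integral = max(0.0, self.integral + error)
        # Only penalize increases in cost in the derivative term.
        derivative = max(0.0, ep_cost_avg - self.prev_cost)
        self.prev_cost = ep_cost_avg
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        self.multiplier = max(0.0, raw)  # project to non-negative values
        return self.multiplier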