OmniSafe Lagrange Multiplier

Lagrange(cost_limit, ...[, ...])

Base class for Lagrangian-based algorithms.

Lagrange Multiplier

Documentation

class omnisafe.common.lagrange.Lagrange(cost_limit, lagrangian_multiplier_init, lambda_lr, lambda_optimizer, lagrangian_upper_bound=None)[source]

Base class for Lagrangian-based algorithms.

This class implements the Lagrange multiplier update and the Lagrange loss.

Note

Any traditional policy gradient algorithm can be converted to a Lagrangian-based algorithm by inheriting from this class and implementing the _loss_pi() method.

Examples

>>> from omnisafe.common.lagrange import Lagrange
>>> def loss_pi(self, data):
...     # implement your own policy loss here,
...     # e.g. penalizing the cost advantage with the Lagrange multiplier
...     loss = ...
...     return loss

You can also inherit from this class to implement your own Lagrangian-based algorithm, using any policy gradient method available in OmniSafe.

Examples

>>> from omnisafe.common.lagrange import Lagrange
>>> class CustomAlgo:
...     def __init__(self) -> None:
...         # initialize your own algorithm here
...         super().__init__()
...         # initialize the Lagrange multiplier
...         self.lagrange = Lagrange(**self._cfgs.lagrange_cfgs)

Parameters:
  • cost_limit (float) – The cost limit.

  • lagrangian_multiplier_init (float) – The initial value of the Lagrange multiplier.

  • lambda_lr (float) – The learning rate of the Lagrange multiplier.

  • lambda_optimizer (str) – The optimizer for the Lagrange multiplier.

  • lagrangian_upper_bound (float or None, optional) – The upper bound of the Lagrange multiplier. Defaults to None.

Variables:
  • cost_limit (float) – The cost limit.

  • lambda_lr (float) – The learning rate of the Lagrange multiplier.

  • lagrangian_upper_bound (float, optional) – The upper bound of the Lagrange multiplier. Defaults to None.

  • lagrangian_multiplier (torch.nn.Parameter) – The Lagrange multiplier.

  • lambda_range_projection (torch.nn.ReLU) – The projection function for the Lagrange multiplier.

Initialize an instance of Lagrange.
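
A construction sketch covering the parameters listed above; the numeric values and the 'Adam' optimizer name are illustrative placeholders, not prescribed defaults:

>>> from omnisafe.common.lagrange import Lagrange
>>> lagrange = Lagrange(
...     cost_limit=25.0,                   # target upper bound on mean episode cost
...     lagrangian_multiplier_init=0.001,  # small positive initial multiplier
...     lambda_lr=0.035,                   # learning rate for the multiplier
...     lambda_optimizer='Adam',           # optimizer name passed as a string
... )
>>> # the ReLU projection keeps the effective multiplier non-negative
>>> penalty = lagrange.lambda_range_projection(lagrange.lagrangian_multiplier)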

compute_lambda_loss(mean_ep_cost)[source]

Penalty loss for Lagrange multiplier.

Note

mean_ep_cost is obtained from self.logger.get_stats('EpCosts')[0], which is already averaged across MPI processes.

Parameters:

mean_ep_cost (float) – Mean episode cost.

Returns:

Penalty loss for Lagrange multiplier.

Return type:

Tensor
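
A direct usage sketch for this method, assuming the lagrange instance constructed above; the cost value is an illustrative placeholder rather than a logged statistic:

>>> mean_ep_cost = 30.0  # in practice: self.logger.get_stats('EpCosts')[0]
>>> penalty_loss = lagrange.compute_lambda_loss(mean_ep_cost)
>>> # the returned Tensor can then be minimized by the multiplier's optimizer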

update_lagrange_multiplier(Jc)[source]

Update Lagrange multiplier (lambda).

We update the Lagrange multiplier by minimizing the penalty loss, which results in the following update rule:

\[\lambda' = \lambda + \eta \cdot (J_C - J_C^*)\]

where \(\lambda\) is the Lagrange multiplier, \(\eta\) is the learning rate, \(J_C\) is the mean episode cost, and \(J_C^*\) is the cost limit.

Parameters:

Jc (float) – Mean episode cost.

Return type:

None
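
A sketch of how this update typically sits in a training loop; the epoch loop, logger handle, and advantage tensors (adv_r, adv_c) below are hypothetical placeholders, not OmniSafe's exact trainer code:

>>> for epoch in range(100):  # doctest: +SKIP
...     # ... collect rollouts, compute reward/cost advantages ...
...     Jc = logger.get_stats('EpCosts')[0]          # mean episode cost
...     lagrange.update_lagrange_multiplier(Jc)      # gradient step on lambda
...     # project the multiplier to be non-negative before using it as a penalty
...     penalty = lagrange.lambda_range_projection(lagrange.lagrangian_multiplier).item()
...     loss_pi = -(adv_r - penalty * adv_c).mean()  # penalized policy objective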

PIDLagrangian(pid_kp, pid_ki, pid_kd, ...)

PID version of Lagrangian.

PIDLagrange

Documentation

class omnisafe.common.pid_lagrange.PIDLagrangian(pid_kp, pid_ki, pid_kd, pid_d_delay, pid_delta_p_ema_alpha, pid_delta_d_ema_alpha, sum_norm, diff_norm, penalty_max, lagrangian_multiplier_init, cost_limit)[source]

PID version of Lagrangian.

Similar to the Lagrange module, this module implements the PID version of the Lagrangian method.

Note

The PID-Lagrange method is more general than the plain Lagrange method and can be used with any policy gradient algorithm. Because PID-Lagrange uses a PID controller to adjust the Lagrangian multiplier, it is more stable than the naive Lagrange update.

Parameters:
  • pid_kp (float) – The proportional gain of the PID controller.

  • pid_ki (float) – The integral gain of the PID controller.

  • pid_kd (float) – The derivative gain of the PID controller.

  • pid_d_delay (int) – The delay of the derivative term.

  • pid_delta_p_ema_alpha (float) – The exponential moving average alpha of the delta_p.

  • pid_delta_d_ema_alpha (float) – The exponential moving average alpha of the delta_d.

  • sum_norm (bool) – Whether to use the sum norm.

  • diff_norm (bool) – Whether to use the diff norm.

  • penalty_max (int) – The maximum penalty.

lagrangian_multiplier_init (float) – The initial value of the Lagrangian multiplier.

  • cost_limit (float) – The cost limit.

References

  • Title: Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

  • Authors: Adam Stooke, Joshua Achiam, Pieter Abbeel.

URL: PID Lagrange (https://arxiv.org/abs/2007.03964)

Initialize an instance of PIDLagrangian.
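
A construction sketch covering the parameters listed above; every value below is an illustrative placeholder rather than a recommended setting:

>>> from omnisafe.common.pid_lagrange import PIDLagrangian
>>> pid_lagrange = PIDLagrangian(
...     pid_kp=0.1,                  # proportional gain
...     pid_ki=0.01,                 # integral gain
...     pid_kd=0.01,                 # derivative gain
...     pid_d_delay=10,              # delay of the derivative term
...     pid_delta_p_ema_alpha=0.95,  # EMA alpha for delta_p
...     pid_delta_d_ema_alpha=0.9,   # EMA alpha for delta_d
...     sum_norm=True,               # whether to use the sum norm
...     diff_norm=False,             # whether to use the diff norm
...     penalty_max=100,             # maximum penalty
...     lagrangian_multiplier_init=0.001,
...     cost_limit=25.0,
... )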

property lagrangian_multiplier: float

The Lagrangian multiplier.

pid_update(ep_cost_avg)[source]

Update the PID controller.

The PID controller updates the Lagrangian multiplier according to:

\[\lambda_{t+1} = \lambda_t + \left(K_p e_p + K_i \int e_p \, dt + K_d \frac{d e_p}{d t}\right) \eta\]

where \(e_p\) is the error between the current episode cost and the cost limit, \(K_p\), \(K_i\), and \(K_d\) are the PID gains, and \(\eta\) is the learning rate.

Parameters:

ep_cost_avg (float) – The average cost of the current episode.

Return type:

None
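
A per-epoch usage sketch, analogous to the Lagrange loop above; the logger handle and loop are hypothetical placeholders:

>>> for epoch in range(100):  # doctest: +SKIP
...     # ... roll out the current policy and log episode costs ...
...     ep_cost_avg = logger.get_stats('EpCosts')[0]
...     pid_lagrange.pid_update(ep_cost_avg)
...     penalty = pid_lagrange.lagrangian_multiplier  # a float, per the property above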