OmniSafe Model-based Model#

Standard Scaler#

Documentation

class omnisafe.algorithms.model_based.base.ensemble.StandardScaler(device)[source]#

Normalizes data using standardization.

This class provides methods to fit the scaler to the input data and transform the input data using the parameters learned during the fitting process.

Parameters:

device (torch.device) – The device to use.

Initialize an instance of StandardScaler.

fit(data)[source]#

Fits the scaler to the input data.

Parameters:

data (np.ndarray) – A numpy array containing the input.

Return type:

None

transform(data)[source]#

Transforms the input matrix data using the parameters of this scaler.

Parameters:

data (torch.Tensor) – The input data to transform.

Returns:

transformed_data – The transformed data.

Return type:

Tensor
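
As a rough illustration of the fit/transform split described above, the following NumPy sketch standardizes each feature column. The class name `SketchScaler` and the rule of replacing near-zero standard deviations with 1.0 are assumptions made for this example; it is not the OmniSafe implementation, which also takes a device and transforms `torch.Tensor` inputs.

```python
import numpy as np


class SketchScaler:
    """Hypothetical stand-in for a standard scaler (not the OmniSafe class)."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.std = 1.0

    def fit(self, data: np.ndarray) -> None:
        # Per-feature statistics, computed over the batch axis.
        self.mean = data.mean(axis=0, keepdims=True)
        std = data.std(axis=0, keepdims=True)
        # Assumption: guard constant features against division by zero.
        self.std = np.where(std < 1e-12, 1.0, std)

    def transform(self, data: np.ndarray) -> np.ndarray:
        return (data - self.mean) / self.std


scaler = SketchScaler()
raw = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
scaler.fit(raw)
scaled = scaler.transform(raw)
```

After fitting, the first column has zero mean and unit standard deviation, while the constant second column maps to zeros instead of producing NaNs.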

Initialize Weight#

Documentation

omnisafe.algorithms.model_based.base.ensemble.init_weights(layer)[source]#

Initialize the weights of a network layer.

Parameters:

layer (nn.Module) – The layer to initialize.

Return type:

None

Unbatched Forward#

Documentation

omnisafe.algorithms.model_based.base.ensemble.unbatched_forward(layer, input_data, index)[source]#

Special forward pass for nn.Sequential modules that contain BatchedLinear layers, using only the ensemble member selected by index.

Parameters:
  • layer (nn.Module | EnsembleFC) – The layer to forward through.

  • input_data (torch.Tensor) – The input data.

  • index (int) – The index of the model to use.

Returns:

output – The output of the layer.

Return type:

torch.Tensor

Ensemble Fully-Connected Layer#

Documentation

class omnisafe.algorithms.model_based.base.ensemble.EnsembleFC(in_features, out_features, ensemble_size, weight_decay=0.0, bias=True)[source]#

Ensemble fully connected network.

A fully connected network with ensemble_size models.

Parameters:
  • in_features (int) – The number of input features.

  • out_features (int) – The number of output features.

  • ensemble_size (int) – The number of models in the ensemble.

  • weight_decay (float) – The weight decay factor. Defaults to 0.0.

  • bias (bool) – Whether to use bias.

Variables:
  • in_features (int) – The number of input features.

  • out_features (int) – The number of output features.

  • ensemble_size (int) – The number of models in the ensemble.

  • weight (nn.Parameter) – The weight of the network.

  • bias (nn.Parameter) – The bias of the network.

Initialize an instance of the ensemble fully connected network.

forward(input_data)[source]#

Forward pass.

Parameters:

input_data (torch.Tensor) – The input data.

Returns:

The forward output of the network.

Return type:

Tensor
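
One way to picture an ensemble fully connected layer is as a batched matrix multiply in which each ensemble member has its own weight matrix and bias. The NumPy sketch below assumes a weight layout of `(ensemble_size, in_features, out_features)` and an input layout of `(ensemble_size, batch, in_features)`; both layouts are assumptions for illustration, not taken from the OmniSafe source.

```python
import numpy as np

rng = np.random.default_rng(0)
ensemble_size, batch, in_features, out_features = 3, 4, 5, 2

# Assumed layout: one weight matrix and one bias vector per ensemble member.
weight = rng.normal(size=(ensemble_size, in_features, out_features))
bias = rng.normal(size=(ensemble_size, out_features))
inputs = rng.normal(size=(ensemble_size, batch, in_features))

# Batched matrix multiply: member i only ever sees inputs[i] and weight[i].
outputs = np.matmul(inputs, weight) + bias[:, None, :]

# The same result for member 0, computed as an ordinary linear layer.
member_0 = inputs[0] @ weight[0] + bias[0]
```

Computing all members in one batched call avoids a Python loop over the ensemble, which is the usual motivation for this kind of layer.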

Ensemble Model#

Documentation

class omnisafe.algorithms.model_based.base.ensemble.EnsembleModel(device, state_size, action_size, reward_size, cost_size, ensemble_size, predict_reward, predict_cost=False, hidden_size=200, learning_rate=1e-3, use_decay=False)[source]#

Ensemble dynamics model.

A dynamics model with ensemble_size models.

Parameters:
  • device (torch.device) – The device to use.

  • state_size (int) – The size of the state.

  • action_size (int) – The size of the action.

  • reward_size (int) – The size of the reward.

  • cost_size (int) – The size of the cost.

  • ensemble_size (int) – The number of models in the ensemble.

  • predict_reward (bool) – Whether to predict reward.

  • predict_cost (bool) – Whether to predict cost.

  • hidden_size (int) – The size of the hidden layer.

  • learning_rate (float) – The learning rate.

  • use_decay (bool) – Whether to use weight decay.

Variables:
  • max_logvar (torch.Tensor) – The maximum log variance.

  • min_logvar (torch.Tensor) – The minimum log variance.

  • scaler (StandardScaler) – The scaler.

Initialize an instance of the ensemble model.

_get_decay_loss()[source]#

Get decay loss.

Return type:

Tensor
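
A decay loss of this kind is typically an L2 penalty on each layer's weights, scaled by that layer's weight decay factor. The sketch below assumes that form; the layer list and the factor of one half are illustrative choices, not confirmed details of OmniSafe.

```python
import numpy as np

# Hypothetical (weight, weight_decay) pairs, one per layer.
layers = [
    (np.full((2, 3), 2.0), 0.1),
    (np.ones((3, 1)), 0.5),
]

# Assumed form: an L2 penalty per layer, scaled by that layer's decay factor.
decay_loss = sum(decay * np.sum(weight**2) / 2.0 for weight, decay in layers)
```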

forward(data, ret_log_var=False)[source]#

Compute the next state, reward, and cost using all models in the ensemble.

Parameters:
  • data (torch.Tensor) – Input data.

  • ret_log_var (bool, optional) – Whether to return the log variance, defaults to False.

Returns:
  • mean – Mean of the next state, reward, cost.

  • logvar or var – Log variance of the next state, reward, and cost if ret_log_var is True; otherwise the variance.

Return type:

tuple[torch.Tensor, torch.Tensor]
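
Probabilistic ensemble models in the PETS/MBPO family commonly keep the predicted log variance near learned bounds with a soft clamp, which is presumably the role of `max_logvar` and `min_logvar` here. The sketch below assumes that convention; the function names and bound values are hypothetical.

```python
import numpy as np


def softplus(x: np.ndarray) -> np.ndarray:
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def bound_logvar(raw, min_logvar, max_logvar):
    # Soft upper bound first, then soft lower bound.
    logvar = max_logvar - softplus(max_logvar - raw)
    return min_logvar + softplus(logvar - min_logvar)


raw_logvar = np.array([-50.0, -3.0, 0.0, 50.0])
logvar = bound_logvar(raw_logvar, min_logvar=-10.0, max_logvar=0.5)
var = np.exp(logvar)  # assumed to be what ret_log_var=False would return
```

Because the clamp is soft, values approach the bounds smoothly and gradients never vanish entirely; the upper bound is respected only approximately.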

forward_idx(data, idx_model, ret_log_var=False)[source]#

Compute the next state, reward, and cost from a specific model.

Parameters:
  • data (torch.Tensor | np.ndarray) – Input data.

  • idx_model (int) – Index of the model.

  • ret_log_var (bool) – Whether to return the log variance.

Returns:
  • mean – Mean of the next state, reward, cost.

  • logvar or var – Log variance of the next state, reward, and cost if ret_log_var is True; otherwise the variance.

Return type:

tuple[torch.Tensor, torch.Tensor]

loss(mean, logvar, labels, inc_var_loss=True)[source]#

Compute loss.

Parameters:
  • mean (torch.Tensor) – Mean of the next state, reward, cost.

  • logvar (torch.Tensor) – Log variance of the next state, reward, cost.

  • labels (torch.Tensor) – Ground truth of the next state, reward, cost.

  • inc_var_loss (bool, optional) – Whether to include the variance loss. Defaults to True.

Returns:
  • total_loss (torch.Tensor) – Total loss.

  • mse_loss (torch.Tensor) – MSE loss.

Return type:

tuple[Tensor, Tensor]
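
For a model that predicts a mean and a log variance, a common training objective is the Gaussian negative log-likelihood, with `inc_var_loss` switching between that and a plain MSE. The NumPy sketch below assumes this form and inputs shaped `(ensemble, batch, dim)`; it is an illustration, not the OmniSafe code.

```python
import numpy as np


def gaussian_nll_sketch(mean, logvar, labels, inc_var_loss=True):
    """Hypothetical per-member loss; inputs have shape (ensemble, batch, dim)."""
    sq_err = (mean - labels) ** 2
    if inc_var_loss:
        # Weight errors by the inverse predicted variance, and penalize
        # large predicted variance so the model cannot inflate it freely.
        mse_loss = (sq_err * np.exp(-logvar)).mean(axis=(1, 2))
        var_loss = logvar.mean(axis=(1, 2))
        total_loss = mse_loss.sum() + var_loss.sum()
    else:
        mse_loss = sq_err.mean(axis=(1, 2))
        total_loss = mse_loss.sum()
    return total_loss, mse_loss


mean = np.zeros((2, 4, 3))
labels = np.ones((2, 4, 3))
logvar = np.zeros((2, 4, 3))  # unit variance
total, per_member = gaussian_nll_sketch(mean, logvar, labels)
```

Returning the per-member MSE alongside the summed total makes it possible to rank ensemble members on held-out data.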

train_ensemble(loss)[source]#

Train the dynamics model.

Parameters:

loss (torch.Tensor) – The loss of the dynamics model.

Return type:

None

Ensemble Dynamics Model#

Documentation

class omnisafe.algorithms.model_based.base.ensemble.EnsembleDynamicsModel(model_cfgs, device, state_shape, action_shape, actor_critic=None, rew_func=None, cost_func=None, terminal_func=None)[source]#

Dynamics model for predicting the next state, reward, and cost.

Parameters:
  • model_cfgs (Config) – The configuration of the dynamics model.

  • device (torch.device) – The device to use.

  • state_shape (tuple[int, ...]) – The shape of the state.

  • action_shape (tuple[int, ...]) – The shape of the action.

  • actor_critic (ConstraintActorCritic | ConstraintActorQCritic | None, optional) – The actor critic model. Defaults to None.

  • rew_func (Callable[[torch.Tensor], torch.Tensor] | None, optional) – The reward function. Defaults to None.

  • cost_func (Callable[[torch.Tensor], torch.Tensor] | None, optional) – The cost function. Defaults to None.

  • terminal_func (Callable[[torch.Tensor], torch.Tensor] | None, optional) – The terminal function. Defaults to None.

Variables:

elite_model_idxes (list[int]) – The index of the elite models.

Initialize the dynamics model.
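
Elite models in ensemble dynamics learning are usually the members with the lowest holdout loss. Assuming that convention, the selection can be sketched as follows; `num_elites` and the loss values are hypothetical.

```python
import numpy as np

# Hypothetical holdout losses, one per ensemble member.
holdout_losses = np.array([0.30, 0.10, 0.50, 0.20, 0.40])
num_elites = 3

# Indices of the members with the smallest holdout loss, best first.
elite_model_idxes = np.argsort(holdout_losses)[:num_elites].tolist()
```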

_compute_cost(network_output)[source]#

Compute the cost from the network output.

Parameters:

network_output (torch.Tensor) – The output of the network.

Returns:

cost – The cost, from the network output or the cost function.

Raises:

ValueError – If the cost function is not defined.

Return type:

Tensor

_compute_reward(network_output)[source]#

Compute the reward from the network output.

Parameters:

network_output (torch.Tensor) – The output of the network.

Returns:

reward – The reward, from the network output or the reward function.

Raises:

ValueError – If the reward function is not defined.

Return type:

Tensor

_compute_terminal(network_output)[source]#

Compute the terminal from the network output.

Parameters:

network_output (torch.Tensor) – The output of the network.

Returns:

terminal – The terminal signal, from the network output or the terminal function.

Raises:

ValueError – If the terminal function is not defined.

Return type:

Tensor

_predict(inputs, batch_size=1024, idx=None, ret_log_var=False)[source]#

Predict with tensor inputs and tensor outputs; used in the planning loop.

Parameters:
  • inputs (torch.Tensor) – The inputs to the network.

  • batch_size (int, optional) – The batch size for prediction. Defaults to 1024.

  • idx (int | None, optional) – The index of the model to use. Defaults to None.

  • ret_log_var (bool, optional) – Whether to return the log variance. Defaults to False.

Returns:
  • ensemble_mean_tensor – The mean of the ensemble.

  • ensemble_var_tensor – The variance of the ensemble.

Return type:

tuple[torch.Tensor, torch.Tensor]
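
The `batch_size` parameter suggests that prediction is carried out in chunks to bound memory use. A minimal sketch of that pattern, using a hypothetical `toy_model` in place of the real ensemble:

```python
import numpy as np


def predict_in_batches(model_fn, inputs, batch_size=1024):
    """Apply `model_fn` to `inputs` in chunks and stitch the results together."""
    means, variances = [], []
    for start in range(0, inputs.shape[0], batch_size):
        mean, var = model_fn(inputs[start : start + batch_size])
        means.append(mean)
        variances.append(var)
    return np.concatenate(means, axis=0), np.concatenate(variances, axis=0)


def toy_model(x):
    # Hypothetical model: the mean is 2x and the variance is constant.
    return 2.0 * x, np.ones_like(x)


inputs = np.arange(10.0).reshape(5, 2)
mean, var = predict_in_batches(toy_model, inputs, batch_size=2)
```

The last chunk may be smaller than `batch_size`; slicing handles that without special casing.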

_save_best(epoch, holdout_losses)[source]#

Save the best model.

Parameters:
  • epoch (int) – The current epoch.

  • holdout_losses (list) – The holdout loss.

Returns:

Whether to stop training early.

Return type:

bool
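
A boolean "stop training" signal based on holdout losses is typically a patience-based early-stopping rule. The sketch below assumes a relative-improvement threshold and a patience counter; the function name, `patience`, and `min_rel_improvement` are hypothetical and not taken from OmniSafe.

```python
def should_stop(history, patience=5, min_rel_improvement=0.01):
    """Return True once `patience` consecutive epochs bring no real improvement.

    `history` is a list of scalar holdout losses, one per epoch.
    """
    best = history[0]
    epochs_since_update = 0
    for loss in history[1:]:
        if (best - loss) / abs(best) > min_rel_improvement:
            # Meaningful improvement: remember it and reset the counter.
            best = loss
            epochs_since_update = 0
        else:
            epochs_since_update += 1
        if epochs_since_update >= patience:
            return True
    return False


improving = [1.0, 0.8, 0.6, 0.5, 0.4]
stalled = [1.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]
```

With these inputs, `should_stop(improving)` returns False and `should_stop(stalled)` returns True.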

property ensemble_model: EnsembleModel#

The ensemble model.

imagine(states, horizon, actions=None, actor_critic=None, idx=None)[source]#

Imagine the future states and rewards from the ensemble model.

Parameters:
  • states (torch.Tensor) – The states.

  • horizon (int) – The horizon.

  • actions (torch.Tensor, optional) – The actions.

  • actor_critic (ConstraintActorQCritic, optional) – The actor-critic to use if actions is None.

  • idx (int, optional) – The index of the model to use.

Returns:

traj – The trajectory dict, containing the states, rewards, etc.

Return type:

dict[str, torch.Tensor]
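
Conceptually, imagining a trajectory means applying a one-step model repeatedly for `horizon` steps and collecting the results in a dict. The sketch below uses a toy step function and policy, both hypothetical, and the dict keys are illustrative, not necessarily those returned by OmniSafe.

```python
import numpy as np


def imagine_sketch(step_fn, policy_fn, states, horizon):
    """Roll a one-step model forward `horizon` times and stack the results."""
    traj = {"states": [], "actions": [], "rewards": []}
    for _ in range(horizon):
        actions = policy_fn(states)
        next_states, rewards = step_fn(states, actions)
        traj["states"].append(states)
        traj["actions"].append(actions)
        traj["rewards"].append(rewards)
        states = next_states
    # Each entry ends up with shape (horizon, batch, ...).
    return {key: np.stack(value) for key, value in traj.items()}


# Hypothetical toy dynamics and policy, for illustration only.
def toy_step(states, actions):
    return states + actions, -np.abs(states).sum(axis=-1, keepdims=True)


def toy_policy(states):
    return -0.5 * states


start = np.ones((4, 2))
traj = imagine_sketch(toy_step, toy_policy, start, horizon=3)
```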

property num_models: int#

The number of models in the ensemble.

sample(states, actions, idx=None, deterministic=False)[source]#

Sample states and rewards from the ensemble model.

Parameters:
  • states (torch.Tensor) – The states.

  • actions (torch.Tensor) – The actions.

  • idx (int | None, optional) – The index of the model to use. Defaults to None.

  • deterministic (bool, optional) – Whether to use the deterministic version of the model. Defaults to False.

Returns:
  • sample_states (torch.Tensor) – The sampled states.

  • rewards (torch.Tensor) – The rewards.

  • info – The info dict, containing the costs if use_cost is True.

Return type:

tuple[torch.Tensor, torch.Tensor, dict[str, list[torch.Tensor]]]
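
With a model that outputs a mean and a variance, the `deterministic` flag usually selects between returning the mean and drawing a Gaussian sample around it. The following sketch assumes that behavior; `sample_next` is a hypothetical helper, not an OmniSafe function.

```python
import numpy as np


def sample_next(mean, var, rng, deterministic=False):
    """Return the mean, or a Gaussian draw around it (assumed behavior)."""
    if deterministic:
        return mean
    return mean + np.sqrt(var) * rng.standard_normal(mean.shape)


rng = np.random.default_rng(0)
mean = np.zeros((1000, 3))
var = np.full((1000, 3), 4.0)

det = sample_next(mean, var, rng, deterministic=True)
stoch = sample_next(mean, var, rng)
```

The stochastic draw has an empirical standard deviation close to 2, the square root of the supplied variance.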

property state_size: int#

The state size.

train(inputs, labels, holdout_ratio=0.0)[source]#

Train the dynamics model. holdout_ratio is the fraction of the data held out for validation.

Parameters:
  • inputs (np.ndarray) – Input data.

  • labels (np.ndarray) – Ground truth of the next state, reward, cost.

  • holdout_ratio (float) – The fraction of the data held out for validation. Defaults to 0.0.

Returns:
  • train_mse_losses – The training loss.

  • val_mse_losses – The validation loss.

Return type:

tuple[ndarray, ndarray]
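
The holdout split implied by `holdout_ratio` can be sketched as a single shuffle followed by a slice. The helper below is hypothetical and only illustrates the idea; the actual shuffling and batching strategy in OmniSafe may differ.

```python
import numpy as np


def split_holdout(inputs, labels, holdout_ratio, rng):
    """Shuffle once, then reserve the first `holdout_ratio` fraction for validation."""
    num_holdout = int(inputs.shape[0] * holdout_ratio)
    perm = rng.permutation(inputs.shape[0])
    inputs, labels = inputs[perm], labels[perm]
    train = (inputs[num_holdout:], labels[num_holdout:])
    holdout = (inputs[:num_holdout], labels[:num_holdout])
    return train, holdout


rng = np.random.default_rng(0)
inputs = np.arange(20.0).reshape(10, 2)
labels = inputs.sum(axis=1, keepdims=True)
(train_x, train_y), (val_x, val_y) = split_holdout(inputs, labels, 0.2, rng)
```

Applying the same permutation to inputs and labels keeps each pair aligned after the shuffle.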