OmniSafe Model-based Model#

Standard Scaler#

Documentation

class omnisafe.algorithms.model_based.base.ensemble.StandardScaler(device)[source]#

Normalizes data using standardization.

This class provides methods to fit the scaler to the input data and transform the input data using the parameters learned during the fitting process.

Parameters:: device (torch.device) – The device to use.

Initialize an instance of StandardScaler.

fit(data)[source]#

Fits the scaler to the input data.

Parameters:: data (np.ndarray) – A numpy array containing the input.
Return type:: None

transform(data)[source]#

Transforms the input matrix data using the parameters of this scaler.

Parameters:: data (torch.Tensor) – The input data to transform.
Returns:: transformed_data – The transformed data.
Return type:: Tensor
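
The fit/transform pair above can be illustrated with a minimal NumPy-only sketch. `SketchScaler` is a hypothetical stand-in, not the OmniSafe class: it ignores the device argument, works on arrays rather than tensors, and the handling of zero-variance features is an assumption.

```python
import numpy as np


class SketchScaler:
    """Hypothetical stand-in for a standard scaler (NumPy only)."""

    def __init__(self) -> None:
        self.mean = np.zeros(1)
        self.std = np.ones(1)

    def fit(self, data: np.ndarray) -> None:
        # Per-feature statistics computed over the batch axis.
        self.mean = data.mean(axis=0, keepdims=True)
        self.std = data.std(axis=0, keepdims=True)
        # Assumption: constant features get std 1 to avoid division by zero.
        self.std[self.std < 1e-12] = 1.0

    def transform(self, data: np.ndarray) -> np.ndarray:
        # Standardize with the statistics learned in fit().
        return (data - self.mean) / self.std


scaler = SketchScaler()
train_inputs = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])
scaler.fit(train_inputs)
scaled = scaler.transform(train_inputs)
```

Fitting once and reusing the stored statistics keeps training and prediction inputs on the same scale.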

Initialize Weight#

omnisafe.algorithms.model_based.base.ensemble.init_weights(layer)[source]#

Initialize network weight.

Parameters:: layer (nn.Module) – The layer to initialize.
Return type:: None

Unbatched Forward#

omnisafe.algorithms.model_based.base.ensemble.unbatched_forward(layer, input_data, index)[source]#

Special forward pass for nn.Sequential modules that contain BatchedLinear layers, using only the model selected by index.

Parameters:

layer (nn.Module | EnsembleFC) – The layer to forward through.
input_data (torch.Tensor) – The input data.
index (int) – The index of the model to use.

Returns:

output – The output of the layer.

Return type:

torch.Tensor

Ensemble Fully-Connected Layer#

class omnisafe.algorithms.model_based.base.ensemble.EnsembleFC(in_features, out_features, ensemble_size, weight_decay=0.0, bias=True)[source]#

Ensemble fully connected network.

A fully connected network with ensemble_size models.

Parameters:

in_features (int) – The number of input features.
out_features (int) – The number of output features.
ensemble_size (int) – The number of models in the ensemble.
weight_decay (float) – The weight decay coefficient.
bias (bool) – Whether to use bias.

Variables:

in_features (int) – The number of input features.
out_features (int) – The number of output features.
ensemble_size (int) – The number of models in the ensemble.
weight (nn.Parameter) – The weight of the network.
bias (nn.Parameter) – The bias of the network.

Initialize an instance of fully connected network.

forward(input_data)[source]#

Forward pass.

Parameters:: input_data (torch.Tensor) – The input data.
Returns:: The forward output of the network.
Return type:: Tensor
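
One way to picture an ensemble fully connected layer is as a batched matrix product, with one weight matrix per ensemble member. The sketch below uses PyTorch under that assumption; `SketchEnsembleFC`, the tensor shapes, and the initialization scale are illustrative choices, not the library's implementation.

```python
import torch
from torch import nn


class SketchEnsembleFC(nn.Module):
    """Hypothetical ensemble linear layer: one weight matrix per member."""

    def __init__(self, in_features: int, out_features: int, ensemble_size: int) -> None:
        super().__init__()
        # Assumed shapes: weight (E, in, out), bias (E, 1, out).
        self.weight = nn.Parameter(0.1 * torch.randn(ensemble_size, in_features, out_features))
        self.bias = nn.Parameter(torch.zeros(ensemble_size, 1, out_features))

    def forward(self, input_data: torch.Tensor) -> torch.Tensor:
        # input_data: (E, batch, in) -> output: (E, batch, out).
        return torch.bmm(input_data, self.weight) + self.bias


layer = SketchEnsembleFC(in_features=4, out_features=3, ensemble_size=5)
batch = torch.randn(5, 8, 4)
out = layer(batch)

# Evaluating member 2 alone is an ordinary matrix product with its own weights.
single = batch[2] @ layer.weight[2] + layer.bias[2]
```

Selecting one member's weights, as in the last line, is roughly what evaluating a single model by index amounts to in this sketch.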

Ensemble Model#

class omnisafe.algorithms.model_based.base.ensemble.EnsembleModel(device, state_size, action_size, reward_size, cost_size, ensemble_size, predict_reward, predict_cost=False, hidden_size=200, learning_rate=1e-3, use_decay=False)[source]#

Ensemble dynamics model.

A dynamics model with ensemble_size models.

Parameters:

device (torch.device) – The device to use.
state_size (int) – The size of the state.
action_size (int) – The size of the action.
reward_size (int) – The size of the reward.
cost_size (int) – The size of the cost.
ensemble_size (int) – The number of models in the ensemble.
predict_reward (bool) – Whether to predict reward.
predict_cost (bool) – Whether to predict cost.
hidden_size (int) – The size of the hidden layer.
learning_rate (float) – The learning rate.
use_decay (bool) – Whether to use weight decay.

Variables:

max_logvar (torch.Tensor) – The maximum log variance.
min_logvar (torch.Tensor) – The minimum log variance.
scaler (StandardScaler) – The scaler.

Initialize an instance of the ensemble model.

_get_decay_loss()[source]#

Get decay loss.

Return type:: Tensor

forward(data, ret_log_var=False)[source]#

Compute next state, reward, cost using all models.

Parameters:

data (torch.Tensor) – Input data.
ret_log_var (bool, optional) – Whether to return the log variance, defaults to False.

Returns:

mean – Mean of the next state, reward, cost.
logvar or var – Log variance of the next state, reward, cost if ret_log_var is True; otherwise the variance.

Return type:

tuple[torch.Tensor, torch.Tensor]
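
The max_logvar and min_logvar variables suggest that the predicted log variance is kept within learned bounds. A common way to do this in probabilistic ensemble models is a softplus-based soft clamp; whether this class uses exactly that form is an assumption, and `bound_logvar` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F


def bound_logvar(
    raw_logvar: torch.Tensor,
    max_logvar: torch.Tensor,
    min_logvar: torch.Tensor,
) -> torch.Tensor:
    """Softly clamp raw_logvar to roughly [min_logvar, max_logvar]."""
    # Pull values above the upper bound back towards max_logvar.
    logvar = max_logvar - F.softplus(max_logvar - raw_logvar)
    # Push values below the lower bound back towards min_logvar.
    return min_logvar + F.softplus(logvar - min_logvar)


max_logvar = torch.full((1, 3), 0.5)
min_logvar = torch.full((1, 3), -10.0)
raw = torch.tensor([[-50.0, 0.0, 50.0]])

logvar = bound_logvar(raw, max_logvar, min_logvar)
var = torch.exp(logvar)  # what a caller would get with ret_log_var=False
```

The clamp is smooth, so gradients still flow near the bounds, and the limits are only approximately respected.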

forward_idx(data, idx_model, ret_log_var=False)[source]#

Compute next state, reward, cost from a certain model.

Parameters:

data (torch.Tensor | np.ndarray) – Input data.
idx_model (int) – Index of the model.
ret_log_var (bool) – Whether to return the log variance.

Returns:

mean – Mean of the next state, reward, cost.
logvar or var – Log variance of the next state, reward, cost if ret_log_var is True; otherwise the variance.

Return type:

tuple[torch.Tensor, torch.Tensor]

loss(mean, logvar, labels, inc_var_loss=True)[source]#

Compute loss.

Parameters:

mean (torch.Tensor) – Mean of the next state, reward, cost.
logvar (torch.Tensor) – Log variance of the next state, reward, cost.
labels (torch.Tensor) – Ground truth of the next state, reward, cost.
inc_var_loss (bool, optional) – Whether to include the variance loss. Defaults to True.

Returns:

total_loss (torch.Tensor) – Total loss.
mse_loss (torch.Tensor) – MSE loss.

Return type:

tuple[Tensor, Tensor]
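
Given a predicted mean and log variance, a loss of this shape is typically a Gaussian negative log-likelihood: a squared error weighted by the inverse variance, plus a penalty on the log variance. The sketch below assumes that form and tensors of shape (ensemble_size, batch, dim); the reduction over axes is also an assumption, and `gaussian_nll_sketch` is a hypothetical name.

```python
import torch


def gaussian_nll_sketch(
    mean: torch.Tensor,
    logvar: torch.Tensor,
    labels: torch.Tensor,
    inc_var_loss: bool = True,
) -> tuple:
    """Return (total_loss, per-model mse_loss) for (E, batch, dim) tensors."""
    sq_err = (mean - labels) ** 2
    if inc_var_loss:
        # Weight errors by the inverse variance and penalize large variances.
        inv_var = torch.exp(-logvar)
        mse_loss = (sq_err * inv_var).mean(dim=(1, 2))
        var_loss = logvar.mean(dim=(1, 2))
        total_loss = mse_loss.sum() + var_loss.sum()
    else:
        # Plain mean squared error, e.g. for evaluation on held-out data.
        mse_loss = sq_err.mean(dim=(1, 2))
        total_loss = mse_loss.sum()
    return total_loss, mse_loss


mean = torch.zeros(2, 4, 3)
logvar = torch.zeros(2, 4, 3)
labels = torch.ones(2, 4, 3)
total, per_model = gaussian_nll_sketch(mean, logvar, labels)
```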

train_ensemble(loss)[source]#

Train the dynamics model.

Parameters:: loss (torch.Tensor) – The loss of the dynamics model.
Return type:: None

Ensemble Dynamics Model#

class omnisafe.algorithms.model_based.base.ensemble.EnsembleDynamicsModel(model_cfgs, device, state_shape, action_shape, actor_critic=None, rew_func=None, cost_func=None, terminal_func=None)[source]#

Dynamics model for predicting the next state, reward, and cost.

Parameters:

model_cfgs (Config) – The configuration of the dynamics model.
device (torch.device) – The device to use.
state_shape (tuple[int, ...]) – The shape of the state.
action_shape (tuple[int, ...]) – The shape of the action.
actor_critic (ConstraintActorCritic | ConstraintActorQCritic | None, optional) – The actor critic model. Defaults to None.
rew_func (Callable[[torch.Tensor], torch.Tensor] | None, optional) – The reward function. Defaults to None.
cost_func (Callable[[torch.Tensor], torch.Tensor] | None, optional) – The cost function. Defaults to None.
terminal_func (Callable[[torch.Tensor], torch.Tensor] | None, optional) – The terminal function. Defaults to None.

Variables:

elite_model_idxes (list[int]) – The indices of the elite models.

Initialize the dynamics model.
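
As an illustration of what a list of elite model indices can look like, the sketch below ranks ensemble members by holdout loss and keeps the best few. Ranking by holdout loss and the `num_elites` value are assumptions made for the example, not details stated in this reference.

```python
import numpy as np

# Hypothetical per-model holdout losses for an ensemble of five models.
holdout_losses = np.array([0.30, 0.12, 0.45, 0.08, 0.21])
num_elites = 3  # hypothetical setting

# Indices of the models with the lowest holdout loss, best first.
elite_model_idxes = np.argsort(holdout_losses)[:num_elites].tolist()
```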

_compute_cost(network_output)[source]#

Compute the cost from the network output.

Parameters:: network_output (torch.Tensor) – The output of the network.
Returns:: cost – The cost, from the network output or the cost function.
Raises:: ValueError – If the cost function is not defined.
Return type:: Tensor

_compute_reward(network_output)[source]#

Compute the reward from the network output.

Parameters:: network_output (torch.Tensor) – The output of the network.
Returns:: reward – The reward, from the network output or the reward function.
Raises:: ValueError – If the reward function is not defined.
Return type:: Tensor

_compute_terminal(network_output)[source]#

Compute the terminal from the network output.

Parameters:: network_output (torch.Tensor) – The output of the network.
Returns:: terminal – The terminal signal, from the network output or the terminal function.
Raises:: ValueError – If the terminal function is not defined.
Return type:: Tensor
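
The three helpers above share one pattern: take the quantity from the network output if the model predicts it, otherwise fall back to a user-supplied function, and raise ValueError if neither is available. A minimal sketch for the reward case follows; `compute_reward_sketch`, the `predict_reward` flag as a function argument, and the position of the reward in the output are illustrative assumptions.

```python
from typing import Callable, Optional

import numpy as np


def compute_reward_sketch(
    network_output: np.ndarray,
    predict_reward: bool,
    rew_func: Optional[Callable[[np.ndarray], np.ndarray]] = None,
    reward_index: int = 0,  # hypothetical position of the reward in the output
) -> np.ndarray:
    if predict_reward:
        # The model predicts the reward directly: slice it out.
        return network_output[..., reward_index]
    if rew_func is not None:
        # Otherwise derive it from the output with the supplied function.
        return rew_func(network_output)
    raise ValueError('Reward function is not defined.')


output = np.array([[0.5, 1.0, 2.0], [0.1, 3.0, 4.0]])
from_network = compute_reward_sketch(output, predict_reward=True)
from_function = compute_reward_sketch(
    output,
    predict_reward=False,
    rew_func=lambda x: -np.abs(x).sum(axis=-1),
)
```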

_predict(inputs, batch_size=1024, idx=None, ret_log_var=False)[source]#

Predict with tensor inputs and tensor outputs; used in the planning loop.

Parameters:

inputs (torch.Tensor) – the inputs to the network.
batch_size (int, optional) – the batch size for prediction.
idx (Union[int, None], optional) – the index of the model to use.
ret_log_var (bool, optional) – whether to return the log variance.

Returns:

ensemble_mean_tensor – The mean of the ensemble.
ensemble_var_tensor – The variance of the ensemble.

Return type:

tuple[torch.Tensor, torch.Tensor]

_save_best(epoch, holdout_losses)[source]#

Save the best model.

Parameters:

epoch (int) – The current epoch.
holdout_losses (list) – The holdout losses.

Returns:

Whether to stop the training.

Return type:

bool
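
A method with this signature usually implements early stopping: remember the best holdout loss per model and stop once no model has improved for several epochs. The sketch below assumes that behavior; `BestTracker`, the patience of 2, and the 1% relative improvement threshold are hypothetical.

```python
class BestTracker:
    """Hypothetical early-stopping helper keyed on per-model holdout loss."""

    def __init__(self, num_models: int, max_patience: int = 5, min_rel_improvement: float = 0.01) -> None:
        # (epoch, loss) of the best snapshot seen so far for each model.
        self.best = [(-1, float('inf'))] * num_models
        self.max_patience = max_patience
        self.min_rel_improvement = min_rel_improvement
        self.patience = 0

    def update(self, epoch: int, holdout_losses: list) -> bool:
        improved = False
        for i, current in enumerate(holdout_losses):
            best = self.best[i][1]
            # Count only improvements larger than the relative threshold.
            if best == float('inf') or (best - current) > self.min_rel_improvement * abs(best):
                self.best[i] = (epoch, current)
                improved = True
        self.patience = 0 if improved else self.patience + 1
        # True means: stop training.
        return self.patience > self.max_patience


tracker = BestTracker(num_models=2, max_patience=2)
history = [[1.0, 1.0], [0.5, 1.0], [0.5, 1.0], [0.5, 1.0], [0.5, 1.0]]
stop_epoch = None
for epoch, losses in enumerate(history):
    if tracker.update(epoch, losses):
        stop_epoch = epoch
        break
```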

property ensemble_model: EnsembleModel#: The ensemble model.

imagine(states, horizon, actions=None, actor_critic=None, idx=None)[source]#

Imagine the future states and rewards from the ensemble model.

Parameters:

states (torch.Tensor) – the states.
horizon (int) – the horizon.
actions (torch.Tensor, optional) – the actions.
actor_critic (ConstraintActorQCritic, optional) – the actor_critic to use if actions is None.
idx (int, optional) – the index of the model to use.

Returns:

traj – The trajectory dict, containing the states, rewards, etc.

Return type:

dict[str, torch.Tensor]
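
Imagining a trajectory amounts to rolling the learned model forward for horizon steps, taking actions either from a given sequence or from a policy. The NumPy sketch below shows that loop under simplifying assumptions: `step_fn` and `policy_fn` are hypothetical callables standing in for the ensemble model and the actor, and only states, actions, and rewards are recorded.

```python
import numpy as np


def imagine_sketch(states, horizon, step_fn, actions=None, policy_fn=None):
    """Roll a model forward and collect a trajectory dict of stacked arrays."""
    if actions is None and policy_fn is None:
        raise ValueError('Either actions or policy_fn must be given.')
    traj = {'states': [], 'actions': [], 'rewards': []}
    for t in range(horizon):
        # Use the provided action sequence, else query the policy.
        act = actions[t] if actions is not None else policy_fn(states)
        next_states, rewards = step_fn(states, act)
        traj['states'].append(states)
        traj['actions'].append(act)
        traj['rewards'].append(rewards)
        states = next_states
    return {key: np.stack(values) for key, values in traj.items()}


def toy_step(states, act):
    # Toy dynamics: move by the action; reward penalizes distance from origin.
    next_states = states + act
    return next_states, -np.sum(next_states**2, axis=-1)


start = np.zeros((4, 2))
plan = np.ones((3, 4, 2))
traj = imagine_sketch(start, horizon=3, step_fn=toy_step, actions=plan)
```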

property num_models: int#: The number of models in the ensemble.

sample(states, actions, idx=None, deterministic=False)[source]#

Sample states and rewards from the ensemble model.

Parameters:

states (torch.Tensor) – the states.
actions (torch.Tensor) – the actions.
idx (Union[int, None], optional) – the index of the model to use. Defaults to None.
deterministic (bool, optional) – whether to use the deterministic version of the model. Defaults to False.

Returns:

sample_states (torch.Tensor) – the sampled states.
rewards (torch.Tensor) – the rewards.
info – The info dict, which contains the costs if use_cost is True.

Return type:

tuple[torch.Tensor, torch.Tensor, dict[str, list[torch.Tensor]]]
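
Sampling from an ensemble generally means choosing a model per transition and then either taking its mean (deterministic) or adding Gaussian noise scaled by its predicted standard deviation. The NumPy sketch below assumes that scheme and that models are drawn from a list of elite indices when idx is None; `sample_sketch` and its array layout are hypothetical.

```python
import numpy as np


def sample_sketch(ensemble_mean, ensemble_var, elite_idxes, rng, idx=None, deterministic=False):
    """ensemble_mean / ensemble_var have shape (num_models, batch, dim)."""
    batch = ensemble_mean.shape[1]
    if idx is None:
        # Assumption: pick a random elite model for every row of the batch.
        model_idx = rng.choice(elite_idxes, size=batch)
    else:
        model_idx = np.full(batch, idx)
    rows = np.arange(batch)
    mean = ensemble_mean[model_idx, rows]
    if deterministic:
        return mean
    std = np.sqrt(ensemble_var[model_idx, rows])
    return mean + std * rng.standard_normal(mean.shape)


rng = np.random.default_rng(0)
# Model m predicts the constant m, so the chosen model is easy to read off.
ensemble_mean = np.stack([np.full((6, 2), float(m)) for m in range(4)])
ensemble_var = np.full((4, 6, 2), 0.01)

fixed = sample_sketch(ensemble_mean, ensemble_var, [1, 3], rng, idx=1, deterministic=True)
elite_only = sample_sketch(ensemble_mean, ensemble_var, [1, 3], rng, deterministic=True)
noisy = sample_sketch(ensemble_mean, ensemble_var, [1, 3], rng)
```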

property state_size: int#: The state size.

train(inputs, labels, holdout_ratio=0.0)[source]#

Train the dynamics model. holdout_ratio is the fraction of the data held out for validation.

Parameters:

inputs (np.ndarray) – Input data.
labels (np.ndarray) – Ground truth of the next state, reward, cost.
holdout_ratio (float) – The ratio of the data held out for validation.

Returns:

train_mse_losses – The training loss.
val_mse_losses – The validation loss.

Return type:

tuple[ndarray, ndarray]
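
The role of holdout_ratio can be sketched as a shuffled split of the dataset into a training part and a validation part. The helper below is a hypothetical NumPy illustration of that split only; it does not reproduce the training loop itself.

```python
import numpy as np


def holdout_split(inputs, labels, holdout_ratio, rng):
    """Shuffle the data and split off a holdout set for validation."""
    num_holdout = int(inputs.shape[0] * holdout_ratio)
    # One shared permutation keeps inputs and labels aligned.
    perm = rng.permutation(inputs.shape[0])
    inputs, labels = inputs[perm], labels[perm]
    train = (inputs[num_holdout:], labels[num_holdout:])
    holdout = (inputs[:num_holdout], labels[:num_holdout])
    return train, holdout


rng = np.random.default_rng(0)
inputs = np.arange(20, dtype=float).reshape(10, 2)
labels = inputs.sum(axis=1)

(train_x, train_y), (val_x, val_y) = holdout_split(inputs, labels, holdout_ratio=0.2, rng=rng)
```

With holdout_ratio=0.0 the validation part is empty and all data is used for training.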