OmniSafe Model-based Model#
Standard Scaler#
Documentation
- class omnisafe.algorithms.model_based.base.ensemble.StandardScaler(device)[source]#
Normalizes data using standardization.
This class provides methods to fit the scaler to the input data and transform the input data using the parameters learned during the fitting process.
- Parameters:
device (torch.device) – The device to use.
Initialize an instance of StandardScaler.
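The fit-then-transform pattern described above can be sketched as follows. This is a minimal NumPy illustration, not the OmniSafe class: the name `SimpleStandardScaler`, the per-feature statistics along axis 0, and the guard for constant features are all assumptions made for the example.

```python
import numpy as np


class SimpleStandardScaler:
    """Illustrative scaler (hypothetical); not the OmniSafe implementation."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.std = 1.0

    def fit(self, data: np.ndarray) -> None:
        # Learn per-feature statistics along the batch axis.
        self.mean = data.mean(axis=0, keepdims=True)
        std = data.std(axis=0, keepdims=True)
        # Assumption: constant features keep a unit scale to avoid dividing by zero.
        self.std = np.where(std < 1e-12, 1.0, std)

    def transform(self, data: np.ndarray) -> np.ndarray:
        # Apply the statistics learned in fit().
        return (data - self.mean) / self.std


data = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
scaler = SimpleStandardScaler()
scaler.fit(data)
scaled = scaler.transform(data)
```

After fitting, the first column of `scaled` has zero mean and unit standard deviation, while the constant second column maps to zeros.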
Initialize Weight#
Unbatched Forward#
Documentation
- omnisafe.algorithms.model_based.base.ensemble.unbatched_forward(layer, input_data, index)[source]#
Special forward for nn.Sequential modules which contain BatchedLinear layers we want to use.
- Parameters:
layer (nn.Module | EnsembleFC) – The layer to forward through.
input_data (torch.Tensor) – The input data.
index (int) – The index of the model to use.
- Returns:
output – The output of the layer.
- Return type:
torch.Tensor
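To illustrate the idea of forwarding through a stack of layers with a single ensemble member, here is a rough NumPy sketch. The function `forward_single_member`, the `(weight, bias)` tuple representation, and the assumed shapes are hypothetical stand-ins for the real modules.

```python
import numpy as np


def forward_single_member(layers, x, index):
    """Run `x` through `layers` using only ensemble member `index`.

    `layers` mixes two hypothetical kinds of entries:
      * (weight, bias) tuples with shapes (E, in, out) and (E, out);
      * plain callables such as activations, applied unchanged.
    """
    for layer in layers:
        if isinstance(layer, tuple):
            weight, bias = layer
            # Select one member's parameters instead of a batched matmul.
            x = x @ weight[index] + bias[index]
        else:
            x = layer(x)
    return x


rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(3, 4, 8)), np.zeros((3, 8))),
    np.tanh,
    (rng.normal(size=(3, 8, 2)), np.zeros((3, 2))),
]
x = rng.normal(size=(5, 4))
out = forward_single_member(layers, x, index=1)
```

The input here is an ordinary `(batch, features)` array, so no leading ensemble axis is needed when only one member is evaluated.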
Ensemble Fully-Connected Layer#
Documentation
- class omnisafe.algorithms.model_based.base.ensemble.EnsembleFC(in_features, out_features, ensemble_size, weight_decay=0.0, bias=True)[source]#
Ensemble fully connected network.
A fully connected network with ensemble_size models.
- Parameters:
in_features (int) – The number of input features.
out_features (int) – The number of output features.
ensemble_size (int) – The number of models in the ensemble.
weight_decay (float) – The decaying factor.
bias (bool) – Whether to use bias.
- Variables:
in_features (int) – The number of input features.
out_features (int) – The number of output features.
ensemble_size (int) – The number of models in the ensemble.
weight (nn.Parameter) – The weight of the network.
bias (nn.Parameter) – The bias of the network.
Initialize an instance of fully connected network.
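A layer of this kind can be thought of as `ensemble_size` independent linear maps evaluated in one batched matrix product. The sketch below uses NumPy and assumes a weight of shape `(ensemble_size, in_features, out_features)` and a bias of shape `(ensemble_size, out_features)`; these shapes and the helper name `ensemble_linear` are assumptions for illustration.

```python
import numpy as np


def ensemble_linear(x, weight, bias=None):
    """Apply E independent linear maps in one batched matmul.

    x:      (E, batch, in_features)
    weight: (E, in_features, out_features)
    bias:   (E, out_features) or None
    """
    out = np.matmul(x, weight)  # (E, batch, out_features)
    if bias is not None:
        out = out + bias[:, None, :]  # broadcast over the batch axis
    return out


ensemble_size, batch, in_features, out_features = 3, 5, 4, 2
rng = np.random.default_rng(0)
x = rng.normal(size=(ensemble_size, batch, in_features))
weight = rng.normal(size=(ensemble_size, in_features, out_features))
bias = rng.normal(size=(ensemble_size, out_features))
y = ensemble_linear(x, weight, bias)
```

Each slice `y[i]` equals `x[i] @ weight[i] + bias[i]`, so the members never share parameters even though they are computed together.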
Ensemble Model#
Documentation
- class omnisafe.algorithms.model_based.base.ensemble.EnsembleModel(device, state_size, action_size, reward_size, cost_size, ensemble_size, predict_reward, predict_cost=False, hidden_size=200, learning_rate=1e-3, use_decay=False)[source]#
Ensemble dynamics model.
A dynamics model with ensemble_size models.
- Parameters:
device (torch.device) – The device to use.
state_size (int) – The size of the state.
action_size (int) – The size of the action.
reward_size (int) – The size of the reward.
cost_size (int) – The size of the cost.
ensemble_size (int) – The number of models in the ensemble.
predict_reward (bool) – Whether to predict reward.
predict_cost (bool) – Whether to predict cost.
hidden_size (int) – The size of the hidden layer.
learning_rate (float) – The learning rate.
use_decay (bool) – Whether to use weight decay.
- Variables:
max_logvar (torch.Tensor) – The maximum log variance.
min_logvar (torch.Tensor) – The minimum log variance.
scaler (StandardScaler) – The scaler.
Initialize network weight.
- forward(data, ret_log_var=False)[source]#
Compute next state, reward, cost using all models.
- Parameters:
data (torch.Tensor) – Input data.
ret_log_var (bool, optional) – Whether to return the log variance, defaults to False.
- Returns:
mean – Mean of the next state, reward, cost.
logvar or var – Log variance of the next state, reward, cost if ret_log_var is True; otherwise the variance.
- Return type:
tuple[torch.Tensor, torch.Tensor]
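The documentation does not say how `max_logvar` and `min_logvar` enter the forward pass. A common approach in probabilistic ensemble models is to softly clamp the raw log variance with softplus, which the NumPy sketch below assumes. The helper names and the layout of the raw output (mean in the first half, log variance in the second) are hypothetical.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def bound_logvar(raw_logvar, min_logvar, max_logvar):
    """Softly clamp raw log variance to approximately [min_logvar, max_logvar]."""
    logvar = max_logvar - softplus(max_logvar - raw_logvar)
    logvar = min_logvar + softplus(logvar - min_logvar)
    return logvar


def split_output(raw, min_logvar, max_logvar, ret_log_var=False):
    """Split a raw output of width 2*D into mean and (log) variance."""
    dim = raw.shape[-1] // 2
    mean = raw[..., :dim]
    logvar = bound_logvar(raw[..., dim:], min_logvar, max_logvar)
    return (mean, logvar) if ret_log_var else (mean, np.exp(logvar))


raw = np.array([[0.5, -1.0, 100.0, -100.0]])
mean, logvar = split_output(raw, min_logvar=-10.0, max_logvar=0.5, ret_log_var=True)
_, var = split_output(raw, min_logvar=-10.0, max_logvar=0.5)
```

The clamp is smooth, so gradients still flow when the raw value is outside the bounds, at the cost of the bounds being only approximately enforced.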
- forward_idx(data, idx_model, ret_log_var=False)[source]#
Compute next state, reward, cost from a single model.
- Parameters:
data (torch.Tensor | np.ndarray) – Input data.
idx_model (int) – Index of the model.
ret_log_var (bool) – Whether to return the log variance.
- Returns:
mean – Mean of the next state, reward, cost.
logvar or var – Log variance of the next state, reward, cost if ret_log_var is True; otherwise the variance.
- Return type:
tuple[torch.Tensor, torch.Tensor]
- loss(mean, logvar, labels, inc_var_loss=True)[source]#
Compute loss.
- Parameters:
mean (torch.Tensor) – Mean of the next state, reward, cost.
logvar (torch.Tensor) – Log variance of the next state, reward, cost.
labels (torch.Tensor) – Ground truth of the next state, reward, cost.
inc_var_loss (bool, optional) – Whether to include the variance loss. Defaults to True.
- Returns:
total_loss (torch.Tensor) – Total loss.
mse_loss (torch.Tensor) – MSE loss.
- Return type:
tuple[torch.Tensor, torch.Tensor]
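One plausible reading of a loss with these arguments is a diagonal Gaussian negative log-likelihood with a plain MSE reported alongside it. The NumPy sketch below follows that reading; the mean reductions and the exact form of the variance term are assumptions, not the documented behavior.

```python
import numpy as np


def gaussian_loss(mean, logvar, labels, inc_var_loss=True):
    """Return (total_loss, mse_loss) for a diagonal Gaussian prediction."""
    sq_err = (mean - labels) ** 2
    mse_loss = sq_err.mean()
    if inc_var_loss:
        inv_var = np.exp(-logvar)
        # Twice the Gaussian negative log-likelihood, minus a constant.
        total_loss = (sq_err * inv_var).mean() + logvar.mean()
    else:
        # Without the variance term the loss reduces to plain MSE.
        total_loss = mse_loss
    return total_loss, mse_loss


mean = np.array([[1.0, 2.0]])
labels = np.array([[1.5, 2.0]])
logvar = np.zeros_like(mean)
total, mse = gaussian_loss(mean, logvar, labels)
total_wide, _ = gaussian_loss(mean, np.full_like(mean, np.log(4.0)), labels)
```

With unit variance the two returned values coincide. Predicting a larger variance shrinks the weighted error term but pays a penalty through the `logvar` term.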
Ensemble Dynamics Model#
Documentation
- class omnisafe.algorithms.model_based.base.ensemble.EnsembleDynamicsModel(model_cfgs, device, state_shape, action_shape, actor_critic=None, rew_func=None, cost_func=None, terminal_func=None)[source]#
Dynamics model for predicting the next state, reward, and cost.
- Parameters:
model_cfgs (Config) – The configuration of the dynamics model.
device (torch.device) – The device to use.
state_shape (tuple[int, ...]) – The shape of the state.
action_shape (tuple[int, ...]) – The shape of the action.
actor_critic (ConstraintActorCritic | ConstraintActorQCritic | None, optional) – The actor critic model. Defaults to None.
rew_func (Callable[[torch.Tensor], torch.Tensor] | None, optional) – The reward function. Defaults to None.
cost_func (Callable[[torch.Tensor], torch.Tensor] | None, optional) – The cost function. Defaults to None.
terminal_func (Callable[[torch.Tensor], torch.Tensor] | None, optional) – The terminal function. Defaults to None.
- Variables:
elite_model_idxes (list[int]) – The indices of the elite models.
Initialize the dynamics model.
- _compute_cost(network_output)[source]#
Compute the cost from the network output.
- Parameters:
network_output (torch.Tensor) – The output of the network.
- Returns:
cost – The cost, from the network output or the cost function.
- Raises:
ValueError – If the cost function is not defined.
- Return type:
torch.Tensor
- _compute_reward(network_output)[source]#
Compute the reward from the network output.
- Parameters:
network_output (torch.Tensor) – The output of the network.
- Returns:
reward – The reward, from the network output or the reward function.
- Raises:
ValueError – If the reward function is not defined.
- Return type:
torch.Tensor
- _compute_terminal(network_output)[source]#
Compute the terminal signal from the network output.
- Parameters:
network_output (torch.Tensor) – The output of the network.
- Returns:
terminal – The terminal signal, from the network output or the terminal function.
- Raises:
ValueError – If the terminal function is not defined.
- Return type:
torch.Tensor
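The three helpers above share one pattern: read the signal from the network output when the model predicts it, fall back to a user-supplied function otherwise, and raise `ValueError` when neither is available. A minimal sketch of that pattern follows. The function `compute_signal`, the `predicted_slice` argument, and the output layout are hypothetical, and the fallback function here receives the full network output as a simplification.

```python
import numpy as np


def compute_signal(network_output, predicted_slice=None, func=None, name="cost"):
    """Return a signal from the network output or from a user function."""
    if predicted_slice is not None:
        # The model predicts the signal: read it from the output columns.
        return network_output[..., predicted_slice]
    if func is not None:
        # Otherwise derive it with a known function.
        return func(network_output)
    raise ValueError(f"{name} function is not defined")


# Hypothetical layout: first column is the predicted cost, the rest is the state.
output = np.array([[0.2, 1.0, -3.0]])
predicted = compute_signal(output, predicted_slice=slice(0, 1))
derived = compute_signal(
    output, func=lambda out: (out[..., 1:] < 0).sum(axis=-1, keepdims=True)
)
```

Calling `compute_signal(output)` with neither a slice nor a function raises `ValueError`, mirroring the documented error case.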
- _predict(inputs, batch_size=1024, idx=None, ret_log_var=False)[source]#
Both the input and the output are tensors; used in the planning loop.
- Parameters:
inputs (torch.Tensor) – the inputs to the network.
batch_size (int, optional) – the batch size for prediction.
idx (Union[int, None], optional) – the index of the model to use.
ret_log_var (bool, optional) – whether to return the log variance.
- Returns:
ensemble_mean_tensor – The mean of the ensemble.
ensemble_var_tensor – The variance of the ensemble.
- Return type:
tuple[torch.Tensor, torch.Tensor]
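The `batch_size` argument suggests that large inputs are evaluated in chunks to bound memory use. The sketch below shows that idea in NumPy. `predict_in_batches`, the `model_fn` callable, and the `(ensemble_size, chunk, output_dim)` layout are assumptions for illustration.

```python
import numpy as np


def predict_in_batches(model_fn, inputs, batch_size=1024):
    """Evaluate `model_fn` on `inputs` in chunks and stitch the results.

    `model_fn` is a hypothetical callable returning (mean, var), each of
    shape (ensemble_size, chunk, output_dim).
    """
    means, variances = [], []
    for start in range(0, inputs.shape[0], batch_size):
        mean, var = model_fn(inputs[start : start + batch_size])
        means.append(mean)
        variances.append(var)
    # Concatenate along the sample axis (axis 1 under the assumed layout).
    return np.concatenate(means, axis=1), np.concatenate(variances, axis=1)


def toy_model(x):
    # Stand-in ensemble of 2 members: shifted identity means, unit variance.
    mean = np.stack([x, x + 1.0])
    return mean, np.ones_like(mean)


inputs = np.arange(10, dtype=float).reshape(5, 2)
ens_mean, ens_var = predict_in_batches(toy_model, inputs, batch_size=2)
```

The last chunk may be smaller than `batch_size`; slicing past the end of the array handles that case without special code.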
- _save_best(epoch, holdout_losses)[source]#
Save the best model.
- Parameters:
epoch (int) – The current epoch.
holdout_losses (list) – The holdout loss.
- Returns:
Whether to stop training early.
- Return type:
bool
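A method with this signature typically implements early stopping: remember the best holdout loss per model and stop once no model has improved for several epochs. The pure-Python sketch below assumes that behavior. `BestTracker`, `patience`, and `min_rel_improvement` are hypothetical names and thresholds.

```python
class BestTracker:
    """Track the best holdout loss per model and signal when to stop."""

    def __init__(self, num_models, patience=5, min_rel_improvement=0.01):
        self.best = [(-1, float("inf"))] * num_models  # (epoch, loss)
        self.patience = patience
        self.min_rel_improvement = min_rel_improvement
        self.epochs_since_update = 0

    def update(self, epoch, holdout_losses):
        improved = False
        for i, loss in enumerate(holdout_losses):
            _, best_loss = self.best[i]
            if best_loss == float("inf"):
                is_better = True  # the first observation always counts
            else:
                margin = self.min_rel_improvement * abs(best_loss)
                is_better = loss < best_loss - margin
            if is_better:
                self.best[i] = (epoch, loss)
                improved = True
        # Reset the counter on any improvement, otherwise count the epoch.
        self.epochs_since_update = 0 if improved else self.epochs_since_update + 1
        return self.epochs_since_update > self.patience


tracker = BestTracker(num_models=2, patience=2)
stops = [tracker.update(epoch, [1.0, 1.0]) for epoch in range(4)]
```

With a constant loss, the tracker records epoch 0 as the best and returns `True` on the fourth call, once the patience of two stale epochs is exceeded.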
- property ensemble_model: EnsembleModel#
The ensemble model.
- imagine(states, horizon, actions=None, actor_critic=None, idx=None)[source]#
Imagine the future states and rewards from the ensemble model.
- Parameters:
states (torch.Tensor) – the states.
horizon (int) – the horizon.
actions (torch.Tensor, optional) – the actions.
actor_critic (ConstraintActorQCritic, optional) – the actor_critic to use if actions is None.
idx (int, optional) – the index of the model to use.
- Returns:
traj – the trajectory dict, contains the states, rewards, etc.
- Return type:
dict[str, torch.Tensor]
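Conceptually, imagining a trajectory means applying the learned model repeatedly for `horizon` steps, using either the given actions or a policy. A minimal NumPy sketch of such a rollout is given below. The `step_fn` and `policy` callables, the dictionary keys, and the toy dynamics are assumptions, not the OmniSafe interface.

```python
import numpy as np


def imagine(step_fn, states, horizon, actions=None, policy=None):
    """Roll a learned model forward for `horizon` steps.

    `step_fn(states, actions) -> (next_states, rewards)` and
    `policy(states) -> actions` are hypothetical callables.
    `actions`, if given, has shape (horizon, batch, action_dim).
    """
    if actions is None and policy is None:
        raise ValueError("provide either actions or a policy")
    traj = {"states": [], "actions": [], "rewards": []}
    for t in range(horizon):
        act = actions[t] if actions is not None else policy(states)
        next_states, rewards = step_fn(states, act)
        traj["states"].append(states)
        traj["actions"].append(act)
        traj["rewards"].append(rewards)
        states = next_states
    # Stack each list into an array with a leading time axis.
    return {key: np.stack(values) for key, values in traj.items()}


def toy_step(states, actions):
    # Toy dynamics: the action is added to the state; reward is -|state|.
    next_states = states + actions
    return next_states, -np.abs(next_states).sum(axis=-1, keepdims=True)


start = np.zeros((4, 2))
traj = imagine(toy_step, start, horizon=3, policy=lambda s: np.ones_like(s))
```

Because each step feeds on the previous prediction, model errors compound with the horizon, which is one reason short horizons are common in model-based planning.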
- property num_models: int#
The number of models in the ensemble.
- sample(states, actions, idx=None, deterministic=False)[source]#
Sample states and rewards from the ensemble model.
- Parameters:
states (torch.Tensor) – the states.
actions (torch.Tensor) – the actions.
idx (Union[int, None], optional) – the index of the model to use. Defaults to None.
deterministic (bool, optional) – whether to use the deterministic version of the model. Defaults to False.
- Returns:
sample_states (torch.Tensor) – the sampled states.
rewards (torch.Tensor) – the rewards.
info – the info dict, contains the costs if use_cost is True.
- Return type:
tuple[torch.Tensor, torch.Tensor, dict[str, list[torch.Tensor]]]
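The `idx` and `deterministic` arguments suggest the usual way of drawing from an ensemble of Gaussians: choose a member, then return either its mean or a noisy sample. The NumPy sketch below assumes that scheme. How the real method picks a member when `idx` is None is not documented here, so the random per-sample choice is an assumption, as are `sample_next` and the array shapes.

```python
import numpy as np


def sample_next(mean, var, rng, idx=None, deterministic=False):
    """Draw a prediction from an ensemble of diagonal Gaussians.

    mean, var: (ensemble_size, batch, output_dim)
    idx: member to use; if None, a random member is picked per sample.
    """
    ensemble_size, batch, _ = mean.shape
    if idx is None:
        member = rng.integers(ensemble_size, size=batch)
    else:
        member = np.full(batch, idx)
    rows = np.arange(batch)
    chosen_mean = mean[member, rows]  # (batch, output_dim)
    chosen_std = np.sqrt(var[member, rows])
    if deterministic:
        return chosen_mean
    # Reparameterized draw: mean + std * standard normal noise.
    return chosen_mean + chosen_std * rng.standard_normal(chosen_mean.shape)


rng = np.random.default_rng(0)
mean = np.stack([np.zeros((4, 3)), np.ones((4, 3))])
var = np.full_like(mean, 0.01)
det = sample_next(mean, var, rng, idx=1, deterministic=True)
noisy = sample_next(mean, var, rng)
```

Here `det` reproduces the mean of member 1 exactly, while `noisy` mixes members across the batch and adds Gaussian noise.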
- property state_size: int#
The state size.
- train(inputs, labels, holdout_ratio=0.0)[source]#
Train the dynamics model. holdout_ratio is the fraction of the data held out for validation.
- Parameters:
inputs (np.ndarray) – Input data.
labels (np.ndarray) – Ground truth of the next state, reward, cost.
holdout_ratio (float) – The fraction of the data held out for validation.
- Returns:
train_mse_losses – The training loss.
val_mse_losses – The validation loss.
- Return type:
tuple[np.ndarray, np.ndarray]
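The role of `holdout_ratio` can be illustrated with a simple shuffled split. This is a sketch only: `split_holdout`, the fixed seed, and rounding down the holdout count are assumptions.

```python
import numpy as np


def split_holdout(inputs, labels, holdout_ratio=0.0, seed=0):
    """Shuffle the data and hold out a fraction for validation."""
    num_holdout = int(inputs.shape[0] * holdout_ratio)
    perm = np.random.default_rng(seed).permutation(inputs.shape[0])
    holdout_idx, train_idx = perm[:num_holdout], perm[num_holdout:]
    train = (inputs[train_idx], labels[train_idx])
    holdout = (inputs[holdout_idx], labels[holdout_idx])
    return train, holdout


inputs = np.arange(20, dtype=float).reshape(10, 2)
labels = inputs.sum(axis=1, keepdims=True)
(train_x, train_y), (val_x, val_y) = split_holdout(inputs, labels, holdout_ratio=0.2)
```

With the default `holdout_ratio=0.0`, the holdout set is empty and all samples are used for training.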