botorch.utils

Constraints

Helpers for handling input or outcome constraints.

botorch.utils.constraints.get_outcome_constraint_transforms(outcome_constraints)[source]

Create outcome constraint callables from outcome constraint tensors.

Parameters:

outcome_constraints (Tuple[Tensor, Tensor] | None) – A tuple of (A, b). For k outcome constraints and m outputs at f(x), A is k x m and b is k x 1 such that A f(x) <= b.

Returns:

A list of callables, each mapping a Tensor of size b x q x m to a tensor of size b x q, where m is the number of outputs of the model. Negative values imply feasibility. The callables support broadcasting (e.g. for calling on a tensor of shape mc_samples x b x q x m).

Return type:

List[Callable[[Tensor], Tensor]] | None

Example

>>> # constrain `f(x)[0] <= 0`
>>> A = torch.tensor([[1., 0.]])
>>> b = torch.tensor([[0.]])
>>> outcome_constraints = get_outcome_constraint_transforms((A, b))
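
To make the sign convention concrete, here is a minimal pure-Python sketch of what such callables compute, assuming each one evaluates a single row of A f(x) - b. The helper name make_constraint_callables is hypothetical and plain lists stand in for tensors; this is not BoTorch's implementation.

```python
def make_constraint_callables(A, b):
    """Build one callable per constraint row (hypothetical helper).

    Each callable maps a list of m outputs y to A[i] . y - b[i], so a
    negative value means the i-th constraint is satisfied.
    """
    def make(row, rhs):
        return lambda y: sum(a * v for a, v in zip(row, y)) - rhs

    return [make(row, rhs[0]) for row, rhs in zip(A, b)]


# Constrain f(x)[0] <= 0, mirroring the example above.
A = [[1.0, 0.0]]
b = [[0.0]]
constraints = make_constraint_callables(A, b)

feasible_value = constraints[0]([-0.5, 3.0])    # first output is below 0
infeasible_value = constraints[0]([0.2, -1.0])  # first output is above 0
```
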
botorch.utils.constraints.get_monotonicity_constraints(d, descending=False, dtype=None, device=None)[source]

Returns a system of linear inequalities (A, b) that generically encodes order constraints on the elements of a d-dimensional space, i.e. A @ x < b implies x[i] < x[i + 1] for a d-dimensional vector x.

Idea: Could encode A as sparse matrix, if it is supported well.

Parameters:
  • d (int) – Dimensionality of the constraint space, i.e. number of monotonic parameters.

  • descending (bool) – If True, forces the elements of a vector to be monotonically decreasing; otherwise, forces them to be monotonically increasing.

  • dtype (dtype | None) – The dtype of the returned Tensors.

  • device (device | None) – The device of the returned Tensors.

Returns:

A tuple of Tensors (A, b) representing the monotonicity constraint as a system of linear inequalities A @ x < b. A is (d - 1) x d-dimensional and b is (d - 1) x 1-dimensional.

Return type:

Tuple[Tensor, Tensor]
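
One way to build such a system, sketched in pure Python with nested lists in place of tensors: row i carries +1 at position i and -1 at position i + 1, so that row i of A @ x < 0 reads x[i] - x[i + 1] < 0. This particular sign layout, the zero right-hand side, and the helper names are assumptions chosen to satisfy the stated shapes, not a transcription of the library code.

```python
def monotonicity_constraints_sketch(d, descending=False):
    """Return (A, b) with A of shape (d - 1) x d and b of shape (d - 1) x 1."""
    sign = -1.0 if descending else 1.0
    A = [[0.0] * d for _ in range(d - 1)]
    for i in range(d - 1):
        A[i][i] = sign        # coefficient of x[i]
        A[i][i + 1] = -sign   # coefficient of x[i + 1]
    b = [[0.0] for _ in range(d - 1)]
    return A, b


def satisfies(A, b, x):
    """Check the strict inequalities A @ x < b row by row."""
    return all(
        sum(a * v for a, v in zip(row, x)) < rhs[0] for row, rhs in zip(A, b)
    )


A_up, b_up = monotonicity_constraints_sketch(3)
A_down, b_down = monotonicity_constraints_sketch(3, descending=True)
increasing_ok = satisfies(A_up, b_up, [1.0, 2.0, 3.0])
unordered_ok = satisfies(A_up, b_up, [1.0, 3.0, 2.0])
decreasing_ok = satisfies(A_down, b_down, [3.0, 2.0, 1.0])
```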

Containers

Representations for different kinds of data.

class botorch.utils.containers.DenseContainer(values, event_shape)[source]

Bases: BotorchContainer

Basic representation of data stored as a dense Tensor.

Parameters:
  • values (Tensor)

  • event_shape (Size)

values: Tensor
event_shape: Size
property shape: Size
property device: device
property dtype: dtype
class botorch.utils.containers.SliceContainer(values, indices, event_shape)[source]

Bases: BotorchContainer

Represent data points formed by concatenating (n-1)-dimensional slices taken from the leading dimension of an n-dimensional source tensor.

Parameters:
  • values (Tensor)

  • indices (LongTensor)

  • event_shape (Size)

values: Tensor
indices: LongTensor
event_shape: Size
property shape: Size
property device: device
property dtype: dtype

Context Managers

Utilities for optimization.

class botorch.utils.context_managers.TensorCheckpoint(values, device, dtype)[source]

Bases: NamedTuple

Create new instance of TensorCheckpoint(values, device, dtype)

Parameters:
  • values (Tensor)

  • device (device | None)

  • dtype (dtype | None)

values: Tensor

Alias for field number 0

device: device | None

Alias for field number 1

dtype: dtype | None

Alias for field number 2

botorch.utils.context_managers.delattr_ctx(instance, *attrs, enforce_hasattr=False)[source]

Contextmanager for temporarily deleting attributes.

Parameters:
  • instance (object)

  • attrs (str)

  • enforce_hasattr (bool)

Return type:

Generator[None, None, None]
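
The pattern behind such a context manager can be sketched with contextlib. The name temporarily_delete is hypothetical, and the behavior of enforce_hasattr (raising when an attribute is missing) is an assumption, since it is not described above. The sketch only handles attributes stored on the instance itself.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_delete(instance, *attrs, enforce_hasattr=False):
    """Delete instance attributes on entry and restore them on exit."""
    saved = {}
    try:
        for name in attrs:
            if hasattr(instance, name):
                saved[name] = getattr(instance, name)
                delattr(instance, name)
            elif enforce_hasattr:
                # Assumed behavior: a missing attribute is an error.
                raise ValueError(f"Attribute {name!r} is missing")
        yield
    finally:
        # Restore whatever was removed, even if the body raised.
        for name, value in saved.items():
            setattr(instance, name, value)


class Box:
    """Toy object with a plain instance attribute."""

    def __init__(self):
        self.cache = [1, 2, 3]


box = Box()
with temporarily_delete(box, "cache"):
    present_inside = hasattr(box, "cache")
present_after = hasattr(box, "cache")
```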

botorch.utils.context_managers.parameter_rollback_ctx(parameters, checkpoint=None, **tkwargs)[source]

Contextmanager that exits by rolling back a dictionary of parameters.

Parameters:
  • parameters (Dict[str, Tensor]) – A dictionary of tensors, keyed by name, to be rolled back on exit.

  • checkpoint (Dict[str, TensorCheckpoint] | None) – Optional cache of values and tensor metadata specifying the rollback state for the parameters (or some subset thereof).

  • **tkwargs (Any) – Keyword arguments passed to torch.Tensor.to when copying data from each tensor in parameters to the internally created checkpoint. Only adhered to when the checkpoint argument is None.

Yields:

A dictionary of TensorCheckpoints for the provided parameters. Any in-place changes to the checkpoint will be observed at rollback time. If the checkpoint is cleared, no rollback will occur.

Return type:

Generator[Dict[str, TensorCheckpoint], None, None]

botorch.utils.context_managers.module_rollback_ctx(module, name_filter=None, checkpoint=None, **tkwargs)[source]

Contextmanager that exits by rolling back a module’s state_dict.

Parameters:
  • module (Module) – Module instance.

  • name_filter (Callable[[str], bool] | None) – Optional Boolean function used to filter items by name.

  • checkpoint (Dict[str, TensorCheckpoint] | None) – Optional cache of values and tensor metadata specifying the rollback state for the module (or some subset thereof).

  • **tkwargs (Any) – Keyword arguments passed to torch.Tensor.to when copying data from each tensor in module.state_dict() to the internally created checkpoint. Only adhered to when the checkpoint argument is None.

Yields:

A dictionary of TensorCheckpoints for the module’s state_dict. Any in-place changes to the checkpoint will be observed at rollback time. If the checkpoint is cleared, no rollback will occur.

Return type:

Generator[Dict[str, TensorCheckpoint], None, None]
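
The rollback semantics described for the yielded checkpoint (changes are undone on exit unless the checkpoint is cleared) can be illustrated with a small stand-in that uses a plain dictionary instead of a state_dict. The name rollback_ctx and the use of copy.deepcopy are assumptions made for this sketch.

```python
import copy
from contextlib import contextmanager


@contextmanager
def rollback_ctx(state, checkpoint=None):
    """Restore `state` from `checkpoint` on exit (hypothetical helper)."""
    if checkpoint is None:
        # Snapshot the current values so later edits can be undone.
        checkpoint = {key: copy.deepcopy(value) for key, value in state.items()}
    try:
        yield checkpoint
    finally:
        # An emptied checkpoint restores nothing, so changes are kept.
        for key, value in checkpoint.items():
            state[key] = copy.deepcopy(value)


state = {"weight": [1.0, 2.0]}

# Changes made inside the block are undone on exit.
with rollback_ctx(state):
    state["weight"][0] = 99.0
rolled_back = state["weight"]

# Clearing the checkpoint keeps the changes.
with rollback_ctx(state) as ckpt:
    state["weight"] = [5.0, 6.0]
    ckpt.clear()
kept = state["weight"]
```

A typical use of this pattern is to attempt an optimization step inside the block and clear the checkpoint only when the step succeeded.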

botorch.utils.context_managers.zero_grad_ctx(parameters, zero_on_enter=True, zero_on_exit=False)[source]
Parameters:
  • parameters (Dict[str, Tensor] | Iterable[Tensor])

  • zero_on_enter (bool)

  • zero_on_exit (bool)

Return type:

Generator[None, None, None]

Datasets

Representations for different kinds of datasets.

class botorch.utils.datasets.SupervisedDataset(X, Y, *, feature_names, outcome_names, Yvar=None, validate_init=True)[source]

Bases: object

Base class for datasets consisting of labelled pairs (X, Y) and an optional Yvar that stipulates observation variances so that Y[i] ~ N(f(X[i]), Yvar[i]).

Example:

X = torch.rand(16, 2)
Y = torch.rand(16, 1)
feature_names = ["learning_rate", "embedding_dim"]
outcome_names = ["neg training loss"]
A = SupervisedDataset(
    X=X,
    Y=Y,
    feature_names=feature_names,
    outcome_names=outcome_names,
)
B = SupervisedDataset(
    X=DenseContainer(X, event_shape=X.shape[-1:]),
    Y=DenseContainer(Y, event_shape=Y.shape[-1:]),
    feature_names=feature_names,
    outcome_names=outcome_names,
)
assert A == B

Constructs a SupervisedDataset.

Parameters:
  • X (Union[BotorchContainer, Tensor]) – A Tensor or BotorchContainer representing the input features.

  • Y (Union[BotorchContainer, Tensor]) – A Tensor or BotorchContainer representing the outcomes.

  • feature_names (List[str]) – A list of names of the features in X.

  • outcome_names (List[str]) – A list of names of the outcomes in Y.

  • Yvar (Union[BotorchContainer, Tensor, None]) – An optional Tensor or BotorchContainer representing the observation noise.

  • validate_init (bool) – If True, validates the input shapes.

property X: Tensor
property Y: Tensor
property Yvar: Tensor | None
class botorch.utils.datasets.FixedNoiseDataset(X, Y, Yvar, feature_names, outcome_names, validate_init=True)[source]

Bases: SupervisedDataset

A SupervisedDataset with an additional field Yvar that stipulates observation variances so that Y[i] ~ N(f(X[i]), Yvar[i]).

NOTE: This is deprecated. Use SupervisedDataset instead. Will be removed in a future release (~v0.11).

Initialize a FixedNoiseDataset – deprecated!

Parameters:
  • X (Union[BotorchContainer, Tensor])

  • Y (Union[BotorchContainer, Tensor])

  • Yvar (Union[BotorchContainer, Tensor])

  • feature_names (List[str])

  • outcome_names (List[str])

  • validate_init (bool)

class botorch.utils.datasets.RankingDataset(X, Y, feature_names, outcome_names, validate_init=True)[source]

Bases: SupervisedDataset

A SupervisedDataset whose labelled pairs (x, y) consist of m-ary combinations x ∈ Z^{m} of elements from a ground set Z = (z_1, …) and ranking vectors y ∈ {0, …, m - 1}^{m} with properties:

  1. Ranks start at zero, i.e. min(y) = 0.

  2. Sorted ranks are contiguous unless one or more ties are present.

  3. k ranks are skipped after a k-way tie.

Example:

X = SliceContainer(
    values=torch.rand(16, 2),
    indices=torch.stack([torch.randperm(16)[:3] for _ in range(8)]),
    event_shape=torch.Size([3 * 2]),
)
Y = DenseContainer(
    torch.stack([torch.randperm(3) for _ in range(8)]),
    event_shape=torch.Size([3])
)
feature_names = ["item_0", "item_1"]
outcome_names = ["ranking outcome"]
dataset = RankingDataset(
    X=X,
    Y=Y,
    feature_names=feature_names,
    outcome_names=outcome_names,
)

Construct a RankingDataset.

Parameters:
  • X (SliceContainer) – A SliceContainer representing the input features being ranked.

  • Y (Union[BotorchContainer, Tensor]) – A Tensor or BotorchContainer representing the rankings.

  • feature_names (List[str]) – A list of names of the features in X.

  • outcome_names (List[str]) – A list of names of the outcomes in Y.

  • validate_init (bool) – If True, validates the input shapes.

class botorch.utils.datasets.MultiTaskDataset(datasets, target_outcome_name, task_feature_index=None)[source]

Bases: SupervisedDataset

This is a multi-task dataset that is constructed from the datasets of individual tasks. It offers functionality to combine parts of individual datasets to construct the inputs necessary for the MultiTaskGP models.

The datasets of individual tasks are allowed to represent different sets of features. When there are heterogeneous feature sets, calling MultiTaskDataset.X will result in an error.

Construct a MultiTaskDataset.

Parameters:
  • datasets (List[SupervisedDataset]) – A list of the datasets of individual tasks. Each dataset is expected to contain data for only one outcome.

  • target_outcome_name (str) – Name of the target outcome to be modeled.

  • task_feature_index (Optional[int]) – If the task feature is included in the Xs of the individual datasets, this should be used to specify its index. If omitted, the task feature will be appended while concatenating Xs. If given, we sanity-check that the names of the task features match between all datasets.

classmethod from_joint_dataset(dataset, task_feature_index, target_task_value, outcome_names_per_task=None)[source]

Construct a MultiTaskDataset from a joint dataset that includes the data for all tasks with the task feature index.

This will break down the joint dataset into individual datasets by the value of the task feature. Each resulting dataset will have its outcome name set based on outcome_names_per_task, with the missing values defaulting to task_<task_feature> (except for the target task, which will retain the original outcome name from the dataset).

Parameters:
  • dataset (SupervisedDataset) – The joint dataset.

  • task_feature_index (int) – The column index of the task feature in dataset.X.

  • target_task_value (int) – The value of the task feature for the target task in the dataset. The data for the target task is filtered according to dataset.X[task_feature_index] == target_task_value.

  • outcome_names_per_task (Dict[int, str] | None) – Optional dictionary mapping task feature values to the outcome names for each task. If not provided, the auxiliary tasks will be named task_<task_feature> and the target task will retain the outcome name from the dataset.

Returns:

A MultiTaskDataset instance.

Return type:

MultiTaskDataset

property X: Tensor

Appends task features, if needed, and concatenates the Xs of datasets to produce the train_X expected by MultiTaskGP and subclasses.

If appending the task features, 0 is reserved for the target task and the remaining tasks are populated with 1, 2, …, len(datasets) - 1.
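
The task-index convention can be sketched as follows, with nested lists in place of tensors. Placing the task feature in the last column and concatenating in input order are assumptions of this sketch, and concat_with_task_feature is a hypothetical helper, not part of the library.

```python
def concat_with_task_feature(xs_by_outcome, target_outcome_name):
    """Append a task-id column and stack rows (hypothetical helper)."""
    # The target task gets id 0; the others get 1, 2, ... in input order.
    task_ids = {target_outcome_name: 0}
    for name in xs_by_outcome:
        if name != target_outcome_name:
            task_ids[name] = len(task_ids)

    rows = []
    for name, X in xs_by_outcome.items():
        for x in X:
            rows.append(list(x) + [float(task_ids[name])])
    return rows


xs = {
    "aux": [[0.1, 0.2]],
    "target": [[0.3, 0.4], [0.5, 0.6]],
}
train_X = concat_with_task_feature(xs, target_outcome_name="target")
```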

property Y: Tensor

Concatenates Ys of the datasets.

property Yvar: Tensor | None

Concatenates Yvars of the datasets if they exist.

get_dataset_without_task_feature(outcome_name)[source]

A helper for extracting the child datasets with their task features removed.

If the task feature index is None, the dataset will be returned as is.

Parameters:

outcome_name (str) – The outcome name for the dataset to extract.

Returns:

The dataset without the task feature.

Return type:

SupervisedDataset

class botorch.utils.datasets.ContextualDataset(datasets, parameter_decomposition, metric_decomposition=None)[source]

Bases: SupervisedDataset

This is a contextual dataset that is constructed from either a single dataset containing the overall outcome or a list of datasets that each correspond to a context breakdown.

Construct a ContextualDataset.

Parameters:
  • datasets (List[SupervisedDataset]) – A list of the datasets of individual tasks. Each dataset is expected to contain data for only one outcome.

  • parameter_decomposition (Dict[str, List[str]]) – Dict from context name to list of feature names corresponding to that context.

  • metric_decomposition (Optional[Dict[str, List[str]]]) – Context breakdown metrics. Keys are context names. Values are the lists of metric names belonging to the context: {'context1': ['m1_c1'], 'context2': ['m1_c2']}.

property X: Tensor
property Y: Tensor

Concatenates the Ys from the child datasets to create the Y expected by the LCEM model if there are multiple datasets, or returns the Y expected by the LCEA model if there is only one dataset.

property Yvar: Tensor

Concatenates the Yvars from the child datasets to create the Yvar expected by the LCEM model if there are multiple datasets, or returns the Yvar expected by the LCEA model if there is only one dataset.

Dispatcher

botorch.utils.dispatcher.type_bypassing_encoder(arg)[source]
Parameters:

arg (Any)

Return type:

Type

class botorch.utils.dispatcher.Dispatcher(name, doc=None, encoder=<class 'type'>)[source]

Bases: Dispatcher

Clearing house for multiple dispatch functionality. This class extends <multipledispatch.Dispatcher> by: (i) generalizing the argument encoding convention during method lookup, (ii) implementing __getitem__ as a dedicated method lookup function.

Parameters:
  • name (str) – A string identifier for the Dispatcher instance.

  • doc (Optional[str]) – A docstring for the multiply dispatched method(s).

  • encoder (Callable[Any, Type]) – A callable that individually transforms the arguments passed at runtime in order to construct the key used for method lookup as tuple(map(encoder, args)). Defaults to type.

dispatch(*types)[source]

Method lookup strategy. Checks for an exact match before traversing the set of registered methods according to the current ordering.

Parameters:

types (Type) – A tuple of types that gets compared with the signatures of registered methods to determine compatibility.

Returns:

The first method encountered with a matching signature.

Return type:

Callable

encode_args(args)[source]

Converts arguments into a tuple of types used during method lookup.

Parameters:

args (Any)

Return type:

Tuple[Type]
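
The lookup described above (encode the arguments, try an exact match, then fall back to the first compatible registered signature) can be sketched in a few lines. MiniDispatcher and pass_types_through are hypothetical names for illustration; the real class builds on multipledispatch and its ordering rules are more involved than the registration order used here.

```python
class MiniDispatcher:
    """Toy multiple-dispatch table keyed on encoded argument types."""

    def __init__(self, name, encoder=type):
        self.name = name
        self.encoder = encoder
        self.funcs = {}  # signature tuple -> function, in registration order

    def register(self, *types):
        def decorator(fn):
            self.funcs[types] = fn
            return fn
        return decorator

    def encode_args(self, args):
        # The lookup key is tuple(map(encoder, args)).
        return tuple(map(self.encoder, args))

    def dispatch(self, *types):
        # Exact match first, then the first compatible registered signature.
        if types in self.funcs:
            return self.funcs[types]
        for signature, fn in self.funcs.items():
            if len(signature) == len(types) and all(
                issubclass(t, s) for t, s in zip(types, signature)
            ):
                return fn
        raise NotImplementedError(f"{self.name}: no method for {types}")

    def __call__(self, *args):
        return self.dispatch(*self.encode_args(args))(*args)


def pass_types_through(arg):
    """Hypothetical encoder: leave classes as-is, encode instances by type."""
    return arg if isinstance(arg, type) else type(arg)


describe = MiniDispatcher("describe", encoder=pass_types_through)


@describe.register(int)
def _(value):
    return "int"


@describe.register(object)
def _(value):
    return "object"


results = [describe(3), describe(int), describe(True), describe("text")]
```

With a custom encoder, a class passed as an argument can select the same method as one of its instances, which is the kind of generalization item (i) above refers to.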

help(*args, **kwargs)[source]

Prints the retrieved method’s docstring.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

source(*args, **kwargs)[source]

Prints the retrieved method’s source types.

Return type:

None

property encoder: Callable[[Any], Type]
name
funcs
doc

Low-Rank Cholesky Update Utils

botorch.utils.low_rank.extract_batch_covar(mt_mvn)[source]

Extract a batched independent covariance matrix from an MTMVN.

Parameters:

mt_mvn (MultitaskMultivariateNormal) – A multi-task multivariate normal with a block diagonal covariance matrix.

Returns:

A lazy covariance matrix consisting of a batch of the blocks of the diagonal of the MultitaskMultivariateNormal.

Return type:

LinearOperator

botorch.utils.low_rank.sample_cached_cholesky(posterior, baseline_L, q, base_samples, sample_shape, max_tries=6)[source]

Get posterior samples at the q new points from the joint multi-output posterior.

Parameters:
  • posterior (GPyTorchPosterior) – The joint posterior is over (X_baseline, X).

  • baseline_L (Tensor) – The baseline lower triangular cholesky factor.

  • q (int) – The number of new points in X.

  • base_samples (Tensor) – The base samples.

  • sample_shape (Size) – The sample shape.

  • max_tries (int) – The number of tries for computing the Cholesky decomposition with increasing jitter.

Returns:

A sample_shape x batch_shape x q x m-dim tensor of posterior samples at the new points.

Return type:

Tensor

Multi-Task Distribution Utils

Helpers for multitask modeling.

botorch.utils.multitask.separate_mtmvn(mvn)[source]

Separate an MTMVN into a list of MVNs, where covariances across data within each task are preserved, while covariances across tasks are dropped.

Parameters:

mvn (MultitaskMultivariateNormal)

Return type:

List[MultivariateNormal]

Objective

Helpers for handling objectives.

botorch.utils.objective.get_objective_weights_transform(weights)[source]

Create a linear objective callable from a set of weights.

Create a callable mapping a Tensor of size b x q x m and an (optional) Tensor of size b x q x d to a Tensor of size b x q, where m is the number of outputs of the model using scalarization via the objective weights. This callable supports broadcasting (e.g. for calling on a tensor of shape mc_samples x b x q x m). For m = 1, the objective weight is used to determine the optimization direction.

Parameters:

weights (Tensor | None) – a 1-dimensional Tensor containing a weight for each task. If not provided, the identity mapping is used.

Returns:

Transform function using the objective weights.

Return type:

Callable[[Tensor, Tensor | None], Tensor]

Example

>>> weights = torch.tensor([0.75, 0.25])
>>> transform = get_objective_weights_transform(weights)
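
As a rough illustration of the scalarization, the sketch below applies the weights to a single list of m outputs. The helper make_weights_transform is hypothetical and ignores batching and broadcasting entirely.

```python
def make_weights_transform(weights=None):
    """Return a callable mapping m outputs to one scalar (hypothetical helper)."""
    if weights is None:
        # No weights: pass the outputs through unchanged.
        return lambda y, X=None: y
    return lambda y, X=None: sum(w * v for w, v in zip(weights, y))


transform = make_weights_transform([0.75, 0.25])
scalarized = transform([2.0, 4.0])  # 0.75 * 2.0 + 0.25 * 4.0

# With a single output, a negative weight flips the optimization direction.
flip = make_weights_transform([-1.0])
flipped = flip([3.0])
```
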
botorch.utils.objective.apply_constraints_nonnegative_soft(obj, constraints, samples, eta)[source]

Applies constraints to a non-negative objective.

This function uses a sigmoid approximation to an indicator function for each constraint.

Parameters:
  • obj (Tensor) – A n_samples x b x q (x m’)-dim Tensor of objective values.

  • constraints (List[Callable[[Tensor], Tensor]]) – A list of callables, each mapping a Tensor of size b x q x m to a Tensor of size b x q, where negative values imply feasibility. This callable must support broadcasting. Only relevant for multi-output models (m > 1).

  • samples (Tensor) – A n_samples x b x q x m Tensor of samples drawn from the posterior.

  • eta (Tensor | float) – The temperature parameter for the sigmoid function. Can be either a float or a 1-dim tensor. In case of a float the same eta is used for every constraint in constraints. In case of a tensor the length of the tensor must match the number of provided constraints. The i-th constraint is then estimated with the i-th eta value.

Returns:

A n_samples x b x q (x m’)-dim tensor of feasibility-weighted objectives.

Return type:

Tensor

botorch.utils.objective.compute_feasibility_indicator(constraints, samples, marginalize_dim=None)[source]

Computes the feasibility of a list of constraints given posterior samples.

Parameters:
  • constraints (List[Callable[[Tensor], Tensor]] | None) – A list of callables, each mapping a batch_shape x q x m-dim Tensor to a batch_shape x q-dim Tensor, where negative values imply feasibility.

  • samples (Tensor) – A batch_shape x q x m-dim Tensor of posterior samples.

  • marginalize_dim (int | None) – A batch dimension that should be marginalized. For example, this is useful when using a batched fully Bayesian model.

Returns:

A batch_shape x q-dim tensor of Boolean feasibility values.

Return type:

Tensor

botorch.utils.objective.compute_smoothed_feasibility_indicator(constraints, samples, eta, log=False, fat=False)[source]

Computes the smoothed feasibility indicator of a list of constraints.

Given posterior samples, this function uses a sigmoid to smoothly approximate the feasibility indicator of each individual constraint, which ensures differentiability and a strong gradient signal. The fat and log options improve the numerical behavior of the smooth approximation.

NOTE: Negative constraint values are associated with feasibility.

Parameters:
  • constraints (List[Callable[[Tensor], Tensor]]) – A list of callables, each mapping a Tensor of size b x q x m to a Tensor of size b x q, where negative values imply feasibility. This callable must support broadcasting. Only relevant for multi-output models (m > 1).

  • samples (Tensor) – A n_samples x b x q x m Tensor of samples drawn from the posterior.

  • eta (Tensor | float) – The temperature parameter for the sigmoid function. Can be either a float or a 1-dim tensor. In case of a float the same eta is used for every constraint in constraints. In case of a tensor the length of the tensor must match the number of provided constraints. The i-th constraint is then estimated with the i-th eta value.

  • log (bool) – Toggles the computation of the log-feasibility indicator.

  • fat (bool) – Toggles the computation of the fat-tailed feasibility indicator.

Returns:

A n_samples x b x q-dim tensor of feasibility indicator values.

Return type:

Tensor

botorch.utils.objective.apply_constraints(obj, constraints, samples, infeasible_cost, eta=0.001)[source]

Apply constraints using an infeasible_cost M for negative objectives.

This allows feasibility-weighting an objective for the case where the objective can be negative by using the following strategy: (1) Add M to make obj non-negative; (2) Apply constraints using the sigmoid approximation; (3) Shift by -M.

Parameters:
  • obj (Tensor) – A n_samples x b x q (x m’)-dim Tensor of objective values.

  • constraints (List[Callable[[Tensor], Tensor]]) – A list of callables, each mapping a Tensor of size b x q x m to a Tensor of size b x q, where negative values imply feasibility. This callable must support broadcasting. Only relevant for multi-output models (m > 1).

  • samples (Tensor) – A n_samples x b x q x m Tensor of samples drawn from the posterior.

  • infeasible_cost (float) – The infeasible value.

  • eta (Tensor | float) – The temperature parameter of the sigmoid function. Can be either a float or a 1-dim tensor. In case of a float the same eta is used for every constraint in constraints. In case of a tensor the length of the tensor must match the number of provided constraints. The i-th constraint is then estimated with the i-th eta value.

Returns:

A n_samples x b x q (x m’)-dim tensor of feasibility-weighted objectives.

Return type:

Tensor
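
The three-step strategy can be traced on scalars. The sketch below assumes the soft feasibility weight of each constraint is sigmoid(-c / eta), multiplied across constraints, and treats M as a non-negative offset exactly as in the description above; the function names are hypothetical and the sign convention of the actual infeasible_cost argument is not asserted here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_feasibility(constraint_values, eta):
    """Product of sigmoid(-c / eta) over constraints (assumed form)."""
    weight = 1.0
    for c in constraint_values:
        weight *= sigmoid(-c / eta)
    return weight


def apply_constraints_sketch(obj, constraint_values, M, eta=1e-3):
    """Follow the three steps: shift by M, weight, shift back by -M."""
    shifted = obj + M  # (1) assumed non-negative for a large enough M
    weighted = shifted * soft_feasibility(constraint_values, eta)  # (2)
    return weighted - M  # (3)


feasible = apply_constraints_sketch(obj=-2.0, constraint_values=[-1.0], M=5.0)
infeasible = apply_constraints_sketch(obj=-2.0, constraint_values=[1.0], M=5.0)
```

A clearly feasible point keeps its objective value, while a clearly infeasible one is pushed toward -M, which is why M needs to be large enough to make obj + M non-negative.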

Rounding

Discretization (rounding) functions for acquisition optimization.

References

[Daulton2022bopr]

S. Daulton, X. Wan, D. Eriksson, M. Balandat, M. A. Osborne, E. Bakshy. Bayesian Optimization over Discrete and Mixed Spaces via Probabilistic Reparameterization. Advances in Neural Information Processing Systems 35, 2022.

botorch.utils.rounding.approximate_round(X, tau=0.001)[source]

Differentiable approximate rounding function.

This method is a piecewise approximation of a rounding function where each piece is a hyperbolic tangent function.

Parameters:
  • X (Tensor) – The tensor to round to the nearest integer (element-wise).

  • tau (float) – A temperature hyperparameter.

Returns:

The approximately rounded input tensor.

Return type:

Tensor
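
A scalar sketch of one such tanh-based piece is given below. The exact formula (a tanh centered at the midpoint between consecutive integers and sharpened by tau) is an assumption about how the pieces are constructed, and approximate_round_sketch is a hypothetical name.

```python
import math


def approximate_round_sketch(x, tau=1e-3):
    """Smooth stand-in for round(x) built from a scaled tanh (assumed form)."""
    offset = math.floor(x)
    # Distance from the midpoint of [offset, offset + 1], scaled by tau.
    scaled = (x - offset - 0.5) / tau
    # tanh maps to (-1, 1); rescale to (0, 1) and add back the integer part.
    return offset + (math.tanh(scaled) + 1.0) / 2.0


low = approximate_round_sketch(2.3)
high = approximate_round_sketch(2.7)
mid = approximate_round_sketch(2.5)
```

Smaller values of tau make the transition at the midpoint steeper, at the cost of gradients that vanish almost everywhere else.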

class botorch.utils.rounding.IdentitySTEFunction(*args, **kwargs)[source]

Bases: Function

Base class for functions using straight through gradient estimators.

This class approximates the gradient with the identity function.

static backward(ctx, grad_output)[source]

Use a straight-through estimator for the gradient.

This uses the identity function.

Parameters:

grad_output (Tensor) – A tensor of gradients.

Returns:

The provided tensor.

Return type:

Tensor

class botorch.utils.rounding.RoundSTE(*args, **kwargs)[source]

Bases: IdentitySTEFunction

Round the input tensor and use a straight-through gradient estimator.

[Daulton2022bopr] proposes using this in acquisition optimization.

static forward(ctx, X)[source]

Round the input tensor element-wise.

Parameters:

X (Tensor) – The tensor to be rounded.

Returns:

A tensor where each element is rounded to the nearest integer.

Return type:

Tensor

class botorch.utils.rounding.OneHotArgmaxSTE(*args, **kwargs)[source]

Bases: IdentitySTEFunction

Discretize a continuous relaxation of a one-hot encoded categorical.

This returns a one-hot encoded categorical and uses a straight-through gradient estimator via an identity function.

[Daulton2022bopr] proposes using this in acquisition optimization.

static forward(ctx, X)[source]

Discretize the input tensor.

This applies an argmax along the last dimension of the input tensor and one-hot encodes the result.

Parameters:

X (Tensor) – The tensor to be rounded.

Returns:

A one-hot encoded tensor with a one at the position of the argmax along the last dimension.

Return type:

Tensor
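
The forward discretization can be sketched on plain lists as follows; the gradient side of the straight-through estimator is not shown. The helper one_hot_argmax is hypothetical, and breaking ties in favor of the first maximum is an assumption.

```python
def one_hot_argmax(row):
    """One-hot encode the position of the largest entry (first on ties)."""
    best = max(range(len(row)), key=lambda i: row[i])
    return [1.0 if i == best else 0.0 for i in range(len(row))]


# Two relaxed one-hot vectors over three categories.
relaxed = [[0.2, 0.7, 0.1], [0.5, 0.1, 0.4]]
discretized = [one_hot_argmax(row) for row in relaxed]
```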

Sampling

Utilities for MC and qMC sampling.

References

[Trikalinos2014polytope]

T. A. Trikalinos and G. van Valkenhoef. Efficient sampling from uniform density n-polytopes. Technical report, Brown University, 2014.

botorch.utils.sampling.manual_seed(seed=None)[source]

Contextmanager for manually setting the torch.random seed.

Parameters:

seed (int | None) – The seed to set the random number generator to.

Returns:

Generator

Return type:

Generator[None, None, None]

Example

>>> with manual_seed(1234):
...     X = torch.rand(3)
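
The same idea can be sketched with Python's random module standing in for torch.random. Restoring the previous generator state on exit is an assumption of this sketch, and seeded is a hypothetical name.

```python
import random
from contextlib import contextmanager


@contextmanager
def seeded(seed=None):
    """Seed `random` inside the block and restore the old state on exit."""
    old_state = random.getstate()
    try:
        if seed is not None:
            random.seed(seed)
        yield
    finally:
        if seed is not None:
            random.setstate(old_state)


with seeded(1234):
    first = random.random()
with seeded(1234):
    second = random.random()
```

Both blocks draw the same value, while code outside the blocks continues from the generator state it had before.
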
botorch.utils.sampling.draw_sobol_samples(bounds, n, q, batch_shape=None, seed=None)[source]

Draw qMC samples from the box defined by bounds.

Parameters:
  • bounds (Tensor) – A 2 x d dimensional tensor specifying box constraints on a d-dimensional space, where bounds[0, :] and bounds[1, :] correspond to lower and upper bounds, respectively.

  • n (int) – The number of (q-batch) samples. As a best practice, use powers of 2.

  • q (int) – The size of each q-batch.

  • batch_shape (Iterable[int] | Size | None) – The batch shape of the samples. If given, returns samples of shape n x batch_shape x q x d, where each batch is an n x q x d-dim tensor of qMC samples.

  • seed (int | None) – The seed used for initializing Owen scrambling. If None (default), use a random seed.

Returns:

A n x batch_shape x q x d-dim tensor of qMC samples from the box defined by bounds.

Return type:

Tensor

Example

>>> bounds = torch.stack([torch.zeros(3), torch.ones(3)])
>>> samples = draw_sobol_samples(bounds, 16, 2)
botorch.utils.sampling.draw_sobol_normal_samples(d, n, device=None, dtype=None, seed=None)[source]

Draw qMC samples from a multi-variate standard normal N(0, I_d).

A primary use-case for this functionality is to compute a QMC average of f(X) over X where each element of X is drawn N(0, 1).

Parameters:
  • d (int) – The dimension of the normal distribution.

  • n (int) – The number of samples to return. As a best practice, use powers of 2.

  • device (device | None) – The torch device.

  • dtype (dtype | None) – The torch dtype.

  • seed (int | None) – The seed used for initializing Owen scrambling. If None (default), use a random seed.

Returns:

A tensor of qMC standard normal samples with dimension n x d with device and dtype specified by the input.

Return type:

Tensor

Example

>>> samples = draw_sobol_normal_samples(2, 16)
botorch.utils.sampling.sample_hypersphere(d, n=1, qmc=False, seed=None, device=None, dtype=None)[source]

Sample uniformly from a unit d-sphere.

Parameters:
  • d (int) – The dimension of the hypersphere.

  • n (int) – The number of samples to return.

  • qmc (bool) – If True, use QMC Sobol sampling (instead of i.i.d. uniform).

  • seed (int | None) – If provided, use as a seed for the RNG.

  • device (device | None) – The torch device.

  • dtype (dtype | None) – The torch dtype.

Returns:

An n x d tensor of uniform samples from the d-hypersphere.

Return type:

Tensor

Example

>>> sample_hypersphere(d=5, n=10)
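
A standard way to obtain uniform samples on a sphere is to normalize standard normal vectors. The pure-Python sketch below shows that technique with i.i.d. draws only; whether the library uses exactly this construction is an assumption, and sample_hypersphere_sketch is a hypothetical name.

```python
import math
import random


def sample_hypersphere_sketch(d, n=1, seed=None):
    """Draw n points on the unit sphere in R^d by normalizing Gaussians."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in z))
        if norm > 0.0:  # guard against an all-zero draw
            samples.append([v / norm for v in z])
    return samples


points = sample_hypersphere_sketch(d=5, n=10, seed=0)
norms = [math.sqrt(sum(v * v for v in p)) for p in points]
```
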
botorch.utils.sampling.sample_simplex(d, n=1, qmc=False, seed=None, device=None, dtype=None)[source]

Sample uniformly from a d-simplex.

Parameters:
  • d (int) – The dimension of the simplex.

  • n (int) – The number of samples to return.

  • qmc (bool) – If True, use QMC Sobol sampling (instead of i.i.d. uniform).

  • seed (int | None) – If provided, use as a seed for the RNG.

  • device (device | None) – The torch device.

  • dtype (dtype | None) – The torch dtype.

Returns:

An n x d tensor of uniform samples from the d-simplex.

Return type:

Tensor

Example

>>> sample_simplex(d=3, n=10)
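
One classical construction for uniform samples on the simplex takes the spacings between sorted uniform draws. The sketch below uses that technique with i.i.d. draws; it is offered as an illustration under that assumption rather than as the library's implementation, and sample_simplex_sketch is a hypothetical name.

```python
import random


def sample_simplex_sketch(d, n=1, seed=None):
    """Draw n points with d non-negative coordinates that sum to one."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # d - 1 sorted uniform cut points split [0, 1] into d spacings.
        cuts = sorted(rng.random() for _ in range(d - 1))
        edges = [0.0] + cuts + [1.0]
        samples.append([hi - lo for lo, hi in zip(edges[:-1], edges[1:])])
    return samples


weights = sample_simplex_sketch(d=3, n=10, seed=0)
```
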
botorch.utils.sampling.sample_polytope(A, b, x0, n=10000, n0=100, n_thinning=1, seed=None)[source]

Hit-and-run sampler for uniformly sampling points from a polytope described via inequality constraints A*x<=b.

Parameters:
  • A (Tensor) – A m x d-dim Tensor describing inequality constraints so that all samples satisfy Ax <= b.

  • b (Tensor) – A m-dim Tensor describing the inequality constraints so that all samples satisfy Ax <= b.

  • x0 (Tensor) – A d-dim Tensor representing a starting point of the chain satisfying the constraints.

  • n (int) – The number of resulting samples kept in the output.

  • n0 (int) – The number of burn-in samples. The chain will produce n+n0 samples but the first n0 samples are not saved.

  • n_thinning (int) – The amount of thinning. This function will return every n_thinning-th sample from the chain (after burn-in).

  • seed (int | None) – The seed for the sampler. If omitted, use a random seed.

Returns:

(n, d) dim Tensor containing the resulting samples.

Return type:

Tensor
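
The hit-and-run idea can be sketched in pure Python: from the current point, pick a random direction, compute the chord of the polytope along that direction, and jump to a uniformly chosen point on it. The sketch assumes a bounded polytope and an interior starting point; the way burn-in and thinning are counted here is an assumption, and hit_and_run_sketch is a hypothetical name.

```python
import math
import random


def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def hit_and_run_sketch(A, b, x0, n=100, n0=10, n_thinning=1, seed=None):
    """Sample from {x : A x <= b}, assumed bounded, starting at interior x0."""
    rng = random.Random(seed)
    x, out = list(x0), []
    for step in range(n0 + n * n_thinning):
        # Pick a uniformly random direction on the unit sphere.
        r = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(_dot(r, r))
        r = [ri / norm for ri in r]
        # Find the interval of step sizes t with A (x + t r) <= b.
        t_min, t_max = -math.inf, math.inf
        for row, rhs in zip(A, b):
            ar, slack = _dot(row, r), rhs - _dot(row, x)
            if ar > 0:
                t_max = min(t_max, slack / ar)
            elif ar < 0:
                t_min = max(t_min, slack / ar)
        # Move to a uniformly drawn point on that chord.
        t = rng.uniform(t_min, t_max)
        x = [xi + t * ri for xi, ri in zip(x, r)]
        # Skip the burn-in, then keep every n_thinning-th state.
        if step >= n0 and (step - n0 + 1) % n_thinning == 0:
            out.append(list(x))
    return out


# Unit square 0 <= x <= 1 written as A x <= b.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 0.0, 1.0, 0.0]
samples = hit_and_run_sketch(A, b, x0=[0.5, 0.5], n=50, seed=0)
```

Consecutive states of the chain are correlated, which is the reason for discarding a burn-in prefix and thinning the remainder.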

botorch.utils.sampling.batched_multinomial(weights, num_samples, replacement=False, generator=None, out=None)[source]

Sample from multinomial with an arbitrary number of batch dimensions.

Parameters:
  • weights (Tensor) – A batch_shape x num_categories tensor of weights. For each batch index i, j, …, this function samples from a multinomial with input weights[i, j, …, :]. Note that the weights need not sum to one, but must be non-negative, finite and have a non-zero sum.

  • num_samples (int) – The number of samples to draw for each batch index. Must be smaller than num_categories if replacement=False.

  • replacement (bool) – If True, samples are drawn with replacement.

  • generator (Generator | None) – A pseudorandom number generator for sampling.

  • out (Tensor | None) – The output tensor (optional). If provided, must be of size batch_shape x num_samples.

Returns:

A batch_shape x num_samples tensor of samples.

Return type:

LongTensor

This is a thin wrapper around torch.multinomial that allows weight (input) tensors with an arbitrary number of batch dimensions (torch.multinomial only allows a single batch dimension). The calling signature is the same as for torch.multinomial.

Example

>>> weights = torch.rand(2, 3, 10)
>>> samples = batched_multinomial(weights, 4)  # shape is 2 x 3 x 4
botorch.utils.sampling.find_interior_point(A, b, A_eq=None, b_eq=None)[source]

Find an interior point of a polytope via linear programming.

Parameters:
  • A (ndarray) – A n_ineq x d-dim numpy array containing the coefficients of the constraint inequalities.

  • b (ndarray) – A n_ineq x 1-dim numpy array containing the right hand sides of the constraint inequalities.

  • A_eq (ndarray | None) – A n_eq x d-dim numpy array containing the coefficients of the constraint equalities.

  • b_eq (ndarray | None) – A n_eq x 1-dim numpy array containing the right hand sides of the constraint equalities.

Returns:

A d-dim numpy array containing an interior point of the polytope. This function will raise a ValueError if there is no such point.

Return type:

ndarray

This method solves the following Linear Program:

min -s subject to A @ x <= b - 2 * s, s >= 0, A_eq @ x = b_eq

In case the polytope is unbounded, then it will also constrain the slack variable s to s<=1.

class botorch.utils.sampling.HitAndRunPolytopeSampler(inequality_constraints=None, equality_constraints=None, bounds=None, interior_point=None, n_burnin=200, n_thinning=20, seed=None)[source]

Bases: PolytopeSampler

A sampler for sampling from a polytope using a hit-and-run algorithm.

Parameters:
  • inequality_constraints (Optional[Tuple[Tensor, Tensor]]) – Tensors (A, b) describing inequality constraints A @ x <= b, where A is a n_ineq_con x d-dim Tensor and b is a n_ineq_con x 1-dim Tensor, with n_ineq_con the number of inequalities and d the dimension of the sample space.

  • equality_constraints (Optional[Tuple[Tensor, Tensor]]) – Tensors (C, d) describing the equality constraints C @ x = d, where C is a n_eq_con x d-dim Tensor and d is a n_eq_con x 1-dim Tensor with n_eq_con the number of equalities.

  • bounds (Optional[Tensor]) – A 2 x d-dim tensor of box bounds, where inf (-inf) means that the respective dimension is unbounded from above (below). If omitted, no bounds (in addition to the above constraints) are applied.

  • interior_point (Optional[Tensor]) – A d x 1-dim Tensor representing a point in the (relative) interior of the polytope. If omitted, determined automatically by solving a Linear Program.

  • n_burnin (int) – The number of burn in samples. The sampler will discard n_burnin samples before returning the first sample.

  • n_thinning (int) – The amount of thinning. The sampler will return every n_thinning-th sample (after burn-in). This may need to be increased for sets of constraints that are difficult to satisfy (i.e., when the volume of the constraint polytope is small relative to that of its bounding box).

  • seed (Optional[int]) – The random seed.

draw(n=1)[source]

Draw samples from the polytope.

Parameters:

n (int) – The number of samples.

Returns:

A n x d Tensor of samples from the polytope.

Return type:

Tensor
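To make the roles of n_burnin and n_thinning concrete, here is a minimal plain-Python sketch of a generic hit-and-run chain on a bounded polytope. It is an illustration under stated assumptions, not the HitAndRunPolytopeSampler implementation: the function name hit_and_run is hypothetical, only inequality constraints are handled, b is a flat list, and the polytope is assumed to be bounded.

```python
import math
import random


def hit_and_run(A, b, x0, n, n_burnin=200, n_thinning=20, seed=None):
    """Draw n points from the bounded polytope {x : A @ x <= b}, starting at x0."""
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for step in range(1, n_burnin + n * n_thinning + 1):
        # Random direction, uniform on the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        # Find the segment {x + t * u} that satisfies every inequality.
        t_min, t_max = -math.inf, math.inf
        for row, rhs in zip(A, b):
            au = sum(a * v for a, v in zip(row, u))
            slack = rhs - sum(a * xi for a, xi in zip(row, x))
            if au > 1e-12:
                t_max = min(t_max, slack / au)
            elif au < -1e-12:
                t_min = max(t_min, slack / au)
        # Move to a uniformly chosen point on that segment.
        t = rng.uniform(t_min, t_max)
        x = [xi + t * ui for xi, ui in zip(x, u)]
        # Keep every n_thinning-th state once burn-in is over.
        if step > n_burnin and (step - n_burnin) % n_thinning == 0:
            samples.append(list(x))
    return samples


# Unit square 0 <= x1, x2 <= 1 with its center as the interior point.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 0.0, 1.0, 0.0]
samples = hit_and_run(A, b, x0=[0.5, 0.5], n=5, seed=0)
```

Consecutive states of the chain are correlated, which is why only every n_thinning-th state is returned and the first n_burnin states are discarded.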

class botorch.utils.sampling.DelaunayPolytopeSampler(inequality_constraints=None, equality_constraints=None, bounds=None, interior_point=None)[source]

Bases: PolytopeSampler

A polytope sampler using Delaunay triangulation.

This sampler first enumerates the vertices of the constraint polytope and then uses a Delaunay triangulation to tessellate its convex hull.

The sampling happens in two stages:
  1. First, we sample from the set of hypertriangles generated by the Delaunay triangulation (i.e. which hypertriangle to draw the sample from) with probabilities proportional to the hypertriangle volumes.

  2. Then, we sample uniformly from the chosen hypertriangle by sampling uniformly from the unit simplex of the appropriate dimension, and then computing the convex combination of the vertices of the hypertriangle according to that draw from the simplex.

The best reference (not exactly the same, but functionally equivalent) is [Trikalinos2014polytope]. A simple R implementation is available at https://github.com/gertvv/tesselample.
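The two sampling stages can be sketched in plain Python as follows. This is an illustration only: the triangulation is written out by hand instead of being computed from the polytope vertices, the function name sample_triangulation is hypothetical, and the uniform draw from the unit simplex uses normalized exponential variates, which is one standard way to obtain it.

```python
import random


def sample_triangulation(simplices, volumes, rng):
    """Draw one point uniformly from the union of the given simplices."""
    # Stage 1: choose a simplex with probability proportional to its volume.
    vertices = rng.choices(simplices, weights=volumes, k=1)[0]
    # Stage 2: uniform weights on the unit simplex (normalized Exp(1) draws),
    # then the matching convex combination of the simplex vertices.
    draws = [rng.expovariate(1.0) for _ in vertices]
    total = sum(draws)
    weights = [v / total for v in draws]
    dim = len(vertices[0])
    return [sum(w * v[j] for w, v in zip(weights, vertices)) for j in range(dim)]


# Unit square split by hand into two triangles of area 0.5 each.
triangles = [
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
]
areas = [0.5, 0.5]
rng = random.Random(0)
points = [sample_triangulation(triangles, areas, rng) for _ in range(100)]
```

Weighting stage 1 by volume is what makes the combined draw uniform over the whole polytope rather than uniform per simplex.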

Initialize DelaunayPolytopeSampler.

Parameters:
  • inequality_constraints (Optional[Tuple[Tensor, Tensor]]) – Tensors (A, b) describing inequality constraints A @ x <= b, where A is a n_ineq_con x d-dim Tensor and b is a n_ineq_con x 1-dim Tensor, with n_ineq_con the number of inequalities and d the dimension of the sample space.

  • equality_constraints (Optional[Tuple[Tensor, Tensor]]) – Tensors (C, d) describing the equality constraints C @ x = d, where C is a n_eq_con x d-dim Tensor and d is a n_eq_con x 1-dim Tensor with n_eq_con the number of equalities.

  • bounds (Optional[Tensor]) – A 2 x d-dim tensor of box bounds, where inf (-inf) means that the respective dimension is unbounded from above (below).

  • interior_point (Optional[Tensor]) – A d x 1-dim Tensor representing a point in the (relative) interior of the polytope. If omitted, determined automatically by solving a Linear Program.

Warning: The vertex enumeration performed in this algorithm can become extremely costly if there are a large number of inequalities. Similarly, the triangulation can get very expensive in high dimensions. Only use this algorithm for moderate dimensions / moderately complex constraint sets. An alternative is the HitAndRunPolytopeSampler.

draw(n=1, seed=None)[source]

Draw samples from the polytope.

Parameters:
  • n (int) – The number of samples.

  • seed (int | None) – The random seed.

Returns:

A n x d Tensor of samples from the polytope.

Return type:

Tensor

botorch.utils.sampling.normalize_sparse_linear_constraints(bounds, constraints)[source]

Normalize sparse linear constraints to the unit cube.

Parameters:
  • bounds (Tensor) – A 2 x d-dim tensor containing the box bounds.

  • constraints (List[Tuple[Tensor, Tensor, float]]) – A list of tuples (indices, coefficients, rhs), with each tuple encoding an (in)equality constraint of the form sum_i (X[indices[i]] * coefficients[i]) >= rhs or sum_i (X[indices[i]] * coefficients[i]) = rhs.

Return type:

List[Tuple[Tensor, Tensor, float]]

botorch.utils.sampling.normalize_dense_linear_constraints(bounds, constraints)[source]

Normalize dense linear constraints to the unit cube.

Parameters:
  • bounds (Tensor) – A 2 x d-dim tensor containing the box bounds.

  • constraints (Tuple[Tensor, Tensor]) – A tensor tuple (A, b) describing constraints A @ x (<)= b, where A is a n_con x d-dim Tensor and b is a n_con x 1-dim Tensor, with n_con the number of constraints and d the dimension of the sample space.

Returns:

A tensor tuple (A_nlz, b_nlz) of normalized constraints.

Return type:

Tuple[Tensor, Tensor]
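The normalization follows from substituting x = lower + (upper - lower) * z with z in the unit cube: A @ x <= b becomes (A * (upper - lower)) @ z <= b - A @ lower. The plain-Python sketch below applies that substitution. It is derived from the formula rather than taken from the BoTorch source, so details may differ; the function name normalize_dense is hypothetical, and b is kept flat for brevity.

```python
def normalize_dense(lower, upper, A, b):
    """Rewrite A @ x <= b in terms of z, where x = lower + (upper - lower) * z."""
    width = [u - l for l, u in zip(lower, upper)]
    # Each column of A is scaled by the width of the corresponding dimension.
    A_nlz = [[a * w for a, w in zip(row, width)] for row in A]
    # The right hand side absorbs the shift by the lower bounds.
    b_nlz = [rhs - sum(a * l for a, l in zip(row, lower)) for row, rhs in zip(A, b)]
    return A_nlz, b_nlz


lower, upper = [0.0, 2.0], [2.0, 6.0]
A, b = [[1.0, 1.0]], [6.0]  # x1 + x2 <= 6 in the original space
A_nlz, b_nlz = normalize_dense(lower, upper, A, b)  # [[2.0, 4.0]], [4.0]
```

For example, z = (0.5, 0.5) maps to x = (1, 4); the original constraint has slack 6 - 5 = 1 and the normalized one has slack 4 - 3 = 1, as expected.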

botorch.utils.sampling.get_polytope_samples(n, bounds, inequality_constraints=None, equality_constraints=None, seed=None, n_burnin=10000, n_thinning=32)[source]

Sample from polytope defined by box bounds and (in)equality constraints.

This uses a hit-and-run Markov chain sampler.

NOTE: Much of the functionality of this function has been moved into HitAndRunPolytopeSampler. If you want to draw samples repeatedly, use HitAndRunPolytopeSampler directly to avoid re-running the burn-in of the chain on every call. To do so, convert the sparse constraint format that get_polytope_samples expects to the dense constraint format that HitAndRunPolytopeSampler expects. This can be done via sparse_to_dense_constraints (but remember to adjust the constraints from the Ax >= b format expected here to the Ax <= b format expected by PolytopeSampler by multiplying both A and b by -1).

Parameters:
  • n (int) – The number of samples.

  • bounds (Tensor) – A 2 x d-dim tensor containing the box bounds.

  • inequality_constraints (List[Tuple[Tensor, Tensor, float]] | None) – A list of tuples (indices, coefficients, rhs), with each tuple encoding an inequality constraint of the form sum_i (X[indices[i]] * coefficients[i]) >= rhs.

  • equality_constraints (List[Tuple[Tensor, Tensor, float]] | None) – A list of tuples (indices, coefficients, rhs), with each tuple encoding an equality constraint of the form sum_i (X[indices[i]] * coefficients[i]) = rhs.

  • seed (int | None) – The random seed.

  • n_burnin (int) – The number of burn-in samples for the Markov chain sampler.

  • n_thinning (int) – The amount of thinning. This function will return every n_thinning-th sample from the chain (after burn-in).

Returns:

A n x d-dim tensor of samples.

Return type:

Tensor

botorch.utils.sampling.sparse_to_dense_constraints(d, constraints)[source]

Convert parameter constraints from a sparse format into a dense format.

This method converts sparse triples of the form (indices, coefficients, rhs) to constraints of the form Ax >= b or Ax = b.

Parameters:
  • d (int) – The input dimension.

  • constraints (List[Tuple[Tensor, Tensor, float]]) – A list of tuples (indices, coefficients, rhs), with each tuple encoding an (in)equality constraint of the form sum_i (X[indices[i]] * coefficients[i]) >= rhs or sum_i (X[indices[i]] * coefficients[i]) = rhs.

Returns:

A two-element tuple containing

  • A: A n_constraints x d-dim tensor of coefficients.

  • b: A n_constraints x 1-dim tensor of right hand sides.

Return type:

Tuple[Tensor, Tensor]
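The conversion amounts to scattering each triple's coefficients into one row of A. The following plain-Python sketch shows the idea with nested lists instead of tensors; the function name sparse_to_dense is hypothetical, and this is not the library implementation.

```python
def sparse_to_dense(d, constraints):
    """Expand (indices, coefficients, rhs) triples into rows of A and entries of b."""
    A, b = [], []
    for indices, coefficients, rhs in constraints:
        row = [0.0] * d
        for i, c in zip(indices, coefficients):
            row[i] = c
        A.append(row)
        b.append([rhs])  # keep b as an n_constraints x 1 column
    return A, b


# x0 + 2 * x2 >= 1 and -x1 >= -3 in a 3-dimensional space.
constraints = [([0, 2], [1.0, 2.0], 1.0), ([1], [-1.0], -3.0)]
A, b = sparse_to_dense(3, constraints)

# Multiply by -1 to move from the A @ x >= b convention to A @ x <= b.
A_le = [[-a for a in row] for row in A]
b_le = [[-v for v in row] for row in b]
```

The final sign flip is only needed for inequality constraints that are passed on to a consumer expecting the A @ x <= b convention, such as the polytope samplers above.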

botorch.utils.sampling.optimize_posterior_samples(paths, bounds, candidates=None, raw_samples=1024, num_restarts=20, maximize=True, **kwargs)[source]

Cheaply maximizes posterior samples by random querying followed by vanilla gradient descent on the best num_restarts points.

Parameters:
  • paths (SamplePath) – Random Fourier Feature-based sample paths from the GP.

  • bounds (Tensor) – The bounds on the search space.

  • candidates (Optional[Tensor]) – A priori good candidates (typically previous design points) which act as extra initial guesses for the optimization routine.

  • raw_samples (Optional[int]) – The number of random points at which to query the sample paths initially.

  • num_restarts (int) – The number of points selected for gradient-based optimization.

  • maximize (bool) – Boolean indicating whether to maximize or minimize.

  • kwargs (Any)

Returns:

A two-element tuple containing

  • X_opt: A num_optima x [batch_size] x d-dim tensor of optimal inputs x*.

  • f_opt: A num_optima x [batch_size] x 1-dim tensor of optimal outputs f*.

Return type:

Tuple[Tensor, Tensor]
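The strategy of random querying followed by gradient-based refinement of the best num_restarts points can be illustrated with a one-dimensional toy problem. The sketch below is plain Python and makes several assumptions that are not part of the BoTorch function: the objective f and its gradient are hand-written stand-ins for a sample path, and the function name, step size, and number of steps are hypothetical.

```python
import random


def maximize_with_restarts(f, grad, lower, upper, raw_samples=1024,
                           num_restarts=20, n_steps=100, step_size=0.05, seed=0):
    """Random search followed by projected gradient ascent from the best starts."""
    rng = random.Random(seed)
    # Query the function at random points and keep the best num_restarts of them.
    candidates = [rng.uniform(lower, upper) for _ in range(raw_samples)]
    starts = sorted(candidates, key=f, reverse=True)[:num_restarts]
    best_x = starts[0]
    for x in starts:
        for _ in range(n_steps):
            # Gradient step, clipped back into the bounds.
            x = min(max(x + step_size * grad(x), lower), upper)
        if f(x) > f(best_x):
            best_x = x
    return best_x, f(best_x)


def f(x):
    """Toy stand-in for a posterior sample path, with its maximum at x = 0.3."""
    return -(x - 0.3) ** 2


def grad(x):
    return -2.0 * (x - 0.3)


x_opt, f_opt = maximize_with_restarts(f, grad, lower=0.0, upper=1.0)
```

Minimization follows the same pattern with the sign of the objective flipped.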

Sampling from GP priors

class botorch.utils.gp_sampling.GPDraw(model, seed=None)[source]

Bases: Module

Convenience wrapper for sampling a function from a GP prior.

This wrapper implicitly defines the GP sample as a self-updating function by keeping track of the evaluated points and respective base samples used during the evaluation.

This does not yet support multi-output models.

Construct a GP function sampler.

Parameters:
  • model (Model) – The Model defining the GP prior.

  • seed (Optional[int])

property Xs: Tensor

A (batch_shape) x n_eval x d-dim tensor of locations at which the GP was evaluated (or None if the sample has never been evaluated).

property Ys: Tensor

A (batch_shape) x n_eval x d-dim tensor of associated function values (or None if the sample has never been evaluated).

forward(X)[source]

Evaluate the GP sample function at a set of points X.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of points

Returns:

The value of the GP sample at the n points.

Return type:

Tensor

class botorch.utils.gp_sampling.RandomFourierFeatures(kernel, input_dim, num_rff_features, sample_shape=None)[source]

Bases: Module

A class that represents Random Fourier Features.

Initialize RandomFourierFeatures.

Parameters:
  • kernel (Kernel) – The GP kernel.

  • input_dim (int) – The input dimension to the GP kernel.

  • num_rff_features (int) – The number of Fourier features.

  • sample_shape (Optional[torch.Size]) – The shape of a single sample. For a single-element torch.Size object, this is simply the number of RFF draws.

forward(X)[source]

Get Fourier basis features for the provided inputs.

Note that the right-most subset of the batch shape of X should be (sample_shape) x (kernel_batch_shape) if using either the sample_shape argument or a batched kernel. In other words, X should be of shape (added_batch_shape) x (sample_shape) x (kernel_batch_shape) x n x input_dim, where parentheses denote that the given batch shape can be empty. X can always be a tensor of shape n x input_dim, in which case broadcasting will take care of the batch shape. This will raise a ValueError if the batch shapes are not compatible.

Parameters:

X (Tensor) – Input tensor of shape (batch_shape) x n x input_dim.

Returns:

A Tensor of shape (batch_shape) x n x rff. If X does not have a batch_shape, the output batch_shape will be (sample_shape) x (kernel_batch_shape).

Return type:

Tensor
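As background on what such features compute, the sketch below builds random Fourier features for an RBF kernel in plain Python and checks that their inner product approximates the kernel value. It is a generic illustration and not the RandomFourierFeatures class: the function name make_rff is hypothetical, only an RBF kernel with unit output scale is covered, and batching is ignored.

```python
import math
import random


def make_rff(input_dim, num_rff_features, lengthscale=1.0, seed=0):
    """Random Fourier feature map for an RBF kernel with the given lengthscale."""
    rng = random.Random(seed)
    # Frequencies follow the kernel's spectral density; phases are uniform.
    weights = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(input_dim)]
               for _ in range(num_rff_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_rff_features)]
    scale = math.sqrt(2.0 / num_rff_features)

    def features(x):
        return [scale * math.cos(sum(w * xi for w, xi in zip(row, x)) + p)
                for row, p in zip(weights, phases)]

    return features


phi = make_rff(input_dim=2, num_rff_features=2000)
x, y = [0.1, 0.4], [0.3, -0.2]
# Inner product of the feature vectors versus the exact RBF kernel value.
approx = sum(a * c for a, c in zip(phi(x), phi(y)))
exact = math.exp(-sum((a - c) ** 2 for a, c in zip(x, y)) / 2.0)
```

The approximation error shrinks roughly like one over the square root of the number of features, which is the trade-off controlled by num_rff_features.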

botorch.utils.gp_sampling.get_deterministic_model_multi_samples(weights, bases)[source]

Get a batched deterministic model that batch evaluates n_samples function samples. This supports multi-output models as well.

Parameters:
  • weights (List[Tensor]) – A list of weights with num_outputs elements. Each weight is of shape (batch_shape_input) x n_samples x num_rff_features, where (batch_shape_input) is the batch shape of the inputs used to obtain the posterior weights.

  • bases (List[RandomFourierFeatures]) – A list of RandomFourierFeatures with num_outputs elements. Each basis has a sample shape of n_samples.

  • n_samples – The number of function samples.

Returns:

A batched GenericDeterministicModel that batch evaluates n_samples function samples.

Return type:

GenericDeterministicModel

botorch.utils.gp_sampling.get_eval_gp_sample_callable(w, basis)[source]

Parameters:
  • w

  • basis

Return type:

Tensor

botorch.utils.gp_sampling.get_deterministic_model(weights, bases)[source]

Get a deterministic model using the provided weights and bases for each output.

Parameters:
  • weights (List[Tensor]) – A list of weights with m elements.

  • bases (List[RandomFourierFeatures]) – A list of RandomFourierFeatures with m elements.

Returns:

A deterministic model.

Return type:

GenericDeterministicModel

botorch.utils.gp_sampling.get_deterministic_model_list(weights, bases)[source]

Get a deterministic model list using the provided weights and bases for each output.

Parameters:
  • weights (List[Tensor]) – A list of weights with m elements.

  • bases (List[RandomFourierFeatures]) – A list of RandomFourierFeatures with m elements.

Returns:

A deterministic model list.

Return type:

ModelList

botorch.utils.gp_sampling.get_weights_posterior(X, y, sigma_sq)[source]

Sample Bayesian linear regression weights.

Parameters:
  • X (Tensor) – A tensor of inputs with shape (*batch_shape, n, num_rff_features).

  • y (Tensor) – A tensor of outcomes with shape (*batch_shape, n).

  • sigma_sq (Tensor) – The likelihood noise variance. This should be a tensor with shape kernel_batch_shape, 1, 1 if using a batched kernel. Otherwise, it should be a scalar tensor.

Returns:

The posterior distribution over the weights.

Return type:

MultivariateNormal
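For intuition, the posterior of Bayesian linear regression has a closed form. Assuming a standard normal prior on the weights (an assumption of this sketch, not something stated above), the posterior mean is (X^T X + sigma_sq * I)^-1 X^T y and the posterior covariance is sigma_sq * (X^T X + sigma_sq * I)^-1. The plain-Python sketch below works through the special case of a single feature, where these reduce to scalars; the function name weight_posterior_1d is hypothetical.

```python
def weight_posterior_1d(x, y, sigma_sq):
    """Posterior mean and variance of w in y_i = w * x_i + noise, prior w ~ N(0, 1)."""
    a = sum(xi * xi for xi in x) + sigma_sq  # scalar version of X^T X + sigma_sq * I
    mean = sum(xi * yi for xi, yi in zip(x, y)) / a
    variance = sigma_sq / a
    return mean, variance


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # generated by w = 2 without noise
mean, variance = weight_posterior_1d(x, y, sigma_sq=0.01)
```

With little noise and informative data the posterior mean is close to 2 and the variance is small; the prior pulls the mean slightly toward zero.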

botorch.utils.gp_sampling.get_gp_samples(model, num_outputs, n_samples, num_rff_features=512)[source]

Sample functions from GP posterior using RFFs. The returned GenericDeterministicModel effectively wraps num_outputs models, each of which has a batch shape of n_samples. Refer to get_deterministic_model_multi_samples for more details.

NOTE: If using input / outcome transforms, the gp samples must be accessed via the gp_sample.posterior(X) call. Otherwise, gp_sample(X) will produce bogus values that do not agree with the underlying model. It is also highly recommended to use outcome transforms to standardize the input data, since the gp samples do not work well when training outcomes are not zero-mean.

Parameters:
  • model (Model) – The model.

  • num_outputs (int) – The number of outputs.

  • n_samples (int) – The number of functions to be sampled IID.

  • num_rff_features (int) – The number of random Fourier features.

Returns:

A GenericDeterministicModel that evaluates n_samples sampled functions. If n_samples > 1, this will be a batched model.

Return type:

GenericDeterministicModel

Testing

class botorch.utils.testing.BotorchTestCase(methodName='runTest')[source]

Bases: TestCase

Basic test case for Botorch.

This
  1. sets the default device to be torch.device("cpu")

  2. ensures that no warnings are suppressed by default.

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

device = device(type='cpu')
setUp(suppress_input_warnings=True)[source]

Hook method for setting up the test fixture before exercising it.

Parameters:

suppress_input_warnings (bool)

Return type:

None

assertAllClose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False)[source]

Calls torch.testing.assert_close, using the signature and default behavior of torch.allclose.

Example output:

AssertionError: Scalars are not close!

Absolute difference: 1.0000034868717194 (up to 0.0001 allowed)

Relative difference: 0.8348668001940709 (up to 1e-05 allowed)

Parameters:
  • input (Any)

  • other (Any)

  • rtol (float)

  • atol (float)

  • equal_nan (bool)

Return type:

None
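The default behavior of torch.allclose that this method follows is the elementwise check |input - other| <= atol + rtol * |other|. A scalar, plain-Python version of that check is sketched below; the function name is_close is hypothetical, and NaN handling (equal_nan) is left out.

```python
def is_close(value, other, rtol=1e-05, atol=1e-08):
    """Scalar form of the closeness check: |value - other| <= atol + rtol * |other|."""
    return abs(value - other) <= atol + rtol * abs(other)


close = is_close(1.0, 1.0 + 1e-9)        # True: within the default tolerances
not_close = is_close(1.0, 1.1)           # False: off by 0.1
loose = is_close(1.0, 1.1, atol=0.2)     # True once atol is relaxed
```

Note that the check is not symmetric in its two arguments, because the relative tolerance is scaled by the second one.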

class botorch.utils.testing.BaseTestProblemTestCaseMixIn[source]

Bases: object

test_forward()[source]
abstract property functions: Sequence[BaseTestProblem]
class botorch.utils.testing.SyntheticTestFunctionTestCaseMixin[source]

Bases: object

test_optimal_value()[source]
test_optimizer()[source]
class botorch.utils.testing.MultiObjectiveTestProblemTestCaseMixin[source]

Bases: object

test_attributes()[source]
test_max_hv()[source]
test_ref_point()[source]
class botorch.utils.testing.ConstrainedTestProblemTestCaseMixin[source]

Bases: object

test_num_constraints()[source]
test_evaluate_slack()[source]
class botorch.utils.testing.MockPosterior(mean=None, variance=None, samples=None, base_shape=None, batch_range=None)[source]

Bases: Posterior

Mock object that implements dummy methods and feeds through specified outputs

Parameters:
  • mean – The mean of the posterior.

  • variance – The variance of the posterior.

  • samples – Samples to return from rsample, unless base_samples is provided.

  • base_shape – If given, this is returned as base_sample_shape, and also used as the base of the _extended_shape.

  • batch_range – If given, this is returned as batch_range. Defaults to (0, -2).

property device: device

The torch device of the distribution.

property dtype: dtype

The torch dtype of the distribution.

property batch_shape: Size
property base_sample_shape: Size

The base shape of the base samples expected in rsample.

Informs the sampler to produce base samples of shape sample_shape x base_sample_shape.

property batch_range: Tuple[int, int]

The t-batch range.

This is used in samplers to identify the t-batch component of the base_sample_shape. The base samples are expanded over the t-batches to provide consistency in the acquisition values, i.e., to ensure that a candidate produces the same value regardless of its position in the t-batch.

property mean
property variance
rsample(sample_shape=None)[source]

Mock sample by repeating self._samples. If base_samples is provided, do a shape check but return the same mock samples.

Parameters:

sample_shape (Size | None)

Return type:

Tensor

rsample_from_base_samples(sample_shape, base_samples)[source]

Sample from the posterior (with gradients) using base samples.

This is intended to be used with a sampler that produces the corresponding base samples, and enables acquisition optimization via Sample Average Approximation.

Parameters:
  • sample_shape (Size) – A torch.Size object specifying the sample shape. To draw n samples, set to torch.Size([n]). To draw b batches of n samples each, set to torch.Size([b, n]).

  • base_samples (Tensor) – The base samples, obtained from the appropriate sampler. This is a tensor of shape sample_shape x base_sample_shape.

Returns:

Samples from the posterior, a tensor of shape self._extended_shape(sample_shape=sample_shape).

Return type:

Tensor

class botorch.utils.testing.MockModel(posterior)[source]

Bases: Model, FantasizeMixin

Mock object that implements dummy methods and feeds through specified outputs

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:

posterior (MockPosterior)

posterior(X, output_indices=None, posterior_transform=None, observation_noise=False)[source]

Computes the posterior over model outputs at the provided points.

Note: The input transforms should be applied here using self.transform_inputs(X) after the self.eval() call and before any model.forward or model.likelihood calls.

Parameters:
  • X (Tensor) – A b x q x d-dim Tensor, where d is the dimension of the feature space, q is the number of points considered jointly, and b is the batch dimension.

  • output_indices (List[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.

  • observation_noise (bool) – For models with an inferred noise level, if True, include observation noise. For models with an observed noise level, this must be a model_batch_shape x 1 x m-dim tensor or a model_batch_shape x n’ x m-dim tensor containing the average noise for each batch and output. noise must be in the outcome-transformed space if an outcome transform is used.

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

Returns:

A Posterior object, representing a batch of b joint distributions over q points and m outputs each.

Return type:

MockPosterior

property num_outputs: int

The number of outputs of the model.

property batch_shape: Size

The batch shape of the model.

This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.

state_dict()[source]

Return a dictionary containing references to the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Parameters and buffers set to None are not included.

Note

The returned object is a shallow copy. It contains references to the module’s parameters and buffers.

Warning

Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.

Warning

Please avoid the use of argument destination as it is not designed for end-users.

Parameters:
  • destination (dict, optional) – If provided, the state of module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.

  • prefix (str, optional) – a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.

  • keep_vars (bool, optional) – by default the Tensors returned in the state dict are detached from autograd. If it’s set to True, detaching will not be performed. Default: False.

Returns:

a dictionary containing a whole state of the module

Return type:

dict

Example:

>>> # xdoctest: +SKIP("undefined vars")
>>> module.state_dict().keys()
['bias', 'weight']
load_state_dict(state_dict=None, strict=False)[source]

Copy parameters and buffers from state_dict into this module and its descendants.

If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Warning

If assign is True the optimizer must be created after the call to load_state_dict unless get_swap_module_params_on_conversion() is True.

Parameters:
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: False

  • assign (bool, optional) – When False, the properties of the tensors in the current module are preserved, while when True, the properties of the Tensors in the state dict are preserved. The only exception is the requires_grad field of Parameters, for which the value from the module is preserved. Default: False

Returns:

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type:

NamedTuple with missing_keys and unexpected_keys fields

Note

If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.

class botorch.utils.testing.MockAcquisitionFunction[source]

Bases: object

Mock acquisition function object that implements dummy methods.

set_X_pending(X_pending=None)[source]
Parameters:

X_pending (Tensor | None)

Test Helpers

Dummy classes and other helpers that are used in multiple test files should be defined here to avoid relative imports.

botorch.utils.test_helpers.get_sample_moments(samples, sample_shape)[source]

Computes the mean and covariance of a set of samples.

Parameters:
  • samples (Tensor) – A tensor of shape sample_shape x batch_shape x q.

  • sample_shape (Size) – The sample_shape input used while generating the samples using the pathwise sampling API.

Return type:

Tuple[Tensor, Tensor]

botorch.utils.test_helpers.standardize_moments(transform, loc, covariance_matrix)[source]

Standardizes the loc and covariance_matrix using the mean and standard deviations from a Standardize transform.

Parameters:
  • transform (Standardize)

  • loc (Tensor)

  • covariance_matrix (Tensor)

Return type:

Tuple[Tensor, Tensor]

botorch.utils.test_helpers.gen_multi_task_dataset(yvar=None, task_values=None, skip_task_features_in_datasets=False, **tkwargs)[source]

Constructs a multi-task dataset with two tasks, each with 10 data points.

Parameters:
  • yvar (float | None) – The noise level to use for train_Yvar. If None, uses train_Yvar=None.

  • task_values (List[int] | None) – The values of the task features. If None, uses [0, 1].

  • skip_task_features_in_datasets (bool) – If True, the task features are not included in Xs of the datasets used to construct the datasets. This is useful for testing MultiTaskDataset.

Return type:

Tuple[MultiTaskDataset, Tuple[Tensor, Tensor, Tensor | None]]

botorch.utils.test_helpers.get_pvar_expected(posterior, model, X, m)[source]

Computes the expected variance of a posterior after adding the predictive noise from the likelihood.

Parameters:
  • posterior (Posterior)

  • model (Model)

  • X (Tensor)

  • m (int)

Return type:

Tensor

Torch

class botorch.utils.torch.BufferDict(buffers=None)[source]

Bases: Module

Holds buffers in a dictionary.

BufferDict can be indexed like a regular Python dictionary, but buffers it contains are properly registered, and will be visible by all Module methods.

BufferDict is an ordered dictionary that respects

  • the order of insertion, and

  • in update(), the order of the merged OrderedDict or another BufferDict (the argument to update()).

Note that update() with other unordered mapping types (e.g., Python’s plain dict) does not preserve the order of the merged mapping.

Parameters:

buffers (iterable, optional) – a mapping (dictionary) of (string : Tensor) or an iterable of key-value pairs of type (string, Tensor)

Example:

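import torch
from torch import nn
from botorch.utils.torch import BufferDict

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Named `buffer_dict` because the name `buffers` would be shadowed
        # by the nn.Module.buffers() method.
        self.buffer_dict = BufferDict({
                'left': torch.randn(5, 10),
                'right': torch.randn(5, 10)
        })

    def forward(self, x, choice):
        # Look up a registered buffer by key and left-multiply x by it.
        x = self.buffer_dict[choice].mm(x)
        return x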

clear()[source]

Remove all items from the BufferDict.

pop(key)[source]

Remove key from the BufferDict and return its buffer.

Parameters:

key (string) – key to pop from the BufferDict

keys()[source]

Return an iterable of the BufferDict keys.

items()[source]

Return an iterable of the BufferDict key/value pairs.

values()[source]

Return an iterable of the BufferDict values.

update(buffers)[source]

Update the BufferDict with the key-value pairs from a mapping or an iterable, overwriting existing keys.

Note

If buffers is an OrderedDict, a BufferDict, or an iterable of key-value pairs, the order of new elements in it is preserved.

Parameters:

buffers (iterable) – a mapping (dictionary) from string to Tensor, or an iterable of key-value pairs of type (string, Tensor)

extra_repr()[source]

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

Transformations

Some basic data transformation helpers.

botorch.utils.transforms.standardize(Y)[source]

Standardizes (zero mean, unit variance) a tensor by dim=-2.

If the tensor is single-dimensional, simply standardizes the tensor. If for some batch index all elements are equal (or if there is only a single data point), this function will return 0 for that batch index.

Parameters:

Y (Tensor) – A batch_shape x n x m-dim tensor.

Returns:

The standardized Y.

Return type:

Tensor

Example

>>> Y = torch.rand(4, 3)
>>> Y_standardized = standardize(Y)
botorch.utils.transforms.normalize(X, bounds)[source]

Min-max normalize X w.r.t. the provided bounds.

NOTE: If the upper and lower bounds are identical for a dimension, that dimension will not be scaled. Such dimensions will only be shifted as new_X[…, i] = X[…, i] - bounds[0, i]. This avoids division by zero issues.

Parameters:
  • X (Tensor) – … x d tensor of data

  • bounds (Tensor) – 2 x d tensor of lower and upper bounds for each of the X’s d columns.

Returns:

A … x d-dim tensor of normalized data, given by (X - bounds[0]) / (bounds[1] - bounds[0]). If all elements of X are contained within bounds, the normalized values will be contained within [0, 1]^d.

Return type:

Tensor

Example

>>> X = torch.rand(4, 3)
>>> bounds = torch.stack([torch.zeros(3), 0.5 * torch.ones(3)])
>>> X_normalized = normalize(X, bounds)
botorch.utils.transforms.unnormalize(X, bounds)[source]

Un-normalizes X w.r.t. the provided bounds.

NOTE: If the upper and lower bounds are identical for a dimension, that dimension will not be scaled. Such dimensions will only be shifted as new_X[…, i] = X[…, i] + bounds[0, i], matching the behavior of normalize.

Parameters:
  • X (Tensor) – … x d tensor of data

  • bounds (Tensor) – 2 x d tensor of lower and upper bounds for each of the X’s d columns.

Returns:

A … x d-dim tensor of unnormalized data, given by X * (bounds[1] - bounds[0]) + bounds[0]. If all elements of X are contained in [0, 1]^d, the un-normalized values will be contained within bounds.

Return type:

Tensor

Example

>>> X_normalized = torch.rand(4, 3)
>>> bounds = torch.stack([torch.zeros(3), 0.5 * torch.ones(3)])
>>> X = unnormalize(X_normalized, bounds)
botorch.utils.transforms.normalize_indices(indices, d)[source]

Normalize a list of indices to ensure that they are non-negative.

Parameters:
  • indices (List[int] | None) – A list of indices (may contain negative indices for indexing “from the back”).

  • d (int) – The dimension of the tensor to index.

Returns:

A normalized list of indices such that each index is between 0 and d-1, or None if indices is None.

Return type:

List[int] | None
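The behavior described above can be sketched in a few lines of plain Python. This is an illustration, not the library code: the function name normalize_indices_sketch is hypothetical, and raising a ValueError for out-of-range indices is an assumption of the sketch.

```python
def normalize_indices_sketch(indices, d):
    """Map negative indices to their non-negative equivalents for a dimension of size d."""
    if indices is None:
        return None
    normalized = []
    for i in indices:
        if i < 0:
            i = i + d  # e.g. -1 refers to the last position, d - 1
        if i < 0 or i > d - 1:
            raise ValueError(f"Index {i} out of bounds for dimension {d}.")
        normalized.append(i)
    return normalized


result = normalize_indices_sketch([0, -1, -3], d=4)  # [0, 3, 1]
```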

botorch.utils.transforms.is_fully_bayesian(model)[source]

Check if at least one model is a fully Bayesian model.

Parameters:

model (Model) – A BoTorch model (may be a ModelList or ModelListGP)

Returns:

True if at least one model is a fully Bayesian model.

Return type:

bool

botorch.utils.transforms.is_ensemble(model)[source]

Check if at least one model is an ensemble model.

Parameters:

model (Model) – A BoTorch model (may be a ModelList or ModelListGP)

Returns:

True if at least one model is an ensemble model.

Return type:

bool

botorch.utils.transforms.t_batch_mode_transform(expected_q=None, assert_output_shape=True)[source]

Factory for decorators enabling consistent t-batch behavior.

This method creates decorators for instance methods to transform an input tensor X to t-batch mode (i.e. with at least 3 dimensions). This assumes the tensor has a q-batch dimension. The decorator also checks the q-batch size if expected_q is provided, and the output shape if assert_output_shape is True.

Parameters:
  • expected_q (Optional[int]) – The expected q-batch size of X. If specified, this will raise an AssertionError if X’s q-batch size does not equal expected_q.

  • assert_output_shape (bool) – If True, this will raise an AssertionError if the output shape does not match either the t-batch shape of X, or the acqf.model.batch_shape for acquisition functions using batched models.

Returns:

The decorated instance method.

Return type:

Callable[[Callable[[AcquisitionFunction, Any], Any]], Callable[[AcquisitionFunction, Any], Any]]

Example

>>> class ExampleClass:
>>>     @t_batch_mode_transform(expected_q=1)
>>>     def single_q_method(self, X):
>>>         ...
>>>
>>>     @t_batch_mode_transform()
>>>     def arbitrary_q_method(self, X):
>>>         ...
botorch.utils.transforms.concatenate_pending_points(method)[source]

Decorator concatenating X_pending into an acquisition function’s argument.

This decorator works on the forward method of acquisition functions taking a tensor X as the argument. If the acquisition function has an X_pending attribute (that is not None), this is concatenated into the input X, appropriately expanding the pending points to match the batch shape of X.

Example

>>> class ExampleAcquisitionFunction:
>>>     @concatenate_pending_points
>>>     @t_batch_mode_transform()
>>>     def forward(self, X):
>>>         ...
Parameters:

method (Callable[[Any, Tensor], Any])

Return type:

Callable[[Any, Tensor], Any]

botorch.utils.transforms.match_batch_shape(X, Y)[source]

Matches the batch dimension of a tensor to that of another tensor.

Parameters:
  • X (Tensor) – A batch_shape_X x q x d tensor, whose batch dimensions that correspond to batch dimensions of Y are to be matched to those (if compatible).

  • Y (Tensor) – A batch_shape_Y x q’ x d tensor.

Returns:

A batch_shape_Y x q x d tensor containing the data of X expanded to the batch dimensions of Y (if compatible). For instance, if X is b’’ x b’ x q x d and Y is b x q x d, then the returned tensor is b’’ x b x q x d.

Return type:

Tensor

Example

>>> X = torch.rand(2, 1, 5, 3)
>>> Y = torch.rand(2, 6, 4, 3)
>>> X_matched = match_batch_shape(X, Y)
>>> X_matched.shape
torch.Size([2, 6, 5, 3])
botorch.utils.transforms.convert_to_target_pre_hook(module, *args)[source]

Pre-hook for automatically calling .to(X) on module prior to forward.

Feasible Volume

botorch.utils.feasible_volume.get_feasible_samples(samples, inequality_constraints=None)[source]

Checks which of the samples satisfy all of the inequality constraints.

Parameters:
  • samples (Tensor) – A sample_size x d tensor of feature samples, where d is the feature dimension.

  • inequality_constraints (List[Tuple[Tensor, Tensor, float]] | None) – A list of tuples (indices, coefficients, rhs), with each tuple encoding an inequality constraint of the form sum_i (X[indices[i]] * coefficients[i]) >= rhs.

Returns:

2-element tuple containing

  • Samples satisfying the linear constraints.

  • Estimated proportion of samples satisfying the linear constraints.

Return type:

Tuple[Tensor, float]
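
The constraint encoding (indices, coefficients, rhs) can be illustrated with a small plain-Python sketch that works on lists rather than Tensors. The function name `feasible_samples_sketch` and the handling of an empty sample list are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# One constraint: sum_i (x[indices[i]] * coefficients[i]) >= rhs
Constraint = Tuple[Sequence[int], Sequence[float], float]


def feasible_samples_sketch(
    samples: List[List[float]], constraints: List[Constraint]
) -> Tuple[List[List[float]], float]:
    """Return the samples satisfying every constraint and their proportion."""
    feasible = []
    for x in samples:
        satisfied = all(
            sum(x[i] * c for i, c in zip(indices, coefficients)) >= rhs
            for indices, coefficients, rhs in constraints
        )
        if satisfied:
            feasible.append(x)
    # Assumption: an empty sample list yields a proportion of 0.0.
    proportion = len(feasible) / len(samples) if samples else 0.0
    return feasible, proportion


samples = [[0.2, 0.9], [0.6, 0.7], [0.9, 0.1]]
constraints = [([0, 1], [1.0, 1.0], 1.2)]  # encodes x[0] + x[1] >= 1.2
feasible, proportion = feasible_samples_sketch(samples, constraints)
```

Here only the second sample satisfies the constraint, so the estimated feasible proportion is one third.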

botorch.utils.feasible_volume.get_outcome_feasibility_probability(model, X, outcome_constraints, threshold=0.1, nsample_outcome=1000, seed=None)[source]

Monte Carlo estimate of the feasible volume with respect to the outcome constraints.

Parameters:
  • model (Model) – The model used for sampling the posterior.

  • X (Tensor) – A tensor of dimension batch-shape x 1 x d, where d is the feature dimension.

  • outcome_constraints (List[Callable[[Tensor], Tensor]]) – A list of callables, each mapping a Tensor of dimension sample_shape x batch-shape x q x m to a Tensor of dimension sample_shape x batch-shape x q, where negative values imply feasibility.

  • threshold (float) – A lower limit for the probability that posterior samples are feasible.

  • nsample_outcome (int) – The number of samples from the model posterior.

  • seed (int | None) – The seed for the posterior sampler. If omitted, use a random seed.

Returns:

Estimated proportion of features for which posterior samples satisfy given outcome constraints with probability above or equal to the given threshold.

Return type:

float

botorch.utils.feasible_volume.estimate_feasible_volume(bounds, model, outcome_constraints, inequality_constraints=None, nsample_feature=1000, nsample_outcome=1000, threshold=0.1, verbose=False, seed=None, device=None, dtype=None)[source]

Monte Carlo estimate of the feasible volume with respect to feature constraints and outcome constraints.

Parameters:
  • bounds (Tensor) – A 2 x d tensor of lower and upper bounds for each column of X.

  • model (Model) – The model used for sampling the outcomes.

  • outcome_constraints (List[Callable[[Tensor], Tensor]]) – A list of callables, each mapping a Tensor of dimension sample_shape x batch-shape x q x m to a Tensor of dimension sample_shape x batch-shape x q, where negative values imply feasibility.

  • inequality_constraints (List[Tuple[Tensor, Tensor, float]] | None) – A list of tuples (indices, coefficients, rhs), with each tuple encoding an inequality constraint of the form sum_i (X[indices[i]] * coefficients[i]) >= rhs.

  • nsample_feature (int) – The number of feature samples satisfying the bounds.

  • nsample_outcome (int) – The number of outcome samples from the model posterior.

  • threshold (float) – A lower limit for the probability of outcome feasibility.

  • seed (int | None) – The seed for both feature and outcome samplers. If omitted, use a random seed.

  • verbose (bool) – An indicator for whether to log the results.

  • device (device | None)

  • dtype (dtype | None)

Returns:

2-element tuple containing

  • Estimated proportion of volume in feature space that is feasible wrt the bounds and the inequality constraints (linear).

  • Estimated proportion of feasible features for which posterior samples (outcome) satisfy the outcome constraints with probability above the given threshold.

Types and Type Hints

class botorch.utils.types.DEFAULT

Bases: object

Constants

botorch.utils.constants.get_constants(values, device=None, dtype=None)[source]

Returns scalar-valued Tensors containing each of the given constants. Used to expedite tensor operations involving scalar arithmetic. Note that the returned Tensors should not be modified in-place.

Parameters:
  • values (Number | Iterator[Number])

  • device (device | None)

  • dtype (dtype | None)

Return type:

Tensor | Tuple[Tensor, …]

botorch.utils.constants.get_constants_like(values, ref)[source]
Parameters:
  • values (Number | Iterator[Number])

  • ref (Tensor)

Return type:

Tensor | Iterator[Tensor]

Safe Math

Special implementations of mathematical functions that solve numerical issues of naive implementations.

[Maechler2012accurate] (1,2)

M. Mächler. Accurately Computing log(1 - exp(-|a|)) Assessed by the Rmpfr package. Technical report, 2012.

botorch.utils.safe_math.exp(x, **kwargs)[source]
Parameters:

x (Tensor)

Return type:

Tensor

botorch.utils.safe_math.log(x, **kwargs)[source]
Parameters:

x (Tensor)

Return type:

Tensor

botorch.utils.safe_math.add(a, b, **kwargs)[source]
Parameters:
  • a (Tensor)

  • b (Tensor)

Return type:

Tensor

botorch.utils.safe_math.sub(a, b)[source]
Parameters:
  • a (Tensor)

  • b (Tensor)

Return type:

Tensor

botorch.utils.safe_math.div(a, b)[source]
Parameters:
  • a (Tensor)

  • b (Tensor)

Return type:

Tensor

botorch.utils.safe_math.mul(a, b)[source]
Parameters:
  • a (Tensor)

  • b (Tensor)

Return type:

Tensor

botorch.utils.safe_math.log1mexp(x)[source]

Numerically accurate evaluation of log(1 - exp(x)) for x < 0. See [Maechler2012accurate] for details.

Parameters:

x (Tensor)

Return type:

Tensor
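
A scalar sketch of the two-branch strategy discussed in [Maechler2012accurate] is given below. It uses only the standard library and is not the BoTorch implementation; the name `log1mexp_sketch` and the decision to raise on non-negative input are assumptions.

```python
import math


def log1mexp_sketch(x: float) -> float:
    """Evaluate log(1 - exp(x)) for x < 0 without catastrophic cancellation."""
    if x >= 0:
        # Assumption: reject inputs outside the domain instead of returning NaN.
        raise ValueError("log1mexp is only defined for x < 0.")
    if x > -math.log(2.0):
        # exp(x) is close to 1, so form 1 - exp(x) accurately via expm1.
        return math.log(-math.expm1(x))
    # exp(x) is small, so log1p keeps full precision.
    return math.log1p(-math.exp(x))


# The naive expression math.log(1 - math.exp(-1e-20)) fails because
# 1 - exp(-1e-20) rounds to 0; the sketch stays finite.
value_near_zero = log1mexp_sketch(-1e-20)
```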

botorch.utils.safe_math.log1pexp(x)[source]

Numerically accurate evaluation of log(1 + exp(x)). See [Maechler2012accurate] for details.

Parameters:

x (Tensor)

Return type:

Tensor
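
For scalars, the piecewise evaluation proposed in [Maechler2012accurate] can be sketched as follows. The cutoffs -37, 18, and 33.3 are the double-precision values from that report; whether BoTorch uses the same cutoffs is not stated here, so treat them as an assumption. The name `log1pexp_sketch` is hypothetical.

```python
import math


def log1pexp_sketch(x: float) -> float:
    """Evaluate log(1 + exp(x)) without overflow or loss of precision."""
    if x <= -37.0:
        # exp(x) is tiny, so log(1 + exp(x)) equals exp(x) to double precision.
        return math.exp(x)
    if x <= 18.0:
        return math.log1p(math.exp(x))
    if x <= 33.3:
        # log(1 + exp(x)) = x + log(1 + exp(-x)), and log1p(exp(-x)) ~ exp(-x).
        return x + math.exp(-x)
    # The correction term exp(-x) is below double-precision resolution.
    return x


# The naive math.log1p(math.exp(1000.0)) raises OverflowError; the sketch does not.
large = log1pexp_sketch(1000.0)
```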

botorch.utils.safe_math.logexpit(X)[source]

Computes the logarithm of the expit (a.k.a. sigmoid) function.

Parameters:

X (Tensor)

Return type:

Tensor

botorch.utils.safe_math.logplusexp(a, b)[source]

Computes log(exp(a) + exp(b)) similar to logsumexp.

Parameters:
  • a (Tensor)

  • b (Tensor)

Return type:

Tensor

botorch.utils.safe_math.logdiffexp(log_a, log_b)[source]

Computes log(b - a) accurately given log(a) and log(b). Assumes log_b > log_a, i.e. b > a > 0.

Parameters:
  • log_a (Tensor) – The logarithm of a, assumed to be less than log_b.

  • log_b (Tensor) – The logarithm of b, assumed to be larger than log_a.

Returns:

A Tensor of values corresponding to log(b - a).

Return type:

Tensor
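
One way to obtain log(b - a) from the logarithms alone is the identity log(b - a) = log_b + log(1 - exp(log_a - log_b)). The scalar sketch below applies that identity with a small log1mexp helper; it is an illustration under that assumption, not necessarily how BoTorch implements the function, and both function names are hypothetical.

```python
import math


def _log1mexp(x: float) -> float:
    """Helper: log(1 - exp(x)) for x < 0, split at -log(2)."""
    if x > -math.log(2.0):
        return math.log(-math.expm1(x))
    return math.log1p(-math.exp(x))


def logdiffexp_sketch(log_a: float, log_b: float) -> float:
    """Compute log(b - a) given log(a) and log(b), assuming log_b > log_a."""
    if not log_b > log_a:
        # Assumption: violating the precondition raises instead of returning NaN.
        raise ValueError("Requires log_b > log_a.")
    return log_b + _log1mexp(log_a - log_b)


# Works even when a and b themselves would overflow a double.
result = logdiffexp_sketch(1000.0, 1001.0)
```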

botorch.utils.safe_math.logsumexp(x, dim, keepdim=False)[source]

Version of logsumexp that has a well-behaved backward pass when x contains infinities.

In particular, the gradient of the standard torch version becomes NaN 1) for any element that is positive infinity, and 2) for any slice that only contains negative infinities.

This version returns a gradient of 1 for any positive infinities in case 1, and for all elements of the slice in case 2, in agreement with the asymptotic behavior of the function.

Parameters:
  • x (Tensor) – The Tensor to which to apply logsumexp.

  • dim (int | Tuple[int, ...]) – An integer or a tuple of integers, representing the dimensions to reduce.

  • keepdim (bool) – Whether to keep the reduced dimensions. Defaults to False.

Returns:

A Tensor representing the log of the summed exponentials of x.

Return type:

Tensor

botorch.utils.safe_math.logmeanexp(X, dim, keepdim=False)[source]

Computes log(mean(exp(X), dim=dim, keepdim=keepdim)).

Parameters:
  • X (Tensor) – Values of which to compute the logmeanexp.

  • dim (int | Tuple[int, ...]) – The dimension(s) over which to compute the mean.

  • keepdim (bool) – If True, keeps the reduced dimensions.

Returns:

A Tensor of values corresponding to log(mean(exp(X), dim=dim)).

Return type:

Tensor
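
Since mean(exp(X)) = sum(exp(X)) / n, the quantity equals logsumexp(X) - log(n). A list-based sketch of this identity, using a max-shifted logsumexp for stability, is shown below; the function names are hypothetical and no claim is made about the BoTorch internals.

```python
import math
from typing import Sequence


def logsumexp_sketch(xs: Sequence[float]) -> float:
    """Max-shifted log(sum(exp(xs)))."""
    m = max(xs)
    if math.isinf(m):
        # All entries are -inf, or at least one is +inf.
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logmeanexp_sketch(xs: Sequence[float]) -> float:
    """log(mean(exp(xs))) = logsumexp(xs) - log(len(xs))."""
    return logsumexp_sketch(xs) - math.log(len(xs))


# exp(1000.0) overflows a double, but the log-space mean is still 1000.
stable = logmeanexp_sketch([1000.0, 1000.0])
```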

botorch.utils.safe_math.log_softplus(x, tau=1.0)[source]

Computes the logarithm of the softplus function with high numerical accuracy.

Parameters:
  • x (Tensor) – Input tensor, should have single or double precision floats.

  • tau (float | Tensor) – Decreasing tau increases the tightness of the approximation to ReLU. Non-negative and defaults to 1.0.

Returns:

Tensor corresponding to log(softplus(x)).

Return type:

Tensor

botorch.utils.safe_math.smooth_amax(X, dim=-1, keepdim=False, tau=1.0)[source]

Computes a smooth approximation to max(X, dim=dim), i.e., the maximum value of X over dimension dim, using the logarithm of the l_(1/tau) norm of exp(X). Note that when X = log(U) is the logarithm of an acquisition utility U,

logsumexp(log(U) / tau) * tau = log(sum(U^(1/tau))^tau) = log(norm(U, ord=(1/tau)))

Parameters:
  • X (Tensor) – A Tensor from which to compute the smoothed amax.

  • dim (int | Tuple[int, ...]) – The dimensions to reduce over.

  • keepdim (bool) – If True, keeps the reduced dimensions.

  • tau (float | Tensor) – Temperature parameter controlling the smooth approximation to max operator, becomes tighter as tau goes to 0. Needs to be positive.

Returns:

A Tensor of smooth approximations to max(X, dim=dim).

Return type:

Tensor
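
The formula above, tau * logsumexp(X / tau), can be tried on a plain list to see how tau controls the tightness. This is a scalar sketch with hypothetical function names; the smooth minimum is obtained here by negation, which is an assumption about one natural construction rather than a statement about the library code.

```python
import math
from typing import Sequence


def smooth_amax_sketch(xs: Sequence[float], tau: float = 1.0) -> float:
    """tau * logsumexp(xs / tau): an upper bound on max(xs), tight as tau -> 0."""
    if tau <= 0:
        raise ValueError("tau needs to be positive.")
    scaled = [x / tau for x in xs]
    m = max(scaled)
    # Max-shifted logsumexp for numerical stability.
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return tau * lse


def smooth_amin_sketch(xs: Sequence[float], tau: float = 1.0) -> float:
    """Smooth minimum via min(xs) = -max(-xs)."""
    return -smooth_amax_sketch([-x for x in xs], tau)


xs = [1.0, 2.0, 3.0]
loose = smooth_amax_sketch(xs, tau=1.0)   # noticeably above 3
tight = smooth_amax_sketch(xs, tau=0.01)  # very close to 3
```

The approximation always lies between max(xs) and max(xs) + tau * log(n), which is why smaller tau gives a tighter, but less smooth, surrogate.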

botorch.utils.safe_math.smooth_amin(X, dim=-1, keepdim=False, tau=1.0)[source]

A smooth approximation to min(X, dim=dim), similar to smooth_amax.

Parameters:
  • X (Tensor)

  • dim (int | Tuple[int, ...])

  • keepdim (bool)

  • tau (float | Tensor)

Return type:

Tensor

botorch.utils.safe_math.check_dtype_float32_or_float64(X)[source]
Parameters:

X (Tensor)

Return type:

None

botorch.utils.safe_math.log_fatplus(x, tau=1.0)[source]

Computes the logarithm of the fat-tailed softplus.

NOTE: Separated out in case the complexity of the log implementation increases in the future.

Parameters:
  • x (Tensor)

  • tau (float | Tensor)

Return type:

Tensor

botorch.utils.safe_math.fatplus(x, tau=1.0)[source]

Computes a fat-tailed approximation to ReLU(x) = max(x, 0) by linearly combining a regular softplus function and the density function of a Cauchy distribution. The coefficient alpha of the Cauchy density is chosen to guarantee monotonicity and convexity.

Parameters:
  • x (Tensor) – A Tensor on whose values to compute the smoothed function.

  • tau (float | Tensor) – Temperature parameter controlling the smoothness of the approximation.

Returns:

A Tensor of values of the fat-tailed softplus.

Return type:

Tensor

botorch.utils.safe_math.fatmax(x, dim, keepdim=False, tau=1.0, alpha=2.0)[source]

Computes a smooth approximation to amax(X, dim=dim) with a fat tail.

Parameters:
  • x (Tensor) – A Tensor from which to compute the smoothed maximum.

  • dim (int | Tuple[int, ...]) – The dimensions to reduce over.

  • keepdim (bool) – If True, keeps the reduced dimensions.

  • tau (float | Tensor) – Temperature parameter controlling the smooth approximation to max operator, becomes tighter as tau goes to 0. Needs to be positive.

  • alpha (float) – The exponent of the asymptotic power decay of the approximation. The default value is 2. Higher alpha parameters make the function behave more similarly to the standard logsumexp approximation to the max, so it is recommended to keep this value low or moderate, e.g. < 10.

Returns:

A Tensor of smooth approximations to amax(X, dim=dim) with a fat tail.

Return type:

Tensor

botorch.utils.safe_math.fatmin(x, dim, keepdim=False, tau=1.0, alpha=2.0)[source]

Computes a smooth approximation to amin(X, dim=dim) with a fat tail.

Parameters:
  • x (Tensor) – A Tensor from which to compute the smoothed minimum.

  • dim (int | Tuple[int, ...]) – The dimensions to reduce over.

  • keepdim (bool) – If True, keeps the reduced dimensions.

  • tau (float | Tensor) – Temperature parameter controlling the smooth approximation to min operator, becomes tighter as tau goes to 0. Needs to be positive.

  • alpha (float) – The exponent of the asymptotic power decay of the approximation. The default value is 2. Higher alpha parameters make the function behave more similarly to the standard logsumexp approximation to the min, so it is recommended to keep this value low or moderate, e.g. < 10.

Returns:

A Tensor of smooth approximations to amin(X, dim=dim) with a fat tail.

Return type:

Tensor

botorch.utils.safe_math.fatmaximum(a, b, tau=1.0, alpha=2.0)[source]

Computes a smooth approximation to torch.maximum(a, b) with a fat tail.

Parameters:
  • a (Tensor) – The first Tensor from which to compute the smoothed component-wise maximum.

  • b (Tensor) – The second Tensor from which to compute the smoothed component-wise maximum.

  • tau (float | Tensor) – Temperature parameter controlling the smoothness of the approximation. A smaller tau corresponds to a tighter approximation that leads to a sharper objective landscape that might be more difficult to optimize.

  • alpha (float)

Returns:

A smooth approximation of torch.maximum(a, b).

Return type:

Tensor

botorch.utils.safe_math.fatminimum(a, b, tau=1.0, alpha=2.0)[source]

Computes a smooth approximation to torch.minimum(a, b) with a fat tail.

Parameters:
  • a (Tensor) – The first Tensor from which to compute the smoothed component-wise minimum.

  • b (Tensor) – The second Tensor from which to compute the smoothed component-wise minimum.

  • tau (float | Tensor) – Temperature parameter controlling the smoothness of the approximation. A smaller tau corresponds to a tighter approximation that leads to a sharper objective landscape that might be more difficult to optimize.

  • alpha (float)

Returns:

A smooth approximation of torch.minimum(a, b).

Return type:

Tensor

botorch.utils.safe_math.log_fatmoid(X, tau=1.0)[source]

Computes the logarithm of the fatmoid. Separated out in case the implementation of the logarithm becomes more complex in the future to ensure numerical stability.

Parameters:
  • X (Tensor)

  • tau (float | Tensor)

Return type:

Tensor

botorch.utils.safe_math.fatmoid(X, tau=1.0)[source]

Computes a twice continuously differentiable approximation to the Heaviside step function with a fat tail, i.e. O(1 / x^2) as x goes to -inf.

Parameters:
  • X (Tensor) – A Tensor from which to compute the smoothed step function.

  • tau (float | Tensor) – Temperature parameter controlling the smoothness of the approximation.

Returns:

A tensor of fat-tailed approximations to the Heaviside step function.

Return type:

Tensor

botorch.utils.safe_math.cauchy(x)[source]

Computes a Lorentzian, i.e. an un-normalized Cauchy density function.

Parameters:

x (Tensor)

Return type:

Tensor

botorch.utils.safe_math.sigmoid(X, log=False, fat=False)[source]

A sigmoid function with an optional fat tail and evaluation in log space for better numerical behavior. Notably, the fat-tailed sigmoid can be used to remedy numerical underflow problems in the value and gradient of the canonical sigmoid.

Parameters:
  • X (Tensor) – The Tensor on which to evaluate the sigmoid.

  • log (bool) – Toggles the evaluation of the log sigmoid.

  • fat (bool) – Toggles the evaluation of the fat-tailed sigmoid.

Returns:

A Tensor of (log-)sigmoid values.

Return type:

Tensor

Multi-Objective Utilities

Abstract Box Decompositions

Box decomposition algorithms.

References

[Lacour17] (1,2,3,4,5)

R. Lacour, K. Klamroth, C. Fonseca. A box decomposition algorithm to compute the hypervolume indicator. Computers & Operations Research, Volume 79, 2017.

Box Decomposition List

Box decomposition container.

class botorch.utils.multi_objective.box_decompositions.box_decomposition_list.BoxDecompositionList(*box_decompositions)[source]

Bases: Module

A list of box decompositions.

Initialize the box decomposition list.

Parameters:

*box_decompositions (BoxDecomposition) – A variable number of box decompositions.

Example

>>> bd1 = FastNondominatedPartitioning(ref_point, Y=Y1)
>>> bd2 = FastNondominatedPartitioning(ref_point, Y=Y2)
>>> bd = BoxDecompositionList(bd1, bd2)
property pareto_Y: List[Tensor]

This returns the non-dominated set.

Note: Internally, we store the negative pareto set (minimization).

Returns:

A list where the ith element is the n_pareto_i x m-dim tensor of pareto optimal outcomes for each box_decomposition i.

property ref_point: Tensor

Get the reference point.

Note: Internally, we store the negative reference point (minimization).

Returns:

A n_box_decompositions x m-dim tensor of outcomes.

get_hypercell_bounds()[source]

Get the bounds of each hypercell in the decomposition.

Returns:

A 2 x n_box_decompositions x num_cells x num_outcomes-dim tensor containing the lower and upper vertices bounding each hypercell.

Return type:

Tensor

update(Y)[source]

Update the partitioning.

Parameters:

Y (List[Tensor] | Tensor) – A n_box_decompositions x n x num_outcomes-dim tensor or a list where the ith element contains the new points for box_decomposition i.

Return type:

None

compute_hypervolume()[source]

Compute the hypervolume that is dominated by the Pareto frontier.

Returns:

A (batch_shape)-dim tensor containing the hypervolume dominated by each Pareto frontier.

Return type:

Tensor

Box Decomposition Utilities

Utilities for box decomposition algorithms.

botorch.utils.multi_objective.box_decompositions.utils.compute_local_upper_bounds(U, Z, z)[source]

Compute local upper bounds.

Note: this assumes minimization.

This uses the incremental algorithm (Alg. 1) from [Lacour17].

Parameters:
  • U (Tensor) – A n x m-dim tensor containing the local upper bounds.

  • Z (Tensor) – A n x m x m-dim tensor containing the defining points.

  • z (Tensor) – A m-dim tensor containing the new point.

Returns:

2-element tuple containing

  • A new n’ x m-dim tensor of local upper bounds.

  • A n’ x m x m-dim tensor containing the defining points.

botorch.utils.multi_objective.box_decompositions.utils.get_partition_bounds(Z, U, ref_point)[source]

Get the cell bounds given the local upper bounds and the defining points.

This implements Equation 2 in [Lacour17].

Parameters:
  • Z (Tensor) – A n x m x m-dim tensor containing the defining points. The first dimension corresponds to u_idx, the second dimension corresponds to j, and Z[u_idx, j] is the set of defining points Z^j(u) where u = U[u_idx].

  • U (Tensor) – A n x m-dim tensor containing the local upper bounds.

  • ref_point (Tensor) – A m-dim tensor containing the reference point.

Returns:

A 2 x num_cells x m-dim tensor containing the lower and upper vertices bounding each hypercell.

Return type:

Tensor

botorch.utils.multi_objective.box_decompositions.utils.update_local_upper_bounds_incremental(new_pareto_Y, U, Z)[source]

Update the current local upper bounds with the new pareto points.

This assumes minimization.

Parameters:
  • new_pareto_Y (Tensor) – A n x m-dim tensor containing the new Pareto points.

  • U (Tensor) – A n’ x m-dim tensor containing the local upper bounds.

  • Z (Tensor) – A n’ x m x m-dim tensor containing the defining points.

Returns:

2-element tuple containing

  • A new n’ x m-dim tensor of local upper bounds.

  • A n’ x m x m-dim tensor containing the defining points.

botorch.utils.multi_objective.box_decompositions.utils.compute_non_dominated_hypercell_bounds_2d(pareto_Y_sorted, ref_point)[source]

Compute an axis-aligned partitioning of the non-dominated space for 2 objectives.

Parameters:
  • pareto_Y_sorted (Tensor) – A (batch_shape) x n_pareto x 2-dim tensor of pareto outcomes that are sorted by the 0th dimension in increasing order. All points must be better than the reference point.

  • ref_point (Tensor) – A (batch_shape) x 2-dim reference point.

Returns:

A 2 x (batch_shape) x n_pareto + 1 x m-dim tensor of cell bounds.

Return type:

Tensor

botorch.utils.multi_objective.box_decompositions.utils.compute_dominated_hypercell_bounds_2d(pareto_Y_sorted, ref_point)[source]

Compute an axis-aligned partitioning of the dominated space for 2 objectives.

Parameters:
  • pareto_Y_sorted (Tensor) – A (batch_shape) x n_pareto x 2-dim tensor of pareto outcomes that are sorted by the 0th dimension in increasing order.

  • ref_point (Tensor) – A 2-dim reference point.

Returns:

A 2 x (batch_shape) x n_pareto x m-dim tensor of cell bounds.

Return type:

Tensor

Dominated Partitionings

Algorithms for partitioning the dominated space into hyperrectangles.

class botorch.utils.multi_objective.box_decompositions.dominated.DominatedPartitioning(ref_point, Y=None)[source]

Bases: FastPartitioning

Partition dominated space into axis-aligned hyperrectangles.

This uses Algorithm 1 from [Lacour17].

Example

>>> bd = DominatedPartitioning(ref_point, Y)
Parameters:
  • ref_point (Tensor) – A m-dim tensor containing the reference point.

  • Y (Optional[Tensor]) – A (batch_shape) x n x m-dim tensor.

Hypervolume

Hypervolume Utilities.

References

[Fonseca2006] (1,2)

C. M. Fonseca, L. Paquete, and M. Lopez-Ibanez. An improved dimension-sweep algorithm for the hypervolume indicator. In IEEE Congress on Evolutionary Computation, pages 1157-1163, Vancouver, Canada, July 2006.

[Ishibuchi2011]

H. Ishibuchi, N. Akedo, and Y. Nojima. A many-objective test problem for visually examining diversity maintenance behavior in a decision space. Proc. 13th Annual Conf. Genetic Evol. Comput., 2011.

botorch.utils.multi_objective.hypervolume.infer_reference_point(pareto_Y, max_ref_point=None, scale=0.1, scale_max_ref_point=False)[source]

Get reference point for hypervolume computations.

This sets the reference point to be ref_point = nadir - scale * range when there is no pareto_Y that is better than max_ref_point. If there’s pareto_Y better than max_ref_point, the reference point will be set to max_ref_point - scale * range if scale_max_ref_point is true and to max_ref_point otherwise.

[Ishibuchi2011] find 0.1 to be a robust multiplier for scaling the nadir point.

Note: this assumes maximization of all objectives.

Parameters:
  • pareto_Y (Tensor) – A n x m-dim tensor of Pareto-optimal points.

  • max_ref_point (Tensor | None) – A m-dim tensor indicating the maximum reference point. Some elements can be NaN (except when pareto_Y is empty); those dimensions are treated as if no max_ref_point was provided and are set to nadir - scale * range.

  • scale (float) – A multiplier used to scale back the reference point based on the range of each objective.

  • scale_max_ref_point (bool) – A boolean indicating whether to apply scaling to the max_ref_point based on the range of each objective.

Returns:

A m-dim tensor containing the reference point.

Return type:

Tensor
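
The default rule ref_point = nadir - scale * range can be illustrated on plain lists. The sketch below covers only the case without max_ref_point and assumes that, under maximization, the nadir is the per-objective minimum over pareto_Y and the range is the per-objective maximum minus that minimum. The function name is hypothetical, and special cases such as an empty or single-point front are not handled.

```python
from typing import List


def infer_reference_point_sketch(pareto_Y: List[List[float]], scale: float = 0.1) -> List[float]:
    """Compute nadir - scale * range per objective (maximization)."""
    m = len(pareto_Y[0])
    # Worst (smallest) and best (largest) observed value per objective.
    nadir = [min(y[j] for y in pareto_Y) for j in range(m)]
    ideal = [max(y[j] for y in pareto_Y) for j in range(m)]
    return [n - scale * (i - n) for n, i in zip(nadir, ideal)]


pareto_Y = [[1.0, 4.0], [2.0, 3.0], [3.0, 1.0]]
ref_point = infer_reference_point_sketch(pareto_Y)  # approximately [0.8, 0.7]
```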

class botorch.utils.multi_objective.hypervolume.Hypervolume(ref_point)[source]

Bases: object

Hypervolume computation using the dimension-sweep algorithm from [Fonseca2006].

Adapted from Simon Wessing’s implementation of the algorithm (Variant 3, Version 1.2) in [Fonseca2006] in PyMOO: https://github.com/msu-coinlab/pymoo/blob/master/pymoo/vendor/hv.py

Maximization is assumed.

TODO: write this in C++ for faster looping.

Initialize hypervolume object.

Parameters:

ref_point (Tensor) – m-dim Tensor containing the reference point.

property ref_point: Tensor

Get reference point (for maximization).

Returns:

A m-dim tensor containing the reference point.

compute(pareto_Y)[source]

Compute hypervolume.

Parameters:

pareto_Y (Tensor) – A n x m-dim tensor of pareto optimal outcomes

Returns:

The hypervolume.

Return type:

float
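
To make the quantity concrete, the sketch below computes the hypervolume for two objectives under maximization with a simple sweep. It is not the general dimension-sweep algorithm of [Fonseca2006] and not the BoTorch code; the function name is hypothetical, and points that do not strictly exceed the reference point in both objectives are assumed to contribute nothing.

```python
from typing import List, Sequence


def hypervolume_2d_sketch(pareto_Y: List[Sequence[float]], ref_point: Sequence[float]) -> float:
    """Area dominated by the points and bounded below by ref_point (maximization)."""
    # Keep only points that are strictly better than the reference point.
    points = [p for p in pareto_Y if p[0] > ref_point[0] and p[1] > ref_point[1]]
    # Sweep from the largest first objective to the smallest.
    points.sort(key=lambda p: (-p[0], -p[1]))
    volume = 0.0
    best_second = ref_point[1]
    for y0, y1 in points:
        if y1 > best_second:
            # Add the horizontal strip that this point newly dominates.
            volume += (y0 - ref_point[0]) * (y1 - best_second)
            best_second = y1
    return volume


hv = hypervolume_2d_sketch([(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)], ref_point=(0.0, 0.0))
print(hv)  # 6.0
```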

botorch.utils.multi_objective.hypervolume.sort_by_dimension(nodes, i)[source]

Sorts the list of nodes in-place by the specified objective.

Parameters:
  • nodes (List[Node]) – A list of Nodes

  • i (int) – The index of the objective to sort by

Return type:

None

class botorch.utils.multi_objective.hypervolume.Node(m, dtype, device, data=None)[source]

Bases: object

Node in the MultiList data structure.

Initialize Node.

Parameters:
  • m (int) – The number of objectives

  • dtype (torch.dtype) – The dtype

  • device (torch.device) – The device

  • data (Optional[Tensor]) – The tensor data to be stored in this Node.

class botorch.utils.multi_objective.hypervolume.MultiList(m, dtype, device)[source]

Bases: object

A special data structure used in hypervolume computation.

It consists of several doubly linked lists that share common nodes. Every node has multiple predecessors and successors, one in every list.

Initialize m doubly linked lists.

Parameters:
  • m (int) – number of doubly linked lists

  • dtype (torch.dtype) – the dtype

  • device (torch.device) – the device

append(node, index)[source]

Appends a node to the end of the list at the given index.

Parameters:
  • node (Node) – the new node

  • index (int) – the index where the node should be appended.

Return type:

None

extend(nodes, index)[source]

Extends the list at the given index with the nodes.

Parameters:
  • nodes (List[Node]) – list of nodes to append at the given index.

  • index (int) – the index where the nodes should be appended.

Return type:

None

remove(node, index, bounds)[source]

Removes and returns ‘node’ from all lists in [0, ‘index’].

Parameters:
  • node (Node) – The node to remove

  • index (int) – The upper bound on the range of indices

  • bounds (Tensor) – A 2 x m-dim tensor bounds on the objectives

Return type:

Node

reinsert(node, index, bounds)[source]

Re-inserts the node at its original position.

Re-inserts the node at its original position in all lists in [0, ‘index’] before it was removed. This method assumes that the next and previous nodes of the node that is reinserted are in the list.

Parameters:
  • node (Node) – The node

  • index (int) – The upper bound on the range of indices

  • bounds (Tensor) – A 2 x m-dim tensor bounds on the objectives

Return type:

None

class botorch.utils.multi_objective.hypervolume.SubsetIndexCachingMixin[source]

Bases: object

A Mixin class that adds q-subset index computations and caching.

Initializes the class with q_out = -1 and an empty q_subset_indices dict.

compute_q_subset_indices(q_out, device)[source]

Returns and caches a dict of indices equal to subsets of {1, …, q_out}.

This means that consecutive calls to self.compute_q_subset_indices with the same q_out do not recompute the indices for all (2^q_out - 1) subsets.

NOTE: This will use more memory than regenerating the indices for each i and then deleting them, but it will be faster for repeated evaluations (e.g. during optimization).

Parameters:
  • q_out (int) – The batch size of the objectives. This is typically equal to the q-batch size of X. However, if using a set valued objective (e.g., MVaR) that produces s objective values for each point on the q-batch of X, we need to properly account for each objective while calculating the hypervolume contributions by using q_out = q * s.

  • device (torch.device)

Returns:

A dict that maps “q choose i” to all size-i subsets of {1, …, q_out}.

Return type:

BufferDict[str, Tensor]

botorch.utils.multi_objective.hypervolume.compute_subset_indices(q, device=None)[source]

Compute all (2^q - 1) distinct subsets of {1, …, q}.

Parameters:
  • q (int) – An integer defining the set {1, …, q} whose subsets to compute.

  • device (Optional[torch.device])

Returns:

A dict that maps “q choose i” to all size-i subsets of {1, …, q}.

Return type:

BufferDict[str, Tensor]
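
The structure of the returned mapping can be sketched with `itertools.combinations`. The version below returns a plain dict of tuples instead of a BufferDict of Tensors, uses 0-based indices, and assumes keys of the form `q_choose_i`; all three choices, and the function name, are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def subset_indices_sketch(q: int) -> Dict[str, List[Tuple[int, ...]]]:
    """Group all 2^q - 1 non-empty subsets of range(q) by subset size."""
    return {
        f"q_choose_{i}": list(combinations(range(q), i))
        for i in range(1, q + 1)
    }


subsets = subset_indices_sketch(3)
print(subsets["q_choose_2"])  # [(0, 1), (0, 2), (1, 2)]
```

Caching this dict per value of q, as the mixin above describes, trades memory for avoiding recomputation of all subsets on every evaluation.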

Non-dominated Partitionings

Algorithms for partitioning the non-dominated space into rectangles.

References

[Couckuyt2012] (1,2)

I. Couckuyt, D. Deschrijver and T. Dhaene, “Towards Efficient Multiobjective Optimization: Multiobjective statistical criterions,” 2012 IEEE Congress on Evolutionary Computation, Brisbane, QLD, 2012, pp. 1-8.

class botorch.utils.multi_objective.box_decompositions.non_dominated.NondominatedPartitioning(ref_point, Y=None, alpha=0.0)[source]

Bases: BoxDecomposition

A class for partitioning the non-dominated space into hyper-cells.

Note: this assumes maximization. Internally, it multiplies outcomes by -1 and performs the decomposition under minimization. TODO: use maximization internally as well.

Note: it is only feasible to use this algorithm to compute an exact decomposition of the non-dominated space for m<5 objectives (alpha=0.0).

The alpha parameter can be increased to obtain an approximate partitioning faster. The alpha is a fraction of the total hypervolume encapsulating the entire Pareto set. When a hypercell’s volume divided by the total hypervolume is less than alpha, we discard the hypercell. See Figure 2 in [Couckuyt2012] for a visual representation.

This PyTorch implementation of the binary partitioning algorithm ([Couckuyt2012]) is adapted from the numpy/tensorflow implementation at: https://github.com/GPflow/GPflowOpt/blob/master/gpflowopt/pareto.py.

TODO: replace this with a more efficient decomposition. E.g. https://link.springer.com/content/pdf/10.1007/s10898-019-00798-7.pdf

Initialize NondominatedPartitioning.

Parameters:
  • ref_point (Tensor) – A m-dim tensor containing the reference point.

  • Y (Optional[Tensor]) – A (batch_shape) x n x m-dim tensor.

  • alpha (float) – A threshold fraction of total volume used in an approximate decomposition.

Example

>>> bd = NondominatedPartitioning(ref_point, Y=Y1)
get_hypercell_bounds()[source]

Get the bounds of each hypercell in the decomposition.

Parameters:

ref_point – A (batch_shape) x m-dim tensor containing the reference point.

Returns:

A 2 x num_cells x m-dim tensor containing the lower and upper vertices bounding each hypercell.

Return type:

Tensor

class botorch.utils.multi_objective.box_decompositions.non_dominated.FastNondominatedPartitioning(ref_point, Y=None)[source]

Bases: FastPartitioning

A class for partitioning the non-dominated space into hyper-cells.

Note: this assumes maximization. Internally, it multiplies by -1 and performs the decomposition under minimization.

This class is far more efficient than NondominatedPartitioning for exact box partitionings.

This class uses the two-step approach similar to that in [Yang2019], where:
  1. first, Alg 1 from [Lacour17] is used to find the local lower bounds for the maximization problem

  2. second, the local lower bounds are used as the Pareto frontier for the minimization problem, and [Lacour17] is applied again to partition the space dominated by that Pareto frontier.

Initialize FastNondominatedPartitioning.

Parameters:
  • ref_point (Tensor) – An m-dim tensor containing the reference point.

  • Y (Optional[Tensor]) – A (batch_shape) x n x m-dim tensor.

Example

>>> bd = FastNondominatedPartitioning(ref_point, Y=Y1)

Pareto

botorch.utils.multi_objective.pareto.is_non_dominated(Y, maximize=True, deduplicate=True)[source]

Computes the non-dominated front.

Note: this assumes maximization.

For small n, this method uses a highly parallel methodology that compares all pairs of points in Y. However, this is memory intensive and slow for large n. For large n (or if Y is larger than 5MB), this method will dispatch to a loop-based approach that is faster and has a lower memory footprint.

Parameters:
  • Y (Tensor) – A (batch_shape) x n x m-dim tensor of outcomes. If any element of Y is NaN, the corresponding point will be treated as a dominated point (returning False).

  • maximize (bool) – If True, assume maximization (default).

  • deduplicate (bool) – A boolean indicating whether to only return unique points on the pareto frontier.

Returns:

A (batch_shape) x n-dim boolean tensor indicating whether each point is non-dominated.

Return type:

Tensor
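
The all-pairs comparison mentioned above can be sketched in plain Python as follows. This is only an illustration of the dominance test under maximization, not the BoTorch implementation: the function name `non_dominated_mask` is hypothetical, keeping the first of several exact duplicates is an assumption of this sketch, and NaN handling is not reproduced.

```python
def non_dominated_mask(Y, deduplicate=True):
    """Return one bool per point: True if no other point dominates it (maximization)."""

    def dominates(a, b):
        # a dominates b if a is at least as good everywhere and strictly better somewhere.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    mask = []
    for i, y in enumerate(Y):
        dominated = any(dominates(other, y) for j, other in enumerate(Y) if j != i)
        # Assumption of this sketch: among exact duplicates, only the first is kept.
        duplicate = deduplicate and any(Y[j] == y for j in range(i))
        mask.append(not dominated and not duplicate)
    return mask


# Four made-up points with two objectives; the last one duplicates the second.
Y = [[1.0, 3.0], [2.0, 2.0], [1.0, 1.0], [2.0, 2.0]]
mask = non_dominated_mask(Y)
```

The nested loop performs O(n^2) comparisons, which is the cost the documentation refers to when it describes the pairwise approach as memory intensive and slow for large n.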

Scalarization

Helper utilities for constructing scalarizations.

References

[Knowles2005]

J. Knowles, “ParEGO: a hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems,” in IEEE Transactions on Evolutionary Computation, vol. 10, no. 1, pp. 50-66, Feb. 2006.

botorch.utils.multi_objective.scalarization.get_chebyshev_scalarization(weights, Y, alpha=0.05)[source]

Construct an augmented Chebyshev scalarization.

The augmented Chebyshev scalarization is given by

g(y) = max_i(w_i * y_i) + alpha * sum_i(w_i * y_i)

where the goal is to minimize g(y) in the setting where all objectives y_i are to be minimized. Since the default in BoTorch is to maximize all objectives, this method constructs a Chebyshev scalarization where the inputs are first multiplied by -1, so that all objectives are to be minimized. Then, it computes g(y) (which should be minimized), and returns -g(y), which should be maximized.

Minimizing an objective is supported by passing a negative weight for that objective. To make all w * y’s have the same sign such that they are comparable when computing max(w * y), outcomes of minimization objectives are shifted from [0,1] to [-1,0].

See [Knowles2005] for details.

This scalarization can be used with qExpectedImprovement to implement q-ParEGO as proposed in [Daulton2020qehvi].

Parameters:
  • weights (Tensor) – A m-dim tensor of weights. Positive for maximization and negative for minimization.

  • Y (Tensor) – A n x m-dim tensor of observed outcomes, which are used for scaling the outcomes to [0,1] or [-1,0]. If n=0, then outcomes are left unnormalized.

  • alpha (float) – Parameter governing the influence of the weighted sum term. The default value comes from [Knowles2005].

Returns:

Transform function using the objective weights.

Return type:

Callable[[Tensor, Tensor | None], Tensor]

Example

>>> weights = torch.tensor([0.75, -0.25])
>>> transform = get_chebyshev_scalarization(weights, Y)
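
For reference, the formula g(y) itself can be evaluated with a few lines of plain Python. This sketch covers only the expression given above; the normalization of outcomes using Y, the sign flips, and the final negation performed by the library function are deliberately left out, and the name `augmented_chebyshev` is hypothetical.

```python
def augmented_chebyshev(weights, y, alpha=0.05):
    """g(y) = max_i(w_i * y_i) + alpha * sum_i(w_i * y_i), for objectives to be minimized."""
    weighted = [w * v for w, v in zip(weights, y)]
    return max(weighted) + alpha * sum(weighted)


# Made-up weights and outcomes: the weighted values are [0.5, 1.5].
g = augmented_chebyshev([0.5, 0.5], [1.0, 3.0])
```

With these inputs the max term contributes 1.5 and the weighted-sum term contributes 0.05 * 2.0, so g is 1.6 up to floating-point rounding.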

Probability Utilities

Multivariate Gaussian Probabilities via Bivariate Conditioning

Bivariate conditioning algorithm for approximating Gaussian probabilities, see [Genz2016numerical] and [Trinh2015bivariate].

[Trinh2015bivariate]

G. Trinh and A. Genz. Bivariate conditioning approximations for multivariate normal probabilities. Statistics and Computing, 2015.

[Genz2016numerical]

A. Genz and G. Trinh. Numerical Computation of Multivariate Normal Probabilities using Bivariate Conditioning. Monte Carlo and Quasi-Monte Carlo Methods, 2016.

[Gibson1994monte]

G. J. Gibson, C. A. Glasbey, and D. A. Elston. Monte Carlo evaluation of multivariate normal integrals and sensitivity to variate ordering. Advances in Numerical Methods and Applications, 1994.

class botorch.utils.probability.mvnxpb.mvnxpbState[source]

Bases: TypedDict

step: int
perm: LongTensor
bounds: Tensor
piv_chol: PivotedCholesky
plug_ins: Tensor
log_prob: Tensor
log_prob_extra: Tensor | None
class botorch.utils.probability.mvnxpb.MVNXPB(covariance_matrix, bounds)[source]

Bases: object

An algorithm for approximating Gaussian probabilities P(X in bounds), where X ~ N(0, covariance_matrix).

Initializes an MVNXPB instance.

Parameters:
  • covariance_matrix (Tensor) – Covariance matrices of shape batch_shape x [n, n].

  • bounds (Tensor) – Tensor of lower and upper bounds, batch_shape x [n, 2]. These bounds are standardized internally and clipped to STANDARDIZED_RANGE.

classmethod build(step, perm, bounds, piv_chol, plug_ins, log_prob, log_prob_extra=None)[source]

Creates an MVNXPB instance from raw arguments. Unlike MVNXPB.__init__, this method does not preprocess or copy terms.

Parameters:
  • step (int) – Integer used to track the solver’s progress.

  • bounds (Tensor) – Tensor of lower and upper bounds, batch_shape x [n, 2].

  • piv_chol (PivotedCholesky) – A PivotedCholesky instance for the system.

  • plug_ins (Tensor) – Tensor of plug-in estimators used to update lower and upper bounds on random variables that have yet to be integrated out.

  • log_prob (Tensor) – Tensor of log probabilities.

  • log_prob_extra (Tensor | None) – Tensor of conditional log probabilities for the next random variable. Used when integrating over an odd number of random variables.

  • perm (Tensor)

Return type:

MVNXPB

solve(num_steps=None, eps=1e-10)[source]

Runs the MVNXPB solver instance for a fixed number of steps.

Calculates a bivariate conditional approximation to P(X in bounds), where X ~ N(0, Σ). For details, see [Genz2016numerical] or [Trinh2015bivariate].

Parameters:
  • num_steps (int | None)

  • eps (float)

Return type:

Tensor

select_pivot()[source]

GGE variable prioritization strategy from [Gibson1994monte].

Returns the index of the random variable least likely to satisfy its bounds when conditioning on the previously integrated random variables X[:t - 1] attaining the values of plug-in estimators y[:t - 1]. Equivalently, argmin_{i = t, ..., n} P(X[i] \in bounds[i] | X[:t - 1] = y[:t - 1]), where t denotes the current step.

Return type:

LongTensor | None
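
To make the selection rule concrete, the sketch below handles only the very first step, where nothing has been integrated out yet and the conditional probabilities reduce to marginal ones. It assumes zero-mean variables and takes the marginal variances directly; the names `normal_cdf` and `first_pivot` are hypothetical and this is not how MVNXPB is implemented internally.

```python
from math import erfc, sqrt


def normal_cdf(x):
    """Standard normal CDF, written via the complementary error function."""
    return 0.5 * erfc(-x / sqrt(2.0))


def first_pivot(variances, bounds):
    """Index of the variable least likely to fall inside its bounds (no conditioning)."""
    probs = [
        normal_cdf(upper / sqrt(var)) - normal_cdf(lower / sqrt(var))
        for var, (lower, upper) in zip(variances, bounds)
    ]
    return min(range(len(probs)), key=probs.__getitem__)


# Made-up marginal variances and (lower, upper) bounds for three variables.
pivot = first_pivot([1.0, 4.0, 1.0], [(-1.0, 1.0), (-1.0, 1.0), (2.0, 3.0)])
```

The third variable has the smallest probability of landing in its interval, so it is selected first.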

pivot_(pivot)[source]

Swap random variables at pivot and step positions.

Parameters:

pivot (LongTensor)

Return type:

None

concat(other, dim)[source]
Parameters:
Return type:

MVNXPB

expand(*sizes)[source]
Parameters:

sizes (int)

Return type:

MVNXPB

augment(covariance_matrix, bounds, cross_covariance_matrix, disable_pivoting=False, jitter=None, max_tries=None)[source]

Augment an n-dimensional MVNXPB instance to include m additional random variables.

Parameters:
  • covariance_matrix (Tensor)

  • bounds (Tensor)

  • cross_covariance_matrix (Tensor)

  • disable_pivoting (bool)

  • jitter (float | None)

  • max_tries (int | None)

Return type:

MVNXPB

detach()[source]
Return type:

MVNXPB

clone()[source]
Return type:

MVNXPB

asdict()[source]
Return type:

mvnxpbState

Truncated Multivariate Normal Distribution

class botorch.utils.probability.truncated_multivariate_normal.TruncatedMultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, bounds=None, solver=None, sampler=None, validate_args=None)[source]

Bases: MultivariateNormal

Initializes an instance of a TruncatedMultivariateNormal distribution.

Let x ~ N(0, K) be an n-dimensional Gaussian random vector. This class represents the distribution of the truncated multivariate normal random vector x | a <= x <= b.

Parameters:
  • loc (Tensor) – A mean vector for the distribution, batch_shape x event_shape.

  • covariance_matrix (Optional[Tensor]) – Covariance matrix distribution parameter.

  • precision_matrix (Optional[Tensor]) – Inverse covariance matrix distribution parameter.

  • scale_tril (Optional[Tensor]) – Lower triangular, square-root covariance matrix distribution parameter.

  • bounds (Tensor) – A batch_shape x event_shape x 2 tensor of strictly increasing bounds for x so that bounds[…, 0] < bounds[…, 1] everywhere.

  • solver (Optional[MVNXPB]) – A pre-solved MVNXPB instance used to approximate the log partition.

  • sampler (Optional[LinearEllipticalSliceSampler]) – A LinearEllipticalSliceSampler instance used for sample generation.

  • validate_args (Optional[bool]) – Optional argument to super().__init__.

log_prob(value)[source]

Approximates the true log probability.

Parameters:

value (Tensor)

Return type:

Tensor

rsample(sample_shape=())[source]

Draw samples from the Truncated Multivariate Normal.

Parameters:

sample_shape (Size) – The shape of the samples.

Returns:

The (sample_shape x batch_shape x event_shape) tensor of samples.

Return type:

Tensor

property log_partition: Tensor
property solver: MVNXPB
property sampler: LinearEllipticalSliceSampler
expand(batch_shape, _instance=None)[source]

Returns a new distribution instance (or populates an existing instance provided by a derived class) with batch dimensions expanded to batch_shape. This method calls expand on the distribution’s parameters. As such, this does not allocate new memory for the expanded distribution instance. Additionally, this does not repeat any args checking or parameter broadcasting in __init__.py, when an instance is first created.

Parameters:
  • batch_shape (torch.Size) – the desired expanded size.

  • _instance (TruncatedMultivariateNormal | None) – new instance provided by subclasses that need to override .expand.

Returns:

New distribution instance with batch dimensions expanded to batch_shape.

Return type:

TruncatedMultivariateNormal

Unified Skew Normal Distribution

class botorch.utils.probability.unified_skew_normal.UnifiedSkewNormal(trunc, gauss, cross_covariance_matrix, validate_args=None)[source]

Bases: Distribution

Unified Skew Normal distribution of Y | a < X < b for jointly Gaussian random vectors X ∈ R^m and Y ∈ R^n.

Batch shapes trunc.batch_shape and gauss.batch_shape must be broadcastable. Care should be taken when choosing trunc.batch_shape. When trunc is of lower batch dimensionality than gauss, the user should consider expanding trunc to hasten UnifiedSkewNormal.log_prob. In these cases, it is suggested that the user invoke trunc.solver before calling trunc.expand to avoid paying for multiple, identical solves.

Parameters:
  • trunc (TruncatedMultivariateNormal) – Distribution of Z = (X | a < X < b) ∈ R^m.

  • gauss (MultivariateNormal) – Distribution of Y ∈ R^n.

  • cross_covariance_matrix (Union[Tensor, LinearOperator]) – Cross-covariance Cov(X, Y) ∈ R^{m x n}.

  • validate_args (Optional[bool]) – Optional argument to super().__init__.

arg_constraints = {}
log_prob(value)[source]

Computes the log probability ln p(Y = value | a < X < b).

Parameters:

value (Tensor)

Return type:

Tensor

rsample(sample_shape=())[source]

Draw samples from the Unified Skew Normal.

Parameters:

sample_shape (Size) – The shape of the samples.

Returns:

The (sample_shape x batch_shape x event_shape) tensor of samples.

Return type:

Tensor

expand(batch_shape, _instance=None)[source]

Returns a new distribution instance (or populates an existing instance provided by a derived class) with batch dimensions expanded to batch_shape. This method calls expand on the distribution’s parameters. As such, this does not allocate new memory for the expanded distribution instance. Additionally, this does not repeat any args checking or parameter broadcasting in __init__.py, when an instance is first created.

Parameters:
  • batch_shape (torch.Size) – the desired expanded size.

  • _instance (UnifiedSkewNormal | None) – new instance provided by subclasses that need to override .expand.

Returns:

New distribution instance with batch dimensions expanded to batch_shape.

Return type:

UnifiedSkewNormal

property covariance_matrix: Tensor
property scale_tril: Tensor

Bivariate Normal Probabilities and Statistics

Methods for computing bivariate normal probabilities and statistics.

[Genz2004bvnt]

A. Genz. Numerical computation of rectangular bivariate and trivariate normal and t probabilities. Statistics and Computing, 2004.

[Muthen1990moments]

B. Muthen. Moments of the censored and truncated bivariate normal distribution. British Journal of Mathematical and Statistical Psychology, 1990.

botorch.utils.probability.bvn.bvn(r, xl, yl, xu, yu)[source]

A function for computing bivariate normal probabilities.

Calculates P(xl < x < xu, yl < y < yu) where x and y are bivariate normal with unit variance and correlation coefficient r. See Section 2.4 of [Genz2004bvnt].

This method uses a sign flip trick to improve numerical performance. Many of bvnu’s internal branches rely on evaluations of Phi(-bound). For a < b < 0, the term Phi(-a) - Phi(-b) goes to zero faster than Phi(b) - Phi(a) because finfo(dtype).epsneg is typically much larger than finfo(dtype).tiny. In these cases, flipping the sign can prevent situations where bvnu(…) - bvnu(…) would otherwise be zero due to round-off error.

Parameters:
  • r (Tensor) – Tensor of correlation coefficients.

  • xl (Tensor) – Tensor of lower bounds for x, same shape as r.

  • yl (Tensor) – Tensor of lower bounds for y, same shape as r.

  • xu (Tensor) – Tensor of upper bounds for x, same shape as r.

  • yu (Tensor) – Tensor of upper bounds for y, same shape as r.

Returns:

Tensor of probabilities P(xl < x < xu, yl < y < yu).

Return type:

Tensor
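
The round-off effect behind the sign flip can be reproduced in one dimension with the standard library alone. The values of a and b below are arbitrary illustrative choices, and `normal_cdf` is a helper defined only for this sketch.

```python
from math import erfc, sqrt


def normal_cdf(x):
    """Standard normal CDF Phi(x), written via the complementary error function."""
    return 0.5 * erfc(-x / sqrt(2.0))


a, b = -10.0, -9.0  # a < b < 0

# Mathematically identical quantities, evaluated in double precision:
near_one = normal_cdf(-a) - normal_cdf(-b)  # both terms round to 1.0
near_zero = normal_cdf(b) - normal_cdf(a)   # both terms are tiny but representable
```

Both expressions equal P(a < x < b), yet the first evaluates to exactly zero because values just below 1 cannot be resolved more finely than machine epsilon, while the second retains a result on the order of 1e-19.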

botorch.utils.probability.bvn.bvnu(r, h, k)[source]

Solves for P(x > h, y > k) where x and y are standard bivariate normal random variables with correlation coefficient r. In [Genz2004bvnt], this is equation (1):

L(h, k, r) = P(x < -h, y < -k) = 1/(a 2pi) int_{h}^{infty} int_{k}^{infty} f(x, y, r) dy dx,

where f(x, y, r) = e^{-1/(2a^2) (x^2 - 2rxy + y^2)} and a = (1 - r^2)^{1/2}.

[Genz2004bvnt] reports that the following integration scheme incurs a maximum error of 5e-16 when run in double precision: if |r| >= 0.925, use a 20-point quadrature rule on a 5th order Taylor expansion; else, numerically integrate in polar coordinates using no more than 20 quadrature points.

Parameters:
  • r (Tensor) – Tensor of correlation coefficients.

  • h (Tensor) – Tensor of negative upper bounds for x, same shape as r.

  • k (Tensor) – Tensor of negative upper bounds for y, same shape as r.

Returns:

A tensor of probabilities P(x > h, y > k).

Return type:

Tensor
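
The polar-coordinate form of this probability can be sketched as follows. It relies on the standard identity L(h, k, r) = Phi(-h) Phi(-k) + 1/(2 pi) int_{0}^{asin(r)} exp(-(h^2 + k^2 - 2hk sin(t)) / (2 cos(t)^2)) dt, and it substitutes a simple composite Simpson's rule for the Gauss-Legendre quadrature and Taylor-expansion branch described above, so it is an illustration rather than the library's scheme. The names `normal_cdf` and `bvnu_sketch` are hypothetical, and |r| < 1 is assumed.

```python
from math import asin, cos, erfc, exp, pi, sin, sqrt


def normal_cdf(x):
    """Standard normal CDF, written via the complementary error function."""
    return 0.5 * erfc(-x / sqrt(2.0))


def bvnu_sketch(r, h, k, num_intervals=200):
    """Approximate P(x > h, y > k) for a standard bivariate normal with |r| < 1."""

    def integrand(theta):
        return exp(-(h * h + k * k - 2.0 * h * k * sin(theta)) / (2.0 * cos(theta) ** 2))

    upper = asin(r)
    step = upper / num_intervals
    # Composite Simpson's rule on [0, asin(r)]; num_intervals must be even.
    total = integrand(0.0) + integrand(upper)
    for i in range(1, num_intervals):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    return normal_cdf(-h) * normal_cdf(-k) + total * step / (3.0 * 2.0 * pi)


quadrant = bvnu_sketch(0.5, 0.0, 0.0)
```

For h = k = 0 the integrand is constant, so the sketch recovers the closed form 1/4 + asin(r) / (2 pi), which is 1/3 at r = 0.5.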

botorch.utils.probability.bvn.bvnmom(r, xl, yl, xu, yu, p=None)[source]

Computes the expected values of truncated, bivariate normal random variables.

Let x and y be a pair of standard bivariate normal random variables having correlation r. This function computes E([x,y] | [xl,yl] < [x,y] < [xu,yu]).

Following [Muthen1990moments] equations (4) and (5), we have

E(x | [xl, yl] < [x, y] < [xu, yu]) = Z^{-1} phi(xl) P(yl < y < yu | x=xl) - phi(xu) P(yl < y < yu | x=xu),

where Z = P([xl, yl] < [x, y] < [xu, yu]) and phi is the standard normal PDF.

Parameters:
  • r (Tensor) – Tensor of correlation coefficients.

  • xl (Tensor) – Tensor of lower bounds for x, same shape as r.

  • xu (Tensor) – Tensor of upper bounds for x, same shape as r.

  • yl (Tensor) – Tensor of lower bounds for y, same shape as r.

  • yu (Tensor) – Tensor of upper bounds for y, same shape as r.

  • p (Tensor | None) – Tensor of probabilities P(xl < x < xu, yl < y < yu), same shape as r.

Returns:

E(x | [xl, yl] < [x, y] < [xu, yu]) and E(y | [xl, yl] < [x, y] < [xu, yu]).

Return type:

Tuple[Tensor, Tensor]
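
In the uncorrelated case r = 0, x and y are independent, so the conditional mean of x reduces to the familiar mean of a univariate truncated standard normal. The sketch below covers only that special case; it does not implement the general formula, and the helper names are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def normal_pdf(x):
    """Standard normal PDF."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def normal_cdf(x):
    """Standard normal CDF, written via the complementary error function."""
    return 0.5 * erfc(-x / sqrt(2.0))


def truncated_mean_uncorrelated(xl, xu):
    """E(x | xl < x < xu) for a standard normal x (the r = 0 special case)."""
    return (normal_pdf(xl) - normal_pdf(xu)) / (normal_cdf(xu) - normal_cdf(xl))


# Truncating to the positive half-line; 50.0 stands in for an unbounded upper limit.
half_line_mean = truncated_mean_uncorrelated(0.0, 50.0)
```

The half-line case gives sqrt(2 / pi), the mean of a half-normal distribution, and any interval symmetric about zero gives a mean of zero.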

Elliptic Slice Sampler with Linear Constraints

Linear Elliptical Slice Sampler.

References

[Gessner2020]

A. Gessner, O. Kanjilal, and P. Hennig. Integrals over gaussians under linear domain constraints. AISTATS 2020.

This implementation is based (with multiple changes and optimizations) on the following implementations of the algorithm in [Gessner2020]:
  • https://github.com/alpiges/LinConGauss

  • https://github.com/wjmaddox/pytorch_ess

The implementation here differs from the original implementations in the following ways:
  1. Support for fixed feature equality constraints.

  2. Support for non-standard Normal distributions.

  3. Numerical stability improvements, especially relevant for high-dimensional cases.

Notably, this implementation does not rely on an adaptive delta_theta parameter in order to determine if two neighboring constraint intersection angles theta lead to a change in the feasibility of the sample. This both simplifies the implementation and makes it more robust to numerical imprecisions when two constraint intersection angles are close to each other.

class botorch.utils.probability.lin_ess.LinearEllipticalSliceSampler(inequality_constraints=None, bounds=None, interior_point=None, fixed_indices=None, mean=None, covariance_matrix=None, covariance_root=None, check_feasibility=False, burnin=0, thinning=0)[source]

Bases: PolytopeSampler

Linear Elliptical Slice Sampler.

Ideas:
  • Add batch support, broadcasting over parallel chains.

  • Optimize computations if possible, potentially with torch.compile.

  • Extend fixed features constraint to general linear equality constraints.

Initialize LinearEllipticalSliceSampler.

Parameters:
  • inequality_constraints (Optional[Tuple[Tensor, Tensor]]) – Tensors (A, b) describing inequality constraints A @ x <= b, where A is an n_ineq_con x d-dim Tensor and b is an n_ineq_con x 1-dim Tensor, with n_ineq_con the number of inequalities and d the dimension of the sample space. If omitted, must provide bounds instead.

  • bounds (Optional[Tensor]) – A 2 x d-dim tensor of box bounds. If omitted, must provide inequality_constraints instead.

  • interior_point (Optional[Tensor]) – A d x 1-dim Tensor presenting a point in the (relative) interior of the polytope. If omitted, an interior point is determined automatically by solving a Linear Program. Note: It is crucial that the point lie in the interior of the feasible set (rather than on the boundary), otherwise the sampler will produce invalid samples.

  • fixed_indices (Optional[Union[List[int], Tensor]]) – Integer list or d-dim Tensor representing the indices of dimensions that are constrained to be fixed to the values specified in the interior_point, which is required to be passed in conjunction with fixed_indices.

  • mean (Optional[Tensor]) – The d x 1-dim mean of the MVN distribution (if omitted, use zero).

  • covariance_matrix (Optional[Union[Tensor, LinearOperator]]) – The d x d-dim covariance matrix of the MVN distribution (if omitted, use the identity).

  • covariance_root (Optional[Union[Tensor, LinearOperator]]) – A d x d-dim root of the covariance matrix such that covariance_root @ covariance_root.T = covariance_matrix. NOTE: This matrix is assumed to be lower triangular. covariance_root can only be passed in conjunction with fixed_indices if covariance_root is a DiagLinearOperator. Otherwise the factorization would need to be recomputed, as we need to solve in standardize.

  • check_feasibility (bool) – If True, raise an error if the sampling results in an infeasible sample. This creates some overhead and so is switched off by default.

  • burnin (int) – Number of samples to generate upon initialization to warm up the sampler.

  • thinning (int) – Number of samples to skip before returning a sample in draw.

This sampler samples from a multivariate Normal N(mean, covariance_matrix) subject to linear domain constraints A x <= b (intersected with box bounds, if provided).

property lifetime_samples: int

The total number of samples generated by the sampler during its lifetime.

draw(n=1)[source]

Draw samples.

Parameters:

n (int) – The number of samples.

Returns:

An n x d-dim tensor of n samples.

Return type:

Tuple[Tensor, Tensor]

step()[source]

Take a step, return the new sample, update the internal state.

Returns:

A d x 1-dim sample from the domain.

Return type:

Tensor
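
The geometric core of a step is finding where the ellipse x(theta) = x0 cos(theta) + nu sin(theta) crosses a constraint boundary a @ x = b. The sketch below computes those crossing angles for a single constraint in standardized space. It is a simplified illustration: the helper names are hypothetical, the input values are made up, and combining the angles of several constraints into feasible arcs (as the sampler must do) is not shown.

```python
from math import acos, atan2, cos, hypot, sin


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ellipse_point(x0, nu, theta):
    """x(theta) = x0 * cos(theta) + nu * sin(theta)."""
    return [a * cos(theta) + b * sin(theta) for a, b in zip(x0, nu)]


def intersection_angles(a, b, x0, nu):
    """Angles theta with a @ x(theta) == b, or None if the ellipse never reaches b."""
    p, q = dot(a, x0), dot(a, nu)
    radius = hypot(p, q)  # a @ x(theta) = radius * cos(theta - phase)
    if radius == 0.0 or radius < abs(b):
        return None
    phase = atan2(q, p)
    delta = acos(b / radius)
    return phase - delta, phase + delta


a, b = [1.0, 0.0], 1.0           # single constraint x[0] <= 1
x0, nu = [0.5, 0.0], [1.0, 1.0]  # current state and auxiliary draw (made-up values)
angles = intersection_angles(a, b, x0, nu)
```

Between the two returned angles the constraint is violated, so a new angle would be drawn from the complementary arc, which contains theta = 0 (the current feasible point).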

botorch.utils.probability.lin_ess.get_index_tensors(fixed_indices, d)[source]

Converts fixed_indices to a d-dim integral Tensor that is True at indices that are contained in fixed_indices and False otherwise.

Parameters:
  • fixed_indices (List[int] | Tensor) – A list or Tensor of integer indices to fix.

  • d (int) – The dimensionality of the Tensors to be indexed.

Returns:

A Tuple of integral Tensors partitioning [1, d] into indices that are fixed (first tensor) and non-fixed (second tensor).

Return type:

Tuple[Tensor, Tensor]
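
The partition into fixed and non-fixed indices can be pictured with plain lists. This sketch assumes zero-based indices and returns both groups in ascending order; both choices, and the name `partition_indices`, are assumptions of the example rather than documented behavior.

```python
def partition_indices(fixed_indices, d):
    """Split range(d) into (fixed, non-fixed) index lists; zero-based by assumption."""
    is_fixed = [False] * d
    for i in fixed_indices:
        is_fixed[i] = True
    fixed = [i for i in range(d) if is_fixed[i]]
    free = [i for i in range(d) if not is_fixed[i]]
    return fixed, free


fixed, free = partition_indices([3, 0], d=5)
```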

Linear Algebra Helpers

botorch.utils.probability.linalg.block_matrix_concat(blocks)[source]
Parameters:

blocks (Sequence[Sequence[Tensor]])

Return type:

Tensor

botorch.utils.probability.linalg.augment_cholesky(Laa, Kbb, Kba=None, Lba=None, jitter=None)[source]

Computes the Cholesky factor of a block matrix K = [[Kaa, Kab], [Kba, Kbb]] based on a precomputed Cholesky factor Kaa = Laa Laa^T.

Parameters:
  • Laa (Tensor) – Cholesky factor of K’s upper left block.

  • Kbb (Tensor) – Lower-right block of K.

  • Kba (Tensor | None) – Lower-left block of K.

  • Lba (Tensor | None) – Precomputed solve Kba Laa^{-T}.

  • jitter (float | None) – Optional nugget to be added to the diagonal of Kbb.

Return type:

Tensor
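
The block update can be illustrated for the simplest case of appending a single variable, where Kbb is a scalar and Kba is one row. The sketch uses nested Python lists instead of tensors, omits jitter, and the name `augment_cholesky_one` is hypothetical; it shows the linear algebra only, not the library routine.

```python
from math import sqrt


def augment_cholesky_one(Laa, Kba, Kbb):
    """Append one row to a lower-triangular Cholesky factor given as nested lists."""
    # Forward substitution solves Laa @ z = Kba^T, so z^T = Kba @ Laa^{-T} (= Lba).
    z = []
    for i, row in enumerate(Laa):
        z.append((Kba[i] - sum(row[j] * z[j] for j in range(i))) / row[i])
    # The Schur complement gives the new diagonal entry: Lbb^2 = Kbb - Lba @ Lba^T.
    Lbb = sqrt(Kbb - sum(v * v for v in z))
    n = len(Laa)
    return [row[:n] + [0.0] for row in Laa] + [z + [Lbb]]


# Laa is the Cholesky factor of Kaa = [[4, 2], [2, 5]]; the new row of K is [2, 3, 6].
L = augment_cholesky_one([[2.0, 0.0], [1.0, 2.0]], Kba=[2.0, 3.0], Kbb=6.0)
```

Multiplying the result by its transpose reproduces the full 3 x 3 matrix K, while the existing factor Laa is reused unchanged.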

class botorch.utils.probability.linalg.PivotedCholesky(step: 'int', tril: 'Tensor', perm: 'LongTensor', diag: 'Optional[Tensor]' = None, validate_init: 'InitVar[bool]' = True)[source]

Bases: object

Parameters:
  • step (int)

  • tril (Tensor)

  • perm (LongTensor)

  • diag (Optional[Tensor])

  • validate_init (InitVar[bool])

step: int
tril: Tensor
perm: LongTensor
diag: Tensor | None = None
validate_init: InitVar[bool] = True
update_(eps=1e-10)[source]

Performs a single matrix decomposition step.

Parameters:

eps (float)

Return type:

None

pivot_(pivot)[source]
Parameters:

pivot (LongTensor)

Return type:

None

expand(*sizes)[source]
Parameters:

sizes (int)

Return type:

PivotedCholesky

concat(other, dim=0)[source]
Parameters:
Return type:

PivotedCholesky

detach()[source]
Return type:

PivotedCholesky

clone()[source]
Return type:

PivotedCholesky

Probability Helpers

botorch.utils.probability.utils.case_dispatcher(out, cases=(), default=None)[source]

Basic implementation of a tensorized switching case statement.

Parameters:
  • out (Tensor) – Tensor to which case outcomes are written.

  • cases (Iterable[Tuple[Callable[[], BoolTensor], Callable[[BoolTensor], Tensor]]]) – Iterable of function pairs (pred, func), where mask=pred() specifies whether func is applicable for each entry in out. Note that cases are resolved first-come, first-serve.

  • default (Callable[[BoolTensor], Tensor] | None) – Optional func to which all unclaimed entries of out are dispatched.

Return type:

Tensor
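
The first-come, first-served resolution of cases can be sketched with plain lists standing in for tensors. The function name `case_dispatch` and the convention that each func returns one value per selected entry, in order, are assumptions of this example rather than a description of the library's internals.

```python
def case_dispatch(out, cases=(), default=None):
    """Fill `out` in place; each entry goes to the first case whose predicate claims it."""
    unclaimed = [True] * len(out)

    def write(mask, values):
        # `values` holds one result per True entry of `mask`, in order.
        values = iter(values)
        for i, selected in enumerate(mask):
            if selected:
                out[i] = next(values)
                unclaimed[i] = False

    for pred, func in cases:
        # Entries already claimed by an earlier case are masked out.
        mask = [m and free for m, free in zip(pred(), unclaimed)]
        if any(mask):
            write(mask, func(mask))
    if default is not None and any(unclaimed):
        write(list(unclaimed), default(list(unclaimed)))
    return out


x = [-2.0, 0.5, 3.0, 0.25]
result = case_dispatch(
    out=[0.0] * len(x),
    cases=[
        (lambda: [v < 0 for v in x], lambda m: [-1.0 for s in m if s]),
        (lambda: [v < 1 for v in x], lambda m: [v * v for v, s in zip(x, m) if s]),
    ],
    default=lambda m: [1.0 for s in m if s],
)
```

The entry -2.0 satisfies both predicates but is handled by the first case only, which is the first-come, first-served behavior noted above.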

botorch.utils.probability.utils.get_constants(values, device=None, dtype=None)[source]

Returns scalar-valued Tensors containing each of the given constants. Used to expedite tensor operations involving scalar arithmetic. Note that the returned Tensors should not be modified in-place.

Parameters:
  • values (Number | Iterator[Number])

  • device (device | None)

  • dtype (dtype | None)

Return type:

Tensor | Tuple[Tensor, …]

botorch.utils.probability.utils.get_constants_like(values, ref)[source]
Parameters:
  • values (Number | Iterator[Number])

  • ref (Tensor)

Return type:

Tensor | Iterator[Tensor]

botorch.utils.probability.utils.gen_positional_indices(shape, dim, device=None)[source]
Parameters:
  • shape (Size)

  • dim (int)

  • device (device | None)

Return type:

Iterator[LongTensor]

botorch.utils.probability.utils.build_positional_indices(shape, dim, device=None)[source]
Parameters:
  • shape (Size)

  • dim (int)

  • device (device | None)

Return type:

LongTensor

botorch.utils.probability.utils.leggauss(deg, **tkwargs)[source]
Parameters:
  • deg (int)

  • tkwargs (Any)

Return type:

Tuple[Tensor, Tensor]

botorch.utils.probability.utils.ndtr(x)[source]

Standard normal CDF.

Parameters:

x (Tensor)

Return type:

Tensor

botorch.utils.probability.utils.phi(x)[source]

Standard normal PDF.

Parameters:

x (Tensor)

Return type:

Tensor

botorch.utils.probability.utils.log_phi(x)[source]

Logarithm of the standard normal PDF.

Parameters:

x (Tensor)

Return type:

Tensor

botorch.utils.probability.utils.log_ndtr(x)[source]

Implementation of log_ndtr that remedies problems of torch.special’s version for large negative x, where the torch implementation yields Inf or NaN gradients.

Parameters:

x (Tensor) – An input tensor with dtype torch.float32 or torch.float64.

Returns:

A tensor of values of the same type and shape as x containing log(ndtr(x)).

Return type:

Tensor
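
The difficulty for large negative x can be seen even without gradients: Phi(x) underflows to zero, so a direct log fails. The sketch below switches to a standard asymptotic tail series in that regime. It illustrates the value computation only, says nothing about how gradients are handled, and is not the BoTorch implementation; the name `log_ndtr_sketch` and the switch point of -20 are assumptions.

```python
from math import erfc, log, pi, sqrt


def log_ndtr_sketch(x):
    """log(Phi(x)), switching to an asymptotic tail series for very negative x."""
    if x > -20.0:
        return log(0.5 * erfc(-x / sqrt(2.0)))
    # Phi(x) ~ phi(x) / (-x) * (1 - 1/x^2 + 3/x^4 - 15/x^6) as x -> -inf.
    x2 = x * x
    series = 1.0 - 1.0 / x2 + 3.0 / x2**2 - 15.0 / x2**3
    return -0.5 * x2 - 0.5 * log(2.0 * pi) - log(-x) + log(series)


far_tail = log_ndtr_sketch(-1000.0)
```

At x = -1000 the direct expression would call log on an underflowed zero, whereas the series form returns a finite value close to -x^2 / 2.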

botorch.utils.probability.utils.log_erfc(x)[source]

Computes the logarithm of the complementary error function in a numerically stable manner. The GitHub issue https://github.com/pytorch/pytorch/issues/31945 tracks progress toward moving this feature into PyTorch in C++.

Parameters:

x (Tensor) – An input tensor with dtype torch.float32 or torch.float64.

Returns:

A tensor of values of the same type and shape as x containing log(erfc(x)).

Return type:

Tensor

botorch.utils.probability.utils.log_erfcx(x)[source]

Computes the logarithm of the scaled complementary error function in a numerically stable manner. The GitHub issue https://github.com/pytorch/pytorch/issues/31945 tracks progress toward moving this feature into PyTorch in C++.

Parameters:

x (Tensor) – An input tensor with dtype torch.float32 or torch.float64.

Returns:

A tensor of values of the same type and shape as x containing log(erfcx(x)).

Return type:

Tensor

botorch.utils.probability.utils.standard_normal_log_hazard(x)[source]

Computes the logarithm of the hazard function of the standard normal distribution, i.e. log(phi(x) / Phi(-x)).

Parameters:

x (Tensor) – A tensor of any shape, with either float32 or float64 dtypes.

Returns:

A Tensor of the same shape as x, containing the values of the logarithm of the hazard function evaluated at x.

Return type:

Tensor

botorch.utils.probability.utils.log_prob_normal_in(a, b)[source]

Computes the log probability that a standard normal random variable takes a value in [a, b], i.e. log(Phi(b) - Phi(a)), where Phi is the standard normal CDF. Returns accurate values and permits numerically stable backward passes for inputs in [-1e100, 1e100] for double precision and [-1e20, 1e20] for single precision. In contrast, a naive approach is not numerically accurate beyond [-10, 10].

Parameters:
  • a (Tensor) – Tensor of lower integration bounds of the Gaussian probability measure.

  • b (Tensor) – Tensor of upper integration bounds of the Gaussian probability measure.

Returns:

Tensor of the log probabilities.

Return type:

Tensor

botorch.utils.probability.utils.swap_along_dim_(values, i, j, dim, buffer=None)[source]

Swaps Tensor slices in-place along dimension dim.

When passed as Tensors, i (and j) should be dim-dimensional tensors with the same shape as values.shape[:dim]. The exception to this rule occurs when dim=0, in which case i (and j) should be (at most) one-dimensional when passed as a Tensor.

Parameters:
  • values (Tensor) – Tensor whose values are to be swapped.

  • i (int | LongTensor) – Indices for slices along dimension dim.

  • j (int | LongTensor) – Indices for slices along dimension dim.

  • dim (int) – The dimension of values along which to swap slices.

  • buffer (Tensor | None) – Optional buffer used internally to store copied values.

Returns:

The original values tensor.

Return type:

Tensor
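
The index-shape rule is easiest to see for dim=1, where i and j carry one index per batch row. The sketch below mimics that case with nested lists; the name `swap_along_dim1_` is hypothetical and the optional buffer argument is not modeled.

```python
def swap_along_dim1_(values, i, j):
    """For each batch row b, swap values[b][i[b]] and values[b][j[b]] in place."""
    for row, a, b in zip(values, i, j):
        row[a], row[b] = row[b], row[a]
    return values


values = [[10, 11, 12], [20, 21, 22]]
same = swap_along_dim1_(values, i=[0, 1], j=[2, 2])
```

As with the documented function, the swap happens in place and the returned object is the original container.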