botorch.models

Model APIs

Base Model API

Abstract base module for all BoTorch models.

This module contains Model, the abstract base class for all BoTorch models, and ModelList, a container for a list of Models.

class botorch.models.model.Model(*args, **kwargs)[source]

Bases: Module, ABC

Abstract base class for BoTorch models.

The Model base class cannot be used directly; it only defines an API for other BoTorch models.

Model subclasses torch.nn.Module. While a Module is most typically encountered as a representation of a neural network layer, it can be used more generally: see documentation on custom NN Modules.

Module provides several pieces of useful functionality: A Model’s attributes of Tensor or Module type are automatically registered so they can be moved and/or cast with the to method, automatically differentiated, and used with CUDA.
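As a minimal sketch of this registration behavior in plain PyTorch (TinyModule is a hypothetical class, not part of BoTorch): attributes that are nn.Parameter instances, registered buffers, or Modules are tracked by the parent module, so a single call to the to method casts all of them.

```python
import torch
from torch import nn


class TinyModule(nn.Module):
    """Hypothetical module, used only to illustrate registration."""

    def __init__(self) -> None:
        super().__init__()
        # An nn.Parameter attribute is registered as a trainable parameter.
        self.weight = nn.Parameter(torch.ones(3))
        # A buffer is a registered tensor that is not optimized.
        self.register_buffer("offset", torch.zeros(3))
        # A Module attribute is registered as a submodule.
        self.linear = nn.Linear(3, 1)


# One call casts every registered parameter, buffer, and submodule.
module = TinyModule().to(torch.double)
registered = sorted(name for name, _ in module.named_parameters())
```

The same mechanism is what lets a fitted model be moved between devices or dtypes without handling each tensor by hand.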

_has_transformed_inputs

A boolean denoting whether train_inputs are currently stored as transformed or not.

Type:

bool

_original_train_inputs

A Tensor storing the original train inputs for use in _revert_to_original_inputs. This is necessary because a transform / untransform cycle introduces numerical errors, which lead to upstream errors during training.

Type:

torch.Tensor | None

_is_fully_bayesian

Returns True if this is a fully Bayesian model.

_is_ensemble

Returns True if this model consists of multiple models that are stored in an additional batch dimension. This is true for the fully Bayesian models.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

abstract posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]

Computes the posterior over model outputs at the provided points.

Note: The input transforms should be applied here using self.transform_inputs(X) after the self.eval() call and before any model.forward or model.likelihood calls.

Parameters:
  • X (Tensor) – A b x q x d-dim Tensor, where d is the dimension of the feature space, q is the number of points considered jointly, and b is the batch dimension.

  • output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.

  • observation_noise (bool | Tensor) – For models with an inferred noise level, if True, include observation noise. For models with an observed noise level, this must be a model_batch_shape x 1 x m-dim tensor or a model_batch_shape x n’ x m-dim tensor containing the average noise for each batch and output. noise must be in the outcome-transformed space if an outcome transform is used.

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

Returns:

A Posterior object, representing a batch of b joint distributions over q points and m outputs each.

Return type:

Posterior

property batch_shape: Size

The batch shape of the model.

This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.
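The broadcasting rule stated above can be written out with torch.broadcast_shapes. The helper name expected_posterior_shape is hypothetical and exists only for this illustration.

```python
import torch


def expected_posterior_shape(test_batch_shape, model_batch_shape, q, m):
    """Shape of the posterior output, following the rule stated above."""
    batch = torch.broadcast_shapes(test_batch_shape, model_batch_shape)
    return torch.Size([*batch, q, m])


# A 5 x 1 test batch against a model batch shape of 3 broadcasts to 5 x 3.
shape = expected_posterior_shape((5, 1), (3,), q=4, m=2)
```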

property num_outputs: int

The number of outputs of the model.

subset_output(idcs)[source]

Subset the model along the output dimension.

Parameters:

idcs (list[int]) – The output indices to subset the model to.

Returns:

A Model object of the same type and with the same parameters as the current model, subset to the specified output indices.

Return type:

Model

condition_on_observations(X, Y, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x n’ x d-dim Tensor, where d is the dimension of the feature space, n’ is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).

  • Y (Tensor) – A batch_shape’ x n’ x m-dim Tensor, where m is the number of model outputs, n’ is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.

  • kwargs (Any)

Returns:

A Model object of the same type, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).

Return type:

Model

classmethod construct_inputs(training_data)[source]

Construct Model keyword arguments from a SupervisedDataset.

Parameters:

training_data (SupervisedDataset) – A SupervisedDataset, with attributes train_X, train_Y, and, optionally, train_Yvar.

Returns:

A dict of keyword arguments that can be used to initialize a Model, with keys train_X, train_Y, and, optionally, train_Yvar.

Return type:

dict[str, BotorchContainer | Tensor]

transform_inputs(X, input_transform=None)[source]

Transform inputs.

Parameters:
  • X (Tensor) – A tensor of inputs

  • input_transform (Module | None) – A Module that performs the input transformation.

Returns:

A tensor of transformed inputs

Return type:

Tensor

eval()[source]

Puts the model in eval mode and sets the transformed inputs.

Return type:

Model

train(mode=True)[source]

Put the model in train mode. Reverts to the original inputs if in train mode (mode=True) or sets transformed inputs if in eval mode (mode=False).

Parameters:

mode (bool) – A boolean denoting whether to put in train or eval mode. If False, model is put in eval mode.

Return type:

Model

property dtypes_of_buffers: set[dtype]

class botorch.models.model.FantasizeMixin[source]

Bases: ABC

Mixin to add a fantasize method to a Model.

Example

class BaseModel:
    def __init__(self, …):
    def condition_on_observations(self, …):
    def posterior(self, …):
    def transform_inputs(self, …):

class ModelThatCanFantasize(BaseModel, FantasizeMixin):
    def __init__(self, args):
        super().__init__(args)

model = ModelThatCanFantasize(…)
model.fantasize(X)

abstract condition_on_observations(X, Y)[source]

Classes that inherit from FantasizeMixin must implement a condition_on_observations method.

Parameters:
  • X (Tensor)

  • Y (Tensor)

Return type:

Self

abstract posterior(X, *args, observation_noise=False)[source]

Classes that inherit from FantasizeMixin must implement a posterior method.

Parameters:
  • X (Tensor)

  • observation_noise (bool)

Return type:

Posterior

abstract transform_inputs(X, input_transform=None)[source]

Classes that inherit from FantasizeMixin must implement a transform_inputs method.

Parameters:
  • X (Tensor)

  • input_transform (Module | None)

Return type:

Tensor

fantasize(X, sampler, observation_noise=None, **kwargs)[source]

Construct a fantasy model.

Constructs a fantasy model in the following fashion: (1) compute the model posterior at X, including observation noise. If observation_noise is a Tensor, use it directly as the observation noise to add. (2) sample from this posterior (using sampler) to generate “fake” observations. (3) condition the model on the new fake observations.

Parameters:
  • X (Tensor) – A batch_shape x n’ x d-dim Tensor, where d is the dimension of the feature space, n’ is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).

  • sampler (MCSampler) – The sampler used for sampling from the posterior at X.

  • observation_noise (Tensor | None) – A model_batch_shape x 1 x m-dim tensor or a model_batch_shape x n’ x m-dim tensor containing the average noise for each batch and output, where m is the number of outputs. noise must be in the outcome-transformed space if an outcome transform is used. If None and using an inferred noise likelihood, the noise will be the inferred noise level. If using a fixed noise likelihood, the mean across the observation noise in the training data is used as observation noise.

  • kwargs (Any) – Will be passed to model.condition_on_observations

Returns:

The constructed fantasy model.

Return type:

Self

class botorch.models.model.ModelList(*models)[source]

Bases: Model

A multi-output Model represented by a list of independent models.

All BoTorch models are acceptable as inputs. The cost of this flexibility is that ModelList does not support all methods that may be implemented by its component models. One use case for ModelList is combining a regression model and a deterministic model in one multi-output container model, e.g. for cost-aware or multi-objective optimization where one of the outcomes is a deterministic function of the inputs.

Parameters:

*models (Model) – A variable number of models.

Example

>>> m_1 = SingleTaskGP(train_X, train_Y)
>>> m_2 = GenericDeterministicModel(lambda x: x.sum(dim=-1, keepdim=True))
>>> m_12 = ModelList(m_1, m_2)
>>> m_12.posterior(test_X)

posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]

Computes the posterior over model outputs at the provided points.

Note: The input transforms should be applied here using self.transform_inputs(X) after the self.eval() call and before any model.forward or model.likelihood calls.

Parameters:
  • X (Tensor) – A b x q x d-dim Tensor, where d is the dimension of the feature space, q is the number of points considered jointly, and b is the batch dimension.

  • output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.

  • observation_noise (bool | Tensor) – If True, add the observation noise from the respective likelihoods to the posterior. If a Tensor of shape (batch_shape) x q x m, use it directly as the observation noise (with observation_noise[…,i] added to the posterior of the i-th model). observation_noise is assumed to be in the outcome-transformed space, if an outcome transform is used by the model.

  • posterior_transform (Callable[[PosteriorList], Posterior] | None) – An optional PosteriorTransform.

Returns:

A Posterior object, representing a batch of b joint distributions over q points and m outputs each.

Return type:

Posterior

property batch_shape: Size

The batch shape of the model.

This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.

property num_outputs: int

The number of outputs of the model.

Equal to the sum of the number of outputs of the individual models in the ModelList.

subset_output(idcs)[source]

Subset the model along the output dimension.

Parameters:

idcs (list[int]) – The output indices to subset the model to. Relative to the overall number of outputs of the model.

Returns:

A Model (either a ModelList or one of the submodels) with the outputs subset to the indices in idcs.

Return type:

Model

Internally, this drops (if single-output) or subsets (if multi-output) the constituent models and returns them as a ModelList. If the result is a single (possibly subset) model from the list, returns this model (instead of forming a degenerate single-model ModelList). For instance, if m = ModelList(m1, m2) with m1 a two-output model and m2 a single-output model, then m.subset_output([1]) will return the model m1 subset to its second output.

transform_inputs(X)[source]

Individually transform the inputs for each model.

Parameters:

X (Tensor) – A tensor of inputs.

Returns:

A list of tensors of transformed inputs.

Return type:

list[Tensor]

load_state_dict(state_dict, strict=True)[source]

Initialize the fully Bayesian models before loading the state dict.

Parameters:
  • state_dict (Mapping[str, Any])

  • strict (bool)

Return type:

None

fantasize(X, sampler, observation_noise=None, evaluation_mask=None, **kwargs)[source]

Construct a fantasy model.

Constructs a fantasy model in the following fashion: (1) compute the model posterior at X (including observation noise if observation_noise=True). (2) sample from this posterior (using sampler) to generate “fake” observations. (3) condition the model on the new fake observations.

Parameters:
  • X (Tensor) – A batch_shape x n’ x d-dim Tensor, where d is the dimension of the feature space, n’ is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).

  • sampler (MCSampler) – The sampler used for sampling from the posterior at X. If evaluation_mask is not None, this must be a ListSampler.

  • observation_noise (Tensor | None) – A model_batch_shape x 1 x m-dim tensor or a model_batch_shape x n’ x m-dim tensor containing the average noise for each batch and output, where m is the number of outputs. noise must be in the outcome-transformed space if an outcome transform is used. If None, then the noise will be the inferred noise level.

  • evaluation_mask (Tensor | None) – A n’ x m-dim tensor of booleans indicating which outputs should be fantasized for a given design. This uses the same evaluation mask for all batches.

  • kwargs (Any)

Returns:

The constructed fantasy model.

Return type:

Model

class botorch.models.model.ModelDict(**models)[source]

Bases: ModuleDict

A lightweight container mapping model names to models.

Initialize a ModelDict.

Parameters:

models (Model) – An arbitrary number of models. Each model can be any type of BoTorch Model, including multi-output models and ModelList.

GPyTorch Model API

Abstract model class for all GPyTorch-based botorch models.

To implement your own, simply inherit from both the provided classes and a GPyTorch Model class such as an ExactGP.

class botorch.models.gpytorch.GPyTorchModel(*args, **kwargs)[source]

Bases: Model, ABC

Abstract base class for models based on GPyTorch models.

The easiest way to use this is to subclass a model from a GPyTorch model class (e.g. an ExactGP) and this GPyTorchModel. See e.g. SingleTaskGP.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

likelihood: Likelihood

property batch_shape: Size

The batch shape of the model.

This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.

property num_outputs: int

The number of outputs of the model.

posterior(X, observation_noise=False, posterior_transform=None, **kwargs)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.

  • observation_noise (bool | Tensor) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q). It is assumed to be in the outcome-transformed space if an outcome transform is used.

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

  • kwargs (Any)

Returns:

A GPyTorchPosterior object, representing a batch of b joint distributions over q points. Includes observation noise if specified.

Return type:

GPyTorchPosterior | TransformedPosterior

condition_on_observations(X, Y, noise=None, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x n’ x d-dim Tensor, where d is the dimension of the feature space, n’ is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).

  • Y (Tensor) – A batch_shape’ x n’ x m-dim Tensor, where m is the number of model outputs, n’ is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.

  • noise (Tensor | None) – If not None, a tensor of the same shape as Y representing the associated noise variance.

  • kwargs (Any) – Passed to self.get_fantasy_model.

Returns:

A Model object of the same type, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).

Return type:

Model

Example

>>> import torch
>>> from botorch.models import SingleTaskGP
>>> train_X = torch.rand(20, 2)
>>> train_Y = torch.sin(train_X[:, :1]) + torch.cos(train_X[:, 1:])
>>> model = SingleTaskGP(train_X, train_Y)
>>> new_X = torch.rand(5, 2)
>>> new_Y = torch.sin(new_X[:, :1]) + torch.cos(new_X[:, 1:])
>>> _ = model.posterior(new_X)  # evaluate once so the prediction caches exist
>>> model = model.condition_on_observations(X=new_X, Y=new_Y)

class botorch.models.gpytorch.BatchedMultiOutputGPyTorchModel(*args, **kwargs)[source]

Bases: GPyTorchModel

Base class for batched multi-output GPyTorch models with independent outputs.

This model should be used when the same training data is used for all outputs. Outputs are modeled independently by using a different batch for each output.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

static get_batch_dimensions(train_X, train_Y)[source]

Get the raw batch shape and output-augmented batch shape of the inputs.

Parameters:
  • train_X (Tensor) – A n x d or batch_shape x n x d (batch mode) tensor of training features.

  • train_Y (Tensor) – A n x m or batch_shape x n x m (batch mode) tensor of training observations.

Returns:

2-element tuple containing

  • The input_batch_shape

  • The output-augmented batch shape: input_batch_shape x (m)

Return type:

tuple[Size, Size]

property batch_shape: Size

The batch shape of the model.

This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.

posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.

  • output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.

  • observation_noise (bool | Tensor) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q x m).

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

Returns:

A GPyTorchPosterior object, representing batch_shape joint distributions over q points and the outputs selected by output_indices each. Includes observation noise if specified.

Return type:

GPyTorchPosterior | TransformedPosterior

condition_on_observations(X, Y, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x n’ x d-dim Tensor, where d is the dimension of the feature space, n’ is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).

  • Y (Tensor) – A batch_shape’ x n’ x m-dim Tensor, where m is the number of model outputs, n’ is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.

  • kwargs (Any)

Returns:

A BatchedMultiOutputGPyTorchModel object of the same type with n + n’ training examples, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).

Return type:

BatchedMultiOutputGPyTorchModel

Example

>>> import torch
>>> from botorch.models import SingleTaskGP
>>> train_X = torch.rand(20, 2)
>>> train_Y = torch.stack(
...     [torch.sin(train_X[:, 0]), torch.cos(train_X[:, 1])], -1
... )
>>> model = SingleTaskGP(train_X, train_Y)
>>> new_X = torch.rand(5, 2)
>>> new_Y = torch.stack([torch.sin(new_X[:, 0]), torch.cos(new_X[:, 1])], -1)
>>> _ = model.posterior(new_X)  # evaluate once so the prediction caches exist
>>> model = model.condition_on_observations(X=new_X, Y=new_Y)

subset_output(idcs)[source]

Subset the model along the output dimension.

Parameters:

idcs (list[int]) – The output indices to subset the model to.

Returns:

The current model, subset to the specified output indices.

Return type:

BatchedMultiOutputGPyTorchModel

class botorch.models.gpytorch.ModelListGPyTorchModel(*models)[source]

Bases: ModelList, GPyTorchModel, ABC

Abstract base class for models based on multi-output GPyTorch models.

This is meant to be used with a gpytorch ModelList wrapper for independent evaluation of submodels. Those submodels can themselves be multi-output models, in which case the task covariances will be ignored.

Parameters:

*models (Model) – A variable number of models.

Example

>>> m_1 = SingleTaskGP(train_X, train_Y)
>>> m_2 = GenericDeterministicModel(lambda x: x.sum(dim=-1, keepdim=True))
>>> m_12 = ModelList(m_1, m_2)
>>> m_12.posterior(test_X)

property batch_shape: Size

The batch shape of the model.

This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.

posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]

Computes the posterior over model outputs at the provided points. If any model returns a MultitaskMultivariateNormal posterior, then that will be split into individual MVNs per task, with inter-task covariance ignored.

Parameters:
  • X (Tensor) – A b x q x d-dim Tensor, where d is the dimension of the feature space, q is the number of points considered jointly, and b is the batch dimension.

  • output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.

  • observation_noise (bool | Tensor) – If True, add the observation noise from the respective likelihoods to the posterior. If a Tensor of shape (batch_shape) x q x m, use it directly as the observation noise (with observation_noise[…,i] added to the posterior of the i-th model).

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

Returns:

  • If no posterior_transform is provided and the component models have no outcome_transform, or if the component models only use linear outcome transforms like Standardize (i.e. not Log), returns a GPyTorchPosterior or GaussianMixturePosterior object, representing batch_shape joint distributions over q points and the outputs selected by output_indices each. Includes measurement noise if observation_noise is specified.

  • If no posterior_transform is provided and component models have nonlinear transforms like Log, returns a PosteriorList with sub-posteriors of type TransformedPosterior.

  • If posterior_transform is provided, that posterior transform will be applied and will determine the return type. This could potentially be any subclass of Posterior, but common choices give a GPyTorchPosterior.

Return type:

GPyTorchPosterior | PosteriorList

condition_on_observations(X, Y, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x n’ x d-dim Tensor, where d is the dimension of the feature space, n’ is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).

  • Y (Tensor) – A batch_shape’ x n’ x m-dim Tensor, where m is the number of model outputs, n’ is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.

  • kwargs (Any)

Returns:

A Model object of the same type, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).

Return type:

Model

class botorch.models.gpytorch.MultiTaskGPyTorchModel(*args, **kwargs)[source]

Bases: GPyTorchModel, ABC

Abstract base class for multi-task models based on GPyTorch models.

This class provides the posterior method to models that implement a “long-format” multi-task GP in the style of MultiTaskGP.
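To make "long format" concrete, the sketch below reshapes a wide n x m table of observations into rows of (features, task index) with a single observation each. It uses only torch and does not call any BoTorch API. The variable names and the choice of the last column for the task index are assumptions made for this illustration.

```python
import torch

torch.manual_seed(0)
n, d, num_tasks = 4, 2, 3
X = torch.rand(n, d)
Y_wide = torch.rand(n, num_tasks)  # one column of observations per task

# Repeat the features once per task and append the task index as a column.
task_idx = torch.arange(num_tasks, dtype=X.dtype).repeat_interleave(n)
X_long = torch.cat([X.repeat(num_tasks, 1), task_idx.unsqueeze(-1)], dim=-1)

# Stack the per-task columns of Y in the same task-major order.
Y_long = Y_wide.t().reshape(-1, 1)
```

Row n * t + i of the long-format tensors then holds point i observed under task t.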

Initialize internal Module state, shared by both nn.Module and ScriptModule.

posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A tensor of shape batch_shape x q x d or batch_shape x q x (d + 1), where d is the dimension of the feature space (not including task indices) and q is the number of points considered jointly. The + 1 dimension is the optional task feature / index. If given, the model produces the outputs for the given task indices. If omitted, the model produces outputs for tasks in self._output_tasks (specified as output_tasks while constructing the model), which can be overwritten using output_indices.

  • output_indices (list[int] | None) – A list of task values over which to compute the posterior. Only used if X does not include the task feature. If omitted, defaults to self._output_tasks.

  • observation_noise (bool | Tensor) – If True, add observation noise from the respective likelihoods. If a Tensor, specifies the observation noise levels to add.

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

Returns:

A GPyTorchPosterior object, representing batch_shape joint distributions over q points. If the task features are included in X, the posterior will be single output. Otherwise, the posterior will be single or multi output corresponding to the tasks included in either the output_indices or self._output_tasks.

Return type:

GPyTorchPosterior | TransformedPosterior

subset_output(idcs)[source]

Returns a new model that only outputs a subset of the outputs.

Parameters:

idcs (list[int]) – A list of output indices, corresponding to the outputs to keep.

Returns:

A new model that only outputs the requested outputs.

Return type:

MultiTaskGPyTorchModel

Deterministic Model API

Deterministic Models: Simple wrappers that allow the usage of deterministic mappings via the BoTorch Model and Posterior APIs.

Deterministic models are useful for expressing known input-output relationships within the BoTorch Model API. This is useful e.g. for multi-objective optimization with known objective functions (e.g. the number of parameters of a Neural Network in the context of Neural Architecture Search is usually a known function of the architecture configuration), or to encode cost functions for cost-aware acquisition utilities. Cost-aware optimization is desirable when evaluations have a cost that is heterogeneous, either in the inputs X or in a particular fidelity parameter that directly encodes the fidelity of the observation. GenericDeterministicModel supports arbitrary deterministic functions, while AffineFidelityCostModel is a particular cost model for multi-fidelity optimization. Other use cases of deterministic models include representing approximate GP sample paths, e.g. Matheron paths obtained with get_matheron_path_model, which allows them to be substituted in acquisition functions or in other places where a Model is expected.

class botorch.models.deterministic.DeterministicModel(*args, **kwargs)[source]

Bases: EnsembleModel

Abstract base class for deterministic models.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(X)[source]

Compute the (deterministic) model output at X.

Parameters:

X (Tensor) – A batch_shape x n x d-dim input tensor X.

Returns:

A batch_shape x n x m-dimensional output tensor (the outcome dimension m must be explicit if m=1).

Return type:

Tensor

class botorch.models.deterministic.GenericDeterministicModel(f, num_outputs=1)[source]

Bases: DeterministicModel

A generic deterministic model constructed from a callable.

Example

>>> f = lambda x: x.sum(dim=-1, keepdim=True)
>>> model = GenericDeterministicModel(f)

Parameters:
  • f (Callable[[Tensor], Tensor]) – A callable mapping a batch_shape x n x d-dim input tensor X to a batch_shape x n x m-dimensional output tensor (the outcome dimension m must be explicit, even if m=1).

  • num_outputs (int) – The number of outputs m.

subset_output(idcs)[source]

Subset the model along the output dimension.

Parameters:

idcs (list[int]) – The output indices to subset the model to.

Returns:

The current model, subset to the specified output indices.

Return type:

GenericDeterministicModel

forward(X)[source]

Compute the (deterministic) model output at X.

Parameters:

X (Tensor) – A batch_shape x n x d-dim input tensor X.

Returns:

A batch_shape x n x m-dimensional output tensor.

Return type:

Tensor

class botorch.models.deterministic.AffineDeterministicModel(a, b=0.01)[source]

Bases: DeterministicModel

An affine deterministic model.

Affine deterministic model from weights and offset terms.

A simple model of the form

y[…, m] = b[m] + sum_{i=1}^d a[i, m] * X[…, i]

Parameters:
  • a (Tensor) – A d x m-dim tensor of linear weights, where m is the number of outputs (must be explicit if m=1)

  • b (Tensor | float) – The affine (offset) term. Either a float (for single-output models or if the offset is shared), or an m-dim tensor (with different offset values for the m different outputs).

subset_output(idcs)[source]

Subset the model along the output dimension.

Parameters:

idcs (list[int]) – The output indices to subset the model to.

Returns:

The current model, subset to the specified output indices.

Return type:

AffineDeterministicModel

forward(X)[source]

Compute the (deterministic) model output at X.

Parameters:

X (Tensor) – A batch_shape x n x d-dim input tensor X.

Returns:

A batch_shape x n x m-dimensional output tensor (the outcome dimension m must be explicit if m=1).

Return type:

Tensor

class botorch.models.deterministic.PosteriorMeanModel(model)[source]

Bases: DeterministicModel

A deterministic model that always returns the posterior mean.

Parameters:

model (Model) – The base model.

forward(X)[source]

Compute the (deterministic) model output at X.

Parameters:

X (Tensor) – A batch_shape x n x d-dim input tensor X.

Returns:

A batch_shape x n x m-dimensional output tensor (the outcome dimension m must be explicit if m=1).

Return type:

Tensor

class botorch.models.deterministic.FixedSingleSampleModel(model, w=None, dim=None, jitter=1e-08, dtype=None, device=None)[source]

Bases: DeterministicModel

A deterministic model defined by a single sample w.

Given a base model f and a fixed sample w, the model always outputs

y = f_mean(x) + f_stddev(x) * w

We assume the outcomes are uncorrelated here.

Parameters:
  • model (Model) – The base model.

  • w (Tensor | None) – A 1-d tensor with length model.num_outputs. If None, draw it from a standard normal distribution.

  • dim (int | None) – dimensionality of w. If None and w is not provided, draw w samples of size model.num_outputs.

  • jitter (float | None) – jitter value to be added for numerical stability, 1e-8 by default.

  • dtype (torch.dtype | None) – dtype for w if specified

  • device (torch.device | None) – device for w if specified

forward(X)[source]

Compute the (deterministic) model output at X.

Parameters:

X (Tensor) – A batch_shape x n x d-dim input tensor X.

Returns:

A batch_shape x n x m-dimensional output tensor (the outcome dimension m must be explicit if m=1).

Return type:

Tensor

Ensemble Model API

Ensemble Models: Simple wrappers that allow the usage of ensembles via the BoTorch Model and Posterior APIs.

class botorch.models.ensemble.EnsembleModel(*args, **kwargs)[source]

Bases: Model, ABC

Abstract base class for ensemble models.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(X)[source]

Compute the (ensemble) model output at X.

Parameters:

X (Tensor) – A batch_shape x n x d-dim input tensor X.

Returns:

A batch_shape x s x n x m-dimensional output tensor where s is the size of the ensemble.

Return type:

Tensor

property num_outputs: int

The number of outputs of the model.

posterior(X, output_indices=None, posterior_transform=None, **kwargs)[source]

Compute the ensemble posterior at X.

Parameters:
  • X (Tensor) – A batch_shape x q x d-dim input tensor X.

  • output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior. If omitted, computes the posterior over all model outputs.

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

  • kwargs (Any)

Returns:

An EnsemblePosterior object, representing batch_shape joint posteriors over n points and the outputs selected by output_indices.

Return type:

EnsemblePosterior

Models

Cost Models (for cost-aware optimization)

Cost models to be used with multi-fidelity optimization.

Cost models are useful for defining known cost functions when the cost of an evaluation is heterogeneous in fidelity. For a full worked example, see the tutorial on continuous multi-fidelity Bayesian Optimization.

class botorch.models.cost.AffineFidelityCostModel(fidelity_weights=None, fixed_cost=0.01)[source]

Bases: DeterministicModel

Deterministic, affine cost model operating on fidelity parameters.

For each (q-batch) element of a candidate set X, this module computes a cost of the form

cost = fixed_cost + sum_j weights[j] * X[fidelity_dims[j]]

For a full worked example, see the tutorial on continuous multi-fidelity Bayesian Optimization.

Example

>>> from botorch.models import AffineFidelityCostModel
>>> from botorch.acquisition.cost_aware import InverseCostWeightedUtility
>>> cost_model = AffineFidelityCostModel(
...    fidelity_weights={6: 1.0}, fixed_cost=5.0
... )
>>> cost_aware_utility = InverseCostWeightedUtility(cost_model=cost_model)
Parameters:
  • fidelity_weights (dict[int, float] | None) – A dictionary mapping a subset of columns of X (the fidelity parameters) to its associated weight in the affine cost expression. If omitted, assumes that the last column of X is the fidelity parameter with a weight of 1.0.

  • fixed_cost (float) – The fixed cost of running a single candidate point (i.e. an element of a q-batch).

forward(X)[source]

Evaluate the cost on a candidate set X.

Computes a cost of the form

cost = fixed_cost + sum_j weights[j] * X[fidelity_dims[j]]

for each element of the q-batch

Parameters:

X (Tensor) – A batch_shape x q x d’-dim tensor of candidate points.

Returns:

A batch_shape x q x 1-dim tensor of costs.

Return type:

Tensor

class botorch.models.cost.FixedCostModel(fixed_cost)[source]

Bases: DeterministicModel

Deterministic, fixed cost model.

For each (q-batch) element of a candidate set X, this module computes a fixed cost per objective.

Parameters:

fixed_cost (Tensor) – An m-dim tensor containing the fixed cost of evaluating each objective.

forward(X)[source]

Evaluate the cost on a candidate set X.

Computes the fixed cost of evaluating each objective for each element of the q-batch.

Parameters:

X (Tensor) – A batch_shape x q x d’-dim tensor of candidate points.

Returns:

A batch_shape x q x m-dim tensor of costs.

Return type:

Tensor

GP Regression Models

Gaussian Process Regression models based on GPyTorch models.

These models are often a good starting point and are further documented in the tutorials.

SingleTaskGP is a single-task exact GP model that uses relatively strong priors on the Kernel hyperparameters, which work best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance). By default, this model uses a Standardize outcome transform, which applies this standardization. However, it does not (yet) use an input transform by default.

The SingleTaskGP model works in batch mode (each batch having its own hyperparameters). When the training observations include multiple outputs, SingleTaskGP uses batching to model outputs independently.

SingleTaskGP supports multiple outputs. However, as a single-task model, SingleTaskGP should be used only when the outputs are independent and all use the same training inputs. If outputs are independent but they have different training inputs, use the ModelListGP. When modeling correlations between outputs, use a multi-task model like MultiTaskGP.

class botorch.models.gp_regression.SingleTaskGP(train_X, train_Y, train_Yvar=None, likelihood=None, covar_module=None, mean_module=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]

Bases: BatchedMultiOutputGPyTorchModel, ExactGP, FantasizeMixin

A single-task exact GP model, supporting both known and inferred noise levels.

A single-task exact GP which, by default, utilizes hyperparameter priors from [Hvarfner2024vanilla]. These priors are designed to perform well independently of the dimensionality of the problem. Moreover, they suggest a moderately low level of noise. Importantly, the model works best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance). For a detailed discussion on the hyperparameter priors, see https://github.com/pytorch/botorch/discussions/2451.

This model works in batch mode (each batch having its own hyperparameters). When the training observations include multiple outputs, this model will use batching to model outputs independently.

Use this model when you have independent output(s) and all outputs use the same training data. If outputs are independent and outputs have different training data, use the ModelListGP. When modeling correlations between outputs, use the MultiTaskGP.

An example of a case in which noise levels are known is online experimentation, where noise can be measured using the variability of different observations from the same arm, or provided by outside software. Another use case is simulation optimization, where the evaluation can provide variance estimates, perhaps from bootstrapping. In any case, these noise levels can be provided to SingleTaskGP as train_Yvar.

SingleTaskGP can also be used when the observations are known to be noise-free. Noise-free observations can be modeled using arbitrarily small noise values, such as train_Yvar=torch.full_like(train_Y, 1e-6).

Example

Model with inferred noise levels:

>>> import torch
>>> from botorch.models.gp_regression import SingleTaskGP
>>> from botorch.models.transforms.outcome import Standardize
>>>
>>> train_X = torch.rand(20, 2, dtype=torch.float64)
>>> train_Y = torch.sin(train_X).sum(dim=1, keepdim=True)
>>> outcome_transform = Standardize(m=1)
>>> inferred_noise_model = SingleTaskGP(
...     train_X, train_Y, outcome_transform=outcome_transform,
... )

Model with a known observation variance of 0.2:

>>> train_Yvar = torch.full_like(train_Y, 0.2)
>>> observed_noise_model = SingleTaskGP(
...     train_X, train_Y, train_Yvar,
...     outcome_transform=outcome_transform,
... )

With noise-free observations:

>>> train_Yvar = torch.full_like(train_Y, 1e-6)
>>> noise_free_model = SingleTaskGP(
...     train_X, train_Y, train_Yvar,
...     outcome_transform=outcome_transform,
... )
Parameters:
  • train_X (Tensor) – A batch_shape x n x d tensor of training features.

  • train_Y (Tensor) – A batch_shape x n x m tensor of training observations.

  • train_Yvar (Tensor | None) – An optional batch_shape x n x m tensor of observed measurement noise.

  • likelihood (Likelihood | None) – A likelihood. If omitted, use a standard GaussianLikelihood with inferred noise level if train_Yvar is None, and a FixedNoiseGaussianLikelihood with the given noise observations if train_Yvar is not None.

  • covar_module (Module | None) – The module computing the covariance (Kernel) matrix. If omitted, uses an RBFKernel.

  • mean_module (Mean | None) – The mean function to be used. If omitted, use a ConstantMean.

  • outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform.

  • input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.

classmethod construct_inputs(training_data, *, task_feature=None)[source]

Construct SingleTaskGP keyword arguments from a SupervisedDataset.

Parameters:
  • training_data (SupervisedDataset) – A SupervisedDataset, with attributes train_X, train_Y, and, optionally, train_Yvar.

  • task_feature (int | None) – Deprecated and allowed only for backward compatibility; ignored.

Returns:

A dict of keyword arguments that can be used to initialize a SingleTaskGP, with keys train_X, train_Y, and, optionally, train_Yvar.

Return type:

dict[str, BotorchContainer | Tensor]

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Parameters:

x (Tensor)

Return type:

MultivariateNormal

Multi-Fidelity GP Regression Models

Multi-Fidelity Gaussian Process Regression models based on GPyTorch models.

For more on Multi-Fidelity BO, see the tutorial.

A common use case of multi-fidelity regression modeling is optimizing a “high-fidelity” function that is expensive to simulate when you have access to one or more cheaper “lower-fidelity” versions that are not fully accurate but are correlated with the high-fidelity function. The multi-fidelity model models both the low- and high-fidelity functions together, including the correlation between them, which can help you predict and optimize the high-fidelity function without having to do too many expensive high-fidelity evaluations.

[Wu2019mf]

J. Wu, S. Toscano-Palmerin, P. I. Frazier, and A. G. Wilson. Practical multi-fidelity bayesian optimization for hyperparameter tuning. ArXiv 2019.

class botorch.models.gp_regression_fidelity.SingleTaskMultiFidelityGP(train_X, train_Y, train_Yvar=None, iteration_fidelity=None, data_fidelities=None, linear_truncated=True, nu=2.5, likelihood=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]

Bases: SingleTaskGP

A single task multi-fidelity GP model.

A SingleTaskGP model using a DownsamplingKernel for the data fidelity parameter (if present) and an ExponentialDecayKernel for the iteration fidelity parameter (if present).

This kernel is described in [Wu2019mf].

Example

>>> train_X = torch.rand(20, 4)
>>> train_Y = train_X.pow(2).sum(dim=-1, keepdim=True)
>>> model = SingleTaskMultiFidelityGP(train_X, train_Y, data_fidelities=[3])
Parameters:
  • train_X (Tensor) – A batch_shape x n x (d + s) tensor of training features, where s is the dimension of the fidelity parameters (either one or two).

  • train_Y (Tensor) – A batch_shape x n x m tensor of training observations.

  • train_Yvar (Tensor | None) – An optional batch_shape x n x m tensor of observed measurement noise.

  • iteration_fidelity (int | None) – The column index for the training iteration fidelity parameter (optional).

  • data_fidelities (Sequence[int] | None) – The column indices for the downsampling fidelity parameter. If a list/tuple of indices is provided, a kernel will be constructed for each index (optional).

  • linear_truncated (bool) – If True, use a LinearTruncatedFidelityKernel instead of the default kernel.

  • nu (float) – The smoothness parameter for the Matern kernel: either 1/2, 3/2, or 5/2. Only used when linear_truncated=True.

  • likelihood (Likelihood | None) – A likelihood. If omitted, use a standard GaussianLikelihood with inferred noise level.

  • outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform.

  • input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.

classmethod construct_inputs(training_data, fidelity_features)[source]

Construct Model keyword arguments from a SupervisedDataset.

Parameters:
  • training_data (SupervisedDataset) – A SupervisedDataset containing the training data.

  • fidelity_features (list[int]) – Index of fidelity parameter as input columns.

Return type:

dict[str, Any]

GP Regression Models for Mixed Parameter Spaces

class botorch.models.gp_regression_mixed.MixedSingleTaskGP(train_X, train_Y, cat_dims, train_Yvar=None, cont_kernel_factory=None, likelihood=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]

Bases: SingleTaskGP

A single-task exact GP model for mixed search spaces.

This model is similar to SingleTaskGP, but supports mixed search spaces, which combine discrete and continuous features, as well as solely discrete spaces. It uses a kernel that combines a CategoricalKernel (based on Hamming distances) and a regular kernel into a kernel of the form

K((x1, c1), (x2, c2)) =

K_cont_1(x1, x2) + K_cat_1(c1, c2) + K_cont_2(x1, x2) * K_cat_2(c1, c2)

where xi and ci are the continuous and categorical features of the input, respectively. The suffix _i indicates that we fit different lengthscales for the kernels in the sum and product terms.

Since this model does not provide gradients for the categorical features, optimization of the acquisition function will need to be performed in a mixed fashion, i.e., treating the categorical features properly as discrete optimization variables. We recommend using optimize_acqf_mixed.
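
The structure of the combined kernel can be sketched in plain PyTorch. This is not BoTorch's implementation: it assumes an RBF kernel for the continuous part, a Hamming-distance kernel of the form exp(-mean_distance / lengthscale) for the categorical part, and omits output scales. All function names and lengthscale values are hypothetical.

```python
import torch


def rbf(x1, x2, lengthscale):
    # Squared-exponential kernel on the continuous features (assumed base kernel).
    return torch.exp(-0.5 * torch.cdist(x1, x2).pow(2) / lengthscale**2)


def hamming_kernel(c1, c2, lengthscale):
    # exp(-(mean Hamming distance) / lengthscale) on the categorical features.
    differs = (c1.unsqueeze(-2) != c2.unsqueeze(-3)).to(torch.float64)
    return torch.exp(-differs.mean(dim=-1) / lengthscale)


def mixed_kernel(x1, c1, x2, c2, ls):
    # Sum term and product term use separate lengthscales (suffixes _1 and _2).
    sum_term = rbf(x1, x2, ls["cont_1"]) + hamming_kernel(c1, c2, ls["cat_1"])
    prod_term = rbf(x1, x2, ls["cont_2"]) * hamming_kernel(c1, c2, ls["cat_2"])
    return sum_term + prod_term


x = torch.rand(6, 2, dtype=torch.float64)  # continuous features
c = torch.randint(3, (6, 1))               # one categorical feature with 3 levels
ls = {"cont_1": 0.5, "cat_1": 1.0, "cont_2": 0.8, "cat_2": 2.0}
K = mixed_kernel(x, c, x, c, ls)           # 6 x 6 covariance matrix
```

Because the categorical part only checks equality, it carries no gradient information with respect to c, which is why the acquisition function has to be optimized in a mixed fashion.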

Example

>>> train_X = torch.cat(
...     [torch.rand(20, 2), torch.randint(3, (20, 1))], dim=-1
... )
>>> train_Y = (
...     torch.sin(train_X[..., :-1]).sum(dim=1, keepdim=True)
...     + train_X[..., -1:]
... )

A single-task exact GP model supporting categorical parameters.

Parameters:
  • train_X (Tensor) – A batch_shape x n x d tensor of training features.

  • train_Y (Tensor) – A batch_shape x n x m tensor of training observations.

  • cat_dims (list[int]) – A list of indices corresponding to the columns of the input X that should be considered categorical features.

  • train_Yvar (Tensor | None) – An optional batch_shape x n x m tensor of observed measurement noise.

  • cont_kernel_factory (None | Callable[[torch.Size, int, list[int]], Kernel]) – A method that accepts batch_shape, ard_num_dims, and active_dims arguments and returns an instantiated GPyTorch Kernel object to be used as the base kernel for the continuous dimensions. If omitted, this model uses an RBFKernel as the kernel for the ordinal parameters.

  • likelihood (Likelihood | None) – A likelihood. If omitted, use a standard GaussianLikelihood with inferred noise level.

  • outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform.

  • input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass. Only input transforms that do not transform the categorical dimensions are allowed. To use this model in combination with, for example, a OneHotToNumeric input transform, instantiate the transform with transform_on_train == False and pass in the already transformed input.

classmethod construct_inputs(training_data, categorical_features, likelihood=None)[source]

Construct Model keyword arguments from a SupervisedDataset.

Parameters:
  • training_data (SupervisedDataset) – A SupervisedDataset containing the training data.

  • categorical_features (list[int]) – Column indices of categorical features.

  • likelihood (Likelihood | None) – Optional likelihood used to construct the model.

Return type:

dict[str, Any]

Model List GP Regression Models

Model List GP Regression models.

class botorch.models.model_list_gp_regression.ModelListGP(*gp_models)[source]

Bases: IndependentModelList, ModelListGPyTorchModel, FantasizeMixin

A multi-output GP model with independent GPs for the outputs.

This model supports different-shaped training inputs for each of its sub-models. It can be used with any number of single-output GPyTorchModels and the models can be of different types. Use this model when you have independent outputs with different training data. When modeling correlations between outputs, use MultiTaskGP.

Internally, this model is just a list of individual models, but it implements the same input/output interface as all other BoTorch models. This makes it very flexible and convenient to work with. The sequential evaluation comes at a performance cost though: if you are using a block design (i.e. the same number of training examples for each output, and a similar model structure), you should consider using a batched GP model instead, such as SingleTaskGP with batched inputs.

Parameters:

*gp_models (GPyTorchModel) – A number of single-output GPyTorchModels. If models have input/output transforms, these are honored individually for each model.

Example

>>> model1 = SingleTaskGP(train_X1, train_Y1)
>>> model2 = SingleTaskGP(train_X2, train_Y2)
>>> model = ModelListGP(model1, model2)
condition_on_observations(X, Y, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (list[Tensor]) – An m-list of batch_shape x n’ x d-dim Tensors, where d is the dimension of the feature space, n’ is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).

  • Y (Tensor) – A batch_shape’ x n’ x m-dim Tensor, where m is the number of model outputs, n’ is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.

  • kwargs (Any) – Keyword arguments passed to IndependentModelList.get_fantasy_model.

Returns:

A ModelListGP representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs). Here the i-th model has n_i + n’ training examples, where the n’ training examples have been added and all test-time caches have been updated.

Return type:

ModelListGP

Multitask GP Models

Multi-Task GP models.

References

[Bonilla2007MTGP]

E. Bonilla, K. Chai and C. Williams. Multi-task Gaussian Process Prediction. Advances in Neural Information Processing Systems 20, NeurIPS 2007.

[Swersky2013MTBO]

K. Swersky, J. Snoek and R. Adams. Multi-Task Bayesian Optimization. Advances in Neural Information Processing Systems 26, NeurIPS 2013.

[Doucet2010sampl]

A. Doucet. A Note on Efficient Conditional Simulation of Gaussian Distributions. http://www.stats.ox.ac.uk/~doucet/doucet_simulationconditionalgaussian.pdf, Apr 2010.

[Maddox2021bohdo]

W. Maddox, M. Balandat, A. Wilson, and E. Bakshy. Bayesian Optimization with High-Dimensional Outputs. https://arxiv.org/abs/2106.12997, Jun 2021.

botorch.models.multitask.get_task_value_remapping(task_values, dtype)[source]

Construct a mapping of discrete task values to contiguous int-valued floats.

Parameters:
  • task_values (Tensor) – A sorted long-valued tensor of task values.

  • dtype (dtype) – The dtype of the model inputs (e.g. X), which the new task values should be mapped to (e.g. float, double).

Returns:

A tensor of shape task_values.max() + 1 that maps task values to new task values. The indexing operation mapper[task_value] will produce a tensor of new task values, of the same shape as the original. The elements of the mapper tensor that do not appear in the original task_values are mapped to nan. The return value will be None, when the task values are contiguous integers starting from zero.

Return type:

Tensor | None
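
The behavior described above can be sketched in plain PyTorch. The function below is a hypothetical re-implementation written from the description, not the library's code; it is meant only to show what the mapper tensor looks like and how indexing into it works.

```python
import torch


def sketch_task_value_remapping(task_values, dtype):
    """Hypothetical sketch: map sorted task values to 0, 1, 2, ... as floats."""
    task_range = torch.arange(len(task_values), dtype=task_values.dtype)
    if torch.equal(task_values, task_range):
        return None  # already contiguous integers starting from zero
    size = int(task_values.max().item()) + 1
    mapper = torch.full((size,), float("nan"), dtype=dtype)
    mapper[task_values] = task_range.to(dtype)
    return mapper


# Task values 1 and 3 are remapped to 0.0 and 1.0; unused slots hold nan.
mapper = sketch_task_value_remapping(torch.tensor([1, 3]), dtype=torch.float64)
remapped = mapper[torch.tensor([3, 1, 1])]

# Contiguous task values need no remapping.
no_mapper = sketch_task_value_remapping(torch.tensor([0, 1, 2]), dtype=torch.float64)
```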

class botorch.models.multitask.MultiTaskGP(train_X, train_Y, task_feature, train_Yvar=None, mean_module=None, covar_module=None, likelihood=None, task_covar_prior=None, output_tasks=None, rank=None, all_tasks=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]

Bases: ExactGP, MultiTaskGPyTorchModel, FantasizeMixin

Multi-Task exact GP model using an ICM (intrinsic co-regionalization model) kernel. See [Bonilla2007MTGP] and [Swersky2013MTBO] for a reference on the model and its use in Bayesian optimization.

The model can be single-output or multi-output, determined by the output_tasks. This model uses relatively strong priors on the base Kernel hyperparameters, which work best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance) - this standardization should be applied in a stratified fashion at the level of the tasks, rather than across all data points.

If the train_Yvar is None, this model infers the noise level. If you have known observation noise, you can set train_Yvar to a tensor containing the noise variance measurements. WARNING: This currently does not support different noise levels for the different tasks.

Multi-Task GP model using an ICM kernel.

Parameters:
  • train_X (Tensor) – A n x (d + 1) or b x n x (d + 1) (batch mode) tensor of training data. One of the columns should contain the task features (see task_feature argument).

  • train_Y (Tensor) – A n x 1 or b x n x 1 (batch mode) tensor of training observations.

  • task_feature (int) – The index of the task feature (-d <= task_feature <= d).

  • train_Yvar (Tensor | None) – An optional n or b x n (batch mode) tensor of observed measurement noise. If None, we infer the noise. Note that the inferred noise is common across all tasks.

  • mean_module (Module | None) – The mean function to be used. Defaults to ConstantMean.

  • covar_module (Module | None) – The module for computing the covariance matrix between the non-task features. Defaults to RBFKernel.

  • likelihood (Likelihood | None) – A likelihood. The default is selected based on train_Yvar. If train_Yvar is None, a standard GaussianLikelihood with inferred noise level is used. Otherwise, a FixedNoiseGaussianLikelihood is used.

  • output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.

  • rank (int | None) – The rank to be used for the index kernel. If omitted, use a full rank (i.e. number of tasks) kernel.

  • task_covar_prior (Prior | None) – A Prior on the task covariance matrix. Must operate on p.s.d. matrices. A common prior for this is the LKJ prior.

  • all_tasks (list[int] | None) – By default, multi-task GPs infer the list of all tasks from the task features in train_X. This is an experimental feature that enables creation of multi-task GPs with tasks that don’t appear in the training data. Note that when a task is not observed, the corresponding task covariance will heavily depend on random initialization and may behave unexpectedly.

  • outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform. NOTE: Standardization should be applied in a stratified fashion, separately for each task.

  • input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.

Example

>>> X1, X2 = torch.rand(10, 2), torch.rand(20, 2)
>>> i1, i2 = torch.zeros(10, 1), torch.ones(20, 1)
>>> train_X = torch.cat([
...     torch.cat([X1, i1], -1), torch.cat([X2, i2], -1),
... ])
>>> train_Y = torch.cat([f1(X1), f2(X2)]).unsqueeze(-1)
>>> model = MultiTaskGP(train_X, train_Y, task_feature=-1)
forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Parameters:

x (Tensor)

Return type:

MultivariateNormal

classmethod get_all_tasks(train_X, task_feature, output_tasks=None)[source]
Parameters:
  • train_X (Tensor)

  • task_feature (int)

  • output_tasks (list[int] | None)

Return type:

tuple[list[int], int, int]

classmethod construct_inputs(training_data, task_feature, output_tasks=None, task_covar_prior=None, prior_config=None, rank=None)[source]

Construct Model keyword arguments from a dataset and other args.

Parameters:
  • training_data (SupervisedDataset | MultiTaskDataset) – A SupervisedDataset or a MultiTaskDataset.

  • task_feature (int) – Column index of embedded task indicator features.

  • output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.

  • task_covar_prior (Prior | None) – A GPyTorch Prior object to use as prior on the cross-task covariance matrix.

  • prior_config (dict | None) – Configuration for inter-task covariance prior. Should only be used if task_covar_prior is not passed directly. Must contain use_LKJ_prior indicator and should contain float value eta.

  • rank (int | None) – The rank of the cross-task covariance matrix.

Return type:

dict[str, Any]

class botorch.models.multitask.KroneckerMultiTaskGP(train_X, train_Y, likelihood=None, data_covar_module=None, task_covar_prior=None, rank=None, input_transform=None, outcome_transform=None, **kwargs)[source]

Bases: ExactGP, GPyTorchModel, FantasizeMixin

Multi-task GP with Kronecker structure, using an ICM kernel.

This model assumes the “block design” case, i.e., it requires that all tasks are observed at all data points.

For posterior sampling, this model uses Matheron’s rule [Doucet2010sampl] to compute the posterior over all tasks as in [Maddox2021bohdo] by exploiting Kronecker structure.

When a multi-fidelity model has Kronecker structure, this means there is one covariance kernel over the fidelity features (call it K_f) and another over the rest of the input parameters (call it K_i), and the resulting covariance across inputs and fidelities is given by the Kronecker product of the two covariance matrices. This is equivalent to saying the covariance between two input and feature pairs is given by

K((parameter_1, fidelity_1), (parameter_2, fidelity_2))

= K_f(fidelity_1, fidelity_2) * K_i(parameter_1, parameter_2).

Then the covariance matrix of n_i parameters and n_f fidelities can be codified as a Kronecker product of an n_i x n_i matrix and an n_f x n_f matrix, which is far more parsimonious than specifying the whole (n_i * n_f) x (n_i * n_f) covariance matrix.
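
The Kronecker identity above can be checked numerically with a short PyTorch sketch. The RBF kernel, the sizes, and the index choices are illustrative assumptions; the point is only that each entry of the full covariance is a product of one entry of K_f and one entry of K_i.

```python
import torch

n_i, n_f = 4, 3
params = torch.rand(n_i, 2, dtype=torch.float64)
fids = torch.linspace(0, 1, n_f, dtype=torch.float64).unsqueeze(-1)


def rbf(a, b):
    # Unit-lengthscale squared-exponential kernel (assumed for illustration).
    return torch.exp(-0.5 * torch.cdist(a, b).pow(2))


K_i = rbf(params, params)  # n_i x n_i
K_f = rbf(fids, fids)      # n_f x n_f
K = torch.kron(K_f, K_i)   # (n_i * n_f) x (n_i * n_f)

# Entry for the pairs (fidelity f1, parameter p1) and (fidelity f2, parameter p2).
f1, p1, f2, p2 = 2, 1, 0, 3
entry = K[f1 * n_i + p1, f2 * n_i + p2]
product = K_f[f1, f2] * K_i[p1, p2]

# Storage: two small factors versus the full matrix.
factor_entries = n_i**2 + n_f**2
full_entries = (n_i * n_f) ** 2
```

Here the two factors hold 25 entries in total, while the full matrix holds 144.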

Example

>>> train_X = torch.rand(10, 2)
>>> train_Y = torch.cat([f_1(train_X), f_2(train_X)], dim=-1)
>>> model = KroneckerMultiTaskGP(train_X, train_Y)
Parameters:
  • train_X (Tensor) – A batch_shape x n x d tensor of training features.

  • train_Y (Tensor) – A batch_shape x n x m tensor of training observations.

  • likelihood (MultitaskGaussianLikelihood | None) – A MultitaskGaussianLikelihood. If omitted, uses a MultitaskGaussianLikelihood with a GammaPrior(1.1, 0.05) noise prior.

  • data_covar_module (Module | None) – The module computing the covariance (Kernel) matrix in data space. If omitted, uses an RBFKernel.

  • task_covar_prior (Prior | None) – A Prior on the task covariance matrix. Must operate on p.s.d. matrices. A common prior for this is the LKJ prior. If omitted, uses LKJCovariancePrior with eta parameter as specified in the keyword arguments (if not specified, use eta=1.5).

  • rank (int | None) – The rank of the ICM kernel. If omitted, use a full rank kernel.

  • kwargs (Any) – Additional arguments to override default settings of priors, including: - eta: The eta parameter on the default LKJ task_covar_prior. A value of 1.0 is uninformative, values <1.0 favor stronger correlations (in magnitude), correlations vanish as eta -> inf. - sd_prior: A scalar prior over nonnegative numbers, which is used for the default LKJCovariancePrior task_covar_prior. - likelihood_rank: The rank of the task covariance matrix to fit. Defaults to 0 (which corresponds to a diagonal covariance matrix).

  • input_transform (InputTransform | None)

  • outcome_transform (OutcomeTransform | None)

forward(X)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Parameters:

X (Tensor)

Return type:

MultitaskMultivariateNormal

property train_full_covar
property predictive_mean_cache
posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.

  • observation_noise (bool | Tensor) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q). It is assumed to be in the outcome-transformed space if an outcome transform is used.

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

  • output_indices (list[int] | None)

Returns:

A GPyTorchPosterior object, representing a batch of b joint distributions over q points. Includes observation noise if specified.

Return type:

MultitaskGPPosterior

train(val=True, *args, **kwargs)[source]

Put the model in train or eval mode. Reverts to the original inputs if in train mode (val=True) or sets transformed inputs if in eval mode (val=False).

Parameters:

val – A boolean denoting whether to put the model in train or eval mode. If False, the model is put in eval mode.

Higher Order GP Models

References

[Zhe2019hogp]

S. Zhe, W. Xing, and R. M. Kirby. Scalable high-order gaussian process regression. Proceedings of Machine Learning Research, volume 89, Apr 2019.

class botorch.models.higher_order_gp.FlattenedStandardize(output_shape, batch_shape=None, min_stdv=1e-08)[source]

Bases: Standardize

Standardize outcomes in a structured multi-output setting by reshaping the batched output dimensions into a vector. Specifically, an output dimension of [a x b x c] will be flattened into a vector of size [a * b * c].

Parameters:
  • output_shape (torch.Size) – The shape of the structured outputs, i.e. the training targets form an n x output_shape-dim tensor.

  • batch_shape (torch.Size | None) – The batch_shape of the training targets.

  • min_stdv (float) – The minimum standard deviation for which to perform standardization (if lower, only de-mean the data).
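
The flattening and the min_stdv fallback described above can be sketched in plain Python. This is an illustration under simplifying assumptions (a single flat list of values and the sample standard deviation), not the FlattenedStandardize implementation; flattened_size and standardize are hypothetical helper names.

```python
import math
import statistics


def flattened_size(output_shape: tuple[int, ...]) -> int:
    """Number of entries after flattening, e.g. (a, b, c) -> a * b * c."""
    return math.prod(output_shape)


def standardize(values: list[float], min_stdv: float = 1e-8) -> list[float]:
    """De-mean the values and divide by the sample standard deviation.

    If the standard deviation falls below min_stdv, only the mean is removed.
    """
    mean = statistics.fmean(values)
    stdv = statistics.stdev(values) if len(values) > 1 else 0.0
    scale = stdv if stdv >= min_stdv else 1.0
    return [(v - mean) / scale for v in values]


print(flattened_size((2, 3, 4)))      # size of the flattened output vector
print(standardize([1.0, 2.0, 3.0]))   # regular standardization
print(standardize([5.0, 5.0, 5.0]))   # constant data: only de-meaned
```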

forward(Y, Yvar=None)[source]

Standardize outcomes.

If the module is in train mode, this updates the module state (i.e. the mean/std normalizing constants). If the module is in eval mode, simply applies the normalization using the module state.

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of training targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).

Returns:

A two-tuple with the transformed outcomes:

  • The transformed outcome observations.

  • The transformed observation noise (if applicable).

untransform(Y, Yvar=None)[source]

Un-standardize outcomes.

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of standardized targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of standardized observation noises associated with the targets (if applicable).

Returns:

A two-tuple with the un-standardized outcomes:

  • The un-standardized outcome observations.

  • The un-standardized observation noise (if applicable).

untransform_posterior(posterior)[source]

Un-standardize the posterior.

Parameters:

posterior (HigherOrderGPPosterior) – A posterior in the standardized space.

Returns:

The un-standardized posterior. If the input posterior is a GPyTorchPosterior, return a GPyTorchPosterior. Otherwise, return a TransformedPosterior.

Return type:

TransformedPosterior

class botorch.models.higher_order_gp.HigherOrderGP(train_X, train_Y, likelihood=None, covar_modules=None, num_latent_dims=None, learn_latent_pars=True, latent_init='default', outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]

Bases: BatchedMultiOutputGPyTorchModel, ExactGP, FantasizeMixin

A model for high-dimensional output regression.

As described in [Zhe2019hogp]. “Higher-order” means that the predictions are matrices (tensors) with at least two dimensions, such as images or grids of images, or measurements taken from a region of at least two dimensions. The posterior uses Matheron’s rule [Doucet2010sampl] as described in [Maddox2021bohdo].

HigherOrderGP differs from a “vector” multi-output model in that it uses Kronecker algebra to obtain parsimonious covariance matrices for these outputs (see KroneckerMultiTaskGP for more information). For example, imagine a 10 x 20 x 30 grid of images. If we were to vectorize the resulting 6,000 data points in order to use them in a non-higher-order GP, they would have a 6,000 x 6,000 covariance matrix, with 36 million entries. The Kronecker structure allows representing this as a product of 10x10, 20x20, and 30x30 covariance matrices, with only 1,400 entries.
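
The entry counts quoted above can be reproduced with a few lines of arithmetic. The sketch only counts matrix entries for the 10 x 20 x 30 example; it does not build any covariance matrices.

```python
import math

output_shape = (10, 20, 30)

# Vectorized outputs: one covariance entry per pair of output points.
num_outputs = math.prod(output_shape)
dense_entries = num_outputs**2

# Kronecker structure: one small square covariance matrix per output dimension.
kronecker_entries = sum(dim**2 for dim in output_shape)

print(num_outputs, dense_entries, kronecker_entries)
```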

NOTE: This model requires the use of specialized Kronecker solves in linear_operator, which are disabled by default in BoTorch. They are enabled automatically inside the HigherOrderGP.posterior call, but the user must enable them manually during model fitting.

Example

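>>> from botorch.models.higher_order_gp import HigherOrderGP
>>> from botorch.optim.fit import fit_gpytorch_mll_torch
>>> from gpytorch.mlls import ExactMarginalLogLikelihood
>>> from linear_operator.settings import _fast_solves
>>> model = HigherOrderGP(train_X, train_Y)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> with _fast_solves(True):
...     fit_gpytorch_mll_torch(mll)
>>> samples = model.posterior(test_X).rsample()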
Parameters:
  • train_X (Tensor) – A batch_shape x n x d-dim tensor of training inputs.

  • train_Y (Tensor) – A batch_shape x n x output_shape-dim tensor of training targets.

  • likelihood (Likelihood | None) – Gaussian likelihood for the model.

  • covar_modules (list[Kernel] | None) – List of kernels for each output structure.

  • num_latent_dims (list[int] | None) – Sizes for the latent dimensions.

  • learn_latent_pars (bool) – If true, learn the latent parameters.

  • latent_init (str) – [default or gp] how to initialize the latent parameters.

  • outcome_transform (OutcomeTransform | _DefaultType | None)

  • input_transform (InputTransform | None)

forward(X)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Parameters:

X (Tensor)

Return type:

MultivariateNormal

get_fantasy_model(inputs, targets, **kwargs)[source]

Returns a new GP model that incorporates the specified inputs and targets as new training data.

Using this method is more efficient than updating with set_train_data when the number of inputs is relatively small, because any computed test-time caches will be updated in linear time rather than computed from scratch.

Note

If targets is a batch (e.g. b x m), then the GP returned from this method will be a batch mode GP. If inputs is of the same (or lesser) dimension as targets, then it is assumed that the fantasy points are the same for each target batch.

Parameters:
  • inputs (torch.Tensor) – (b1 x … x bk x m x d or f x b1 x … x bk x m x d) Locations of fantasy observations.

  • targets (torch.Tensor) – (b1 x … x bk x m or f x b1 x … x bk x m) Labels of fantasy observations.

Returns:

An ExactGP model with n + m training examples, where the m fantasy examples have been added and all test-time caches have been updated.

Return type:

ExactGP

condition_on_observations(X, Y, noise=None, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x n’ x d-dim Tensor, where d is the dimension of the feature space, n’ is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).

  • Y (Tensor) – A batch_shape’ x n’ x m_d-dim Tensor, where m_d is the shape of the model outputs, n’ is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.

  • noise (Tensor | None) – If not None, a tensor of the same shape as Y representing the noise variance associated with each observation.

  • kwargs (Any) – Passed to condition_on_observations.

Returns:

A BatchedMultiOutputGPyTorchModel object of the same type with n + n’ training examples, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).

Return type:

HigherOrderGP

posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.

  • output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.

  • observation_noise (bool | Tensor) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q x m).

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

Returns:

A GPyTorchPosterior object, representing batch_shape joint distributions over q points and the outputs selected by output_indices each. Includes observation noise if specified.

Return type:

GPyTorchPosterior

make_posterior_variances(joint_covariance_matrix)[source]

Computes the posterior variances given the data points X. As currently implemented, it performs another forward call with the stacked data to obtain the joint covariance across all data points.

Parameters:

joint_covariance_matrix (LinearOperator)

Return type:

Tensor

Latent Kronecker GP Models

References

[lin2024scaling]

J. A. Lin, S. Ament, M. Balandat, E. Bakshy. Scaling Gaussian Processes for Learning Curve Prediction via Latent Kronecker Structure. NeurIPS 2024 Bayesian Decision-making and Uncertainty Workshop.

[lin2023sampling]

J. A. Lin, J. Antorán, S. Padhy, D. Janz, J. M. Hernández-Lobato, A. Terenin. Sampling from Gaussian Process Posterior using Stochastic Gradient Descent. Advances in Neural Information Processing Systems 2023.

class botorch.models.latent_kronecker_gp.MinMaxStandardize(m=1, use_min=False, outputs=None, batch_shape=(), min_stdv=1e-08)[source]

Bases: Standardize

Standardize outcomes to unit variance, centering them about the minimum (or maximum) instead of the mean. Otherwise equivalent to Standardize.

Parameters:
  • m (int) – The output dimension.

  • use_min (bool) – If True, center about the minimum; if False, center about the maximum (in both cases instead of the mean).

  • outputs (list[int] | None) – Which of the outputs to standardize. If omitted, all outputs will be standardized.

  • batch_shape (Size) – The batch_shape of the training targets.

  • min_stdv (float) – The minimum standard deviation for which to perform standardization (if lower, only shift the data).
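
A minimal sketch of the centering choice follows. It assumes that use_min=True shifts by the minimum, that use_min=False shifts by the maximum, and that the scale is the sample standard deviation as in Standardize; these are assumptions for illustration, and min_max_standardize is a hypothetical helper rather than the library code.

```python
import statistics


def min_max_standardize(values, use_min=False, min_stdv=1e-8):
    """Shift by the minimum (or maximum) and scale by the sample standard deviation."""
    offset = min(values) if use_min else max(values)
    stdv = statistics.stdev(values) if len(values) > 1 else 0.0
    # Below min_stdv the data is only shifted, not rescaled.
    scale = stdv if stdv >= min_stdv else 1.0
    return [(v - offset) / scale for v in values]


values = [1.0, 2.0, 3.0]
print(min_max_standardize(values, use_min=True))   # smallest value maps to zero
print(min_max_standardize(values, use_min=False))  # largest value maps to zero
```

With this convention, all transformed outcomes are nonnegative when centering about the minimum and nonpositive when centering about the maximum.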

forward(Y, Yvar=None)[source]

Standardize outcomes.

If the module is in train mode, this updates the module state (i.e. the mean/std normalizing constants). If the module is in eval mode, simply applies the normalization using the module state.

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of training targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).

Returns:

A two-tuple with the transformed outcomes:

  • The transformed outcome observations.

  • The transformed observation noise (if applicable).

class botorch.models.latent_kronecker_gp.LatentKroneckerGP(train_X, train_Y, train_Y_valid=None, T=None, likelihood=None, mean_module_X=None, mean_module_T=None, covar_module_X=None, covar_module_T=None, input_transform=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>)[source]

Bases: GPyTorchModel, ExactGP, FantasizeMixin

A multi-task GP model which uses Kronecker structure despite missing entries.

Leverages pathwise conditioning and iterative linear system solvers to efficiently draw samples from the GP posterior. See [lin2024scaling] for details.

For more information about pathwise conditioning, see [wilson2021pathwise] and [Maddox2021bohdo]. Details about iterative linear system solvers for GPs with pathwise conditioning can be found in [lin2023sampling].

NOTE: This model requires iterative methods for efficient posterior inference. To enable iterative methods, the use_iterative_methods helper function can be used as a context manager.

Example

>>> model = LatentKroneckerGP(train_X, train_Y)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> with model.use_iterative_methods():
...     fit_gpytorch_mll(mll)
...     samples = model.posterior(test_X).rsample()
Parameters:
  • train_X (Tensor) – A batch_shape x n x d tensor of training features.

  • train_Y (Tensor) – A batch_shape x n x t tensor of training observations.

  • train_Y_valid (Tensor | None) – A n x t boolean tensor of valid values. True indicates that the corresponding value is valid. False indicates that the corresponding value is missing. Does not allow explicit batch_shape because the mask must be shared across batch dimensions.

  • T (Tensor | None) – A batch_shape x t tensor of training time steps. If omitted, use [1, …, t].

  • likelihood (Likelihood | None) – A likelihood. If omitted, use a standard GaussianLikelihood with inferred noise level.

  • mean_module_X (Mean | None) – The mean function to be used for X. If omitted, use a ConstantMean.

  • mean_module_T (Mean | None) – The mean function to be used for T. If omitted, use a ConstantMean.

  • covar_module_X (Module | None) – The module computing the covariance matrix of X. If omitted, use a MaternKernel.

  • covar_module_T (Module | None) – The module computing the covariance matrix of T. If omitted, use a MaternKernel.

  • input_transform (InputTransform | None) – An input transform that is applied to X.

  • outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to Y.
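
As an illustration of the train_Y_valid mask, the sketch below builds an n x t mask for learning curves that were observed for different numbers of steps. Nested Python lists stand in for the boolean tensor, and build_valid_mask is a hypothetical helper; in practice the mask would be converted to a boolean tensor before being passed to the model.

```python
def build_valid_mask(observed_steps: list[int], t: int) -> list[list[bool]]:
    """Return an n x t mask where row i is True for the first observed_steps[i] steps."""
    if any(not 0 <= steps <= t for steps in observed_steps):
        raise ValueError("each entry of observed_steps must lie in [0, t]")
    return [[step < steps for step in range(t)] for steps in observed_steps]


t = 5
T = list(range(1, t + 1))              # default time steps [1, ..., t]
mask = build_valid_mask([5, 3, 1], t)  # three curves, two of them truncated

num_valid = sum(sum(row) for row in mask)
print(f"{num_valid} of {len(mask) * t} grid entries are observed")
```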

use_iterative_methods(tol=0.01, max_iter=10000, covar_root_decomposition=False, log_prob=True, solves=True)[source]
Parameters:
  • tol (float)

  • max_iter (int)

  • covar_root_decomposition (bool)

  • log_prob (bool)

  • solves (bool)

forward(X)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Parameters:

X (Tensor)

Return type:

MultivariateNormal

posterior(X, observation_noise=False, posterior_transform=None, **kwargs)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.

  • observation_noise (bool | Tensor) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q). It is assumed to be in the outcome-transformed space if an outcome transform is used.

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

  • kwargs (Any)

Returns:

A GPyTorchPosterior object, representing a batch of b joint distributions over q points. Includes observation noise if specified.

Return type:

GPyTorchPosterior

condition_on_observations(X, Y, noise=None, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x n’ x d-dim Tensor, where d is the dimension of the feature space, n’ is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).

  • Y (Tensor) – A batch_shape’ x n’ x m-dim Tensor, where m is the number of model outputs, n’ is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.

  • noise (Tensor | None) – If not None, a tensor of the same shape as Y representing the associated noise variance.

  • kwargs (Any) – Passed to self.get_fantasy_model.

Returns:

A Model object of the same type, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).

Return type:

Model

Example

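>>> import torch
>>> from botorch.models import SingleTaskGP
>>> train_X = torch.rand(20, 2)
>>> # Targets need an explicit output dimension, i.e. shape n x 1.
>>> train_Y = (torch.sin(train_X[:, 0]) + torch.cos(train_X[:, 1])).unsqueeze(-1)
>>> model = SingleTaskGP(train_X, train_Y)
>>> new_X = torch.rand(5, 2)
>>> new_Y = (torch.sin(new_X[:, 0]) + torch.cos(new_X[:, 1])).unsqueeze(-1)
>>> # Make a prediction first so that the test-time caches exist.
>>> _ = model.eval().posterior(new_X)
>>> model = model.condition_on_observations(X=new_X, Y=new_Y)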

Pairwise GP Models

Preference Learning with Gaussian Process

[Chu2005preference]

Wei Chu, and Zoubin Ghahramani. Preference learning with Gaussian processes. Proceedings of the 22nd international conference on Machine learning. 2005.

[Brochu2010tutorial]

Eric Brochu, Vlad M. Cora, and Nando De Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599 (2010).

class botorch.models.pairwise_gp.PairwiseGP(datapoints, comparisons, likelihood=None, covar_module=None, input_transform=None, *, jitter=1e-06, xtol=None, consolidate_rtol=0.0, consolidate_atol=0.0001, maxfev=None)[source]

Bases: Model, GP, FantasizeMixin

Probit GP for preference learning with Laplace approximation

A probit-likelihood GP that learns via pairwise comparison data, using a Laplace approximation of the posterior of the estimated utility values. By default it uses a scaled RBF kernel.

Implementation is based on [Chu2005preference]. Also see [Brochu2010tutorial] for additional reference.

Note that in [Chu2005preference] the likelihood of a pairwise comparison is \(\Phi\left(\frac{f(x_1) - f(x_2)}{\sqrt{2}\sigma}\right)\), where \(\Phi\) is the standard normal CDF, i.e. a scale \(\sigma\) is used in the denominator. To maintain consistency with usage of kernels elsewhere in BoTorch, we do not include \(\sigma\) in the code (implicitly setting it to 1) and instead use ScaleKernel to scale the function.

In the example below, the user/decision maker has stated that they prefer the first item over the second item and the third item over the second item, generating comparisons [0, 1] and [2, 1].

Example

>>> from botorch.models import PairwiseGP
>>> import torch
>>> datapoints = torch.Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> comparisons = torch.Tensor([[0, 1], [2, 1]])
>>> model = PairwiseGP(datapoints, comparisons)
Parameters:
  • datapoints (Tensor | None) – Either None or a batch_shape x n x d tensor of training features. If either datapoints or comparisons is None, construct a prior-only model.

  • comparisons (Tensor | None) – Either None or a batch_shape x m x 2 tensor of training comparisons; comparisons[i] is a noisy indicator suggesting the utility value of comparisons[i, 0]-th is greater than comparisons[i, 1]-th. If either comparisons or datapoints is None, construct a prior-only model.

  • likelihood (PairwiseLikelihood | None) – A PairwiseLikelihood.

  • covar_module (ScaleKernel | None) – Covariance module.

  • input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.

  • jitter (float) – Value added to diagonal for numerical stability in psd_safe_cholesky.

  • xtol (float | None) – Stopping criterion in scipy.optimize.fsolve used to find f_map in PairwiseGP._update. If None, default behavior is handled by PairwiseGP._update.

  • consolidate_rtol (float) – rtol passed to consolidate_duplicates.

  • consolidate_atol (float) – atol passed to consolidate_duplicates.

  • maxfev (int | None) – The maximum number of calls to the function in scipy.optimize.fsolve. If None, default behavior is handled by PairwiseGP._update.
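
The probit comparison likelihood with the scale fixed to 1 can be sketched in plain Python. This assumes the standard normal CDF applied to the utility difference divided by sqrt(2), as in a probit preference model; it illustrates the likelihood only and is not the PairwiseGP or PairwiseLikelihood implementation. All function names and utility values are hypothetical.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def preference_probability(f_winner: float, f_loser: float) -> float:
    """Probit probability that the first item is preferred, with sigma = 1."""
    return standard_normal_cdf((f_winner - f_loser) / math.sqrt(2.0))


def comparison_log_likelihood(utilities, comparisons) -> float:
    """Sum of log-probabilities; each pair (i, j) says item i is preferred to item j."""
    return sum(
        math.log(preference_probability(utilities[i], utilities[j]))
        for i, j in comparisons
    )


comparisons = [(0, 1), (2, 1)]    # same comparisons as in the example above
consistent = [1.0, -1.0, 1.0]     # hypothetical utilities that agree with them
contradicting = [-1.0, 1.0, -1.0]  # hypothetical utilities that contradict them

print(comparison_log_likelihood(consistent, comparisons))
print(comparison_log_likelihood(contradicting, comparisons))
```

Utilities that agree with the stated preferences receive the higher log-likelihood, which is the signal the model uses when estimating the latent utility values.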

property datapoints: Tensor

Alias for consolidated datapoints

property comparisons: Tensor

Alias for consolidated comparisons

property unconsolidated_utility: Tensor

Utility of the unconsolidated datapoints

property num_outputs: int

The number of outputs of the model.

property batch_shape: Size

The batch shape of the model.

This is a batch shape from an I/O perspective, independent of the internal representation of the model (as e.g. in BatchedMultiOutputGPyTorchModel). For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.
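
The shape rule above can be sketched with a small broadcasting helper. It follows the usual right-aligned broadcasting semantics on plain tuples and does not call into BoTorch; broadcast_shapes and posterior_output_shape are hypothetical helper names.

```python
def broadcast_shapes(shape_a, shape_b):
    """Broadcast two shapes with right-aligned, NumPy/PyTorch-style semantics."""
    result = []
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        # Missing leading dimensions are treated as size 1.
        a = shape_a[-i] if i <= len(shape_a) else 1
        b = shape_b[-i] if i <= len(shape_b) else 1
        if a != b and a != 1 and b != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not broadcastable")
        result.append(b if a == 1 else a)
    return tuple(reversed(result))


def posterior_output_shape(test_batch_shape, model_batch_shape, q, m):
    """Shape broadcast(test_batch_shape, model_batch_shape) x q x m."""
    return broadcast_shapes(test_batch_shape, model_batch_shape) + (q, m)


print(posterior_output_shape((5, 1), (3,), q=4, m=2))
```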

classmethod construct_inputs(training_data)[source]

Construct Model keyword arguments from a RankingDataset.

Parameters:

training_data (SupervisedDataset) – A RankingDataset, with attributes train_X, train_Y, and, optionally, train_Yvar.

Returns:

A dict of keyword arguments that can be used to initialize a PairwiseGP, including datapoints and comparisons.

Return type:

dict[str, Tensor]

set_train_data(datapoints=None, comparisons=None, strict=False, update_model=True)[source]

Set datapoints and comparisons and update model properties if needed

Parameters:
  • datapoints (Tensor | None) – Either None or a batch_shape x n x d dimension tensor X. If there are input transformations, assume the datapoints are not transformed. If either datapoints or comparisons is None, construct a prior-only model.

  • comparisons (Tensor | None) – Either None or a tensor of size batch_shape x m x 2. (i, j) means f_i is preferred over f_j. If either comparisons or datapoints is None, construct a prior-only model.

  • strict (bool) – strict argument as in gpytorch.models.exact_gp for compatibility when using fit_gpytorch_mll with input_transform.

  • update_model (bool) – True if we want to refit the model (see _update) after re-setting the data.

Return type:

None

load_state_dict(state_dict, strict=False)[source]

Removes data related buffers from the state_dict and calls super().load_state_dict with strict=False.

Parameters:
  • state_dict (dict[str, Tensor]) – The state dict.

  • strict (bool) – Boolean specifying whether or not given and instance-bound state_dicts should have identical keys. Only implemented for strict=False since buffers will be filtered out when calling _load_from_state_dict.

Returns:

A named tuple _IncompatibleKeys, containing the missing_keys and unexpected_keys.

Return type:

_IncompatibleKeys

forward(datapoints)[source]

Calculate a posterior or prior prediction.

In training mode, forward is implemented solely for gradient-based hyperparameter optimization. It re-calculates the utility f using its analytical form at f_map so that gradients with respect to the hyperparameters can be obtained.

Parameters:

datapoints (Tensor) – A batch_shape x n x d Tensor, should be the same as self.datapoints during training

Returns:

A MultivariateNormal object, being one of the following:

  1. Posterior centered at MAP points for training data (training mode)

  2. Prior predictions (prior mode)

  3. Predictive posterior (eval mode)

posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A batch_shape x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.

  • output_indices (list[int] | None) – As defined in parent Model class, not used for this model.

  • observation_noise (bool) – Ignored (since noise is not identifiable from scale in probit models).

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

Returns:

A Posterior object, representing joint distributions over q points.

Return type:

Posterior

condition_on_observations(X, Y)[source]

Condition the model on new observations.

Note that unlike other BoTorch models, PairwiseGP requires Y to be pairwise comparisons.

Parameters:
  • X (Tensor) – A batch_shape x n x d dimension tensor X

  • Y (Tensor) – A tensor of size batch_shape x m x 2. (i, j) means f_i is preferred over f_j

  • kwargs – Not used.

Returns:

A (deepcopied) Model object of the same type, representing the original model conditioned on the new observations (X, Y).

Return type:

Model

class botorch.models.pairwise_gp.PairwiseLaplaceMarginalLogLikelihood(likelihood, model)[source]

Bases: MarginalLogLikelihood

Laplace-approximated marginal log likelihood/evidence for PairwiseGP

See (12) from [Chu2005preference].

Parameters:
  • likelihood – Used as in args to GPyTorch MarginalLogLikelihood

  • model (GP) – Used as in args to GPyTorch MarginalLogLikelihood

forward(post, comp)[source]

Calculate approximated log evidence, i.e., log(P(D|theta))

Note that post will be based on the consolidated/deduped datapoints for numerical stability, but comp will still be the unconsolidated comparisons so that it’s still compatible with fit_gpytorch_*.

Parameters:
  • post (Posterior) – training posterior distribution from self.model (after consolidation)

  • comp (Tensor) – Comparison pairs (before consolidation)

Returns:

The approximated evidence, i.e., the marginal log likelihood

Return type:

Tensor

Contextual GP Models with Aggregate Rewards

class botorch.models.contextual.SACGP(train_X, train_Y, train_Yvar, decomposition)[source]

Bases: SingleTaskGP

A GP using a Structural Additive Contextual (SAC) kernel.

Parameters:
  • train_X (Tensor) – (n x d) X training data.

  • train_Y (Tensor) – (n x 1) Y training data.

  • train_Yvar (Tensor | None) – (n x 1) Noise variances of each training Y. If None, we use an inferred noise likelihood.

  • decomposition (dict[str, list[int]]) – Keys are context names. Values are the indexes of the parameters that belong to the context. The parameter indexes are in the same order across contexts.
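
A sketch of what a decomposition might look like, together with a few sanity checks, is given below. The checks assume that every context has the same number of parameters (suggested by the shared ordering across contexts) and that no column of train_X is assigned to two contexts; both are assumptions for illustration, and validate_decomposition is a hypothetical helper, not part of the library.

```python
def validate_decomposition(decomposition: dict[str, list[int]], d: int) -> int:
    """Check a context decomposition and return the number of parameters per context."""
    lengths = {len(indices) for indices in decomposition.values()}
    if len(lengths) != 1:
        raise ValueError("every context must have the same number of parameters")
    flat = [i for indices in decomposition.values() for i in indices]
    if len(set(flat)) != len(flat):
        raise ValueError("a parameter index is assigned to more than one context")
    if any(not 0 <= i < d for i in flat):
        raise ValueError(f"parameter indices must lie in [0, {d})")
    return lengths.pop()


# Two contexts with two parameters each, for a train_X with d = 4 columns.
decomposition = {"context_a": [0, 1], "context_b": [2, 3]}
print(validate_decomposition(decomposition, d=4))
```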

classmethod construct_inputs(training_data, decomposition)[source]

Construct Model keyword arguments from a dict of SupervisedDataset.

Parameters:
  • training_data (SupervisedDataset) – A SupervisedDataset containing the training data.

  • decomposition (dict[str, list[int]]) – Dictionary of context names and their indexes of the corresponding active context parameters.

Return type:

dict[str, Any]

class botorch.models.contextual.LCEAGP(train_X, train_Y, train_Yvar, decomposition, train_embedding=True, cat_feature_dict=None, embs_feature_dict=None, embs_dim_list=None, context_weight_dict=None)[source]

Bases: SingleTaskGP

A GP using a Latent Context Embedding Additive (LCE-A) Kernel.

Note that the model does not support batch training. Input training data sets should have dim = 2.

Parameters:
  • train_X (Tensor) – (n x d) X training data.

  • train_Y (Tensor) – (n x 1) Y training data.

  • train_Yvar (Tensor | None) – (n x 1) Noise variance of Y. If None, we use an inferred noise likelihood.

  • decomposition (dict[str, list[int]]) – Keys are context names. Values are the indexes of the parameters that belong to the context.

  • train_embedding (bool) – Whether to train the embedding layer or not. If False, the model will use pre-trained embeddings in embs_feature_dict.

  • cat_feature_dict (dict | None) – Keys are context names and values are lists of categorical features, i.e. {“context_name” : [cat_0, …, cat_k]}, where k is the number of categorical variables. If None, we use context names in the decomposition as the only categorical feature, i.e., k = 1.

  • embs_feature_dict (dict | None) – Pre-trained continuous embedding features of each context.

  • embs_dim_list (list[int] | None) – Embedding dimension for each categorical variable. The length equals the number of categorical features k. If None, the embedding dimension is set to 1 for each categorical variable.

  • context_weight_dict (dict | None) – Known population weights of each context.

classmethod construct_inputs(training_data, decomposition, train_embedding=True, cat_feature_dict=None, embs_feature_dict=None, embs_dim_list=None, context_weight_dict=None)[source]

Construct Model keyword arguments from a dict of SupervisedDataset.

Parameters:
  • training_data (SupervisedDataset) – A SupervisedDataset containing the training data.

  • decomposition (dict[str, list[str]]) – Dictionary of context names and the names of the corresponding active context parameters.

  • train_embedding (bool) – Whether to train the embedding layer or not.

  • cat_feature_dict (dict | None) – Keys are context names and values are lists of categorical features, i.e. {“context_name” : [cat_0, …, cat_k]}, where k is the number of categorical variables. If None, we use context names in the decomposition as the only categorical feature, i.e., k = 1.

  • embs_feature_dict (dict | None) – Pre-trained continuous embedding features of each context.

  • embs_dim_list (list[int] | None) – Embedding dimension for each categorical variable. The length equals the number of categorical features k. If None, the embedding dimension is set to 1 for each categorical variable.

  • context_weight_dict (dict | None) – Known population weights of each context.

Return type:

dict[str, Any]

Contextual GP Models with Context Rewards

References

[Feng2020HDCPS]

Q. Feng, B. Latham, H. Mao and E. Bakshy. High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization. Advances in Neural Information Processing Systems 33, NeurIPS 2020.

class botorch.models.contextual_multioutput.LCEMGP(train_X, train_Y, task_feature, train_Yvar=None, mean_module=None, covar_module=None, likelihood=None, context_cat_feature=None, context_emb_feature=None, embs_dim_list=None, output_tasks=None, all_tasks=None, outcome_transform=<class 'botorch.utils.types.DEFAULT'>, input_transform=None)[source]

Bases: MultiTaskGP

The Multi-Task GP with the latent context embedding multioutput (LCE-M) kernel. See [Feng2020HDCPS] for a reference on the model and its use in Bayesian optimization.

Parameters:
  • train_X (Tensor) – (n x d) X training data.

  • train_Y (Tensor) – (n x 1) Y training data.

  • task_feature (int) – Column index of train_X to get context indices.

  • train_Yvar (Tensor | None) – An optional (n x 1) tensor of observed variances of each training Y. If None, we infer the noise. Note that the inferred noise is common across all tasks.

  • mean_module (Module | None) – The mean function to be used. Defaults to ConstantMean.

  • covar_module (Module | None) – The module for computing the covariance matrix between the non-task features. Defaults to RBFKernel.

  • likelihood (Likelihood | None) – A likelihood. The default is selected based on train_Yvar. If train_Yvar is None, a standard GaussianLikelihood with inferred noise level is used. Otherwise, a FixedNoiseGaussianLikelihood is used.

  • context_cat_feature (Tensor | None) – (n_contexts x k) one-hot encoded context features. Rows are ordered by context indices, where k is the number of categorical variables. If None, task indices will be used and k = 1.

  • context_emb_feature (Tensor | None) – (n_contexts x m) pre-given continuous embedding features. Rows are ordered by context indices.

  • embs_dim_list (list[int] | None) – Embedding dimension for each categorical variable. The length equals k. If None, the embedding dimension is set to 1 for each categorical variable.

  • output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.

  • all_tasks (list[int] | None) – By default, multi-task GPs infer the list of all tasks from the task features in train_X. This is an experimental feature that enables creation of multi-task GPs with tasks that don’t appear in the training data. Note that when a task is not observed, the corresponding task covariance will heavily depend on random initialization and may behave unexpectedly.

  • outcome_transform (OutcomeTransform | _DefaultType | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale). We use a Standardize transform if no outcome_transform is specified. Pass down None to use no outcome transform.

  • input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.

task_covar_module(task_idcs)[source]

Compute the task covariance matrix for a given tensor of task / context indices.

Parameters:

task_idcs (Tensor) – Task index tensor of shape (n x 1) or (b x n x 1).

Returns:

Task covariance matrix of shape (b x n x n).

Return type:

Tensor
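
As a shape illustration only, the plain-Python sketch below gathers an n x n task covariance matrix from an (n x 1) list of task indices. The matrix task_cov and the helper gather_task_covariance are hypothetical; they stand in for whatever the LCE-M kernel computes from its embeddings and are not part of the BoTorch API.

```python
def gather_task_covariance(task_cov, task_idcs):
    """Return the n x n matrix K with K[i][j] = task_cov[t_i][t_j].

    task_cov:  (num_tasks x num_tasks) nested list of task covariances
               (made-up values, not learned embeddings).
    task_idcs: (n x 1) nested list of integer task indices.
    """
    flat = [int(row[0]) for row in task_idcs]  # drop the trailing singleton dim
    return [[task_cov[ti][tj] for tj in flat] for ti in flat]


task_cov = [[1.0, 0.3], [0.3, 2.0]]
task_idcs = [[0], [1], [1]]  # n = 3 observations from tasks 0, 1, 1
K = gather_task_covariance(task_cov, task_idcs)
```

Each entry of K depends only on the pair of task indices, so repeated tasks produce repeated rows and columns.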

classmethod construct_inputs(training_data, task_feature, output_tasks=None, context_cat_feature=None, context_emb_feature=None, embs_dim_list=None, **kwargs)[source]

Construct Model keyword arguments from a dataset and other args.

Parameters:
  • training_data (SupervisedDataset | MultiTaskDataset) – A SupervisedDataset or a MultiTaskDataset.

  • task_feature (int) – Column index of embedded task indicator features.

  • output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.

  • context_cat_feature (Tensor | None) – (n_contexts x k) one-hot encoded context features. Rows are ordered by context indices, where k is the number of categorical variables. If None, task indices will be used and k = 1.

  • context_emb_feature (Tensor | None) – (n_contexts x m) pre-given continuous embedding features. Rows are ordered by context indices.

  • embs_dim_list (list[int] | None) – Embedding dimension for each categorical variable. The length equals k. If None, the embedding dimension is set to 1 for each categorical variable.

Return type:

dict[str, Any]

Variational GP Models

References

[burt2020svgp]

David R. Burt and Carl Edward Rasmussen and Mark van der Wilk, Convergence of Sparse Variational Inference in Gaussian Process Regression, Journal of Machine Learning Research, 2020, http://jmlr.org/papers/v21/19-1015.html.

[hensman2013svgp]

James Hensman and Nicolo Fusi and Neil D. Lawrence, Gaussian Processes for Big Data, Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, 2013, https://arxiv.org/abs/1309.6835.

[moss2023ipa]

Henry B. Moss and Sebastian W. Ober and Victor Picheny, Inducing Point Allocation for Sparse Gaussian Processes in High-Throughput Bayesian Optimization, Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, 2023, https://arxiv.org/pdf/2301.10123.pdf.

class botorch.models.approximate_gp.ApproximateGPyTorchModel(model=None, likelihood=None, num_outputs=1, *args, **kwargs)[source]

Bases: GPyTorchModel

BoTorch wrapper class for various (variational) approximate GP models in GPyTorch.

This can include either stochastic variational GPs (SVGPs) or variational implementations of weight space approximate GPs.

Parameters:
  • model (ApproximateGP | None) – Instance of gpytorch.approximate GP models. If omitted, constructs a _SingleTaskVariationalGP.

  • likelihood (Likelihood | None) – Instance of a GPyTorch likelihood. If omitted, uses either a GaussianLikelihood (if num_outputs=1) or a MultitaskGaussianLikelihood (if num_outputs>1).

  • num_outputs (int) – Number of outputs expected for the GP model.

  • args – Optional positional arguments passed to the _SingleTaskVariationalGP constructor if no model is provided.

  • kwargs – Optional keyword arguments passed to the _SingleTaskVariationalGP constructor if no model is provided.

property num_outputs

The number of outputs of the model.

eval()[source]

Puts the model in eval mode.

Return type:

Self

train(mode=True)[source]

Puts the model in train mode.

Parameters:

mode (bool) – A boolean denoting whether to put in train or eval mode. If False, model is put in eval mode.

Return type:

Self

posterior(X, output_indices=None, observation_noise=False, posterior_transform=None)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.

  • observation_noise (bool) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q). It is assumed to be in the outcome-transformed space if an outcome transform is used.

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

  • output_indices (list[int] | None)

Returns:

A GPyTorchPosterior object, representing a batch of b joint distributions over q points. Includes observation noise if specified.

Return type:

GPyTorchPosterior

forward(X)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Return type:

MultivariateNormal

class botorch.models.approximate_gp.SingleTaskVariationalGP(train_X, train_Y=None, likelihood=None, num_outputs=1, learn_inducing_points=True, covar_module=None, mean_module=None, variational_distribution=None, variational_strategy=<class 'gpytorch.variational.variational_strategy.VariationalStrategy'>, inducing_points=None, inducing_point_allocator=None, outcome_transform=None, input_transform=None)[source]

Bases: ApproximateGPyTorchModel

A single-task variational GP model following [hensman2013svgp].

By default, the inducing points are initialized through the GreedyVarianceReduction of [burt2020svgp], which is known to be effective for building globally accurate models. However, custom inducing point allocators designed for specific downstream tasks can also be provided (see [moss2023ipa] for details), e.g. GreedyImprovementReduction when the goal is to build a model suitable for standard BO.

A single-task variational GP using relatively strong priors on the Kernel hyperparameters, which work best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance).

This model works in batch mode (each batch having its own hyperparameters). When the training observations include multiple outputs, this model will use batching to model outputs independently. However, batches of multi-output models are not supported at this time; if you need to use those, please use a ModelListGP.

Use this model if you have a lot of data or if your responses are non-Gaussian.

To train this model, you should use gpytorch.mlls.VariationalELBO and not the exact marginal log likelihood.

Example

>>> import torch
>>> from botorch.models import SingleTaskVariationalGP
>>> from gpytorch.mlls import VariationalELBO
>>>
>>> train_X = torch.rand(20, 2)
>>> model = SingleTaskVariationalGP(train_X)
>>> mll = VariationalELBO(
>>>     model.likelihood, model.model, num_data=train_X.shape[-2]
>>> )
Parameters:
  • train_X (Tensor) – Training inputs (due to the ability of the SVGP to sub-sample this does not have to be all of the training inputs).

  • train_Y (Tensor | None) – Training targets (optional).

  • likelihood (Likelihood | None) – Instance of a GPyTorch likelihood. If omitted, uses either a GaussianLikelihood (if num_outputs=1) or a MultitaskGaussianLikelihood (if num_outputs>1).

  • num_outputs (int) – Number of output responses per input (default: 1).

  • learn_inducing_points (bool) – If True, the inducing point locations are learned jointly with the other model parameters.

  • covar_module (Kernel | None) – Kernel function. If omitted, uses an RBFKernel.

  • mean_module (Mean | None) – Mean of GP model. If omitted, uses a ConstantMean.

  • variational_distribution (_VariationalDistribution | None) – Type of variational distribution to use (default: CholeskyVariationalDistribution). The properties of the variational distribution will encourage scalability or ease of optimization.

  • variational_strategy (type[_VariationalStrategy]) – Type of variational strategy to use (default: VariationalStrategy). The default setting uses “whitening” of the variational distribution to make training easier.

  • inducing_points (Tensor | int | None) – The number or specific locations of the inducing points.

  • inducing_point_allocator (InducingPointAllocator | None) – The InducingPointAllocator used to initialize the inducing point locations. If omitted, uses GreedyVarianceReduction.

  • outcome_transform (OutcomeTransform | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference. NOTE: If this model is trained in minibatches, an outcome transform with learnable parameters (such as Standardize) would update its parameters for each minibatch, which is undesirable. If you do intend to train in minibatches, we recommend you not use an outcome transform and instead pre-transform your whole data set before fitting the model.

  • input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass. NOTE: If this model is trained in minibatches, an input transform with learnable parameters (such as Normalize) would update its parameters for each minibatch, which is undesirable. If you do intend to train in minibatches, we recommend you not use an input transform and instead pre-transform your whole data set before fitting the model.
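
The minibatch notes on outcome_transform and input_transform can be illustrated with a small plain-Python sketch. It standardizes the targets once, using statistics of the whole data set, and contrasts this with per-minibatch statistics, which disagree from batch to batch. The helper standardize and the data are made up for illustration; this is not the BoTorch Standardize transform.

```python
from statistics import mean, stdev


def standardize(values, mu, sigma):
    """Shift and scale values with fixed statistics mu and sigma."""
    return [(v - mu) / sigma for v in values]


y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# Recommended: compute the statistics once, on the whole data set.
mu_all, sigma_all = mean(y), stdev(y)
y_std = standardize(y, mu_all, sigma_all)

# Problematic: statistics recomputed per minibatch differ between batches,
# so the same raw value would map to different standardized values.
batches = [y[:3], y[3:]]
batch_means = [mean(b) for b in batches]
```

The pre-transformed y_std can then be split into minibatches without any transform attached to the model.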

property batch_shape: Size

The batch shape of the model.

This is a batch shape from an I/O perspective. For a model with m outputs, a test_batch_shape x q x d-shaped input X to the posterior method returns a Posterior object over an output of shape broadcast(test_batch_shape, model.batch_shape) x q x m.
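
The shape rule stated here can be checked with a short plain-Python sketch. The helpers broadcast_shapes and posterior_output_shape are hypothetical names that only do shape arithmetic under the usual right-aligned broadcasting rule; they do not call BoTorch.

```python
from itertools import zip_longest


def broadcast_shapes(a, b):
    """Broadcast two shape tuples following the right-aligned rule."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        out.append(max(x, y))
    return tuple(reversed(out))


def posterior_output_shape(test_batch_shape, model_batch_shape, q, m):
    """Output shape for a test_batch_shape x q x d input and m outputs."""
    return broadcast_shapes(test_batch_shape, model_batch_shape) + (q, m)


shape = posterior_output_shape(
    test_batch_shape=(5, 1), model_batch_shape=(3,), q=4, m=2
)
```

Here a (5, 1) test batch shape and a (3,) model batch shape broadcast to (5, 3), so the output has shape 5 x 3 x 4 x 2.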

init_inducing_points(inputs)[source]

Reinitialize the inducing point locations in-place with the current kernel applied to inputs through the model’s inducing point allocation strategy. The variational distribution and variational strategy caches are reset.

Parameters:

inputs (Tensor) – (*batch_shape, n, d)-dim input data tensor.

Returns:

(*batch_shape, m, d)-dim tensor of selected inducing point locations.

Return type:

Tensor

Fully Bayesian GP Models

Gaussian Process Regression models with fully Bayesian inference.

Fully Bayesian models use Bayesian inference over model hyperparameters, such as lengthscales and noise variance, learning a posterior distribution for the hyperparameters using the No-U-Turn-Sampler (NUTS). This is followed by sampling a small set of hyperparameters (often ~16) from the posterior that we will use for model predictions and for computing acquisition function values. By contrast, our “standard” models (e.g. SingleTaskGP) learn only a single best value for each hyperparameter using MAP. The fully Bayesian method generally results in a better and more well-calibrated model, but is more computationally intensive. For a full description, see [Eriksson2021saasbo].
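
To make the role of the hyperparameter samples concrete, the sketch below combines per-sample Gaussian predictions at a single test point into one mixture mean and variance. It assumes equally weighted samples and uses made-up numbers; mixture_moments is a hypothetical helper, not the GaussianMixturePosterior implementation.

```python
def mixture_moments(means, variances):
    """Mean and variance of an equally weighted mixture of 1-d Gaussians."""
    s = len(means)
    mix_mean = sum(means) / s
    # Law of total variance: average variance plus variance of the means.
    mix_var = sum(variances) / s + sum((mu - mix_mean) ** 2 for mu in means) / s
    return mix_mean, mix_var


# One predictive mean and variance per hyperparameter sample.
means = [0.0, 1.0, 2.0, 1.0]
variances = [0.5, 0.5, 1.0, 1.0]
mix_mean, mix_var = mixture_moments(means, variances)
```

The second term in mix_var reflects disagreement between hyperparameter samples, which a single MAP estimate cannot capture.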

We use a lightweight PyTorch implementation of a Matern-5/2 kernel as there are some performance issues with running NUTS on top of standard GPyTorch models. The resulting hyperparameter samples are loaded into a batched GPyTorch model after fitting.

References:

[Eriksson2021saasbo]

D. Eriksson, M. Jankowiak. High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Subspaces. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021.

botorch.models.fully_bayesian.matern52_kernel(X, lengthscale)[source]

Matern-5/2 kernel.

Parameters:
  • X (Tensor)

  • lengthscale (Tensor)

Return type:

Tensor
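
For reference, a Matern-5/2 kernel with unit outputscale evaluates k(r) = (1 + sqrt(5) r + 5/3 r^2) exp(-sqrt(5) r), where r is the Euclidean distance after dividing each input dimension by its lengthscale. The plain-Python sketch below computes this for a few points; the function matern52 is a hypothetical scalar version for illustration and does not handle tensors or batches.

```python
import math


def matern52(x1, x2, lengthscale):
    """Matern-5/2 kernel value for two points with per-dimension lengthscales."""
    r = math.sqrt(sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, lengthscale)))
    sqrt5_r = math.sqrt(5.0) * r
    return (1.0 + sqrt5_r + 5.0 / 3.0 * r**2) * math.exp(-sqrt5_r)


X = [[0.0, 0.0], [0.5, 0.5], [1.0, 0.0]]
lengthscale = [1.0, 2.0]
# Full kernel matrix over the rows of X.
K = [[matern52(xi, xj, lengthscale) for xj in X] for xi in X]
```

The diagonal of K equals 1 and the matrix is symmetric, as expected for a stationary kernel with unit outputscale.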

botorch.models.fully_bayesian.compute_dists(X, lengthscale)[source]

Compute kernel distances.

Parameters:
  • X (Tensor)

  • lengthscale (Tensor)

Return type:

Tensor

botorch.models.fully_bayesian.reshape_and_detach(target, new_value)[source]

Detach and reshape new_value to match target.

Parameters:
  • target (Tensor)

  • new_value (Tensor)

Return type:

None

class botorch.models.fully_bayesian.PyroModel[source]

Bases: object

Base class for a Pyro model; used to assist in learning hyperparameters.

This class and its subclasses are not standard BoTorch models; instead, the subclasses are used as inputs to a SaasFullyBayesianSingleTaskGP, which should then have its hyperparameters fit with fit_fully_bayesian_model_nuts (by default, its subclass SaasPyroModel is used). A PyroModel’s sample method should specify lightweight PyTorch functionality, which will be used for fast model fitting with NUTS. The utility of PyroModel is in enabling fast fitting with NUTS, since we would otherwise need to use GPyTorch, which is computationally infeasible in combination with Pyro.

set_inputs(train_X, train_Y, train_Yvar=None)[source]

Set the training data.

Parameters:
  • train_X (Tensor) – Training inputs (n x d)

  • train_Y (Tensor) – Training targets (n x 1)

  • train_Yvar (Tensor | None) – Observed noise variance (n x 1). Inferred if None.

Return type:

None

abstract sample()[source]

Sample from the model.

Return type:

None

abstract postprocess_mcmc_samples(mcmc_samples)[source]

Post-process the final MCMC samples.

Parameters:

mcmc_samples (dict[str, Tensor])

Return type:

dict[str, Tensor]

abstract load_mcmc_samples(mcmc_samples)[source]
Parameters:

mcmc_samples (dict[str, Tensor])

Return type:

tuple[Mean, Kernel, Likelihood]

class botorch.models.fully_bayesian.SaasPyroModel[source]

Bases: PyroModel

Implementation of the sparse axis-aligned subspace priors (SAAS) model.

The SAAS model uses sparsity-inducing priors to identify the most important parameters. This model is suitable for high-dimensional BO with potentially hundreds of tunable parameters. See [Eriksson2021saasbo] for more details.

SaasPyroModel is not a standard BoTorch model; instead, it is used as an input to SaasFullyBayesianSingleTaskGP. It is used as a default keyword argument, and end users are not likely to need to instantiate or modify a SaasPyroModel unless they want to customize its attributes (such as covar_module).

set_inputs(train_X, train_Y, train_Yvar=None)[source]

Set the training data.

Parameters:
  • train_X (Tensor) – Training inputs (n x d)

  • train_Y (Tensor) – Training targets (n x 1)

  • train_Yvar (Tensor | None) – Observed noise variance (n x 1). Inferred if None.

Return type:

None

sample()[source]

Sample from the SAAS model.

This samples the mean, noise variance, outputscale, and lengthscales according to the SAAS prior.

Return type:

None

sample_outputscale(concentration=2.0, rate=0.15, **tkwargs)[source]

Sample the outputscale.

Parameters:
  • concentration (float)

  • rate (float)

  • tkwargs (Any)

Return type:

Tensor

sample_mean(**tkwargs)[source]

Sample the mean constant.

Parameters:

tkwargs (Any)

Return type:

Tensor

sample_noise(**tkwargs)[source]

Sample the noise variance.

Parameters:

tkwargs (Any)

Return type:

Tensor

sample_lengthscale(dim, alpha=0.1, **tkwargs)[source]

Sample the lengthscale.

Parameters:
  • dim (int)

  • alpha (float)

  • tkwargs (Any)

Return type:

Tensor

postprocess_mcmc_samples(mcmc_samples)[source]

Post-process the MCMC samples.

This computes the true lengthscales and removes the inverse lengthscales and tausq (global shrinkage).

Parameters:

mcmc_samples (dict[str, Tensor])

Return type:

dict[str, Tensor]
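
A minimal sketch of this post-processing step, assuming the samples hold a global shrinkage value tausq and raw per-dimension inverse squared lengthscales, and that the lengthscale is the inverse square root of their product. The dictionary keys and the helper postprocess are illustrative assumptions, not the names used inside BoTorch.

```python
import math


def postprocess(samples):
    """Replace (tausq, inv_length_sq) by lengthscale = (tausq * inv_length_sq) ** -0.5."""
    out = dict(samples)
    tausq = out.pop("tausq")                  # one value per MCMC sample
    inv_length_sq = out.pop("inv_length_sq")  # one row of d values per MCMC sample
    out["lengthscale"] = [
        [1.0 / math.sqrt(t * rho) for rho in row]
        for t, row in zip(tausq, inv_length_sq)
    ]
    return out


samples = {
    "mean": [0.1, -0.2],
    "tausq": [4.0, 0.25],
    "inv_length_sq": [[1.0, 0.25], [4.0, 16.0]],
}
processed = postprocess(samples)
```

After this step only quantities that can be loaded into the final model remain in the dictionary.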

load_mcmc_samples(mcmc_samples)[source]

Load the MCMC samples into the mean_module, covar_module, and likelihood.

Parameters:

mcmc_samples (dict[str, Tensor])

Return type:

tuple[Mean, Kernel, Likelihood]

class botorch.models.fully_bayesian.SaasFullyBayesianSingleTaskGP(train_X, train_Y, train_Yvar=None, outcome_transform=None, input_transform=None, pyro_model=None)[source]

Bases: ExactGP, BatchedMultiOutputGPyTorchModel

A fully Bayesian single-task GP model with the SAAS prior.

This model assumes that the inputs have been normalized to [0, 1]^d and that the output has been standardized to have zero mean and unit variance. You can either normalize and standardize the data before constructing the model or use an input_transform and outcome_transform. The SAAS model [Eriksson2021saasbo] with a Matern-5/2 kernel is used by default.

You are expected to use fit_fully_bayesian_model_nuts to fit this model as it isn’t compatible with fit_gpytorch_mll.

Example

>>> saas_gp = SaasFullyBayesianSingleTaskGP(train_X, train_Y)
>>> fit_fully_bayesian_model_nuts(saas_gp)
>>> posterior = saas_gp.posterior(test_X)

Initialize the fully Bayesian single-task GP model.

Parameters:
  • train_X (Tensor) – Training inputs (n x d)

  • train_Y (Tensor) – Training targets (n x 1)

  • train_Yvar (Tensor | None) – Observed noise variance (n x 1). Inferred if None.

  • outcome_transform (OutcomeTransform | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale).

  • input_transform (InputTransform | None) – An input transform that is applied in the model’s forward pass.

  • pyro_model (PyroModel | None) – Optional PyroModel, defaults to SaasPyroModel.

property median_lengthscale: Tensor

Median lengthscales across the MCMC samples.
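
As a plain-Python illustration, the per-dimension median can be taken over a (num_mcmc_samples x d) table of lengthscale samples. The helper below and its data are hypothetical and only mirror the idea of the property.

```python
from statistics import median


def median_lengthscale(lengthscale_samples):
    """Per-dimension median over MCMC samples (rows = samples, columns = dims)."""
    return [median(col) for col in zip(*lengthscale_samples)]


lengthscale_samples = [[0.5, 10.0], [0.7, 30.0], [0.6, 20.0]]
med = median_lengthscale(lengthscale_samples)
```

In this made-up example the second dimension has a much larger median lengthscale, which would suggest that the function varies slowly along it.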

property num_mcmc_samples: int

Number of MCMC samples in the model.

property batch_shape: Size

Batch shape of the model, equal to the number of MCMC samples. Note that SaasFullyBayesianSingleTaskGP does not support batching over input data at this point.

train(mode=True)[source]

Puts the model in train mode.

Parameters:

mode (bool)

Return type:

None

load_mcmc_samples(mcmc_samples)[source]

Load the MCMC hyperparameter samples into the model.

This method will be called by fit_fully_bayesian_model_nuts when the model has been fitted in order to create a batched SingleTaskGP model.

Parameters:

mcmc_samples (dict[str, Tensor])

Return type:

None

load_state_dict(state_dict, strict=True)[source]

Custom logic for loading the state dict.

The standard approach of calling load_state_dict currently doesn’t play well with the SaasFullyBayesianSingleTaskGP since the mean_module, covar_module and likelihood aren’t initialized until the model has been fitted. The reason for this is that we don’t know the number of MCMC samples until NUTS is called. Given the state dict, we can initialize a new model with some dummy samples and then load the state dict into this model. This currently only works for a SaasPyroModel and supporting more Pyro models likely requires moving the model construction logic into the Pyro model itself.

Parameters:
  • state_dict (Mapping[str, Any])

  • strict (bool)

forward(X)[source]

Unlike in other classes’ forward methods, there is no if self.training block, because it ought to be unreachable: If self.train() has been called, then self.covar_module will be None, check_if_fitted() will fail, and the rest of this method will not run.

Parameters:

X (Tensor)

Return type:

MultivariateNormal

posterior(X, output_indices=None, observation_noise=False, posterior_transform=None, **kwargs)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.

  • output_indices (list[int] | None) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.

  • observation_noise (bool) – If True, add the observation noise from the likelihood to the posterior. If a Tensor, use it directly as the observation noise (must be of shape (batch_shape) x q x m).

  • posterior_transform (PosteriorTransform | None) – An optional PosteriorTransform.

  • kwargs (Any)

Returns:

A GaussianMixturePosterior object. Includes observation noise if specified.

Return type:

GaussianMixturePosterior

condition_on_observations(X, Y, **kwargs)[source]

Conditions on additional observations for a Fully Bayesian model (either identical across models or unique per-model).

Parameters:
  • X (Tensor) – A batch_shape x num_samples x d-dim Tensor, where d is the dimension of the feature space and batch_shape is the number of sampled models.

  • Y (Tensor) – A batch_shape x num_samples x 1-dim Tensor, where batch_shape is the number of sampled models.

  • kwargs (Any)

Returns:

A fully Bayesian model conditioned on given observations. The returned model has batch_shape copies of the training data in case of identical observations (and batch_shape training datasets otherwise).

Return type:

BatchedMultiOutputGPyTorchModel

Fully Bayesian Multitask GP Models

Multi-task Gaussian Process Regression models with fully Bayesian inference.

class botorch.models.fully_bayesian_multitask.MultitaskSaasPyroModel[source]

Bases: SaasPyroModel

Implementation of the multi-task sparse axis-aligned subspace priors (SAAS) model.

The multi-task model uses an ICM kernel. The data kernel is the same as in the single-task SAAS model in order to handle high-dimensional parameter spaces. The task kernel is a Matern-5/2 kernel using learned task embeddings as the input.

set_inputs(train_X, train_Y, train_Yvar, task_feature, task_rank=None)[source]

Set the training data.

Parameters:
  • train_X (Tensor) – Training inputs (n x (d + 1))

  • train_Y (Tensor) – Training targets (n x 1)

  • train_Yvar (Tensor | None) – Observed noise variance (n x 1). If None, we infer the noise. Note that the inferred noise is common across all tasks.

  • task_feature (int) – The index of the task feature (-d <= task_feature <= d).

  • task_rank (int | None) – The number of learned task embeddings to be used in the task kernel. If omitted, use a full rank (i.e. number of tasks) kernel.

Return type:

None
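
The allowed range of task_feature can be illustrated with a plain-Python sketch that splits one input row of length d + 1 into its d features and its task index. The helper split_task_column is hypothetical; it assumes negative indices count from the end of the row, as in Python indexing.

```python
def split_task_column(row, task_feature):
    """Split one (d + 1)-length input row into (features, task index)."""
    num_cols = len(row)
    d = num_cols - 1
    if not -d <= task_feature <= d:
        raise ValueError(f"task_feature must lie in [-{d}, {d}]")
    col = task_feature % num_cols  # map negative indices to positive ones
    features = row[:col] + row[col + 1:]
    return features, int(row[col])


row = [0.2, 0.9, 1.0]  # d = 2 features followed by the task index
features, task = split_task_column(row, task_feature=-1)
```

With task_feature=-1 the last column is treated as the task indicator and the remaining columns as the d-dimensional input.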

sample()[source]

Sample from the SAAS model.

This samples the mean, noise variance, outputscale, and lengthscales according to the SAAS prior.

Return type:

None

sample_latent_features(**tkwargs)[source]
Parameters:

tkwargs (Any)

sample_task_lengthscale(concentration=6.0, rate=3.0, **tkwargs)[source]
Parameters:
  • concentration (float)

  • rate (float)

  • tkwargs (Any)

load_mcmc_samples(mcmc_samples)[source]

Load the MCMC samples into the mean_module, covar_module, and likelihood.

Parameters:

mcmc_samples (dict[str, Tensor])

Return type:

tuple[Mean, Kernel, Likelihood, Kernel, Parameter]

class botorch.models.fully_bayesian_multitask.SaasFullyBayesianMultiTaskGP(train_X, train_Y, task_feature, train_Yvar=None, output_tasks=None, rank=None, all_tasks=None, outcome_transform=None, input_transform=None, pyro_model=None)[source]

Bases: MultiTaskGP

A fully Bayesian multi-task GP model with the SAAS prior.

This model assumes that the inputs have been normalized to [0, 1]^d and that the outputs have been standardized separately for each task (stratified standardization) to have zero mean and unit variance. The SAAS model [Eriksson2021saasbo] with a Matern-5/2 kernel is used as the data kernel by default.
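
A minimal plain-Python sketch of per-task standardization, assuming the sample standard deviation is used within each task. The helper standardize_per_task and the data are made up for illustration and are not a BoTorch transform.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_per_task(y, tasks):
    """Standardize y separately within each task (zero mean, unit variance)."""
    groups = defaultdict(list)
    for value, task in zip(y, tasks):
        groups[task].append(value)
    stats = {t: (mean(v), stdev(v)) for t, v in groups.items()}
    return [(value - stats[t][0]) / stats[t][1] for value, t in zip(y, tasks)]


y = [1.0, 2.0, 3.0, 100.0, 200.0, 300.0]
tasks = [0, 0, 0, 1, 1, 1]
y_std = standardize_per_task(y, tasks)
```

Although the two tasks differ in scale by a factor of 100, both end up on the same standardized scale.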

You are expected to use fit_fully_bayesian_model_nuts to fit this model as it isn’t compatible with fit_gpytorch_mll.

Example

>>> X1, X2 = torch.rand(10, 2), torch.rand(20, 2)
>>> i1, i2 = torch.zeros(10, 1), torch.ones(20, 1)
>>> train_X = torch.cat([
>>>     torch.cat([X1, i1], -1), torch.cat([X2, i2], -1),
>>> ])
>>> train_Y = torch.cat([f1(X1), f2(X2)]).unsqueeze(-1)
>>> train_Yvar = 0.01 * torch.ones_like(train_Y)
>>> mtsaas_gp = SaasFullyBayesianMultiTaskGP(
>>>     train_X, train_Y, task_feature=-1, train_Yvar=train_Yvar,
>>> )
>>> fit_fully_bayesian_model_nuts(mtsaas_gp)
>>> posterior = mtsaas_gp.posterior(test_X)

Initialize the fully Bayesian multi-task GP model.

Parameters:
  • train_X (Tensor) – Training inputs (n x (d + 1))

  • train_Y (Tensor) – Training targets (n x 1)

  • train_Yvar (Tensor | None) – Observed noise variance (n x 1). If None, we infer the noise. Note that the inferred noise is common across all tasks.

  • task_feature (int) – The index of the task feature (-d <= task_feature <= d).

  • output_tasks (list[int] | None) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.

  • rank (int | None) – The number of learned task embeddings to be used in the task kernel. If omitted, use a full rank (i.e. number of tasks) kernel.

  • all_tasks (list[int] | None) – NOT SUPPORTED!

  • outcome_transform (OutcomeTransform | None) – An outcome transform that is applied to the training data during instantiation and to the posterior during inference (that is, the Posterior obtained by calling .posterior on the model will be on the original scale).

  • input_transform (InputTransform | None) – An input transform that is applied to the inputs X in the model’s forward pass.

  • pyro_model (MultitaskSaasPyroModel | None) – Optional PyroModel that has the same signature as MultitaskSaasPyroModel. Defaults to MultitaskSaasPyroModel.

train(mode=True)[source]

Puts the model in train mode.

Parameters:

mode (bool)

Return type:

None

property median_lengthscale: Tensor

Median lengthscales across the MCMC samples.

property num_mcmc_samples: int

Number of MCMC samples in the model.

property batch_shape: Size

Batch shape of the model, equal to the number of MCMC samples. Note that SaasFullyBayesianMultiTaskGP does not support batching over input data at this point.

fantasize(*args, **kwargs)[source]

Construct a fantasy model.

Constructs a fantasy model in the following fashion: (1) compute the model posterior at X, including observation noise. If observation_noise is a Tensor, use it directly as the observation noise to add. (2) sample from this posterior (using sampler) to generate “fake” observations. (3) condition the model on the new fake observations.

Parameters:
  • X – A batch_shape x n’ x d-dim Tensor, where d is the dimension of the feature space, n’ is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).

  • sampler – The sampler used for sampling from the posterior at X.

  • observation_noise – A model_batch_shape x 1 x m-dim tensor or a model_batch_shape x n’ x m-dim tensor containing the average noise for each batch and output, where m is the number of outputs. noise must be in the outcome-transformed space if an outcome transform is used. If None and using an inferred noise likelihood, the noise will be the inferred noise level. If using a fixed noise likelihood, the mean across the observation noise in the training data is used as observation noise.

  • kwargs – Will be passed to model.condition_on_observations

Returns:

The constructed fantasy model.

Return type:

NoReturn

load_mcmc_samples(mcmc_samples)[source]

Load the MCMC hyperparameter samples into the model.

This method will be called by fit_fully_bayesian_model_nuts when the model has been fitted in order to create a batched MultiTaskGP model.

Parameters:

mcmc_samples (dict[str, Tensor])

Return type:

None

posterior(X, output_indices=None, observation_noise=False, posterior_transform=None, **kwargs)[source]

Computes the posterior over model outputs at the provided points.

Returns:

A GaussianMixturePosterior object. Includes observation noise if specified.

Parameters:
  • X (Tensor)

  • output_indices (list[int] | None)

  • observation_noise (bool)

  • posterior_transform (PosteriorTransform | None)

  • kwargs (Any)

Return type:

GaussianMixturePosterior

forward(X)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Parameters:

X (Tensor)

Return type:

MultivariateNormal

load_state_dict(state_dict, strict=True)[source]

Custom logic for loading the state dict.

The standard approach of calling load_state_dict currently doesn’t play well with the SaasFullyBayesianMultiTaskGP since the mean_module, covar_module and likelihood aren’t initialized until the model has been fitted. The reason for this is that we don’t know the number of MCMC samples until NUTS is called. Given the state dict, we can initialize a new model with some dummy samples and then load the state dict into this model. This currently only works for a MultitaskSaasPyroModel and supporting more Pyro models likely requires moving the model construction logic into the Pyro model itself.

TODO: If this were to inherit from SaasFullyBayesianSingleTaskGP, we could simplify this method and eliminate some others.

Parameters:
  • state_dict (Mapping[str, Any])

  • strict (bool)

Relevance Pursuit Models

Relevance Pursuit model structure and optimization routines for the sparse optimization of Gaussian process hyper-parameters, see [Ament2024pursuit] for details.

References

[Ament2024pursuit]

S. Ament, E. Santorella, D. Eriksson, B. Letham, M. Balandat, and E. Bakshy. Robust Gaussian Processes via Relevance Pursuit. Advances in Neural Information Processing Systems 37, 2024. Arxiv: https://arxiv.org/abs/2410.24222.

class botorch.models.relevance_pursuit.RelevancePursuitMixin(dim, support)[source]

Bases: ABC

Mixin class to convert between the sparse and dense representations of the relevance pursuit models’ sparse parameters, as well as to compute the generalized support acquisition and support deletion criteria.

Constructor for the RelevancePursuitMixin class.

For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.

Parameters:
  • dim (int) – The total number of features.

  • support (list[int] | None) – The indices of the features in the support, subset of range(dim).

dim: int
abstract property sparse_parameter: Parameter

The sparse parameter, required to have a single indexing dimension.

abstract set_sparse_parameter(value)[source]

Sets the sparse parameter.

NOTE: We can’t use the property setter @sparse_parameter.setter because of the special way PyTorch treats Parameter types, including custom setters that bypass the @property setters before the latter are called.

Parameters:

value (Parameter)

Return type:

None

property is_sparse: bool
property support: list[int]

The indices of the active parameters.

property is_active: Tensor

A Boolean Tensor of length dim, indicating which of the dim indices of self.sparse_parameter are in the support, i.e. active.

property inactive_indices: Tensor

An integral Tensor of length dim - len(support), indicating which of the indices of self.sparse_parameter are not in the support, i.e. inactive.

to_sparse()[source]

Converts the sparse parameter to its sparse (< dim) representation.

Returns:

The current object in its sparse representation.

Return type:

RelevancePursuitMixin

to_dense()[source]

Converts the sparse parameter to its dense, length-dim representation.

Returns:

The current object in its dense representation.

Return type:

RelevancePursuitMixin
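
The two representations can be sketched in plain Python as follows. The sketch assumes that entries outside the support are zero in the dense representation; the free functions to_sparse and to_dense are hypothetical stand-ins for the methods above and operate on lists rather than Parameters.

```python
def to_sparse(dense, support):
    """Keep only the entries of a length-dim list whose indices are in the support."""
    return [dense[i] for i in support]


def to_dense(sparse, support, dim):
    """Scatter sparse values into a length-dim list, zero outside the support."""
    dense = [0.0] * dim
    for i, value in zip(support, sparse):
        dense[i] = value
    return dense


dim, support = 5, [1, 3]
sparse = [0.7, 0.2]  # one value per index in the support
dense = to_dense(sparse, support, dim)

# Counterparts of the is_active and inactive_indices properties.
is_active = [i in support for i in range(dim)]
inactive_indices = [i for i in range(dim) if i not in support]
```

Converting back with to_sparse(dense, support) recovers the original sparse values.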

expand_support(indices)[source]

Expands the support by a number of indices.

Parameters:

indices (list[int]) – A list of indices of self.sparse_parameter to add to the support.

Returns:

The current object, updated with the expanded support.

Return type:

RelevancePursuitMixin

contract_support(indices)[source]

Contracts the support by a number of indices.

Parameters:

indices (list[int]) – A list of indices of self.sparse_parameter to remove from the support.

Returns:

The current object, updated with the contracted support.

Return type:

RelevancePursuitMixin

full_support()[source]

Initializes the RelevancePursuitMixin with a full, size-dim support.

Returns:

The current object with full support in the dense representation.

Return type:

RelevancePursuitMixin

remove_support()[source]

Initializes the RelevancePursuitMixin with an empty, size-zero support.

Returns:

The current object with empty support, representation unchanged.

Return type:

RelevancePursuitMixin

support_expansion(mll, n=1, modifier=None)[source]

Computes the indices of the features that maximize the gradient of the sparse parameter and that are not already in the support, and subsequently expands the support to include the features if their gradient is positive.

Parameters:
  • mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize. NOTE: Virtually all of the rest of the code is not specific to the marginal likelihood optimization, so we could generalize this to work with any objective.

  • n (int) – The number of features to select.

  • modifier (Callable[[Tensor], Tensor] | None) – A function that modifies the gradient of the inactive parameters before computing the support expansion criterion. This can be used to select the maximum gradient magnitude for real-valued parameters whose gradients are not non-negative, using modifier = torch.abs.

Returns:

True if the support was expanded, False otherwise.

Return type:

bool
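As a rough illustration of the selection rule described above, the following sketch picks up to n inactive indices with the largest (optionally modified) gradient and keeps only those with a positive value. The function name and the use of plain Python lists are assumptions made for this example; the actual method works with the gradient of the marginal likelihood with respect to the sparse parameter.

```python
# Hypothetical sketch of the support expansion criterion; not BoTorch code.
def select_expansion_indices(grad, support, n=1, modifier=None):
    """Return up to n inactive indices with the largest positive criterion."""
    inactive = [i for i in range(len(grad)) if i not in support]
    score = {i: (modifier(grad[i]) if modifier else grad[i]) for i in inactive}
    ranked = sorted(inactive, key=lambda i: score[i], reverse=True)[:n]
    # Only indices whose criterion value is positive are added.
    return [i for i in ranked if score[i] > 0]


grad = [0.5, -2.0, 0.1, 3.0, -0.3]
support = [3]  # index 3 is already active and is never selected again

top_two = select_expansion_indices(grad, support, n=2)
top_abs = select_expansion_indices(grad, support, n=1, modifier=abs)
none_positive = select_expansion_indices([-1.0, -2.0], support=[], n=1)
```

With modifier=abs, the entry with the largest gradient magnitude is chosen even though its gradient is negative, which mirrors the use of torch.abs for real-valued parameters.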

expansion_objective(mll)[source]

Computes an objective value for all the inactive parameters, i.e. self.sparse_parameter[~self.is_active] since we can’t add already active parameters to the support. This value will be used to select the parameters.

Parameters:

mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize.

Returns:

The expansion objective value for all the inactive parameters.

Return type:

Tensor

support_contraction(mll, n=1, modifier=None)[source]

Computes the indices of the features that have the smallest coefficients, and subsequently contracts the support to exclude these features.

Parameters:
  • mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize. NOTE: Virtually all of the rest of the code is not specific to the marginal likelihood optimization, so we could generalize this to work with any objective.

  • n (int) – The number of features to select for removal.

  • modifier (Callable[[Tensor], Tensor] | None) – A function that modifies the parameter values before computing the support contraction criterion.

Returns:

True if the support was contracted, False otherwise.

Return type:

bool
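The contraction rule can be sketched in the same spirit: among the active indices, the n entries with the smallest (optionally modified) parameter values are chosen for removal. The function name and list-based inputs are assumptions for illustration only.

```python
# Hypothetical sketch of the support contraction criterion; not BoTorch code.
def select_contraction_indices(values, support, n=1, modifier=None):
    """Return the n active indices with the smallest criterion values."""
    score = {i: (modifier(values[i]) if modifier else values[i]) for i in support}
    return sorted(support, key=lambda i: score[i])[:n]


values = [0.0, 0.7, 0.0, 0.2, 0.05]
support = [1, 3, 4]

remove_one = select_contraction_indices(values, support, n=1)
remove_two = select_contraction_indices(values, support, n=2)

# For real-valued parameters, a modifier such as abs ranks by magnitude.
signed = select_contraction_indices([-0.9, 0.1], [0, 1], n=1)
by_magnitude = select_contraction_indices([-0.9, 0.1], [0, 1], n=1, modifier=abs)
```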

optimize_mll(mll, model_trace=None, reset_parameters=False, reset_dense_parameters=False, optimizer_kwargs=None)[source]

Optimizes the marginal likelihood.

Parameters:
  • mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize.

  • model_trace (list[Model] | None) – If not None, a list to which a deepcopy of the model state is appended. NOTE: This operation is in place.

  • reset_parameters (bool) – If True, initializes the sparse parameter to the all-zeros vector before every marginal likelihood optimization step. If False, the optimization is warm-started with the previous iteration’s parameters.

  • reset_dense_parameters (bool) – If True, re-initializes the dense parameters, e.g. other GP hyper-parameters that are not part of the Relevance Pursuit module, to the initial values provided by their associated constraints.

  • optimizer_kwargs (dict[str, Any] | None) – A dictionary of keyword arguments for the optimizer.

Returns:

The marginal likelihood after optimization.

botorch.models.relevance_pursuit.forward_relevance_pursuit(sparse_module, mll, sparsity_levels=None, optimizer_kwargs=None, reset_parameters=False, reset_dense_parameters=False, record_model_trace=True, initial_support=None)[source]

Forward Relevance Pursuit.

NOTE: For the robust SparseOutlierNoise model of [Ament2024pursuit], the forward algorithm is generally faster than the backward algorithm, particularly when the maximum sparsity level is small, but it leads to less robust results when the number of outliers is large.

For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.

Example

>>> base_noise = HomoskedasticNoise(
>>>    noise_constraint=NonTransformedInterval(
>>>        1e-5, 1e-1, initial_value=1e-3
>>>    )
>>> )
>>> likelihood = SparseOutlierGaussianLikelihood(
>>>    base_noise=base_noise,
>>>    dim=X.shape[0],
>>> )
>>> model = SingleTaskGP(train_X=X, train_Y=Y, likelihood=likelihood)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> # NOTE: `likelihood.noise_covar` is the `RelevancePursuitMixin`
>>> sparse_module = likelihood.noise_covar
>>> sparse_module, model_trace = forward_relevance_pursuit(sparse_module, mll)
Parameters:
  • sparse_module (RelevancePursuitMixin) – The relevance pursuit module.

  • mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize.

  • sparsity_levels (list[int] | None) – The sparsity levels to expand the support to.

  • optimizer_kwargs (dict[str, Any] | None) – A dictionary of keyword arguments to pass to the optimizer. By default, initializes the “options” sub-dictionary with maxiter and ftol, gtol values, unless specified.

  • reset_parameters (bool) – If true, initializes the sparse parameter to all zeros after each iteration.

  • reset_dense_parameters (bool) – If true, re-initializes the dense parameters, e.g. other GP hyper-parameters that are not part of the Relevance Pursuit module, to the initial values provided by their associated constraints.

  • record_model_trace (bool) – If true, records the model state after every iteration.

  • initial_support (list[int] | None) – The support with which to initialize the sparse module. By default, the support is initialized to the empty set.

Returns:

The relevance pursuit module after forward relevance pursuit optimization, and a list of models with different supports that were optimized.

Return type:

tuple[RelevancePursuitMixin, list[Model] | None]

botorch.models.relevance_pursuit.backward_relevance_pursuit(sparse_module, mll, sparsity_levels=None, optimizer_kwargs=None, reset_parameters=False, reset_dense_parameters=False, record_model_trace=True, initial_support=None)[source]

Backward Relevance Pursuit.

NOTE: For the robust SparseOutlierNoise model of [Ament2024pursuit], the backward algorithm generally leads to more robust results than the forward algorithm, especially when the number of outliers is large, but is more expensive unless the support is contracted by more than one in each iteration.

For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.

Example

>>> base_noise = HomoskedasticNoise(
>>>    noise_constraint=NonTransformedInterval(
>>>        1e-5, 1e-1, initial_value=1e-3
>>>    )
>>> )
>>> likelihood = SparseOutlierGaussianLikelihood(
>>>    base_noise=base_noise,
>>>    dim=X.shape[0],
>>> )
>>> model = SingleTaskGP(train_X=X, train_Y=Y, likelihood=likelihood)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> # NOTE: `likelihood.noise_covar` is the `RelevancePursuitMixin`
>>> sparse_module = likelihood.noise_covar
>>> sparse_module, model_trace = backward_relevance_pursuit(sparse_module, mll)
Parameters:
  • sparse_module (RelevancePursuitMixin) – The relevance pursuit module.

  • mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize.

  • sparsity_levels (list[int] | None) – The sparsity levels to contract the support to.

  • optimizer_kwargs (dict[str, Any] | None) – A dictionary of keyword arguments to pass to the optimizer. By default, initializes the “options” sub-dictionary with maxiter and ftol, gtol values, unless specified.

  • reset_parameters (bool) – If true, initializes the sparse parameter to all zeros after each iteration.

  • reset_dense_parameters (bool) – If true, re-initializes the dense parameters, e.g. other GP hyper-parameters that are not part of the Relevance Pursuit module, to the initial values provided by their associated constraints.

  • record_model_trace (bool) – If true, records the model state after every iteration.

  • initial_support (list[int] | None) – The support with which to initialize the sparse module. By default, the support is initialized to the full set.

Returns:

The relevance pursuit module after backward relevance pursuit optimization, and a list of models with different supports that were optimized.

Return type:

tuple[RelevancePursuitMixin, list[Model] | None]

botorch.models.relevance_pursuit.get_posterior_over_support(rp_class, model_trace, log_support_prior=None, prior_mean_of_support=None)[source]

Computes the posterior distribution over a list of models. Assumes we are storing both likelihood and GP model in the model_trace.

Example

>>> likelihood = SparseOutlierGaussianLikelihood(
>>>    base_noise=base_noise,
>>>    dim=X.shape[0],
>>> )
>>> model = SingleTaskGP(train_X=X, train_Y=Y, likelihood=likelihood)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> # NOTE: `likelihood.noise_covar` is the `RelevancePursuitMixin`
>>> sparse_module = likelihood.noise_covar
>>> sparse_module, model_trace = backward_relevance_pursuit(sparse_module, mll)
>>> # NOTE: SparseOutlierNoise is the type of `sparse_module`
>>> support_size, bmc_probabilities = get_posterior_over_support(
>>>    SparseOutlierNoise, model_trace, prior_mean_of_support=2.0
>>> )
Parameters:
  • rp_class (type[RelevancePursuitMixin]) – The relevance pursuit class to use for computing the support size. This is used to get the RelevancePursuitMixin from the Model via the static method _from_model. We could generalize this and let the user pass this getter instead.

  • model_trace (list[Model]) – A list of models with different support sizes, usually generated with relevance_pursuit.

  • log_support_prior (Callable[[Tensor], Tensor] | None) – Callable that computes the log prior probability of a support size. If None, uses a default exponential prior with a mean specified by prior_mean_of_support.

  • prior_mean_of_support (float | None) – A mean value for the default exponential prior distribution over the support size. Ignored if log_support_prior is passed.

Returns:

A tuple of two tensors: the support sizes and the corresponding posterior probabilities, with one entry for each model in the trace.

Return type:

tuple[Tensor, Tensor]
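A minimal sketch of how such a posterior over support sizes could be formed is given below. It assumes that each model in the trace contributes a log marginal likelihood value, that the default prior is exponential in the support size with log density proportional to -size / prior_mean_of_support, and that the posterior is obtained by normalizing over the trace. The function name and the numbers are made up for illustration, and the actual implementation may scale or compute these quantities differently.

```python
import math


# Hypothetical sketch of Bayesian model comparison over a model trace.
def posterior_over_support(log_mlls, support_sizes, prior_mean_of_support):
    """Normalize exp(log MLL + log prior) over all models in the trace."""
    # Exponential prior on the support size, up to an additive constant.
    log_prior = [-s / prior_mean_of_support for s in support_sizes]
    log_post = [m + p for m, p in zip(log_mlls, log_prior)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


support_sizes = [0, 1, 2, 3]
log_mlls = [-12.0, -9.5, -9.0, -8.9]  # made-up values for illustration
probs = posterior_over_support(log_mlls, support_sizes, prior_mean_of_support=2.0)
```

In this toy example the prior penalty offsets the small likelihood gains of the larger supports, so the models with one and two outliers end up equally probable.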

botorch.models.relevance_pursuit.initialize_dense_parameters(model)[source]

Sets the dense parameters of a model to their initial values. Infers initial values from the constraints, if no initial values are provided. If a parameter does not have a constraint, it is initialized to zero.

Parameters:

model (Model) – The model to initialize.

Returns:

The re-initialized model, and a dictionary of initial values.

Return type:

tuple[Model, dict[str, Any]]

Model Components

Kernels

class botorch.models.kernels.categorical.CategoricalKernel(ard_num_dims=None, batch_shape=None, active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Bases: Kernel

A Kernel for categorical features.

Computes exp(-dist(x1, x2) / lengthscale), where dist(x1, x2) is zero if x1 == x2 and one if x1 != x2. If the last dimension is not a batch dimension, then the mean is considered.

Note: This kernel is NOT differentiable w.r.t. the inputs.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

Parameters:
  • ard_num_dims (Optional[int])

  • batch_shape (Optional[torch.Size])

  • active_dims (Optional[Tuple[int, ...]])

  • lengthscale_prior (Optional[Prior])

  • lengthscale_constraint (Optional[Interval])

  • eps (float)
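The formula above can be checked on small inputs with a pure-Python sketch. It assumes a single shared lengthscale and averages the per-feature mismatch indicators, which corresponds to the case in which the last dimension is not a batch dimension; the function name is hypothetical.

```python
import math


# Hypothetical scalar sketch of the categorical kernel; not the BoTorch class.
def categorical_kernel(x1, x2, lengthscale=1.0):
    """exp(-dist / lengthscale) with dist the mean mismatch over features."""
    mismatches = [float(a != b) for a, b in zip(x1, x2)]
    dist = sum(mismatches) / len(mismatches)
    return math.exp(-dist / lengthscale)


same = categorical_kernel([0, 2, 1], [0, 2, 1])
one_off = categorical_kernel([0, 2, 1], [0, 1, 1], lengthscale=0.5)
```

Because the distance only depends on whether categories match, the value is piecewise constant in the inputs, which is consistent with the note that the kernel is not differentiable with respect to them.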

class botorch.models.kernels.downsampling.DownsamplingKernel(power_prior=None, offset_prior=None, power_constraint=None, offset_constraint=None, **kwargs)[source]

Bases: Kernel

GPyTorch Downsampling Kernel.

Computes a covariance matrix based on the down sampling kernel between inputs x_1 and x_2 (we expect d = 1):

K(x_1, x_2) = c + (1 - x_1)^(1 + delta) * (1 - x_2)^(1 + delta).

where c is an offset parameter, and delta is a power parameter.

Parameters:
  • power_constraint (Interval | None) – Constraint to place on power parameter. Default is Positive.

  • power_prior (Prior | None) – Prior over the power parameter.

  • offset_constraint (Interval | None) – Constraint to place on offset parameter. Default is Positive.

  • active_dims – List of data dimensions to operate on. len(active_dims) should equal num_dimensions.

  • offset_prior (Prior | None)
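For scalar inputs, the downsampling kernel formula can be evaluated directly. The sketch below is a plain-Python illustration with hypothetical argument names (offset for c, power for delta), not the GPyTorch module itself.

```python
import math


# Hypothetical scalar sketch of the downsampling kernel formula.
def downsampling_kernel(x1, x2, offset, power):
    """c + (1 - x1)^(1 + delta) * (1 - x2)^(1 + delta) for scalar inputs."""
    return offset + (1 - x1) ** (1 + power) * (1 - x2) ** (1 + power)


k_mid = downsampling_kernel(0.5, 0.5, offset=0.1, power=1.0)
k_full = downsampling_kernel(1.0, 0.3, offset=0.1, power=1.0)
```

When either input equals 1, the product term vanishes and only the offset remains.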

class botorch.models.kernels.exponential_decay.ExponentialDecayKernel(power_prior=None, offset_prior=None, power_constraint=None, offset_constraint=None, **kwargs)[source]

Bases: Kernel

GPyTorch Exponential Decay Kernel.

Computes a covariance matrix based on the exponential decay kernel between inputs x_1 and x_2 (we expect d = 1):

K(x_1, x_2) = w + beta^alpha / (x_1 + x_2 + beta)^alpha.

where w is an offset parameter, beta is a lengthscale parameter, and alpha is a power parameter.

Parameters:
  • lengthscale_constraint – Constraint to place on lengthscale parameter. Default is Positive.

  • lengthscale_prior – Prior over the lengthscale parameter.

  • power_constraint (Interval | None) – Constraint to place on power parameter. Default is Positive.

  • power_prior (Prior | None) – Prior over the power parameter.

  • offset_constraint (Interval | None) – Constraint to place on offset parameter. Default is Positive.

  • active_dims – List of data dimensions to operate on. len(active_dims) should equal num_dimensions.

  • offset_prior (Prior | None)
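The exponential decay formula can likewise be evaluated for scalar inputs. The argument names below (offset for w, lengthscale for beta, power for alpha) are chosen for this sketch only.

```python
import math


# Hypothetical scalar sketch of the exponential decay kernel formula.
def exponential_decay_kernel(x1, x2, offset, lengthscale, power):
    """w + beta^alpha / (x1 + x2 + beta)^alpha for scalar inputs."""
    return offset + lengthscale**power / (x1 + x2 + lengthscale) ** power


k_origin = exponential_decay_kernel(0.0, 0.0, offset=0.2, lengthscale=2.0, power=2.0)
k_later = exponential_decay_kernel(1.0, 1.0, offset=0.0, lengthscale=2.0, power=2.0)
```

At x_1 = x_2 = 0 the ratio equals one, so the kernel value is the offset plus one; it decays as x_1 + x_2 grows.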

class botorch.models.kernels.infinite_width_bnn.InfiniteWidthBNNKernel(depth=3, batch_shape=None, active_dims=None, acos_eps=1e-07, device=None)[source]

Bases: Kernel

Infinite-width BNN kernel.

Defines the GP kernel which is equivalent to performing exact Bayesian inference on a fully-connected deep neural network with ReLU activations and i.i.d. priors in the infinite-width limit. See [Cho2009kernel] and [Lee2018deep] for details.

[Cho2009kernel]

Y. Cho, and L. Saul. Kernel methods for deep learning. Advances in Neural Information Processing Systems 22. 2009.

[Lee2018deep]

J. Lee, Y. Bahri, R. Novak, S. Schoenholz, J. Pennington, and J. Dickstein. Deep Neural Networks as Gaussian Processes. International Conference on Learning Representations. 2018.

Parameters:
  • depth (int) – Depth of neural network.

  • batch_shape (torch.Size | None) – This will set a separate weight/bias var for each batch. It should be \(B_1 \times \ldots \times B_k\) if \(\mathbf x\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.

  • active_dims – Compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions.

  • acos_eps – A small positive value to restrict acos inputs to \([-1 + \epsilon, 1 - \epsilon]\).

  • device – Device for parameters.

class botorch.models.kernels.linear_truncated_fidelity.LinearTruncatedFidelityKernel(fidelity_dims, dimension=None, power_prior=None, power_constraint=None, nu=2.5, lengthscale_prior_unbiased=None, lengthscale_prior_biased=None, lengthscale_constraint_unbiased=None, lengthscale_constraint_biased=None, covar_module_unbiased=None, covar_module_biased=None, **kwargs)[source]

Bases: Kernel

GPyTorch Linear Truncated Fidelity Kernel.

Computes a covariance matrix based on the linear truncated kernel between inputs x_1 and x_2 for up to two fidelity parameters:

K(x_1, x_2) = k_0 + c_1(x_1, x_2)k_1 + c_2(x_1,x_2)k_2 + c_3(x_1,x_2)k_3

where

  • k_i (i=0,1,2,3) are Matern kernels calculated between the non-fidelity parameters of x_1 and x_2 with different priors.

  • c_1 = (1 - x_1[f_1])(1 - x_2[f_1])(1 + x_1[f_1] x_2[f_1])^p is the kernel of the bias term, which can be decomposed into a deterministic part and a polynomial kernel. Here f_1 is the first fidelity dimension and p is the order of the polynomial kernel.

  • c_3 is the same as c_1 but is calculated for the second fidelity dimension f_2.

  • c_2 is the interaction term with four deterministic terms and the polynomial kernel between x_1[…, [f_1, f_2]] and x_2[…, [f_1, f_2]].

Example

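>>> import torch
>>> from botorch.models.kernels.linear_truncated_fidelity import (
>>>    LinearTruncatedFidelityKernel,
>>> )
>>> x = torch.rand(10, 5)  # inputs in [0, 1)
>>> # Non-batch: the last two columns of x hold the fidelity parameters
>>> covar_module = LinearTruncatedFidelityKernel(fidelity_dims=[3, 4], dimension=5)
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)
>>>
>>> batch_x = torch.rand(2, 10, 5)
>>> # Batch: separate hyperparameters for each batch
>>> covar_module = LinearTruncatedFidelityKernel(
>>>    fidelity_dims=[3, 4], dimension=5, batch_shape=torch.Size([2])
>>> )
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)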
Parameters:
  • fidelity_dims (list[int]) – A list containing either one or two indices specifying the fidelity parameters of the input.

  • dimension (int | None) – The dimension of x. Unused if active_dims is specified.

  • power_prior (Prior | None) – Prior for the power parameter of the polynomial kernel. Default is None.

  • power_constraint (Interval | None) – Constraint on the power parameter of the polynomial kernel. Default is Positive.

  • nu (float) – The smoothness parameter for the Matern kernel: either 1/2, 3/2, or 5/2. Unused if both covar_module_unbiased and covar_module_biased are specified.

  • lengthscale_prior_unbiased (Prior | None) – Prior on the lengthscale parameter of Matern kernel k_0. Default is Gamma(1.1, 1/20).

  • lengthscale_constraint_unbiased (Interval | None) – Constraint on the lengthscale parameter of the Matern kernel k_0. Default is Positive.

  • lengthscale_prior_biased (Prior | None) – Prior on the lengthscale parameter of Matern kernels k_i(i>0). Default is Gamma(5, 1/20).

  • lengthscale_constraint_biased (Interval | None) – Constraint on the lengthscale parameter of the Matern kernels k_i(i>0). Default is Positive.

  • covar_module_unbiased (Kernel | None) – Specify a custom kernel for k_0. If omitted, use a MaternKernel.

  • covar_module_biased (Kernel | None) – Specify a custom kernel for the biased parts k_i(i>0). If omitted, use a MaternKernel.

  • batch_shape – If specified, use a separate lengthscale for each batch of input data. If x1 is a batch_shape x n x d tensor, this should be batch_shape.

  • active_dims – Compute the covariance of a subset of input dimensions. The numbers correspond to the indices of the dimensions.

  • kwargs (Any)
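To make the role of the bias factor c_1 concrete, the following scalar sketch evaluates the single-fidelity case K = k_0 + c_1 k_1, where k_0 and k_1 are passed in as already-computed Matern kernel values. The function names and the numbers are assumptions for illustration.

```python
import math


# Hypothetical scalar sketch of the single-fidelity case K = k_0 + c_1 * k_1.
def bias_factor(f1, f2, power):
    """c_1 = (1 - f1)(1 - f2)(1 + f1 * f2)^p for fidelity values f1, f2."""
    return (1 - f1) * (1 - f2) * (1 + f1 * f2) ** power


def single_fidelity_kernel(k0, k1, f1, f2, power=1.0):
    """Combine the unbiased value k0 and the biased value k1."""
    return k0 + bias_factor(f1, f2, power) * k1


c_mid = bias_factor(0.5, 0.5, power=1.0)
k_highest = single_fidelity_kernel(k0=0.8, k1=0.5, f1=1.0, f2=0.3)
k_lowest = single_fidelity_kernel(k0=0.8, k1=0.5, f1=0.0, f2=0.0)
```

If either point is evaluated at fidelity 1, the bias factor vanishes and only the unbiased kernel k_0 contributes.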

class botorch.models.kernels.contextual_lcea.LCEAKernel(decomposition, batch_shape, train_embedding=True, cat_feature_dict=None, embs_feature_dict=None, embs_dim_list=None, context_weight_dict=None, device=None)[source]

Bases: Kernel

The Latent Context Embedding Additive (LCE-A) Kernel.

This kernel is similar to the SACKernel, and is used when context breakdowns are unobservable. It assumes the same additive structure and a spatial kernel shared across contexts. Rather than assuming independence, LCEAKernel models the correlation in the latent functions for each context through learning context embeddings.

Parameters:
  • decomposition (dict[str, list[int]]) – Keys index context names. Values are the indexes of the parameters belonging to the context.

  • batch_shape (Size) – Batch shape as usual for gpytorch kernels. Model does not support batch training. When batch_shape is non-empty, it is used for loading hyper-parameter values generated from MCMC sampling.

  • train_embedding (bool) – A boolean indicator of whether to learn context embeddings.

  • cat_feature_dict (dict | None) – Keys are context names and values are lists of categorical features, i.e. {“context_name” : [cat_0, …, cat_k]}. k equals the number of categorical variables. If None, uses context names in the decomposition as the only categorical feature, i.e., k = 1.

  • embs_feature_dict (dict | None) – Pre-trained continuous embedding features of each context.

  • embs_dim_list (list[int] | None) – Embedding dimension for each categorical variable. The length equals the number of categorical features k. If None, the embedding dimension is set to 1 for each categorical variable.

  • context_weight_dict (dict | None) – Known population weights of each context.

  • device (device | None)

class botorch.models.kernels.contextual_sac.SACKernel(decomposition, batch_shape, device=None)[source]

Bases: Kernel

The structural additive contextual (SAC) kernel.

The kernel is used for contextual BO without observing context breakdowns. There are d parameters and M contexts. In total, the dimension of the parameter space is d*M and the input x can be written as x=[x_11, …, x_1d, x_21, …, x_2d, …, x_M1, …, x_Md].

The kernel uses the parameter decomposition and assumes an additive structure across contexts. Each context component is assumed to be independent.

\[\begin{equation*} k(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}_{(1)}, \mathbf{x}'_{(1)}) + \cdots + k_M(\mathbf{x}_{(M)}, \mathbf{x}'_{(M)}) \end{equation*}\]

where M is the number of partitions of the parameter space. Each partition contains the same number of parameters d. Each kernel k_i acts only on the d parameters of the i-th partition, i.e. \(\mathbf{x}_{(i)}\). Each kernel k_i is a scaled RBF kernel with the same lengthscales but different outputscales.

Parameters:
  • decomposition (dict[str, list[int]]) – Keys are context names. Values are the indexes of the parameters belonging to the context. The parameter indexes are in the same order across contexts.

  • batch_shape (Size) – Batch shape as usual for gpytorch kernels.

  • device (device | None) – The torch device.
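The additive structure can be sketched for a single pair of points as a sum of scaled RBF kernels, one per context, each acting only on the parameters listed for that context in the decomposition. The helper names, the shared scalar lengthscale, and the per-context outputscale dictionary are assumptions for this example rather than the kernel's actual parameterization.

```python
import math


# Hypothetical pure-Python sketch of the additive SAC structure.
def rbf(u, v, lengthscale):
    """Squared-exponential kernel between two equally sized vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-0.5 * sq_dist / lengthscale**2)


def sac_kernel(x1, x2, decomposition, outputscales, lengthscale=1.0):
    """Sum over contexts of outputscale * RBF on that context's parameters."""
    total = 0.0
    for context, indices in decomposition.items():
        u = [x1[i] for i in indices]
        v = [x2[i] for i in indices]
        total += outputscales[context] * rbf(u, v, lengthscale)
    return total


decomposition = {"a": [0, 1], "b": [2, 3]}  # M = 2 contexts, d = 2 parameters
outputscales = {"a": 1.0, "b": 2.0}
x = [0.1, 0.2, 0.3, 0.4]
y = [0.1, 0.2, 1.3, 0.4]  # differs from x only in context "b"

k_xx = sac_kernel(x, x, decomposition, outputscales)
k_xy = sac_kernel(x, y, decomposition, outputscales)
```

Changing parameters of one context only affects that context's summand, which reflects the independence assumption across contexts.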

class botorch.models.kernels.orthogonal_additive_kernel.OrthogonalAdditiveKernel(base_kernel, dim, quad_deg=32, second_order=False, batch_shape=None, dtype=None, device=None, coeff_constraint=Positive(), offset_prior=None, coeffs_1_prior=None, coeffs_2_prior=None)[source]

Bases: Kernel

Orthogonal Additive Kernels (OAKs) were introduced in [Lu2022additive], though only for the case of Gaussian base kernels with a Gaussian input data distribution.

The implementation here generalizes OAKs to arbitrary base kernels by using a Gauss-Legendre quadrature approximation to the required one-dimensional integrals involving the base kernels.

[Lu2022additive]

X. Lu, A. Boukouvalas, and J. Hensman. Additive Gaussian processes revisited. Proceedings of the 39th International Conference on Machine Learning. Jul 2022.

Parameters:
  • base_kernel (Kernel) – The kernel to orthogonalize and evaluate in forward.

  • dim (int) – Input dimensionality of the kernel.

  • quad_deg (int) – Number of integration nodes for orthogonalization.

  • second_order (bool) – Toggles second order interactions. If true, both the time and space complexity of evaluating the kernel are quadratic in dim.

  • batch_shape (Size | None) – Optional batch shape for the kernel and its parameters.

  • dtype (dtype | None) – Initialization dtype for required Tensors.

  • device (device | None) – Initialization device for required Tensors.

  • coeff_constraint (Interval) – Constraint on the coefficients of the additive kernel.

  • offset_prior (Prior | None) – Prior on the offset coefficient. Should be a prior with non-negative support.

  • coeffs_1_prior (Prior | None) – Prior on the parameter main effects. Should be a prior with non-negative support.

  • coeffs_2_prior (Prior | None) – Prior on the parameter interactions. Should be a prior with non-negative support.

Likelihoods

Pairwise likelihood for pairwise preference model (e.g., PairwiseGP).

class botorch.models.likelihoods.pairwise.PairwiseLikelihood(max_plate_nesting=1)[source]

Bases: Likelihood, ABC

Pairwise likelihood base class for pairwise preference GP (e.g., PairwiseGP).

Initialized like a gpytorch.likelihoods.Likelihood.

Parameters:

max_plate_nesting (int) – Defaults to 1.

forward(utility, D)[source]

Given the difference in (estimated) utility util_diff = f(v) - f(u), return a Bernoulli distribution object representing the likelihood of the user preferring v over u.

Note that this is not used by the PairwiseGP model.

Parameters:
  • utility (Tensor)

  • D (Tensor)

Return type:

Bernoulli

abstract p(utility, D)[source]

Given the difference in (estimated) utility util_diff = f(v) - f(u), return the probability of the user preferring v over u.

Parameters:
  • utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point

  • D (Tensor) – D is (batch_size x) m x n matrix with all elements being zero in last dimension except at two positions D[…, i] = 1 and D[…, j] = -1 respectively, representing item i is preferred over item j.

  • log – if true, return log probability

Return type:

Tensor

log_p(utility, D)[source]

Returns the log of p.

Parameters:
  • utility (Tensor)

  • D (Tensor)

Return type:

Tensor

negative_log_gradient_sum(utility, D)[source]

Calculate the sum of negative log gradient with respect to each item’s latent utility values. Useful for models using Laplace approximation.

Parameters:
  • utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point

  • D (Tensor) – D is (batch_size x) m x n matrix with all elements being zero in last dimension except at two positions D[…, i] = 1 and D[…, j] = -1 respectively, representing item i is preferred over item j.

Returns:

A (batch_size x) n Tensor representing the sum of negative log gradient values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.

Return type:

Tensor

negative_log_hessian_sum(utility, D)[source]

Calculate the sum of negative log hessian with respect to each item’s latent utility values. Useful for models using Laplace approximation.

Parameters:
  • utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point

  • D (Tensor) – D is (batch_size x) m x n matrix with all elements being zero in last dimension except at two positions D[…, i] = 1 and D[…, j] = -1 respectively, representing item i is preferred over item j.

Returns:

A (batch_size x) n x n Tensor representing the sum of negative log hessian values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.

Return type:

Tensor

class botorch.models.likelihoods.pairwise.PairwiseProbitLikelihood(max_plate_nesting=1)[source]

Bases: PairwiseLikelihood

Pairwise likelihood using probit function

Given two items v and u with utilities f(v) and f(u), we prefer v over u with probability std_normal_cdf((f(v) - f(u))/sqrt(2)). Note that this formulation implicitly assumes the noise term is fixed at 1.

Initialized like a gpytorch.likelihoods.Likelihood.

Parameters:

max_plate_nesting (int) – Defaults to 1.

p(utility, D, log=False)[source]

Given the difference in (estimated) utility util_diff = f(v) - f(u), return the probability of the user preferring v over u.

Parameters:
  • utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point

  • D (Tensor) – D is (batch_size x) m x n matrix with all elements being zero in last dimension except at two positions D[…, i] = 1 and D[…, j] = -1 respectively, representing item i is preferred over item j.

  • log (bool) – if true, return log probability

Return type:

Tensor
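The probit preference probability can be sketched without tensors. In the example below, each row of D encodes one comparison, so the dot product of a row with the utility vector gives util_diff = f(v) - f(u). The function names are hypothetical and the inputs are plain lists rather than (batch_size x) m x n Tensors.

```python
import math


def std_normal_cdf(z):
    """CDF of the standard normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical list-based sketch of the probit preference probability.
def probit_preference_probs(utility, D):
    """One probability per comparison row of D."""
    probs = []
    for row in D:
        util_diff = sum(d * f for d, f in zip(row, utility))
        probs.append(std_normal_cdf(util_diff / math.sqrt(2.0)))
    return probs


utility = [0.0, 1.0, 1.0]
D = [
    [0, 1, -1],  # item 1 preferred over item 2 (equal utilities)
    [1, 0, -1],  # item 0 preferred over item 2 (lower utility preferred)
]
probs = probit_preference_probs(utility, D)
```

A comparison between items of equal utility has probability one half, and a comparison that contradicts the utilities has probability below one half.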

negative_log_gradient_sum(utility, D)[source]

Calculate the sum of negative log gradient with respect to each item’s latent utility values. Useful for models using Laplace approximation.

Parameters:
  • utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point

  • D (Tensor) – D is (batch_size x) m x n matrix with all elements being zero in last dimension except at two positions D[…, i] = 1 and D[…, j] = -1 respectively, representing item i is preferred over item j.

Returns:

A (batch_size x) n Tensor representing the sum of negative log gradient values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.

Return type:

Tensor

negative_log_hessian_sum(utility, D)[source]

Calculate the sum of negative log hessian with respect to each item’s latent utility values. Useful for models using Laplace approximation.

Parameters:
  • utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point

  • D (Tensor) – D is (batch_size x) m x n matrix with all elements being zero in last dimension except at two positions D[…, i] = 1 and D[…, j] = -1 respectively, representing item i is preferred over item j.

Returns:

A (batch_size x) n x n Tensor representing the sum of negative log hessian values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.

Return type:

Tensor

class botorch.models.likelihoods.pairwise.PairwiseLogitLikelihood(max_plate_nesting=1)[source]

Bases: PairwiseLikelihood

Pairwise likelihood using logistic (i.e., sigmoid) function

Given two items v and u with utilities f(v) and f(u), we prefer v over u with probability sigmoid(f(v) - f(u)). Note that this formulation implicitly assumes the beta term in the logistic function is fixed at 1.

Initialized like a gpytorch.likelihoods.Likelihood.

Parameters:

max_plate_nesting (int) – Defaults to 1.

log_p(utility, D)[source]

Returns the log of p.

Parameters:
  • utility (Tensor)

  • D (Tensor)

Return type:

Tensor

p(utility, D)[source]

Given the difference in (estimated) utility util_diff = f(v) - f(u), return the probability of the user preferring v over u.

Parameters:
  • utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point

  • D (Tensor) – D is (batch_size x) m x n matrix with all elements being zero in last dimension except at two positions D[…, i] = 1 and D[…, j] = -1 respectively, representing item i is preferred over item j.

  • log – if true, return log probability

Return type:

Tensor
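The logistic variant differs only in the link function. The sketch below computes sigmoid(f(v) - f(u)) and its logarithm for list-based inputs; the function names are hypothetical, and the log is written as -log(1 + exp(-util_diff)) as one common way to express it, which need not match the library's implementation.

```python
import math


# Hypothetical list-based sketch of the logistic preference probability.
def utility_diffs(utility, D):
    """f(v) - f(u) for each comparison row of D."""
    return [sum(d * f for d, f in zip(row, utility)) for row in D]


def logit_preference_probs(utility, D):
    """sigmoid(util_diff) for each comparison."""
    return [1.0 / (1.0 + math.exp(-diff)) for diff in utility_diffs(utility, D)]


def logit_preference_log_probs(utility, D):
    """log sigmoid(util_diff), written as -log(1 + exp(-util_diff))."""
    return [-math.log1p(math.exp(-diff)) for diff in utility_diffs(utility, D)]


utility = [0.0, 2.0]
D = [
    [-1, 1],  # item 1 preferred over item 0
    [1, -1],  # item 0 preferred over item 1
]
probs = logit_preference_probs(utility, D)
log_probs = logit_preference_log_probs(utility, D)
```

The two opposite comparisons have probabilities that sum to one, as expected for a Bernoulli outcome.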

negative_log_gradient_sum(utility, D)[source]

Calculate the sum of negative log gradient with respect to each item’s latent utility values. Useful for models using Laplace approximation.

Parameters:
  • utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point

  • D (Tensor) – D is (batch_size x) m x n matrix with all elements being zero in last dimension except at two positions D[…, i] = 1 and D[…, j] = -1 respectively, representing item i is preferred over item j.

Returns:

A (batch_size x) n Tensor representing the sum of negative log gradient values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.

Return type:

Tensor

negative_log_hessian_sum(utility, D)[source]

Calculate the sum of negative log hessian with respect to each item’s latent utility values. Useful for models using Laplace approximation.

Parameters:
  • utility (Tensor) – A Tensor of shape (batch_size x) n, the utility at MAP point

  • D (Tensor) – D is (batch_size x) m x n matrix with all elements being zero in last dimension except at two positions D[…, i] = 1 and D[…, j] = -1 respectively, representing item i is preferred over item j.

Returns:

A (batch_size x) n x n Tensor representing the sum of negative log hessian values of the likelihood over all comparisons (i.e., the m dimension) with respect to each item.

Return type:

Tensor

class botorch.models.likelihoods.sparse_outlier_noise.SparseOutlierGaussianLikelihood(base_noise, dim, outlier_indices=None, rho_prior=None, rho_constraint=None, batch_shape=None, convex_parameterization=True, loo=True)[source]

Bases: _GaussianLikelihoodBase

A likelihood that models the noise of a GP with SparseOutlierNoise, a noise model in the Relevance Pursuit family of models, permitting additional “robust” variance for a small set of outlier data points. Notably, the indices of the outlier data points are inferred during the optimization of the associated log marginal likelihood via the Relevance Pursuit algorithm.

For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.

NOTE: Letting base_noise also use non-transformed constraints leads to more stable optimization, but is orthogonal implementation-wise. If the base noise is a HomoskedasticNoise, one can pass the non-transformed constraint as the noise_constraint.

Example

>>> base_noise = HomoskedasticNoise(
>>>    noise_constraint=NonTransformedInterval(
>>>        1e-5, 1e-1, initial_value=1e-3
>>>    )
>>> )
>>> likelihood = SparseOutlierGaussianLikelihood(
>>>    base_noise=base_noise,
>>>    dim=X.shape[0],
>>> )
>>> model = SingleTaskGP(train_X=X, train_Y=Y, likelihood=likelihood)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> # NOTE: `likelihood.noise_covar` is the `RelevancePursuitMixin`
>>> sparse_module = likelihood.noise_covar
>>> backward_relevance_pursuit(sparse_module, mll)
Parameters:
  • base_noise (Noise | FixedGaussianNoise) – The base noise model.

  • dim (int) – The number of training observations, which determines the maximum number of data-point-specific noise variances of the noise model.

  • outlier_indices (list[int] | None) – The indices of the outliers.

  • rho_prior (Prior | None) – Prior for self.noise_covar’s rho parameter.

  • rho_constraint (NonTransformedInterval | None) – Constraint for self.noise_covar’s rho parameter. Needs to be a NonTransformedInterval because exact sparsity cannot be represented using smooth transforms like a softplus or sigmoid.

  • batch_shape (Size | None) – The batch shape of the learned noise parameter (default: []).

  • convex_parameterization (bool) – Whether to use the convex parameterization of rho, which generally improves optimization results and is thus recommended.

  • loo (bool) – Whether to use leave-one-out (LOO) update equations that can compute the optimal values of each individual rho, keeping all else equal.

marginal(function_dist, X=None, **kwargs)[source]

Computes a predictive distribution \(p(y^* | \mathbf x^*)\) given either a posterior distribution \(p(\mathbf f | \mathcal D, \mathbf x)\) or a prior distribution \(p(\mathbf f|\mathbf x)\) as input.

With both exact inference and variational inference, the form of \(p(\mathbf f|\mathcal D, \mathbf x)\) or \(p(\mathbf f| \mathbf x)\) should usually be Gaussian. As a result, function_dist should usually be a MultivariateNormal specified by the mean and (co)variance of \(p(\mathbf f|...)\).

Parameters:
  • function_dist (MultivariateNormal) – Distribution for \(f(x)\).

  • args – Additional args (passed to the forward function).

  • kwargs (Any) – Additional kwargs (passed to the forward function).

  • X (Tensor | list[Tensor] | None)

Returns:

The marginal distribution, or samples from it.

Return type:

MultivariateNormal

expected_log_prob(target, input, *params, **kwargs)[source]

(Used by VariationalELBO for variational inference.)

Computes the expected log likelihood, where the expectation is over the GP variational distribution.

\[\sum_{\mathbf x, y} \mathbb{E}_{q\left( f(\mathbf x) \right)} \left[ \log p \left( y \mid f(\mathbf x) \right) \right]\]
Parameters:
  • observations – Values of \(y\).

  • function_dist – Distribution for \(f(x)\).

  • args – Additional args (passed to the forward function).

  • kwargs (Any) – Additional kwargs (passed to the forward function).

  • target (Tensor)

  • input (MultivariateNormal)

  • params (Any)

Return type:

Tensor

class botorch.models.likelihoods.sparse_outlier_noise.SparseOutlierNoise(base_noise, dim, outlier_indices=None, rho_prior=None, rho_constraint=None, batch_shape=None, convex_parameterization=True, loo=True)[source]

Bases: Noise, RelevancePursuitMixin

A noise model in the Relevance Pursuit family of models, permitting additional “robust” variance for a small set of outlier data points. See also SparseOutlierGaussianLikelihood, which leverages this noise model.

For details, see [Ament2024pursuit] or https://arxiv.org/abs/2410.24222.

Example

>>> base_noise = HomoskedasticNoise(
>>>    noise_constraint=NonTransformedInterval(
>>>        1e-5, 1e-1, initial_value=1e-3
>>>    )
>>> )
>>> likelihood = SparseOutlierGaussianLikelihood(
>>>    base_noise=base_noise,
>>>    dim=X.shape[0],
>>> )
>>> model = SingleTaskGP(train_X=X, train_Y=Y, likelihood=likelihood)
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> # NOTE: `likelihood.noise_covar` is the `SparseOutlierNoise`
>>> sparse_module = likelihood.noise_covar
>>> backward_relevance_pursuit(sparse_module, mll)
Parameters:
  • base_noise (Noise | FixedGaussianNoise) – The base noise model.

  • dim (int) – The number of training observations, which determines the maximum number of data-point-specific noise variances of the noise model.

  • outlier_indices (list[int] | None) – The indices of the outliers.

  • rho_prior (Prior | None) – Prior for the rho parameter.

  • rho_constraint (NonTransformedInterval | None) – Constraint for the rho parameter. Needs to be a NonTransformedInterval because exact sparsity cannot be represented using smooth transforms like a softplus or sigmoid.

  • batch_shape (Size | None) – The batch shape of the learned noise parameter (default: []).

  • convex_parameterization (bool) – Whether to use the convex parameterization of rho, which generally improves optimization results and is thus recommended.

  • loo (bool) – Whether to use leave-one-out (LOO) update equations that can compute the optimal values of each individual rho, keeping all else equal.

property sparse_parameter: Parameter

The sparse parameter, required to have a single indexing dimension.

set_sparse_parameter(value)[source]

Sets the sparse parameter.

NOTE: We can’t use the property setter @sparse_parameter.setter because of the special way PyTorch treats Parameter types, including custom setters.

Parameters:

value (Parameter)

Return type:

None

property convex_parameterization: bool
property rho: Tensor

Dense representation of the data-point-specific variances, corresponding to the latent self.raw_rho values, which might be represented sparsely or in the convex parameterization. The last dimension is equal to the number of training points self.dim.

NOTE: rho differs from self.sparse_parameter in that the latter returns the parameter in its sparse representation when self.is_sparse is true, and in its latent convex parameterization when self.convex_parameterization is true, while rho always returns the data-point-specific variances, embedded in a dense tensor. The dense representation is used to propagate gradients to the sparse rhos in the support.

Returns:

A batch_shape x self.dim-dim Tensor of robustness variances.
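
The difference between the sparse and dense representations can be sketched without any tensor library. The helper names below are hypothetical, and the additive form of the noise diagonal (base variance plus rho) is an assumption based on the description of "additional" variance for outliers; the convex parameterization is not modeled here.

```python
def dense_rho(support, values, dim):
    """Embed sparse per-point variances into a dense list of length dim."""
    rho = [0.0] * dim
    for index, value in zip(support, values):
        rho[index] = value
    return rho


def noise_diagonal(base_noise, rho):
    """Assumed noise model: shared base variance plus a per-point outlier variance."""
    return [base_noise + r for r in rho]


# Two of six training points are treated as outliers.
rho = dense_rho(support=[1, 4], values=[0.5, 2.0], dim=6)
diag = noise_diagonal(base_noise=1e-3, rho=rho)
```

Points outside the support keep exactly the base noise, which is the exact sparsity that smooth transforms such as softplus cannot represent.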

forward(X=None, shape=None, diag_K=None, **kwargs)[source]

Computes the covariance matrix of the sparse outlier noise model.

Parameters:
  • X (Tensor | list[Tensor] | None) – The training inputs, used to determine if the model is applied to the training data, in which case the outlier variances are applied, or not. NOTE: By default, BoTorch passes the transformed training inputs to the likelihood during both training and inference.

  • shape (Size | None) – The shape of the covariance matrix, which is used to broadcast the rho values to the correct shape.

  • diag_K (Tensor | None) – The diagonal of the covariance matrix, which is used to scale the rho values in the convex parameterization.

  • kwargs (Any) – Any additional parameters of the base noise model, same as for GPyTorch’s noise model. Note that this implementation does not support non-kwarg params arguments, which are used in GPyTorch’s noise models.

Returns:

A batch_shape x self.dim-dim Tensor of robustness variances.

Return type:

LinearOperator | Tensor

expansion_objective(mll)[source]

Computes an objective value for all the inactive parameters, i.e. self.sparse_parameter[~self.is_active] since we can’t add already active parameters to the support. This value will be used to select the parameters.

Parameters:

mll (ExactMarginalLogLikelihood) – The marginal likelihood, containing the model to optimize.

Returns:

The expansion objective value for all the inactive parameters.

Return type:

Tensor

Transforms

Outcome Transforms

Outcome transformations for automatically transforming and un-transforming model outputs. Outcome transformations are typically part of a Model and applied (i) within the model constructor to transform the train observations to the model space, and (ii) in the Model.posterior call to untransform the model posterior back to the original space.

References

[eriksson2021scalable]

D. Eriksson, M. Poloczek. Scalable Constrained Bayesian Optimization. International Conference on Artificial Intelligence and Statistics. PMLR, 2021, http://proceedings.mlr.press/v130/eriksson21a.html

class botorch.models.transforms.outcome.OutcomeTransform(*args, **kwargs)[source]

Bases: Module, ABC

Abstract base class for outcome transforms.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(Y, Yvar=None)[source]

Transform the outcomes in a model’s training targets

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of training targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).

Returns:

  • The transformed outcome observations.

  • The transformed observation noise (if applicable).

Return type:

A two-tuple with the transformed outcomes

subset_output(idcs)[source]

Subset the transform along the output dimension.

This functionality is used to properly treat outcome transformations in the subset_model functionality.

Parameters:

idcs (list[int]) – The output indices to subset the transform to.

Returns:

The current outcome transform, subset to the specified output indices.

Return type:

OutcomeTransform

untransform(Y, Yvar=None)[source]

Un-transform previously transformed outcomes

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of transformed training targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of transformed observation noises associated with the training targets (if applicable).

Returns:

  • The un-transformed outcome observations.

  • The un-transformed observation noise (if applicable).

Return type:

A two-tuple with the un-transformed outcomes

untransform_posterior(posterior)[source]

Un-transform a posterior.

Transforms with _is_linear=True should return a GPyTorchPosterior when posterior is a GPyTorchPosterior. Transforms with _is_linear=False likely return a TransformedPosterior instead.

Parameters:

posterior (Posterior) – A posterior in the transformed space.

Returns:

The un-transformed posterior.

Return type:

Posterior

class botorch.models.transforms.outcome.ChainedOutcomeTransform(**transforms)[source]

Bases: OutcomeTransform, ModuleDict

An outcome transform representing the chaining of individual transforms

Chaining of outcome transforms.

Parameters:

transforms (OutcomeTransform) – The transforms to chain. Internally, the names of the kwargs are used as the keys for accessing the individual transforms on the module.

forward(Y, Yvar=None)[source]

Transform the outcomes in a model’s training targets

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of training targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).

Returns:

  • The transformed outcome observations.

  • The transformed observation noise (if applicable).

Return type:

A two-tuple with the transformed outcomes

subset_output(idcs)[source]

Subset the transform along the output dimension.

Parameters:

idcs (list[int]) – The output indices to subset the transform to.

Returns:

The current outcome transform, subset to the specified output indices.

Return type:

OutcomeTransform

untransform(Y, Yvar=None)[source]

Un-transform previously transformed outcomes

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of transformed training targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of transformed observation noises associated with the training targets (if applicable).

Returns:

  • The un-transformed outcome observations.

  • The un-transformed observation noise (if applicable).

Return type:

A two-tuple with the un-transformed outcomes

untransform_posterior(posterior)[source]

Un-transform a posterior

Parameters:

posterior (Posterior) – A posterior in the transformed space.

Returns:

The un-transformed posterior.

Return type:

Posterior

class botorch.models.transforms.outcome.Standardize(m, outputs=None, batch_shape=(), min_stdv=1e-08)[source]

Bases: OutcomeTransform

Standardize outcomes (zero mean, unit variance).

This module is stateful: If in train mode, calling forward updates the module state (i.e. the mean/std normalizing constants). If in eval mode, calling forward simply applies the standardization using the current module state.

Parameters:
  • m (int) – The output dimension.

  • outputs (list[int] | None) – Which of the outputs to standardize. If omitted, all outputs will be standardized.

  • batch_shape (torch.Size) – The batch_shape of the training targets.

  • min_stdv (float) – The minimum standard deviation for which to perform standardization (if lower, only de-mean the data).

forward(Y, Yvar=None)[source]

Standardize outcomes.

If the module is in train mode, this updates the module state (i.e. the mean/std normalizing constants). If the module is in eval mode, simply applies the normalization using the module state.

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of training targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).

Returns:

  • The transformed outcome observations.

  • The transformed observation noise (if applicable).

Return type:

A two-tuple with the transformed outcomes

subset_output(idcs)[source]

Subset the transform along the output dimension.

Parameters:

idcs (list[int]) – The output indices to subset the transform to.

Returns:

The current outcome transform, subset to the specified output indices.

Return type:

OutcomeTransform

untransform(Y, Yvar=None)[source]

Un-standardize outcomes.

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of standardized targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of standardized observation noises associated with the targets (if applicable).

Returns:

  • The un-standardized outcome observations.

  • The un-standardized observation noise (if applicable).

Return type:

A two-tuple with the un-standardized outcomes
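
The stateful train/eval behavior and the role of min_stdv can be sketched for a single output with plain Python lists. TinyStandardize is a hypothetical stand-in, not the BoTorch class; the use of the sample standard deviation and the scaling of Yvar by the squared standard deviation are assumptions of this sketch.

```python
import statistics


class TinyStandardize:
    """Hypothetical single-output stand-in for a stateful standardizer."""

    def __init__(self, min_stdv=1e-8):
        self.min_stdv = min_stdv
        self.training = True
        self.mean, self.stdv = 0.0, 1.0

    def forward(self, Y, Yvar=None):
        if self.training:
            # Train mode: refresh the normalizing constants from the data.
            self.mean = statistics.mean(Y)
            stdv = statistics.stdev(Y)
            # Below min_stdv, only de-mean the data (divide by 1).
            self.stdv = stdv if stdv >= self.min_stdv else 1.0
        Y_tf = [(y - self.mean) / self.stdv for y in Y]
        Yvar_tf = None if Yvar is None else [v / self.stdv**2 for v in Yvar]
        return Y_tf, Yvar_tf

    def untransform(self, Y, Yvar=None):
        Y_utf = [self.mean + self.stdv * y for y in Y]
        Yvar_utf = None if Yvar is None else [v * self.stdv**2 for v in Yvar]
        return Y_utf, Yvar_utf


tf = TinyStandardize()
Y_tf, Yvar_tf = tf.forward([1.0, 2.0, 3.0], Yvar=[0.1, 0.1, 0.1])  # learns mean and std
tf.training = False
Y_new, _ = tf.forward([4.0])                   # eval mode: reuses the stored constants
Y_back, Yvar_back = tf.untransform(Y_tf, Yvar_tf)
```

Calling forward in eval mode leaves the stored constants untouched, so new targets are expressed on the scale learned from the training targets.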

untransform_posterior(posterior)[source]

Un-standardize the posterior.

Parameters:

posterior (Posterior) – A posterior in the standardized space.

Returns:

The un-standardized posterior. If the input posterior is a GPyTorchPosterior, return a GPyTorchPosterior. Otherwise, return a TransformedPosterior.

Return type:

GPyTorchPosterior | TransformedPosterior

class botorch.models.transforms.outcome.Log(outputs=None)[source]

Bases: OutcomeTransform

Log-transform outcomes.

Useful if the targets are modeled using a (multivariate) log-Normal distribution. This means that we can use a standard GP model on the log-transformed outcomes and un-transform the model posterior of that GP.

Parameters:

outputs (list[int] | None) – Which of the outputs to log-transform. If omitted, all outputs will be log-transformed.

subset_output(idcs)[source]

Subset the transform along the output dimension.

Parameters:

idcs (list[int]) – The output indices to subset the transform to.

Returns:

The current outcome transform, subset to the specified output indices.

Return type:

OutcomeTransform

forward(Y, Yvar=None)[source]

Log-transform outcomes.

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of training targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).

Returns:

  • The transformed outcome observations.

  • The transformed observation noise (if applicable).

Return type:

A two-tuple with the transformed outcomes

untransform(Y, Yvar=None)[source]

Un-transform log-transformed outcomes

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of log-transformed targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of log-transformed observation noises associated with the training targets (if applicable).

Returns:

  • The exponentiated outcome observations.

  • The exponentiated observation noise (if applicable).

Return type:

A two-tuple with the un-transformed outcomes

untransform_posterior(posterior)[source]

Un-transform the log-transformed posterior.

Parameters:

posterior (Posterior) – A posterior in the log-transformed space.

Returns:

The un-transformed posterior.

Return type:

TransformedPosterior
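
Un-transforming a posterior is not the same as exponentiating its mean. If the posterior in log space is Normal with mean mu and variance var, the outcome in the original space is log-normal, and its moments have a closed form. The sketch below evaluates those standard log-normal formulas; the function name is hypothetical, and whether TransformedPosterior uses these exact expressions internally is not stated here.

```python
import math


def lognormal_moments(mu, var):
    """Mean and variance of exp(Z) for Z ~ Normal(mu, var)."""
    mean = math.exp(mu + 0.5 * var)
    variance = (math.exp(var) - 1.0) * math.exp(2.0 * mu + var)
    return mean, variance


mean, variance = lognormal_moments(mu=0.0, var=1.0)
naive_mean = math.exp(0.0)  # exponentiating the log-space mean ignores the variance
```

The closed-form mean exceeds the naive value whenever the log-space variance is positive, which is why the posterior as a whole has to be un-transformed rather than only its mean.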

class botorch.models.transforms.outcome.Power(power, outputs=None)[source]

Bases: OutcomeTransform

Power-transform outcomes.

Useful if the targets are modeled using a (multivariate) power transform of a Normal distribution. This means that we can use a standard GP model on the power-transformed outcomes and un-transform the model posterior of that GP.

Parameters:
  • outputs (list[int] | None) – Which of the outputs to power-transform. If omitted, all outputs will be power-transformed.

  • power (float)

subset_output(idcs)[source]

Subset the transform along the output dimension.

Parameters:

idcs (list[int]) – The output indices to subset the transform to.

Returns:

The current outcome transform, subset to the specified output indices.

Return type:

OutcomeTransform

forward(Y, Yvar=None)[source]

Power-transform outcomes.

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of training targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).

Returns:

  • The transformed outcome observations.

  • The transformed observation noise (if applicable).

Return type:

A two-tuple with the transformed outcomes

untransform(Y, Yvar=None)[source]

Un-transform power-transformed outcomes

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of power-transformed targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of power-transformed observation noises associated with the training targets (if applicable).

Returns:

  • The un-power transformed outcome observations.

  • The un-power transformed observation noise (if applicable).

Return type:

A two-tuple with the un-transformed outcomes

untransform_posterior(posterior)[source]

Un-transform the power-transformed posterior.

Parameters:

posterior (Posterior) – A posterior in the power-transformed space.

Returns:

The un-transformed posterior.

Return type:

TransformedPosterior

class botorch.models.transforms.outcome.Bilog(outputs=None)[source]

Bases: OutcomeTransform

Bilog-transform outcomes.

The Bilog transform [eriksson2021scalable] is useful for modeling outcome constraints as it magnifies values near zero and flattens extreme values.

Parameters:

outputs (list[int] | None) – Which of the outputs to Bilog-transform. If omitted, all outputs will be transformed.

subset_output(idcs)[source]

Subset the transform along the output dimension.

Parameters:

idcs (list[int]) – The output indices to subset the transform to.

Returns:

The current outcome transform, subset to the specified output indices.

Return type:

OutcomeTransform

forward(Y, Yvar=None)[source]

Bilog-transform outcomes.

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of training targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of observation noises associated with the training targets (if applicable).

Returns:

  • The transformed outcome observations.

  • The transformed observation noise (if applicable).

Return type:

A two-tuple with the transformed outcomes

untransform(Y, Yvar=None)[source]

Un-transform bilog-transformed outcomes

Parameters:
  • Y (Tensor) – A batch_shape x n x m-dim tensor of bilog-transformed targets.

  • Yvar (Tensor | None) – A batch_shape x n x m-dim tensor of bilog-transformed observation noises associated with the training targets (if applicable).

Returns:

  • The un-transformed outcome observations.

  • The un-transformed observation noise (if applicable).

Return type:

A two-tuple with the un-transformed outcomes

untransform_posterior(posterior)[source]

Un-transform the bilog-transformed posterior.

Parameters:

posterior (Posterior) – A posterior in the bilog-transformed space.

Returns:

The un-transformed posterior.

Return type:

TransformedPosterior
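
For reference, the Bilog transform of [eriksson2021scalable] is sign(y) * log(|y| + 1), with inverse sign(z) * (exp(|z|) - 1). The scalar sketch below uses only the standard library; the function names are hypothetical and operate on floats rather than Tensors.

```python
import math


def bilog(y):
    """sign(y) * log(|y| + 1)."""
    return math.copysign(math.log1p(abs(y)), y)


def inverse_bilog(z):
    """sign(z) * (exp(|z|) - 1)."""
    return math.copysign(math.expm1(abs(z)), z)


near_zero = bilog(0.01)       # close to 0.01: the slope at zero is 1
extreme = bilog(1000.0)       # about 6.9: large values are flattened
round_trip = inverse_bilog(bilog(-3.0))
```

The transform is odd and keeps the sign of its input, so a constraint boundary at zero stays at zero while large violations are compressed.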

Input Transforms

Input Transformations.

These classes implement a variety of transformations for input parameters including: learned input warping functions, rounding functions, and log transformations. The input transformation is typically part of a Model and applied within the model.forward() method.

class botorch.models.transforms.input.InputTransform(*args, **kwargs)[source]

Bases: Module, ABC

Abstract base class for input transforms.

Properties:
  • is_one_to_many: A boolean denoting whether the transform produces multiple values for each input.

  • transform_on_train: A boolean indicating whether to apply the transform in train() mode.

  • transform_on_eval: A boolean indicating whether to apply the transform in eval() mode.

  • transform_on_fantasize: A boolean indicating whether to apply the transform when called from within a fantasize call.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

is_one_to_many: bool = False
transform_on_eval: bool
transform_on_train: bool
transform_on_fantasize: bool
forward(X)[source]

Transform the inputs to a model.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of inputs.

Returns:

A batch_shape x n’ x d-dim tensor of transformed inputs.

Return type:

Tensor
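
How the mode flags gate forward can be sketched with a toy class. TinyInputTransform is hypothetical, doubles every input purely for illustration, and leaves out the transform_on_fantasize flag.

```python
class TinyInputTransform:
    """Hypothetical stand-in showing how the mode flags gate forward()."""

    def __init__(self, transform_on_train=True, transform_on_eval=True):
        self.transform_on_train = transform_on_train
        self.transform_on_eval = transform_on_eval
        self.training = True

    def transform(self, X):
        # Placeholder transform: double every input value.
        return [[2.0 * x for x in row] for row in X]

    def forward(self, X):
        # Pick the flag that matches the current mode, then apply or pass through.
        apply = self.transform_on_train if self.training else self.transform_on_eval
        return self.transform(X) if apply else X


tf = TinyInputTransform(transform_on_train=False)
X = [[1.0, 2.0]]
train_out = tf.forward(X)   # train mode, flag off: inputs pass through unchanged
tf.training = False
eval_out = tf.forward(X)    # eval mode, flag on: the transform is applied
```

Subclasses only need to implement transform; the mode handling lives in forward.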

abstract transform(X)[source]

Transform the inputs to a model.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of inputs.

Returns:

A batch_shape x n x d-dim tensor of transformed inputs.

Return type:

Tensor

untransform(X)[source]

Un-transform the inputs to a model.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of transformed inputs.

Returns:

A batch_shape x n x d-dim tensor of un-transformed inputs.

Return type:

Tensor

equals(other)[source]

Check if another input transform is equivalent.

Note: The reason that a custom equals method is defined rather than defining an __eq__ method is because defining an __eq__ method sets the __hash__ method to None. Hashing modules is currently used in pytorch. See https://github.com/pytorch/pytorch/issues/7733.

Parameters:

other (InputTransform) – Another input transform.

Returns:

A boolean indicating if the other transform is equivalent.

Return type:

bool

preprocess_transform(X)[source]

Apply transforms for preprocessing inputs.

The main use cases for this method are 1) to preprocess training data before calling set_train_data and 2) preprocess X_baseline for noisy acquisition functions so that X_baseline is “preprocessed” with the same transformations as the cached training inputs.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of inputs.

Returns:

A batch_shape x n x d-dim tensor of (transformed) inputs.

Return type:

Tensor

class botorch.models.transforms.input.BatchBroadcastedInputTransform(transforms, broadcast_index=-3)[source]

Bases: InputTransform, ModuleDict

An input transform representing a list of transforms to be broadcasted.

A transform list that is broadcasted across a batch dimension specified by broadcast_index. This allows using a batched Gaussian process model when the input transforms are different for different batch dimensions.

Parameters:
  • transforms (list[InputTransform]) – The transforms to broadcast across the first batch dimension. The transform at position i in the list will be applied to X[i] for a given input tensor X in the forward pass.

  • broadcast_index (int) – The tensor index at which the transforms are broadcasted.

Example

>>> tf1 = Normalize(d=2)
>>> tf2 = InputStandardize(d=2)
>>> tf = BatchBroadcastedInputTransform(transforms=[tf1, tf2])
transform(X)[source]

Transform the inputs to a model.

Individual transforms are applied in sequence and results are returned as a batched tensor.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of inputs.

Returns:

A batch_shape x n x d-dim tensor of transformed inputs.

Return type:

Tensor
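
The pairing of the transform at position i with the batch slice X[i] can be sketched with nested lists. The helper names are hypothetical, and the batch dimension is taken to be the leading one, so broadcast_index is not modeled.

```python
def broadcast_transforms(transforms, X):
    """Apply transforms[i] to the i-th batch slice X[i] and restack the results."""
    if len(transforms) != len(X):
        raise ValueError("need exactly one transform per batch slice")
    return [tf(X_i) for tf, X_i in zip(transforms, X)]


def scale_by(c):
    """Return a toy transform that multiplies every entry of an n x d slice by c."""
    return lambda X_i: [[c * x for x in row] for row in X_i]


X = [
    [[1.0, 2.0], [3.0, 4.0]],  # batch 0: n = 2 points, d = 2
    [[1.0, 2.0], [3.0, 4.0]],  # batch 1: same points, different transform
]
out = broadcast_transforms([scale_by(1.0), scale_by(10.0)], X)
```

Each batch slice sees only its own transform, which is what lets a batched model use different input scalings per batch element.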

untransform(X)[source]

Un-transform the inputs to a model.

Un-transforms of the individual transforms are applied in reverse sequence.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of transformed inputs.

Returns:

A batch_shape x n x d-dim tensor of un-transformed inputs.

Return type:

Tensor

equals(other)[source]

Check if another input transform is equivalent.

Parameters:

other (InputTransform) – Another input transform.

Returns:

A boolean indicating if the other transform is equivalent.

Return type:

bool

preprocess_transform(X)[source]

Apply transforms for preprocessing inputs.

The main use cases for this method are 1) to preprocess training data before calling set_train_data and 2) preprocess X_baseline for noisy acquisition functions so that X_baseline is “preprocessed” with the same transformations as the cached training inputs.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of inputs.

Returns:

A batch_shape x n x d-dim tensor of (transformed) inputs.

Return type:

Tensor

class botorch.models.transforms.input.ChainedInputTransform(**transforms)[source]

Bases: InputTransform, ModuleDict

An input transform representing the chaining of individual transforms.

Chaining of input transforms.

Parameters:

transforms (InputTransform) – The transforms to chain. Internally, the names of the kwargs are used as the keys for accessing the individual transforms on the module.

Example

>>> tf1 = Normalize(d=2)
>>> tf2 = Normalize(d=2)
>>> tf = ChainedInputTransform(tf1=tf1, tf2=tf2)
>>> list(tf.keys())
['tf1', 'tf2']
>>> tf["tf1"]
Normalize()
transform(X)[source]

Transform the inputs to a model.

Individual transforms are applied in sequence.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of inputs.

Returns:

A batch_shape x n x d-dim tensor of transformed inputs.

Return type:

Tensor

untransform(X)[source]

Un-transform the inputs to a model.

Un-transforms of the individual transforms are applied in reverse sequence.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of transformed inputs.

Returns:

A batch_shape x n x d-dim tensor of un-transformed inputs.

Return type:

Tensor
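
The ordering rules (transforms in sequence, un-transforms in reverse sequence) can be sketched with scalar functions. The dictionary of (transform, untransform) pairs below is a hypothetical stand-in for the keyword arguments; it relies on Python dictionaries preserving insertion order.

```python
import math

# Hypothetical (transform, untransform) pairs keyed by name, like the kwargs.
transforms = {
    "shift": (lambda x: x + 1.0, lambda z: z - 1.0),
    "log": (math.log, math.exp),
}


def chain_transform(x, transforms):
    """Apply each transform in insertion order."""
    for transform, _ in transforms.values():
        x = transform(x)
    return x


def chain_untransform(z, transforms):
    """Apply each un-transform in reverse order to undo the chain."""
    for _, untransform in reversed(list(transforms.values())):
        z = untransform(z)
    return z


x = 9.0
z = chain_transform(x, transforms)          # log(9 + 1)
x_back = chain_untransform(z, transforms)   # exp(z) - 1
```

Reversing the order matters: undoing "shift" before "log" would compute exp(z - 1) instead of exp(z) - 1 and would not recover x.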

equals(other)[source]

Check if another input transform is equivalent.

Parameters:

other (InputTransform) – Another input transform.

Returns:

A boolean indicating if the other transform is equivalent.

Return type:

bool

preprocess_transform(X)[source]

Apply transforms for preprocessing inputs.

The main use cases for this method are 1) to preprocess training data before calling set_train_data and 2) preprocess X_baseline for noisy acquisition functions so that X_baseline is “preprocessed” with the same transformations as the cached training inputs.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of inputs.

Returns:

A batch_shape x n x d-dim tensor of (transformed) inputs.

Return type:

Tensor

class botorch.models.transforms.input.ReversibleInputTransform(*args, **kwargs)[source]

Bases: InputTransform, ABC

An abstract class for a reversible input transform.

Properties:
  • reverse: A boolean indicating if the functionality of transform and untransform methods should be swapped.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

reverse: bool
transform(X)[source]

Transform the inputs.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of inputs.

Returns:

A batch_shape x n x d-dim tensor of transformed inputs.

Return type:

Tensor

untransform(X)[source]

Un-transform the inputs.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of inputs.

Returns:

A batch_shape x n x d-dim tensor of un-transformed inputs.

Return type:

Tensor

equals(other)[source]

Check if another input transform is equivalent.

Parameters:

other (InputTransform) – Another input transform.

Returns:

A boolean indicating if the other transform is equivalent.

Return type:

bool

class botorch.models.transforms.input.AffineInputTransform(d, coefficient, offset, indices=None, batch_shape=(), transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, reverse=False)[source]

Bases: ReversibleInputTransform

Apply affine transformation to input:

output = (input - offset) / coefficient

Parameters:
  • d (int) – The dimension of the input space.

  • coefficient (Tensor) – Tensor of linear coefficients, shape must be broadcastable with (batch_shape x n x d)-dim input tensors.

  • offset (Tensor) – Tensor of offset coefficients, shape must be broadcastable with (batch_shape x n x d)-dim input tensors.

  • indices (list[int] | Tensor | None) – The indices of the inputs to transform. If omitted, take all dimensions of the inputs into account. Either a list of ints or a Tensor of type torch.long.

  • batch_shape (torch.Size) – The batch shape of the inputs (assuming input tensors of shape batch_shape x n x d). If provided, perform individual transformation per batch, otherwise uses a single transformation.

  • transform_on_train (bool) – A boolean indicating whether to apply the transform in train() mode. Default: True.

  • transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.

  • transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: True.

  • reverse (bool) – A boolean indicating whether the forward pass should untransform the inputs.
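
The formula output = (input - offset) / coefficient, the effect of indices, and the effect of reverse can be sketched for a single d-dimensional point. The function names are hypothetical; coefficient and offset are taken to be length-d lists here, which is a simplification of the broadcastable Tensors described above.

```python
def affine_transform(x, coefficient, offset, indices=None):
    """(x - offset) / coefficient on the selected dimensions; others pass through."""
    dims = range(len(x)) if indices is None else indices
    out = list(x)
    for i in dims:
        out[i] = (x[i] - offset[i]) / coefficient[i]
    return out


def affine_untransform(z, coefficient, offset, indices=None):
    """Inverse map: z * coefficient + offset on the selected dimensions."""
    dims = range(len(z)) if indices is None else indices
    out = list(z)
    for i in dims:
        out[i] = z[i] * coefficient[i] + offset[i]
    return out


def forward(x, coefficient, offset, indices=None, reverse=False):
    """With reverse=True the roles of transform and untransform are swapped."""
    fn = affine_untransform if reverse else affine_transform
    return fn(x, coefficient, offset, indices)


coefficient, offset = [2.0, 5.0], [2.0, 0.0]
x = [4.0, 10.0]
z = forward(x, coefficient, offset)                      # both dimensions
x_back = forward(z, coefficient, offset, reverse=True)   # undoes the transform
z_partial = forward(x, coefficient, offset, indices=[0]) # only dimension 0
```

Normalize and InputStandardize are both subclasses of AffineInputTransform, so they can be read as particular choices of offset and coefficient in this map.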

property coefficient: Tensor

The tensor of linear coefficients.

property offset: Tensor

The tensor of offset coefficients.

property learn_coefficients: bool
equals(other)[source]

Check if another input transform is equivalent.

Parameters:

other (InputTransform) – Another input transform.

Returns:

A boolean indicating if the other transform is equivalent.

Return type:

bool

class botorch.models.transforms.input.Normalize(d, indices=None, bounds=None, batch_shape=(), transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, reverse=False, min_range=1e-08, learn_bounds=None, almost_zero=1e-12)[source]

Bases: AffineInputTransform

Normalize the inputs to the unit cube.

If no explicit bounds are provided this module is stateful: If in train mode, calling forward updates the module state (i.e. the normalizing bounds). If in eval mode, calling forward simply applies the normalization using the current module state.

Normalize the inputs to the unit cube.

Parameters:
  • d (int) – The dimension of the input space.

  • indices (list[int] | Tensor | None) – The indices of the inputs to normalize. If omitted, take all dimensions of the inputs into account.

  • bounds (Tensor | None) – If provided, use these bounds to normalize the inputs. If omitted, learn the bounds in train mode.

  • batch_shape (torch.Size) – The batch shape of the inputs (assuming input tensors of shape batch_shape x n x d). If provided, perform individual normalization per batch, otherwise uses a single normalization.

  • transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.

  • transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.

  • transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: True.

  • reverse (bool) – A boolean indicating whether the forward pass should untransform the inputs.

  • min_range (float) – If the range of an input dimension is smaller than min_range, that input dimension will not be normalized. This is equivalent to using bounds of [0, 1] for this dimension, and helps avoid division by zero errors and related numerical issues. See the example below. NOTE: This only applies if learn_bounds=True.

  • learn_bounds (bool | None) – Whether to learn the bounds in train mode. Defaults to False if bounds are provided, otherwise defaults to True.

  • almost_zero (float)

Example

>>> t = Normalize(d=2)
>>> t(torch.tensor([[3., 2.], [3., 6.]]))
... tensor([[3., 0.],
...         [3., 1.]])
>>> t.eval()
... Normalize()
>>> t(torch.tensor([[3.5, 2.8]]))
... tensor([[3.5, 0.2]])
>>> t.bounds
... tensor([[0., 2.],
...         [1., 6.]])
>>> t.coefficient
... tensor([[1., 4.]])
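The min_range rule can be illustrated with a plain-Python sketch that reuses the data from the example above. This is not BoTorch's implementation: learn_affine and normalize are hypothetical helpers that only mirror the documented behavior, where a column whose range is below min_range falls back to bounds of [0, 1].

```python
def learn_affine(rows, min_range=1e-8):
    """Per-column (offset, coefficient) learned from the data."""
    offsets, coefficients = [], []
    for column in zip(*rows):
        low, high = min(column), max(column)
        if high - low < min_range:
            # Degenerate column: fall back to bounds [0, 1], i.e. no scaling.
            offsets.append(0.0)
            coefficients.append(1.0)
        else:
            offsets.append(low)
            coefficients.append(high - low)
    return offsets, coefficients


def normalize(rows, offsets, coefficients):
    """Apply (x - offset) / coefficient column-wise."""
    return [
        [(x - o) / c for x, o, c in zip(row, offsets, coefficients)]
        for row in rows
    ]


train_rows = [[3.0, 2.0], [3.0, 6.0]]
offsets, coefficients = learn_affine(train_rows)
normalized_train = normalize(train_rows, offsets, coefficients)
normalized_test = normalize([[3.5, 2.8]], offsets, coefficients)
```

The first column is constant, so it passes through unscaled, while the second column is mapped from [2, 6] to [0, 1].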
property ranges

property mins

property bounds: Tensor

The bounds used for normalizing the inputs.

property learn_bounds: bool

get_init_args()[source]

Get the arguments necessary to construct an exact copy of the transform.

Return type:

dict[str, Any]

class botorch.models.transforms.input.InputStandardize(d, indices=None, batch_shape=(), transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, reverse=False, min_std=1e-08)[source]

Bases: AffineInputTransform

Standardize inputs (zero mean, unit variance).

In train mode, calling forward updates the module state (i.e. the mean/std normalizing constants). If in eval mode, calling forward simply applies the standardization using the current module state.

Parameters:
  • d (int) – The dimension of the input space.

  • indices (list[int] | Tensor | None) – The indices of the inputs to standardize. If omitted, take all dimensions of the inputs into account.

  • batch_shape (torch.Size) – The batch shape of the inputs (assuming input tensors of shape batch_shape x n x d). If provided, perform individual normalization per batch, otherwise uses a single normalization.

  • transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True

  • transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True

  • reverse (bool) – A boolean indicating whether the forward pass should untransform the inputs.

  • min_std (float) – If the standard deviation of an input dimension is smaller than min_std, that input dimension will not be standardized. This is equivalent to using a standard deviation of 1.0 and a mean of 0.0 for this dimension, and helps avoid division by zero errors and related numerical issues.

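  • transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: True.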
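The min_std rule can be sketched the same way in plain Python. The helpers below are hypothetical and only mirror the documented behavior. The use of the sample (n - 1) standard deviation is an assumption, since the estimator is not stated here.

```python
import statistics


def learn_standardization(rows, min_std=1e-8):
    """Per-column (mean, std); near-constant columns get mean 0 and std 1."""
    means, stds = [], []
    for column in zip(*rows):
        std = statistics.stdev(column)  # assumption: sample (n - 1) estimator
        if std < min_std:
            means.append(0.0)
            stds.append(1.0)
        else:
            means.append(statistics.fmean(column))
            stds.append(std)
    return means, stds


def standardize(rows, means, stds):
    """Apply (x - mean) / std column-wise."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


train_rows = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
means, stds = learn_standardization(train_rows)
standardized = standardize(train_rows, means, stds)
```

The constant second column is left untouched, which avoids dividing by a standard deviation of zero.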

property stds

property means

class botorch.models.transforms.input.Round(integer_indices=None, categorical_features=None, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, approximate=False, tau=0.001)[source]

Bases: InputTransform

A discretization transformation for discrete inputs.

If approximate=False (the default), uses PyTorch’s round.

If approximate=True, a differentiable approximate rounding function is used, with a temperature parameter of tau. This method is a piecewise approximation of a rounding function where each piece is a hyperbolic tangent function.
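As a rough illustration of such a piecewise tanh approximation, the sketch below places one tanh step on each unit interval. The exact functional form is an assumption (the text above only states that each piece is a hyperbolic tangent), and approximate_round here is a plain-Python stand-in, not the library function.

```python
import math


def approximate_round(x, tau=1e-3):
    """Smooth surrogate for round(x): one tanh step per unit interval."""
    base = math.floor(x)
    # Signed distance from the midpoint of [base, base + 1], sharpened by 1 / tau.
    scaled = (x - base - 0.5) / tau
    return base + 0.5 * (1.0 + math.tanh(scaled))


sharp = [approximate_round(v) for v in (2.3, 2.7, 2.5)]
soft = approximate_round(2.7, tau=1.0)
```

A small tau makes the surrogate nearly indistinguishable from exact rounding away from the midpoints, while a larger tau gives a smoother function with more informative gradients.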

For integers, this will typically be used in conjunction with normalization as follows:

In eval() mode (i.e. after training), the inputs passed in would typically be normalized to the unit cube (e.g. during candidate optimization). In that case:

  1. The inputs are unnormalized back to the raw input space.

  2. The integers are rounded.

  3. All values are normalized to the unit cube.
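The three eval-mode steps can be sketched for a single integer parameter in plain Python. round_in_unit_cube is a hypothetical helper used only to show the arithmetic.

```python
def round_in_unit_cube(x_unit, lower, upper):
    """Unnormalize -> round -> normalize for one integer parameter."""
    raw = lower + x_unit * (upper - lower)      # step 1: back to the raw space
    rounded = round(raw)                        # step 2: round the integer
    return (rounded - lower) / (upper - lower)  # step 3: back to [0, 1]


# An integer parameter with raw bounds [0, 5], evaluated at 0.47 in the unit cube.
snapped = round_in_unit_cube(0.47, lower=0, upper=5)
```

Here 0.47 corresponds to a raw value of about 2.35, which rounds to 2 and maps back to 0.4 in the unit cube.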

In train() mode, the inputs can either (a) be normalized to the unit cube or (b) provided using their raw values. In the case of (a) transform_on_train should be set to True, so that the normalized inputs are unnormalized before rounding. In the case of (b) transform_on_train should be set to False, so that the raw inputs are rounded and then normalized to the unit cube.

By default, the straight through estimators are used for the gradients as proposed in [Daulton2022bopr]. This transformation supports differentiable approximate rounding (currently only for integers). The rounding function is approximated with a piece-wise function where each piece is a hyperbolic tangent function.

For categorical parameters, the input must be one-hot encoded.

Example

>>> import torch
>>> from botorch.models.transforms.input import (
...     ChainedInputTransform, Normalize, Round
... )
>>> d = 3
>>> bounds = torch.tensor([[0., 5.], [0., 1.], [0., 1.]]).t()
>>> integer_indices = [0]
>>> categorical_features = {1: 2}
>>> unnormalize_tf = Normalize(
...     d=d,
...     bounds=bounds,
...     transform_on_eval=True,
...     transform_on_train=True,
...     reverse=True,
... )
>>> round_tf = Round(integer_indices, categorical_features)
>>> normalize_tf = Normalize(d=d, bounds=bounds)
>>> tf = ChainedInputTransform(
...     tf1=unnormalize_tf, tf2=round_tf, tf3=normalize_tf
... )

Initialize transform.

Parameters:
  • integer_indices (list[int] | LongTensor | None) – The indices of the integer inputs.

  • categorical_features (dict[int, int] | None) – A dictionary mapping the starting index of each categorical feature to its cardinality. This assumes that categoricals are one-hot encoded.

  • transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.

  • transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.

  • transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: True.

  • approximate (bool) – A boolean indicating whether approximate or exact rounding should be used. Default: False.

  • tau (float) – The temperature parameter for approximate rounding.

transform(X)[source]

Discretize the inputs.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of inputs.

Returns:

A batch_shape x n x d-dim tensor of discretized inputs.

Return type:

Tensor

equals(other)[source]

Check if another input transform is equivalent.

Parameters:

other (InputTransform) – Another input transform.

Returns:

A boolean indicating if the other transform is equivalent.

Return type:

bool

get_init_args()[source]

Get the arguments necessary to construct an exact copy of the transform.

Return type:

dict[str, Any]

class botorch.models.transforms.input.Log10(indices, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, reverse=False)[source]

Bases: ReversibleInputTransform

A base-10 log transformation.

Initialize transform.

Parameters:
  • indices (list[int]) – The indices of the inputs to log transform.

  • transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.

  • transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.

  • transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: True.

  • reverse (bool) – A boolean indicating whether the forward pass should untransform the inputs.

class botorch.models.transforms.input.Warp(indices, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True, reverse=False, eps=1e-07, concentration1_prior=None, concentration0_prior=None, batch_shape=None)[source]

Bases: ReversibleInputTransform, Module

A transform that uses learned input warping functions.

Each specified input dimension is warped using the CDF of a Kumaraswamy distribution. Typically, MAP estimates of the parameters of the Kumaraswamy distribution, for each input dimension, are learned jointly with the GP hyperparameters.

TODO: implement support using independent warping functions for each output in batched multi-output and multi-task models.

For now, ModelListGPs should be used to learn independent warping functions for each output.

Initialize transform.

Parameters:
  • indices (list[int]) – The indices of the inputs to warp.

  • transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.

  • transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.

  • transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: True.

  • reverse (bool) – A boolean indicating whether the forward pass should untransform the inputs.

  • eps (float) – A small value used to clip values to be in the interval (0, 1).

  • concentration1_prior (Prior | None) – A prior distribution on the concentration1 parameter of the Kumaraswamy distribution.

  • concentration0_prior (Prior | None) – A prior distribution on the concentration0 parameter of the Kumaraswamy distribution.

  • batch_shape (torch.Size | None) – An optional batch shape, for learning independent warping parameters for each batch of inputs. This should match the input batch shape of the model (i.e., train_X.shape[:-2]). NOTE: This is only supported for single-output models.
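For intuition, the warping of one input dimension can be sketched with the Kumaraswamy CDF in plain Python. Mapping concentration1 to the inner exponent and concentration0 to the outer exponent follows the usual Kumaraswamy parametrization and is an assumption here. kumaraswamy_warp is a hypothetical helper, not the Warp implementation.

```python
def kumaraswamy_warp(x, concentration1, concentration0, eps=1e-7):
    """Kumaraswamy CDF 1 - (1 - x**a)**b with a=concentration1, b=concentration0."""
    # Clip to [eps, 1 - eps] so the CDF is evaluated strictly inside (0, 1).
    clipped = min(max(x, eps), 1.0 - eps)
    return 1.0 - (1.0 - clipped**concentration1) ** concentration0


identity_like = kumaraswamy_warp(0.3, concentration1=1.0, concentration0=1.0)
warped = [kumaraswamy_warp(v, 2.0, 0.5) for v in (0.1, 0.5, 0.9)]
```

With both concentrations equal to 1 the warp reduces to the identity. Other values bend the unit interval monotonically, which is what lets the model compensate for non-stationarity along that dimension.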

class botorch.models.transforms.input.AppendFeatures(feature_set=None, f=None, indices=None, fkwargs=None, skip_expand=False, transform_on_train=False, transform_on_eval=True, transform_on_fantasize=False)[source]

Bases: InputTransform

A transform that appends the input with a given set of features either provided beforehand or generated on the fly via a callable.

As an example, the predefined set of features can be used with RiskMeasureMCObjective to optimize risk measures as described in [Cakmak2020risk]. A tutorial notebook implementing the rhoKG acquisition function introduced in [Cakmak2020risk] can be found at https://botorch.org/tutorials/risk_averse_bo_with_environmental_variables.

The steps for using this to obtain samples of a risk measure are as follows:

  • Train a model on (x, w) inputs and the corresponding observations;

  • Pass in an instance of AppendFeatures with the feature_set denoting the samples of W as the input_transform to the trained model;

  • Call posterior(…).rsample(…) on the model with x inputs only to get the joint posterior samples over the (x, w) pairs, where the w values come from the feature_set;

  • Pass these posterior samples through the RiskMeasureMCObjective of choice to get the samples of the risk measure.

Note: The samples of the risk measure obtained this way are in general biased since the feature_set does not fully represent the distribution of the environmental variable.

Possible examples for using a callable include statistical models that are built on PyTorch, built-in mathematical operations such as torch.sum, or custom scripted functions. In this way, the input transform enables advanced feature engineering and transfer learning models within the optimization loop.

Example

>>> # We consider 1D `x` and 1D `w`, with `W` having a
>>> # uniform distribution over [0, 1]
>>> model = SingleTaskGP(
...     train_X=torch.rand(10, 2),
...     train_Y=torch.randn(10, 1),
...     input_transform=AppendFeatures(feature_set=torch.rand(10, 1))
... )
>>> mll = ExactMarginalLogLikelihood(model.likelihood, model)
>>> fit_gpytorch_mll(mll)
>>> test_x = torch.rand(3, 1)
>>> # `posterior_samples` is a `10 x 30 x 1`-dim tensor
>>> posterior_samples = model.posterior(test_x).rsample(torch.Size([10]))
>>> risk_measure = VaR(alpha=0.8, n_w=10)
>>> # `risk_measure_samples` is a `10 x 3`-dim tensor of samples of the
>>> # risk measure VaR
>>> risk_measure_samples = risk_measure(posterior_samples)

Append feature_set to each input or generate a set of features to append on the fly via a callable.

Parameters:
  • feature_set (Tensor | None) – An n_f x d_f-dim tensor denoting the features to be appended to the inputs. Default: None.

  • f (Callable[[Tensor], Tensor] | None) – A callable mapping a batch_shape x q x d-dim input tensor X to a batch_shape x q x n_f x d_f-dimensional output tensor. Default: None.

  • indices (list[int] | None) – List of indices denoting the indices of the features to be passed into f. Per default all features are passed to f. Default: None.

  • fkwargs (dict[str, Any] | None) – Dictionary of keyword arguments passed to the callable f. Default: None.

  • skip_expand (bool) – A boolean indicating whether to expand the input tensor before appending features. This is intended for use with an InputPerturbation. If True, the input tensor will be expected to be of shape batch_shape x (q * n_f) x d. Not implemented in combination with a callable.

  • transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: False.

  • transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.

  • transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: False.

is_one_to_many: bool = True

transform(X)[source]

Transform the inputs by appending feature_set to each input or by generating a set of features to be appended on the fly via a callable.

For each 1 x d-dim element in the input tensor, this will produce an n_f x (d + d_f)-dim tensor with feature_set appended as the last d_f dimensions. For a generic batch_shape x q x d-dim X, this translates to a batch_shape x (q * n_f) x (d + d_f)-dim output, where the values corresponding to X[…, i, :] are found in output[…, i * n_f: (i + 1) * n_f, :].

Note: Adding the feature_set on the q-batch dimension is necessary to avoid introducing additional bias by evaluating the inputs on independent GP sample paths.
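The output layout described above can be checked with a small plain-Python sketch that uses nested lists in place of tensors. append_features is a hypothetical stand-in that only reproduces the documented row ordering for a single batch.

```python
def append_features(X, feature_set):
    """X: q rows of length d; feature_set: n_f rows of length d_f."""
    out = []
    for x in X:                # one q-batch element at a time
        for w in feature_set:  # n_f consecutive output rows per input
            out.append(list(x) + list(w))
    return out


X = [[0.1], [0.2], [0.3]]       # q = 3, d = 1
feature_set = [[10.0], [20.0]]  # n_f = 2, d_f = 1
out = append_features(X, feature_set)

n_f = len(feature_set)
block_for_second_input = out[1 * n_f:(1 + 1) * n_f]
```

The slice i * n_f : (i + 1) * n_f holds every copy of the i-th input, each paired with a different row of feature_set.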

Parameters:

X (Tensor) – A batch_shape x q x d-dim tensor of inputs. If self.skip_expand is True, then X should be of shape batch_shape x (q * n_f) x d, typically obtained by passing a batch_shape x q x d shape input through an InputPerturbation with n_f perturbation values.

Returns:

A batch_shape x (q * n_f) x (d + d_f)-dim tensor of appended inputs.

Return type:

Tensor

class botorch.models.transforms.input.InteractionFeatures(indices=None)[source]

Bases: AppendFeatures

A transform that appends the first-order interaction terms $x_i * x_j, i < j$, for all or a subset of the input variables.

Initializes the InteractionFeatures transform.

Parameters:

indices (list[int] | None) – Indices of the subset of dimensions to compute interaction features on.
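The interaction terms themselves are simple pairwise products. A plain-Python sketch for a single input point is shown below. interaction_terms is a hypothetical helper, and the ordering of the pairs is an assumption.

```python
from itertools import combinations


def interaction_terms(x, indices=None):
    """Products x_i * x_j for all i < j over the selected dimensions."""
    dims = range(len(x)) if indices is None else indices
    return [x[i] * x[j] for i, j in combinations(dims, 2)]


all_terms = interaction_terms([1.0, 2.0, 3.0])
subset_terms = interaction_terms([1.0, 2.0, 3.0, 4.0], indices=[0, 3])
```

For d selected dimensions this yields d * (d - 1) / 2 additional features per input.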

class botorch.models.transforms.input.FilterFeatures(feature_indices, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True)[source]

Bases: InputTransform

A transform that filters the input with a given set of features indices.

As an example, this can be used in a multiobjective optimization with ModelListGP in which the specific models only share subsets of features (feature selection). A reason could be that it is known that specific features do not have any impact on a specific objective but they need to be included in the model for another one.

Filter features from a model.

Parameters:
  • feature_indices (Tensor) – A one-dim tensor denoting the indices of the features to be kept and fed to the model.

  • transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.

  • transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.

  • transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: True.

transform(X)[source]

Transform the inputs by keeping only the features whose indices are specified in feature_indices and filtering out the others.

Parameters:

X (Tensor) – A batch_shape x q x d-dim tensor of inputs.

Returns:

A batch_shape x q x e-dim tensor of filtered inputs, where e is the length of feature_indices.

Return type:

Tensor

equals(other)[source]

Check if another input transform is equivalent.

Parameters:

other (InputTransform) – Another input transform.

Returns:

A boolean indicating if the other transform is equivalent.

Return type:

bool

class botorch.models.transforms.input.InputPerturbation(perturbation_set, bounds=None, indices=None, multiplicative=False, transform_on_train=False, transform_on_eval=True, transform_on_fantasize=False)[source]

Bases: InputTransform

A transform that adds the set of perturbations to the given input.

Similar to AppendFeatures, this can be used with RiskMeasureMCObjective to optimize risk measures. See AppendFeatures for additional discussion on optimizing risk measures.

A tutorial notebook using this with qNoisyExpectedImprovement can be found at https://botorch.org/tutorials/risk_averse_bo_with_input_perturbations.

Add perturbation_set to each input.

Parameters:
  • perturbation_set (Tensor | Callable[[Tensor], Tensor]) – An n_p x d-dim tensor denoting the perturbations to be added to the inputs. Alternatively, this can be a callable that returns batch x n_p x d-dim tensor of perturbations for input of shape batch x d. This is useful for heteroscedastic perturbations.

  • bounds (Tensor | None) – A 2 x d-dim tensor of lower and upper bounds for each column of the input. If given, the perturbed inputs will be clamped to these bounds.

  • indices (list[int] | None) – A list of indices specifying a subset of inputs on which to apply the transform. Note that len(indices) should be equal to the second dimension of perturbation_set and bounds. The dimensionality of the input X.shape[-1] can be larger if we only transform a subset.

  • multiplicative (bool) – A boolean indicating whether the input perturbations are additive or multiplicative. If True, inputs will be multiplied with the perturbations.

  • transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: False.

  • transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.

  • transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: False.

is_one_to_many: bool = True

transform(X)[source]

Transform the inputs by adding perturbation_set to each input.

For each 1 x d-dim element in the input tensor, this will produce an n_p x d-dim tensor with the perturbation_set added to the input. For a generic batch_shape x q x d-dim X, this translates to a batch_shape x (q * n_p) x d-dim output, where the values corresponding to X[…, i, :] are found in output[…, i * n_p: (i + 1) * n_p, :].

Note: Adding the perturbation_set on the q-batch dimension is necessary to avoid introducing additional bias by evaluating the inputs on independent GP sample paths.
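The perturbation layout, together with the optional clamping to bounds and the multiplicative mode, can be sketched with nested lists in plain Python. perturb is a hypothetical helper for a fixed (non-callable) perturbation_set and a single batch. It is not the BoTorch implementation.

```python
def perturb(X, perturbation_set, bounds=None, multiplicative=False):
    """X: q rows of length d; perturbation_set: n_p rows of length d."""
    out = []
    for x in X:
        for p in perturbation_set:  # n_p consecutive output rows per input
            row = [xi * pi if multiplicative else xi + pi for xi, pi in zip(x, p)]
            if bounds is not None:
                lower, upper = bounds
                # Clamp each perturbed coordinate to its [lower, upper] range.
                row = [min(max(v, lo), hi) for v, lo, hi in zip(row, lower, upper)]
            out.append(row)
    return out


X = [[0.5, 0.5], [0.9, 0.1]]                  # q = 2, d = 2
perturbation_set = [[-0.2, 0.0], [0.2, 0.0]]  # n_p = 2
bounds = ([0.0, 0.0], [1.0, 1.0])             # lower and upper rows
perturbed = perturb(X, perturbation_set, bounds)
```

The last output row shows the clamping: 0.9 + 0.2 would leave the unit interval, so it is clipped to the upper bound.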

Parameters:

X (Tensor) – A batch_shape x q x d-dim tensor of inputs.

Returns:

A batch_shape x (q * n_p) x d-dim tensor of perturbed inputs.

Return type:

Tensor

property batch_shape

Returns a shape tuple such that subset_transform pre-allocates a (b x n_p x n x d) - dim tensor, where b is the batch shape of the input X of the transform and n_p is the number of perturbations. NOTE: this function is dependent on calling _expanded_perturbations(X) because n_p is inaccessible otherwise if perturbation_set is a function.

class botorch.models.transforms.input.OneHotToNumeric(dim, categorical_features=None, transform_on_train=True, transform_on_eval=True, transform_on_fantasize=True)[source]

Bases: InputTransform

Transform categorical parameters from a one-hot to a numeric representation.

Initialize.

Parameters:
  • dim (int) – The dimension of the one-hot-encoded input.

  • categorical_features (dict[int, int] | None) – A dictionary mapping the starting index of each categorical feature to its cardinality. This assumes that categoricals are one-hot encoded.

  • transform_on_train (bool) – A boolean indicating whether to apply the transforms in train() mode. Default: True.

  • transform_on_eval (bool) – A boolean indicating whether to apply the transform in eval() mode. Default: True.

  • transform_on_fantasize (bool) – A boolean indicating whether to apply the transform when called from within a fantasize call. Default: True.

transform(X)[source]

Transform the categorical inputs into integer representation.

Parameters:

X (Tensor) – A batch_shape x n x d-dim tensor of inputs.

Returns:

A batch_shape x n x d’-dim tensor in which the one-hot encoded categoricals are transformed to an integer representation.

Return type:

Tensor

untransform(X)[source]

Transform the categoricals from integer representation to one-hot.

Parameters:

X (Tensor) – A batch_shape x n x d’-dim tensor of transformed inputs, where the categoricals are represented as integers.

Returns:

A batch_shape x n x d-dim tensor of inputs, where the categoricals have been transformed to one-hot representation.

Return type:

Tensor
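The round trip between the two representations can be sketched for a single input point in plain Python. Both helpers are hypothetical. Replacing each one-hot block in place by its category index is an assumption about the column layout, made only for illustration.

```python
def one_hot_to_numeric(x, categorical_features):
    """Replace each one-hot block (start -> cardinality) with its argmax index."""
    out, i = [], 0
    while i < len(x):
        if i in categorical_features:
            block = x[i:i + categorical_features[i]]
            out.append(float(block.index(max(block))))
            i += categorical_features[i]
        else:
            out.append(x[i])
            i += 1
    return out


def numeric_to_one_hot(z, categorical_features, dim):
    """Inverse mapping: expand each category index back into a one-hot block."""
    out, j = [], 0
    while len(out) < dim:
        start = len(out)
        if start in categorical_features:
            block = [0.0] * categorical_features[start]
            block[int(z[j])] = 1.0
            out.extend(block)
        else:
            out.append(z[j])
        j += 1
    return out


categorical_features = {1: 3, 4: 2}  # start index -> cardinality
x = [0.3, 0.0, 1.0, 0.0, 1.0, 0.0]   # d = 6
z = one_hot_to_numeric(x, categorical_features)  # d' = 3
x_back = numeric_to_one_hot(z, categorical_features, dim=len(x))
```

Here d = 6 one-hot columns shrink to d’ = 3 numeric columns, and the inverse mapping recovers the original encoding.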

equals(other)[source]

Check if another input transform is equivalent.

Parameters:

other (InputTransform) – Another input transform.

Returns:

A boolean indicating if the other transform is equivalent.

Return type:

bool

Transform Factory Methods

botorch.models.transforms.factory.get_rounding_input_transform(one_hot_bounds, integer_indices=None, categorical_features=None, initialization=False, return_numeric=False, approximate=False)[source]

Get a rounding input transform.

The rounding function will take inputs from the unit cube, unnormalize the integers to the raw search space, round the inputs, and normalize them back to the unit cube.

Categoricals are assumed to be one-hot encoded. Integers are currently assumed to be contiguous ranges (e.g. [1,2,3] and not [1,5,7]).

TODO: support non-contiguous sets of integers by modifying the rounding function.

Parameters:
  • one_hot_bounds (Tensor) – The raw search space bounds where categoricals are encoded in one-hot representation and the integer parameters are not normalized.

  • integer_indices (list[int] | None) – The indices of the integer parameters.

  • categorical_features (dict[int, int] | None) – A dictionary mapping indices to cardinalities for the categorical features.

  • initialization (bool) – A boolean indicating whether this exact rounding function is for initialization. For initialization, the bounds are expanded such that the end point of a range is selected with the same probability as an interior point, after rounding.

  • return_numeric (bool) – A boolean indicating whether to return numeric or one-hot encoded categoricals. Returning a numeric representation is helpful if the downstream code (e.g. kernel) expects a numeric representation of the categoricals.

  • approximate (bool) – A boolean indicating whether to use an approximate rounding function.

Returns:

The rounding function ChainedInputTransform.

Return type:

ChainedInputTransform
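The effect of expanding the bounds for initialization can be seen with a small counting experiment in plain Python. Widening each end by just under 0.5 is an assumption about how the expansion might be done, not a statement of what the factory does internally. rounded_counts is a hypothetical helper.

```python
from collections import Counter


def rounded_counts(lower, upper, n):
    """Round n evenly spaced midpoints of [lower, upper] and count each integer."""
    step = (upper - lower) / n
    return Counter(round(lower + (k + 0.5) * step) for k in range(n))


# Integer parameter taking values {0, 1, 2, 3}.
plain = rounded_counts(0.0, 3.0, 6000)            # raw bounds
expanded = rounded_counts(-0.4999, 3.4999, 4000)  # bounds widened by ~0.5
```

With the raw bounds, the end points 0 and 3 are hit roughly half as often as the interior values. With the widened bounds, all four values are hit almost equally often.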

Transform Utilities

botorch.models.transforms.utils.lognorm_to_norm(mu, Cov)[source]

Compute mean and covariance of a MVN from those of the associated log-MVN

If Y is log-normal with mean mu_ln and covariance Cov_ln, then X ~ N(mu_n, Cov_n) with

Cov_n_{ij} = log(1 + Cov_ln_{ij} / (mu_ln_{i} * mu_ln_{j}))

mu_n_{i} = log(mu_ln_{i}) - 0.5 * log(1 + Cov_ln_{ii} / mu_ln_{i}**2)

Parameters:
  • mu (Tensor) – A batch_shape x n mean vector of the log-Normal distribution.

  • Cov (Tensor) – A batch_shape x n x n covariance matrix of the log-Normal distribution.

Returns:

A two-tuple containing

  • The batch_shape x n mean vector of the Normal distribution.

  • The batch_shape x n x n covariance matrix of the Normal distribution.

Return type:

tuple[Tensor, Tensor]

botorch.models.transforms.utils.norm_to_lognorm(mu, Cov)[source]

Compute mean and covariance of a log-MVN from its MVN sufficient statistics

If X ~ N(mu, Cov) and Y = exp(X), then Y is log-normal with

mu_ln_{i} = exp(mu_{i} + 0.5 * Cov_{ii})

Cov_ln_{ij} = exp(mu_{i} + mu_{j} + 0.5 * (Cov_{ii} + Cov_{jj})) * (exp(Cov_{ij}) - 1)

Parameters:
  • mu (Tensor) – A batch_shape x n mean vector of the Normal distribution.

  • Cov (Tensor) – A batch_shape x n x n covariance matrix of the Normal distribution.

Returns:

A two-tuple containing

  • The batch_shape x n mean vector of the log-Normal distribution.

  • The batch_shape x n x n covariance matrix of the log-Normal distribution.

Return type:

tuple[Tensor, Tensor]
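The two conversions are inverses of each other, which can be verified numerically. The sketch below re-implements the formulas stated for norm_to_lognorm and lognorm_to_norm with nested lists in plain Python. It does not call the library functions and ignores batch dimensions.

```python
import math


def norm_to_lognorm(mu, cov):
    """Moments of Y = exp(X) for X ~ N(mu, cov)."""
    n = len(mu)
    mu_ln = [math.exp(mu[i] + 0.5 * cov[i][i]) for i in range(n)]
    cov_ln = [
        [mu_ln[i] * mu_ln[j] * (math.exp(cov[i][j]) - 1.0) for j in range(n)]
        for i in range(n)
    ]
    return mu_ln, cov_ln


def lognorm_to_norm(mu_ln, cov_ln):
    """Moments of X = log(Y) given the mean and covariance of Y."""
    n = len(mu_ln)
    cov = [
        [math.log(1.0 + cov_ln[i][j] / (mu_ln[i] * mu_ln[j])) for j in range(n)]
        for i in range(n)
    ]
    mu = [math.log(mu_ln[i]) - 0.5 * cov[i][i] for i in range(n)]
    return mu, cov


mu, cov = [0.1, -0.3], [[0.5, 0.2], [0.2, 0.4]]
mu_back, cov_back = lognorm_to_norm(*norm_to_lognorm(mu, cov))
```

The product mu_ln_i * mu_ln_j equals exp(mu_i + mu_j + 0.5 * (Cov_ii + Cov_jj)), so the forward formula here matches the one stated for norm_to_lognorm.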

botorch.models.transforms.utils.norm_to_lognorm_mean(mu, var)[source]

Compute mean of a log-MVN from its MVN marginals

Parameters:
  • mu (Tensor) – A batch_shape x n mean vector of the Normal distribution.

  • var (Tensor) – A batch_shape x n variance vector of the Normal distribution.

Returns:

The batch_shape x n mean vector of the log-Normal distribution.

Return type:

Tensor

botorch.models.transforms.utils.norm_to_lognorm_variance(mu, var)[source]

Compute variance of a log-MVN from its MVN marginals

Parameters:
  • mu (Tensor) – A batch_shape x n mean vector of the Normal distribution.

  • var (Tensor) – A batch_shape x n variance vector of the Normal distribution.

Returns:

The batch_shape x n variance vector of the log-Normal distribution.

Return type:

Tensor
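For the marginal case, the standard log-normal moment formulas are the diagonal entries of the expressions given for norm_to_lognorm. The scalar sketch below uses hypothetical helper names and plain Python floats instead of tensors.

```python
import math


def lognorm_mean(mu, var):
    """Mean of exp(X) for X ~ N(mu, var)."""
    return math.exp(mu + 0.5 * var)


def lognorm_variance(mu, var):
    """Variance of exp(X) for X ~ N(mu, var)."""
    return (math.exp(var) - 1.0) * math.exp(2.0 * mu + var)


mean = lognorm_mean(0.0, 1.0)
variance = lognorm_variance(0.0, 1.0)
```

With zero variance the log-normal collapses to the point exp(mu), so the variance formula returns 0 in that case.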

botorch.models.transforms.utils.expand_and_copy_tensor(X, batch_shape)[source]

Expand and copy X according to batch_shape.

Parameters:
  • X (Tensor) – A input_batch_shape x n x d-dim tensor of inputs.

  • batch_shape (Size) – The new batch shape.

Returns:

A new_batch_shape x n x d-dim tensor of inputs, where new_batch_shape is input_batch_shape broadcast against batch_shape.

Return type:

Tensor

botorch.models.transforms.utils.subset_transform(transform)[source]

Decorator of an input transform function to separate out indexing logic.

botorch.models.transforms.utils.interaction_features(X)[source]

Computes the interaction features between the inputs.

Parameters:
  • X (Tensor) – A batch_shape x q x d-dim tensor of inputs.

  • indices – The input dimensions to generate interaction features for.

Returns:

A batch_shape x q x 1 x (d * (d-1) / 2)-dim tensor of interaction features.

Return type:

Tensor

Utilities

GPyTorch Module Constructors

Pre-packaged kernels for Bayesian optimization, including a Scale/Matern kernel that is well-suited to low-dimensional high-noise problems, and a dimension-agnostic RBF kernel without outputscale.

References:

[Hvarfner2024vanilla]

C. Hvarfner, E. O. Hellsten, L. Nardi, Vanilla Bayesian Optimization Performs Great in High Dimensions. In International Conference on Machine Learning, 2024.

botorch.models.utils.gpytorch_modules.get_matern_kernel_with_gamma_prior(ard_num_dims, batch_shape=None)[source]

Constructs the Scale-Matern kernel that is used by default by several models. This uses a Gamma(3.0, 6.0) prior for the lengthscale and a Gamma(2.0, 0.15) prior for the output scale.

Parameters:
  • ard_num_dims (int)

  • batch_shape (Size | None)

Return type:

ScaleKernel

botorch.models.utils.gpytorch_modules.get_gaussian_likelihood_with_gamma_prior(batch_shape=None)[source]

Constructs the GaussianLikelihood that is used by default by several models. This uses a Gamma(1.1, 0.05) prior and constrains the noise level to be greater than MIN_INFERRED_NOISE_LEVEL (=1e-4).

Parameters:

batch_shape (Size | None)

Return type:

GaussianLikelihood

botorch.models.utils.gpytorch_modules.get_gaussian_likelihood_with_lognormal_prior(batch_shape=None)[source]

Return Gaussian likelihood with a LogNormal(-4.0, 1.0) prior. This prior is based on [Hvarfner2024vanilla].

Parameters:

batch_shape (Size | None) – Batch shape for the likelihood.

Returns:

A GaussianLikelihood with a LogNormal(-4.0, 1.0) prior whose noise level is constrained to be greater than MIN_INFERRED_NOISE_LEVEL (=1e-4).

Return type:

GaussianLikelihood

botorch.models.utils.gpytorch_modules.get_covar_module_with_dim_scaled_prior(ard_num_dims, batch_shape=None, use_rbf_kernel=True, active_dims=None)[source]

Returns an RBF or Matern kernel with priors from [Hvarfner2024vanilla].

Parameters:
  • ard_num_dims (int) – Number of feature dimensions for ARD.

  • batch_shape (Size | None) – Batch shape for the covariance module.

  • use_rbf_kernel (bool) – Whether to use an RBF kernel. If False, uses a Matern kernel.

  • active_dims (Sequence[int] | None) – The set of input dimensions to compute the covariances on. By default, the covariance is computed using the full input tensor. Set this if you’d like to ignore certain dimensions.

Returns:

A Kernel constructed according to the given arguments. The prior is constrained to have lengthscales larger than 0.025 for numerical stability.

Return type:

MaternKernel | RBFKernel

Model Conversion

Utilities for converting between different models.

botorch.models.converter.model_list_to_batched(model_list)[source]

Convert a ModelListGP to a BatchedMultiOutputGPyTorchModel.

Parameters:

model_list (ModelListGP) – The ModelListGP to be converted to the appropriate BatchedMultiOutputGPyTorchModel. All sub-models must be of the same type and have the same shape (batch shape and number of training inputs).

Returns:

The model converted into a BatchedMultiOutputGPyTorchModel.

Return type:

BatchedMultiOutputGPyTorchModel

Example

>>> list_gp = ModelListGP(gp1, gp2)
>>> batch_gp = model_list_to_batched(list_gp)

botorch.models.converter.set_attribute(obj, attr, val)[source]

Like setattr but works with hierarchical attribute specification. E.g. if obj=Zoo(), and attr="tiger.age", set_attribute(obj, attr, 3), would set the Zoo’s tiger’s age to three.

Parameters:

attr (str)

botorch.models.converter.get_attribute(obj, attr)[source]

Like getattr but works with hierarchical attribute specification. E.g. if obj=Zoo(), and attr="tiger.age", get_attribute(obj, attr), would return the Zoo’s tiger’s age.

Parameters:

attr (str)
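Hierarchical attribute access of this kind can be sketched with functools.reduce. The helpers get_nested and set_nested below are hypothetical stand-ins that illustrate the idea on the Zoo example. They are not the BoTorch functions.

```python
from functools import reduce
from types import SimpleNamespace


def get_nested(obj, attr):
    """Follow a dotted path such as 'tiger.age' one attribute at a time."""
    return reduce(getattr, attr.split("."), obj)


def set_nested(obj, attr, val):
    """Resolve everything before the last dot, then set the final attribute."""
    *parents, last = attr.split(".")
    setattr(reduce(getattr, parents, obj), last, val)


zoo = SimpleNamespace(tiger=SimpleNamespace(age=1))
set_nested(zoo, "tiger.age", 3)
age = get_nested(zoo, "tiger.age")
```

A path without dots degenerates to ordinary getattr and setattr, since the list of parent attributes is then empty.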

botorch.models.converter.batched_to_model_list(batch_model)[source]

Convert a BatchedMultiOutputGPyTorchModel to a ModelListGP.

Parameters:

batch_model (BatchedMultiOutputGPyTorchModel) – The BatchedMultiOutputGPyTorchModel to be converted to a ModelListGP.

Returns:

The model converted into a ModelListGP.

Return type:

ModelListGP

Example

>>> train_X = torch.rand(5, 2)
>>> train_Y = torch.rand(5, 2)
>>> batch_gp = SingleTaskGP(train_X, train_Y)
>>> list_gp = batched_to_model_list(batch_gp)

botorch.models.converter.batched_multi_output_to_single_output(batch_mo_model)[source]

Convert a model from batched multi-output to a batched single-output.

Note: the underlying GPyTorch GP does not change. The GPyTorch GP’s batch_shape (referred to as _aug_batch_shape) is still _input_batch_shape x num_outputs. The only things that change are the attributes of the BatchedMultiOutputGPyTorchModel that are responsible for the internal accounting of the number of outputs: namely, num_outputs, _input_batch_shape, and _aug_batch_shape. Initially for the batched MO models these are: num_outputs = m, _input_batch_shape = train_X.batch_shape, and _aug_batch_shape = train_X.batch_shape + torch.Size([num_outputs]). In the new SO model, these are: num_outputs = 1, _input_batch_shape = train_X.batch_shape + torch.Size([num_outputs]), and _aug_batch_shape = train_X.batch_shape + torch.Size([num_outputs]).

This is a (hopefully) temporary measure until multi-output MVNs with independent outputs have better support in GPyTorch (see https://github.com/cornellius-gp/gpytorch/pull/1083).
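The bookkeeping change described in the note can be written out with plain tuples standing in for torch.Size. The two helper functions are hypothetical and simply restate the attribute values given in the note.

```python
def multi_output_accounting(train_X_batch_shape, m):
    """Bookkeeping attributes of the batched multi-output model."""
    return {
        "num_outputs": m,
        "_input_batch_shape": train_X_batch_shape,
        "_aug_batch_shape": train_X_batch_shape + (m,),
    }


def single_output_accounting(train_X_batch_shape, m):
    """The same GP, re-labelled as a batched single-output model."""
    return {
        "num_outputs": 1,
        "_input_batch_shape": train_X_batch_shape + (m,),
        "_aug_batch_shape": train_X_batch_shape + (m,),
    }


# Unbatched training inputs with m = 2 outputs.
before = multi_output_accounting((), m=2)
after = single_output_accounting((), m=2)
```

Only num_outputs and _input_batch_shape differ between the two views. The _aug_batch_shape, and hence the underlying GP, is unchanged.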

Parameters:

batch_mo_model (BatchedMultiOutputGPyTorchModel) – The batched multi-output model to be converted.

Returns:

The model converted into a batch single-output model.

Return type:

BatchedMultiOutputGPyTorchModel

Example

>>> train_X = torch.rand(5, 2)
>>> train_Y = torch.rand(5, 2)
>>> batch_mo_gp = SingleTaskGP(train_X, train_Y, outcome_transform=None)
>>> batch_so_gp = batched_multi_output_to_single_output(batch_mo_gp)

Inducing Point Allocators

Functionality for allocating the inducing points of sparse Gaussian process models.

References

[chen2018dpp]

Laming Chen and Guoxin Zhang and Hanning Zhou, Fast greedy MAP inference for determinantal point process to improve recommendation diversity, Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018, https://arxiv.org/abs/1709.05135.

class botorch.models.utils.inducing_point_allocators.InducingPointAllocator[source]

Bases: ABC

This class provides functionality to initialize the inducing point locations of an inducing point-based model, e.g. a SingleTaskVariationalGP.

allocate_inducing_points(inputs, covar_module, num_inducing, input_batch_shape)[source]

Initialize the num_inducing inducing point locations according to a specific initialization strategy.

Parameters:
  • inputs (Tensor) – A (*batch_shape, n, d)-dim input data tensor.

  • covar_module (Module) – GPyTorch Module returning a LinearOperator kernel matrix.

  • num_inducing (int) – The maximum number (m) of inducing points (m <= n).

  • input_batch_shape (Size) – The non-task-related batch shape.

Returns:

A (*batch_shape, m, d)-dim tensor of inducing point locations.

Return type:

Tensor

class botorch.models.utils.inducing_point_allocators.QualityFunction[source]

Bases: ABC

A function that scores inputs with respect to a specific criterion.

class botorch.models.utils.inducing_point_allocators.UnitQualityFunction[source]

Bases: QualityFunction

A function returning ones for each element. Using this quality function for inducing point allocation corresponds to allocating inducing points with the sole aim of minimizing predictive variance, i.e. the approach of [burt2020svgp].

class botorch.models.utils.inducing_point_allocators.ExpectedImprovementQualityFunction(model, maximize)[source]

Bases: QualityFunction

A function measuring the quality of input points as their expected improvement with respect to a conservative baseline. Expectations are according to the model from the previous BO step. See [moss2023ipa] for details and justification.

Parameters:
  • model (Model) – The model fitted during the previous BO step. For now, this must be a single task model (i.e. num_outputs=1).

  • maximize (bool) – Set True if we are performing function maximization, else set False.

class botorch.models.utils.inducing_point_allocators.GreedyVarianceReduction[source]

Bases: InducingPointAllocator

The inducing point allocator proposed by [burt2020svgp], that greedily chooses inducing point locations with maximal (conditional) predictive variance.

class botorch.models.utils.inducing_point_allocators.GreedyImprovementReduction(model, maximize)[source]

Bases: InducingPointAllocator

An inducing point allocator that greedily chooses inducing points with large predictive variance and that are in promising regions of the search space (according to the model from the previous BO step), see [moss2023ipa].

Parameters:
  • model (Model) – The model fitted during the previous BO step.

  • maximize (bool) – Set True if we are performing function maximization, else set False.

botorch.models.utils.inducing_point_allocators._pivoted_cholesky_init(train_inputs, kernel_matrix, max_length, quality_scores, epsilon=1e-06)[source]

A pivoted Cholesky initialization method for the inducing points, originally proposed in [burt2020svgp] with the algorithm itself coming from [chen2018dpp]. The code is a PyTorch version of the implementation from [chen2018dpp], based on https://github.com/laming-chen/fast-map-dpp/blob/master/dpp.py, but with a small modification to allow the underlying DPP to be defined through its diversity-quality decomposition, as discussed by [moss2023ipa]. This method returns a greedy approximation of the MAP estimate of the specified DPP, i.e. it returns a set of points that are highly diverse (according to the provided kernel_matrix) and have high quality (according to the provided quality_scores).

Parameters:
  • train_inputs (Tensor) – training inputs (of shape n x d)

  • kernel_matrix (Tensor | LinearOperator) – kernel matrix on the training inputs

  • max_length (int) – number of inducing points to initialize

  • quality_scores (Tensor) – scores representing the quality of each candidate input (of shape [n])

  • epsilon (float) – numerical jitter for stability.

Returns:

max_length x d tensor of the training inputs corresponding to the top max_length pivots of the training kernel matrix

Return type:

Tensor
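The greedy selection described above can be sketched in plain Python. This is an illustrative reimplementation on nested lists, not BoTorch's code: the names rbf_kernel_matrix and greedy_pivots, the RBF kernel, and the sample points are assumptions, and the sketch returns pivot indices rather than the selected rows of train_inputs.

```python
import math


def rbf_kernel_matrix(points, lengthscale=1.0):
    """Dense RBF kernel matrix on a list of d-dimensional points."""

    def k(a, b):
        sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-0.5 * sq_dist / lengthscale**2)

    return [[k(a, b) for b in points] for a in points]


def greedy_pivots(kernel, quality, max_length, epsilon=1e-6):
    """Greedily pick indices with large quality^2 * conditional variance."""
    n = len(kernel)
    cis = []  # rows of the partial Cholesky factor built so far
    di2s = [kernel[i][i] for i in range(n)]  # conditional variances

    def score(i):
        return di2s[i] * quality[i] ** 2

    selected = [max(range(n), key=score)]
    while len(selected) < min(max_length, n):
        j = selected[-1]
        d_j = math.sqrt(di2s[j])
        # New Cholesky row: covariance with pivot j given the earlier pivots.
        eis = [(kernel[j][i] - sum(c[j] * c[i] for c in cis)) / d_j for i in range(n)]
        cis.append(eis)
        for i, e in enumerate(eis):
            di2s[i] -= e * e
        candidates = [i for i in range(n) if i not in selected]
        nxt = max(candidates, key=score)
        if di2s[nxt] < epsilon:
            break  # remaining points add (numerically) no new information
        selected.append(nxt)
    return selected


points = [[0.0], [0.05], [1.0], [2.0]]
K = rbf_kernel_matrix(points)
unit = greedy_pivots(K, quality=[1.0] * 4, max_length=3)
weighted = greedy_pivots(K, quality=[1.0, 1.0, 3.0, 1.0], max_length=3)
```

With unit quality scores the near-duplicate point at 0.05 is skipped in favor of well-separated points, while raising the quality score of one point moves it to the front of the selection.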

Other Utilities

Assorted helper methods and objects for working with BoTorch models.

botorch.models.utils.assorted.multioutput_to_batch_mode_transform(train_X, train_Y, num_outputs, train_Yvar=None)[source]

Transforms training inputs for a multi-output model.

Used for multi-output models that internally are represented by a batched single output model, where each output is modeled as an independent batch.

Parameters:
  • train_X (Tensor) – A n x d or input_batch_shape x n x d (batch mode) tensor of training features.

  • train_Y (Tensor) – A n x m or target_batch_shape x n x m (batch mode) tensor of training observations.

  • num_outputs (int) – number of outputs

  • train_Yvar (Tensor | None) – A n x m or target_batch_shape x n x m tensor of observed measurement noise.

Returns:

3-element tuple containing

  • An input_batch_shape x m x n x d tensor of training features.

  • A target_batch_shape x m x n tensor of training observations.

  • A target_batch_shape x m x n tensor of observed measurement noise.

Return type:

tuple[Tensor, Tensor, Tensor | None]
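For the unbatched case, the reshaping can be pictured with nested lists. The function to_batch_mode below is a hypothetical plain-Python analogue written for illustration only; it is not how BoTorch implements the transform on tensors.

```python
def to_batch_mode(train_X, train_Y, num_outputs):
    """Plain-list analogue for unbatched inputs: X is n x d, Y is n x m."""
    # Repeat the features once per output: m x n x d.
    X_out = [[list(row) for row in train_X] for _ in range(num_outputs)]
    # Move the output dimension to the front: m x n.
    Y_out = [[row[j] for row in train_Y] for j in range(num_outputs)]
    return X_out, Y_out


train_X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # n = 3, d = 2
train_Y = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # n = 3, m = 2
X_out, Y_out = to_batch_mode(train_X, train_Y, num_outputs=2)
```

Each output then has its own copy of the features and its own row of observations, which matches the idea of modeling each output as an independent batch. Observed noise, when given, would be rearranged in the same way as the observations.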

botorch.models.utils.assorted.add_output_dim(X, original_batch_shape)[source]

Insert the output dimension at the correct location.

The trailing batch dimensions of X must match the original batch dimensions of the training inputs, but can also include extra batch dimensions.

Parameters:
  • X (Tensor) – A (new_batch_shape) x (original_batch_shape) x n x d tensor of features.

  • original_batch_shape (Size) – the batch shape of the model’s training inputs.

Returns:

2-element tuple containing

  • A (new_batch_shape) x (original_batch_shape) x m x n x d tensor of features.

  • The index corresponding to the output dimension.

Return type:

tuple[Tensor, int]

botorch.models.utils.assorted.check_no_nans(Z)[source]

Check that tensor does not contain NaN values.

Raises an InputDataError if Z contains NaN values.

Parameters:

Z (Tensor) – The input tensor.

Return type:

None

botorch.models.utils.assorted.check_min_max_scaling(X, strict=False, atol=0.01, raise_on_fail=False, ignore_dims=None)[source]

Check that tensor is normalized to the unit cube.

Parameters:
  • X (Tensor) – A batch_shape x n x d input tensor. Typically the training inputs of a model.

  • strict (bool) – If True, require X to be scaled to the unit cube (rather than just to be contained within the unit cube).

  • atol (float) – The tolerance for the boundary check. Only used if strict=True.

  • raise_on_fail (bool) – If True, raise an exception instead of a warning.

  • ignore_dims (list[int] | None) – Subset of dimensions where the min-max scaling check is omitted.

Return type:

None
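One reading of the documented semantics, written as a plain-Python sketch on nested lists: the non-strict check requires every dimension to lie inside [0, 1], and the strict check additionally requires the observed minimum and maximum to be within atol of 0 and 1. The function check_unit_cube, its boolean return value, and the use of ValueError are assumptions made for illustration; the BoTorch function returns None and uses its own warning and error types.

```python
import warnings


def check_unit_cube(X, strict=False, atol=0.01, raise_on_fail=False, ignore_dims=None):
    """Return True if the rows of X (n x d nested lists) pass the check."""
    ignore = set(ignore_dims or [])
    msg = None
    for j in range(len(X[0])):
        if j in ignore:
            continue
        lo = min(row[j] for row in X)
        hi = max(row[j] for row in X)
        if lo < 0.0 or hi > 1.0:
            msg = f"dimension {j} is not contained in [0, 1]"
        elif strict and (abs(lo) > atol or abs(hi - 1.0) > atol):
            msg = f"dimension {j} is not scaled to [0, 1]"
        if msg is not None:
            break
    if msg is None:
        return True
    if raise_on_fail:
        raise ValueError(msg)  # stand-in for BoTorch's own error type
    warnings.warn(msg)
    return False


X = [[0.0, 0.5], [1.0, 0.7]]
contained = check_unit_cube(X)  # both dimensions lie inside [0, 1]
strict_ok = check_unit_cube(X, strict=True, ignore_dims=[1])  # dimension 1 is skipped
```

Dimension 1 only spans [0.5, 0.7], so it passes the containment check but would fail the strict check unless it is listed in ignore_dims.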

botorch.models.utils.assorted.check_standardization(Y, atol_mean=0.01, atol_std=0.01, raise_on_fail=False)[source]

Check that tensor is standardized (zero mean, unit variance).

Parameters:
  • Y (Tensor) – The input tensor of shape batch_shape x n x m. Typically the train targets of a model. Standardization is checked across the n-dimension.

  • atol_mean (float) – The tolerance for the mean check.

  • atol_std (float) – The tolerance for the std check.

  • raise_on_fail (bool) – If True, raise an exception instead of a warning.

Return type:

None
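The standardization check can be sketched with the standard library for an unbatched n x m input. The function check_standardized, its boolean return value, and the use of the sample standard deviation (n - 1 in the denominator) are assumptions made for this illustration rather than a description of BoTorch's implementation.

```python
import statistics
import warnings


def check_standardized(Y, atol_mean=0.01, atol_std=0.01, raise_on_fail=False):
    """Return True if every column of Y (n x m nested lists) has mean ~0 and std ~1."""
    for j in range(len(Y[0])):
        column = [row[j] for row in Y]
        mean = statistics.fmean(column)
        std = statistics.stdev(column)  # sample standard deviation
        if abs(mean) > atol_mean or abs(std - 1.0) > atol_std:
            msg = f"output {j} is not standardized (mean={mean:.3g}, std={std:.3g})"
            if raise_on_fail:
                raise ValueError(msg)  # stand-in for BoTorch's own error type
            warnings.warn(msg)
            return False
    return True


raw = [[10.0], [12.0], [14.0]]
column = [row[0] for row in raw]
mu, sigma = statistics.fmean(column), statistics.stdev(column)
standardized = [[(y - mu) / sigma] for y in column]
ok = check_standardized(standardized)
```

Statistics are computed per output column, i.e. across the n observations, which mirrors the note that standardization is checked across the n-dimension.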

botorch.models.utils.assorted.validate_input_scaling(train_X, train_Y, train_Yvar=None, raise_on_fail=False, ignore_X_dims=None)[source]

Helper function to validate input data to models.

Parameters:
  • train_X (Tensor) – A n x d or batch_shape x n x d (batch mode) tensor of training features.

  • train_Y (Tensor) – A n x m or batch_shape x n x m (batch mode) tensor of training observations.

  • train_Yvar (Tensor | None) – A n x m or batch_shape x n x m (batch mode) tensor of observed measurement noise.

  • raise_on_fail (bool) – If True, raise an error instead of emitting a warning (only for normalization/standardization checks, an error is always raised if NaN values are present).

  • ignore_X_dims (list[int] | None) – For this subset of dimensions from {1, …, d}, ignore the min-max scaling check.

Return type:

None

This function is typically called inside the constructor of standard BoTorch models. It validates the following:

  • None of the inputs contain NaN values.

  • The training data (train_X) is normalized to the unit cube for all dimensions except those in ignore_X_dims.

  • The training targets (train_Y) are standardized (zero mean, unit variance).

No checks (other than the NaN check) are performed for observed variances (train_Yvar) at this point.

botorch.models.utils.assorted.mod_batch_shape(module, names, b)[source]

Recursive helper to modify gpytorch modules’ batch shape attribute.

Modifies the module in-place.

Parameters:
  • module (Module) – The module to be modified.

  • names (list[str]) – The list of names to access the attribute. If the full name of the module is “module.sub_module.leaf_module”, this will be [“sub_module”, “leaf_module”].

  • b (int) – The new size of the last element of the module’s batch_shape attribute.

Return type:

None
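The role of the names list can be shown with a small recursive sketch. Here set_last_batch_dim is a hypothetical helper, SimpleNamespace objects stand in for GPyTorch modules, and tuples stand in for torch.Size; it illustrates the traversal only and is not BoTorch's implementation.

```python
from types import SimpleNamespace


def set_last_batch_dim(module, names, b):
    """Follow `names` from `module` and resize the last batch dim of the leaf."""
    if not names:
        return
    child = getattr(module, names[0])
    if len(names) == 1:
        # Leaf reached: replace the final entry of its batch_shape in place.
        child.batch_shape = child.batch_shape[:-1] + (b,)
    else:
        set_last_batch_dim(child, names[1:], b)


# Hypothetical module tree: module.sub_module.leaf_module
leaf = SimpleNamespace(batch_shape=(4, 2))
module = SimpleNamespace(sub_module=SimpleNamespace(leaf_module=leaf))
set_last_batch_dim(module, ["sub_module", "leaf_module"], 5)
```

The first name is consumed at each level of recursion, so the list describes the path below the root module rather than including the root itself.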

botorch.models.utils.assorted.gpt_posterior_settings()[source]

Context manager for settings used for computing model posteriors.

botorch.models.utils.assorted.detect_duplicates(X, rtol=0, atol=1e-08)[source]

Returns an iterator over index pairs (duplicate index, original index) for all duplicate entries of X. Only 2-d Tensors are supported.

Parameters:
  • X (Tensor) – the datapoints tensor with potential duplicated entries

  • rtol (float) – relative tolerance

  • atol (float) – absolute tolerance

Return type:

Iterator[tuple[int, int]]
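The (duplicate index, original index) pairing can be illustrated on nested lists. The generator find_duplicates below is a hypothetical sketch that uses only an absolute tolerance on the Euclidean distance and maps each duplicate to the first matching earlier row; BoTorch's handling of rtol and of ties may differ.

```python
import math


def find_duplicates(X, atol=1e-8):
    """Yield (duplicate index, original index) pairs for rows of X."""
    originals = []  # indices of rows that are not duplicates of an earlier row
    for i, row in enumerate(X):
        for j in originals:
            if math.dist(row, X[j]) <= atol:
                yield i, j
                break
        else:
            originals.append(i)


X = [[0.0, 1.0], [2.0, 3.0], [0.0, 1.0], [2.0, 3.0]]
pairs = list(find_duplicates(X))
```

Rows 2 and 3 repeat rows 0 and 1, so each of them is reported together with the index of the row it duplicates.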

botorch.models.utils.assorted.consolidate_duplicates(X, Y, rtol=0.0, atol=1e-08)[source]

Drop duplicated Xs and update the indices tensor Y accordingly. Only 2-d Tensors are supported, since block design is not guaranteed in batch mode.

Parameters:
  • X (Tensor) – the datapoints tensor

  • Y (Tensor) – the index tensor to be updated (e.g., pairwise comparisons)

  • rtol (float) – relative tolerance

  • atol (float) – absolute tolerance

Returns:

3-element tuple containing

  • consolidated_X: the consolidated X.

  • consolidated_Y: the consolidated Y (e.g., pairwise comparisons indices).

  • new_indices: the new index of each original item in X, a tensor of size X.shape[-2].

Return type:

tuple[Tensor, Tensor, Tensor]
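How dropping duplicates interacts with an index tensor can be seen in a plain-Python sketch. The function consolidate is a hypothetical stand-in operating on nested lists with an absolute tolerance only; it is meant to show the index remapping, not to reproduce BoTorch's implementation.

```python
import math


def consolidate(X, Y, atol=1e-8):
    """Drop duplicate rows of X and remap the index pairs in Y."""
    kept = []  # original indices of the rows that are kept
    new_indices = []  # new index of every original row
    for i, row in enumerate(X):
        match = next(
            (k for k, j in enumerate(kept) if math.dist(row, X[j]) <= atol), None
        )
        if match is None:
            kept.append(i)
            match = len(kept) - 1
        new_indices.append(match)
    consolidated_X = [X[j] for j in kept]
    consolidated_Y = [[new_indices[idx] for idx in pair] for pair in Y]
    return consolidated_X, consolidated_Y, new_indices


X = [[0.0, 1.0], [2.0, 3.0], [0.0, 1.0]]
Y = [[0, 1], [2, 1]]  # e.g. pairwise comparisons between rows of X
consolidated_X, consolidated_Y, new_indices = consolidate(X, Y)
```

Row 2 duplicates row 0, so the comparison [2, 1] is rewritten to refer to row 0 of the consolidated data.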

class botorch.models.utils.assorted.fantasize(state=True)[source]

Bases: _Flag

A flag denoting whether we are currently in a fantasize context.

Parameters:

state (bool)
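A flag of this kind is typically a class-level boolean that a context manager switches on and restores on exit. The sketch below shows that general pattern with hypothetical classes FlagSketch and fantasize_sketch; the details of BoTorch's _Flag base class may differ.

```python
class FlagSketch:
    """Class-level boolean flag usable as a context manager (illustrative)."""

    _state = False

    def __init__(self, state=True):
        self.state = state
        self.prev = None

    @classmethod
    def on(cls):
        return cls._state

    def __enter__(self):
        # Remember the previous state so nested contexts restore correctly.
        self.prev = type(self)._state
        type(self)._state = self.state

    def __exit__(self, *exc_info):
        type(self)._state = self.prev
        return False


class fantasize_sketch(FlagSketch):
    _state = False


before = fantasize_sketch.on()
with fantasize_sketch():
    inside = fantasize_sketch.on()
after = fantasize_sketch.on()
```

Code that needs to behave differently while fantasizing can then query the flag instead of threading an extra argument through every call.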