botorch.models

Abstract Model API

Model

class botorch.models.model.Model[source]

Abstract base class for BoTorch models.

condition_on_observations(X, Y, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x m x d-dim Tensor, where d is the dimension of the feature space, m is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
  • Y (Tensor) – A batch_shape’ x m x (o)-dim Tensor, where o is the number of model outputs, m is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
Return type:

Model

Returns:

A Model object of the same type, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).
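
The batch-shape rule for Y can be illustrated with plain PyTorch broadcasting. The following is only a sketch of the shape logic, not BoTorch's implementation; the helper name `expand_Y_to_X_batch` is hypothetical.

```python
import torch


def expand_Y_to_X_batch(X, Y):
    """Broadcast the batch dimensions of Y to those of X (hypothetical helper).

    X has shape batch_shape x m x d, Y has shape batch_shape' x m x o.
    """
    x_batch, y_batch = X.shape[:-2], Y.shape[:-2]
    # torch.broadcast_shapes raises an error if the two shapes are incompatible.
    target = torch.broadcast_shapes(x_batch, y_batch)
    if target != x_batch:
        raise ValueError("Y's batch shape must broadcast to X's batch shape")
    # Missing batch dimensions of Y are shared across all batches of X.
    return Y.expand(*x_batch, *Y.shape[-2:])


X = torch.rand(3, 5, 2)  # batch_shape = (3,), m = 5, d = 2
Y = torch.rand(5, 1)     # no batch dimensions, m = 5, o = 1
Y_expanded = expand_Y_to_X_batch(X, Y)
print(Y_expanded.shape)  # torch.Size([3, 5, 1])
```

Here every batch of X sees the same five observations, which is the "missing batch dimensions are the same for all Y" case described above.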

fantasize(X, sampler, observation_noise=True, **kwargs)[source]

Construct a fantasy model.

Constructs a fantasy model in the following fashion: (1) compute the model posterior at X (including observation noise if observation_noise=True). (2) sample from this posterior (using sampler) to generate “fake” observations. (3) condition the model on the new fake observations.

Parameters:
  • X (Tensor) – A batch_shape x m x d-dim Tensor, where d is the dimension of the feature space, m is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
  • sampler (MCSampler) – The sampler used for sampling from the posterior at X.
  • observation_noise (bool) – If True, include observation noise.
Return type:

Model

Returns:

The constructed fantasy model.
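
The three steps above can be sketched with a minimal exact GP written in plain PyTorch. This is an illustration under stated assumptions (a fixed RBF kernel, a fixed noise level, a single output), not the BoTorch implementation; `rbf_kernel`, `gp_posterior` and the free-standing `fantasize` function are hypothetical names.

```python
import torch

torch.manual_seed(0)


def rbf_kernel(A, B, lengthscale=0.5):
    # Squared-exponential kernel between the rows of A and B.
    return torch.exp(-0.5 * (torch.cdist(A, B) / lengthscale) ** 2)


def gp_posterior(train_X, train_Y, X, noise, observation_noise):
    # Standard exact-GP posterior mean and covariance at the points X.
    n, m = train_X.shape[0], X.shape[0]
    K = rbf_kernel(train_X, train_X) + noise * torch.eye(n, dtype=train_X.dtype)
    K_star = rbf_kernel(train_X, X)
    L = torch.linalg.cholesky(K)
    mean = K_star.T @ torch.cholesky_solve(train_Y.unsqueeze(-1), L).squeeze(-1)
    cov = rbf_kernel(X, X) - K_star.T @ torch.cholesky_solve(K_star, L)
    cov = 0.5 * (cov + cov.T)  # remove floating-point asymmetry
    if observation_noise:
        cov = cov + noise * torch.eye(m, dtype=X.dtype)
    return mean, cov


def fantasize(train_X, train_Y, X, num_fantasies, noise=1e-2):
    # (1) Posterior at X, including observation noise.
    mean, cov = gp_posterior(train_X, train_Y, X, noise, observation_noise=True)
    # (2) Sample "fake" observations from that posterior.
    dist = torch.distributions.MultivariateNormal(mean, covariance_matrix=cov)
    fake_Y = dist.sample(torch.Size([num_fantasies]))  # num_fantasies x m
    # (3) Condition: each fantasy gets the original data plus its own fake data.
    fant_X = torch.cat([train_X, X]).expand(num_fantasies, -1, -1)
    fant_Y = torch.cat([train_Y.expand(num_fantasies, -1), fake_Y], dim=-1)
    return fant_X, fant_Y


train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = torch.sin(train_X[:, 0]) + torch.cos(train_X[:, 1])
X = torch.rand(5, 2, dtype=torch.double)
fant_X, fant_Y = fantasize(train_X, train_Y, X, num_fantasies=4)
print(fant_X.shape, fant_Y.shape)
```

The result is a batch of four training sets with n + m = 25 points each, which mirrors how a fantasy model carries one batch element per posterior sample.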

posterior(X, output_indices=None, observation_noise=False, **kwargs)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A b x q x d-dim Tensor, where d is the dimension of the feature space, q is the number of points considered jointly, and b is the batch dimension.
  • output_indices (Optional[List[int]]) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.
  • observation_noise (bool) – If True, add observation noise to the posterior.
Return type:

Posterior

Returns:

A Posterior object, representing a batch of b joint distributions over q points and o outputs each.
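
To make the shape convention concrete, the sketch below builds a stand-in for a single-output posterior using `torch.distributions` directly. The sizes b = 3 and q = 4 and the random covariance are assumptions for illustration; this is not a BoTorch Posterior object.

```python
import torch

b, q = 3, 4  # assumed batch size and number of jointly considered points

mean = torch.zeros(b, q, dtype=torch.double)
# One q x q positive-definite covariance matrix per batch element.
A = torch.rand(b, q, q, dtype=torch.double)
cov = A @ A.transpose(-1, -2) + 1e-3 * torch.eye(q, dtype=torch.double)

# A batch of b joint distributions, each over q points.
joint = torch.distributions.MultivariateNormal(mean, covariance_matrix=cov)
samples = joint.rsample(torch.Size([16]))  # 16 joint draws per batch element

print(joint.batch_shape, joint.event_shape, samples.shape)
```

Each of the 16 draws is a sample of all q points at once, so correlations between the points within a batch element are preserved.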

GPyTorchModel

class botorch.models.gpytorch.GPyTorchModel[source]

Abstract base class for models based on GPyTorch models.

The easiest way to use this is to subclass a model from a GPyTorch model class (e.g. an ExactGP) and this GPyTorchModel. See e.g. SingleTaskGP.

condition_on_observations(X, Y, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x n x d-dim Tensor, where d is the dimension of the feature space, n is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
  • Y (Tensor) – A batch_shape’ x n x (o)-dim Tensor, where o is the number of model outputs, n is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
Return type:

Model

Returns:

A Model object of the same type, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).

Example

>>> train_X = torch.rand(20, 2)
>>> train_Y = torch.sin(train_X[:, 0]) + torch.cos(train_X[:, 1])
>>> model = SingleTaskGP(train_X, train_Y)
>>> new_X = torch.rand(5, 2)
>>> new_Y = torch.sin(new_X[:, 0]) + torch.cos(new_X[:, 1])
>>> model = model.condition_on_observations(X=new_X, Y=new_Y)

posterior(X, observation_noise=False, **kwargs)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.
  • observation_noise (bool) – If True, add observation noise to the posterior.
  • propagate_grads – If True, do not detach GPyTorch’s test caches when computing the posterior. Required for being able to compute derivatives with respect to training inputs at test time (used e.g. by qNoisyExpectedImprovement). Defaults to False.
Return type:

GPyTorchPosterior

Returns:

A GPyTorchPosterior object, representing a batch of b joint distributions over q points. Includes observation noise if observation_noise=True.

BatchedMultiOutputGPyTorchModel

class botorch.models.gpytorch.BatchedMultiOutputGPyTorchModel[source]

Base class for batched multi-output GPyTorch models with independent outputs.

This model should be used when the same training data is used for all outputs. Outputs are modeled independently by using a different batch for each output.
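
One way to picture "a different batch for each output" is to move the output dimension of the training targets into a leading batch dimension and to repeat the shared inputs once per output. The snippet below is a shape sketch only, with assumed sizes; the layout BoTorch uses internally may differ.

```python
import torch

n, d, o = 20, 2, 3  # assumed number of points, features, and outputs
train_X = torch.rand(n, d)
train_Y = torch.rand(n, o)

# Repeat the shared inputs once per output: o x n x d.
batched_X = train_X.unsqueeze(0).expand(o, n, d)
# Move the output dimension to the front: o x n.
batched_Y = train_Y.transpose(-1, -2)

print(batched_X.shape, batched_Y.shape)
```

Each of the o batch elements can then be treated as an independent single-output GP over the same inputs.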

condition_on_observations(X, Y, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x m x d-dim Tensor, where d is the dimension of the feature space, m is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
  • Y (Tensor) – A batch_shape’ x m x (o)-dim Tensor, where o is the number of model outputs, m is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
Return type:

BatchedMultiOutputGPyTorchModel

Returns:

A BatchedMultiOutputGPyTorchModel object of the same type with n + m training examples, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).

Example

>>> train_X = torch.rand(20, 2)
>>> # Stack the two outputs along the last dimension to get a 20 x 2 tensor.
>>> train_Y = torch.stack(
...     [torch.sin(train_X[:, 0]), torch.cos(train_X[:, 1])], -1
... )
>>> model = SingleTaskGP(train_X, train_Y)
>>> new_X = torch.rand(5, 2)
>>> new_Y = torch.stack([torch.sin(new_X[:, 0]), torch.cos(new_X[:, 1])], -1)
>>> model = model.condition_on_observations(X=new_X, Y=new_Y)

posterior(X, output_indices=None, observation_noise=False, **kwargs)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A (batch_shape) x q x d-dim Tensor, where d is the dimension of the feature space and q is the number of points considered jointly.
  • output_indices (Optional[List[int]]) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.
  • observation_noise (bool) – If True, add observation noise to the posterior.
  • propagate_grads – If True, do not detach GPyTorch’s test caches when computing the posterior. Required for being able to compute derivatives with respect to training inputs at test time (used e.g. by qNoisyExpectedImprovement). Defaults to False.
Return type:

GPyTorchPosterior

Returns:

A GPyTorchPosterior object, representing batch_shape joint distributions over q points and the outputs selected by output_indices each. Includes observation noise if observation_noise=True.

ModelListGPyTorchModel

class botorch.models.gpytorch.ModelListGPyTorchModel[source]

Abstract base class for models based on multi-output GPyTorch models.

This is meant to be used with a gpytorch ModelList wrapper for independent evaluation of submodels.

condition_on_observations(X, Y, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x n x d-dim Tensor, where d is the dimension of the feature space, n is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
  • Y (Tensor) – A batch_shape’ x n x (o)-dim Tensor, where o is the number of model outputs, n is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
Return type:

ModelListGPyTorchModel

Returns:

A Model object of the same type, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).

Example

>>> train_X = torch.rand(20, 2)
>>> train_Y = torch.sin(train_X[:, 0]) + torch.cos(train_X[:, 1])
>>> model = SingleTaskGP(train_X, train_Y)
>>> new_X = torch.rand(5, 2)
>>> new_Y = torch.sin(new_X[:, 0]) + torch.cos(new_X[:, 1])
>>> model = model.condition_on_observations(X=new_X, Y=new_Y)

num_outputs

The number of outputs of the model.

Return type: int

posterior(X, output_indices=None, observation_noise=False, **kwargs)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A b x q x d-dim Tensor, where d is the dimension of the feature space, q is the number of points considered jointly, and b is the batch dimension.
  • output_indices (Optional[List[int]]) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.
  • observation_noise (bool) – If True, add observation noise to the posterior.
  • propagate_grads – If True, do not detach GPyTorch’s test caches when computing the posterior. Required for being able to compute derivatives with respect to training inputs at test time (used e.g. by qNoisyExpectedImprovement). Defaults to False.
Return type:

GPyTorchPosterior

Returns:

A GPyTorchPosterior object, representing batch_shape joint distributions over q points and the outputs selected by output_indices each. Includes measurement noise if observation_noise=True.

MultiTaskGPyTorchModel

class botorch.models.gpytorch.MultiTaskGPyTorchModel[source]

Abstract base class for multi-task models based on GPyTorch models.

This class provides the posterior method to models that implement a “long-format” multi-task GP in the style of MultiTaskGP.

posterior(X, output_indices=None, observation_noise=False, **kwargs)[source]

Computes the posterior over model outputs at the provided points.

Parameters:
  • X (Tensor) – A q x d or batch_shape x q x d (batch mode) tensor, where d is the dimension of the feature space (not including task indices) and q is the number of points considered jointly.
  • output_indices (Optional[List[int]]) – A list of indices, corresponding to the outputs over which to compute the posterior (if the model is multi-output). Can be used to speed up computation if only a subset of the model’s outputs are required for optimization. If omitted, computes the posterior over all model outputs.
  • observation_noise (bool) – If True, add observation noise to the posterior.
  • propagate_grads – If True, do not detach GPyTorch’s test caches when computing the posterior. Required for being able to compute derivatives with respect to training inputs at test time (used e.g. by qNoisyExpectedImprovement). Defaults to False.
Return type:

GPyTorchPosterior

Returns:

A GPyTorchPosterior object, representing batch_shape joint distributions over q points and the outputs selected by output_indices. Includes measurement noise if observation_noise=True.

GPyTorch Regression Models

SingleTaskGP

class botorch.models.gp_regression.SingleTaskGP(train_X, train_Y, likelihood=None)[source]

A single-task exact GP model.

A single-task exact GP using relatively strong priors on the Kernel hyperparameters, which work best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance).
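
The preprocessing recommended above can be sketched in a few lines of PyTorch. The helper names `normalize_to_unit_cube` and `standardize_outcomes`, the bounds, and the synthetic data are assumptions made for this illustration.

```python
import torch


def normalize_to_unit_cube(X, bounds):
    """Map X (n x d) into [0, 1]^d given bounds (2 x d: lower row, upper row)."""
    lower, upper = bounds[0], bounds[1]
    return (X - lower) / (upper - lower)


def standardize_outcomes(Y):
    """Give each column of Y (n x o) zero mean and unit variance."""
    return (Y - Y.mean(dim=0, keepdim=True)) / Y.std(dim=0, keepdim=True)


# Synthetic raw data on an arbitrary (assumed) search space and scale.
bounds = torch.tensor([[-5.0, 0.0], [10.0, 15.0]])
raw_X = bounds[0] + (bounds[1] - bounds[0]) * torch.rand(20, 2)
raw_Y = 3.0 + 10.0 * torch.randn(20, 1)

train_X = normalize_to_unit_cube(raw_X, bounds)
train_Y = standardize_outcomes(raw_Y)
print(train_X.min().item(), train_X.max().item(), train_Y.mean().item())
```

Predictions made in the transformed space need to be mapped back with the stored mean, standard deviation, and bounds before they are reported in the original units.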

This model works in batch mode (each batch having its own hyperparameters). When the training observations include multiple outputs, this model will use batching to model outputs independently.

Use this model when you have independent output(s) and all outputs use the same training data. If outputs are independent but have different training data, use ModelListGP. To model correlations between outputs, use MultiTaskGP.

Parameters:
  • train_X (Tensor) – A n x d or batch_shape x n x d (batch mode) tensor of training features.
  • train_Y (Tensor) – A n x (o) or batch_shape x n x (o) (batch mode) tensor of training observations.
  • likelihood (Optional[Likelihood]) – A likelihood. If omitted, use a standard GaussianLikelihood with inferred noise level.

Example

>>> train_X = torch.rand(20, 2)
>>> train_Y = torch.sin(train_X[:, 0]) + torch.cos(train_X[:, 1])
>>> model = SingleTaskGP(train_X, train_Y)

FixedNoiseGP

class botorch.models.gp_regression.FixedNoiseGP(train_X, train_Y, train_Yvar)[source]

A single-task exact GP model using fixed noise levels.

A single-task exact GP that uses fixed observation noise levels. This model also uses relatively strong priors on the Kernel hyperparameters, which work best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance).

This model works in batch mode (each batch having its own hyperparameters).

Parameters:
  • train_X (Tensor) – A n x d or batch_shape x n x d (batch mode) tensor of training features.
  • train_Y (Tensor) – A n x (o) or batch_shape x n x (o) (batch mode) tensor of training observations.
  • train_Yvar (Tensor) – A n x (o) or batch_shape x n x (o) (batch mode) tensor of observed measurement noise.

Example

>>> train_X = torch.rand(20, 2)
>>> train_Y = torch.sin(train_X[:, 0]) + torch.cos(train_X[:, 1])
>>> train_Yvar = torch.full_like(train_Y, 0.2)
>>> model = FixedNoiseGP(train_X, train_Y, train_Yvar)

fantasize(X, sampler, observation_noise=True, **kwargs)[source]

Construct a fantasy model.

Constructs a fantasy model in the following fashion: (1) compute the model posterior at X (if observation_noise=True, this includes observation noise, which is taken as the mean across the observation noise in the training data). (2) sample from this posterior (using sampler) to generate “fake” observations. (3) condition the model on the new fake observations.

Parameters:
  • X (Tensor) – A batch_shape x m x d-dim Tensor, where d is the dimension of the feature space, m is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
  • sampler (MCSampler) – The sampler used for sampling from the posterior at X.
  • observation_noise (bool) – If True, include the mean across the observation noise in the training data as observation noise in the posterior from which the samples are drawn.
Return type:

FixedNoiseGP

Returns:

The constructed fantasy model.
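
The noise handling described for this method can be pictured as follows: the per-point training noise levels are averaged, and that average is added to the diagonal of the posterior covariance before sampling. The values below, including the placeholder posterior covariance, are made up for illustration, and the snippet is a sketch rather than the library code.

```python
import torch

# Assumed per-point noise levels observed in the training data.
train_Yvar = torch.tensor([0.1, 0.2, 0.3, 0.2], dtype=torch.double)

m = 3  # number of new points X
# Placeholder for the noiseless posterior covariance at X.
posterior_cov = 0.5 * torch.eye(m, dtype=torch.double)

# Noise at unseen points is unknown, so use the mean training noise.
mean_noise = train_Yvar.mean()
noisy_cov = posterior_cov + mean_noise * torch.eye(m, dtype=torch.double)

print(mean_noise.item(), noisy_cov.diagonal())
```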

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Return type: MultivariateNormal

HeteroskedasticSingleTaskGP

class botorch.models.gp_regression.HeteroskedasticSingleTaskGP(train_X, train_Y, train_Yvar)[source]

A single-task exact GP model using a heteroskedastic noise model.

This model internally wraps another GP (a SingleTaskGP) to model the observation noise. This allows the likelihood to make out-of-sample predictions for the observation noise levels.

Parameters:
  • train_X (Tensor) – A n x d or batch_shape x n x d (batch mode) tensor of training features.
  • train_Y (Tensor) – A n x (o) or batch_shape x n x (o) (batch mode) tensor of training observations.
  • train_Yvar (Tensor) – A n x (o) or batch_shape x n x (o) (batch mode) tensor of observed measurement noise.

Example

>>> train_X = torch.rand(20, 2)
>>> train_Y = torch.sin(train_X[:, 0]) + torch.cos(train_X[:, 1])
>>> se = torch.norm(train_X - 0.5, dim=-1)
>>> train_Yvar = 0.1 + se * torch.rand_like(train_Y)
>>> model = HeteroskedasticSingleTaskGP(train_X, train_Y, train_Yvar)

condition_on_observations(X, Y, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x m x d-dim Tensor, where d is the dimension of the feature space, m is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
  • Y (Tensor) – A batch_shape’ x m x (o)-dim Tensor, where o is the number of model outputs, m is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
Return type:

HeteroskedasticSingleTaskGP

Returns:

A BatchedMultiOutputGPyTorchModel object of the same type with n + m training examples, representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs).

Example

>>> train_X = torch.rand(20, 2)
>>> # Stack the two outputs along the last dimension to get a 20 x 2 tensor.
>>> train_Y = torch.stack(
...     [torch.sin(train_X[:, 0]), torch.cos(train_X[:, 1])], -1
... )
>>> model = SingleTaskGP(train_X, train_Y)
>>> new_X = torch.rand(5, 2)
>>> new_Y = torch.stack([torch.sin(new_X[:, 0]), torch.cos(new_X[:, 1])], -1)
>>> model = model.condition_on_observations(X=new_X, Y=new_Y)

ModelListGP

class botorch.models.model_list_gp_regression.ModelListGP(*gp_models)[source]

A multi-output GP model with independent GPs for the outputs.

This model supports different-shaped training inputs for each of its sub-models. It can be used with any BoTorch models.

Internally, this model is just a list of individual models, but it implements the same input/output interface as all other BoTorch models. This makes it very flexible and convenient to work with. The sequential evaluation comes at a performance cost, though: if you are using a block design (i.e. the same number of training examples for each output and a similar model structure), you should consider using a batched GP model instead.
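
The "list of models behind one interface" idea can be sketched with toy stand-ins. `ToyMeanModel` and `ToyModelList` are hypothetical classes written for this illustration; they are not BoTorch classes and only predict a constant mean.

```python
import torch


class ToyMeanModel:
    """Stand-in for a single-output model: predicts its mean training target."""

    def __init__(self, train_X, train_Y):
        self.train_X, self.train_Y = train_X, train_Y

    def predict(self, X):
        # Constant prediction with one value per input point: shape (..., q).
        return self.train_Y.mean().expand(X.shape[:-1])


class ToyModelList:
    """Wraps several single-output models behind one multi-output interface."""

    def __init__(self, *models):
        self.models = list(models)

    @property
    def num_outputs(self):
        return len(self.models)

    def predict(self, X):
        # Sub-models are evaluated one after another, then stacked to (..., q, o).
        return torch.stack([m.predict(X) for m in self.models], dim=-1)


# The sub-models may have different numbers of training points.
model1 = ToyMeanModel(torch.rand(10, 2), torch.rand(10))
model2 = ToyMeanModel(torch.rand(25, 2), torch.rand(25))
model = ToyModelList(model1, model2)
pred = model.predict(torch.rand(4, 2))
print(model.num_outputs, pred.shape)  # 2 torch.Size([4, 2])
```

The Python-level loop over sub-models is where the performance cost mentioned above comes from; a batched model evaluates all outputs in a single tensor operation instead.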

Parameters: *gp_models – A variable number of single-output BoTorch models.

Example

>>> model1 = SingleTaskGP(train_X1, train_Y1)
>>> model2 = SingleTaskGP(train_X2, train_Y2)
>>> model = ModelListGP(model1, model2)

condition_on_observations(X, Y, **kwargs)[source]

Condition the model on new observations.

Parameters:
  • X (Tensor) – A batch_shape x m x d-dim Tensor, where d is the dimension of the feature space, m is the number of points per batch, and batch_shape is the batch shape (must be compatible with the batch shape of the model).
  • Y (Tensor) – A batch_shape’ x m x (o)-dim Tensor, where o is the number of model outputs, m is the number of points per batch, and batch_shape’ is the batch shape of the observations. batch_shape’ must be broadcastable to batch_shape using standard broadcasting semantics. If Y has fewer batch dimensions than X, it is assumed that the missing batch dimensions are the same for all Y.
Return type:

ModelListGP

Returns:

A ModelListGPyTorchModel representing the original model conditioned on the new observations (X, Y) (and possibly noise observations passed in via kwargs). Here the i-th model has n_i + m training examples, where the m training examples have been added and all test-time caches have been updated.

MultiTaskGP

class botorch.models.multitask.MultiTaskGP(train_X, train_Y, task_feature, output_tasks=None, rank=None)[source]

Multi-Task GP model using an ICM kernel, inferring observation noise.

Multi-task exact GP that uses a simple ICM kernel. Can be single-output or multi-output. This model uses relatively strong priors on the base Kernel hyperparameters, which work best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance).

This model infers the noise level. WARNING: It currently does not support different noise levels for the different tasks. If you have known observation noise, please use FixedNoiseMultiTaskGP instead.

Parameters:
  • train_X (Tensor) – A n x (d + 1) or b x n x (d + 1) (batch mode) tensor of training data. One of the columns should contain the task features (see task_feature argument).
  • train_Y (Tensor) – A n or b x n (batch mode) tensor of training observations.
  • task_feature (int) – The index of the task feature (-d <= task_feature <= d).
  • output_tasks (Optional[List[int]]) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.
  • rank (Optional[int]) – The rank to be used for the index kernel. If omitted, use a full rank (i.e. number of tasks) kernel.
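
For intuition, an ICM (intrinsic coregionalization model) kernel multiplies a kernel over the features by a task covariance matrix looked up via the task indices. The sketch below uses a common parameterization, B = W Wᵀ + diag(v) with W of shape num_tasks x rank, together with an RBF kernel; these choices and the names `rbf_kernel` and `icm_kernel` are assumptions for illustration, not the model's actual kernel code.

```python
import torch

torch.manual_seed(0)


def rbf_kernel(A, B, lengthscale=0.5):
    # Squared-exponential kernel between the rows of A and B.
    return torch.exp(-0.5 * (torch.cdist(A, B) / lengthscale) ** 2)


def icm_kernel(X1, X2, W, v, task_feature=-1):
    """ICM covariance for long-format inputs with a task-index column."""
    num_cols = X1.shape[-1]
    task_col = task_feature % num_cols  # supports negative indices
    feature_cols = [i for i in range(num_cols) if i != task_col]
    t1 = X1[:, task_col].long()
    t2 = X2[:, task_col].long()
    # Task covariance: low-rank part plus a per-task diagonal term.
    B = W @ W.T + torch.diag(v)
    data_cov = rbf_kernel(X1[:, feature_cols], X2[:, feature_cols])
    # Elementwise product of feature covariance and looked-up task covariance.
    return data_cov * B[t1][:, t2]


num_tasks, rank = 2, 1
W = torch.rand(num_tasks, rank, dtype=torch.double)
v = torch.rand(num_tasks, dtype=torch.double) + 0.1

# Long-format inputs: features plus a trailing task-index column.
X1 = torch.rand(10, 2, dtype=torch.double)
X2 = torch.rand(20, 2, dtype=torch.double)
i1 = torch.zeros(10, 1, dtype=torch.double)
i2 = torch.ones(20, 1, dtype=torch.double)
train_X = torch.cat([torch.cat([X1, i1], -1), torch.cat([X2, i2], -1)], dim=0)

K = icm_kernel(train_X, train_X, W, v)
print(K.shape)  # torch.Size([30, 30])
```

A smaller rank restricts how freely the tasks can co-vary; with rank equal to the number of tasks, W Wᵀ can represent any positive semi-definite task covariance.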

Example

>>> X1, X2 = torch.rand(10, 2), torch.rand(20, 2)
>>> i1, i2 = torch.zeros(10, 1), torch.ones(20, 1)
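>>> # Concatenate along dim 0: the two tasks have different numbers of points.
>>> train_X = torch.cat([
...     torch.cat([X1, i1], -1), torch.cat([X2, i2], -1),
... ], dim=0)
>>> # f1 and f2 stand for the (user-supplied) functions being modeled.
>>> train_Y = torch.cat([f1(X1), f2(X2)])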
>>> model = MultiTaskGP(train_X, train_Y, task_feature=-1)

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Return type: MultivariateNormal

FixedNoiseMultiTaskGP

class botorch.models.multitask.FixedNoiseMultiTaskGP(train_X, train_Y, train_Yvar, task_feature, output_tasks=None, rank=None)[source]

Multi-Task GP model using an ICM kernel, with known observation noise.

Multi-task exact GP that uses a simple ICM kernel. Can be single-output or multi-output. This model uses relatively strong priors on the base Kernel hyperparameters, which work best when covariates are normalized to the unit cube and outcomes are standardized (zero mean, unit variance).

This model requires observation noise data (specified in train_Yvar).

Parameters:
  • train_X (Tensor) – A n x (d + 1) or b x n x (d + 1) (batch mode) tensor of training data. One of the columns should contain the task features (see task_feature argument).
  • train_Y (Tensor) – A n or b x n (batch mode) tensor of training observations.
  • train_Yvar (Tensor) – A n or b x n (batch mode) tensor of observation noise standard errors.
  • task_feature (int) – The index of the task feature (-d <= task_feature <= d).
  • output_tasks (Optional[List[int]]) – A list of task indices for which to compute model outputs. If omitted, return outputs for all task indices.
  • rank (Optional[int]) – The rank to be used for the index kernel. If omitted, use a full rank (i.e. number of tasks) kernel.

Example

>>> X1, X2 = torch.rand(10, 2), torch.rand(20, 2)
>>> i1, i2 = torch.zeros(10, 1), torch.ones(20, 1)
>>> train_X = torch.cat([
...     torch.cat([X1, i1], -1), torch.cat([X2, i2], -1),
... ], dim=0)
>>> # f1 and f2 stand for the (user-supplied) functions being modeled.
>>> train_Y = torch.cat([f1(X1), f2(X2)])
>>> train_Yvar = 0.1 + 0.1 * torch.rand_like(train_Y)
>>> model = FixedNoiseMultiTaskGP(train_X, train_Y, train_Yvar, -1)