BoTorch provides a convenient botorch.fit.fit_gpytorch_model
function with sensible defaults that work on most basic models, including those that botorch ships with. Internally, this function uses L-BFGS-B to fit the parameters. However, in more advanced use cases you may need or want to implement your own model fitting logic.
This tutorial allows you to customize model fitting to your needs using the familiar PyTorch-style model fitting loop.
This tutorial is adapted from GPyTorch's Simple GP Regression Tutorial and has very few changes because the out-of-the box models that BoTorch provides are GPyTorch models; in fact, they are proper subclasses that add the botorch.models.Model
API functions.
import math
import torch
# use a GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float
In this tutorial we will model a simple sinusoidal function with i.i.d. Gaussian noise:
$$y = \sin(2\pi x) + \epsilon, ~\epsilon \sim \mathcal N(0, 0.15)$$# use regular spaced points on the interval [0, 1]
train_X = torch.linspace(0, 1, 15, dtype=dtype, device=device)
# training data needs to be explicitly multi-dimensional
train_X = train_X.unsqueeze(1)
# sample observed values and add some synthetic noise
train_Y = torch.sin(train_X * (2 * math.pi)) + 0.15 * torch.randn_like(train_X)
# input observations must be single-dimensional for single-output model
train_Y = train_Y.view(-1)
We will model the function using a SingleTaskGP
, which by default uses a GaussianLikelihood
and infers the unknown noise level.
The default optimizer for the SingleTaskGP
is L-BFGS-B, which takes as input explicit bounds on the noise parameter. However, the torch
optimizers don't support parameter bounds as input. To use the torch
optimizers, then, we'll need to manually register a constraint on the noise level. When registering a constraint, the softplus
transform is applied by default, enabling us to enforce a lower bound on the noise.
Note: Without manual registration, the model itself does not apply any constraints, due to the interaction between constraints and transforms. Although the SingleTaskGP
constructor does in fact define a constraint, the constructor sets transform=None
, which means that the constraint is not enforced. See the GPyTorch constraints module for additional information.
from botorch.models import SingleTaskGP
from gpytorch.constraints import GreaterThan
model = SingleTaskGP(train_X=train_X, train_Y=train_Y)
model.likelihood.noise_covar.register_constraint("raw_noise", GreaterThan(1e-5))
We will jointly optimize the kernel hyperparameters and the likelihood's noise parameter, by minimizing the negative gpytorch.mlls.ExactMarginalLogLikelihood
(our loss function).
from gpytorch.mlls import ExactMarginalLogLikelihood
mll = ExactMarginalLogLikelihood(likelihood=model.likelihood, model=model)
# set mll and all submodules to the specified dtype and device
mll = mll.to(train_X)
We will use stochastic gradient descent (torch.optim.SGD
) to optimize the kernel hyperparameters and the noise level. In this example, we will use a simple fixed learning rate of 0.1, but in practice the learning rate may need to be adjusted.
Notes:
GaussianLikelihood
module is a of child (submodule) of the SingleTaskGP
moduel, model.parameters()
will also include the noise level of the GaussianLikelihood
. from torch.optim import SGD
optimizer = SGD([{'params': model.parameters()}], lr=0.1)
Now we are ready to write our optimization loop. We will perform 150 epochs of stochastic gradient descent using our entire training set.
NUM_EPOCHS = 150
model.train()
for epoch in range(NUM_EPOCHS):
# clear gradients
optimizer.zero_grad()
# forward pass through the model to obtain the output MultivariateNormal
output = model(train_X)
# Compute negative marginal log likelihood
loss = - mll(output, train_Y)
# back prop gradients
loss.backward()
# print every 10 iterations
if (epoch + 1) % 10 == 0:
print(
f"Epoch {epoch+1:>3}/{NUM_EPOCHS} - Loss: {loss.item():>4.3f} "
f"lengthscale: {model.covar_module.base_kernel.lengthscale.item():>4.3f} "
f"noise: {model.likelihood.noise.item():>4.3f}"
)
optimizer.step()
Epoch 10/150 - Loss: 1.944 lengthscale: 0.648 noise: 2.000 Epoch 20/150 - Loss: 1.907 lengthscale: 0.605 noise: 1.858 Epoch 30/150 - Loss: 1.870 lengthscale: 0.568 noise: 1.713 Epoch 40/150 - Loss: 1.831 lengthscale: 0.536 noise: 1.568 Epoch 50/150 - Loss: 1.791 lengthscale: 0.509 noise: 1.422 Epoch 60/150 - Loss: 1.749 lengthscale: 0.484 noise: 1.278 Epoch 70/150 - Loss: 1.704 lengthscale: 0.462 noise: 1.136 Epoch 80/150 - Loss: 1.657 lengthscale: 0.442 noise: 0.999 Epoch 90/150 - Loss: 1.606 lengthscale: 0.423 noise: 0.867 Epoch 100/150 - Loss: 1.552 lengthscale: 0.405 noise: 0.743 Epoch 110/150 - Loss: 1.495 lengthscale: 0.387 noise: 0.629 Epoch 120/150 - Loss: 1.435 lengthscale: 0.369 noise: 0.524 Epoch 130/150 - Loss: 1.372 lengthscale: 0.352 noise: 0.432 Epoch 140/150 - Loss: 1.306 lengthscale: 0.337 noise: 0.351 Epoch 150/150 - Loss: 1.239 lengthscale: 0.324 noise: 0.282
We plot the posterior mean and the 2 standard deviations from the mean.
# set model (and likelihood)
model.eval();
from matplotlib import pyplot as plt
%matplotlib inline
# Initialize plot
f, ax = plt.subplots(1, 1, figsize=(6, 4))
# test model on 101 regular spaced points on the interval [0, 1]
test_X = torch.linspace(0, 1, 101, dtype=dtype, device=device)
# no need for gradients
with torch.no_grad():
# compute posterior
posterior = model.posterior(test_X)
# Get upper and lower confidence bounds (2 standard deviations from the mean)
lower, upper = posterior.mvn.confidence_region()
# Plot training points as black stars
ax.plot(train_X.cpu().numpy(), train_Y.cpu().numpy(), 'k*')
# Plot posterior means as blue line
ax.plot(test_X.cpu().numpy(), posterior.mean.cpu().numpy(), 'b')
# Shade between the lower and upper confidence bounds
ax.fill_between(test_X.cpu().numpy(), lower.cpu().numpy(), upper.cpu().numpy(), alpha=0.5)
ax.legend(['Observed Data', 'Mean', 'Confidence'])
plt.tight_layout()
It is simple to package up a custom optimizer loop like the one above and use it within Ax. As described in the Using BoTorch with Ax tutorial, this requires defining a custom model_constructor
callable that can then be passed to the get_botorch
factory function.
def _get_and_fit_model(Xs, Ys, **kwargs):
train_X, train_Y = Xs[0], Ys[0]
model = SingleTaskGP(train_X=train_X, train_Y=train_Y)
mll = ExactMarginalLogLikelihood(model.likelihood, model).to(train_X)
model.train()
optimizer = SGD([{'params': model.parameters()}], lr=kwargs.get("lr"))
for epoch in range(kwargs.get("epochs")):
optimizer.zero_grad()
output = model(train_X)
loss = -mll(output, train_Y)
loss.backward()
optimizer.step()
return model