#!/usr/bin/env python3
# coding: utf-8
# ## High-Dimensional Sample-Efficient Bayesian Optimization with SAASBO
#
# This tutorial shows how to use the Sparse Axis-Aligned Subspace Bayesian Optimization (SAASBO)
# method for high-dimensional Bayesian optimization [1]. SAASBO places strong priors on the
# inverse lengthscales to avoid overfitting in high-dimensional spaces. Specifically, SAASBO
# uses a hierarchical sparsity prior consisting of a global shrinkage parameter
# $\tau \sim \mathcal{HC}(\beta)$ and inverse lengthscales $\rho_d \sim \mathcal{HC}(\tau)$
# for $d=1, \ldots, D$, where $\mathcal{HC}$ is the half-Cauchy distribution.
# While half-Cauchy priors favor values near zero, they also have heavy tails, which allows the
# inverse lengthscales of the most important parameters to escape zero. To perform inference in the
# SAAS model we use Hamiltonian Monte Carlo (HMC), as we have found it to outperform MAP inference.
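#
# For intuition, here is a minimal sketch (separate from the tutorial code) of drawing
# inverse lengthscales from this prior with `torch.distributions.HalfCauchy`, using an
# illustrative choice of $\beta = 0.1$:
# ```
# import torch
# from torch.distributions import HalfCauchy
#
# tau = HalfCauchy(0.1).sample()  # global shrinkage parameter
# rho = HalfCauchy(tau).sample((30,))  # inverse lengthscales for D=30 dimensions
# # Most entries of rho concentrate near zero; the heavy tails let a few escape,
# # corresponding to a small number of important parameters.
# ```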
#
# We find that SAASBO performs well on problems with hundreds of dimensions. As we rely on HMC,
# and in particular the No-U-Turn Sampler (NUTS), for inference, the overhead of SAASBO scales
# cubically with the number of datapoints. Running more than a few hundred evaluations may
# therefore not be feasible; SAASBO is designed for problems with a limited evaluation budget.
#
# In general, we recommend using [Ax](https://ax.dev) for a simple BO setup like this one. See [here](https://ax.dev/tutorials/saasbo.html) for a SAASBO tutorial in Ax, which uses the Log Noisy Expected Improvement acquisition function. This tutorial, in contrast, is a minimal illustrative example of how to use SAASBO with BoTorch alone. To customize the acquisition function used with SAASBO in Ax, see the [custom acquisition tutorial](./custom_acquisition); adding `"surrogate": Surrogate(SaasFullyBayesianSingleTaskGP)` to the `model_kwargs` of the `BOTORCH_MODULAR` step is sufficient to enable the SAAS model.
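#
# As a rough sketch (the Ax generation strategy API may differ between versions, so
# consult the Ax docs for the current interface), such a setup could look like:
# ```
# from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
# from ax.modelbridge.registry import Models
# from ax.models.torch.botorch_modular.surrogate import Surrogate
# from botorch.models.fully_bayesian import SaasFullyBayesianSingleTaskGP
#
# gs = GenerationStrategy(
#     steps=[
#         GenerationStep(model=Models.SOBOL, num_trials=10),  # space-filling initialization
#         GenerationStep(
#             model=Models.BOTORCH_MODULAR,
#             num_trials=-1,  # no limit; run until the experiment ends
#             model_kwargs={"surrogate": Surrogate(SaasFullyBayesianSingleTaskGP)},
#         ),
#     ]
# )
# ```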
#
# [1]: [D. Eriksson, M. Jankowiak. High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Subspaces. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021.](https://proceedings.mlr.press/v161/eriksson21a.html)
# In[1]:
import os
import torch
from torch.quasirandom import SobolEngine
from botorch import fit_fully_bayesian_model_nuts
from botorch.acquisition.logei import qLogExpectedImprovement
from botorch.models.fully_bayesian import SaasFullyBayesianSingleTaskGP
from botorch.models.transforms import Standardize
from botorch.optim import optimize_acqf
from botorch.test_functions import Branin
SMOKE_TEST = os.environ.get("SMOKE_TEST")
# In[2]:
tkwargs = {
"device": torch.device("cuda" if torch.cuda.is_available() else "cpu"),
"dtype": torch.double,
}
# The time to fit the SAAS model can be decreased by lowering
# `WARMUP_STEPS` and `NUM_SAMPLES`.
#
# We recommend using 512 warmup steps and 256 samples when possible, and not using fewer
# than 256 warmup steps and 128 samples. By default, we only keep every 16th sample
# (thinning), which with 256 samples results in 16 hyperparameter samples.
#
# To make this tutorial run faster, we use 256 warmup steps and 128 samples, which with a
# thinning of 16 yields 8 hyperparameter samples.
# In[3]:
WARMUP_STEPS = 256 if not SMOKE_TEST else 32
NUM_SAMPLES = 128 if not SMOKE_TEST else 16
THINNING = 16
# ## Simple model fitting
# We generate training data from a simple function that only depends on the first parameter
# and show that the SAAS model assigns large lengthscales to all other parameters.
# In[4]:
train_X = torch.rand(10, 4, **tkwargs)
test_X = torch.rand(5, 4, **tkwargs)
train_Y = torch.sin(train_X[:, :1])
test_Y = torch.sin(test_X[:, :1])
# By default, we infer the unknown noise variance in the data. You can also pass in a known
# noise variance (`train_Yvar`) for each observation, which may be useful in cases where, for
# example, you know the problem is noise-free and can set the noise variance to a small value such as `1e-6`.
#
# In this case you can construct a model as follows:
# ```
# gp = SaasFullyBayesianSingleTaskGP(train_X=train_X, train_Y=train_Y, train_Yvar=torch.full_like(train_Y, 1e-6))
# ```
# In[5]:
gp = SaasFullyBayesianSingleTaskGP(
train_X=train_X,
train_Y=train_Y,
outcome_transform=Standardize(m=1)
)
fit_fully_bayesian_model_nuts(
gp,
warmup_steps=WARMUP_STEPS,
num_samples=NUM_SAMPLES,
thinning=THINNING,
disable_progbar=True,
)
with torch.no_grad():
posterior = gp.posterior(test_X)
# Computing the median lengthscale of each parameter over the MCMC samples makes it clear that the first feature has the smallest lengthscale.
#
# In[6]:
print(gp.median_lengthscale.detach())
# ### Make predictions with the model
#
# In the next cell we show how to make predictions with the SAAS model. You compute the mean
# and variance for test points just like for any other BoTorch posterior. Note that the mean
# and variance will have an extra batch dimension at position -3 that corresponds to the number
# of MCMC samples, which is 8 in this tutorial (128 samples with a thinning of 16).
# In[7]:
print(posterior.mean.shape)
print(posterior.variance.shape)
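# With 8 MCMC samples, 5 test points, and a single outcome, we expect both shapes to be
# `torch.Size([8, 5, 1])`.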
# We also provide several convenience methods for computing different statistics over the MCMC samples:
# ```
# mixture_mean = posterior.mixture_mean
# mixture_variance = posterior.mixture_variance
# mixture_quantile = posterior.quantile(value=torch.tensor([0.95], **tkwargs))
# ```
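# These statistics treat the posterior as a mixture of the per-sample Gaussian posteriors;
# for example, `mixture_mean` averages the posterior means across the MCMC samples.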
# In[8]:
print(f"Ground truth: {test_Y.squeeze(-1)}")
print(f"Mixture mean: {posterior.mixture_mean.squeeze(-1)}")
# ## Optimize Branin embedded in a 30D space
# We take the standard 2D Branin problem and embed it in a 30D space. In particular,
# we let dimensions 0 and 1 correspond to the true dimensions. We will show that
# SAASBO is able to identify the important dimensions and efficiently optimize this function.
# We work with the domain $[0, 1]^d$ and unnormalize the inputs to the true domain of Branin
# before evaluating the function.
# In[9]:
branin = Branin().to(**tkwargs)
def branin_emb(x):
"""x is assumed to be in [0, 1]^d"""
lb, ub = branin.bounds
return branin(lb + (ub - lb) * x[..., :2])
# In[10]:
DIM = 30 if not SMOKE_TEST else 2
# Evaluation budget
N_INIT = 10
N_ITERATIONS = 8 if not SMOKE_TEST else 1
BATCH_SIZE = 5 if not SMOKE_TEST else 1
print(f"Using a total of {N_INIT + BATCH_SIZE * N_ITERATIONS} function evaluations")
# ### Run the optimization
# We use 10 initial Sobol points followed by 8 iterations of BO using a batch size of 5,
# which results in a total of 50 function evaluations. As our goal is to minimize Branin, we flip
# the sign of the function values before fitting the SAAS model, since the BoTorch acquisition
# functions assume maximization.
# In[11]:
X = SobolEngine(dimension=DIM, scramble=True, seed=0).draw(N_INIT).to(**tkwargs)
Y = branin_emb(X).unsqueeze(-1)
print(f"Best initial point: {Y.min().item():.3f}")
for i in range(N_ITERATIONS):
train_Y = -1 * Y # Flip the sign since we want to minimize f(x)
gp = SaasFullyBayesianSingleTaskGP(
train_X=X,
train_Y=train_Y,
train_Yvar=torch.full_like(train_Y, 1e-6),
outcome_transform=Standardize(m=1),
)
fit_fully_bayesian_model_nuts(
gp,
warmup_steps=WARMUP_STEPS,
num_samples=NUM_SAMPLES,
thinning=THINNING,
disable_progbar=True,
)
EI = qLogExpectedImprovement(model=gp, best_f=train_Y.max())
candidates, acq_values = optimize_acqf(
EI,
bounds=torch.cat((torch.zeros(1, DIM), torch.ones(1, DIM))).to(**tkwargs),
q=BATCH_SIZE,
num_restarts=10,
raw_samples=1024,
)
Y_next = torch.cat([branin_emb(x).unsqueeze(-1) for x in candidates]).unsqueeze(-1)
if Y_next.min() < Y.min():
ind_best = Y_next.argmin()
x0, x1 = candidates[ind_best, :2].tolist()
print(
f"{i + 1}) New best: {Y_next[ind_best].item():.3f} @ "
f"[{x0:.3f}, {x1:.3f}]"
)
X = torch.cat((X, candidates))
Y = torch.cat((Y, Y_next))
# ## Plot the results
#
# We can see that we were able to get close to the global optimum of $\approx 0.398$ after 50 function evaluations.
#
# In[12]:
import matplotlib.pyplot as plt
import numpy as np
get_ipython().run_line_magic('matplotlib', 'inline')
Y_np = Y.cpu().numpy()
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(np.minimum.accumulate(Y_np), color="b", label="SAASBO")
ax.plot([0, len(Y_np)], [0.398, 0.398], "--", c="g", lw=3, label="Optimal value")
ax.grid(True)
ax.set_title(f"Branin, D = {DIM}", fontsize=20)
ax.set_xlabel("Number of evaluations", fontsize=20)
ax.set_xlim([0, len(Y_np)])
ax.set_ylabel("Best value found", fontsize=20)
ax.set_ylim([0, 8])
ax.legend(fontsize=18)
plt.show()
# ## Predict on some test points
# We fit a model to 50 quasi-random (Sobol) training points and predict on 50 quasi-random test
# points in order to see how well the SAAS model predicts out-of-sample.
# The plot shows the mean and a 95% confidence interval for each test point.
# In[13]:
train_X = SobolEngine(dimension=DIM, scramble=True, seed=0).draw(50).to(**tkwargs)
test_X = SobolEngine(dimension=DIM, scramble=True, seed=1).draw(50).to(**tkwargs)
train_Y = branin_emb(train_X).unsqueeze(-1)
test_Y = branin_emb(test_X).unsqueeze(-1)
gp = SaasFullyBayesianSingleTaskGP(
train_X=train_X,
train_Y=train_Y,
train_Yvar=torch.full_like(train_Y, 1e-6),
outcome_transform=Standardize(m=1),
)
fit_fully_bayesian_model_nuts(
gp,
warmup_steps=WARMUP_STEPS,
num_samples=NUM_SAMPLES,
thinning=THINNING,
disable_progbar=True,
)
# In[14]:
with torch.no_grad():
posterior = gp.posterior(test_X)
median = posterior.quantile(value=torch.tensor([0.5], **tkwargs))
q1 = posterior.quantile(value=torch.tensor([0.025], **tkwargs))
q2 = posterior.quantile(value=torch.tensor([0.975], **tkwargs))
# In[15]:
fig, ax = plt.subplots(1, 1, figsize=(8, 6))
ax.plot([0, 80], [0, 80], "b--", lw=2)
yerr1, yerr2 = median - q1, q2 - median
yerr = torch.cat((yerr1.unsqueeze(0), yerr2.unsqueeze(0)), dim=0).squeeze(-1)
markers, caps, bars = ax.errorbar(
test_Y.squeeze(-1).cpu().numpy(),
median.squeeze(-1).cpu().numpy(),
yerr=yerr.cpu().numpy(),
fmt=".",
capsize=4,
elinewidth=2.0,
ms=14,
c="k",
ecolor="gray",
)
ax.set_xlim([0, 80])
ax.set_ylim([0, 80])
for bar in bars:
    bar.set_alpha(0.8)
for cap in caps:
    cap.set_alpha(0.8)
ax.set_xlabel("True value", fontsize=20)
ax.set_ylabel("Predicted value", fontsize=20)
ax.set_aspect("equal")
ax.grid(True)
# ## Look at the lengthscales from the final model
#
# As SAASBO places strong priors on the inverse lengthscales, we expect only parameters
# 0 and 1 to be identified as important by the model, since the other parameters have no effect.
# We can confirm below that this is the case: the lengthscales of parameters 0 and 1 are
# small, while all other lengthscales are large.
# In[16]:
median_lengthscales = gp.median_lengthscale
for i in median_lengthscales.argsort()[:10]:
print(f"Parameter {i:2}) Median lengthscale = {median_lengthscales[i].item():.2e}")