Version: Next

Implementing a custom acquisition function

Upper Confidence Bound (UCB)

The Upper Confidence Bound (UCB) acquisition function balances exploration and exploitation by assigning a score of $\mu + \sqrt{\beta} \cdot \sigma$ if the posterior distribution is normal with mean $\mu$ and variance $\sigma^2$ . This "analytic" version is implemented in the UpperConfidenceBound class. The Monte Carlo version of UCB is implemented in the qUpperConfidenceBound class, which also allows for q-batches of size greater than one. (The derivation of q-UCB is given in Appendix A of Wilson et. al., 2017).

A scalarized version of q-UCB

Suppose now that we are in a multi-output setting, where, e.g., we model the effects of a design on multiple metrics. We first show a simple extension of the q-UCB acquisition function that accepts a multi-output model and performs q-UCB on a scalarized version of the multiple outputs, achieved via a vector of weights. Implementing a new acquisition function in botorch is easy; one simply needs to implement the constructor and a forward method.

import math
from typing import Optional

from botorch.acquisition.monte_carlo import MCAcquisitionFunction
from botorch.models.model import Model
from botorch.sampling.base import MCSampler
from botorch.sampling.normal import SobolQMCNormalSampler
from botorch.utils import average_over_ensemble_models, t_batch_mode_transform
from torch import Tensor


class qScalarizedUpperConfidenceBound(MCAcquisitionFunction):
    def __init__(
        self,
        model: Model,
        beta: Tensor,
        weights: Tensor,
        sampler: Optional[MCSampler] = None,
    ) -> None:
        # we use the AcquisitionFunction constructor, since that of
        # MCAcquisitionFunction performs some validity checks that we don't want here
        super(MCAcquisitionFunction, self).__init__(model=model)
        if sampler is None:
            sampler = SobolQMCNormalSampler(sample_shape=torch.Size([512]))
        self.sampler = sampler
        self.register_buffer("beta", torch.as_tensor(beta))
        self.register_buffer("weights", torch.as_tensor(weights))

    @t_batch_mode_transform()
    @average_over_ensemble_models
    def forward(self, X: Tensor) -> Tensor:
        """Evaluate scalarized qUCB on the candidate set `X`.

        Args:
            X: A `(b) x q x d`-dim Tensor of `(b)` t-batches with `q` `d`-dim
                design points each.

        Returns:
            Tensor: A `(b)`-dim Tensor of Upper Confidence Bound values at the
                given design points `X`.
        """
        posterior = self.model.posterior(X)
        samples = self.get_posterior_samples(posterior)  # n x b x q x o
        scalarized_samples = samples.matmul(self.weights)  # n x b x q
        mean = posterior.mean  # b x q x o
        scalarized_mean = mean.matmul(self.weights)  # b x q
        ucb_samples = (
            scalarized_mean
            + math.sqrt(self.beta * math.pi / 2)
            * (scalarized_samples - scalarized_mean).abs()
        )
        return ucb_samples.max(dim=-1)[0].mean(dim=0)

Output:
/data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/f3b9a99e517e0a13/bento/kernels/__bento_kernel_axoptics__/bento_kernel_axoptics#link-tree/sympy/solvers/diophantine.py:3188: SyntaxWarning:
"is" with a literal. Did you mean "=="?
/data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/f3b9a99e517e0a13/bento/kernels/__bento_kernel_axoptics__/bento_kernel_axoptics#link-tree/sympy/plotting/plot.py:520: SyntaxWarning:
"is" with a literal. Did you mean "=="?
/data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/f3b9a99e517e0a13/bento/kernels/__bento_kernel_axoptics__/bento_kernel_axoptics#link-tree/sympy/plotting/plot.py:540: SyntaxWarning:
"is" with a literal. Did you mean "=="?
/data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/f3b9a99e517e0a13/bento/kernels/__bento_kernel_axoptics__/bento_kernel_axoptics#link-tree/sympy/plotting/plot.py:553: SyntaxWarning:
"is" with a literal. Did you mean "=="?
/data/sandcastle/boxes/fbsource/buck-out/v2/gen/fbcode/f3b9a99e517e0a13/bento/kernels/__bento_kernel_axoptics__/bento_kernel_axoptics#link-tree/sympy/plotting/plot.py:560: SyntaxWarning:
"is" with a literal. Did you mean "=="?

Note that qScalarizedUpperConfidenceBound is very similar to qUpperConfidenceBound and only requires a few lines of new code to accomodate scalarization of multiple outputs. The @t_batch_mode_transform decorator ensures that the input X has an explicit t-batch dimension (code comments are added with shapes for clarity). The @average_over_ensemble_models decorator averages the acquisition values of ensemble models over elements of the ensemble, for example for the fully Bayesian SAAS model.

See the end of this tutorial for a quick and easy way of achieving the same scalarization effect using ScalarizedPosteriorTransform.

Ad-hoc testing q-Scalarized-UCB

Before hooking the newly defined acquisition function into a Bayesian Optimization loop, we should test it. For this we'll just make sure that it properly evaluates on a compatible multi-output model. Here we just define a basic multi-output SingleTaskGP model trained on synthetic data.

import torch

from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.utils import standardize
from gpytorch.mlls import ExactMarginalLogLikelihood

torch.set_default_dtype(torch.double)

# generate synthetic data
X = torch.rand(20, 2)
Y = torch.stack([torch.sin(X[:, 0]), torch.cos(X[:, 1])], -1)

# construct and fit the multi-output model
gp = SingleTaskGP(train_X=X, train_Y=Y)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_mll(mll)

# construct the acquisition function
qSUCB = qScalarizedUpperConfidenceBound(gp, beta=0.1, weights=torch.tensor([0.1, 0.5]))

# evaluate on single q-batch with q=3
qSUCB(torch.rand(3, 2))

Output:
tensor([0.5625], grad_fn=<MeanBackward1>)

# batch-evaluate on two q-batches with q=3
qSUCB(torch.rand(2, 3, 2))

Output:
tensor([0.5165, 0.3577], grad_fn=<MeanBackward1>)

A scalarized version of analytic UCB (`q=1` only)

We can also write an analytic version of UCB for a multi-output model, assuming a multivariate normal posterior and q=1. The new class ScalarizedUpperConfidenceBound subclasses AnalyticAcquisitionFunction instead of MCAcquisitionFunction. In contrast to the MC version, instead of using the weights on the MC samples, we directly scalarize the mean vector $\mu$ and covariance matrix $\Sigma$ and apply standard UCB on the univariate normal distribution, which has mean $w^T \mu$ and variance $w^T \Sigma w$ . In addition to the @t_batch_transform decorator, here we are also using expected_q=1 to ensure the input X has a q=1.

Note: BoTorch also provides a ScalarizedPosteriorTransform abstraction that can be used with any existing analytic acqusition functions and automatically performs the scalarization we implement manually below. See the end of this tutorial for a usage example.

from botorch.acquisition import AnalyticAcquisitionFunction


class ScalarizedUpperConfidenceBound(AnalyticAcquisitionFunction):
    def __init__(
        self,
        model: Model,
        beta: Tensor,
        weights: Tensor,
        maximize: bool = True,
    ) -> None:
        # we use the AcquisitionFunction constructor, since that of
        # AnalyticAcquisitionFunction performs some validity checks that we don't want here
        super(AnalyticAcquisitionFunction, self).__init__(model)
        self.maximize = maximize
        self.register_buffer("beta", torch.as_tensor(beta))
        self.register_buffer("weights", torch.as_tensor(weights))

    @t_batch_mode_transform(expected_q=1)
    @average_over_ensemble_models
    def forward(self, X: Tensor) -> Tensor:
        """Evaluate the Upper Confidence Bound on the candidate set X using scalarization

        Args:
            X: A `(b) x d`-dim Tensor of `(b)` t-batches of `d`-dim design
                points each.

        Returns:
            A `(b)`-dim Tensor of Upper Confidence Bound values at the given
                design points `X`.
        """
        self.beta = self.beta.to(X)
        batch_shape = X.shape[:-2]
        posterior = self.model.posterior(X)
        means = posterior.mean.squeeze(dim=-2)  # b x o
        scalarized_mean = means.matmul(self.weights)  # b
        covs = posterior.mvn.covariance_matrix  # b x o x o
        weights = self.weights.view(
            1, -1, 1
        )  # 1 x o x 1 (assume single batch dimension)
        weights = weights.expand(batch_shape + weights.shape[1:])  # b x o x 1
        weights_transpose = weights.permute(0, 2, 1)  # b x 1 x o
        scalarized_variance = torch.bmm(
            weights_transpose, torch.bmm(covs, weights)
        ).view(
            batch_shape
        )  # b
        delta = (self.beta.expand_as(scalarized_mean) * scalarized_variance).sqrt()
        if self.maximize:
            return scalarized_mean + delta
        else:
            return scalarized_mean - delta

Ad-hoc testing Scalarized-UCB

Notice that we pass in an explicit q-batch dimension for consistency, even though q=1.

# construct the acquisition function
SUCB = ScalarizedUpperConfidenceBound(gp, beta=0.1, weights=torch.tensor([0.1, 0.5]))

# evaluate on single point
SUCB(torch.rand(1, 2))

Output:
tensor([0.4957], grad_fn=<AddBackward0>)

# batch-evaluate on 3 points
SUCB(torch.rand(3, 1, 2))

Output:
tensor([0.4183, 0.5188, 0.4453], grad_fn=<AddBackward0>)

Appendix: Using `ScalarizedPosteriorTransform`

Using the ScalarizedPosteriorTransform abstraction, the functionality of ScalarizedUpperConfidenceBound implemented above can be easily achieved in just a few lines of code. PosteriorTransforms can be used with both the MC and analytic acquisition functions.

from botorch.acquisition.objective import ScalarizedPosteriorTransform
from botorch.acquisition.analytic import UpperConfidenceBound

pt = ScalarizedPosteriorTransform(weights=torch.tensor([0.1, 0.5]))
SUCB = UpperConfidenceBound(gp, beta=0.1, posterior_transform=pt)

Upper Confidence Bound (UCB)​

A scalarized version of q-UCB​

Ad-hoc testing q-Scalarized-UCB​

A scalarized version of analytic UCB (q=1 only)​

Ad-hoc testing Scalarized-UCB​

Appendix: Using ScalarizedPosteriorTransform​