Composite Bayesian optimization with the High Order Gaussian Process
Composite Bayesian Optimization with the High Order Gaussian Process
In this tutorial, we're going to explore composite Bayesian optimization Astudillo & Frazier, ICML, '19 with the High Order Gaussian Process (HOGP) model of Zhe et al, AISTATS, '19. The setup for composite Bayesian optimization is that we have an unknown (black box) function mapping input parameters to several outputs, and a second, known function describing the quality of the functional output. We wish to find input parameters that maximize the output metric function. We wish to find input parameters that maximize the output metric function in a black-box manner.
Specifically, this can be described as where is unknown and is known. As in traditional Bayesian optimization, we are going to construct a Gaussian process surrogate model over the expensive to evaluate function and will use a HOGP to model this function.
HOGP model description
The High Order Gaussian Process (HOGP) model is a Gaussian process model designed specifically to operate over tensors or multi-dimensional arrays and exploits structure in the tensor to be able to operate efficiently. Specifically, the HOGP takes as inputs and assumes that where Each dimension of the tensor has its own kernel function, as well as a set of latent parameters that can be optimized over.
Recently, Maddox et al, '21 proposed a method for computing posterior samples from the HOGP by exploiting structure in the posterior distribution, thereby enabling its usage in BO settings. While they show that this approach allows to use composite BO on problems with tens or thousands of outputs, for scalability we consider a much smaller example here (that does not require GPU acceleration).
# Install dependencies if we are running in colab
import sys
if 'google.colab' in sys.modules:
%pip install botorch
import math
import os
import time
from functools import partial
import gpytorch.settings as gpt_settings
import matplotlib.pyplot as plt
import torch
from botorch.acquisition import qExpectedImprovement
from botorch.acquisition.objective import GenericMCObjective
from botorch.models import HigherOrderGP, SingleTaskGP
from botorch.models.higher_order_gp import FlattenedStandardize
from botorch.models.transforms import Normalize, Standardize
from botorch.optim import optimize_acqf
from botorch.optim.fit import fit_gpytorch_mll_torch
from botorch.sampling.normal import IIDNormalSampler
from gpytorch.mlls import ExactMarginalLogLikelihood
from linear_operator.settings import _fast_solves
from torch.optim import Adam
%matplotlib inline
SMOKE_TEST = os.environ.get("SMOKE_TEST")
I0215 081151.431 _utils_internal.py:247] NCCL_DEBUG env var is set to None
I0215 081151.436 _utils_internal.py:265] NCCL_DEBUG is forced to WARN from None
Set Device and dtype
torch.manual_seed(0)
device = (
torch.device("cpu") if not torch.cuda.is_available() else torch.device("cuda:4")
)
dtype = torch.float
print("Using ", device)
Using cpu
models_used = (
"rnd",
"ei",
"ei_hogp_cf",
)
Problem Description
We use a simple test problem describing the concentration of pollutants after a chemical spill from Astudillo & Frazier, ICML, '19 defined over a grid of values and we wish to optimize the parameters w.r.t. their true values, to estimate the true value of parameters, The function is given by