In this tutorial, we illustrate how to use the hypervolume knowledge gradient (HVKG) for problems where the objectives can be evaluated independently (decoupled).
There are two types of decoupling:
Competitive decoupling: the objectives are evaluated using the same evaluation resource. The objectives often have heterogeneous costs, so it is prudent to select which design and which objective to evaluate in a cost-aware fashion.
Non-competitive decoupling: the objectives have independent evaluation resources, and a potentially different number of designs can be evaluated in parallel for each objective. In this scenario, all available evaluation resources should be exploited, and the goal is to optimize the objectives as well as possible within a fixed number of time steps.
In this tutorial, we focus on competitive decoupling and show how HVKG can be used for efficient optimization.
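To make decoupled evaluations concrete, here is a small standalone sketch (our own illustration, not part of BoTorch; the function and the stand-in objectives are hypothetical) in which a query consists of a design together with the index of the single objective to evaluate:
import torch


# Hypothetical illustration: in a decoupled setting, a query evaluates one
# objective of a design rather than all objectives at once.
def evaluate_decoupled(x: torch.Tensor, objective_index: int) -> torch.Tensor:
    # stand-in objectives; in practice these would be separate simulators or experiments
    objectives = [
        lambda x: -x.pow(2).sum(dim=-1),  # e.g. the expensive objective (cost 3)
        lambda x: -(x - 0.5).abs().sum(dim=-1),  # e.g. the cheap objective (cost 1)
    ]
    return objectives[objective_index](x)


x = torch.rand(1, 6, dtype=torch.double)
y1 = evaluate_decoupled(x, objective_index=1)  # only the cheap objective is queried
Under competitive decoupling, each iteration must therefore decide both which design to query and which objective to spend the shared evaluation budget on.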
Note: pymoo is an optional dependency that is used for determining the Pareto set of optimal designs under the model posterior mean using NSGA-II (which is not a sample-efficient method, but sample efficiency is not critical for this step). If pymoo is not available, the Pareto set of optimal designs is selected from a discrete set. This will work okay for low-dimensional (e.g. $\leq 2$ dimensions) problems, but in general NSGA-II will yield far better results.
Note: HVKG aggressively exploits parallel hardware and is much faster when run on a GPU.
import os
import torch
tkwargs = {
    "dtype": torch.double,
    "device": torch.device("cuda" if torch.cuda.is_available() else "cpu"),
}
SMOKE_TEST = os.environ.get("SMOKE_TEST")
In this tutorial, we optimize a bi-objective synthetic function ZDT2 over a 6-dimensional space. The costs of evaluating each objective are 3 and 1, respectively, which we choose to be different to reflect that many multi-objective optimization problems have heterogeneous costs.
from botorch.test_functions.multi_objective import ZDT2
from botorch.models.cost import FixedCostModel
problem = ZDT2(negate=True, dim=6).to(**tkwargs)
# define the cost model
objective_costs = {0: 3.0, 1: 1.0}
objective_indices = list(objective_costs.keys())
objective_costs = {int(k): v for k, v in objective_costs.items()}
objective_costs_t = torch.tensor(
    [objective_costs[k] for k in sorted(objective_costs.keys())], **tkwargs
)
cost_model = FixedCostModel(fixed_cost=objective_costs_t)
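As a quick illustrative check (the variable names here are ours, not part of the tutorial), the cost model returns the per-objective evaluation cost of each design, which is exactly how it is consumed by the cost-aware utility and the cost bookkeeping later in this tutorial:
# Illustrative only: for a batch of 3 designs, FixedCostModel returns a 3 x 2
# tensor of per-objective costs ([3.0, 1.0] in each row for this cost model).
X_demo = torch.rand(3, problem.dim, **tkwargs)
per_objective_costs = cost_model(X_demo)
full_eval_cost = per_objective_costs.sum(dim=-1)  # cost of evaluating both objectives (4.0 per design)
cheap_only_cost = per_objective_costs[..., [1]].sum(dim=-1)  # cost of evaluating only objective 1 (1.0 per design)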
We use a list of SingleTaskGP models to model the two objectives with known noise variances. The models are initialized with $2(d+1)=14$ points drawn randomly from $[0,1]^6$. Since the objectives can be evaluated independently, the number of observations of each objective can differ. Therefore, we must use a ModelListGP.
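As a minimal standalone illustration of this point (with made-up data, separate from the helper functions defined below), two SingleTaskGP models trained on different numbers of observations can be wrapped in a single ModelListGP:
import torch
from botorch.models.gp_regression import SingleTaskGP
from botorch.models.model_list_gp_regression import ModelListGP

# Synthetic data for illustration: objective 0 has 14 observations, objective 1 has 20.
X0, Y0 = torch.rand(14, 6, dtype=torch.double), torch.rand(14, 1, dtype=torch.double)
X1, Y1 = torch.rand(20, 6, dtype=torch.double), torch.rand(20, 1, dtype=torch.double)
model_list_example = ModelListGP(SingleTaskGP(X0, Y0), SingleTaskGP(X1, Y1))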
from botorch.models.gp_regression import SingleTaskGP
from botorch.models.model_list_gp_regression import ModelListGP
from botorch.models.transforms.outcome import Standardize
from botorch.utils.sampling import draw_sobol_samples
from botorch.utils.transforms import normalize, unnormalize
from gpytorch.mlls.sum_marginal_log_likelihood import SumMarginalLogLikelihood
from torch import Tensor
from gpytorch.priors import GammaPrior
from gpytorch.kernels import MaternKernel, ScaleKernel
def generate_initial_data(n):
    # generate training data
    train_x = draw_sobol_samples(bounds=problem.bounds, n=n, q=1).squeeze(1)
    train_obj_true = problem(train_x)
    return train_x, train_obj_true
def initialize_model(train_x_list, train_obj_list):
    # define an independent GP model for each objective
    train_x_list = [normalize(train_x, problem.bounds) for train_x in train_x_list]
    models = []
    for i in range(len(train_obj_list)):
        train_y = train_obj_list[i]
        train_yvar = torch.full_like(train_y, 1e-7)  # noiseless
        models.append(
            SingleTaskGP(
                train_X=train_x_list[i],
                train_Y=train_y,
                train_Yvar=train_yvar,
                outcome_transform=Standardize(m=1),
                covar_module=ScaleKernel(
                    MaternKernel(
                        nu=2.5,
                        ard_num_dims=train_x_list[0].shape[-1],
                        lengthscale_prior=GammaPrior(2.0, 2.0),
                    ),
                    outputscale_prior=GammaPrior(2.0, 0.15),
                ),
            )
        )
    model = ModelListGP(*models)
    mll = SumMarginalLogLikelihood(model.likelihood, model)
    return mll, model
The helper function below initializes the $q$NEHVI acquisition function (a strong baseline, but one that does not support decoupled evaluations), optimizes it, and returns the candidate along with the observed function values.
Reference Point
$q$NEHVI and HVKG require specifying a reference point, which is the lower bound on the objectives used for computing hypervolume. In this tutorial, we assume the reference point is known. In practice the reference point can be set 1) using domain knowledge to be slightly worse than the lower bound of objective values, where the lower bound is the minimum acceptable value of interest for each objective, or 2) using a dynamic reference point selection strategy.
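For example, under option (1), one might construct the reference point from assumed minimum acceptable objective values. The sketch below is purely illustrative (the numbers are made up); this tutorial simply uses the known reference point exposed by the test problem as problem.ref_point.
# Illustrative sketch only: place the reference point slightly worse than
# hypothetical minimum acceptable objective values (objectives are maximized
# here, so "slightly worse" means slightly smaller).
min_acceptable = torch.tensor([-0.75, -8.0], **tkwargs)  # made-up lower bounds of interest
ref_point_example = min_acceptable - 0.1 * min_acceptable.abs()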
from botorch.acquisition.multi_objective.monte_carlo import (
    qNoisyExpectedHypervolumeImprovement,
)
from botorch.optim.optimize import optimize_acqf
BATCH_SIZE = 1
NUM_RESTARTS = 10 if not SMOKE_TEST else 2
RAW_SAMPLES = 512 if not SMOKE_TEST else 4
standard_bounds = torch.zeros(2, problem.dim, **tkwargs)
standard_bounds[1] = 1
def optimize_qnehvi_and_get_observation(model, train_x, sampler):
    """Optimizes the qNEHVI acquisition function, and returns a new candidate and observation."""
    # construct the qNEHVI acquisition function
    acq_func = qNoisyExpectedHypervolumeImprovement(
        model=model,
        ref_point=problem.ref_point.tolist(),  # use known reference point
        X_baseline=normalize(train_x, problem.bounds),
        prune_baseline=True,  # prune baseline points that have estimated zero probability of being Pareto optimal
        sampler=sampler,
    )
    # optimize
    candidates, _ = optimize_acqf(
        acq_function=acq_func,
        bounds=standard_bounds,
        q=BATCH_SIZE,
        num_restarts=NUM_RESTARTS,
        raw_samples=RAW_SAMPLES,  # used for initialization heuristic
        options={"batch_limit": 5, "maxiter": 200},
        sequential=True,
    )
    # observe new values
    new_x = unnormalize(candidates.detach(), bounds=problem.bounds)
    new_obj_true = problem(new_x)
    return new_x, new_obj_true
Below we define the following helper functions:
get_current_value, for computing the hypervolume of the current hypervolume-maximizing set under the model posterior mean.
optimize_HVKG_and_get_obs_decoupled, to initialize and optimize HVKG to determine which design to evaluate and which objective to evaluate the design on. This method obtains the observation corresponding to that design.
from botorch.acquisition.cost_aware import InverseCostWeightedUtility
from botorch.acquisition.multi_objective.hypervolume_knowledge_gradient import (
    _get_hv_value_function,
    qHypervolumeKnowledgeGradient,
)
from botorch.models.deterministic import GenericDeterministicModel
from botorch.sampling.list_sampler import ListSampler
from botorch.sampling.normal import IIDNormalSampler
NUM_PARETO = 2 if SMOKE_TEST else 10
NUM_FANTASIES = 2 if SMOKE_TEST else 8
NUM_HVKG_RESTARTS = 1
def get_current_value(
    model,
    ref_point,
    bounds,
):
    """Helper to get the hypervolume of the current hypervolume
    maximizing set.
    """
    curr_val_acqf = _get_hv_value_function(
        model=model,
        ref_point=ref_point,
        use_posterior_mean=True,
    )
    _, current_value = optimize_acqf(
        acq_function=curr_val_acqf,
        bounds=bounds,
        q=NUM_PARETO,
        num_restarts=20,
        raw_samples=1024,
        return_best_only=True,
        options={"batch_limit": 5},
    )
    return current_value
def optimize_HVKG_and_get_obs_decoupled(model):
    """Utility to initialize and optimize HVKG."""
    cost_aware_utility = InverseCostWeightedUtility(cost_model=cost_model)
    current_value = get_current_value(
        model=model,
        ref_point=problem.ref_point,
        bounds=standard_bounds,
    )
    acq_func = qHypervolumeKnowledgeGradient(
        model=model,
        ref_point=problem.ref_point,  # use known reference point
        num_fantasies=NUM_FANTASIES,
        num_pareto=NUM_PARETO,
        current_value=current_value,
        cost_aware_utility=cost_aware_utility,
    )
    # optimize acquisition functions and get new observations
    objective_vals = []
    objective_candidates = []
    for objective_idx in objective_indices:
        # set evaluation index to only condition on one objective
        # this could be multiple objectives
        X_evaluation_mask = torch.zeros(
            1,
            len(objective_indices),
            dtype=torch.bool,
            device=standard_bounds.device,
        )
        X_evaluation_mask[0, objective_idx] = 1
        acq_func.X_evaluation_mask = X_evaluation_mask
        candidates, vals = optimize_acqf(
            acq_function=acq_func,
            num_restarts=NUM_HVKG_RESTARTS,
            raw_samples=RAW_SAMPLES,
            bounds=standard_bounds,
            q=BATCH_SIZE,
            sequential=False,
            options={"batch_limit": 5},
        )
        objective_vals.append(vals.view(-1))
        objective_candidates.append(candidates)
    best_objective_index = torch.cat(objective_vals, dim=-1).argmax().item()
    eval_objective_indices = [best_objective_index]
    candidates = objective_candidates[best_objective_index]
    vals = objective_vals[best_objective_index]
    # observe new values
    new_x = unnormalize(candidates.detach(), bounds=problem.bounds)
    new_obj = problem(new_x)
    new_obj = new_obj[..., eval_objective_indices]
    return new_x, new_obj, eval_objective_indices
import numpy as np
from botorch.utils.multi_objective.box_decompositions.non_dominated import (
    FastNondominatedPartitioning,
)
from botorch.utils.multi_objective.pareto import _is_non_dominated_loop
from gpytorch import settings
try:
    from pymoo.algorithms.nsga2 import NSGA2
    from pymoo.model.problem import Problem
    from pymoo.optimize import minimize
    from pymoo.util.termination.max_gen import MaximumGenerationTermination

    def get_model_identified_hv_maximizing_set(
        model,
        population_size=250,
        max_gen=100,
    ):
        """Optimize the posterior mean using NSGA-II."""
        tkwargs = {
            "dtype": problem.ref_point.dtype,
            "device": problem.ref_point.device,
        }
        dim = problem.dim

        class PosteriorMeanPymooProblem(Problem):
            def __init__(self):
                super().__init__(
                    n_var=dim,
                    n_obj=problem.num_objectives,
                    type_var=np.double,
                )
                self.xl = np.zeros(dim)
                self.xu = np.ones(dim)

            def _evaluate(self, x, out, *args, **kwargs):
                X = torch.from_numpy(x).to(**tkwargs)
                is_fantasy_model = (
                    isinstance(model, ModelListGP)
                    and model.models[0].train_targets.ndim > 2
                ) or (
                    not isinstance(model, ModelListGP) and model.train_targets.ndim > 2
                )
                with torch.no_grad():
                    with settings.cholesky_max_tries(9):
                        # eval in batch mode
                        y = model.posterior(X.unsqueeze(-2)).mean.squeeze(-2)
                    if is_fantasy_model:
                        y = y.mean(dim=-2)
                out["F"] = -y.cpu().numpy()

        pymoo_problem = PosteriorMeanPymooProblem()
        algorithm = NSGA2(
            pop_size=population_size,
            eliminate_duplicates=True,
        )
        res = minimize(
            pymoo_problem,
            algorithm,
            termination=MaximumGenerationTermination(max_gen),
            # seed=0,  # fix seed
            verbose=False,
        )
        X = torch.tensor(
            res.X,
            **tkwargs,
        )
        X = unnormalize(X, problem.bounds)
        Y = problem(X)
        # compute HV
        partitioning = FastNondominatedPartitioning(ref_point=problem.ref_point, Y=Y)
        return partitioning.compute_hypervolume().item()

except ImportError:
    NUM_DISCRETE_POINTS = 100 if SMOKE_TEST else 100000
    CHUNK_SIZE = 512

    def get_model_identified_hv_maximizing_set(
        model,
    ):
        """Optimize the posterior mean over a discrete set."""
        tkwargs = {
            "dtype": problem.ref_point.dtype,
            "device": problem.ref_point.device,
        }
        dim = problem.dim
        discrete_set = torch.rand(NUM_DISCRETE_POINTS, dim, **tkwargs)
        with torch.no_grad():
            preds_list = []
            for start in range(0, NUM_DISCRETE_POINTS, CHUNK_SIZE):
                preds = model.posterior(
                    discrete_set[start : start + CHUNK_SIZE].unsqueeze(-2)
                ).mean.squeeze(-2)
                preds_list.append(preds)
        preds = torch.cat(preds_list, dim=0)
        pareto_mask = _is_non_dominated_loop(preds)
        pareto_X = discrete_set[pareto_mask]
        pareto_X = unnormalize(pareto_X, problem.bounds)
        Y = problem(pareto_X)
        # compute HV
        partitioning = FastNondominatedPartitioning(ref_point=problem.ref_point, Y=Y)
        return partitioning.compute_hypervolume().item()
The Bayesian optimization "loop" for a batch size of 1 simply iterates the following steps: 1) given the surrogate models, select a new candidate design (and, in the case of HVKG, the objective to evaluate it on); 2) evaluate the selected objective(s) at the candidate design; 3) refit the surrogate models on the augmented data.
The loop will continue to run until a pre-specified evaluation budget (in terms of cost) is exhausted.
import time
import warnings
from botorch import fit_gpytorch_mll
from botorch.exceptions import BadInitialCandidatesWarning
from botorch.sampling.normal import SobolQMCNormalSampler
warnings.filterwarnings("ignore", category=BadInitialCandidatesWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)
MC_SAMPLES = 128 if not SMOKE_TEST else 16
COST_BUDGET = 90 if not SMOKE_TEST else 54
torch.manual_seed(0)
verbose = True
N_INIT = 2 * (problem.dim + 1)
total_cost = {"hvkg": 0.0, "qnehvi": 0.0, "random": 0.0}
# call helper functions to generate initial training data and initialize model
train_x_hvkg, train_obj_hvkg = generate_initial_data(n=N_INIT)
train_obj_hvkg_list = list(train_obj_hvkg.split(1, dim=-1))
train_x_hvkg_list = [train_x_hvkg] * len(train_obj_hvkg_list)
mll_hvkg, model_hvkg = initialize_model(train_x_hvkg_list, train_obj_hvkg_list)
# use copies of the initial data so that each method's training data can be
# updated independently in the optimization loop below
train_obj_random_list = list(train_obj_hvkg_list)
train_x_random_list = list(train_x_hvkg_list)
train_x_qnehvi_list, train_obj_qnehvi_list = (
    list(train_x_hvkg_list),
    list(train_obj_hvkg_list),
)
cost_hvkg = cost_model(train_x_hvkg).sum(dim=-1)
total_cost["hvkg"] += cost_hvkg.sum().item()
cost_qnehvi = cost_hvkg
cost_random = cost_hvkg
total_cost["qnehvi"] = total_cost["hvkg"]
total_cost["random"] = total_cost["hvkg"]
mll_qnehvi, model_qnehvi = initialize_model(train_x_qnehvi_list, train_obj_qnehvi_list)
mll_random, model_random = initialize_model(train_x_random_list, train_obj_random_list)
# fit the models
fit_gpytorch_mll(mll_hvkg)
fit_gpytorch_mll(mll_qnehvi)
fit_gpytorch_mll(mll_random)
# compute hypervolume
hv = get_model_identified_hv_maximizing_set(model=model_qnehvi)
hvs_hvkg, hvs_qnehvi, hvs_random = [hv], [hv], [hv]
if verbose:
    print(
        f"\nInitial: Hypervolume (random, qHVKG, qNEHVI) = "
        f"({hvs_random[-1]:>4.2f}, {hvs_hvkg[-1]:>4.2f}, {hvs_qnehvi[-1]:>4.2f}).",
        end="",
    )
# run N_BATCH rounds of BayesOpt after the initial random batch
iteration = 0
active_algos = {k for k, v in total_cost.items() if v < COST_BUDGET}
while any(v < COST_BUDGET for v in total_cost.values()):
    t0 = time.monotonic()
    if "hvkg" in active_algos:
        # generate candidates
        (
            new_x_hvkg,
            new_obj_hvkg,
            eval_objective_indices_hvkg,
        ) = optimize_HVKG_and_get_obs_decoupled(
            model_hvkg,
        )
        # update training points
        for i in eval_objective_indices_hvkg:
            train_x_hvkg_list[i] = torch.cat([train_x_hvkg_list[i], new_x_hvkg])
            train_obj_hvkg_list[i] = torch.cat(
                [train_obj_hvkg_list[i], new_obj_hvkg], dim=0
            )
        # update costs
        all_outcome_cost = cost_model(new_x_hvkg)
        new_cost_hvkg = all_outcome_cost[..., eval_objective_indices_hvkg].sum(dim=-1)
        cost_hvkg = torch.cat([cost_hvkg, new_cost_hvkg], dim=0)
        total_cost["hvkg"] += new_cost_hvkg.sum().item()
        # fit models
        mll_hvkg, model_hvkg = initialize_model(train_x_hvkg_list, train_obj_hvkg_list)
        fit_gpytorch_mll(mll_hvkg)
    if "qnehvi" in active_algos:
        qnehvi_sampler = SobolQMCNormalSampler(sample_shape=torch.Size([MC_SAMPLES]))
        # generate candidates
        new_x_qnehvi, new_obj_qnehvi = optimize_qnehvi_and_get_observation(
            model_qnehvi, train_x_qnehvi_list[0], qnehvi_sampler
        )
        # update training points
        for i in objective_indices:
            train_x_qnehvi_list[i] = torch.cat([train_x_qnehvi_list[i], new_x_qnehvi])
            train_obj_qnehvi_list[i] = torch.cat(
                [train_obj_qnehvi_list[i], new_obj_qnehvi[..., i : i + 1]]
            )
        # update costs
        new_cost_qnehvi = cost_model(new_x_qnehvi).sum(dim=-1)
        cost_qnehvi = torch.cat([cost_qnehvi, new_cost_qnehvi], dim=0)
        total_cost["qnehvi"] += new_cost_qnehvi.sum().item()
        # fit models
        mll_qnehvi, model_qnehvi = initialize_model(
            train_x_qnehvi_list, train_obj_qnehvi_list
        )
        fit_gpytorch_mll(mll_qnehvi)
    if "random" in active_algos:
        # generate candidates
        new_x_random, new_obj_random = generate_initial_data(n=BATCH_SIZE)
        # update training points
        for i in objective_indices:
            train_x_random_list[i] = torch.cat([train_x_random_list[i], new_x_random])
            train_obj_random_list[i] = torch.cat(
                [train_obj_random_list[i], new_obj_random[..., i : i + 1]]
            )
        # update costs
        new_cost_random = cost_model(new_x_random).sum(dim=-1)
        cost_random = torch.cat([cost_random, new_cost_random], dim=0)
        total_cost["random"] += new_cost_random.sum().item()
        # fit models
        mll_random, model_random = initialize_model(
            train_x_random_list, train_obj_random_list
        )
        fit_gpytorch_mll(mll_random)
    # compute hypervolume
    for label, model, hv_list in zip(
        ["hvkg", "qnehvi", "random"],
        [model_hvkg, model_qnehvi, model_random],
        [hvs_hvkg, hvs_qnehvi, hvs_random],
    ):
        if label in active_algos:
            hv = get_model_identified_hv_maximizing_set(model=model)
            hv_list.append(hv)
        else:
            # no update performed
            hv_list.append(hv_list[-1])
    t1 = time.monotonic()
    if verbose:
        print(
            f"\nBatch {iteration:>2}: Costs (random, qHVKG, qNEHVI) = "
            f"({total_cost['random']:>4.2f}, {total_cost['hvkg']:>4.2f}, {total_cost['qnehvi']:>4.2f}). "
        )
        print(
            f"\nHypervolume (random, qHVKG, qNEHVI) = "
            f"({hvs_random[-1]:>4.2f}, {hvs_hvkg[-1]:>4.2f}, {hvs_qnehvi[-1]:>4.2f}), "
            f"time = {t1-t0:>4.2f}.",
            end="",
        )
    else:
        print(".", end="")
    iteration += 1
    active_algos = {k for k, v in total_cost.items() if v < COST_BUDGET}
Initial: Hypervolume (random, qHVKG, qNEHVI) = (89.34, 89.34, 89.34).
The plot below shows a common metric of multi-objective optimization performance, the log hypervolume difference: the log difference between the hypervolume of the true Pareto front and the hypervolume of the Pareto set of designs identified by each algorithm. The log hypervolume difference is plotted against the cumulative evaluation cost. This is also known as inference regret.
The plot shows that HVKG identifies the Pareto optimal designs much faster than $q$NEHVI and Sobol.
from matplotlib import pyplot as plt
%matplotlib inline
log_hv_difference_hvkg = np.log10(problem.max_hv - np.asarray(hvs_hvkg))
log_hv_difference_qnehvi = np.log10(problem.max_hv - np.asarray(hvs_qnehvi))
log_hv_difference_rnd = np.log10(problem.max_hv - np.asarray(hvs_random))
fig, ax = plt.subplots(1, 1, figsize=(8, 6))
running_cost_random = np.cumsum(cost_random.cpu()[N_INIT-1:])
running_cost_qnehvi = np.cumsum(cost_qnehvi.cpu()[N_INIT-1:])
running_cost_hvkg = np.cumsum(cost_hvkg.cpu()[N_INIT-1:])
ax.errorbar(
    running_cost_random,
    log_hv_difference_rnd[: len(running_cost_random)],
    label="Sobol",
    linewidth=1.5,
    ls="--",
    marker="s",
)
ax.errorbar(
    running_cost_qnehvi,
    log_hv_difference_qnehvi[: len(running_cost_qnehvi)],
    label="qNEHVI",
    linewidth=1.5,
    ls="--",
    marker="o",
)
ax.errorbar(
    running_cost_hvkg,
    log_hv_difference_hvkg[: len(running_cost_hvkg)],
    label="HVKG",
    linewidth=1.5,
    ls="--",
    marker="d",
)
ax.set(
    xlabel="Cost",
    ylabel="Log Hypervolume Difference",
)
ax.legend(loc="upper right")