Bayesian optimization with input warping
BO with Warped Gaussian Processes
In this tutorial, we illustrate how to use learned input warping functions for robust Bayesian optimization when the outcome may be a non-stationary function. When the lengthscales are non-stationary in the raw input space, it can be useful to learn a warping function that maps raw inputs to a warped space where the lengthscales are stationary, because standard stationary kernels can then model the function effectively.
In general, for a relatively simple setup (like this one), we recommend using
Ax, since this will simplify your setup (including the amount of code
you need to write) considerably. See Ax's
Modular BoTorch tutorial. To
use input warping with MODULAR_BOTORCH, we can pass the warp_tf, constructed as
below, by adding input_transform=warp_tf argument to the Surrogate(...) call.
We use a Kumaraswamy CDF as the class of input warping functions and learn the concentration parameters ($a > 0$ and $b > 0$). Kumaraswamy CDFs are quite flexible and map inputs in [0, 1] to outputs in [0, 1]. This work follows the Beta CDF input warping proposed by Snoek et al., but replaces the Beta distribution with the Kumaraswamy distribution, which has a differentiable and closed-form CDF.
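For reference, the closed form mentioned above can be written out explicitly. The symbol names ($K$ for the CDF, $a$ and $b$ for the concentration parameters) follow the usual convention for the Kumaraswamy distribution and are an assumption here, not notation fixed by this tutorial:

```latex
K(x;\, a, b) = 1 - \left(1 - x^{a}\right)^{b},
\qquad x \in [0, 1],\; a > 0,\; b > 0,
\qquad
\frac{\partial K}{\partial x} = a\, b\, x^{a-1} \left(1 - x^{a}\right)^{b-1}.
```

With $a = b = 1$ the warping reduces to the identity, $K(x; 1, 1) = x$, so an unwarped model is a special case of the warped one.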
This enables maximum likelihood (or maximum a posteriori) estimation of the CDF hyperparameters using gradient methods to maximize the likelihood (or posterior probability) jointly with the GP hyperparameters. (Snoek et al. use a fully Bayesian treatment of the CDF parameters.) Each input dimension is transformed using a separate warping function.
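To make the per-dimension warping concrete, here is a minimal plain-Python sketch. It is only an illustration of the transformation itself, not the BoTorch implementation: the function names `kumaraswamy_cdf` and `warp_point` and the example parameter values are hypothetical, and the parameters are fixed by hand here rather than learned jointly with the GP hyperparameters.

```python
def kumaraswamy_cdf(x, a, b):
    """Kumaraswamy CDF: 1 - (1 - x**a)**b for x in [0, 1] and a, b > 0."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    return 1.0 - (1.0 - x ** a) ** b


def warp_point(x, a_params, b_params):
    """Warp each coordinate of x with its own (a, b) pair (illustrative helper)."""
    return [kumaraswamy_cdf(xi, a, b) for xi, a, b in zip(x, a_params, b_params)]


# a = b = 1 leaves the inputs unchanged (up to floating-point rounding).
identity = warp_point([0.2, 0.5], [1.0, 1.0], [1.0, 1.0])

# Different (hand-picked, hypothetical) parameters per dimension bend each
# axis differently while still mapping [0, 1] onto [0, 1].
warped = warp_point([0.2, 0.5], [2.0, 0.5], [1.0, 3.0])
```

Because the CDF is monotone and fixes the endpoints 0 and 1, the warped inputs stay inside the unit cube, which is why inputs are assumed to be normalized to [0, 1] before warping.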
We use the Log Noisy Expected Improvement (qLogNEI) acquisition function to optimize a synthetic Hartmann6 test function. The standard problem is