Information-theoretic acquisition functions
This notebook illustrates the use of some information-theoretic acquisition functions in BoTorch for single and multi-objective optimization. We present a single-objective example in section 1 and a multi-objective example in section 2. Before introducing these examples, we present an overview of the different approaches and how they are estimated.
Notation
We consider the problem of maximizing a function $f: \mathbb{X} \rightarrow \mathbb{R}^M$. In the single-objective setting ($M = 1$), the maximum is defined as usual with respect to the total ordering over the real numbers. In the multi-objective setting ($M > 1$), the maximum is defined with respect to the Pareto partial ordering over vectors. By an abuse of notation, we denote the optimal set of inputs and outputs by

$$\mathbb{X}^* = \underset{\mathbf{x} \in \mathbb{X}}{\arg\max} \; f(\mathbf{x}) \subseteq \mathbb{X} \quad \text{and} \quad \mathbb{Y}^* = f(\mathbb{X}^*) = \underset{\mathbf{x} \in \mathbb{X}}{\max} \; f(\mathbf{x}) \subseteq \mathbb{R}^M,$$

respectively, for both the single and multi-objective setting. We denote the collection of optimal input-output pairs by $(\mathbb{X}^*, \mathbb{Y}^*)$.
Information-theoretic acquisition functions
Information-theoretic (IT) acquisition functions work by quantifying the utility of an input $\mathbf{x} \in \mathbb{X}$ based on how "informative" the corresponding observation $\mathbf{y} \in \mathbb{R}^M$ will be in learning more about the distribution of some statistic of the function, $S(f)$. Here, we define the notion of information via the mutual information ($\text{MI}$):

$$\alpha^{\text{IT}}(\mathbf{x}|D_n) = \text{MI}(\mathbf{y}; S(f) \,|\, \mathbf{x}, D_n) = H[p(\mathbf{y}|\mathbf{x}, D_n)] - \mathbb{E}_{p(S(f)|D_n)}\big[H[p(\mathbf{y}|\mathbf{x}, D_n, S(f))]\big],$$

where $D_n = \{(\mathbf{x}_t, \mathbf{y}_t)\}_{t=1}^n$ denotes the data set of sampled inputs and observations, and the function $H[p(\mathbf{x})] = \mathbb{E}_{p(\mathbf{x})}[-\log p(\mathbf{x})]$ denotes the differential entropy. The main difference between existing information-theoretic acquisition functions in the literature is the choice of statistic $S(f)$ and the modelling assumptions that are made in order to estimate the resulting acquisition function. In this notebook, we focus on three particular cases of information-theoretic acquisition functions:
Predictive Entropy Search (PES)
The PES acquisition function [1] considers the problem of learning more about the distribution of the optimal inputs, $S(f) = \mathbb{X}^*$:

$$\alpha^{\text{PES}}(\mathbf{x}|D_n) = \text{MI}(\mathbf{y}; \mathbb{X}^* \,|\, \mathbf{x}, D_n).$$
Max-value Entropy Search (MES)
The MES acquisition function [2] considers the problem of learning more about the distribution of the optimal outputs, $S(f) = \mathbb{Y}^*$:

$$\alpha^{\text{MES}}(\mathbf{x}|D_n) = \text{MI}(\mathbf{y}; \mathbb{Y}^* \,|\, \mathbf{x}, D_n).$$
Joint Entropy Search (JES)
The JES acquisition function [3] considers the problem of learning more about the distribution of the optimal inputs and outputs, $S(f) = (\mathbb{X}^*, \mathbb{Y}^*)$:

$$\alpha^{\text{JES}}(\mathbf{x}|D_n) = \text{MI}(\mathbf{y}; (\mathbb{X}^*, \mathbb{Y}^*) \,|\, \mathbf{x}, D_n).$$
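In BoTorch, these correspond to the acquisition classes qPredictiveEntropySearch (and qMultiObjectivePredictiveEntropySearch), qMaxValueEntropy, and qJointEntropySearch. The following is a minimal single-objective sketch on toy data of our own choosing, not the notebook's exact code; the helper get_optimal_samples and the constructor arguments shown here may differ slightly across BoTorch versions.

```python
import torch
from botorch.acquisition.joint_entropy_search import qJointEntropySearch
from botorch.acquisition.max_value_entropy_search import qMaxValueEntropy
from botorch.acquisition.predictive_entropy_search import qPredictiveEntropySearch
from botorch.acquisition.utils import get_optimal_samples
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

torch.manual_seed(0)
bounds = torch.tensor([[0.0], [1.0]], dtype=torch.double)
train_X = torch.rand(10, 1, dtype=torch.double)
train_Y = torch.sin(6 * train_X)  # toy objective

model = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

# Sample optimal pairs (X*, Y*) from the posterior (see the Estimation section).
optimal_inputs, optimal_outputs = get_optimal_samples(model, bounds=bounds, num_optima=8)

# PES conditions on the sampled optimal inputs X*.
pes = qPredictiveEntropySearch(model=model, optimal_inputs=optimal_inputs)
# MES conditions on the sampled optimal outputs Y*; the candidate set is used
# internally when sampling the max values.
mes = qMaxValueEntropy(model=model, candidate_set=torch.rand(1000, 1, dtype=torch.double))
# JES conditions on the sampled optimal pairs (X*, Y*).
jes = qJointEntropySearch(
    model=model, optimal_inputs=optimal_inputs, optimal_outputs=optimal_outputs
)

X_test = torch.rand(5, 1, 1, dtype=torch.double)  # 5 candidates with q=1
print(pes(X_test).shape, mes(X_test).shape, jes(X_test).shape)
```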
Estimation
In order to estimate the three acquisition functions listed above, we make two simplifying assumptions:
[Assumption 1] We assume an independent Gaussian process prior on each objective function.
[Assumption 2] We assume a Gaussian observation likelihood.
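In code, these assumptions amount to the standard BoTorch model setup: each objective gets its own SingleTaskGP, which uses a Gaussian likelihood by default (Assumption 2), and the per-objective models are collected in a ModelListGP (Assumption 1). A small illustrative sketch with two toy objectives of our own choosing:

```python
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import ModelListGP, SingleTaskGP
from gpytorch.mlls import SumMarginalLogLikelihood

train_X = torch.rand(10, 2, dtype=torch.double)
# Two toy objectives; each is modelled by an independent GP (Assumption 1).
train_Y = torch.stack([train_X.sum(-1), -(train_X**2).sum(-1)], dim=-1)

# SingleTaskGP uses a GaussianLikelihood by default (Assumption 2).
model = ModelListGP(
    *[SingleTaskGP(train_X, train_Y[:, i : i + 1]) for i in range(train_Y.shape[-1])]
)
fit_gpytorch_mll(SumMarginalLogLikelihood(model.likelihood, model))
```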
First term
Under the modelling assumptions, the first term in each of the acquisition functions is the differential entropy of a Gaussian random variable, which is analytically tractable.
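Concretely, in the single-objective case, if the posterior predictive distribution of the observation $y$ at $\mathbf{x}$ is Gaussian with variance $\sigma_n^2(\mathbf{x})$ (the posterior variance of $f$ plus the observation noise variance), then

$$H[p(y|\mathbf{x}, D_n)] = \frac{1}{2} \log\big(2\pi e \, \sigma_n^2(\mathbf{x})\big).$$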
Second term
The second term in each of the acquisition functions is an expectation of an entropy over an intractable distribution. The expectation can be estimated using Monte Carlo, whilst the entropy has to be approximated using different strategies such as moment-matching.
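Concretely, given samples $S_1, \dots, S_J \sim p(S(f)|D_n)$, the Monte Carlo estimate of the second term is

$$\mathbb{E}_{p(S(f)|D_n)}\big[H[p(\mathbf{y}|\mathbf{x}, D_n, S(f))]\big] \approx \frac{1}{J} \sum_{j=1}^{J} H[p(\mathbf{y}|\mathbf{x}, D_n, S_j)],$$

where each conditional entropy in the sum still has to be approximated.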
Monte Carlo. To sample from the distribution over the optimal points, we can first (approximately) sample a collection of paths from the posterior, $f_j \sim p(f|D_n)$, and then optimize them to obtain the samples of optimal points $(\mathbb{X}^*_j, \mathbb{Y}^*_j)$ for $j = 1, \dots, J$.
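As a rough illustration of this step (a sketch under our own toy setup, not the notebook's exact code), the snippet below draws approximate posterior paths with BoTorch's pathwise (Matheron) sampling utilities and maximizes each path over a dense grid; in practice, a gradient-based optimizer is used instead of a grid, e.g. via the get_optimal_samples helper used earlier.

```python
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.sampling.pathwise import draw_matheron_paths
from gpytorch.mlls import ExactMarginalLogLikelihood

# A fresh single-objective toy model (same setup as the earlier sketch).
train_X = torch.rand(10, 1, dtype=torch.double)
train_Y = torch.sin(6 * train_X)
model = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

J = 16  # number of Monte Carlo samples of the optimal points
paths = draw_matheron_paths(model, sample_shape=torch.Size([J]))

# Evaluate each approximate posterior path on a dense grid and take its
# maximizer as one sample of (x*, y*).
grid = torch.linspace(0, 1, 1001, dtype=torch.double).unsqueeze(-1)  # 1001 x 1
path_values = paths(grid)  # J x 1001: one draw of f over the grid per path
optimal_outputs, best_idx = path_values.max(dim=-1)
optimal_inputs = grid[best_idx]                  # J x 1 samples of X*
optimal_outputs = optimal_outputs.unsqueeze(-1)  # J x 1 samples of Y*
```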
PES entropy estimate. In qPredictiveEntropySearch and qMultiObjectivePredictiveEntropySearch, we approximate the entropy term arising in PES using the expectation propagation strategy described in [4]. In particular, we first relax the global optimality condition:

$$H[p(\mathbf{y}|\mathbf{x}, D_n, \mathbb{X}^*)] = H[p(\mathbf{y}|\mathbf{x}, D_n, f(\mathbf{x}') \preceq f(\mathbb{X}^*) \text{ for all } \mathbf{x}' \in \mathbb{X})].$$
(1) This statement follows from the observation that conditioning on the optimal points is equivalent to knowing that all points lie below the objective values at the optimal inputs: $f(\mathbf{x}) \preceq f(\mathbf{x}^*)$ for all $\mathbf{x} \in \mathbb{X}$ and $\mathbf{x}^* \in \mathbb{X}^*$.
(2) We replace the global optimality condition with the local optimality condition: $\nabla f(\mathbf{x}^*) = \mathbf{0}$ and $\nabla^2 f(\mathbf{x}^*) \preceq 0$ for all $\mathbf{x}^* \in \mathbb{X}^*$, where $\nabla$ and $\nabla^2$ denote the gradient and the Hessian of $f$, respectively.
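Under expectation propagation, the resulting relaxed conditional distribution is moment-matched to a multivariate Gaussian, so the conditional entropy reduces to the tractable Gaussian form (here $\Sigma^{\text{EP}}(\mathbf{x})$ is our illustrative notation for the covariance of the EP approximation):

$$H[p(\mathbf{y}|\mathbf{x}, D_n, \mathbb{X}^*)] \approx \frac{1}{2} \log \det\big(2\pi e \, \Sigma^{\text{EP}}(\mathbf{x})\big).$$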