Information-theoretic acquisition functions
This notebook illustrates the use of some information-theoretic acquisition functions in BoTorch for single and multi-objective optimization. We present a single-objective example in section 1 and a multi-objective example in section 2. Before introducing these examples, we present an overview on the different approaches and how they are estimated.
Notation
We consider the problem of maximizing a function . In the single-objective setting (), the maximum is defined as usual with respect to the total ordering over the real numbers. In the multi-objective setting (), the maximum is defined with respect to the Pareto partial ordering over vectors. By an abuse in notation, we denote the optimal set of inputs and outputs by
respectively for both the single and multi-objective setting. We denote the collection of optimal input-output pairs by .
Information-theoretic acquisition functions
Information-theoretic (IT) acquisition functions work by quantifying the utility of an input based on how "informative" the corresponding observation will be in learning more about the distribution of some statistic of the function . Here, we define the notion of information via the mutual information ():
where denotes the data set of sampled inputs and observations and the function denotes the differential entropy . The main difference between existing information-theoretic acquisition functions in the literature is the choice of statistic and the modelling assumptions that are made in order to estimate the resulting acquisition function. In this notebook, we focus on three particular cases of information-theoretic acquisition functions:
Predictive Entropy Search (PES)
The PES acquisition function [1] considers the problem of learning more about the distribution of the optimal inputs: .