Information-theoretic acquisition functions
This notebook illustrates the use of some information-theoretic acquisition functions in
BoTorch for single and multi-objective optimization. We present a single-objective
example in section 1 and a multi-objective example in section 2. Before introducing
these examples, we give an overview of the different approaches and how they are
estimated.
Notation
We consider the problem of maximizing a function $f: \mathbb{X} \rightarrow \mathbb{R}^M$. In the single-objective setting ($M=1$), the maximum is defined as usual with respect to the total ordering over the real numbers. In the multi-objective setting ($M>1$), the maximum is defined with respect to the Pareto partial ordering over vectors. By an abuse of notation, we denote the optimal set of inputs and outputs by

$$\mathbb{X}^* = \underset{x \in \mathbb{X}}{\text{arg max}} \, f(x) \subseteq \mathbb{X} \quad \text{and} \quad \mathbb{Y}^* = f(\mathbb{X}^*) = \underset{x \in \mathbb{X}}{\max} \, f(x) \subset \mathbb{R}^M,$$

respectively, for both the single and multi-objective setting. We denote the collection of optimal input-output pairs by $(\mathbb{X}^*, \mathbb{Y}^*)$.
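To make the Pareto partial ordering concrete, here is a small self-contained sketch (not part of the original notebook) that computes $\mathbb{X}^*$ and $\mathbb{Y}^* = f(\mathbb{X}^*)$ for a discrete candidate set; the inputs and objective values are made up for illustration.

```python
# Toy illustration (not from the notebook): the Pareto-optimal set of a
# discrete candidate set under maximization. For M = 1 this reduces to
# the usual argmax set.

def pareto_optimal(points):
    """Return indices of points not dominated by any other point.

    A point q dominates p (for maximization) if q >= p in every
    objective and q > p in at least one objective.
    """
    def dominates(q, p):
        return all(qi >= pi for qi, pi in zip(q, p)) and any(
            qi > pi for qi, pi in zip(q, p)
        )

    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]

# A hypothetical discrete X with M = 2 objectives.
X = ["a", "b", "c", "d"]
Y = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]  # f(x) for each x

idx = pareto_optimal(Y)
X_star = [X[i] for i in idx]  # optimal inputs: "d" is dominated, the rest are not
Y_star = [Y[i] for i in idx]  # optimal outputs
print(X_star, Y_star)
```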
Information-theoretic (IT) acquisition functions work by quantifying the utility of an input $x \in \mathbb{X}$ based on how "informative" the corresponding observation $y \in \mathbb{R}^M$ will be in learning more about the distribution of some statistic of the function, $S(f)$. Here, we define the notion of information via the mutual information (MI):

$$\alpha^{\text{IT}}(x|D_n) = \text{MI}(y; S(f)|x, D_n) = H[p(y|x, D_n)] - \mathbb{E}_{p(S(f)|D_n)}[H[p(y|x, D_n, S(f))]],$$

where $D_n = \{(x_t, y_t)\}_{t=1,\dots,n}$ denotes the data set of sampled inputs and observations, and $H$ denotes the differential entropy $H[p(x)] = -\int p(x) \log(p(x)) dx$. The main difference between existing information-theoretic acquisition functions in the literature is the choice of statistic $S$ and the modelling assumptions that are made in order to estimate the resulting acquisition function. In this notebook, we focus on three particular cases of information-theoretic acquisition functions:
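Before turning to the three cases, it can help to see the entropy-reduction computation in miniature. For a one-dimensional Gaussian, the differential entropy has the closed form $H[\mathcal{N}(\mu, \sigma^2)] = \frac{1}{2}\log(2\pi e \sigma^2)$, so the mutual information above reduces to a difference involving only the predictive variances. The numbers below are hypothetical, purely for illustration:

```python
import math

def gaussian_entropy(sigma):
    """Differential entropy of N(mu, sigma^2): 0.5 * log(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)

# Hypothetical numbers (not from the notebook): predictive std of y at x
# before conditioning on S(f), and after conditioning on each of two
# equally likely realizations of S(f).
h_prior = gaussian_entropy(1.0)
h_cond = 0.5 * gaussian_entropy(0.6) + 0.5 * gaussian_entropy(0.8)

# MI(y; S(f) | x, D_n) under these toy assumptions: the expected
# reduction in differential entropy after learning S(f).
alpha = h_prior - h_cond
print(f"{alpha:.4f}")
```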
Predictive Entropy Search (PES)
The PES acquisition function [1] considers the problem of learning more about the distribution of the optimal inputs: $S(f) = \mathbb{X}^*$.

$$\alpha^{\text{PES}}(x|D_n) = \text{MI}(y; \mathbb{X}^*|x, D_n) = H[p(y|x, D_n)] - \mathbb{E}_{p(\mathbb{X}^*|D_n)}[H[p(y|x, D_n, \mathbb{X}^*)]].$$
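The expectation over $p(\mathbb{X}^*|D_n)$ is typically approximated with Monte Carlo samples of the maximizer. The sketch below is not BoTorch's implementation (which samples from a GP posterior and uses further approximations for the conditional entropy); it estimates the distribution of $\mathbb{X}^*$ for a toy model with independent normal beliefs at three hypothetical inputs:

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical posterior beliefs over f at three candidate inputs
# (independent normals for illustration only; a real model would be a
# GP with correlated predictions).
means = {"x1": 0.0, "x2": 0.5, "x3": 0.4}
stds = {"x1": 1.0, "x2": 0.2, "x3": 0.8}

# Monte Carlo approximation of p(X* | D_n): sample a realization of f,
# record which input attains the maximum, and tally the frequencies.
counts = Counter()
n_samples = 10_000
for _ in range(n_samples):
    f = {x: random.gauss(means[x], stds[x]) for x in means}
    counts[max(f, key=f.get)] += 1

p_x_star = {x: counts[x] / n_samples for x in means}
print(p_x_star)
```

PES plugs such samples of $\mathbb{X}^*$ into the expectation and approximates the conditional entropy $H[p(y|x, D_n, \mathbb{X}^*)]$ under each sample.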
Max-value Entropy Search (MES)
The MES acquisition function [2] considers the problem of learning more about the distribution of the optimal outputs: $S(f) = \mathbb{Y}^*$.

$$\alpha^{\text{MES}}(x|D_n) = \text{MI}(y; \mathbb{Y}^*|x, D_n) = H[p(y|x, D_n)] - \mathbb{E}_{p(\mathbb{Y}^*|D_n)}[H[p(y|x, D_n, \mathbb{Y}^*)]].$$
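For a single objective with a Gaussian predictive $p(y|x, D_n) = \mathcal{N}(\mu(x), \sigma^2(x))$ and noiseless observations, conditioning on $\mathbb{Y}^* = y^*$ truncates the Gaussian at $y^*$, and the entropy reduction has a closed form (this is the noiseless single-objective expression from [2]; the numbers below are hypothetical):

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def mes(mu, sigma, max_value_samples):
    """Closed-form MES for a Gaussian predictive N(mu, sigma^2) and Monte
    Carlo samples of the maximum value Y* (noiseless, M = 1):

        alpha = mean_k[ g_k * pdf(g_k) / (2 * cdf(g_k)) - log(cdf(g_k)) ],
        g_k = (y*_k - mu) / sigma.

    Assumes cdf(g_k) > 0, i.e. the sampled maxima are not far below mu.
    """
    total = 0.0
    for y_star in max_value_samples:
        g = (y_star - mu) / sigma
        total += g * norm_pdf(g) / (2 * norm_cdf(g)) - math.log(norm_cdf(g))
    return total / len(max_value_samples)

# Hypothetical values (not from the notebook): the acquisition is larger
# where the predictive mean is closer to the sampled maximum values.
print(mes(mu=0.0, sigma=1.0, max_value_samples=[2.0, 2.5]))
print(mes(mu=1.5, sigma=1.0, max_value_samples=[2.0, 2.5]))
```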
Joint Entropy Search (JES)
The JES acquisition function [3] considers the problem of learning more about the distribution of the optimal inputs and outputs: $S(f) = (\mathbb{X}^*, \mathbb{Y}^*)$.

$$\alpha^{\text{JES}}(x|D_n) = \text{MI}(y; (\mathbb{X}^*, \mathbb{Y}^*)|x, D_n) = H[p(y|x, D_n)] - \mathbb{E}_{p((\mathbb{X}^*, \mathbb{Y}^*)|D_n)}[H[p(y|x, D_n, (\mathbb{X}^*, \mathbb{Y}^*))]].$$
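The expectation in JES is taken over joint samples of the pair $(\mathbb{X}^*, \mathbb{Y}^*)$. A toy sketch of that sampling step, using the same made-up independent-normal beliefs as above (not a GP; all values hypothetical): draw a realization of $f$, maximize it, and keep both the argmax and the max.

```python
import random

random.seed(0)

# Hypothetical posterior beliefs over f at three candidate inputs
# (independent normals for illustration only).
means = {"x1": 0.0, "x2": 0.5, "x3": 0.4}
stds = {"x1": 1.0, "x2": 0.2, "x3": 0.8}

def sample_optimal_pair():
    """Draw one joint sample (X*, Y*): sample f, then maximize it."""
    f = {x: random.gauss(means[x], stds[x]) for x in means}
    x_star = max(f, key=f.get)
    return x_star, f[x_star]

pairs = [sample_optimal_pair() for _ in range(5)]
print(pairs)
```

Conditioning on both pieces is what distinguishes JES: under each sampled pair, the model must attain its maximum at $\mathbb{X}^*$ with value $\mathbb{Y}^*$, which for Gaussian models truncates the conditional predictive distribution at the sampled $\mathbb{Y}^*$.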