GPyTorch regression with derivative information in 2d

Introduction

In this notebook, we show how to train a GP regression model in GPyTorch of a 2-dimensional function given function values and derivative observations. We consider modeling the Franke function where the values and derivatives are contaminated with independent \(\mathcal{N}(0, 0.5)\) distributed noise.

[1]:
import torch
import gpytorch
import math
from matplotlib import cm
from matplotlib import pyplot as plt
import numpy as np

%matplotlib inline
%load_ext autoreload
%autoreload 2
/home/gpleiss/miniconda3/envs/gpytorch/lib/python3.7/site-packages/matplotlib/__init__.py:1003: UserWarning: Duplicate key in file "/home/gpleiss/.config/matplotlib/matplotlibrc", line #57
  (fname, cnt))

Franke function

The following is a vectorized implementation of the 2-dimensional Franke function (https://www.sfu.ca/~ssurjano/franke2d.html)

[2]:
def franke(X, Y):
    term1 = .75*torch.exp(-((9*X - 2).pow(2) + (9*Y - 2).pow(2))/4)
    term2 = .75*torch.exp(-((9*X + 1).pow(2))/49 - (9*Y + 1)/10)
    term3 = .5*torch.exp(-((9*X - 7).pow(2) + (9*Y - 3).pow(2))/4)
    term4 = .2*torch.exp(-(9*X - 4).pow(2) - (9*Y - 7).pow(2))

    f = term1 + term2 + term3 - term4
    dfx = -2*(9*X - 2)*9/4 * term1 - 2*(9*X + 1)*9/49 * term2 + \
          -2*(9*X - 7)*9/4 * term3 + 2*(9*X - 4)*9 * term4
    dfy = -2*(9*Y - 2)*9/4 * term1 - 9/10 * term2 + \
          -2*(9*Y - 3)*9/4 * term3 + 2*(9*Y - 7)*9 * term4

    return f, dfx, dfy

Setting up the training data

We use a grid with 100 points in \([0,1] \times [0,1]\) with 10 uniformly distributed points per dimension.

[3]:
xv, yv = torch.meshgrid([torch.linspace(0, 1, 10), torch.linspace(0, 1, 10)])
train_x = torch.cat((
    xv.contiguous().view(xv.numel(), 1),
    yv.contiguous().view(yv.numel(), 1)),
    dim=1
)

f, dfx, dfy = franke(train_x[:, 0], train_x[:, 1])
train_y = torch.stack([f, dfx, dfy], -1).squeeze(1)
train_y += 0.05 * torch.randn(train_y.size())

Setting up the model

A GP prior on the function values implies a multi-output GP prior on the function values and the partial derivatives, see 9.4 in http://www.gaussianprocess.org/gpml/chapters/RW9.pdf for more details. This allows using a MultitaskMultivariateNormal and MultitaskGaussianLikelihood to train a GP model from both function values and gradients. The resulting RBF kernel that models the covariance between the values and partial derivatives has been implemented in RBFKernelGrad and the extension of a constant mean is implemented in ConstantMeanGrad.

The RBFKernelGrad is generally worse conditioned than the RBFKernel, so we place a lower bound on the noise parameter to keep the smallest eigenvalues of the kernel matrix away from zero.

[4]:
class GPModelWithDerivatives(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(GPModelWithDerivatives, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMeanGrad()
        self.base_kernel = gpytorch.kernels.RBFKernelGrad()
        self.covar_module = gpytorch.kernels.ScaleKernel(self.base_kernel)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=3)  # Value + x-derivative + y-derivative
likelihood.initialize(noise=0.01*train_y.std())  # Require 1% noise (approximately)
likelihood.raw_noise.requires_grad = False
model = GPModelWithDerivatives(train_x, train_y, likelihood)

Training the model

The model training is similar to training a standard GP regression model

[5]:
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

n_iter = 50
for i in range(n_iter):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f   noise: %.3f' % (
        i + 1, n_iter, loss.item(),
        model.covar_module.base_kernel.lengthscale.item(),
        model.likelihood.noise.item()
    ))
    optimizer.step()
Iter 1/50 - Loss: 110.599   lengthscale: 0.693   noise: 0.009
Iter 2/50 - Loss: 108.303   lengthscale: 0.644   noise: 0.009
Iter 3/50 - Loss: 103.957   lengthscale: 0.598   noise: 0.009
Iter 4/50 - Loss: 99.515   lengthscale: 0.554   noise: 0.009
Iter 5/50 - Loss: 95.034   lengthscale: 0.513   noise: 0.009
Iter 6/50 - Loss: 92.725   lengthscale: 0.473   noise: 0.009
Iter 7/50 - Loss: 89.589   lengthscale: 0.436   noise: 0.009
Iter 8/50 - Loss: 82.739   lengthscale: 0.402   noise: 0.009
Iter 9/50 - Loss: 76.709   lengthscale: 0.370   noise: 0.009
Iter 10/50 - Loss: 74.856   lengthscale: 0.340   noise: 0.009
Iter 11/50 - Loss: 69.440   lengthscale: 0.313   noise: 0.009
Iter 12/50 - Loss: 69.494   lengthscale: 0.291   noise: 0.009
Iter 13/50 - Loss: 70.631   lengthscale: 0.277   noise: 0.009
Iter 14/50 - Loss: 65.169   lengthscale: 0.272   noise: 0.009
Iter 15/50 - Loss: 61.315   lengthscale: 0.273   noise: 0.009
Iter 16/50 - Loss: 54.774   lengthscale: 0.279   noise: 0.009
Iter 17/50 - Loss: 50.667   lengthscale: 0.287   noise: 0.009
Iter 18/50 - Loss: 45.541   lengthscale: 0.297   noise: 0.009
Iter 19/50 - Loss: 42.602   lengthscale: 0.306   noise: 0.009
Iter 20/50 - Loss: 42.981   lengthscale: 0.313   noise: 0.009
Iter 21/50 - Loss: 36.291   lengthscale: 0.316   noise: 0.009
Iter 22/50 - Loss: 35.341   lengthscale: 0.315   noise: 0.009
Iter 23/50 - Loss: 30.169   lengthscale: 0.309   noise: 0.009
Iter 24/50 - Loss: 26.163   lengthscale: 0.301   noise: 0.009
Iter 25/50 - Loss: 21.606   lengthscale: 0.290   noise: 0.009
Iter 26/50 - Loss: 20.486   lengthscale: 0.279   noise: 0.009
Iter 27/50 - Loss: 17.299   lengthscale: 0.270   noise: 0.009
Iter 28/50 - Loss: 14.477   lengthscale: 0.265   noise: 0.009
Iter 29/50 - Loss: 12.003   lengthscale: 0.265   noise: 0.009
Iter 30/50 - Loss: 5.255   lengthscale: 0.268   noise: 0.009
Iter 31/50 - Loss: 3.595   lengthscale: 0.270   noise: 0.009
Iter 32/50 - Loss: 0.765   lengthscale: 0.270   noise: 0.009
Iter 33/50 - Loss: -5.474   lengthscale: 0.270   noise: 0.009
Iter 34/50 - Loss: -8.094   lengthscale: 0.268   noise: 0.009
Iter 35/50 - Loss: -8.584   lengthscale: 0.264   noise: 0.009
Iter 36/50 - Loss: -16.166   lengthscale: 0.258   noise: 0.009
Iter 37/50 - Loss: -16.839   lengthscale: 0.251   noise: 0.009
Iter 38/50 - Loss: -16.507   lengthscale: 0.245   noise: 0.009
Iter 39/50 - Loss: -19.224   lengthscale: 0.242   noise: 0.009
Iter 40/50 - Loss: -21.473   lengthscale: 0.240   noise: 0.009
Iter 41/50 - Loss: -22.681   lengthscale: 0.238   noise: 0.009
Iter 42/50 - Loss: -27.049   lengthscale: 0.234   noise: 0.009
Iter 43/50 - Loss: -25.780   lengthscale: 0.233   noise: 0.009
Iter 44/50 - Loss: -26.668   lengthscale: 0.233   noise: 0.009
Iter 45/50 - Loss: -30.996   lengthscale: 0.234   noise: 0.009
Iter 46/50 - Loss: -34.794   lengthscale: 0.233   noise: 0.009
Iter 47/50 - Loss: -34.021   lengthscale: 0.228   noise: 0.009
Iter 48/50 - Loss: -34.319   lengthscale: 0.222   noise: 0.009
Iter 49/50 - Loss: -34.365   lengthscale: 0.215   noise: 0.009
Iter 50/50 - Loss: -31.525   lengthscale: 0.209   noise: 0.009

Making predictions with the model

Model predictions are also similar to GP regression with only function values, but we need more CG iterations to get accurate estimates of the predictive variance

[6]:
# Set into eval mode
model.eval()
likelihood.eval()

# Initialize plots
fig, ax = plt.subplots(2, 3, figsize=(14, 10))

# Test points
n1, n2 = 50, 50
xv, yv = torch.meshgrid([torch.linspace(0, 1, n1), torch.linspace(0, 1, n2)])
f, dfx, dfy = franke(xv, yv)

# Make predictions
with torch.no_grad(), gpytorch.settings.max_cg_iterations(50):
    test_x = torch.stack([xv.reshape(n1*n2, 1), yv.reshape(n1*n2, 1)], -1).squeeze(1)
    predictions = likelihood(model(test_x))
    mean = predictions.mean

extent = (xv.min(), xv.max(), yv.max(), yv.min())
ax[0, 0].imshow(f, extent=extent, cmap=cm.jet)
ax[0, 0].set_title('True values')
ax[0, 1].imshow(dfx, extent=extent, cmap=cm.jet)
ax[0, 1].set_title('True x-derivatives')
ax[0, 2].imshow(dfy, extent=extent, cmap=cm.jet)
ax[0, 2].set_title('True y-derivatives')

ax[1, 0].imshow(mean[:, 0].detach().numpy().reshape(n1, n2), extent=extent, cmap=cm.jet)
ax[1, 0].set_title('Predicted values')
ax[1, 1].imshow(mean[:, 1].detach().numpy().reshape(n1, n2), extent=extent, cmap=cm.jet)
ax[1, 1].set_title('Predicted x-derivatives')
ax[1, 2].imshow(mean[:, 2].detach().numpy().reshape(n1, n2), extent=extent, cmap=cm.jet)
ax[1, 2].set_title('Predicted y-derivatives')

None
../../_images/examples_10_GP_Regression_Derivative_Information_Simple_GP_Regression_Derivative_Information_2d_11_0.png
[ ]: