# Sparse Gaussian Process Regression (SGPR)

## Overview

In this notebook, we give an overview of how to use SGPR, the sparse GP regression method of Titsias (2009), http://proceedings.mlr.press/v5/titsias09a/titsias09a.pdf, in which the inducing point locations are learned.

[1]:

import math
import torch
import gpytorch
from matplotlib import pyplot as plt

# Make plots inline
%matplotlib inline


For this example notebook, we’ll be using the elevators UCI dataset used in the paper. Running the next cell downloads a copy of the dataset that has already been scaled and normalized appropriately. We simply split the data in order, using the first 80% for training and the last 20% for testing.

Note: Running the next cell will attempt to download a ~400 KB dataset file to the current directory.

[2]:

import urllib.request
import os.path
from scipy.io import loadmat
from math import floor

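if not os.path.isfile('elevators.mat'):
    # Download step elided here: the file must already be in the current directory
    raise FileNotFoundError('elevators.mat not found in the current directory')

# Assumption: the .mat file stores the full table under the key 'data'
data = torch.Tensor(loadmat('elevators.mat')['data'])

# All columns except the last are features; the last column is the target
X = data[:, :-1]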
X = X - X.min(0)[0]
X = 2 * (X / X.max(0)[0]) - 1
y = data[:, -1]

train_n = int(floor(0.8*len(X)))

train_x = X[:train_n, :].contiguous().cuda()
train_y = y[:train_n].contiguous().cuda()

test_x = X[train_n:, :].contiguous().cuda()
test_y = y[train_n:].contiguous().cuda()
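
The rescaling and split in the cell above can be checked on a toy table. This is a minimal sketch, assuming plain Python lists in place of the torch tensors so that it runs without a GPU; `scale_columns` and the `toy` data are hypothetical names introduced only for illustration.

```python
from math import floor

def scale_columns(rows):
    """Rescale every column of a list-of-rows table to the range [-1, 1]."""
    cols = list(zip(*rows))
    scaled_cols = []
    for col in cols:
        lo, hi = min(col), max(col)
        span = hi - lo
        # Assumes hi > lo; a constant column would divide by zero here
        scaled_cols.append([2 * (v - lo) / span - 1 for v in col])
    return [list(row) for row in zip(*scaled_cols)]

# Hypothetical toy data: 5 rows, 2 features
toy = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0], [4.0, 50.0]]
scaled = scale_columns(toy)

# First 80% of rows for training, remaining 20% for testing
train_n = int(floor(0.8 * len(scaled)))
train_rows, test_rows = scaled[:train_n], scaled[train_n:]
print(train_rows[0], test_rows)
```

Because the split is positional rather than shuffled, the test set is always the tail of the file; that matches the notebook but is worth keeping in mind if the rows are ordered in some meaningful way.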

[3]:

X.size()

[3]:

torch.Size([16599, 18])


## Defining the GP Model

We now define the GP model. For more details on the use of GP models, see our simpler examples. This model constructs a base scaled RBF kernel, and then simply wraps it in an InducingPointKernel. Other than this, everything should look the same as in the simple GP models.
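
For reference, the approximation from the cited paper can be summarized as below. The notation is introduced here and does not appear in the notebook: $X$ denotes the $n$ training inputs, $Z$ the $m$ inducing points, $K$ the base kernel, and $\sigma^2$ the Gaussian noise variance. The bound is stated for a zero-mean prior, as in the paper.

```latex
Q_{XX} = K_{XZ} K_{ZZ}^{-1} K_{ZX},
\qquad
\mathcal{L}(Z, \theta) = \log \mathcal{N}\!\left(y \mid 0,\; Q_{XX} + \sigma^2 I\right)
  - \frac{1}{2\sigma^2} \operatorname{tr}\!\left(K_{XX} - Q_{XX}\right)
```

Maximizing $\mathcal{L}$ with respect to both the kernel hyperparameters $\theta$ and the locations $Z$ is what "learning the inducing points" refers to. Since $Q_{XX}$ has rank at most $m$, the dominant cost is $O(nm^2)$ rather than the $O(n^3)$ of exact GP regression.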

[14]:

from gpytorch.means import ConstantMean
from gpytorch.kernels import ScaleKernel, RBFKernel, InducingPointKernel
from gpytorch.distributions import MultivariateNormal

class GPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = ConstantMean()
        self.base_covar_module = ScaleKernel(RBFKernel())
        self.covar_module = InducingPointKernel(self.base_covar_module, inducing_points=train_x[:500, :], likelihood=likelihood)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return MultivariateNormal(mean_x, covar_x)

[15]:

likelihood = gpytorch.likelihoods.GaussianLikelihood().cuda()
model = GPRegressionModel(train_x, train_y, likelihood).cuda()


## Training the Model

[16]:

# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iterations = 25
def train():
    for i in range(training_iterations):
        # Zero backprop gradients
        optimizer.zero_grad()
        # Get output from model
        output = model(train_x)
        # Calc loss and backprop derivatives
        loss = -mll(output, train_y)
        loss.backward()
        print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
        optimizer.step()
        torch.cuda.empty_cache()

# See dkl_mnist.ipynb for explanation of this flag
with gpytorch.settings.use_toeplitz(True):
    %time train()

Iter 1/25 - Loss: 0.796
Iter 2/25 - Loss: 0.786
Iter 3/25 - Loss: 0.773
Iter 4/25 - Loss: 0.762
Iter 5/25 - Loss: 0.748
Iter 6/25 - Loss: 0.735
Iter 7/25 - Loss: 0.724
Iter 8/25 - Loss: 0.711
Iter 9/25 - Loss: 0.698
Iter 10/25 - Loss: 0.685
Iter 11/25 - Loss: 0.670
Iter 12/25 - Loss: 0.657
Iter 13/25 - Loss: 0.645
Iter 14/25 - Loss: 0.631
Iter 15/25 - Loss: 0.617
Iter 16/25 - Loss: 0.602
Iter 17/25 - Loss: 0.588
Iter 18/25 - Loss: 0.574
Iter 19/25 - Loss: 0.561
Iter 20/25 - Loss: 0.545
Iter 21/25 - Loss: 0.529
Iter 22/25 - Loss: 0.515
Iter 23/25 - Loss: 0.500
Iter 24/25 - Loss: 0.484
Iter 25/25 - Loss: 0.470
CPU times: user 10.2 s, sys: 13.2 s, total: 23.4 s
Wall time: 3.29 s


## Making Predictions

The next cells make predictions with the trained SGPR model and report the mean absolute error (MAE) on the held-out test set. Increasing the max preconditioner size (gpytorch.settings.max_preconditioner_size) is not necessary on this dataset, but in general it can make a big difference in final test performance, and it is often preferable to increasing the number of CG iterations if you can afford the memory.

[17]:

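model.eval()
likelihood.eval()

# Predictive distribution on the test inputs; no gradients are needed at test time
with torch.no_grad():
    preds = model(test_x)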

[18]:

print('Test MAE: {}'.format(torch.mean(torch.abs(preds.mean - test_y))))

Test MAE: 0.0909833088517189
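
The metric printed above is the mean absolute error between the predictive mean and the test targets. As a minimal sketch of the same arithmetic in plain Python (the function name and the numbers are hypothetical and are not taken from the elevators dataset):

```python
def mean_absolute_error(predicted, actual):
    """Average of |prediction - target| over all test points."""
    if len(predicted) != len(actual):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - t) for p, t in zip(predicted, actual)) / len(actual)

# Hypothetical predictive means and targets for four test points
pred_mean = [0.5, -0.25, 1.0, 0.0]
targets = [0.0, -0.25, 0.5, 1.0]
mae = mean_absolute_error(pred_mean, targets)
print(mae)
```

Because the targets in this notebook are already normalized, the MAE is reported in normalized units rather than in the original units of the dataset.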
