# GPyTorch Classification Tutorial¶

## Introduction¶

This example is the simplest form of using an RBF kernel in a `VariationalGP` module for classification. This basic model is usable when there is not much training data and no advanced techniques are required.

In this example, we model a square wave with period 1/2 that is positive at x=0, and we classify the points as either +1 or -1.

Variational inference approximates the intractable posterior with a simpler distribution, chosen by minimizing the KL divergence between the approximation and the true posterior. This yields a fast approximation to the posterior. For a good explanation of variational techniques, sections 4-6 of the following may be useful: https://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf
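
As a brief sketch of the underlying objective (an addition to this tutorial, using the standard formulation): the approximate posterior \(q(\mathbf{f})\) is fit by maximizing the evidence lower bound (ELBO),

\[
\log p(\mathbf{y}) \;\geq\; \mathbb{E}_{q(\mathbf{f})}\!\left[\log p(\mathbf{y} \mid \mathbf{f})\right] - \mathrm{KL}\!\left(q(\mathbf{f}) \,\|\, p(\mathbf{f})\right),
\]

where \(p(\mathbf{f})\) is the GP prior and \(p(\mathbf{y} \mid \mathbf{f})\) is the Bernoulli likelihood. Maximizing this bound is equivalent to minimizing the KL divergence from \(q(\mathbf{f})\) to the true posterior.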

```
In [1]:
```

```
import math
import torch
import gpytorch
from matplotlib import pyplot as plt
%matplotlib inline
```

### Set up training data¶

In the next cell, we set up the training data for this example. We'll be using 10 regularly spaced points on [0,1]. The training labels are the sign of a cosine with period 1/2, so they form a square wave that is positive at x=0.

```
In [2]:
```

```
train_x = torch.linspace(0, 1, 10)
train_y = torch.sign(torch.cos(train_x * (4 * math.pi)))
```

## Setting up the classification model¶

The next cell demonstrates the simplest way to define a classification Gaussian process model in GPyTorch. If you have already done the GP regression tutorial, you have already seen how GPyTorch model construction differs from other GP packages. In particular, the GP model expects a user to write out a `forward` method in a way analogous to PyTorch models. This gives the user the most possible flexibility.

Since exact inference is intractable for GP classification, GPyTorch
approximates the classification posterior using **variational
inference.** We believe that variational inference is ideal for a number
of reasons. Firstly, variational inference commonly relies on gradient
descent techniques, which take full advantage of PyTorch’s autograd.
This reduces the amount of code needed to develop complex variational
models. Additionally, variational inference can be performed with
stochastic gradient descent, which can be extremely scalable for large
datasets.

If you are unfamiliar with variational inference, we recommend the following resources:

- *Variational Inference: A Review for Statisticians* by David M. Blei, Alp Kucukelbir, Jon D. McAuliffe.
- *Scalable Variational Gaussian Process Classification* by James Hensman, Alex Matthews, Zoubin Ghahramani.

### The necessary classes¶

For most variational GP models, you will need to construct the following GPyTorch objects:

- A **GP Model** (`gpytorch.models.VariationalGP`) - This handles basic variational inference.
- A **Likelihood** (`gpytorch.likelihoods.BernoulliLikelihood`) - This is a good likelihood for binary classification.
- A **Mean** - This defines the prior mean of the GP.
  - If you don't know which mean to use, a `gpytorch.means.ConstantMean()` is a good place to start.
- A **Kernel** - This defines the prior covariance of the GP.
  - If you don't know which kernel to use, a `gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())` is a good place to start.
- A **MultivariateNormal** distribution (`gpytorch.distributions.MultivariateNormal`) - This is the object used to represent multivariate normal distributions.

#### The GP Model¶

The `VariationalGP` model is GPyTorch's simplest approximate inference model. It approximates the true posterior with a multivariate normal distribution. The model defines all the variational parameters that are needed, and keeps all of this information under the hood.

The components of a user built `VariationalGP` model in GPyTorch are:

- An `__init__` method that takes the training data as an input. The `__init__` method will also construct a mean module, a kernel module, and whatever other modules might be necessary.
- A `forward` method that takes in some \(n \times d\) data `x` and returns a MultivariateNormal with the *prior* mean and covariance evaluated at `x`. In other words, we return the vector \(\mu(x)\) and the \(n \times n\) matrix \(K_{xx}\) representing the prior mean and covariance matrix of the GP.

(For those who are unfamiliar with GP classification: even though we are performing classification, the GP model still returns a `MultivariateNormal`. The likelihood transforms this latent Gaussian variable into a Bernoulli variable.)
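
As an illustration of that transformation (this snippet is our addition, not part of the original tutorial, and assumes the standard probit link used by `BernoulliLikelihood`):

```
# Hedged illustration: BernoulliLikelihood squashes a latent GP value f
# through the standard normal CDF (probit link) to produce a class probability.
import torch

f = torch.tensor([-2.0, 0.0, 2.0])                 # example latent function values
probs = torch.distributions.Normal(0., 1.).cdf(f)  # P(y = +1 | f)
print(probs)  # approximately [0.023, 0.500, 0.977]
```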

Here we present a simple classification model, but it is possible to construct more complex models. See some of the scalable classification examples or deep kernel learning examples for more.

```
In [3]:
```

```
class GPClassificationModel(gpytorch.models.VariationalGP):
    def __init__(self, train_x):
        super(GPClassificationModel, self).__init__(train_x)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        latent_pred = gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
        return latent_pred


# Initialize model and likelihood
model = GPClassificationModel(train_x)
likelihood = gpytorch.likelihoods.BernoulliLikelihood()
```

### Model modes¶

Like most PyTorch modules, the `VariationalGP` model has a `.train()` and `.eval()` mode.

- `.train()` mode is for optimizing variational parameters and model hyperparameters.
- `.eval()` mode is for computing predictions through the model posterior.

## Learn the variational parameters (and other hyperparameters)¶

In the next cell, we optimize the variational parameters of our Gaussian process. In addition, this optimization loop also performs Type-II MLE to train the hyperparameters of the Gaussian process.

The most obvious difference here compared to many other GP implementations is that, as in standard PyTorch, the core training loop is written by the user. In GPyTorch, we make use of the standard PyTorch optimizers from `torch.optim`, and all trainable parameters of the model should be of type `torch.nn.Parameter`. The variational parameters are predefined as part of the `VariationalGP` model.

In most cases, the boilerplate code below will work well. It has the same basic components as the standard PyTorch training loop:

- Zero all parameter gradients
- Call the model and compute the loss
- Call backward on the loss to fill in gradients
- Take a step on the optimizer

However, defining custom training loops allows for greater flexibility. For example, it is possible to learn the variational parameters and kernel hyperparameters with different learning rates.
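
As a minimal sketch of that idea (our addition, not from the original tutorial; it assumes the variational parameters can be identified by the substring `variational` in their names, which may differ across GPyTorch versions):

```
# Hedged sketch: split the model's parameters into variational parameters and
# kernel/mean hyperparameters, then give each group its own learning rate.
# The name-based split below is an assumption about GPyTorch's parameter names.
variational_params = [p for n, p in model.named_parameters() if 'variational' in n]
hyper_params = [p for n, p in model.named_parameters() if 'variational' not in n]

optimizer = torch.optim.Adam([
    {'params': variational_params, 'lr': 0.1},   # larger steps for the variational distribution
    {'params': hyper_params, 'lr': 0.01},        # smaller steps for kernel/mean hyperparameters
])
```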

```
In [4]:
```

```
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},
    # BernoulliLikelihood has no parameters
], lr=0.1)

# "Loss" for GPs - the marginal log likelihood
# num_data refers to the amount of training data
mll = gpytorch.mlls.VariationalMarginalLogLikelihood(likelihood, model, num_data=len(train_y))

training_iter = 50
for i in range(training_iter):
    # Zero backpropped gradients from previous iteration
    optimizer.zero_grad()
    # Get predictive output
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iter, loss.item()))
    optimizer.step()
```

```
Iter 1/50 - Loss: 326.033
Iter 2/50 - Loss: 229.205
Iter 3/50 - Loss: 147.791
Iter 4/50 - Loss: 94.322
Iter 5/50 - Loss: 58.356
Iter 6/50 - Loss: 34.048
Iter 7/50 - Loss: 20.180
Iter 8/50 - Loss: 14.053
Iter 9/50 - Loss: 12.395
Iter 10/50 - Loss: 11.749
Iter 11/50 - Loss: 10.924
Iter 12/50 - Loss: 10.087
Iter 13/50 - Loss: 9.328
Iter 14/50 - Loss: 8.994
Iter 15/50 - Loss: 7.611
Iter 16/50 - Loss: 6.660
Iter 17/50 - Loss: 6.201
Iter 18/50 - Loss: 5.975
Iter 19/50 - Loss: 5.992
Iter 20/50 - Loss: 5.758
Iter 21/50 - Loss: 5.404
Iter 22/50 - Loss: 5.043
Iter 23/50 - Loss: 4.632
Iter 24/50 - Loss: 4.990
Iter 25/50 - Loss: 5.021
Iter 26/50 - Loss: 5.250
Iter 27/50 - Loss: 4.832
Iter 28/50 - Loss: 4.951
Iter 29/50 - Loss: 5.426
Iter 30/50 - Loss: 4.512
Iter 31/50 - Loss: 5.000
Iter 32/50 - Loss: 4.938
Iter 33/50 - Loss: 4.912
Iter 34/50 - Loss: 4.636
Iter 35/50 - Loss: 4.414
Iter 36/50 - Loss: 4.946
Iter 37/50 - Loss: 4.769
Iter 38/50 - Loss: 4.891
Iter 39/50 - Loss: 4.702
Iter 40/50 - Loss: 4.603
Iter 41/50 - Loss: 4.739
Iter 42/50 - Loss: 4.913
Iter 43/50 - Loss: 4.942
Iter 44/50 - Loss: 4.547
Iter 45/50 - Loss: 4.944
Iter 46/50 - Loss: 4.412
Iter 47/50 - Loss: 4.744
Iter 48/50 - Loss: 4.679
Iter 49/50 - Loss: 4.549
Iter 50/50 - Loss: 4.697
```

## Make predictions with the model¶

In the next cell, we make predictions with the model. To do this, we simply put the model and likelihood in eval mode, and call both modules on the test data.

In `.eval()` mode, when we call `model()`, we get the GP's latent posterior predictions. These will be MultivariateNormal distributions. But since we are performing binary classification, we want to transform these outputs to classification probabilities using our likelihood.

When we call `likelihood(model())`, we get a `torch.distributions.Bernoulli` distribution, which represents our posterior probability that each data point belongs to the positive class.

```
f_preds = model(test_x)              # latent GP posterior (a MultivariateNormal)
y_preds = likelihood(model(test_x))  # Bernoulli distribution over class labels
f_mean = f_preds.mean
f_samples = f_preds.sample(sample_shape=torch.Size((1000,)))
```
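
As a small follow-up to the snippet above (our addition; it assumes `test_x` has been defined as in the evaluation cell below), the Bernoulli distribution's mean gives per-point class probabilities, which can be thresholded into -1/+1 labels:

```
# Hedged sketch: extract class probabilities and hard labels from the
# Bernoulli predictive distribution returned by likelihood(model(test_x)).
probs = y_preds.mean                          # P(y = +1 | x) for each test point
labels = probs.ge(0.5).float().mul(2).sub(1)  # threshold into -1/+1 labels
```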

The `gpytorch.fast_pred_var` context is not needed, but here we are giving a preview of one of our cool features: getting faster predictive distributions using LOVE.

```
In [5]:
```

```
# Go into eval mode
model.eval()
likelihood.eval()

with torch.no_grad(), gpytorch.fast_pred_var():
    # Test x are 101 regularly spaced points on [0, 1]
    test_x = torch.linspace(0, 1, 101)
    # Get classification predictions
    observed_pred = likelihood(model(test_x))

    # Initialize fig and axes for plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
    # Get the predicted labels (probabilities of belonging to the positive class)
    # Transform these probabilities to be -1/+1 labels
    pred_labels = observed_pred.mean.ge(0.5).float().mul(2).sub(1)
    ax.plot(test_x.numpy(), pred_labels.numpy(), 'b')
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Mean', 'Confidence'])
```
