# Variational GPs w/ Multiple Outputs¶

## Introduction¶

In this example, we will demonstrate how to construct approximate/variational GPs that can model vector-valued functions (e.g. multitask/multi-output GPs).

[1]:

import math
import torch
import gpytorch
import tqdm
from matplotlib import pyplot as plt

%matplotlib inline


### Set up training data¶

In the next cell, we set up the training data for this example. We’ll be using 100 regularly spaced points on [0,1] which we evaluate the function on and add Gaussian noise to get the training labels.

We’ll have two functions - a sine function (y1) and a cosine function (y2).

For MTGPs, our train_targets will actually have two dimensions: with the second dimension corresponding to the different tasks.

[2]:

train_x = torch.linspace(0, 1, 100)

train_y = torch.stack([
torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2,
torch.cos(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2,
], -1)


This model should look like a combination of the SVGP regression example and the batch independent multitask GP example.

We are going to construct a batch variational GP - using a CholeskyVariationalDistribution and a VariationalStrategy. Each of the batch dimensions is going to correspond to one of the outputs. In addition, we will wrap the variational strategy to make the output appear as a MultitaskMultivariateNormal distribution. Here are the changes that we’ll need to make:

1. Our inducing points will need to have shape 2 x m x 1 (where m is the number of inducing points). This ensures that we learn a different set of inducing points for each output dimension.
2. The CholeskyVariationalDistribution, mean module, and covariance modules will all need to include a batch_shape=torch.Size([2]) argument. This ensures that we learn a different set of variational parameters and hyperparameters for each output dimension.
3. The VariationalStrategy object should be wrapped in a ~gpytorch.variational.MultitaskVariationalStrategy. This converts the output from a batch MultivariateNormal distribution into a MultitaskMultivariateNormal distribution. This makes it possible to use multitask likelihooods (e.g. ~gpytorch.likelihoods.MultitaskGaussianLikelihood, ~gpytorch.likelihoods.SoftmaxLikelihood, etc.) with this model.
[3]:

class MultitaskGPModel(gpytorch.models.ApproximateGP):
def __init__(self, inducing_points):
# We have to mark the CholeskyVariationalDistribution as batch
# so that we learn a variational distribution for each task
variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
inducing_points.size(-2), batch_shape=torch.Size([2])
)

# We have to wrap the VariationalStrategy in a MultitaskVariationalStrategy
# so that the output will be a MultitaskMultivariateNormal rather than a batch output
gpytorch.variational.VariationalStrategy(
self, inducing_points, variational_distribution, learn_inducing_locations=True
)

super().__init__(variational_strategy)

# The mean and covariance modules should be marked as batch
# so we learn a different set of hyperparameters
self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([2]))
self.covar_module = gpytorch.kernels.ScaleKernel(
gpytorch.kernels.RBFKernel(batch_shape=torch.Size([2])),
batch_shape=torch.Size([2])
)

def forward(self, x):
# The forward function should be written as if we were dealing with each output
# dimension in batch
mean_x = self.mean_module(x)
covar_x = self.covar_module(x)
return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# The shape of the inducing points should be (2 x m x 1) - so that we learn different inducing
# points for each output
inducing_points = torch.rand(2, 16, 1)

# We're going to use a multitask likeihood with this model


With all of the batch_shape arguments - it may look like we’re learning a batch of GPs. However, the MultitaskVariationalStrategy object converts this into a (non-batch) MultitaskMultivariateNormal, where each of the tasks represents an output.

[4]:

model(train_x)

[4]:

MultitaskMultivariateNormal(loc: torch.Size([200]))

[5]:

model(train_x).rsample().shape

[5]:

torch.Size([100, 2])


### Train the model¶

This code should look similar to the SVGP training code

[6]:

# this is for running the notebook in our testing framework
import os
smoke_test = ('CI' in os.environ)
num_epochs = 1 if smoke_test else 20

model.train()
likelihood.train()

# We use SGD here, rather than Adam. Emperically, we find that SGD is better for variational regression
{'params': model.parameters()},
{'params': likelihood.parameters()},
], lr=0.01)

# Our loss object. We're using the VariationalELBO, which essentially just computes the ELBO
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))

# We use more CG iterations here because the preconditioner introduced in the NeurIPS paper seems to be less
# effective for VI.
epochs_iter = tqdm.tqdm_notebook(range(num_epochs), desc="Epoch")
for i in epochs_iter:
# Within each iteration, we will go over each minibatch of data
for x_batch, y_batch in minibatch_iter:
output = model(x_batch)
loss = -mll(output, y_batch)
minibatch_iter.set_postfix(loss=loss.item())
loss.backward()
optimizer.step()




### Make predictions with the model¶

[7]:

# Set into eval mode
model.eval()
likelihood.eval()

# Initialize plots
f, (y1_ax, y2_ax) = plt.subplots(1, 2, figsize=(8, 3))

# Make predictions
test_x = torch.linspace(0, 1, 51)
predictions = likelihood(model(test_x))
mean = predictions.mean
lower, upper = predictions.confidence_region()

# This contains predictions for both tasks, flattened out
# The first half of the predictions is for the first task
# The second half is for the second task

# Plot training data as black stars
y1_ax.plot(train_x.detach().numpy(), train_y[:, 0].detach().numpy(), 'k*')
# Predictive mean as blue line
y1_ax.plot(test_x.numpy(), mean[:, 0].numpy(), 'b')
y1_ax.fill_between(test_x.numpy(), lower[:, 0].numpy(), upper[:, 0].numpy(), alpha=0.5)
y1_ax.set_ylim([-3, 3])
y1_ax.legend(['Observed Data', 'Mean', 'Confidence'])
y1_ax.set_title('Observed Values (Likelihood)')

# Plot training data as black stars
y2_ax.plot(train_x.detach().numpy(), train_y[:, 1].detach().numpy(), 'k*')
# Predictive mean as blue line
y2_ax.plot(test_x.numpy(), mean[:, 1].numpy(), 'b')