Hadamard Multitask GP Regression

Introduction

This notebook demonstrates how to perform “Hadamard” multitask regression with kernels.IndexKernel.

This differs from the multitask gp regression example notebook in one key way: - Here, we assume that we have observations for one task per input. For each input, we specify the task of the input that we observe. (The kernel that we learn is expressed as a Hadamard product of an input kernel and a task kernel) - In the other notebook, we assume that we observe all tasks per input. (The kernel in that notebook is the Kronecker product of an input kernel and a task kernel).

Multitask regression, first introduced in this paper learns similarities in the outputs simultaneously. It’s useful when you are performing regression on multiple functions that share the same inputs, especially if they have similarities (such as being sinusodial).

Given inputs \(x\) and \(x'\), and tasks \(i\) and \(j\), the covariance between two datapoints and two tasks is given by

\begin{equation*} k([x, i], [x', j]) = k_\text{inputs}(x, x') * k_\text{tasks}(i, j) \end{equation*}

where \(k_\text{inputs}\) is a standard kernel (e.g. RBF) that operates on the inputs. \(k_\text{task}\) is a special kernel - the IndexKernel - which is a lookup table containing inter-task covariance.

[1]:
import math
import torch
import gpytorch
from matplotlib import pyplot as plt

%matplotlib inline
%load_ext autoreload
%autoreload 2

Set up training data

In the next cell, we set up the training data for this example. For each task we’ll be using 50 random points on [0,1), which we evaluate the function on and add Gaussian noise to get the training labels. Note that different inputs are used for each task.

We’ll have two functions - a sine function (y1) and a cosine function (y2).

[2]:
train_x1 = torch.rand(50)
train_x2 = torch.rand(50)

train_y1 = torch.sin(train_x1 * (2 * math.pi)) + torch.randn(train_x1.size()) * 0.2
train_y2 = torch.cos(train_x2 * (2 * math.pi)) + torch.randn(train_x2.size()) * 0.2

Set up the model

The model should be somewhat similar to the ExactGP model in the simple regression example.

The differences:

  1. The model takes two input: the inputs (x) and indices. The indices indicate which task the observation is for.
  2. Rather than just using a RBFKernel, we’re using that in conjunction with a IndexKernel.
  3. We don’t use a ScaleKernel, since the IndexKernel will do some scaling for us. (This way we’re not overparameterizing the kernel.)
[3]:
class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.RBFKernel()

        # We learn an IndexKernel for 2 tasks
        # (so we'll actually learn 2x2=4 tasks with correlations)
        self.task_covar_module = gpytorch.kernels.IndexKernel(num_tasks=2, rank=1)

    def forward(self,x,i):
        mean_x = self.mean_module(x)

        # Get input-input covariance
        covar_x = self.covar_module(x)
        # Get task-task covariance
        covar_i = self.task_covar_module(i)
        # Multiply the two together to get the covariance we want
        covar = covar_x.mul(covar_i)

        return gpytorch.distributions.MultivariateNormal(mean_x, covar)

likelihood = gpytorch.likelihoods.GaussianLikelihood()

train_i_task1 = torch.full_like(train_x1, dtype=torch.long, fill_value=0)
train_i_task2 = torch.full_like(train_x2, dtype=torch.long, fill_value=1)

full_train_x = torch.cat([train_x1, train_x2])
full_train_i = torch.cat([train_i_task1, train_i_task2])
full_train_y = torch.cat([train_y1, train_y2])

# Here we have two iterms that we're passing in as train_inputs
model = MultitaskGPModel((full_train_x, full_train_i), full_train_y, likelihood)

Training the model

In the next cell, we handle using Type-II MLE to train the hyperparameters of the Gaussian process.

See the simple regression example for more info on this step.

[4]:
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(50):
    optimizer.zero_grad()
    output = model(full_train_x, full_train_i)
    loss = -mll(output, full_train_y)
    loss.backward()
    print('Iter %d/50 - Loss: %.3f' % (i + 1, loss.item()))
    optimizer.step()
Iter 1/50 - Loss: 0.956
Iter 2/50 - Loss: 0.916
Iter 3/50 - Loss: 0.877
Iter 4/50 - Loss: 0.839
Iter 5/50 - Loss: 0.803
Iter 6/50 - Loss: 0.767
Iter 7/50 - Loss: 0.730
Iter 8/50 - Loss: 0.692
Iter 9/50 - Loss: 0.653
Iter 10/50 - Loss: 0.613
Iter 11/50 - Loss: 0.575
Iter 12/50 - Loss: 0.539
Iter 13/50 - Loss: 0.504
Iter 14/50 - Loss: 0.470
Iter 15/50 - Loss: 0.435
Iter 16/50 - Loss: 0.399
Iter 17/50 - Loss: 0.362
Iter 18/50 - Loss: 0.325
Iter 19/50 - Loss: 0.287
Iter 20/50 - Loss: 0.250
Iter 21/50 - Loss: 0.215
Iter 22/50 - Loss: 0.182
Iter 23/50 - Loss: 0.151
Iter 24/50 - Loss: 0.122
Iter 25/50 - Loss: 0.093
Iter 26/50 - Loss: 0.066
Iter 27/50 - Loss: 0.040
Iter 28/50 - Loss: 0.016
Iter 29/50 - Loss: -0.005
Iter 30/50 - Loss: -0.022
Iter 31/50 - Loss: -0.037
Iter 32/50 - Loss: -0.049
Iter 33/50 - Loss: -0.059
Iter 34/50 - Loss: -0.066
Iter 35/50 - Loss: -0.071
Iter 36/50 - Loss: -0.074
Iter 37/50 - Loss: -0.075
Iter 38/50 - Loss: -0.074
Iter 39/50 - Loss: -0.071
Iter 40/50 - Loss: -0.068
Iter 41/50 - Loss: -0.065
Iter 42/50 - Loss: -0.062
Iter 43/50 - Loss: -0.060
Iter 44/50 - Loss: -0.058
Iter 45/50 - Loss: -0.057
Iter 46/50 - Loss: -0.056
Iter 47/50 - Loss: -0.057
Iter 48/50 - Loss: -0.058
Iter 49/50 - Loss: -0.060
Iter 50/50 - Loss: -0.062

Make predictions with the model

[5]:
# Set into eval mode
model.eval()
likelihood.eval()

# Initialize plots
f, (y1_ax, y2_ax) = plt.subplots(1, 2, figsize=(8, 3))

# Test points every 0.02 in [0,1]
test_x = torch.linspace(0, 1, 51)
tast_i_task1 = torch.full_like(test_x, dtype=torch.long, fill_value=0)
test_i_task2 = torch.full_like(test_x, dtype=torch.long, fill_value=1)

# Make predictions - one task at a time
# We control the task we cae about using the indices

# The gpytorch.settings.fast_pred_var flag activates LOVE (for fast variances)
# See https://arxiv.org/abs/1803.06058
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred_y1 = likelihood(model(test_x, tast_i_task1))
    observed_pred_y2 = likelihood(model(test_x, test_i_task2))

# Define plotting function
def ax_plot(ax, train_y, train_x, rand_var, title):
    # Get lower and upper confidence bounds
    lower, upper = rand_var.confidence_region()
    # Plot training data as black stars
    ax.plot(train_x.detach().numpy(), train_y.detach().numpy(), 'k*')
    # Predictive mean as blue line
    ax.plot(test_x.detach().numpy(), rand_var.mean.detach().numpy(), 'b')
    # Shade in confidence
    ax.fill_between(test_x.detach().numpy(), lower.detach().numpy(), upper.detach().numpy(), alpha=0.5)
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Mean', 'Confidence'])
    ax.set_title(title)

# Plot both tasks
ax_plot(y1_ax, train_y1, train_x1, observed_pred_y1, 'Observed Values (Likelihood)')
ax_plot(y2_ax, train_y2, train_x2, observed_pred_y2, 'Observed Values (Likelihood)')
../../_images/examples_03_Multitask_GP_Regression_Hadamard_Multitask_GP_Regression_9_0.png