GP Regression with Grid Structured Training Data

In this notebook, we demonstrate how to perform GP regression when your training data lies on a regularly spaced grid. For this example, we’ll model a 2D function whose training inputs lie on an evenly spaced grid over (0, 1) x (0, 2), with 100 grid points in each dimension.

In other words, we have 10000 training examples. However, the grid structure of the training data allows us to perform inference very quickly anyway.
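Roughly speaking, the speedup comes from the fact that the RBF kernel factors across input dimensions, so on a Cartesian grid the train covariance matrix is (up to how GPyTorch represents it internally) a Kronecker product of small per-dimension covariance matrices:

$$K_{XX} = K^{(1)} \otimes K^{(2)},$$

where K^{(1)} and K^{(2)} are the 100 x 100 covariance matrices over the two 1D grids. Matrix-vector multiplies with the 10000 x 10000 matrix K_{XX}, which drive GPyTorch’s inference, therefore only require operations with these small factors.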

[1]:
import gpytorch
import torch
import math

Make the grid and training data

In the next cell, we create the grid, along with the 10000 training examples and labels. After running this cell, we will have three important tensors:

  • grid is a grid_size x 2 tensor containing the 1D grid of points for each dimension.
  • train_x is a tensor containing all 10000 training inputs (the Cartesian product of the two 1D grids).
  • train_y contains the training labels. Here we use a simple sine function plus a small amount of Gaussian noise.
[11]:
grid_bounds = [(0, 1), (0, 2)]
grid_size = 100
grid = torch.zeros(grid_size, len(grid_bounds))
for i in range(len(grid_bounds)):
    # Pad the grid slightly (about one cell) beyond each bound so the training region is comfortably covered
    grid_diff = float(grid_bounds[i][1] - grid_bounds[i][0]) / (grid_size - 2)
    grid[:, i] = torch.linspace(grid_bounds[i][0] - grid_diff, grid_bounds[i][1] + grid_diff, grid_size)

# train_x is the Cartesian product of the per-dimension grids: a 10000 x 2 tensor
train_x = gpytorch.utils.grid.create_data_from_grid(grid)
# Labels: a noisy sine of the sum of the two input coordinates
train_y = torch.sin((train_x[:, 0] + train_x[:, 1]) * (2 * math.pi)) + torch.randn_like(train_x[:, 0]).mul(0.01)
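As a quick sanity check on the cell above (nothing new, just inspecting shapes), train_x should be the Cartesian product of the two 1D grids, i.e. grid_size ** 2 = 10000 points in 2D:

print(grid.shape)     # torch.Size([100, 2]): one column of grid points per dimension
print(train_x.shape)  # torch.Size([10000, 2])
print(train_y.shape)  # torch.Size([10000])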

Creating the GP Model

In the next cell we create our GP model. As with other scalable GP methods, we wrap a base kernel in a specialized kernel that exploits structure: in this case, a GridKernel that wraps an RBFKernel.

[10]:
class GridGPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, grid, train_x, train_y, likelihood):
        super(GridGPRegressionModel, self).__init__(train_x, train_y, likelihood)
        num_dims = train_x.size(-1)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.GridKernel(gpytorch.kernels.RBFKernel(), grid=grid)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = GridGPRegressionModel(grid, train_x, train_y, likelihood)
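To get a sense of why this scales, note that the train covariance is represented lazily and its grid structure is exploited when solving linear systems, so the dense 10000 x 10000 matrix is never materialized. A rough check of this (a sketch only; the exact lazy-tensor types vary across GPyTorch versions):

with torch.no_grad():
    covar = model.covar_module(train_x)
    # The covariance is 10000 x 10000 in size, but it is held as a lazy, structured
    # object rather than a dense matrix.
    print(covar.size())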
[4]:
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iter = 50
for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   log_lengthscale: %.3f   log_noise: %.3f' % (
        i + 1, training_iter, loss.item(),
        model.covar_module.base_kernel.log_lengthscale.item(),
        model.likelihood.log_noise.item()
    ))
    optimizer.step()
Iter 1/50 - Loss: 1.150   log_lengthscale: 0.000   log_noise: 0.000
Iter 2/50 - Loss: 1.113   log_lengthscale: -0.100   log_noise: -0.100
Iter 3/50 - Loss: 1.070   log_lengthscale: -0.196   log_noise: -0.200
Iter 4/50 - Loss: 1.019   log_lengthscale: -0.290   log_noise: -0.300
Iter 5/50 - Loss: 0.963   log_lengthscale: -0.385   log_noise: -0.400
Iter 6/50 - Loss: 0.899   log_lengthscale: -0.481   log_noise: -0.500
Iter 7/50 - Loss: 0.819   log_lengthscale: -0.579   log_noise: -0.601
Iter 8/50 - Loss: 0.718   log_lengthscale: -0.676   log_noise: -0.702
Iter 9/50 - Loss: 0.612   log_lengthscale: -0.774   log_noise: -0.804
Iter 10/50 - Loss: 0.520   log_lengthscale: -0.874   log_noise: -0.907
Iter 11/50 - Loss: 0.446   log_lengthscale: -0.972   log_noise: -1.011
Iter 12/50 - Loss: 0.384   log_lengthscale: -1.066   log_noise: -1.115
Iter 13/50 - Loss: 0.327   log_lengthscale: -1.153   log_noise: -1.221
Iter 14/50 - Loss: 0.273   log_lengthscale: -1.232   log_noise: -1.326
Iter 15/50 - Loss: 0.221   log_lengthscale: -1.304   log_noise: -1.433
Iter 16/50 - Loss: 0.171   log_lengthscale: -1.368   log_noise: -1.539
Iter 17/50 - Loss: 0.117   log_lengthscale: -1.426   log_noise: -1.646
Iter 18/50 - Loss: 0.069   log_lengthscale: -1.476   log_noise: -1.753
Iter 19/50 - Loss: 0.015   log_lengthscale: -1.520   log_noise: -1.860
Iter 20/50 - Loss: -0.036   log_lengthscale: -1.558   log_noise: -1.968
Iter 21/50 - Loss: -0.087   log_lengthscale: -1.590   log_noise: -2.075
Iter 22/50 - Loss: -0.138   log_lengthscale: -1.616   log_noise: -2.182
Iter 23/50 - Loss: -0.187   log_lengthscale: -1.636   log_noise: -2.289
Iter 24/50 - Loss: -0.238   log_lengthscale: -1.651   log_noise: -2.396
Iter 25/50 - Loss: -0.291   log_lengthscale: -1.660   log_noise: -2.503
Iter 26/50 - Loss: -0.338   log_lengthscale: -1.663   log_noise: -2.610
Iter 27/50 - Loss: -0.391   log_lengthscale: -1.660   log_noise: -2.717
Iter 28/50 - Loss: -0.442   log_lengthscale: -1.652   log_noise: -2.824
Iter 29/50 - Loss: -0.487   log_lengthscale: -1.637   log_noise: -2.930
Iter 30/50 - Loss: -0.536   log_lengthscale: -1.617   log_noise: -3.037
Iter 31/50 - Loss: -0.591   log_lengthscale: -1.591   log_noise: -3.143
Iter 32/50 - Loss: -0.641   log_lengthscale: -1.559   log_noise: -3.249
Iter 33/50 - Loss: -0.697   log_lengthscale: -1.520   log_noise: -3.354
Iter 34/50 - Loss: -0.750   log_lengthscale: -1.478   log_noise: -3.460
Iter 35/50 - Loss: -0.801   log_lengthscale: -1.430   log_noise: -3.565
Iter 36/50 - Loss: -0.850   log_lengthscale: -1.379   log_noise: -3.670
Iter 37/50 - Loss: -0.900   log_lengthscale: -1.320   log_noise: -3.775
Iter 38/50 - Loss: -0.947   log_lengthscale: -1.259   log_noise: -3.880
Iter 39/50 - Loss: -1.011   log_lengthscale: -1.191   log_noise: -3.984
Iter 40/50 - Loss: -1.060   log_lengthscale: -1.123   log_noise: -4.089
Iter 41/50 - Loss: -1.115   log_lengthscale: -1.056   log_noise: -4.193
Iter 42/50 - Loss: -1.162   log_lengthscale: -0.993   log_noise: -4.298
Iter 43/50 - Loss: -1.201   log_lengthscale: -0.937   log_noise: -4.402
Iter 44/50 - Loss: -1.242   log_lengthscale: -0.892   log_noise: -4.506
Iter 45/50 - Loss: -1.282   log_lengthscale: -0.870   log_noise: -4.611
Iter 46/50 - Loss: -1.335   log_lengthscale: -0.875   log_noise: -4.715
Iter 47/50 - Loss: -1.389   log_lengthscale: -0.901   log_noise: -4.820
Iter 48/50 - Loss: -1.453   log_lengthscale: -0.934   log_noise: -4.924
Iter 49/50 - Loss: -1.506   log_lengthscale: -0.968   log_noise: -5.028
Iter 50/50 - Loss: -1.546   log_lengthscale: -0.998   log_noise: -5.132

Testing the model

In the next cell, we create a set of 400 test examples and make predictions. Note that, unlike with other scalable GP methods, testing is more involved here: because the test data need not lie on the grid, in general we cannot avoid creating a num_train x num_test (e.g., 10000 x 400) covariance matrix between the training and test data.

For this reason, if you have a large number of test points, memory may become a concern. The time complexity should still be reasonable, however, because we still exploit the structure of the train-train covariance matrix.
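If the number of test points is large enough for memory to become an issue, one simple workaround (a sketch, not part of the original notebook; predict_in_chunks is a hypothetical helper) is to make predictions in chunks, since each test point’s posterior mean can be computed independently:

# Hypothetical helper: predict in batches of test points so that only a
# num_train x chunk_size cross-covariance is needed at any one time.
# (Assumes model.eval() and likelihood.eval() have already been called.)
def predict_in_chunks(model, likelihood, test_x, chunk_size=1000):
    means = []
    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        for chunk in test_x.split(chunk_size):
            means.append(likelihood(model(chunk)).mean)
    return torch.cat(means)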

[5]:
model.eval()
likelihood.eval()
# Create a 20 x 20 grid of test points covering [0, 1] x [0, 1]
n = 20
test_x = torch.zeros(int(pow(n, 2)), 2)
for i in range(n):
    for j in range(n):
        test_x[i * n + j][0] = float(i) / (n-1)
        test_x[i * n + j][1] = float(j) / (n-1)

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred = likelihood(model(test_x))
[8]:
import matplotlib.pyplot as plt
%matplotlib inline

pred_labels = observed_pred.mean.view(n, n)

# Calc absolute error
test_y_actual = torch.sin(((test_x[:, 0] + test_x[:, 1]) * (2 * math.pi))).view(n, n)
delta_y = torch.abs(pred_labels - test_y_actual).detach().numpy()

# Define a plotting function
def ax_plot(f, ax, y_labels, title):
    im = ax.imshow(y_labels)
    ax.set_title(title)
    f.colorbar(im)

# Plot our predictive means
f, observed_ax = plt.subplots(1, 1, figsize=(4, 3))
ax_plot(f, observed_ax, pred_labels, 'Predicted Values (Likelihood)')

# Plot the true values
f, observed_ax2 = plt.subplots(1, 1, figsize=(4, 3))
ax_plot(f, observed_ax2, test_y_actual, 'Actual Values (Likelihood)')

# Plot the absolute errors
f, observed_ax3 = plt.subplots(1, 1, figsize=(4, 3))
ax_plot(f, observed_ax3, delta_y, 'Absolute Error Surface')
[Output: three heatmaps titled Predicted Values (Likelihood), Actual Values (Likelihood), and Absolute Error Surface]