gpytorch.kernels

If you don’t know what kernel to use, we recommend that you start out with a gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()).

Kernel

class gpytorch.kernels.Kernel(has_lengthscale=False, ard_num_dims=None, batch_size=1, active_dims=None, lengthscale_prior=None, param_transform=torch.nn.functional.softplus, inv_param_transform=None, eps=1e-06, **kwargs)[source]

Kernels in GPyTorch are implemented as a gpytorch.Module that, when called on two torch.tensor objects x1 and x2, returns either a torch.tensor or a gpytorch.lazy.LazyTensor that represents the covariance matrix between x1 and x2.

In the typical use case, extending this class means implementing the forward() method.

Note

The __call__() method does some additional internal work. In particular, all kernels are lazily evaluated so that, in some cases, we can index into the kernel matrix before actually computing it. Furthermore, many built-in kernel modules return LazyTensors that allow for more efficient inference than if we explicitly computed the kernel matrix itself.

As a result, if you want to use a gpytorch.kernels.Kernel object just to get an actual torch.tensor representing the covariance matrix, you may need to call the gpytorch.lazy.LazyTensor.evaluate() method on the output.

This base Kernel class includes a lengthscale parameter \(\Theta\), which is used by many common kernel functions. There are a few options for the lengthscale:

  • Default: No lengthscale (i.e. \(\Theta\) is the identity matrix).
  • Single lengthscale: One lengthscale can be applied to all input dimensions/batches (i.e. \(\Theta\) is a constant diagonal matrix). This is controlled by setting has_lengthscale=True.
  • ARD: Each input dimension gets its own separate lengthscale (i.e. \(\Theta\) is a non-constant diagonal matrix). This is controlled by the ard_num_dims keyword argument (as well as has_lengthscale=True).

In batch-mode (i.e. when \(x_1\) and \(x_2\) are batches of input matrices), each batch of data can have its own lengthscale parameter by setting the batch_size keyword argument to the appropriate number of batches.

Note

The lengthscale parameter is parameterized on a log scale to constrain it to be positive. You can set a prior on this parameter using the lengthscale_prior argument.

Base Args:
has_lengthscale (bool):
Set this if the kernel has a lengthscale. Default: False.
ard_num_dims (int, optional):
Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. Default: None
batch_size (int, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: 1
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None
param_transform (function, optional):
Set this if you want to use something other than softplus to ensure positiveness of parameters.
inv_param_transform (function, optional):
Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Base Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_size arguments.
Example:
>>> covar_module = gpytorch.kernels.LinearKernel()
>>> x1 = torch.randn(50, 3)
>>> lazy_covar_matrix = covar_module(x1) # Returns a RootLazyTensor
>>> tensor_covar_matrix = lazy_covar_matrix.evaluate() # Gets the actual tensor for this kernel matrix
forward(x1, x2, diag=False, batch_dims=None, **params)[source]

Computes the covariance between x1 and x2. This method should be implemented by all Kernel subclasses; a minimal example follows the argument list below.

Note

All non-compositional kernels should use the gpytorch.kernels.Kernel._create_input_grid() method to create a meshgrid between x1 and x2 (if necessary).

Do not manually create the grid - this is inefficient and will cause erroneous behavior in certain evaluation modes.

Args:
  • x1 (Tensor n x d or b x n x d)
  • x2 (Tensor m x d or b x m x d)
  • diag (bool):
    Should the Kernel compute the whole kernel matrix, or just the diagonal? For most Kernels, this option will be passed into _create_input_grid
  • batch_dims (tuple, optional):
    If this option is passed in, it tells the kernel which of the three dimensions are batch dimensions. Currently accepts: standard mode (either None or (0,)) or (0, 2) for use with Additive/Multiplicative kernels
Returns:
  • Tensor or gpytorch.lazy.LazyTensor.
    The exact size depends on the kernel’s evaluation mode:
    • full_covar: n x m or b x n x m
    • full_covar with batch_dims=(0, 2): k x n x m or b x k x n x m
    • diag: n or b x n
    • diag with batch_dims=(0, 2): k x n or b x k x n
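A minimal sketch of a custom subclass, for illustration. It implements forward() as a plain dot product and ignores the diag and batch_dims modes described above; the class name is hypothetical:
>>> class DotProductKernel(gpytorch.kernels.Kernel):
>>>     def forward(self, x1, x2, **params):
>>>         # n x d and m x d inputs produce an n x m covariance (also works batched)
>>>         return x1.matmul(x2.transpose(-2, -1))
>>>
>>> covar_module = DotProductKernel()
>>> x1 = torch.randn(50, 3)
>>> covar_matrix = covar_module(x1, x1).evaluate()  # 50 x 50 tensor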

Standard Kernels

CosineKernel

class gpytorch.kernels.CosineKernel(active_dims=None, batch_size=1, period_length_prior=None, eps=1e-06, param_transform=torch.nn.functional.softplus, inv_param_transform=None, **kwargs)[source]

Computes a covariance matrix based on the cosine kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Cosine}}(\mathbf{x_1}, \mathbf{x_2}) = \cos \left( \pi \Vert \mathbf{x_1} - \mathbf{x_2} \Vert_2 / p \right) \end{equation*}\]

where \(p\) is the period length parameter.

Args:
batch_size (int, optional):
Set this if you want a separate period length for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: 1
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
period_length_prior (Prior, optional):
Set this if you want to apply a prior to the period length parameter. Default: None
eps (float):
The minimum value that the lengthscale/period length can take (prevents divide by zero errors). Default: 1e-6.
param_transform (function, optional):
Set this if you want to use something other than softplus to ensure positiveness of parameters.
inv_param_transform (function, optional):
Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.
Attributes:
period_length (Tensor):
The period length parameter. Size = batch_size x 1 x 1.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel())
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel(batch_size=2))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

LinearKernel

class gpytorch.kernels.LinearKernel(num_dimensions, variance_prior=None, offset_prior=None, active_dims=None)[source]

Computes a covariance matrix based on the Linear kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_\text{Linear}(\mathbf{x_1}, \mathbf{x_2}) = (\mathbf{x_1} - \mathbf{o})^\top (\mathbf{x_2} - \mathbf{o}) + v. \end{equation*}\]

where

  • \(\mathbf o\) is an offset parameter.
  • \(v\) is a variance parameter.

Note

To implement this efficiently, we use a gpytorch.lazy.RootLazyTensor during training and a gpytorch.lazy.MatmulLazyTensor during test. These lazy tensors represent matrices of the form \(K = XX^{\top}\) and \(K = XZ^{\top}\). This makes inference efficient because a matrix-vector product \(Kv\) can be computed as \(Kv=X(X^{\top}v)\), where the base multiply \(Xv\) takes only \(O(nd)\) time and space.

Args:
num_dimensions (int):
Number of data dimensions to expect. This is necessary to create the offset parameter.
variance_prior (gpytorch.priors.Prior):
Prior over the variance parameter (default None).
offset_prior (gpytorch.priors.Prior):
Prior over the offset parameter (default None).
active_dims (list):
List of data dimensions to operate on. len(active_dims) should equal num_dimensions.
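A minimal usage sketch, for illustration (the 5-dimensional input is arbitrary):
>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.LinearKernel(num_dimensions=5)
>>> covar = covar_module(x)  # Output: lazy tensor of size (10 x 10)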

MaternKernel

class gpytorch.kernels.MaternKernel(nu=2.5, ard_num_dims=None, batch_size=1, active_dims=None, lengthscale_prior=None, param_transform=torch.nn.functional.softplus, inv_param_transform=None, eps=1e-06, **kwargs)[source]

Computes a covariance matrix based on the Matern kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Matern}}(\mathbf{x_1}, \mathbf{x_2}) = \frac{2^{1 - \nu}}{\Gamma(\nu)} \left( \sqrt{2 \nu} d \right) K_\nu \left( \sqrt{2 \nu} d \right) \end{equation*}\]

where

  • \(d = (\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-1} (\mathbf{x_1} - \mathbf{x_2})\) is the distance between \(x_1\) and \(x_2\) scaled by the lengthscale parameter \(\Theta\).
  • \(\nu\) is a smoothness parameter (takes values 1/2, 3/2, or 5/2). Smaller values are less smooth.
  • \(K_\nu\) is a modified Bessel function.

There are a few options for the lengthscale parameter \(\Theta\): See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Args:
nu (float):
The smoothness parameter: either 1/2, 3/2, or 5/2.
ard_num_dims (int, optional):
Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. Default: None
batch_size (int, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: 1
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None
param_transform (function, optional):
Set this if you want to use something other than softplus to ensure positiveness of parameters.
inv_param_transform (function, optional):
Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_size arguments.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5))
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5, ard_num_dims=5))
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5))
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5, batch_size=2))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

PeriodicKernel

class gpytorch.kernels.PeriodicKernel(active_dims=None, batch_size=1, lengthscale_prior=None, period_length_prior=None, param_transform=torch.nn.functional.softplus, inv_param_transform=None, eps=1e-06, **kwargs)[source]

Computes a covariance matrix based on the periodic kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Periodic}}(\mathbf{x_1}, \mathbf{x_2}) = \exp \left( - \frac{2 \sin^2 \left( \pi \Vert \mathbf{x_1} - \mathbf{x_2} \Vert_1 / p \right) } { \ell^2 } \right) \end{equation*}\]

where

  • \(p\) is the period length parameter.
  • \(\ell\) is a lengthscale parameter.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Note

This kernel does not have an ARD lengthscale option.

Args:
batch_size (int, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: 1.
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
period_length_prior (Prior, optional):
Set this if you want to apply a prior to the period length parameter. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None.
param_transform (function, optional):
Set this if you want to use something other than softplus to ensure positiveness of parameters.
inv_param_transform (function, optional):
Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.
eps (float):
The minimum value that the lengthscale/period length can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size = batch_size x 1 x 1.
period_length (Tensor):
The period length parameter. Size = batch_size x 1 x 1.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel())
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel(batch_size=2))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

RBFKernel

class gpytorch.kernels.RBFKernel(ard_num_dims=None, batch_size=1, active_dims=None, lengthscale_prior=None, param_transform=torch.nn.functional.softplus, inv_param_transform=None, eps=1e-06, **kwargs)[source]

Computes a covariance matrix based on the RBF (squared exponential) kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{RBF}}(\mathbf{x_1}, \mathbf{x_2}) = \exp \left( -\frac{1}{2} (\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-1} (\mathbf{x_1} - \mathbf{x_2}) \right) \end{equation*}\]

where \(\Theta\) is a lengthscale parameter. See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Args:
ard_num_dims (int, optional):
Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. Default: None
batch_size (int, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: 1.
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None.
param_transform (function, optional):
Set this if you want to use something other than softplus to ensure positiveness of parameters.
inv_param_transform (function, optional):
Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_size arguments.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=5))
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(batch_size=2))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

SpectralMixtureKernel

class gpytorch.kernels.SpectralMixtureKernel(num_mixtures=None, ard_num_dims=1, batch_size=1, active_dims=None, eps=1e-06, mixture_scales_prior=None, mixture_means_prior=None, mixture_weights_prior=None, param_transform=torch.nn.functional.softplus, inv_param_transform=None, **kwargs)[source]

Computes a covariance matrix based on the Spectral Mixture kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\). It was proposed in Gaussian Process Kernels for Pattern Discovery and Extrapolation.

Note

Unlike other kernels:

  • ard_num_dims must equal the number of dimensions of the data
  • batch_size must equal the batch size of the data (1 if the data is not batched)
  • This kernel should not be combined with a gpytorch.kernels.ScaleKernel.

Args:
num_mixtures (int, optional):
The number of components in the mixture.
ard_num_dims (int, optional):
Set this to match the dimensionality of the input. It should be d if x1 is a n x d matrix. Default: 1
batch_size (int, optional):
Set this if the data is batch of input data. It should be b if x1 is a b x n x d tensor. Default: 1
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
param_transform (function, optional):
Set this if you want to use something other than softplus to ensure positiveness of parameters.
inv_param_transform (function, optional):
Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
mixture_lengthscale (Tensor):
The lengthscale parameter. Given k mixture components, and b x n x d data, this will be of size b x k x 1 x d.
mixture_means (Tensor):
The mixture mean parameters (b x k x 1 x d).
mixture_weights (Tensor):
The mixture weight parameters (b x k).
Example:
>>> # Non-batch
>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4, ard_num_dims=5)
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)
>>>
>>> # Batch
>>> batch_x = torch.randn(2, 10, 5)
>>> covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4, batch_size=2, ard_num_dims=5)
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

WhiteNoiseKernel

class gpytorch.kernels.WhiteNoiseKernel(variances)[source]

A “random” kernel that adds pre-specified white noise variance to training inputs. This is most commonly used in conjunction with another kernel.

Note

The white noise is only applied to the portion of the kernel matrix that represents the training data.

Args:
variances (Tensor n x 1 or b x n x 1):
The random variances to be applied to training inputs. b and n should correspond to the size of the training data.
Example:
>>> train_x = torch.randn(10, 5)
>>> wn_variances = torch.rand(10)  # non-negative noise variances for the 10 training points
>>>
>>> covar_module = gpytorch.kernels.ScaleKernel(
>>>     gpytorch.kernels.WhiteNoiseKernel(wn_variances) + gpytorch.kernels.MaternKernel(nu=0.5)
>>> )
>>> covar = covar_module(train_x)  # Output: LazyTensor of size (10 x 10) (Matern kernel + white noise variances)

Composition/Decoration Kernels

AdditiveKernel

class gpytorch.kernels.AdditiveKernel(*kernels)[source]

A Kernel that supports summing over multiple component kernels.

Example:
>>> covar_module = RBFKernel(active_dims=torch.tensor([1])) + RBFKernel(active_dims=torch.tensor([2]))
>>> x1 = torch.randn(50, 2)
>>> additive_kernel_matrix = covar_module(x1)

AdditiveStructureKernel

class gpytorch.kernels.AdditiveStructureKernel(base_kernel, num_dims, active_dims=None)[source]

A Kernel decorator for kernels with additive structure. If a kernel decomposes additively, then this module will be much more computationally efficient.

A kernel function k decomposes additively if it can be written as

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = k'(x_1^{(1)}, x_2^{(1)}) + \ldots + k'(x_1^{(d)}, x_2^{(d)}) \end{equation*}\]

for some kernel \(k'\) that operates on a subset of dimensions.

Given a b x n x d input, AdditiveStructureKernel computes d one-dimensional kernels (using the supplied base_kernel), and then adds the component kernels together. Unlike AdditiveKernel, AdditiveStructureKernel computes each of the additive terms in batch, making it very fast.

Args:
base_kernel (Kernel):
The base kernel \(k'\) to apply to each input dimension.
num_dims (int):
The dimension of the input data.
active_dims (tuple of ints, optional):
Passed down to the base_kernel.
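A minimal usage sketch, for illustration (5-dimensional input and RBF base kernel chosen arbitrarily):
>>> x = torch.randn(10, 5)
>>> base_covar_module = gpytorch.kernels.RBFKernel()
>>> covar_module = gpytorch.kernels.AdditiveStructureKernel(base_covar_module, num_dims=5)
>>> covar = covar_module(x)  # Sums 5 one-dimensional kernels in batch; output size (10 x 10)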

ProductKernel

class gpytorch.kernels.ProductKernel(*kernels)[source]

A Kernel that supports elementwise multiplying multiple component kernels together.

Example:
>>> covar_module = RBFKernel(active_dims=torch.tensor([1])) * RBFKernel(active_dims=torch.tensor([2]))
>>> x1 = torch.randn(50, 2)
>>> kernel_matrix = covar_module(x1) # The RBF Kernel already decomposes multiplicatively, so this is foolish!

ProductStructureKernel

class gpytorch.kernels.ProductStructureKernel(base_kernel, num_dims, active_dims=None)[source]

A Kernel decorator for kernels with product structure. If a kernel decomposes multiplicatively, then this module will be much more computationally efficient.

A kernel function k has product structure if it can be written as

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = k'(x_1^{(1)}, x_2^{(1)}) * \ldots * k'(x_1^{(d)}, x_2^{(d)}) \end{equation*}\]

for some kernel \(k'\) that operates on each dimension.

Given a b x n x d input, ProductStructureKernel computes d one-dimensional kernels (using the supplied base_kernel), and then multiplies the component kernels together. Unlike ProductKernel, ProductStructureKernel computes each of the product terms in batch, making it very fast.

See Product Kernel Interpolation for Scalable Gaussian Processes for more detail.

Args:
  • base_kernel (Kernel):
    The base kernel \(k'\) to apply to each input dimension.
  • num_dims (int):
    The dimension of the input data.
  • active_dims (tuple of ints, optional):
    Passed down to the base_kernel.
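A minimal usage sketch, for illustration (5-dimensional input and RBF base kernel chosen arbitrarily):
>>> x = torch.randn(10, 5)
>>> base_covar_module = gpytorch.kernels.RBFKernel()
>>> covar_module = gpytorch.kernels.ProductStructureKernel(base_covar_module, num_dims=5)
>>> covar = covar_module(x)  # Multiplies 5 one-dimensional kernels in batch; output size (10 x 10)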

ScaleKernel

class gpytorch.kernels.ScaleKernel(base_kernel, batch_size=1, outputscale_prior=None, param_transform=torch.nn.functional.softplus, inv_param_transform=None, **kwargs)[source]

Decorates an existing kernel object with an output scale, i.e.

\[\begin{equation*} K_{\text{scaled}} = \theta_\text{scale} K_{\text{orig}} \end{equation*}\]

where \(\theta_\text{scale}\) is the outputscale parameter.

In batch-mode (i.e. when \(x_1\) and \(x_2\) are batches of input matrices), each batch of data can have its own outputscale parameter by setting the batch_size keyword argument to the appropriate number of batches.

Note

The outputscale parameter is parameterized on a log scale to constrain it to be positive. You can set a prior on this parameter using the outputscale_prior argument.

Args:
base_kernel (Kernel):
The base kernel to be scaled.
batch_size (int, optional):
Set this if you want a separate outputscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: 1
outputscale_prior (Prior, optional):
Set this if you want to apply a prior to the outputscale parameter. Default: None
param_transform (function, optional):
Set this if you want to use something other than softplus to ensure positiveness of parameters.
inv_param_transform (function, optional):
Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.
Attributes:
base_kernel (Kernel):
The kernel module to be scaled.
outputscale (Tensor):
The outputscale parameter. Size/shape of parameter depends on the batch_size argument.
Example:
>>> x = torch.randn(10, 5)
>>> base_covar_module = gpytorch.kernels.RBFKernel()
>>> scaled_covar_module = gpytorch.kernels.ScaleKernel(base_covar_module)
>>> covar = scaled_covar_module(x)  # Output: LazyTensor of size (10 x 10)

Specialty Kernels

IndexKernel

class gpytorch.kernels.IndexKernel(num_tasks, rank=1, batch_size=1, prior=None, param_transform=torch.nn.functional.softplus, inv_param_transform=None)[source]

A kernel for discrete indices. Kernel is defined by a lookup table.

\[\begin{equation} k(i, j) = \left(BB^\top + \text{diag}(\mathbf v) \right)_{i, j} \end{equation}\]

where \(B\) is a low-rank matrix, and \(\mathbf v\) is a non-negative vector. These parameters are learned.

Args:
num_tasks (int):
Total number of indices.
batch_size (int, optional):
Set this if the IndexKernel is operating on batches of data (and you want different parameters for each batch)
rank (int):
Rank of \(B\) matrix.
prior (gpytorch.priors.Prior):
Prior for \(B\) matrix.
param_transform (function, optional):
Set this if you want to use something other than softplus to ensure positiveness of parameters.
inv_param_transform (function, optional):
Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.
Attributes:
covar_factor:
The \(B\) matrix.
log_var:
The element-wise log of the \(\mathbf v\) vector.
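A minimal usage sketch, for illustration; i here is a hypothetical n x 1 tensor of integer task indices (not part of the API):
>>> task_covar_module = gpytorch.kernels.IndexKernel(num_tasks=2, rank=1)
>>> i = torch.zeros(10, 1, dtype=torch.long)  # a task index for each of 10 data points
>>> covar = task_covar_module(i)  # 10 x 10 covariance looked up from BB^T + diag(v)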

LCMKernel

class gpytorch.kernels.LCMKernel(base_kernels, num_tasks, rank=1, task_covar_prior=None)[source]

This kernel implements the LCM (linear coregionalization model) kernel. It allows the user to specify a list of base kernels; an individual MultitaskKernel is fit to each of them, and the final kernel is the sum of these Kronecker-product components.

The returned object is of type gpytorch.lazy.KroneckerProductLazyTensor.

size(x1, x2)[source]

Given n data points x1 and m data points x2, this multitask kernel returns an (n*num_tasks) x (m*num_tasks) covariance matrix.
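A minimal usage sketch, for illustration (the two base kernels and the 2-task setup are arbitrary):
>>> base_kernels = [gpytorch.kernels.RBFKernel(), gpytorch.kernels.MaternKernel(nu=2.5)]
>>> covar_module = gpytorch.kernels.LCMKernel(base_kernels, num_tasks=2, rank=1)
>>> x = torch.randn(10, 5)
>>> covar = covar_module(x)  # Output: (10*2) x (10*2) multitask covariance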

MultitaskKernel

class gpytorch.kernels.MultitaskKernel(data_covar_module, num_tasks, rank=1, batch_size=1, task_covar_prior=None)[source]

Kernel supporting Kronecker style multitask Gaussian processes (where every data point is evaluated at every task) using gpytorch.kernels.IndexKernel as a basic multitask kernel.

Given a base covariance module to be used for the data, \(K_{XX}\), this kernel computes a task kernel of specified size \(K_{TT}\) and returns \(K = K_{TT} \otimes K_{XX}\) as a gpytorch.lazy.KroneckerProductLazyTensor.

Args:
data_covar_module (gpytorch.kernels.Kernel):
Kernel to use as the data kernel.
num_tasks (int):
Number of tasks
batch_size (int, optional):
Set if the MultitaskKernel is operating on batches of data (and you want different parameters for each batch)
rank (int):
Rank of index kernel to use for task covariance matrix.
task_covar_prior (gpytorch.priors.Prior):
Prior to use for task kernel. See gpytorch.kernels.IndexKernel for details.
size(x1, x2)[source]

Given n data points x1 and m data points x2, this multitask kernel returns an (n*num_tasks) x (m*num_tasks) covariance matrix.
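A minimal usage sketch, for illustration (2 tasks and an RBF data kernel, chosen arbitrarily):
>>> data_covar_module = gpytorch.kernels.RBFKernel()
>>> covar_module = gpytorch.kernels.MultitaskKernel(data_covar_module, num_tasks=2, rank=1)
>>> x = torch.randn(10, 5)
>>> covar = covar_module(x)  # Output: KroneckerProductLazyTensor of size (10*2) x (10*2)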

Kernels for Scalable GP Regression Methods

GridKernel

class gpytorch.kernels.GridKernel(base_kernel, grid, interpolation_mode=False, active_dims=None)[source]

If the input data \(X\) are regularly spaced on a grid, then GridKernel can dramatically speed up computations for stationary kernels.

GridKernel exploits Toeplitz and Kronecker structure within the covariance matrix. See Fast kernel learning for multidimensional pattern extrapolation for more info.

Note

GridKernel can only wrap stationary kernels (such as RBF, Matern, Periodic, Spectral Mixture, etc.)

Args:
base_kernel (Kernel):
The kernel to speed up with grid methods.
active_dims (tuple of ints, optional):
Passed down to the base_kernel.
update_grid(grid)[source]

Supply a new grid if it ever changes.
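A rough usage sketch, for illustration. It assumes grid is a tensor of regularly spaced points (one column per input dimension) and that the inputs lie on that grid; check the class source for the exact grid format expected:
>>> grid = torch.linspace(0, 1, 25).unsqueeze(-1)  # hypothetical 25-point grid in 1 dimension
>>> covar_module = gpytorch.kernels.GridKernel(gpytorch.kernels.RBFKernel(), grid=grid)
>>> covar = covar_module(grid)  # exploits Toeplitz/Kronecker structure in the 25 x 25 matrix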

GridInterpolationKernel

class gpytorch.kernels.GridInterpolationKernel(base_kernel, grid_size, num_dims=None, grid_bounds=None, active_dims=None)[source]

Implements the KISS-GP (or SKI) approximation for a given kernel. It was proposed in Kernel Interpolation for Scalable Structured Gaussian Processes, and offers extremely fast and accurate kernel approximations for large datasets.

Given a base kernel k, the covariance \(k(\mathbf{x_1}, \mathbf{x_2})\) is approximated by using a grid of regularly spaced inducing points:

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = \mathbf{w_{x_1}}^\top K_{U,U} \mathbf{w_{x_2}} \end{equation*}\]

where

  • \(U\) is the set of gridded inducing points
  • \(K_{U,U}\) is the kernel matrix between the inducing points
  • \(\mathbf{w_{x_1}}\) and \(\mathbf{w_{x_2}}\) are sparse vectors based on \(\mathbf{x_1}\) and \(\mathbf{x_2}\) that apply cubic interpolation.

The user should supply the size of the grid (using the grid_size argument). To choose a reasonable grid size, we highly recommend using the gpytorch.utils.grid.choose_grid_size() helper function. The bounds of the grid will automatically be determined by the data.

(Alternatively, you can hard-code bounds using the grid_bounds argument, which will speed up this kernel’s computations.)

Note

GridInterpolationKernel can only wrap stationary kernels (such as RBF, Matern, Periodic, Spectral Mixture, etc.)

Args:
  • base_kernel (Kernel):
    The kernel to approximate with KISS-GP
  • grid_size (int):
    The size of the grid (in each dimension)
  • num_dims (int):
    The dimension of the input data. Required if grid_bounds=None
  • grid_bounds (tuple(float, float), optional):
    The bounds of the grid, if known (high performance mode). The length of the tuple must match the number of dimensions. The entries represent the min/max values for each dimension.
  • active_dims (tuple of ints, optional):
    Passed down to the base_kernel.
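A minimal sketch using the choose_grid_size() helper mentioned above, for illustration (the 2-dimensional training inputs are arbitrary):
>>> train_x = torch.randn(100, 2)
>>> grid_size = gpytorch.utils.grid.choose_grid_size(train_x)
>>> covar_module = gpytorch.kernels.GridInterpolationKernel(
>>>     gpytorch.kernels.RBFKernel(), grid_size=grid_size, num_dims=2
>>> )
>>> covar = covar_module(train_x)  # KISS-GP approximation of the 100 x 100 covariance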

InducingPointKernel

class gpytorch.kernels.InducingPointKernel(base_kernel, inducing_points, likelihood, active_dims=None)[source]