gpytorch.kernels

If you don’t know what kernel to use, we recommend that you start out with a gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel).

Kernel

class gpytorch.kernels.Kernel(has_lengthscale=False, ard_num_dims=None, batch_shape=torch.Size([]), active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Kernels in GPyTorch are implemented as a gpytorch.Module that, when called on two torch.tensor objects x1 and x2, returns either a torch.tensor or a gpytorch.lazy.LazyTensor that represents the covariance matrix between x1 and x2.

In the typical use case, extending this class means implementing the forward() method.

Note

The __call__() method does some additional internal work. In particular, all kernels are lazily evaluated so that, in some cases, we can index into the kernel matrix before actually computing it. Furthermore, many built-in kernel modules return LazyTensors that allow for more efficient inference than if we explicitly computed the kernel matrix itself.

As a result, if you want to use a gpytorch.kernels.Kernel object just to get an actual torch.tensor representing the covariance matrix, you may need to call the gpytorch.lazy.LazyTensor.evaluate() method on the output.

This base Kernel class includes a lengthscale parameter \(\Theta\), which is used by many common kernel functions. There are a few options for the lengthscale:

  • Default: No lengthscale (i.e. \(\Theta\) is the identity matrix).
  • Single lengthscale: One lengthscale can be applied to all input dimensions/batches (i.e. \(\Theta\) is a constant diagonal matrix). This is controlled by setting has_lengthscale=True.
  • ARD: Each input dimension gets its own separate lengthscale (i.e. \(\Theta\) is a non-constant diagonal matrix). This is controlled by the ard_num_dims keyword argument (as well as has_lengthscale=True).

In batch-mode (i.e. when \(x_1\) and \(x_2\) are batches of input matrices), each batch of data can have its own lengthscale parameter by setting the batch_shape keyword argument to the appropriate number of batches.
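
For illustration, here is a minimal sketch of the lengthscale shapes implied by the ARD and batch options above, using gpytorch.kernels.RBFKernel (which inherits these options); the shapes in the comments assume the usual *batch_shape x 1 x ard_num_dims layout of the lengthscale parameter:

>>> kernel = gpytorch.kernels.RBFKernel(ard_num_dims=5)
>>> kernel.lengthscale.shape   # torch.Size([1, 5]) -- one lengthscale per input dimension
>>> kernel = gpytorch.kernels.RBFKernel(batch_shape=torch.Size([2]), ard_num_dims=5)
>>> kernel.lengthscale.shape   # torch.Size([2, 1, 5]) -- one set of ARD lengthscales per batch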

Note

The lengthscale parameter is parameterized on a log scale to constrain it to be positive. You can set a prior on this parameter using the lengthscale_prior argument.

Base Args:
has_lengthscale (bool):
Set this if the kernel has a lengthscale. Default: False.
ard_num_dims (int, optional):
Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. Default: None
batch_shape (torch.Size, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b1 x … x bk if x1 is a b1 x … x bk x n x d tensor.
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None
lengthscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the lengthscale parameter. Default: Positive.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Base Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.
Example:
>>> covar_module = gpytorch.kernels.LinearKernel()
>>> x1 = torch.randn(50, 3)
>>> lazy_covar_matrix = covar_module(x1) # Returns a RootLazyTensor
>>> tensor_covar_matrix = lazy_covar_matrix.evaluate() # Gets the actual tensor for this kernel matrix
covar_dist(x1, x2, diag=False, last_dim_is_batch=False, square_dist=False, dist_postprocess_func=<function default_postprocess_script>, postprocess=True, **params)[source]

This is a helper method for computing the Euclidean distance between all pairs of points in x1 and x2.

Args:
x1 (Tensor n x d or b1 x … x bk x n x d):
First set of data.
x2 (Tensor m x d or b1 x … x bk x m x d):
Second set of data.
diag (bool):
Should we return the whole distance matrix, or just the diagonal? If True, we must have x1 == x2.
last_dim_is_batch (bool, optional):
Is the last dimension of the data a batch dimension or not?
square_dist (bool):
Should we square the distance matrix before returning?
Returns:
(Tensor, Tensor) corresponding to the distance matrix between x1 and x2. The shape depends on the kernel’s mode:
  • diag=False: the full distance matrix
  • diag=False and last_dim_is_batch=True: (b x d x n x n)
  • diag=True: the diagonal of the distance matrix
  • diag=True and last_dim_is_batch=True: (b x d x n)
forward(x1, x2, diag=False, last_dim_is_batch=False, **params)[source]

Computes the covariance between x1 and x2. This method should be implemented by all Kernel subclasses (a minimal example appears after the Returns section below).

Args:
x1 (Tensor n x d or b x n x d):
First set of data
x2 (Tensor m x d or b x m x d):
Second set of data
diag (bool):
Should the Kernel compute the whole kernel, or just the diag?
last_dim_is_batch (bool, optional):
If this is true, it treats the last dimension of the data as another batch dimension. (Useful for additive structure over the dimensions). Default: False
Returns:
Tensor or gpytorch.lazy.LazyTensor.

The exact size depends on the kernel’s evaluation mode:

  • full_covar: n x m or b x n x m
  • full_covar with last_dim_is_batch=True: k x n x m or b x k x n x m
  • diag: n or b x n
  • diag with last_dim_is_batch=True: k x n or b x k x n
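
As an illustration, here is a minimal sketch of a custom subclass that implements forward() in terms of covar_dist(); the ExponentialKernel name and the specific covariance function are illustrative only, not part of the library:

>>> class ExponentialKernel(gpytorch.kernels.Kernel):
>>>     def __init__(self, **kwargs):
>>>         # Ask the base class to register a (positive) lengthscale parameter
>>>         super().__init__(has_lengthscale=True, **kwargs)
>>>
>>>     def forward(self, x1, x2, diag=False, **params):
>>>         # Scale the inputs by the lengthscale, then exponentiate the pairwise distances
>>>         x1_ = x1.div(self.lengthscale)
>>>         x2_ = x2.div(self.lengthscale)
>>>         dist = self.covar_dist(x1_, x2_, diag=diag, **params)
>>>         return torch.exp(-dist)
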
num_outputs_per_input(x1, x2)[source]

How many outputs are produced per input (default 1). If x1 is of size n x d and x2 is of size m x d, then the kernel matrix will be of size (n * num_outputs_per_input) x (m * num_outputs_per_input). Default: 1

Standard Kernels

CosineKernel

class gpytorch.kernels.CosineKernel(period_length_prior=None, period_length_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the cosine kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Cosine}}(\mathbf{x_1}, \mathbf{x_2}) = \cos \left( \pi \Vert \mathbf{x_1} - \mathbf{x_2} \Vert_2 / p \right) \end{equation*}\]

where \(p\) is the period length parameter.

Args:
batch_shape (torch.Size, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([])
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
period_length_prior (Prior, optional):
Set this if you want to apply a prior to the period length parameter. Default: None
period_length_constraint (Constraint, optional):
Set this if you want to apply a constraint to the period length parameter. Default: Positive.
eps (float):
The minimum value that the lengthscale/period length can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
period_length (Tensor):
The period length parameter. Size = *batch_shape x 1 x 1.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel())
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

CylindricalKernel

class gpytorch.kernels.CylindricalKernel(num_angular_weights: int, radial_base_kernel: gpytorch.kernels.kernel.Kernel, eps: Optional[int] = 1e-06, angular_weights_prior: Optional[gpytorch.priors.prior.Prior] = None, angular_weights_constraint: Optional[gpytorch.constraints.constraints.Interval] = None, alpha_prior: Optional[gpytorch.priors.prior.Prior] = None, alpha_constraint: Optional[gpytorch.constraints.constraints.Interval] = None, beta_prior: Optional[gpytorch.priors.prior.Prior] = None, beta_constraint: Optional[gpytorch.constraints.constraints.Interval] = None, **kwargs)[source]

Computes a covariance matrix based on the Cylindrical Kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\). It was proposed in BOCK: Bayesian Optimization with Cylindrical Kernels. See http://proceedings.mlr.press/v80/oh18a.html for more details.

Note

The data must lie completely within the unit ball.

Args:
num_angular_weights (int):
The number of components in the angular kernel
radial_base_kernel (gpytorch.kernel):
The base kernel for computing the radial kernel
batch_size (int, optional):
Set this if the data is a batch of input data. It should be b if x1 is a b x n x d tensor. Default: 1
eps (float):
Small floating point number used to improve numerical stability in kernel computations. Default: 1e-6
param_transform (function, optional):
Set this if you want to use something other than softplus to ensure positiveness of parameters.
inv_param_transform (function, optional):
Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.
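
A hypothetical usage sketch (the inputs are scaled to lie strictly inside the unit ball, as required above; the choice of four angular weights and a Matern radial base kernel is arbitrary):

>>> x = torch.rand(10, 3) * 0.5  # keep all points inside the unit ball
>>> covar_module = gpytorch.kernels.CylindricalKernel(
>>>     num_angular_weights=4,
>>>     radial_base_kernel=gpytorch.kernels.MaternKernel(nu=2.5),
>>> )
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)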

LinearKernel

class gpytorch.kernels.LinearKernel(num_dimensions=None, offset_prior=None, variance_prior=None, variance_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the Linear kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_\text{Linear}(\mathbf{x_1}, \mathbf{x_2}) = v\mathbf{x_1}^\top \mathbf{x_2}. \end{equation*}\]

where

  • \(v\) is a variance parameter.

Note

To implement this efficiently, we use a gpytorch.lazy.RootLazyTensor during training and a gpytorch.lazy.MatmulLazyTensor during test. These lazy tensors represent matrices of the form \(K = XX^{\top}\) and \(K = XZ^{\top}\). This makes inference efficient because a matrix-vector product \(Kv\) can be computed as \(Kv=X(X^{\top}v)\), where the base multiply \(Xv\) takes only \(O(nd)\) time and space.

Args:
variance_prior (gpytorch.priors.Prior):
Prior over the variance parameter (default None).
variance_constraint (Constraint, optional):
Constraint to place on variance parameter. Default: Positive.
active_dims (list):
List of data dimensions to operate on. len(active_dims) should equal num_dimensions.
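
A brief usage sketch (non-batch case), following the pattern of the other kernels on this page:

>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.LinearKernel()
>>> lazy_covar = covar_module(x)    # Output: RootLazyTensor of size (10 x 10)
>>> covar = lazy_covar.evaluate()   # Explicit (10 x 10) tensor, if needed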

MaternKernel

class gpytorch.kernels.MaternKernel(nu=2.5, **kwargs)[source]

Computes a covariance matrix based on the Matern kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Matern}}(\mathbf{x_1}, \mathbf{x_2}) = \frac{2^{1 - \nu}}{\Gamma(\nu)} \left( \sqrt{2 \nu} d \right)^{\nu} K_\nu \left( \sqrt{2 \nu} d \right) \end{equation*}\]

where

  • \(d = (\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-1} (\mathbf{x_1} - \mathbf{x_2})\) is the distance between \(x_1\) and \(x_2\) scaled by the lengthscale parameter \(\Theta\).
  • \(\nu\) is a smoothness parameter (takes values 1/2, 3/2, or 5/2). Smaller values are less smooth.
  • \(K_\nu\) is a modified Bessel function.

There are a few options for the lengthscale parameter \(\Theta\): See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Args:
nu (float):
The smoothness parameter: either 1/2, 3/2, or 5/2.
ard_num_dims (int, optional):
Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. Default: None
batch_shape (torch.Size, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([])
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None
lengthscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the lengthscale parameter. Default: Positive.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5))
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5, ard_num_dims=5))
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5))
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5, batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

PeriodicKernel

class gpytorch.kernels.PeriodicKernel(period_length_prior=None, period_length_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the periodic kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Periodic}}(\mathbf{x_1}, \mathbf{x_2}) = \exp \left( - \frac{2 \sin^2 \left( \pi \Vert \mathbf{x_1} - \mathbf{x_2} \Vert_1 / p \right) } { \ell^2 } \right) \end{equation*}\]

where

  • \(p\) is the period length parameter.
  • \(\ell\) is a lengthscale parameter.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Note

This kernel does not have an ARD lengthscale option.

Args:
batch_shape (torch.Size, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([]).
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
period_length_prior (Prior, optional):
Set this if you want to apply a prior to the period length parameter. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None.
lengthscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the value of the lengthscale. Default: Positive.
period_length_constraint (Constraint, optional):
Set this if you want to apply a constraint to the value of the period length. Default: Positive.
eps (float):
The minimum value that the lengthscale/period length can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size = *batch_shape x 1 x 1.
period_length (Tensor):
The period length parameter. Size = *batch_shape x 1 x 1.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel())
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

PolynomialKernel

class gpytorch.kernels.PolynomialKernel(power: int, offset_prior: Optional[gpytorch.priors.prior.Prior] = None, offset_constraint: Optional[gpytorch.constraints.constraints.Interval] = None, **kwargs)[source]

Computes a covariance matrix based on the Polynomial kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_\text{Poly}(\mathbf{x_1}, \mathbf{x_2}) = (\mathbf{x_1}^\top \mathbf{x_2} + c)^{d}. \end{equation*}\]

where

  • \(c\) is an offset parameter.
  • \(d\) is the polynomial degree, set by the power argument.

Args:
power (int):
The degree \(d\) of the polynomial.
offset_prior (gpytorch.priors.Prior):
Prior over the offset parameter (default None).
offset_constraint (Constraint, optional):
Constraint to place on offset parameter. Default: Positive.
active_dims (list):
List of data dimensions to operate on. len(active_dims) should equal num_dimensions.
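
A brief usage sketch (the quadratic power=2 is an arbitrary choice for illustration):

>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PolynomialKernel(power=2))
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)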

PolynomialKernelGrad

class gpytorch.kernels.PolynomialKernelGrad(power: int, offset_prior: Optional[gpytorch.priors.prior.Prior] = None, offset_constraint: Optional[gpytorch.constraints.constraints.Interval] = None, **kwargs)[source]

RBFKernel

class gpytorch.kernels.RBFKernel(**kwargs)[source]

Computes a covariance matrix based on the RBF (squared exponential) kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{RBF}}(\mathbf{x_1}, \mathbf{x_2}) = \exp \left( -\frac{1}{2} (\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-2} (\mathbf{x_1} - \mathbf{x_2}) \right) \end{equation*}\]

where \(\Theta\) is a lengthscale parameter. See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Args:
ard_num_dims (int, optional):
Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. Default: None
batch_shape (torch.Size, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([]).
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None.
lengthscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the lengthscale parameter. Default: Positive.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=5))
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(x)  # Output: LazyTensor of size (2 x 10 x 10)

SpectralMixtureKernel

class gpytorch.kernels.SpectralMixtureKernel(num_mixtures=None, ard_num_dims=1, batch_shape=torch.Size([1]), mixture_scales_prior=None, mixture_scales_constraint=None, mixture_means_prior=None, mixture_means_constraint=None, mixture_weights_prior=None, mixture_weights_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the Spectral Mixture Kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\). It was proposed in Gaussian Process Kernels for Pattern Discovery and Extrapolation.

Note

Unlike other kernels:
  • ard_num_dims must equal the number of dimensions of the data.
  • batch_shape must equal the batch size of the data (torch.Size([1]) if the data is not batched).
  • batch_shape cannot contain more than one batch dimension.
  • This kernel should not be combined with a gpytorch.kernels.ScaleKernel.

Args:
num_mixtures (int, optional):
The number of components in the mixture.
ard_num_dims (int, optional):
Set this to match the dimensionality of the input. It should be d if x1 is a n x d matrix. Default: 1
batch_shape (torch.Size, optional):
Set this if the data is a batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([1])
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
mixture_lengthscale (Tensor):
The lengthscale parameter. Given k mixture components, and b x n x d data, this will be of size b x k x 1 x d.
mixture_means (Tensor):
The mixture mean parameters (b x k x 1 x d).
mixture_weights (Tensor):
The mixture weight parameters (b x k).
Example:
>>> # Non-batch
>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4, ard_num_dims=5)
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)
>>>
>>> # Batch
>>> batch_x = torch.randn(2, 10, 5)
>>> covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4, batch_shape=torch.Size([2]), ard_num_dims=5)
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

WhiteNoiseKernel

class gpytorch.kernels.WhiteNoiseKernel(*args, **kwargs)[source]

The WhiteNoiseKernel has been hard deprecated due to incorrect behavior in certain cases. For equivalent functionality, please use a FixedNoiseGaussianLikelihood.

Composition/Decoration Kernels

AdditiveKernel

class gpytorch.kernels.AdditiveKernel(*kernels)[source]

A Kernel that supports summing over multiple component kernels.

Example:
>>> covar_module = RBFKernel(active_dims=torch.tensor([1])) + RBFKernel(active_dims=torch.tensor([2]))
>>> x1 = torch.randn(50, 2)
>>> additive_kernel_matrix = covar_module(x1)

MultiDeviceKernel

class gpytorch.kernels.MultiDeviceKernel(base_kernel, device_ids, output_device=None, create_cuda_context=True, **kwargs)[source]

Allocates the covariance matrix on distributed devices, e.g. multiple GPUs.

Args:
  • base_kernel: Base kernel to distribute
  • device_ids: list of torch.device objects to place kernel chunks on
  • output_device: Device where outputs will be placed

AdditiveStructureKernel

class gpytorch.kernels.AdditiveStructureKernel(base_kernel, num_dims, active_dims=None)[source]

A Kernel decorator for kernels with additive structure. If a kernel decomposes additively, then this module will be much more computationally efficient.

A kernel function k decomposes additively if it can be written as

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = k'(x_1^{(1)}, x_2^{(1)}) + \ldots + k'(x_1^{(d)}, x_2^{(d)}) \end{equation*}\]

for some kernel \(k'\) that operates on a subset of dimensions.

Given a b x n x d input, AdditiveStructureKernel computes d one-dimensional kernels (using the supplied base_kernel), and then adds the component kernels together. Unlike AdditiveKernel, AdditiveStructureKernel computes each of the additive terms in batch, making it very fast.

Args:
base_kernel (Kernel):
The component kernel applied to each input dimension (the one-dimensional results are then summed).
num_dims (int):
The dimension of the input data.
active_dims (tuple of ints, optional):
Passed down to the base_kernel.
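
A brief usage sketch, assuming 5-dimensional inputs and an RBF base kernel:

>>> x = torch.randn(10, 5)
>>> base_covar_module = gpytorch.kernels.RBFKernel()
>>> covar_module = gpytorch.kernels.AdditiveStructureKernel(base_covar_module, num_dims=5)
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)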

ProductKernel

class gpytorch.kernels.ProductKernel(*kernels)[source]

A Kernel that supports elementwise multiplying multiple component kernels together.

Example:
>>> covar_module = RBFKernel(active_dims=torch.tensor([1])) * RBFKernel(active_dims=torch.tensor([2]))
>>> x1 = torch.randn(50, 2)
>>> kernel_matrix = covar_module(x1) # The RBF Kernel already decomposes multiplicatively, so this is foolish!

ProductStructureKernel

class gpytorch.kernels.ProductStructureKernel(base_kernel, num_dims, active_dims=None)[source]

A Kernel decorator for kernels with product structure. If a kernel decomposes multiplicatively, then this module will be much more computationally efficient.

A kernel function k has product structure if it can be written as

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = k'(x_1^{(1)}, x_2^{(1)}) * \ldots * k'(x_1^{(d)}, x_2^{(d)}) \end{equation*}\]

for some kernel \(k'\) that operates on each dimension.

Given a b x n x d input, ProductStructureKernel computes d one-dimensional kernels (using the supplied base_kernel), and then multiplies the component kernels together. Unlike ProductKernel, ProductStructureKernel computes each of the product terms in batch, making it very fast.

See Product Kernel Interpolation for Scalable Gaussian Processes for more detail.

Args:
  • base_kernel (Kernel):
    The component kernel applied to each input dimension (in the SKIP construction, this is typically a KISS-GP GridInterpolationKernel).
  • num_dims (int):
    The dimension of the input data.
  • active_dims (tuple of ints, optional):
    Passed down to the base_kernel.
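
A brief usage sketch in the spirit of the SKIP construction referenced above (the grid size of 100 is an arbitrary illustrative choice):

>>> x = torch.randn(10, 5)
>>> base_covar_module = gpytorch.kernels.GridInterpolationKernel(
>>>     gpytorch.kernels.RBFKernel(), grid_size=100, num_dims=1
>>> )
>>> covar_module = gpytorch.kernels.ProductStructureKernel(base_covar_module, num_dims=5)
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)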

ScaleKernel

class gpytorch.kernels.ScaleKernel(base_kernel, outputscale_prior=None, outputscale_constraint=None, **kwargs)[source]

Decorates an existing kernel object with an output scale, i.e.

\[\begin{equation*} K_{\text{scaled}} = \theta_\text{scale} K_{\text{orig}} \end{equation*}\]

where \(\theta_\text{scale}\) is the outputscale parameter.

In batch-mode (i.e. when \(x_1\) and \(x_2\) are batches of input matrices), each batch of data can have its own outputscale parameter by setting the batch_shape keyword argument to the appropriate number of batches.

Note

The outputscale parameter is parameterized on a log scale to constrain it to be positive. You can set a prior on this parameter using the outputscale_prior argument.

Args:
base_kernel (Kernel):
The base kernel to be scaled.
batch_shape (torch.Size, optional):
Set this if you want a separate outputscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([])
outputscale_prior (Prior, optional):
Set this if you want to apply a prior to the outputscale parameter. Default: None
outputscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the outputscale parameter. Default: Positive.
Attributes:
base_kernel (Kernel):
The kernel module to be scaled.
outputscale (Tensor):
The outputscale parameter. Size/shape of parameter depends on the batch_shape argument.
Example:
>>> x = torch.randn(10, 5)
>>> base_covar_module = gpytorch.kernels.RBFKernel()
>>> scaled_covar_module = gpytorch.kernels.ScaleKernel(base_covar_module)
>>> covar = scaled_covar_module(x)  # Output: LazyTensor of size (10 x 10)

Specialty Kernels

IndexKernel

class gpytorch.kernels.IndexKernel(num_tasks, rank=1, prior=None, var_constraint=None, **kwargs)[source]

A kernel for discrete indices. Kernel is defined by a lookup table.

\[\begin{equation} k(i, j) = \left(BB^\top + \text{diag}(\mathbf v) \right)_{i, j} \end{equation}\]

where \(B\) is a low-rank matrix, and \(\mathbf v\) is a non-negative vector. These parameters are learned.

Args:
num_tasks (int):
Total number of indices.
batch_shape (torch.Size, optional):
Set if the IndexKernel is operating on batches of data (and you want different parameters for each batch)
rank (int):
Rank of \(B\) matrix.
prior (gpytorch.priors.Prior):
Prior for \(B\) matrix.
var_constraint (Constraint, optional):
Constraint for added diagonal component. Default: Positive.
Attributes:
covar_factor:
The \(B\) matrix.
log_var:
The element-wise log of the \(\mathbf v\) vector.
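
A brief usage sketch; note that, unlike the other kernels, the inputs here are integer task indices rather than continuous features:

>>> task_idx = torch.randint(0, 4, (10, 1))  # 10 points, each labelled with one of 4 tasks
>>> covar_module = gpytorch.kernels.IndexKernel(num_tasks=4, rank=1)
>>> covar = covar_module(task_idx)  # Output: LazyTensor of size (10 x 10)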

LCMKernel

class gpytorch.kernels.LCMKernel(base_kernels, num_tasks, rank=1, task_covar_prior=None)[source]

This kernel supports the LCM kernel. It allows the user to specify a list of base kernels to use, and individual MultitaskKernel objects are fit to each of them. The final kernel is the linear sum of the Kronecker product of all these base kernels with their respective MultitaskKernel objects.

The returned object is of type gpytorch.lazy.KroneckerProductLazyTensor.

num_outputs_per_input(x1, x2)[source]

Given n data points x1 and m datapoints x2, this multitask kernel returns an (n*num_tasks) x (m*num_tasks) covariance matrix.

MultitaskKernel

class gpytorch.kernels.MultitaskKernel(data_covar_module, num_tasks, rank=1, task_covar_prior=None, **kwargs)[source]

Kernel supporting Kronecker style multitask Gaussian processes (where every data point is evaluated at every task) using gpytorch.kernels.IndexKernel as a basic multitask kernel.

Given a base covariance module to be used for the data, \(K_{XX}\), this kernel computes a task kernel of specified size \(K_{TT}\) and returns \(K = K_{TT} \otimes K_{XX}\) as a gpytorch.lazy.KroneckerProductLazyTensor.

Args:
data_covar_module (gpytorch.kernels.Kernel):
Kernel to use as the data kernel.
num_tasks (int):
Number of tasks
batch_size (int, optional):
Set if the MultitaskKernel is operating on batches of data (and you want different parameters for each batch)
rank (int):
Rank of index kernel to use for task covariance matrix.
task_covar_prior (gpytorch.priors.Prior):
Prior to use for task kernel. See gpytorch.kernels.IndexKernel for details.
num_outputs_per_input(x1, x2)[source]

Given n data points x1 and m datapoints x2, this multitask kernel returns an (n*num_tasks) x (m*num_tasks) covariance matrix.
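
A brief usage sketch, assuming two tasks and an RBF data kernel:

>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.MultitaskKernel(gpytorch.kernels.RBFKernel(), num_tasks=2, rank=1)
>>> covar = covar_module(x)  # Output: KroneckerProductLazyTensor of size (20 x 20)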

RBFKernelGrad

class gpytorch.kernels.RBFKernelGrad(**kwargs)[source]

Computes a covariance matrix of the RBF kernel that models the covariance between the values and partial derivatives for inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Args:
batch_shape (torch.Size, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([]).
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None.
lengthscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the lengthscale parameter. Default: Positive.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad())
>>> covar = covar_module(x)  # Output: LazyTensor of size (60 x 60), where 60 = n * (d + 1)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 60 x 60)

Kernels for Scalable GP Regression Methods

GridKernel

class gpytorch.kernels.GridKernel(base_kernel, grid, interpolation_mode=False, active_dims=None)[source]

If the input data \(X\) are regularly spaced on a grid, then GridKernel can dramatically speed up computations for stationary kernels.

GridKernel exploits Toeplitz and Kronecker structure within the covariance matrix. See Fast kernel learning for multidimensional pattern extrapolation for more info.

Note

GridKernel can only wrap stationary kernels (such as RBF, Matern, Periodic, Spectral Mixture, etc.)

Args:
base_kernel (Kernel):
The kernel to speed up with grid methods.
active_dims (tuple of ints, optional):
Passed down to the base_kernel.
update_grid(grid)[source]

Supply a new grid if it ever changes.

GridInterpolationKernel

class gpytorch.kernels.GridInterpolationKernel(base_kernel, grid_size, num_dims=None, grid_bounds=None, active_dims=None)[source]

Implements the KISS-GP (or SKI) approximation for a given kernel. It was proposed in Kernel Interpolation for Scalable Structured Gaussian Processes, and offers extremely fast and accurate Kernel approximations for large datasets.

Given a base kernel k, the covariance \(k(\mathbf{x_1}, \mathbf{x_2})\) is approximated by using a grid of regularly spaced inducing points:

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = \mathbf{w_{x_1}}^\top K_{U,U} \mathbf{w_{x_2}} \end{equation*}\]

where

  • \(U\) is the set of gridded inducing points
  • \(K_{U,U}\) is the kernel matrix between the inducing points
  • \(\mathbf{w_{x_1}}\) and \(\mathbf{w_{x_2}}\) are sparse vectors based on \(\mathbf{x_1}\) and \(\mathbf{x_2}\) that apply cubic interpolation.

The user should supply the size of the grid (using the grid_size attribute). To choose a reasonable grid size, we highly recommend using the gpytorch.utils.grid.choose_grid_size() helper function. The bounds of the grid will automatically be determined by the data.

(Alternatively, you can hard-code bounds using the grid_bounds argument, which will speed up this kernel’s computations.)

Note

GridInterpolationKernel can only wrap stationary kernels (such as RBF, Matern, Periodic, Spectral Mixture, etc.)

Args:
  • base_kernel (Kernel):
    The kernel to approximate with KISS-GP
  • grid_size (int):
    The size of the grid (in each dimension)
  • num_dims (int):
    The dimension of the input data. Required if grid_bounds=None
  • grid_bounds (tuple(float, float), optional):
    The bounds of the grid, if known (high performance mode). The length of the tuple must match the number of dimensions. The entries represent the min/max values for each dimension.
  • active_dims (tuple of ints, optional):
    Passed down to the base_kernel.
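
A brief usage sketch, using the gpytorch.utils.grid.choose_grid_size() helper recommended above (the 2-dimensional data is an arbitrary illustrative choice):

>>> train_x = torch.randn(1000, 2)
>>> grid_size = gpytorch.utils.grid.choose_grid_size(train_x)
>>> covar_module = gpytorch.kernels.ScaleKernel(
>>>     gpytorch.kernels.GridInterpolationKernel(
>>>         gpytorch.kernels.RBFKernel(), grid_size=grid_size, num_dims=2
>>>     )
>>> )
>>> covar = covar_module(train_x)  # Output: LazyTensor of size (1000 x 1000)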

InducingPointKernel

class gpytorch.kernels.InducingPointKernel(base_kernel, inducing_points, likelihood, active_dims=None)[source]
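
A brief usage sketch (in the style of sparse GP regression; the number of inducing points is an arbitrary illustrative choice): the kernel wraps a base kernel together with a set of inducing point locations and the model likelihood:

>>> train_x = torch.randn(1000, 5)
>>> inducing_points = train_x[:50, :].clone()
>>> likelihood = gpytorch.likelihoods.GaussianLikelihood()
>>> covar_module = gpytorch.kernels.InducingPointKernel(
>>>     gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),
>>>     inducing_points=inducing_points,
>>>     likelihood=likelihood,
>>> )
>>> covar = covar_module(train_x)  # Output: LazyTensor of size (1000 x 1000)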