gpytorch.mlls

These are modules to compute (or approximate/bound) the marginal log likelihood (MLL) of the GP model when applied to data. I.e., given a GP \(f \sim \mathcal{GP}(\mu, K)\), and data \(\mathbf X, \mathbf y\), these modules compute/approximate

\[\begin{equation*} \mathcal{L} = p_f(\mathbf y \! \mid \! \mathbf X) = \int p \left( \mathbf y \! \mid \! f(\mathbf X) \right) \: p(f(\mathbf X) \! \mid \! \mathbf X) \: d f \end{equation*}\]

This is computed exactly when the GP inference is computed exactly (e.g. regression with a Gaussian likelihood). It is approximated/bounded for GP models that use approximate inference.
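For a Gaussian likelihood \(p(\mathbf y \mid f(\mathbf X)) = \mathcal N(\mathbf y; f(\mathbf X), \sigma^2 \mathbf I)\) and a prior \(f(\mathbf X) \sim \mathcal N(\boldsymbol\mu, \mathbf K)\), the integral above has the closed form \(\log \mathcal N(\mathbf y; \boldsymbol\mu, \mathbf K + \sigma^2 \mathbf I)\), which is why it can be computed exactly. The sketch below uses plain NumPy rather than GPyTorch. The kernel matrix, noise level and data are made-up values for illustration. It evaluates that closed form with a Cholesky factor and checks it against a Monte Carlo estimate of the integral.

```python
import numpy as np


def exact_log_marginal(y, mean, K, noise):
    """log N(y; mean, K + noise * I), computed with a Cholesky factor."""
    n = y.shape[0]
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L, y - mean)  # L^{-1} (y - mean)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * n * np.log(2 * np.pi))


def mc_log_marginal(y, mean, K, noise, num_samples=200_000, seed=0):
    """Monte Carlo estimate of log of the integral of p(y | f) p(f) df, f ~ N(mean, K)."""
    rng = np.random.default_rng(seed)
    f = rng.multivariate_normal(mean, K, size=num_samples)
    # Gaussian log likelihood log p(y | f) for each sample, summed over data points.
    log_lik = (-0.5 * np.log(2 * np.pi * noise)
               - 0.5 * (y - f) ** 2 / noise).sum(axis=1)
    # Log of the sample mean of p(y | f), computed in a numerically stable way.
    return np.logaddexp.reduce(log_lik) - np.log(num_samples)


# Hypothetical toy problem: 3 inputs, RBF kernel with unit lengthscale.
x = np.array([0.0, 0.5, 1.0])
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
mean = np.zeros(3)
y = np.array([0.1, 0.3, -0.2])

exact = exact_log_marginal(y, mean, K, noise=0.25)
estimate = mc_log_marginal(y, mean, K, noise=0.25)
print(exact, estimate)  # the two values should agree closely
```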

These modules are typically used as the “loss” functions for GP models. Note, however, that they return a quantity to be maximized, so their output must be negated before it is minimized by an optimizer.

Exact GP Inference

These are MLLs for use with ExactGP modules. They compute the MLL exactly.

ExactMarginalLogLikelihood

class gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

The exact marginal log likelihood (MLL) for an exact Gaussian process with a Gaussian likelihood.

Note

This module will not work with anything other than a GaussianLikelihood and an ExactGP. It also cannot be used in conjunction with stochastic optimization.

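Parameters:
  • likelihood (GaussianLikelihood) – The Gaussian likelihood for the model
  • model (ExactGP) – The exact GP model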
Example:
>>> # model is a gpytorch.models.ExactGP
>>> # likelihood is a gpytorch.likelihoods.GaussianLikelihood
>>> mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
>>>
>>> output = model(train_x)
>>> loss = -mll(output, train_y)
>>> loss.backward()
forward(function_dist, target, *params)

Computes the MLL given \(p(\mathbf f)\) and \(\mathbf y\).

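Parameters:
  • function_dist (MultivariateNormal) – \(p(\mathbf f)\), the output of the ExactGP model
  • target (torch.Tensor) – \(\mathbf y\), the target values
  • *params – Additional arguments passed to the likelihood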
Return type:

torch.Tensor

Returns:

Exact MLL. Output shape corresponds to batch shape of the model/input data.

Approximate GP Inference

These are MLLs for use with ApproximateGP modules. They are designed for when exact inference is intractable (either when the likelihood is non-Gaussian, or when there is too much data for an ExactGP model).

VariationalELBO

class gpytorch.mlls.VariationalELBO(likelihood, model, num_data, beta=1.0, combine_terms=True)

The variational evidence lower bound (ELBO). This is used to optimize variational Gaussian processes (with or without stochastic optimization).

\[\begin{split}\begin{align*} \mathcal{L}_\text{ELBO} &= \mathbb{E}_{p_\text{data}( y, \mathbf x )} \left[ \mathbb{E}_{p(f \mid \mathbf u, \mathbf x) q(\mathbf u)} \left[ \log p( y \! \mid \! f) \right] \right] - \beta \: \text{KL} \left[ q( \mathbf u) \Vert p( \mathbf u) \right] \\ &\approx \sum_{i=1}^N \mathbb{E}_{q( \mathbf u)} \left[ \int \log p( y_i \! \mid \! f_i) \: p(f_i \! \mid \! \mathbf u) \: d f_i \right] - \beta \: \text{KL} \left[ q( \mathbf u) \Vert p( \mathbf u) \right] \end{align*}\end{split}\]

where \(N\) is the number of datapoints, \(q(\mathbf u)\) is the variational distribution for the inducing function values, and \(p(\mathbf u)\) is the prior distribution for the inducing function values.

\(\beta\) is a scaling constant that reduces the regularization effect of the KL divergence. Setting \(\beta=1\) (default) results in the true variational ELBO.

For more information on this derivation, see Scalable Variational Gaussian Process Classification (Hensman et al., 2015).
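As a concrete reading of the ELBO, take a Gaussian likelihood \(p(y_i \mid f_i) = \mathcal N(y_i; f_i, \sigma^2)\). The data term becomes \(\sum_i \mathbb E_{q(f_i)}[\log p(y_i \mid f_i)]\) with \(q(f_i) = \int p(f_i \mid \mathbf u)\, q(\mathbf u)\, d\mathbf u\), and both this term and the KL term have closed forms. The NumPy sketch below is not GPyTorch's implementation. It assumes 1-D inputs, an RBF kernel, a zero-mean prior \(p(\mathbf u) = \mathcal N(\mathbf 0, \mathbf K_{\mathbf{uu}})\) and a non-whitened Gaussian \(q(\mathbf u)\), and all helper names are hypothetical. It also suggests why num_data matters for SGD: rescaling a minibatch sum by \(N / |\text{batch}|\) makes the average over a partition into minibatches equal to the full-data ELBO.

```python
import numpy as np


def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel on 1-D inputs (hypothetical choice)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)


def q_f_marginals(x, z, m_u, S_u, jitter=1e-6):
    """Means/variances of q(f_i) = int p(f_i | u) q(u) du, with q(u) = N(m_u, S_u)."""
    K_uu = rbf(z, z) + jitter * np.eye(len(z))
    K_fu = rbf(x, z)
    A = np.linalg.solve(K_uu, K_fu.T).T  # K_fu K_uu^{-1}
    mean = A @ m_u
    var = (np.diag(rbf(x, x))
           - np.sum(A * K_fu, axis=1)        # diag(K_fu K_uu^{-1} K_uf)
           + np.sum((A @ S_u) * A, axis=1))  # diag(A S_u A^T)
    return mean, var, K_uu


def expected_log_lik(y, mean, var, noise):
    """E_{q(f_i)}[log N(y_i; f_i, noise)] in closed form."""
    return -0.5 * np.log(2 * np.pi * noise) - 0.5 * ((y - mean) ** 2 + var) / noise


def gaussian_kl(m, S, K):
    """KL[N(m, S) || N(0, K)]."""
    k = len(m)
    trace_term = np.trace(np.linalg.solve(K, S))
    maha = m @ np.linalg.solve(K, m)
    logdet_K = np.linalg.slogdet(K)[1]
    logdet_S = np.linalg.slogdet(S)[1]
    return 0.5 * (trace_term + maha - k + logdet_K - logdet_S)


def elbo_estimate(x, y, z, m_u, S_u, noise, num_data, beta=1.0):
    """ELBO estimate: batch sum rescaled by num_data / batch size, minus beta * KL."""
    mean, var, K_uu = q_f_marginals(x, z, m_u, S_u)
    data_term = expected_log_lik(y, mean, var, noise).sum() * num_data / len(x)
    return data_term - beta * gaussian_kl(m_u, S_u, K_uu)


rng = np.random.default_rng(0)
N = 40
x = np.linspace(0, 1, N)
y = np.sin(6 * x) + 0.1 * rng.standard_normal(N)
z = np.linspace(0, 1, 5)                  # inducing inputs
m_u, S_u = np.zeros(5), 0.5 * np.eye(5)   # an arbitrary q(u)

full = elbo_estimate(x, y, z, m_u, S_u, noise=0.01, num_data=N)
batches = np.split(rng.permutation(N), 4)  # 4 minibatches of 10 points
averaged = np.mean([elbo_estimate(x[b], y[b], z, m_u, S_u, 0.01, N) for b in batches])
print(full, averaged)  # equal up to floating-point error
```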

Parameters:
  • likelihood (Likelihood) – The likelihood for the model
  • model (ApproximateGP) – The approximate GP model
  • num_data (int) – The total number of training data points (necessary for SGD)
  • beta (float) – (optional, default=1.) A multiplicative factor for the KL divergence term. Setting it to 1 (default) recovers true variational inference (as derived in Scalable Variational Gaussian Process Classification). Setting it to anything less than 1 reduces the regularization effect of the KL term (similar to what was proposed in the beta-VAE paper).
  • combine_terms (bool) – (optional, default=True) Whether or not to sum the expected log likelihood term and the KL term.
Example:
>>> # model is a gpytorch.models.ApproximateGP
>>> # likelihood is a gpytorch.likelihoods.Likelihood
>>> mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=100, beta=0.5)
>>>
>>> output = model(train_x)
>>> loss = -mll(output, train_y)
>>> loss.backward()
forward(variational_dist_f, target, **kwargs)

Computes the Variational ELBO given \(q(\mathbf f)\) and \(\mathbf y\). Calling this function will call the likelihood’s expected_log_prob() function.

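Parameters:
  • variational_dist_f (MultivariateNormal) – \(q(\mathbf f)\), the output of the ApproximateGP model
  • target (torch.Tensor) – \(\mathbf y\), the target values
  • **kwargs – Additional arguments passed to the likelihood’s expected_log_prob() function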
Return type:

torch.Tensor

Returns:

Variational ELBO. Output shape corresponds to batch shape of the model/input data.

PredictiveLogLikelihood

class gpytorch.mlls.PredictiveLogLikelihood(likelihood, model, num_data, beta=1.0, combine_terms=True)

An alternative objective function for approximate GPs, proposed in Jankowiak et al., 2019. It typically produces better predictive variances than the gpytorch.mlls.VariationalELBO objective.

\[\begin{split}\begin{align*} \mathcal{L}_\text{PLL} &= \mathbb{E}_{p_\text{data}( y, \mathbf x )} \left[ \log p( y \! \mid \! \mathbf x) \right] - \beta \: \text{KL} \left[ q( \mathbf u) \Vert p( \mathbf u) \right] \\ &\approx \sum_{i=1}^N \log \mathbb{E}_{q(\mathbf u)} \left[ \int p( y_i \! \mid \! f_i) p(f_i \! \mid \! \mathbf u) \: d f_i \right] - \beta \: \text{KL} \left[ q( \mathbf u) \Vert p( \mathbf u) \right] \end{align*}\end{split}\]

where \(N\) is the total number of datapoints, \(q(\mathbf u)\) is the variational distribution for the inducing function values, and \(p(\mathbf u)\) is the prior distribution for the inducing function values.

\(\beta\) is a scaling constant that reduces the regularization effect of the KL divergence. Setting \(\beta=1\) (default) results in an objective that can be motivated by a connection to Stochastic Expectation Propagation (see Jankowiak et al., 2019 for details).

Note

This objective is very similar to the variational ELBO. The only difference is that the \(\log\) occurs outside the expectation \(\mathbb E_{q(\mathbf u)}\). This difference results in very different predictive performance (see Jankowiak et al., 2019).
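To see how moving the \(\log\) changes the objective, take a Gaussian likelihood \(\mathcal N(y; f, \sigma^2)\) and a single marginal \(q(f) = \mathcal N(m, s^2)\). The ELBO's data term is \(\mathbb E_{q(f)}[\log \mathcal N(y; f, \sigma^2)] = \log \mathcal N(y; m, \sigma^2) - s^2 / (2\sigma^2)\). The predictive log likelihood's data term is \(\log \mathbb E_{q(f)}[\mathcal N(y; f, \sigma^2)] = \log \mathcal N(y; m, \sigma^2 + s^2)\). The NumPy sketch below uses illustrative values and ignores the KL term. It shows that the ELBO term always prefers \(s^2 = 0\). The predictive term, by contrast, is maximized at \(s^2 = (y - m)^2 - \sigma^2\) when the residual exceeds the noise level, so the latent variance can absorb misfit.

```python
import numpy as np


def gaussian_log_pdf(y, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (y - mean) ** 2 / var


def elbo_data_term(y, m, s2, noise):
    # E_{q(f)}[log N(y; f, noise)]: log inside the expectation.
    return gaussian_log_pdf(y, m, noise) - 0.5 * s2 / noise


def pll_data_term(y, m, s2, noise):
    # log E_{q(f)}[N(y; f, noise)] = log N(y; m, noise + s2): log outside.
    return gaussian_log_pdf(y, m, noise + s2)


y, m, noise = 1.0, 0.0, 0.1            # residual of 1 with noise variance 0.1
s2_grid = np.linspace(0.0, 2.0, 2001)  # candidate latent variances s^2

elbo_vals = elbo_data_term(y, m, s2_grid, noise)
pll_vals = pll_data_term(y, m, s2_grid, noise)

best_elbo_s2 = s2_grid[np.argmax(elbo_vals)]
best_pll_s2 = s2_grid[np.argmax(pll_vals)]
print(best_elbo_s2)  # 0.0
print(best_pll_s2)   # approximately 0.9 = 1.0**2 - 0.1
```

By Jensen's inequality the predictive term is never smaller than the ELBO term, and the two coincide only when \(s^2 = 0\).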

Parameters:
  • likelihood (Likelihood) – The likelihood for the model
  • model (ApproximateGP) – The approximate GP model
  • num_data (int) – The total number of training data points (necessary for SGD)
  • beta (float) – (optional, default=1.) A multiplicative factor for the KL divergence term. Setting it to anything less than 1 reduces the regularization effect of the KL term (similar to what was proposed in the beta-VAE paper).
  • combine_terms (bool) – (optional, default=True) Whether or not to sum the predictive log likelihood term and the KL term.
Example:
>>> # model is a gpytorch.models.ApproximateGP
>>> # likelihood is a gpytorch.likelihoods.Likelihood
>>> mll = gpytorch.mlls.PredictiveLogLikelihood(likelihood, model, num_data=100, beta=0.5)
>>>
>>> output = model(train_x)
>>> loss = -mll(output, train_y)
>>> loss.backward()
forward(approximate_dist_f, target, **kwargs)

Computes the predictive cross entropy given \(q(\mathbf f)\) and \(\mathbf y\). Calling this function will call the likelihood’s forward() function.

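Parameters:
  • approximate_dist_f (MultivariateNormal) – \(q(\mathbf f)\), the output of the ApproximateGP model
  • target (torch.Tensor) – \(\mathbf y\), the target values
  • **kwargs – Additional arguments passed to the likelihood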
Return type:

torch.Tensor

Returns:

Predictive log likelihood. Output shape corresponds to batch shape of the model/input data.

GammaRobustVariationalELBO

class gpytorch.mlls.GammaRobustVariationalELBO(likelihood, model, gamma=1.03, *args, **kwargs)

An alternative to the variational evidence lower bound (ELBO), proposed by Knoblauch, 2019. It is derived by replacing the log-likelihood term in the ELBO with a gamma divergence:

\[\begin{align*} \mathcal{L}_{\gamma} &= \sum_{i=1}^N \mathbb{E}_{q( \mathbf u)} \left[ -\frac{\gamma}{\gamma - 1} \frac{ p( y_i \! \mid \! \mathbf u)^{\gamma - 1} }{ \int p(\mathbf y \mid \mathbf u)^{\gamma} \: d \mathbf y } \right] - \beta \: \text{KL} \left[ q( \mathbf u) \Vert p( \mathbf u) \right] \end{align*}\]

where \(N\) is the number of datapoints, \(\gamma\) is a hyperparameter, \(q(\mathbf u)\) is the variational distribution for the inducing function values, and \(p(\mathbf u)\) is the prior distribution for the inducing function values.

\(\beta\) is a scaling constant for the KL divergence.
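The robustness comes from replacing \(\log p\) with the power \(p^{\gamma - 1}\). For a Gaussian likelihood and a fixed latent value \(f\), the gradient of \(p^{\gamma-1}/(\gamma-1)\) with respect to \(f\) equals the log-likelihood gradient reweighted by \(p^{\gamma-1}\). Points that the model finds very unlikely therefore get almost no influence. The NumPy sketch below illustrates only this reweighting. It assumes a fixed \(f\) and ignores three things: the expectation over \(q(\mathbf u)\), the normalizing integral (which does not depend on \(f\) for a Gaussian likelihood) and the KL term. It is not GPyTorch's implementation, and the helper names are hypothetical.

```python
import numpy as np


def gaussian_log_pdf(y, f, noise):
    return -0.5 * np.log(2 * np.pi * noise) - 0.5 * (y - f) ** 2 / noise


def loglik_influence(y, f, noise):
    # |d/df log N(y; f, noise)|: grows linearly with the residual.
    return np.abs(y - f) / noise


def gamma_influence(y, f, noise, gamma=1.03):
    # |d/df p^(gamma-1)| / (gamma-1) = p^(gamma-1) * |y - f| / noise,
    # i.e. the log-likelihood influence reweighted by p^(gamma-1).
    weight = np.exp((gamma - 1) * gaussian_log_pdf(y, f, noise))
    return weight * loglik_influence(y, f, noise)


noise = 0.01                            # noise std of 0.1
residuals = np.array([0.1, 0.5, 5.0])   # 1, 5 and 50 standard deviations
loglik_vals = loglik_influence(residuals, 0.0, noise)
gamma_vals = gamma_influence(residuals, 0.0, noise)
print(loglik_vals)  # keeps growing with the residual
print(gamma_vals)   # the 50-sigma outlier has almost no influence
```

Setting gamma to exactly 1 makes the weight equal to 1, so the sketch then reproduces the ordinary log-likelihood influence.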

Note

This module will only work with GaussianLikelihood.

Parameters:
  • likelihood (GaussianLikelihood) – The likelihood for the model
  • model (ApproximateGP) – The approximate GP model
  • num_data (int) – The total number of training data points (necessary for SGD)
  • beta (float) – (optional, default=1.) A multiplicative factor for the KL divergence term. Setting it to anything less than 1 reduces the regularization effect of the KL term (similar to what was proposed in the beta-VAE paper).
  • gamma (float) – (optional, default=1.03) The \(\gamma\)-divergence hyperparameter.
  • combine_terms (bool) – (optional, default=True) Whether or not to sum the \(\gamma\)-divergence term and the KL term.
Example:
>>> # model is a gpytorch.models.ApproximateGP
>>> # likelihood is a gpytorch.likelihoods.GaussianLikelihood
>>> mll = gpytorch.mlls.GammaRobustVariationalELBO(likelihood, model, num_data=100, beta=0.5, gamma=1.03)
>>>
>>> output = model(train_x)
>>> loss = -mll(output, train_y)
>>> loss.backward()