gpytorch.settings

class gpytorch.settings.cg_tolerance(value)

Relative residual tolerance to use for terminating CG.

Default: 1

class gpytorch.settings.debug(state=True)

Whether or not to perform “safety” checks on the supplied data (for example, that the correct training data is supplied in Exact GP training mode). Disabling the checks has the following trade-offs. Pros: fewer data checks and fewer warning messages. Cons: possibility of supplying incorrect data, or of the model accidentally being in the wrong mode.

class gpytorch.settings.detach_test_caches(state=True)

Whether or not to detach caches computed for making predictions. In most cases, you will want this, as this will speed up derivative computations of the predictions with respect to test inputs. However, if you also need derivatives with respect to training inputs (e.g., because you have fantasy observations), then you must disable this.

class gpytorch.settings.eval_cg_tolerance(value)

Relative residual tolerance to use for terminating CG when making predictions.

Default: 0.01

class gpytorch.settings.fast_computations(covar_root_decomposition=True, log_prob=True, solves=True)

This feature flag controls whether or not to use fast approximations to various mathematical functions used in GP inference. The functions that can be controlled are:

  • covar_root_decomposition
    This feature flag controls how matrix root decompositions (\(K = L L^\top\)) are computed (e.g. for sampling, computing caches, etc.).
    • If set to True,
      covariance matrices \(K\) are decomposed with low-rank approximations \(L L^\top\), (\(L \in \mathbb R^{n \times k}\)) using the Lanczos algorithm. This is faster for large matrices and exploits structure in the covariance matrix if applicable.
    • If set to False,
      covariance matrices \(K\) are decomposed using the Cholesky decomposition.
  • log_prob
    This feature flag controls how GPyTorch computes the marginal log likelihood for exact GPs and log_prob for multivariate normal distributions.
  • solves
    This feature flag controls how GPyTorch computes the solves of positive-definite matrices.
    • If set to True,
      Solves are computed with preconditioned conjugate gradients.
    • If set to False,
      Solves are computed using the Cholesky decomposition.

Warning

Setting this to False will compute a complete Cholesky decomposition of covariance matrices. This may be infeasible for GPs with structured covariance matrices.

By default, approximations are used for all of these functions (except for solves). Setting any of them to False will use exact computations instead.

See also:
covar_root_decomposition

alias of _fast_covar_root_decomposition

log_prob

alias of _fast_log_prob

solves

alias of _fast_solves

class gpytorch.settings.fast_pred_samples(state=True)

Fast predictive samples using Lanczos Variance Estimates (LOVE). Use this for improved performance when sampling from a predictive posterior matrix.

As described in the paper:

Constant-Time Predictive Distributions for Gaussian Processes.

See also: gpytorch.settings.max_root_decomposition_size (to control the size of the low rank decomposition used for samples).

class gpytorch.settings.fast_pred_var(state=True, num_probe_vectors=1)

Fast predictive variances using Lanczos Variance Estimates (LOVE). Use this for improved performance when computing predictive variances.

As described in the paper:

Constant-Time Predictive Distributions for Gaussian Processes.

See also: gpytorch.settings.max_root_decomposition_size (to control the size of the low rank decomposition used for variance estimates).

class gpytorch.settings.lazily_evaluate_kernels(state=True)

Lazily compute the entries of covariance matrices (set to True by default). This can save memory and time, for example when cross-covariance terms are not needed or when you only need to compute variances (not covariances).

If set to False, gpytorch will always compute the entire covariance matrix between training and test data.

class gpytorch.settings.max_cg_iterations(value)

The maximum number of conjugate gradient iterations to perform (when computing matrix solves). A higher value rarely results in more accurate solves – instead, lower the CG tolerance. Default: 1000
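To make the interplay between the iteration cap and the tolerance concrete, here is a minimal pure-Python sketch of unpreconditioned conjugate gradients with a relative residual stopping rule. It is an illustration only: the function names are hypothetical, and GPyTorch's own solver (batched, preconditioned) may use a different stopping criterion.

```python
import math


def matvec(A, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradients(A, b, tolerance=1e-2, max_iterations=1000):
    """Solve A x = b for symmetric positive-definite A.

    Stops when ||b - A x|| / ||b|| < tolerance or after max_iterations steps.
    Returns the solution estimate and the number of iterations performed.
    """
    x = [0.0] * len(b)
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for iteration in range(1, max_iterations + 1):
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if math.sqrt(rs_new) / b_norm < tolerance:
            return x, iteration
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iterations


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
loose_x, loose_iterations = conjugate_gradients(A, b, tolerance=1.0)
tight_x, tight_iterations = conjugate_gradients(A, b, tolerance=1e-8)
print(loose_iterations, loose_x)
print(tight_iterations, tight_x)
```

In this sketch the loop exits as soon as the tolerance is met, so raising max_iterations changes nothing once the residual is small enough; only a tighter tolerance forces additional iterations.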

class gpytorch.settings.max_cholesky_size(value)

If the size of a LazyTensor is less than max_cholesky_size, then root_decomposition and inv_matmul of LazyTensor will use Cholesky rather than Lanczos/CG. Default: 128

class gpytorch.settings.max_eager_kernel_size(value)

If the joint train/test covariance matrix is less than this size, then we will avoid as much lazy evaluation of the kernel as possible. Default: 512

class gpytorch.settings.max_lanczos_quadrature_iterations(value)

The maximum number of Lanczos iterations to perform when doing stochastic Lanczos quadrature. This is ONLY used for log determinant calculations and computing \(\mathrm{Tr}(K^{-1} \, dK/d\theta)\).

class gpytorch.settings.max_preconditioner_size(value)

The maximum size of preconditioner to use. 0 corresponds to turning preconditioning off. When enabled, a value of around 10 usually works fairly well. Default: 0

class gpytorch.settings.max_root_decomposition_size(value)

The maximum number of Lanczos iterations to perform. This is used when 1) computing variance estimates, 2) drawing from MVNs, or 3) performing kernel multiplication. Higher values result in higher accuracy. Default: 100

class gpytorch.settings.memory_efficient(state=True)

Whether or not to use Toeplitz math with gridded data, grid inducing point modules. Pros: memory efficient, faster on CPU. Cons: slower on GPUs with < 10000 inducing points.

class gpytorch.settings.num_gauss_hermite_locs(value)

The number of Gauss-Hermite quadrature locations to use when integrating a likelihood over a latent GP. This is used in variational inference and training. Default: 10
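To illustrate what a number of Gauss-Hermite locations controls, the sketch below approximates a Gaussian expectation with hard-coded 2-point and 3-point rules. The RULES table and the function name are hypothetical and written in pure Python for illustration; this is not GPyTorch's quadrature code.

```python
import math

# Probabilists' Gauss-Hermite rules for a standard normal variable,
# hard-coded for 2 and 3 locations. Each entry is (nodes, weights),
# and the weights of each rule sum to 1.
RULES = {
    2: ([-1.0, 1.0], [0.5, 0.5]),
    3: (
        [-math.sqrt(3.0), 0.0, math.sqrt(3.0)],
        [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0],
    ),
}


def gaussian_expectation(func, mean, stddev, num_locs):
    """Approximate E[func(f)] for f ~ N(mean, stddev^2) with a fixed rule."""
    nodes, weights = RULES[num_locs]
    return sum(w * func(mean + stddev * x) for x, w in zip(nodes, weights))


# The fourth moment of a standard normal is exactly 3.
fourth_moment_2 = gaussian_expectation(lambda f: f ** 4, 0.0, 1.0, num_locs=2)
fourth_moment_3 = gaussian_expectation(lambda f: f ** 4, 0.0, 1.0, num_locs=3)
print(fourth_moment_2, fourth_moment_3)
```

A rule with n locations integrates polynomials up to degree 2n - 1 exactly, which is why the 3-point rule recovers the fourth moment while the 2-point rule does not. More locations cost more likelihood evaluations per data point.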

class gpytorch.settings.num_likelihood_samples(value)

The number of samples to draw from a latent GP when computing a likelihood. This is used in variational inference and training. Default: 10
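The effect of the sample count can be sketched with a plain Monte Carlo estimate of an expected log likelihood. The example assumes a Gaussian likelihood so that a closed-form answer exists for comparison; the function names and numbers are hypothetical, and the code is pure Python rather than GPyTorch's implementation.

```python
import math
import random


def expected_log_likelihood_mc(y, mean, stddev, noise_var, num_samples=10, seed=0):
    """Monte Carlo estimate of E[log N(y | f, noise_var)] for f ~ N(mean, stddev^2)."""
    rng = random.Random(seed)
    log_norm = -0.5 * math.log(2.0 * math.pi * noise_var)
    total = 0.0
    for _ in range(num_samples):
        f = rng.gauss(mean, stddev)  # one draw from the latent distribution
        total += log_norm - (y - f) ** 2 / (2.0 * noise_var)
    return total / num_samples


def expected_log_likelihood_exact(y, mean, stddev, noise_var):
    """Closed form of the same expectation, available for a Gaussian likelihood."""
    log_norm = -0.5 * math.log(2.0 * math.pi * noise_var)
    return log_norm - ((y - mean) ** 2 + stddev ** 2) / (2.0 * noise_var)


exact = expected_log_likelihood_exact(0.3, 0.0, 0.5, 1.0)
coarse = expected_log_likelihood_mc(0.3, 0.0, 0.5, 1.0, num_samples=10)
fine = expected_log_likelihood_mc(0.3, 0.0, 0.5, 1.0, num_samples=20000)
print(exact, coarse, fine)
```

The estimate is unbiased for any sample count, but its variance shrinks roughly in proportion to 1 / num_samples, so a small count trades noisier objective values for cheaper iterations.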

class gpytorch.settings.num_trace_samples(value)

The number of samples to draw when stochastically computing the trace of a matrix. More samples result in more accurate trace estimates. If the value is set to 0, then the trace will be computed deterministically. Default: 10
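As a rough illustration of stochastic trace estimation, the sketch below implements a Hutchinson-style estimator in pure Python, with a deterministic fallback when the sample count is 0. The use of Rademacher (plus or minus one) probe vectors and the function name are assumptions made for this example, not a description of GPyTorch's internals.

```python
import random


def estimate_trace(A, num_samples=10, seed=0):
    """Estimate trace(A) for a square matrix given as a list of rows.

    With num_samples == 0 the trace is computed exactly from the diagonal.
    Otherwise it is the average of z^T A z over random probe vectors z
    whose entries are +1 or -1 with equal probability.
    """
    n = len(A)
    if num_samples == 0:
        return float(sum(A[i][i] for i in range(n)))
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Az = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
        total += sum(zi * azi for zi, azi in zip(z, Az))
    return total / num_samples


A = [
    [2.0, 0.5, 0.0],
    [0.5, 3.0, 0.1],
    [0.0, 0.1, 4.0],
]
exact_trace = estimate_trace(A, num_samples=0)
rough_trace = estimate_trace(A, num_samples=10)
better_trace = estimate_trace(A, num_samples=2000)
print(exact_trace, rough_trace, better_trace)
```

Each probe only requires a matrix-vector product, which is why this kind of estimator suits matrices that are expensive to form explicitly. The error comes entirely from the off-diagonal entries and averages out as the number of probes grows.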

class gpytorch.settings.preconditioner_tolerance(value)

Diagonal trace tolerance to use for checking preconditioner convergence.

Default: 1e-3

class gpytorch.settings.skip_logdet_forward(state=True)

This feature does not affect the gradients returned by gpytorch.distributions.MultivariateNormal.log_prob() (used by gpytorch.mlls.MarginalLogLikelihood). The gradients remain unbiased estimates, and therefore can be used with SGD. However, the actual likelihood value returned by the forward pass will skip certain computations (i.e. the logdet computation), and will therefore be improper estimates.

If you’re using SGD (or a variant) to optimize parameters, you probably don’t need an accurate MLL estimate; you only need accurate gradients. So this setting may give your model a performance boost.

class gpytorch.settings.skip_posterior_variances(state=True)

Whether or not to skip the posterior covariance matrix when doing an ExactGP forward pass. If this is on, the returned gpytorch MultivariateNormal will have a ZeroLazyTensor as its covariance matrix. This allows gpytorch to not compute the covariance matrix when it is not needed, speeding up computations.

class gpytorch.settings.terminate_cg_by_size(state=True)

If set to True, CG will terminate after n iterations for an n x n matrix.

class gpytorch.settings.tridiagonal_jitter(value)

The (relative) amount of noise to add to the diagonal of tridiagonal matrices before eigendecomposing. root_decomposition becomes slightly more stable with this, as we need to take the square root of the eigenvalues. Any eigenvalues still negative after adding jitter will be zeroed out.

class gpytorch.settings.use_toeplitz(state=True)

Whether or not to use Toeplitz math with gridded data, grid inducing point modules. Pros: memory efficient, faster on CPU. Cons: slower on GPUs with < 10000 inducing points.