Should the kernel be computed in chunks with checkpointing or not? (Default: no)
- If split_size = 0:
  - The kernel is computed explicitly. During training, the kernel matrix is kept in memory for the backward pass. This is the fastest option but the most memory-intensive.
- If split_size > 0:
  - The kernel is never fully computed or stored. Instead, it is accessed only through matrix multiplication, which is computed in chunks. This is slower but requires significantly less memory.
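To illustrate the trade-off between the two modes, here is a minimal pure-Python sketch of a kernel matrix-vector product. It is not the library's implementation: the function names `rbf_kernel_block` and `kernel_matvec`, the RBF kernel, and the 1-D inputs are all assumptions chosen for illustration. It shows only the forward chunking. The checkpointing part (recomputing each block during the backward pass instead of storing it) is not shown.

```python
import math


def rbf_kernel_block(xs, zs, lengthscale=1.0):
    """Hypothetical helper: RBF kernel block between 1-D inputs xs and zs."""
    return [
        [math.exp(-0.5 * ((x - z) / lengthscale) ** 2) for z in zs]
        for x in xs
    ]


def kernel_matvec(xs, v, split_size=0):
    """Compute K @ v for K = k(xs, xs).

    split_size = 0: the full n x n kernel matrix is built in one piece.
    split_size > 0: only split_size rows of K exist in memory at a time.
    """
    n = len(xs)
    # With split_size = 0 a single block covers all rows.
    step = max(1, n if split_size == 0 else split_size)
    out = []
    for start in range(0, n, step):
        # Build one block of rows, use it, then let it be discarded.
        block = rbf_kernel_block(xs[start:start + step], xs)
        out.extend(sum(k * vj for k, vj in zip(row, v)) for row in block)
    return out


xs = [0.1 * i for i in range(10)]
v = [1.0] * 10
full = kernel_matvec(xs, v, split_size=0)
chunked = kernel_matvec(xs, v, split_size=3)
```

Both calls return the same product. The chunked call holds at most 3 of the 10 kernel rows at once, at the cost of building the blocks one after another.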
Add a diagonal correction to scalable inducing point methods