Constant Kernel

This tutorial explains the constant kernel \(\sigma_c^2\) introduced for molecular systems, following Koistinen et al. (JCTC).

Motivation

For molecular systems with many degrees of freedom, the energy-energy correlations between distant geometries can approach zero under a pure SE kernel. Adding a constant kernel term ensures a baseline correlation: all geometries share a common energy offset.

The combined kernel:

\[k_{\text{total}}(\mathbf{x}, \mathbf{y}) = \sigma_c^2 + \sigma_m^2 \exp\!\Bigl(-\sum_p \theta_p^2 \, r_p^2\Bigr)\]

The constant term \(\sigma_c^2\) acts as a prior on the energy mean function. Without it, the GP posterior mean reverts to zero far from training data. With it, the posterior mean reverts to a value informed by the overall energy scale.
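As a quick numerical illustration (a minimal sketch with a scalar length-scale parameter and placeholder values, not the library's actual defaults), the combined kernel can be evaluated directly to see the constant floor emerge at large separation:

```python
import numpy as np

# Sketch of the combined kernel with a single scalar theta:
# k_total(x, y) = sigma_c^2 + sigma_m^2 * exp(-theta^2 * |x - y|^2)
def k_total(x, y, sigma_c2=1.0, sigma_m2=1.0, theta=1.0):
    r2 = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return sigma_c2 + sigma_m2 * np.exp(-(theta ** 2) * r2)

near = k_total([0.0], [0.1])    # SE term dominates at small separation
far = k_total([0.0], [100.0])   # SE term vanishes; sigma_c^2 floor remains
```

For distant geometries the SE term decays to zero, so the covariance bottoms out at \(\sigma_c^2\) rather than vanishing, which is exactly the baseline correlation described above.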

Derivative structure

The constant kernel has zero derivatives with respect to coordinates:

\[\frac{\partial k_c}{\partial \mathbf{x}} = 0 \quad \text{(energy-force: zero)}\]
\[\frac{\partial k_c}{\partial \mathbf{y}} = 0 \quad \text{(force-energy: zero)}\]
\[\frac{\partial^2 k_c}{\partial \mathbf{x} \, \partial \mathbf{y}} = 0 \quad \text{(force-force: zero)}\]

This means \(\sigma_c^2\) affects only the energy-energy (EE) block of the covariance matrix. Force predictions depend entirely on the SE kernel \(\sigma_m^2\).

This is physically intuitive: a constant energy offset does not generate forces.

See scripts/sympy/t8_constant_kernel.py for the full derivation.
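The zero-derivative property can also be checked numerically with central finite differences (a trivial sketch, since the constant kernel has no coordinate dependence at all):

```python
# Finite-difference check: k_c(x, y) = sigma_c^2 has zero energy-force,
# force-energy, and force-force derivative blocks.
sigma_c2 = 1.0
k_c = lambda x, y: sigma_c2  # constant in both arguments

h = 1e-5
x, y = 0.3, -0.7
d_dx = (k_c(x + h, y) - k_c(x - h, y)) / (2 * h)           # energy-force
d_dy = (k_c(x, y + h) - k_c(x, y - h)) / (2 * h)           # force-energy
d2 = (k_c(x + h, y + h) - k_c(x + h, y - h)
      - k_c(x - h, y + h) + k_c(x - h, y - h)) / (4 * h * h)  # force-force
```

All three differences are exactly zero, matching the symbolic derivation.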

Implementation

\(\sigma_c^2\) enters the covariance matrix at two points:

  1. Diagonal: K[i,i] += sigmac2 (along with noise and jitter)

  2. Off-diagonal: K[i,j] += sigmac2 (all EE entries)

In the NLL computation, \(\sigma_c^2\) is added to the kernel matrix but not to the hyperparameter gradient (\(\sigma_c^2\) is fixed, not optimized by SCG).

RFF treatment

Random Fourier Features approximate the SE kernel but not the constant term. To include \(\sigma_c^2\) in RFF predictions, an extra basis function is appended:

\[\phi_c = \sqrt{\sigma_c^2}\]

with zero Jacobian (no coordinate dependence). The extended RFF feature vector is:

\[\mathbf{z}_{\text{ext}}(\mathbf{x}) = \bigl[z_1(\mathbf{x}),\, \ldots,\, z_{D_{\text{rff}}}(\mathbf{x}),\, \phi_c\bigr]\]

The inner product \(\mathbf{z}_{\text{ext}}(\mathbf{x})^\top \mathbf{z}_{\text{ext}}(\mathbf{y}) = \mathbf{z}_{\text{rff}}(\mathbf{x})^\top \mathbf{z}_{\text{rff}}(\mathbf{y}) + \sigma_c^2\), reproducing the constant kernel contribution. For gradient predictions, the last Jacobian row is zero, so forces use only the RFF features.

The effective dimension becomes \(d_{\text{eff}} = D_{\text{rff}} + 1\).
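The extended-feature construction can be verified numerically (a sketch under standard RFF conventions for the SE kernel \(\exp(-\theta^2 r^2)\), i.e. spectral samples drawn with standard deviation \(\sqrt{2}\,\theta\); all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D_rff, d = 2000, 3
sigma_m2, sigma_c2, theta = 1.0, 1.0, 0.8

# Spectral frequencies for exp(-theta^2 r^2): w ~ N(0, 2 theta^2 I)
W = rng.normal(scale=np.sqrt(2.0) * theta, size=(D_rff, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D_rff)

def z_ext(x):
    """RFF features for the SE term plus one constant feature phi_c."""
    z = np.sqrt(2.0 * sigma_m2 / D_rff) * np.cos(W @ x + b)
    return np.append(z, np.sqrt(sigma_c2))  # phi_c, zero Jacobian

x, y = rng.normal(size=d), rng.normal(size=d)
approx = z_ext(x) @ z_ext(y)
exact = sigma_c2 + sigma_m2 * np.exp(-(theta ** 2) * np.sum((x - y) ** 2))
```

The inner product of the extended features approximates \(\sigma_c^2 + \sigma_m^2 \exp(-\theta^2 r^2)\) up to the usual Monte Carlo error of the RFF approximation, and the feature vector has length \(D_{\text{rff}} + 1\).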

Configuration

let mut cfg = MinimizationConfig::default();
cfg.const_sigma2 = 1.0;  // molecular systems
// cfg.const_sigma2 = 0.0;  // 2D surfaces (default)

All method config structs (MinimizationConfig, DimerConfig, OTGPDConfig, NEBConfig) accept const_sigma2. Set to 1.0 for molecular systems; leave at 0.0 for 2D analytical surfaces (backward compatible).

Effect on convergence

On the HCN system (9 atoms, 27 DOF) with GP-NEB OIE:

| const_sigma2 | Oracle calls | Converged |
|--------------|--------------|-----------|
| 0.0          | diverges     | no        |
| 1.0          | 134          | yes       |

The constant kernel stabilizes GP predictions for molecular systems where the energy scale is large relative to local variations. Without it, the GP posterior mean can collapse to zero between training points, producing spurious forces that mislead the optimizer.

SymPy verification

The constant kernel derivative properties (all zeros except EE) and the RFF constant-feature construction are verified in:

python scripts/sympy/t8_constant_kernel.py