Architecture¶
Workspace Layout¶
ChemGP/
  Cargo.toml                  # workspace root
  crates/
    chemgp-core/              # all GP + optimizer logic
      src/
        lib.rs                # module exports, StopReason enum
        invdist.rs            # inverse distance features + Jacobian
        kernel.rs             # Kernel enum, MolInvDistSE, CartesianSE
        types.rs              # TrainingData, GPModel, init_kernel
        covariance.rs         # build_full_covariance, robust_cholesky
        predict.rs            # GP posterior, PredModel, build_pred_model
        nll.rs                # MAP NLL + analytical gradient
        scg.rs                # Møller (1993) SCG optimizer
        train.rs              # train_model dispatcher
        distances.rs          # max_1d_log, euclidean distance
        emd.rs                # brute-force EMD
        sampling.rs           # FPS, select_optim_subset, prune
        lbfgs.rs              # L-BFGS two-loop recursion
        optim_step.rs         # FIRE/LBFGS unified step
        trust.rs              # adaptive trust thresholds, EMD-based trust
        rff.rs                # Random Fourier Features
        minimize.rs           # gp_minimize
        dimer.rs              # gp_dimer, standard_dimer
        neb_path.rs           # NEBConfig, NEBPath, NEB force computation
        idpp.rs               # IDPP/sIDPP path initialization
        neb.rs                # neb_optimize, gp_neb_aie
        neb_oie.rs            # gp_neb_oie (LCB-guided image selection)
        otgpd.rs              # otgpd (HOD, adaptive threshold)
        potentials.rs         # LJ, Muller-Brown, LEPS
        io.rs                 # readcon-core + chemfiles (feature-gated)
        oracle.rs             # rgpot-core wrapper (feature-gated)
      examples/
        mb_minimize.rs        # GP minimize on Muller-Brown (CartesianSE)
        mb_gp_quality.rs      # GP quality grids on Muller-Brown
        leps_minimize.rs      # GP minimize on LEPS (MolInvDistSE)
        leps_rff_quality.rs   # RFF vs exact GP on LEPS
        leps_neb.rs           # NEB vs AIE vs OIE
        leps_dimer.rs         # GP-Dimer vs OTGPD
  docs/
    export.el                 # ox-rst batch export
    orgmode/                  # org source files (single source of truth)
    source/                   # generated RST (gitignored)
  scripts/                    # figure generation and plotting
  data/                       # test structures (HCN, system100)
Kernel dispatch¶
The Kernel enum wraps MolInvDistSE (molecules) and CartesianSE (arbitrary
surfaces). All code accepts &Kernel and dispatches through unified methods
via a single match. This avoids trait objects and monomorphization bloat
while supporting both kernel types at the cost of one predictable branch
per call. See Kernel Design for details.
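The pattern can be sketched as follows. This is a toy illustration, not the crate's actual API: the real kernels carry full hyperparameter sets and Jacobians, and the method name `k` is a placeholder.

```rust
// Toy single-lengthscale squared-exponential kernels standing in for the
// real MolInvDistSE / CartesianSE structs.
struct MolInvDistSE { ell: f64 }
struct CartesianSE { ell: f64 }

fn se(d2: f64, ell: f64) -> f64 { (-d2 / (2.0 * ell * ell)).exp() }

enum Kernel {
    MolInvDistSE(MolInvDistSE),
    CartesianSE(CartesianSE),
}

impl Kernel {
    // Unified entry point: a single match, no trait object, no vtable.
    fn k(&self, x1: &[f64], x2: &[f64]) -> f64 {
        let d2 = |a: &[f64], b: &[f64]| -> f64 {
            a.iter().zip(b).map(|(p, q)| (p - q).powi(2)).sum()
        };
        match self {
            // Molecular kernel: transform to inverse-distance features first
            // (inputs here are interatomic distances, assumed positive).
            Kernel::MolInvDistSE(m) => {
                let f1: Vec<f64> = x1.iter().map(|v| 1.0 / v).collect();
                let f2: Vec<f64> = x2.iter().map(|v| 1.0 / v).collect();
                se(d2(&f1, &f2), m.ell)
            }
            // Cartesian kernel: SE directly on the raw coordinates.
            Kernel::CartesianSE(c) => se(d2(x1, x2), c.ell),
        }
    }
}

fn main() {
    let cart = Kernel::CartesianSE(CartesianSE { ell: 1.0 });
    assert!((cart.k(&[0.0, 0.0], &[0.0, 0.0]) - 1.0).abs() < 1e-12); // k(x, x) = 1
    let mol = Kernel::MolInvDistSE(MolInvDistSE { ell: 1.0 });
    assert!(mol.k(&[1.0, 2.0], &[1.0, 2.0]) > 0.999); // identical inputs -> ~1
}
```

Adding a third kernel variant means adding one enum arm and extending each match, which the compiler enforces exhaustively.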
PredModel dispatch¶
The PredModel enum in predict.rs dispatches between exact GP and RFF:
pub enum PredModel {
    Gp(GPModel),
    Rff(RffModel),
}
The asymmetry is deliberate: hyperparameters are always trained on the exact GP (SCG needs exact NLL gradients), but the prediction model can be approximate. This means RFF inherits well-tuned hyperparameters without paying the exact GP prediction cost.
All optimizers (minimize, dimer, otgpd, neb, neb_oie) use build_pred_model()
to construct the appropriate variant.
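A minimal sketch of why this keeps the optimizers backend-agnostic; the struct bodies and the `predict` method are placeholders, not the crate's real signatures:

```rust
// Placeholder model payloads; the real GPModel holds a Cholesky factor etc.
struct GPModel { mean: f64 }
struct RffModel { mean: f64 }

enum PredModel {
    Gp(GPModel),
    Rff(RffModel),
}

// Hyperparameters are assumed already trained on the exact GP by this point;
// the flag only selects the prediction backend.
fn build_pred_model(use_rff: bool, mean: f64) -> PredModel {
    if use_rff {
        PredModel::Rff(RffModel { mean })
    } else {
        PredModel::Gp(GPModel { mean })
    }
}

impl PredModel {
    fn predict(&self, _x: &[f64]) -> f64 {
        match self {
            PredModel::Gp(g) => g.mean,  // exact posterior (placeholder)
            PredModel::Rff(r) => r.mean, // feature-space approximation (placeholder)
        }
    }
}

fn main() {
    // Callers never branch on the backend themselves:
    for use_rff in [false, true] {
        let model = build_pred_model(use_rff, 1.5);
        assert!((model.predict(&[0.0]) - 1.5).abs() < 1e-12);
    }
}
```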
Outer loop pattern¶
All GP-accelerated optimizers follow the same outer loop:
1. FPS subset selection: select the K most informative training points. Why: keeps the Cholesky cost manageable.
2. Train GP: SCG-optimize hyperparameters on the subset. Why: hyperparameters must adapt as the optimizer explores new regions.
3. Build PredModel: construct the exact GP or RFF model on the full training data. Why: prediction should use all available data, even points not in the training subset.
4. Inner optimization: optimize on the surrogate (L-BFGS for minimize/dimer, FIRE/L-BFGS for NEB). Why: the GP surrogate is cheap to evaluate, so aggressive optimization is free.
5. Trust clip: clip the proposed step to the trust region. Why: prevents stepping into extrapolation territory.
6. Oracle call: evaluate the true potential at the new point. Why: the GP prediction must be validated, and the new data improves the GP.
7. Convergence check: compare the force norm to the tolerance.
The method-specific differences (translation force for dimer, perpendicular force for NEB, adaptive threshold for OTGPD) happen within steps 4-6.
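The loop above can be sketched in one dimension on a quadratic "oracle". This is a hedged illustration only: the GP steps (1-3) are elided so the example stays self-contained, and the surrogate step is a plain gradient step rather than a real inner L-BFGS run.

```rust
fn oracle(x: f64) -> f64 { (x - 1.0).powi(2) } // true potential (1-D quadratic)
fn force(x: f64) -> f64 { -2.0 * (x - 1.0) }   // -dE/dx

fn outer_loop(max_iters: usize, tol: f64, trust: f64) -> (&'static str, f64) {
    let mut data = vec![(0.0, oracle(0.0))]; // training set: (x, energy) pairs
    let mut x = 0.0;
    for _ in 0..max_iters {
        // 1. FPS subset selection (placeholder: just take up to K points)
        let _subset: Vec<(f64, f64)> = data.iter().take(8).cloned().collect();
        // 2.-3. SCG hyperparameter training + PredModel construction (elided;
        //       here the "surrogate" is the exact quadratic for simplicity)
        let proposal = x + 0.5 * force(x);              // 4. inner surrogate step
        let step = (proposal - x).clamp(-trust, trust); // 5. trust clip
        x += step;
        data.push((x, oracle(x)));                      // 6. oracle call, grow data
        if force(x).abs() < tol {                       // 7. convergence check
            return ("Converged", x);
        }
    }
    ("MaxIterations", x)
}

fn main() {
    let (reason, x) = outer_loop(100, 1e-8, 0.25);
    assert_eq!(reason, "Converged");
    assert!((x - 1.0).abs() < 1e-6); // minimum of (x - 1)^2 is at x = 1
}
```

With trust radius 0.25 the optimizer walks 0.25 per step until it can land on the minimum, illustrating why the trust clip bounds each excursion into unexplored territory.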
StopReason¶
All optimizer result types carry a StopReason enum:
pub enum StopReason {
    Converged,
    MaxIterations,
    OracleCap,
    ForceStagnation,
}
ForceStagnation triggers when the force norm changes by less than 1e-10 for 3
consecutive steps. This catches situations where the GP is stuck (e.g., due to
dedup_tol rejecting all new training points).
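The stagnation rule can be sketched as a small detector; the struct and method names here are hypothetical, only the 1e-10 / 3-step criterion comes from the text.

```rust
// Tracks consecutive steps on which the force norm barely moved.
struct StagnationDetector {
    last: Option<f64>,
    count: usize,
}

impl StagnationDetector {
    fn new() -> Self {
        Self { last: None, count: 0 }
    }

    // Feed one force norm per outer-loop step; returns true once the norm
    // has changed by less than 1e-10 on 3 consecutive steps.
    fn update(&mut self, force_norm: f64) -> bool {
        if let Some(prev) = self.last {
            if (force_norm - prev).abs() < 1e-10 {
                self.count += 1;
            } else {
                self.count = 0; // any real movement resets the streak
            }
        }
        self.last = Some(force_norm);
        self.count >= 3
    }
}

fn main() {
    let mut d = StagnationDetector::new();
    assert!(!d.update(0.5)); // first sample: nothing to compare against
    assert!(!d.update(0.5)); // stalled once
    assert!(!d.update(0.5)); // stalled twice
    assert!(d.update(0.5));  // stalled three times -> ForceStagnation
    assert!(!d.update(0.3)); // a real change resets the detector
}
```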