Performance Benchmarks

1 Methodology

All benchmarks use Criterion.rs with default settings (100 samples, 3-second warm-up, 5-second measurement window). Measurements were taken on a single core; reported times are the median of the sample distribution. Source: benches/iterator_bench.rs.

Run benchmarks locally:

```bash
cargo bench
# or: pixi r bench
```

2 Frame parsing throughput

| Benchmark | Dataset | Time | Throughput |
|---|---|---|---|
| Single frame parse | 4 atoms | 1.5 us | 2.7M atoms/s |
| 2-frame parse (next) | 2x4 atoms | 2.3 us | 3.5M atoms/s |
| 2-frame skip (forward) | 2x4 atoms | 0.6 us | 13M atoms/s (skip mode) |
| 100-frame sequential | 100x4 atoms | 212 us | 1.9M atoms/s |
| 100-frame forward skip | 100x4 atoms | 29 us | 14M atoms/s (skip mode) |
| 218-atom frame (cuh2) | 218 atoms | 42 us | 5.2M atoms/s |

`forward()` skips frames by counting lines rather than parsing atom data, achieving roughly 7x higher throughput than full parsing. This matters for trajectory analysis that only needs specific frames (e.g., every 10th).
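The line-counting idea can be sketched in a few lines. This is an illustrative stand-in, not readcon-core's actual implementation: once the header gives the atom count, skipping a frame is just advancing past that many newlines.

```rust
/// Advance past `n` newline-terminated lines without inspecting their
/// contents, returning the remainder of the input. This mirrors the
/// idea behind skip mode: the frame header says how many atom lines
/// follow, so skipping is pure newline counting.
/// (Sketch only; not readcon-core's real code.)
fn skip_lines(input: &str, n: usize) -> Option<&str> {
    let mut rest = input;
    for _ in 0..n {
        let pos = rest.find('\n')?;
        rest = &rest[pos + 1..];
    }
    Some(rest)
}

fn main() {
    let frame = "1.0 2.0 3.0\n4.0 5.0 6.0\nnext frame\n";
    // Skip the two "atom" lines; no float parsing happens.
    assert_eq!(skip_lines(frame, 2), Some("next frame\n"));
    println!("ok");
}
```

Skipping touches each byte at most once and performs no float conversion, which is where the large throughput gap against full parsing comes from.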

3 Velocity parsing overhead

| Benchmark | Time | Overhead vs coords-only |
|---|---|---|
| Coords only (2x4) | 2.3 us | (baseline) |
| Coords + vel (2x4) | 3.9 us | +70% |
| Vel skip (forward) | 0.9 us | (skip mode) |

Velocity sections add roughly 70% to parsing time: the velocity block has the same line count as the coordinate block and goes through the same float-parsing path. The `forward()` skip mode handles velocity sections with minimal overhead.

4 Float parsing: fast-float2 vs stdlib

| Parser | Time (5-column line) | Speedup |
|---|---|---|
| fast-float2 | 100 ns | 2.0x |
| `str::parse::<f64>` | 202 ns | 1.0x |

readcon-core uses fast-float2 for all coordinate, velocity, and force line parsing. This provides a consistent 2x speedup over Rust’s standard library float parser on the hot path.
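For reference, the stdlib side of this comparison looks like the sketch below; readcon-core's hot path has the same shape but substitutes fast-float2's parser for `str::parse`, which is where the roughly 2x speedup comes from. The column semantics here are an assumption for illustration.

```rust
/// Parse one 5-column atom line with the standard-library parser.
/// (Illustrative sketch; readcon-core uses fast-float2 instead of
/// str::parse on this path, and the actual column meanings may differ.)
fn parse_five_columns(line: &str) -> Option<[f64; 5]> {
    let mut out = [0.0f64; 5];
    let mut fields = line.split_whitespace();
    for slot in out.iter_mut() {
        *slot = fields.next()?.parse().ok()?;
    }
    // Reject trailing junk so a 6-column line is not silently accepted.
    if fields.next().is_some() {
        return None;
    }
    Some(out)
}

fn main() {
    let line = "0.125 1.5 -2.75 63.546 42";
    assert_eq!(
        parse_five_columns(line),
        Some([0.125, 1.5, -2.75, 63.546, 42.0])
    );
    println!("ok");
}
```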

5 I/O strategy: mmap vs read_to_string

| Strategy | 218-atom file (16 KiB) | Notes |
|---|---|---|
| `read_to_string` | 42 us | Slight edge for small files |
| mmap | 44 us | Fixed overhead (VMA setup, page faults) |

For files under 64 KiB, `read_to_string` avoids mmap overhead. For larger trajectory files, mmap lets the OS page cache handle data without a full heap copy. readcon-core switches automatically at the 64 KiB threshold.
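The threshold dispatch described above can be sketched as follows. `IoStrategy` and `choose_strategy` are hypothetical names, not readcon-core's public API; only the 64 KiB cutoff comes from the text.

```rust
/// Size-based I/O dispatch sketch (hypothetical names, not the
/// crate's actual API). The 64 KiB threshold matches the documented
/// crossover between the two strategies.
#[derive(Debug, PartialEq)]
enum IoStrategy {
    ReadToString, // small files: one heap read, no mmap setup cost
    Mmap,         // large files: let the page cache serve the data
}

const MMAP_THRESHOLD: u64 = 64 * 1024; // 64 KiB

fn choose_strategy(file_len: u64) -> IoStrategy {
    if file_len < MMAP_THRESHOLD {
        IoStrategy::ReadToString
    } else {
        IoStrategy::Mmap
    }
}

fn main() {
    // The 16 KiB benchmark file from the table stays on read_to_string.
    assert_eq!(choose_strategy(16 * 1024), IoStrategy::ReadToString);
    assert_eq!(choose_strategy(10 * 1024 * 1024), IoStrategy::Mmap);
    println!("ok");
}
```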

Compressed files (.con.gz) always decompress to an in-memory buffer regardless of size, since mmap cannot decompress on the fly.

6 Cross-implementation comparison

Measured with benches/compare_readers.py on a 100-frame trajectory (218 atoms per frame, 1.6 MiB file). Median of 5 runs.

| Reader | Time (ms) | Speedup vs ASE |
|---|---|---|
| ASE (ase.io.eon) | 36.1 | 1.0x (baseline) |
| C sscanf (eOn-style) | 10.6 | 3.4x |
| readcon-core (file path) | 4.4 | 8.2x |
| readcon-core (from string) | 4.1 | 8.7x |

![Parsing throughput across trajectory sizes (log scale)](img/parsing_throughput.svg)

readcon-core outperforms even a C sscanf-based reader (2.4x) because:

  • fast-float2: SIMD-accelerated float parsing vs sscanf dispatch

  • Zero-copy iteration: borrows lines from the input &str, no fgets buffer copies

  • Pre-allocated vectors: atom count known from header before parsing

  • No stdio overhead: entire file in memory (mmap or `read_to_string`) vs per-line fgets

For trajectory files with thousands of frames, the difference compounds: readcon-core’s forward() skip mode processes frames it does not need at 14M atoms/s, while Python readers must parse every line.

7 Scaling with file size

Measured across four trajectory sizes. readcon-core times use read_con_string() on pre-loaded contents, C times are the sscanf reader's internal best-of-N, and ASE times include file I/O.

| Dataset | File size | C sscanf | ASE | readcon | vs ASE | vs C |
|---|---|---|---|---|---|---|
| 218 x 100 | 1.6 MiB | 10.6 ms | 36 ms | 4.4 ms | 8.2x | 2.4x |
| 218 x 1000 | 9.7 MiB | 73 ms | 286 ms | 55 ms | 5.2x | 1.3x |
| 10k x 100 | 46.9 MiB | 361 ms | 956 ms | 185 ms | 5.2x | 2.0x |
| 10k x 10 | 4.7 MiB | 36 ms | 94 ms | 13 ms | 7.2x | 2.8x |

readcon-core maintains 5-8x speedup over ASE across all sizes. The advantage over C narrows on large files (I/O becomes a larger fraction of total time), but readcon-core remains consistently faster due to fast-float2 and zero-copy parsing.

8 Memory usage

Peak resident set size when loading all frames into memory:

| Dataset | readcon peak RSS | ASE peak RSS |
|---|---|---|
| 218 x 1000 | 70 MiB | 268 MiB |
| 10k x 100 | 263 MiB | 270 MiB |
| 10k x 10 | 263 MiB | 270 MiB |

For the 218-atom trajectory, readcon-core uses 3.8x less memory than ASE (70 vs 268 MiB). At 10k atoms, both converge because the atom data dominates (readcon stores ~120 bytes/atom, ASE stores similar plus numpy overhead).

![Peak memory usage with all frames loaded](img/memory_usage.svg)

The C sscanf reader frees each frame immediately, so its peak RSS stays under 16 MiB regardless of trajectory length. readcon-core can achieve similar constant-memory usage via the iterator API:

```rust
// Process frames one at a time (constant memory)
let iter = ConFrameIterator::new(&contents);
for result in iter {
    let frame = result?;
    // process frame, then drop
}
```

9 Scaling considerations

The per-atom parsing cost is dominated by float conversion (5 columns per atom line). With fast-float2, each atom line takes roughly 100 ns to parse. For a 10,000-atom frame:

  • Coordinates: ~1 ms

  • Coordinates + velocities: ~1.7 ms

  • Coordinates + velocities + forces: ~2.4 ms

  • With gzip decompression overhead: +10-30% (depends on compression ratio)

For trajectory files with many frames, the parallel feature gate enables rayon-based frame-level parallelism, scaling linearly with core count for the parsing phase.
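Because frames parse independently, frame-level parallelism is straightforward. The sketch below uses `std::thread::scope` to stay dependency-free; the `parallel` feature instead hands frame slices to rayon. `parse_frame` here is a stand-in for real parsing, not the crate's API.

```rust
use std::thread;

/// Stand-in for real frame parsing: count atom lines in a frame slice.
fn parse_frame(frame: &str) -> usize {
    frame.lines().count()
}

fn main() {
    // Each element stands for one frame's text slice. With the
    // `parallel` feature, this map would run via rayon's par_iter;
    // scoped threads illustrate the same independence property.
    let frames = ["a\nb\n", "c\nd\ne\n", "f\n"];
    let totals: Vec<usize> = thread::scope(|s| {
        let handles: Vec<_> = frames
            .iter()
            .copied()
            .map(|f| s.spawn(move || parse_frame(f)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    assert_eq!(totals, vec![2, 3, 1]);
    println!("ok");
}
```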

10 Memory profile

readcon-core allocates:

  • One Rc<String> per atom type (not per atom) for symbol storage

  • One Vec<AtomDatum> per frame (pre-allocated from header counts)

  • No intermediate string allocations for atom line parsing (fast-float2 parses directly from the borrowed &str slice)

For a 10,000-atom frame with velocities and forces, the in-memory footprint is approximately:

  • 10,000 atoms x 120 bytes/atom (coords + vel + forces + metadata) = 1.2 MB

  • Header overhead: negligible

  • Total: ~1.2 MB per frame in memory

The iterator API processes one frame at a time, so multi-frame files do not require loading the entire trajectory into memory.
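The per-frame arithmetic above can be checked directly. The 9-f64-plus-metadata split is an illustrative assumption; only the 120 bytes/atom total and the ~1.2 MB result come from the figures quoted earlier.

```rust
// 9 f64 (coords + velocities + forces) = 72 bytes, plus ~48 bytes of
// metadata (mass, atom id, symbol pointer, constraint flags, padding).
// The 72/48 split is an assumption; the 120-byte total matches the
// per-atom figure quoted above.
const BYTES_PER_ATOM: usize = 9 * 8 + 48; // 120

fn frame_footprint(n_atoms: usize) -> usize {
    n_atoms * BYTES_PER_ATOM
}

fn main() {
    // 10,000-atom frame with velocities and forces: ~1.2 MB.
    assert_eq!(frame_footprint(10_000), 1_200_000);
    println!("ok");
}
```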

11 Feature coverage vs other formats

The CON v2 format covers features that typically require multiple formats or lossy workarounds in other ecosystems. The comparison below includes text-based and binary formats commonly used in computational chemistry.

![Feature matrix: CON v2 vs common atomic structure formats](img/feature_comparison.svg)

CON v2 achieves full coverage (10/10) across: positions, velocities, forces, unit cell, per-direction constraints, atom identity (round-trip), structured metadata, compression, multi-frame support, and streaming iteration. No other single format covers all ten.

The extxyz format comes closest (6/10 with partial metadata) but lacks per-direction constraints, atom identity tracking, and a formal specification. LAMMPS dump format supports many features but is tightly coupled to the LAMMPS ecosystem.

![Feature coverage vs parsing performance (Pareto front)](img/pareto_features_vs_speed.svg)

readcon-core occupies the Pareto-optimal corner: maximum feature coverage at the fastest parse speed among text-based formats. Binary formats (DCD, TRR) trade features for raw throughput: they lack metadata, constraints, and human readability.

12 Statistical analysis

The point estimates above characterize typical performance. For publication-quality results with credible intervals, we use bayescomp, a Bayesian hierarchical comparison framework that fits Gamma-family models with random intercepts per test system. This provides posterior distributions for speedup factors rather than single numbers, accounting for system-to-system variation and measurement noise.

The bayescomp analysis pipeline reads Criterion JSON output and compare_readers.py timing data, fits the model via brms / cmdstanr, and produces posterior predictive checks and effect size summaries suitable for JOSS or SoftwareX publication.

13 Reproducing these benchmarks

```bash
# Cross-implementation speed comparison (ASE, C, readcon)
uv run --with matplotlib --with numpy --with ase python benches/compare_readers.py

# Generate publication plots
uv run --with matplotlib --with numpy python benches/make_plots.py

# Rust microbenchmarks (Criterion)
cargo bench
```