Frequently Asked Questions
1 Why another atomic structure format?
The con format addresses a specific gap: lossless round-tripping of
atomic configurations through saddle-point search, NEB, and dimer
calculations. Existing formats lose information during read-write
cycles:
XYZ: no cell data, no fixed-atom flags, no atom identity tracking. A 218-atom slab written to XYZ and read back has lost the original atom ordering, constraint information, and periodicity.
POSCAR/CONTCAR: VASP-specific. No velocity or force sections. Selective dynamics is all-or-nothing per direction. No metadata for potential parameters or convergence state.
extxyz: extensible but underspecified. Every tool invents its own key names. No formal specification means round-trip fidelity depends on implementation details. Parsing performance suffers from per-atom key-value overhead.
CIF: designed for crystallography, not molecular dynamics. Verbose. No velocity or force representation. Overkill for transient simulation snapshots.
The con format is deliberately minimal: a fixed 9-line header, typed
atom blocks, and optional velocity/force sections. The v2 JSON
metadata line adds extensibility without breaking the core simplicity.
2 What problems does atom_id solve?
The con format groups atoms by element type. A structure with atoms
C, C, C, O, C, C (indices 0-5) gets written as five C atoms followed
by one O atom. Without a persistent identity field, the original
ordering vanishes after one read-write cycle.
This matters for:
NEB calculations: interpolated images must maintain consistent atom ordering across the band. If atom ordering drifts, the interpolation produces nonsense.
Dimer searches: the displacement vector references specific atom indices. Reordering atoms invalidates the mode.
Reference comparisons: comparing a relaxed structure against a reference (e.g., Baker test set) requires matching atoms by index.
The atom_id field (column 5) stores the pre-grouping index,
allowing exact reconstruction of the original ordering after any
number of read-write cycles.
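The effect of grouping, and how a stored index undoes it, can be illustrated with a minimal pure-Python sketch. It does not use readcon-core: the helpers group_by_type and restore_order are hypothetical, and atoms are modeled as (symbol, atom_id) tuples purely for illustration.

```python
# Illustrative only: atoms are (symbol, atom_id) pairs, not the real con layout.

def group_by_type(atoms):
    """Group atoms by element in order of first appearance, as a type-grouped writer would."""
    order = []
    groups = {}
    for symbol, atom_id in atoms:
        if symbol not in groups:
            groups[symbol] = []
            order.append(symbol)
        groups[symbol].append((symbol, atom_id))
    return [atom for symbol in order for atom in groups[symbol]]


def restore_order(grouped):
    """Recover the pre-grouping order by sorting on the stored atom_id."""
    return sorted(grouped, key=lambda atom: atom[1])


symbols = ["C", "C", "C", "O", "C", "C"]
original = [(s, i) for i, s in enumerate(symbols)]  # atom_id = pre-grouping index

grouped = group_by_type(original)   # five C atoms, then the O atom
restored = restore_order(grouped)   # back to the original ordering
```

Because the stored index travels with each atom, grouping and restoring can be repeated any number of times without drift.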
3 Why JSON on line 2?
Line 2 was historically unused (“Time” or empty in eOn files). JSON provides:
Forward compatibility: new keys can be added without format changes. Unknown keys are preserved through round-trips.
Machine readability: no custom parser needed. Every language has a JSON library.
Section declaration: the sections key tells the parser exactly what per-atom data to expect, eliminating ambiguity.
Provenance: potential, energy, generator keys make files self-documenting.
Backward compatibility: pre-v2 files have non-JSON on line 2. The parser detects this (line 2 does not start with {) and falls back to legacy mode (spec_version = 1).
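The detection rule and the round-trip behavior for unknown keys can be sketched in a few lines of standard-library Python. This is not readcon-core's parser: parse_line2 is a hypothetical helper, my_tool_tag is a made-up key, and defaulting to version 2 when con_spec_version is missing is an assumption of this sketch.

```python
import json


def parse_line2(line):
    """Return (spec_version, metadata) for the second header line.

    Assumption: a v2 line is a JSON object; anything else is treated as legacy.
    """
    text = line.strip()
    if not text.startswith("{"):
        return 1, {}  # legacy: "Time", empty, or other free text
    metadata = json.loads(text)
    # Assumption: fall back to 2 if the version key is missing from a JSON line.
    return metadata.get("con_spec_version", 2), metadata


legacy = parse_line2("Time")
v2_version, v2_meta = parse_line2(
    '{"con_spec_version":2,"sections":[],"my_tool_tag":"run-7"}'
)

# Unknown keys survive because the whole object is kept and re-serialized.
rewritten = json.dumps(v2_meta, separators=(",", ":"))
```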
4 When should I use HDF5 instead?
Use con for:
Single structures and short trajectories (< 10k frames)
Interoperability with eOn, readcon-core, and ASE
Human-readable files that can be inspected with head
Situations where simplicity and round-trip fidelity matter
Use HDF5 for:
Large-scale MD trajectories (millions of frames, billions of atoms)
Random access by frame index without scanning
Binary data with native-endian floats (no parsing overhead)
Complex hierarchical data (multiple properties per frame, metadata trees, datasets with different shapes)
The two formats complement each other. readcon-core handles the
con-to-data pipeline; HDF5 handles long-term archival and analysis.
5 How fast is readcon-core?
readcon-core parses con files 10-30x faster than pure-Python
readers (e.g., eOn’s fileio.py) by using:
fast-float2: SIMD-accelerated float parsing (2-3x over str::parse)
Memory-mapped I/O: large files are mmap’d, avoiding heap copies
Arc<str> symbols: one allocation per atom type, not per atom
Zero-copy iteration: the ConFrameIterator borrows from the input string without allocating per-line
Forward skip: forward() skips frames by line counting without parsing atom data
See benchmarks for measured numbers on real datasets.
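The idea behind skipping by line counting can be sketched in pure Python. This is not the readcon-core implementation. It assumes that the eighth header line holds whitespace-separated per-type atom counts, that each type block is one symbol line, one label line, and one line per atom, and that no velocity or force sections follow. The helper names and the placeholder frame contents are hypothetical.

```python
def frame_line_count(header_lines):
    """Lines in one frame: 9 header lines plus (2 + n) lines per atom type.

    Assumption: header_lines[7] (the eighth line) lists the per-type atom counts.
    """
    counts = [int(token) for token in header_lines[7].split()]
    return 9 + sum(2 + n for n in counts)


def skip_frames(lines, n_frames):
    """Return the index of the first line after skipping n_frames frames."""
    position = 0
    for _ in range(n_frames):
        # Only the header is inspected; atom lines are never parsed.
        position += frame_line_count(lines[position:position + 9])
    return position


def make_frame(tag):
    """Build a placeholder frame with two atoms of one type and one of another."""
    header = [f"header {i} {tag}" for i in range(1, 10)]
    header[7] = "2 1"
    body = ["C", "label", "atom", "atom", "O", "label", "atom"]
    return header + body


lines = make_frame("a") + make_frame("b")
second_frame_start = skip_frames(lines, 1)
```

The cost of a skip depends on the number of lines, not on float parsing, which is why skipping can be much cheaper than reading.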
6 What is the sections mechanism?
Version 2 files can include per-atom data beyond coordinates. Each additional section (velocities, forces) follows the same block structure as coordinates: blank separator, symbol line, label line, data lines.
The sections key in the JSON metadata declares which sections exist
and their order:
{"con_spec_version":2,"sections":["velocities","forces"]}
This is more robust than the legacy approach (detecting velocities by peeking for a blank separator) because:
The parser knows exactly what to expect
New section types can be added without ambiguity
Section order is explicit
Declared sections must be present at the declared position. Use an
empty sections array ([]) to state that no additional per-atom
sections follow.
When the key is absent, the reader keeps the blank-separator velocity
detection as a fallback, so legacy .convel files without a sections
key still work.
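The difference between an empty sections array and a missing key can be made concrete with a small sketch. The helper expected_sections is hypothetical and only mirrors the rules stated above; it is not part of readcon-core.

```python
import json


def expected_sections(metadata):
    """Return the declared section list, or None when the reader must fall back."""
    if "sections" not in metadata:
        # Legacy path: detect velocities by peeking for a blank separator.
        return None
    return list(metadata["sections"])


declared = expected_sections(
    json.loads('{"con_spec_version":2,"sections":["velocities","forces"]}')
)
none_declared = expected_sections({"con_spec_version": 2, "sections": []})
fallback = expected_sections({})
```

Here an empty list means "nothing follows the coordinates", while None means "not declared, so use legacy detection".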
7 What does validate=true do?
The validate metadata key asks v2 readers to reject frames that do
not satisfy strict ordering and schema invariants:
{"con_spec_version":2,"sections":["velocities"],"validate":true}
In this mode, the sections key must be present. Readers verify the
declared section order, exact component labels, component symbols,
integer identity columns, matching fixed masks and atom ids across
sections, finite numeric values, physical cell geometry, positive
counts and masses, and the JSON types of reserved metadata keys.
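A handful of these invariants can be sketched in plain Python to show the flavor of strict checking. This covers only a small subset (the sections type, positive counts and masses, finite values), the function check_frame and its arguments are hypothetical, and it is not the validator that readcon-core ships.

```python
import math


def check_frame(metadata, masses, counts, values):
    """Return a list of problems; an empty list means these few checks passed."""
    problems = []
    if not isinstance(metadata.get("sections"), list):
        problems.append("sections must be present and be a JSON array")
    if any(count <= 0 for count in counts):
        problems.append("atom counts must be positive")
    if any(mass <= 0 for mass in masses):
        problems.append("masses must be positive")
    if not all(math.isfinite(value) for value in values):
        problems.append("numeric values must be finite")
    return problems


good = check_frame(
    {"con_spec_version": 2, "sections": ["velocities"], "validate": True},
    masses=[12.011, 15.999],
    counts=[5, 1],
    values=[0.0, 1.5, -2.25],
)
bad = check_frame(
    {"con_spec_version": 2, "validate": True},  # sections key missing
    masses=[12.011, -1.0],                      # negative mass
    counts=[5, 1],
    values=[0.0, float("nan")],                 # non-finite value
)
```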
8 Can I store forces and energies?
Yes, in two complementary places:
Per-frame total energy lives in the JSON metadata under the energy key.
Forces are a per-atom vector section, declared via sections:
{"con_spec_version":2,"sections":["forces"],"energy":-42.5,"potential":{"type":"EMT","params":{"cutoff":6.0}}}
For ML potentials that decompose total energy into per-atom
contributions, declare the energies section alongside forces and
emit one scalar per atom in an Energies of Component i block:
{"con_spec_version":2,"sections":["forces","energies"],"energy":-42.5}
The per-frame energy metadata key SHOULD equal the sum of the
per-atom energies section when both are present. Forces require
potential identification (the potential key) so downstream tools
know how to interpret the values.
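The consistency rule between the energy key and the per-atom energies section can be checked with a short sketch. The helper name, the tolerances, and the per-atom values are assumptions chosen for illustration; the source does not prescribe a tolerance.

```python
import math


def energy_is_consistent(total_energy, per_atom_energies, rel_tol=1e-9, abs_tol=1e-12):
    """Compare the frame energy with the accurately summed per-atom energies."""
    summed = math.fsum(per_atom_energies)  # fsum limits floating-point round-off
    return math.isclose(total_energy, summed, rel_tol=rel_tol, abs_tol=abs_tol)


per_atom = [-10.5, -12.0, -20.0]  # made-up per-atom contributions
ok = energy_is_consistent(-42.5, per_atom)
mismatch = energy_is_consistent(-42.0, per_atom)
```

A tolerance-based comparison is used because per-atom values written with finite precision rarely sum to the total bit for bit.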
9 Does readcon-core support compression?
Yes. Two formats are detected automatically by magic bytes:
gzip: .con.gz extension or 0x1f 0x8b magic; always available.
zstd: .con.zst extension or 0x28 0xb5 0x2f 0xfd magic; opt-in behind the zstd Cargo feature. Builds without the feature still detect zstd input and return a clear error pointing at the feature flag rather than producing a corrupt parse.
Writing through ConFrameWriter::from_path_gzip /
from_path_gzip_with_precision or (with the zstd feature) the
matching from_path_zstd constructors compresses output transparently.
Force data roughly triples per-atom file size. Gzip compression typically recovers 60-80% of that overhead; zstd usually trims an additional 5-15% over gzip for the same content.
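Magic-byte detection of the kind described above can be sketched with the Python standard library. This mirrors the behavior only in spirit: detect_compression and read_text are hypothetical helpers, and the zstd branch raises an error because this sketch deliberately carries no zstd decoder.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def detect_compression(data):
    """Classify a byte buffer by its leading magic bytes."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(ZSTD_MAGIC):
        return "zstd"
    return "none"


def read_text(data):
    """Decode a buffer, decompressing gzip and refusing zstd with a clear error."""
    kind = detect_compression(data)
    if kind == "gzip":
        return gzip.decompress(data).decode("utf-8")
    if kind == "zstd":
        raise RuntimeError("zstd input detected; this sketch has no zstd decoder")
    return data.decode("utf-8")


plain = b"example header line\n"
packed = gzip.compress(plain)
text = read_text(packed)
```

Checking the magic bytes rather than the file extension means a renamed or piped file is still handled correctly.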
10 How do I look up an atom by its atom_id?
readcon-core preserves atom_id (column 5) through every read-write
cycle, but the in-memory atom order follows the file’s type-grouped
layout, not the atom_id ordering. Two convenience APIs bridge the
gap:
One-shot: frame.atom_index_by_id(id) scans the atom list and returns Option<usize>. O(N) per call.
Repeated: frame.build_atom_id_index() returns an FxHashMap<u64, usize> (Rust) / dict (Python) / Dict{UInt64, Int} (Julia) for O(1) reverse lookup. Build once and reuse for every lookup against the same frame.
Both APIs are mirrored in every supported binding (Rust, C ABI, C++, Python, Julia).
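The trade-off between the two lookups can be shown with a pure-Python analog. The functions below reuse the documented method names for readability, but they are free functions over a plain list of ids written for this sketch, not the actual binding calls.

```python
def atom_index_by_id(atom_ids, wanted):
    """Linear scan: O(N) per call; returns None when the id is absent."""
    for index, atom_id in enumerate(atom_ids):
        if atom_id == wanted:
            return index
    return None


def build_atom_id_index(atom_ids):
    """Build the reverse map once: O(N) to build, O(1) per lookup afterwards."""
    return {atom_id: index for index, atom_id in enumerate(atom_ids)}


# In-memory order after type grouping of C, C, C, O, C, C (ids 0-5).
atom_ids = [0, 1, 2, 4, 5, 3]

position_of_oxygen = atom_index_by_id(atom_ids, 3)
index = build_atom_id_index(atom_ids)
position_of_id_4 = index[4]
```

For a handful of lookups the scan is simpler; for a lookup per atom, building the map once avoids quadratic cost.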
11 What languages are supported?
| Language | Mechanism | Installation |
|---|---|---|
| Rust | Native crate | |
| Python | PyO3 bindings | |
| C | FFI (cdylib) | link |
| C++ | RAII header | |
| Julia | ccall wrapper | |
All bindings share the same Rust core, ensuring identical parsing behavior across languages.
12 How do I convert between ASE and con?
import readcon
# con -> ASE Atoms (preserves atom_id, velocities, forces, masses)
frames = readcon.read_con("input.con")
ase_atoms = frames[0].to_ase()
# ASE Atoms -> con
frame = readcon.ConFrame.from_ase(ase_atoms)
readcon.write_con("output.con", [frame])
# Direct read to ASE list
ase_list = readcon.read_con_as_ase("trajectory.con")
The conversion preserves atom_id (via a custom per-atom array),
velocities, forces (via SinglePointCalculator), masses, and
constraints (FixAtoms).