Frequently Asked Questions
1 Why another atomic structure format?
The con format addresses a specific gap: lossless round-tripping of
atomic configurations through saddle-point search, NEB, and dimer
calculations. Existing formats lose information during read-write
cycles:
XYZ: no cell data, no fixed-atom flags, no atom identity tracking. A 218-atom slab written to XYZ and read back has lost the original atom ordering, constraint information, and periodicity.
POSCAR/CONTCAR: VASP-specific. No velocity or force sections. Selective dynamics is all-or-nothing per direction. No metadata for potential parameters or convergence state.
extxyz: extensible but underspecified. Every tool invents its own key names. No formal specification means round-trip fidelity depends on implementation details. Parsing performance suffers from per-atom key-value overhead.
CIF: designed for crystallography, not molecular dynamics. Verbose. No velocity or force representation. Overkill for transient simulation snapshots.
The con format is deliberately minimal: a fixed 9-line header, typed
atom blocks, and optional velocity/force sections. The v2 JSON
metadata line adds extensibility without breaking the core simplicity.
2 What problems does atom_id solve?
The con format groups atoms by element type. A structure with atoms
C, C, C, O, C, C (indices 0-5) gets written as five C atoms followed
by one O atom. Without a persistent identity field, the original
ordering vanishes after one read-write cycle.
This matters for:
NEB calculations: interpolated images must maintain consistent atom ordering across the band. If atom ordering drifts, the interpolation produces nonsense.
Dimer searches: the displacement vector references specific atom indices. Reordering atoms invalidates the mode.
Reference comparisons: comparing a relaxed structure against a reference (e.g., Baker test set) requires matching atoms by index.
The atom_id field (column 5) stores the pre-grouping index,
allowing exact reconstruction of the original ordering after any
number of read-write cycles.
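The grouping and reconstruction described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the readcon-core implementation; the helper names `group_by_element` and `restore_order` are hypothetical.

```python
def group_by_element(symbols):
    """Group atoms by element (first-seen order), keeping each original index."""
    element_order = list(dict.fromkeys(symbols))  # unique elements, first-seen order
    grouped = []
    for element in element_order:
        for atom_id, symbol in enumerate(symbols):
            if symbol == element:
                grouped.append((symbol, atom_id))  # atom_id = pre-grouping index
    return grouped


def restore_order(grouped):
    """Sort (symbol, atom_id) pairs back into the pre-grouping order."""
    return [symbol for symbol, _ in sorted(grouped, key=lambda pair: pair[1])]


original = ["C", "C", "C", "O", "C", "C"]
written = group_by_element(original)   # five C atoms, then the O atom
restored = restore_order(written)      # same ordering as `original`
```

Without the stored index, `written` alone cannot tell a reader that the oxygen atom was originally at index 3.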
3 Why JSON on line 2?
Line 2 was historically unused (“Time” or empty in eOn files). JSON provides:
Forward compatibility: new keys can be added without format changes. Unknown keys are preserved through round-trips.
Machine readability: no custom parser needed. Every language has a JSON library.
Section declaration: the sections key tells the parser exactly what per-atom data to expect, eliminating ambiguity.
Provenance: potential, energy, generator keys make files self-documenting.
Backward compatibility: pre-v2 files have non-JSON on line 2. The parser detects this (line 2 does not start with {) and falls back to legacy mode (spec_version = 1).
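The detection rule and the preservation of unknown keys can be illustrated with the standard `json` module. This is a sketch under stated assumptions, not the parser readcon-core ships: the function names are hypothetical, and treating a missing `con_spec_version` key as version 2 is an assumption made here for illustration.

```python
import json


def parse_metadata_line(line2):
    """Return (spec_version, metadata) for the second header line."""
    text = line2.strip()
    if text.startswith("{"):
        metadata = json.loads(text)
        # Assumption: a JSON line without an explicit version counts as v2.
        return metadata.get("con_spec_version", 2), metadata
    # Legacy file: line 2 holds "Time", free text, or nothing at all.
    return 1, {}


def write_metadata_line(metadata):
    """Serialize every key, including ones this reader does not understand."""
    return json.dumps(metadata, separators=(",", ":"))


v2 = parse_metadata_line('{"con_spec_version":2,"sections":["velocities"],"custom_key":"kept"}')
legacy = parse_metadata_line("0.0000 TIME")
round_tripped = write_metadata_line(v2[1])
```

Because the metadata is kept as a plain dictionary and written back whole, `custom_key` survives the round trip even though nothing in the sketch interprets it.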
4 When should I use HDF5 instead?
Use con for:
Single structures and short trajectories (< 10k frames)
Interoperability with eOn, readcon-core, and ASE
Human-readable files that can be inspected with head
Situations where simplicity and round-trip fidelity matter
Use HDF5 for:
Large-scale MD trajectories (millions of frames, billions of atoms)
Random access by frame index without scanning
Binary data with native-endian floats (no parsing overhead)
Complex hierarchical data (multiple properties per frame, metadata trees, datasets with different shapes)
The two formats complement each other. readcon-core handles the
con-to-data pipeline; HDF5 handles long-term archival and analysis.
5 How fast is readcon-core?
readcon-core parses con files 10-30x faster than pure-Python
readers (e.g., eOn’s fileio.py) by using:
fast-float2: SIMD-accelerated float parsing (2-3x over str::parse)
Memory-mapped I/O: large files are mmap’d, avoiding heap copies
Rc<String> symbols: one allocation per atom type, not per atom
Zero-copy iteration: the ConFrameIterator borrows from the input string without allocating per-line
Forward skip: forward() skips frames by line counting without parsing atom data
See benchmarks for measured numbers on real datasets.
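The idea behind skipping by line counting can be sketched as simple arithmetic. The layout assumed here is taken from the descriptions in this FAQ (a 9-line header, then one symbol line, one label line, and one data line per atom for each atom type, with one blank separator before each extra section); the function name `frame_line_count` is hypothetical and this is not the `forward()` implementation.

```python
def frame_line_count(atoms_per_type, extra_sections=0, header_lines=9):
    """Number of text lines one frame occupies under the assumed layout."""
    # Each atom type contributes a symbol line, a label line, and its data lines.
    block = sum(2 + count for count in atoms_per_type)
    # Each extra section (velocities, forces) adds a blank separator plus a block.
    return header_lines + block + extra_sections * (1 + block)


coords_only = frame_line_count([5, 1])                          # 5 C atoms, 1 O atom
with_vel_and_forces = frame_line_count([5, 1], extra_sections=2)
```

Once the per-type atom counts are known from the header, a reader can advance past the whole frame by that many lines without converting any coordinate text to floats.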
6 What is the sections mechanism?
Version 2 files can include per-atom data beyond coordinates. Each additional section (velocities, forces) follows the same block structure as coordinates: blank separator, symbol line, label line, data lines.
The sections key in the JSON metadata declares which sections exist
and their order:
{"con_spec_version":2,"sections":["velocities","forces"]}
This is more robust than the legacy approach (detecting velocities by peeking for a blank separator) because:
The parser knows exactly what to expect
New section types can be added without ambiguity
Section order is explicit
Legacy .convel files without a sections key still work: the
parser falls back to blank-separator velocity detection.
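A reader can turn the declaration into an explicit list of blocks to expect. The sketch below is illustrative only: `expected_blocks` is a hypothetical helper, and returning `None` to signal "use legacy blank-separator detection" is a convention chosen for this example.

```python
import json


def expected_blocks(metadata_line):
    """List the per-atom blocks to read, in file order, or None for legacy detection."""
    text = metadata_line.strip()
    if not text.startswith("{"):
        return None  # pre-v2 file: no declaration available
    sections = json.loads(text).get("sections")
    if sections is None:
        return None  # no sections key: fall back to blank-separator detection
    # Coordinates always come first; declared sections follow in the given order.
    return ["coordinates", *sections]


declared = expected_blocks('{"con_spec_version":2,"sections":["velocities","forces"]}')
legacy = expected_blocks("Time")
undeclared = expected_blocks('{"con_spec_version":2}')
```

With the list in hand, the reader never has to peek ahead to guess whether another block follows.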
7 Can I store forces and energies?
Yes. Forces are a per-atom section (declared via sections). Energy
is per-frame, stored in the JSON metadata:
{"con_spec_version":2,"sections":["forces"],"energy":-42.5,"potential":{"type":"EMT","params":{"cutoff":6.0}}}
Forces require potential identification (the potential key) so
downstream tools know how to interpret the values.
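A downstream tool might check this requirement before trusting the force values. The following is a minimal sketch using only the keys shown in the example line; the helper name `energy_with_force_check` is hypothetical, and whether readcon-core itself enforces the rule this way is not stated here.

```python
import json


def energy_with_force_check(metadata_line):
    """Return the per-frame energy, rejecting forces that lack a potential key."""
    metadata = json.loads(metadata_line)
    sections = metadata.get("sections", [])
    if "forces" in sections and "potential" not in metadata:
        raise ValueError("forces section declared without a 'potential' key")
    return metadata.get("energy")  # None when no energy was recorded


line = ('{"con_spec_version":2,"sections":["forces"],"energy":-42.5,'
        '"potential":{"type":"EMT","params":{"cutoff":6.0}}}')
energy = energy_with_force_check(line)
```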
8 Does readcon-core support compression?
Yes. Files with a .con.gz extension (or gzip magic bytes 0x1f 0x8b) are transparently decompressed during reading. Writing supports
gzip compression via dedicated constructors.
Storing velocities and forces alongside coordinates roughly triples per-atom file size. Gzip compression typically recovers 60-80% of that overhead, making it practical to store coordinates + velocities + forces in a single compressed file.
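Detection by magic bytes rather than by file extension can be sketched with Python's standard `gzip` module. This illustrates the idea only and is not the readcon-core reader; `decode_maybe_gzip` and `read_con_text` are hypothetical helper names.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_maybe_gzip(raw):
    """Decompress if the payload starts with the gzip magic bytes, then decode."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


def read_con_text(path):
    """Read a .con or .con.gz file as text, regardless of its extension."""
    with open(path, "rb") as handle:
        return decode_maybe_gzip(handle.read())
```

Checking the leading bytes means a compressed file still opens correctly if it was renamed without the `.gz` suffix.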
9 What languages are supported?
| Language | Mechanism | Installation |
|---|---|---|
| Rust | Native crate | |
| Python | PyO3 bindings | |
| C | FFI (cdylib) | link |
| C++ | RAII header | |
| Julia | ccall wrapper | |
All bindings share the same Rust core, ensuring identical parsing behavior across languages.
10 How do I convert between ASE and con?
import readcon
# con -> ASE Atoms (preserves atom_id, velocities, forces, masses)
frames = readcon.read_con("input.con")
ase_atoms = frames[0].to_ase()
# ASE Atoms -> con
frame = readcon.ConFrame.from_ase(ase_atoms)
readcon.write_con("output.con", [frame])
# Direct read to ASE list
ase_list = readcon.read_con_as_ase("trajectory.con")
The conversion preserves atom_id (via a custom per-atom array),
velocities, forces (via SinglePointCalculator), masses, and
constraints (FixAtoms).