The CON File Format Specification

Date: 2026-03-25

1 Specification

Version: 2
Date: 2026-03-25
Status: Stable
Reference implementation: readcon-core

This document defines version 2 of the CON file format. It supersedes all prior informal descriptions. New implementations SHOULD target this version.

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHOULD, SHOULD NOT, MAY, and OPTIONAL follow RFC 2119 semantics.

2 Overview

The CON format stores atomic configurations for molecular dynamics and transition-state search simulations. It originated in the eOn code and has since been adopted by multiple tools including ASE.

A CON file contains one or more frames. Each frame encodes a simulation cell, per-type metadata (masses, atom counts), and per-atom data (coordinates, constraints, identity). Optional sections add velocities, forces, or other per-atom vector/scalar data.

3 File extensions

.con

Coordinate-only configuration files.

.convel

Configuration files with velocity data per frame.

.con.gz

Gzip-compressed CON files (see section 11, Compression).

4 Encoding

CON files MUST use UTF-8. Line endings MAY be LF (\n) or CRLF (\r\n); parsers MUST accept both. All numeric values use ASCII decimal representation (no locale-dependent formatting).

5 Frame structure

Each frame consists of:

  1. A 9-line header.

  2. One coordinate block per atom type.

  3. Zero or more additional per-atom sections (velocities, forces), each preceded by a blank separator line.

Multiple frames are concatenated directly with no inter-frame separator.

6 Header (9 lines)

Line  Name               Content                         Example
1     Generator comment  Free-form text                  Generated by eOn
2     Metadata           JSON object or free-form text   {"con_spec_version":2}
3     Cell dimensions    3 floats: Lx Ly Lz              15.3456 21.702 100.0
4     Cell angles        3 floats: alpha beta gamma      90.0 90.0 90.0
5     Reserved           Free-form text (round-tripped)  0 0
6     Reserved           Free-form text (round-tripped)  218 0 1
7     Atom type count    1 integer: N                    2
8     Atoms per type     N integers                      216 2
9     Mass per type      N floats (atomic mass units)    63.546 1.00793

6.1 Line 1: Generator comment

Free-form text. Writers SHOULD set this to a human-readable identifier. Parsers MUST preserve it through round-trips but MUST NOT assign it semantic meaning.

6.2 Line 2: Metadata

Line 2 carries machine-readable metadata as a single-line JSON object.

6.2.1 Version 2+ files

Writers MUST emit a JSON object containing at least con_spec_version with an integer value:

{"con_spec_version":2}

Additional keys MAY appear. Parsers MUST preserve unrecognized keys through round-trips. Reserved metadata keys listed in metadata-keys MUST use the declared JSON type. The sections key, when present, MUST be an array of strings. The validate key, when present, MUST be a boolean.

6.2.2 Legacy (pre-v2) files

Files produced before this specification may contain free-form text on line 2. A conforming parser detects the format by checking whether line 2, after trimming whitespace, starts with {:

  • Starts with {: parse as JSON, extract con_spec_version.

  • Does not start with {: treat as legacy (implicit version 1).

If line 2 starts with { but contains malformed JSON, the parser MUST report an error. If the JSON object lacks con_spec_version, the parser MUST report an error. If con_spec_version exceeds the highest version the parser supports, the parser MUST report an error.
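The detection rules above can be sketched in Python as follows. This is a minimal illustration, not the readcon-core implementation: the function name detect_spec_version and the constant SUPPORTED_SPEC_VERSION are hypothetical, and a plain ValueError stands in for whatever structured error type an implementation uses.

```python
import json

# Assumption: the highest spec version this sketch claims to support.
SUPPORTED_SPEC_VERSION = 2


def detect_spec_version(line2):
    """Return the spec version implied by header line 2."""
    text = line2.strip()
    if not text.startswith("{"):
        return 1  # legacy free-form text: implicit version 1
    try:
        meta = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed metadata JSON: {exc}") from exc
    if not isinstance(meta, dict) or "con_spec_version" not in meta:
        raise ValueError("metadata lacks con_spec_version")
    version = meta["con_spec_version"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError("con_spec_version must be an integer")
    if version > SUPPORTED_SPEC_VERSION:
        raise ValueError(f"unsupported con_spec_version {version}")
    return version
```

Calling detect_spec_version on the legacy line "0.0000 TIME" yields 1, while a line holding {"con_spec_version":3} raises an error in a parser that supports only version 2.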

6.3 Lines 3-4: Cell geometry

Line 3: three whitespace-separated floats (cell edge lengths in angstroms). Line 4: three whitespace-separated floats (cell angles in degrees). Tabs and spaces are both valid separators.

For non-orthogonal cells, the angle-based representation introduces floating-point drift through trigonometric round-trips. Writers SHOULD include the exact 3x3 lattice vector matrix in the JSON metadata via the lattice_vectors key (see lattice-vectors). When lattice_vectors is present, readers SHOULD prefer it over the length/angle values on lines 3-4.

When validate is true, readers MUST reject zero or negative cell lengths and MUST reject angles outside the open interval (0, 180) degrees.
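As an illustration of the geometry rules, the following Python sketch parses lines 3 and 4 and applies the validate-mode checks. The names parse_cell_lines and check_cell are hypothetical, and ValueError stands in for a structured error.

```python
import math


def parse_cell_lines(line3, line4):
    """Parse cell lengths (line 3) and angles (line 4).

    str.split() with no argument splits on any run of spaces or tabs.
    """
    lengths = [float(tok) for tok in line3.split()]
    angles = [float(tok) for tok in line4.split()]
    if len(lengths) != 3 or len(angles) != 3:
        raise ValueError("lines 3 and 4 must each hold three floats")
    return lengths, angles


def check_cell(lengths, angles):
    """Apply the validate=true geometry checks; raise on failure."""
    for value in (*lengths, *angles):
        if not math.isfinite(value):
            raise ValueError("cell values must be finite")
    if any(length <= 0 for length in lengths):
        raise ValueError("cell lengths must be positive")
    if any(not (0 < angle < 180) for angle in angles):
        raise ValueError("cell angles must lie in the open interval (0, 180)")
```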

6.4 Lines 5-6: Reserved

Free-form text with no defined semantics. Writers MAY emit empty lines. Parsers MUST preserve these for round-trip fidelity.

6.5 Lines 7-9: Type metadata

Line 7: single positive integer N (number of atom types). Line 8: exactly N positive integers (atom count per type). Line 9: exactly N floats (atomic mass per type, in amu).
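A minimal Python sketch of reading these three lines is shown here. The function name parse_type_metadata is hypothetical, and the sketch enforces only the count and positivity rules stated in this section.

```python
def parse_type_metadata(line7, line8, line9):
    """Parse header lines 7-9 into (n_types, counts, masses)."""
    tokens7 = line7.split()
    if len(tokens7) != 1:
        raise ValueError("line 7 must hold a single integer")
    n_types = int(tokens7[0])
    if n_types <= 0:
        raise ValueError("atom type count must be positive")
    counts = [int(tok) for tok in line8.split()]
    masses = [float(tok) for tok in line9.split()]
    # Lines 8 and 9 must each hold exactly N values.
    if len(counts) != n_types or len(masses) != n_types:
        raise ValueError("lines 8 and 9 must each hold exactly N values")
    if any(count <= 0 for count in counts):
        raise ValueError("atom counts must be positive")
    return n_types, counts, masses
```

With the header example values ("2", "216 2", "63.546 1.00793") this returns two types with 216 and 2 atoms, for a total of 218 atoms.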

7 Coordinate blocks

For each atom type i (1 to N), in the order declared in lines 8-9:

  1. Symbol line: chemical symbol (e.g., Cu, H).

  2. Label line: Coordinates of Component /i/

  3. Atom lines: one per atom, containing:

Column  Type   Description
1       float  x coordinate (angstroms)
2       float  y coordinate (angstroms)
3       float  z coordinate (angstroms)
4       int    Constraint bitmask (see section 7.1)
5       int    Atom index (see section 7.2)

Columns are whitespace-separated.

7.1 Per-direction constraints (column 4)

Column 4 encodes per-direction constraint flags as a bitmask. Bit 0 = x, bit 1 = y, bit 2 = z.

Value  Meaning
0      Free (all directions)
1      All-fixed (legacy, treated as 7)
2-6    Per-direction combinations
7      Fixed in all directions

Readers MUST treat value 1 as equivalent to 7. Writers MUST emit 7 for all-fixed atoms, never 1.
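Reader-side decoding of column 4 might look like the Python sketch below. The function name decode_constraint is hypothetical; the bit assignments and the legacy mapping of 1 to 7 are taken from the text above.

```python
def decode_constraint(value):
    """Decode a column-4 value into (fixed_x, fixed_y, fixed_z)."""
    if not 0 <= value <= 7:
        raise ValueError(f"constraint value {value} outside 0-7")
    if value == 1:
        value = 7  # legacy all-fixed marker, treated as 7
    # Bit 0 = x, bit 1 = y, bit 2 = z.
    return (bool(value & 1), bool(value & 2), bool(value & 4))
```

For example, decode_constraint(6) reports y and z fixed with x free, and decode_constraint(1) gives the same result as decode_constraint(7).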

7.2 Atom index (column 5)

The atom index preserves the original position of each atom before type-based grouping. The CON format groups atoms by element type, which reorders them. Without a persistent index, the original ordering cannot be recovered after a read-write cycle.

Version 2 requirements:

  • Writers MUST emit column 5 on every atom line.

  • Column 5 MUST contain the pre-grouping index.

  • Readers MUST parse and preserve column 5 through write-back.

Version 1 behavior:

  • Column 5 is present but its semantics are undefined.

Readers SHOULD accept 4-column atom lines. When column 5 is absent, default to the sequential position within the frame (0, 1, 2, …).
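The fallback for a missing column 5 can be sketched in Python as follows. The function name parse_atom_line and its fallback_index parameter are hypothetical; the caller is assumed to pass the atom's sequential position within the frame.

```python
def parse_atom_line(line, fallback_index):
    """Parse one coordinate line into (x, y, z, constraint, atom_index).

    fallback_index is used only when column 5 is absent.
    """
    tokens = line.split()
    if len(tokens) not in (4, 5):
        raise ValueError("atom line must have 4 or 5 columns")
    x, y, z = (float(tok) for tok in tokens[:3])
    constraint = int(tokens[3])
    atom_index = int(tokens[4]) if len(tokens) == 5 else fallback_index
    return x, y, z, constraint, atom_index
```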

8 Additional per-atom sections

After coordinate blocks, a frame MAY contain additional per-atom data sections. Each section follows the same block structure: a blank separator line, then per-component blocks (symbol line, label line, data lines).

8.1 Section declaration

Version 2 files declare sections in the JSON metadata using the sections key:

{"con_spec_version":2,"sections":["velocities","forces"]}

The parser reads sections in the declared order. Parsers MUST reject unknown section names with an error. Every declared section MUST be present, complete, and parseable at its declared position. An empty sections array declares that no additional per-atom sections follow.

Section name  Label pattern                Columns  Data
velocities    Velocities of Component /i/  5        vx vy vz fixed_flag atom_id
forces        Forces of Component /i/      5        fx fy fz fixed_flag atom_id
energies      Energies of Component /i/    3        energy fixed_flag atom_id
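A reader could resolve the declared sections against the known section names along the lines of this Python sketch. The names KNOWN_SECTIONS and declared_sections are hypothetical, and ValueError stands in for a structured error.

```python
# Known section names mapped to their per-row column counts.
KNOWN_SECTIONS = {"velocities": 5, "forces": 5, "energies": 3}


def declared_sections(meta):
    """Return [(name, column_count), ...] in declared order.

    Returns None when the sections key is absent, so the caller can
    tell "no declaration" apart from an explicitly empty list.
    """
    if "sections" not in meta:
        return None
    sections = meta["sections"]
    if not isinstance(sections, list) or not all(isinstance(s, str) for s in sections):
        raise ValueError("sections must be an array of strings")
    for name in sections:
        if name not in KNOWN_SECTIONS:
            raise ValueError(f"unknown section {name!r}")
    return [(name, KNOWN_SECTIONS[name]) for name in sections]
```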

The energies section carries one scalar per atom, useful for ML potentials that decompose total energy into local contributions. Writers MAY emit it alongside forces, alone, or omit it entirely.

When the energies section is present:

  • The per-frame total energy metadata key SHOULD equal the sum of the per-atom contributions. Implementations MAY warn on a mismatch but MUST NOT reject the frame on that ground (a reader cannot tell apart a numerical-noise mismatch from a deliberate definition where the per-atom decomposition does not sum to the total).

  • The total energy metadata key MAY still be absent, in which case the per-atom contributions are the only energy data on the frame.

  • The fixed_flag and atom_id columns SHOULD match the coordinate block, exactly as for velocities and forces. In validate=true mode they MUST match.

The fixed_flag and atom_id columns in every additional section repeat the coordinate-block identity data for the same atom ordering. Writers SHOULD emit the same values as the coordinate block. Readers associate section rows by component order and MAY ignore the duplicate identity columns after parsing the row shape; in validate=true mode readers MUST verify that the fixed_flag and atom_id columns match the coordinate block.

8.2 Validation mode

Version 2 files MAY set validate to true in the JSON metadata:

{"con_spec_version":2,"sections":["velocities","forces"],"validate":true}

When validate is true, the sections key MUST be present, even when the value is an empty array. Conforming readers MUST verify that the frame satisfies strict v2 invariants before accepting it. At minimum, this validation MUST check:

  • Reserved metadata keys use the declared JSON types.

  • Numeric tokens are finite.

  • Cell lengths are positive, cell angles are in (0, 180) degrees, atom counts are positive, and masses are positive.

  • Coordinate component labels exactly match Coordinates of Component /i/.

  • Component symbols are recognized element symbols, or X for an explicitly unknown element.

  • Coordinate and additional-section fixed_flag and atom_id fields are exact integer tokens.

  • The section component symbol matches the coordinate component symbol.

  • The section label exactly matches the declared section and component number (for example, Velocities of Component 1).

  • Each section row’s fixed_flag decodes to the same per-axis fixed mask as the corresponding coordinate row.

  • Each section row’s atom_id equals the corresponding coordinate row’s atom_id.

If any check fails, the reader MUST reject the frame. When validate is absent or false, readers MAY parse files by associating section rows by component order and ignoring duplicate identity column mismatches after parsing the row shape.
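The two per-row identity checks can be illustrated with a short Python sketch. The function names and the representation of rows as (fixed_flag, atom_id) tuples are assumptions made for this example only.

```python
def normalize_flag(flag):
    """Map the legacy all-fixed value 1 onto 7 before comparing masks."""
    return 7 if flag == 1 else flag


def check_identity_columns(coord_rows, section_rows):
    """Compare (fixed_flag, atom_id) pairs row by row; raise on mismatch."""
    if len(coord_rows) != len(section_rows):
        raise ValueError("section row count differs from coordinate block")
    for row, ((c_flag, c_id), (s_flag, s_id)) in enumerate(zip(coord_rows, section_rows)):
        if normalize_flag(c_flag) != normalize_flag(s_flag):
            raise ValueError(f"row {row}: fixed_flag mismatch")
        if c_id != s_id:
            raise ValueError(f"row {row}: atom_id mismatch")
```

Normalizing before comparison means a coordinate row with flag 1 and a section row with flag 7 are accepted, since both decode to the same per-axis mask.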

8.2.1 Error paths and ParseError variants

The reference reader (readcon-core) surfaces validation failures as typed ParseError variants. Other implementations are expected to return analogous structured errors, but the variant names below are specific to readcon-core and are listed here so that the spec and the reference implementation stay in sync.

ParseError variant         Fires when
MissingSpecVersion         Line 2 starts with { but no con_spec_version key.
UnsupportedSpecVersion(v)  con_spec_version exceeds CON_SPEC_VERSION.
InvalidMetadataJson(msg)   JSON is malformed, validate is not a boolean, sections is not an
                           array of strings, or a reserved key has the wrong JSON type. Also
                           fires when validate=true omits the sections key.
ValidationError(msg)       Cell geometry, masses, coordinate component label, component symbol,
(validate=true only)       fixed_flag / atom_id integer tokens, section component symbol,
                           section label, or per-row identity columns mismatch the rules above.
                           Numeric tokens that are not finite are rejected here as well, after
                           parse-time tokenization.
UnknownSection(name)       A name in the sections array does not match a known section.
IncompleteHeader           Fewer than 9 header lines remain.
IncompleteFrame            Coordinate block ends short.
IncompleteVelocitySection  Declared velocity section ends short or is absent.
IncompleteForceSection     Declared force section ends short or is absent.

The file resources/test/tiny_cuh2_strict_invalid.con (if present), together with the test cases under src/parser.rs::tests and tests/parseforces.rs, exercises these branches and serves as an executable reference for each error path.

8.3 Legacy section detection

Files without a sections key use blank-separator detection: if a blank line follows coordinate blocks, the parser attempts to parse a velocity section. A present sections key disables this fallback, including when the value is []. This preserves backward compatibility with existing .convel files while giving v2 writers a precise declaration mechanism.

Writers SHOULD always emit the sections key when writing additional sections.
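The choice between declared sections and the legacy fallback can be sketched in Python as follows. The function name plan_sections and its parameters are hypothetical; meta is assumed to be the parsed line-2 JSON object, or None for a legacy file.

```python
def plan_sections(meta, blank_line_follows_coordinates):
    """Return the list of section names the parser should try to read."""
    if meta is not None and "sections" in meta:
        # A present sections key disables the fallback, even when empty.
        return list(meta["sections"])
    # Legacy detection: a blank line after the coordinate blocks
    # means the parser attempts a velocity section.
    return ["velocities"] if blank_line_follows_coordinates else []
```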

8.4 Velocity section

One blank separator line precedes the section. Then, per component i: symbol line, Velocities of Component /i/ label, and one line per atom: vx vy vz fixed_flag atom_id.

8.5 Force section

One blank separator line precedes the section. Then, per component i: symbol line, Forces of Component /i/ label, and one line per atom: fx fy fz fixed_flag atom_id.

Frames with forces SHOULD include the potential and energy metadata keys.

8.6 Extending with new section types

New section types follow the same pattern: declare the name in the sections array, emit a blank separator line, then for each component a symbol line, a <Name> of Component /i/ label, and the data lines.

9 Multi-frame files

Frames are concatenated with no separator. After parsing a frame’s data, the parser attempts the next 9-line header. If fewer than 9 lines remain, parsing ends.

Writers MUST NOT insert extra blank lines between frames.
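Because frames carry no separator, a reader that knows the header values can compute how many lines a frame occupies. The Python sketch below is illustrative only: the function name frame_line_count is hypothetical, and it assumes each additional section is preceded by exactly one blank line, as in the example in 16.2.

```python
def frame_line_count(atoms_per_type, n_sections):
    """Number of lines in one frame.

    atoms_per_type: the integers from header line 8.
    n_sections: number of additional per-atom sections in the frame.
    """
    # Each component block has a symbol line, a label line, and its atoms.
    block_lines = sum(2 + count for count in atoms_per_type)
    # 9 header lines, the coordinate blocks, then per section one
    # blank separator plus a full set of component blocks.
    return 9 + block_lines + n_sections * (1 + block_lines)
```

For the minimal file in 16.1 (one type, two atoms, no sections) this gives 13 lines; for the file in 16.2 (two types of two atoms each, two sections) it gives 35.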

10 Data types and precision

  • Floats: any valid decimal representation. Writers SHOULD emit at least 6 significant digits. For lossless f64 round-tripping, 17 significant digits suffice. Readers MUST reject non-finite values (NaN, Infinity, -Infinity).

  • Integers: the constraint bitmask (0-7), atom_id, natmtypes, and natmspertype are non-negative integers.
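The float rules can be illustrated in Python, whose float type is an IEEE 754 double (the same representation as f64). The function names parse_float and format_float are hypothetical.

```python
import math


def parse_float(token):
    """Parse a decimal token, rejecting NaN and infinities."""
    value = float(token)
    if not math.isfinite(value):
        raise ValueError(f"non-finite value {token!r}")
    return value


def format_float(value):
    """Format with 17 significant digits, enough to round-trip a double."""
    return format(value, ".17g")
```

A value written with format_float and read back with parse_float compares equal to the original, which is the round-trip property the 17-digit figure refers to.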

11 Compression

CON files MAY be gzip-compressed. Readers SHOULD detect compression by checking the first two bytes for the gzip magic number (0x1f 0x8b) rather than relying on file extension. The decompressed content MUST be a valid CON file.
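Magic-number detection is straightforward with the Python standard library. In this sketch the function name decode_con_bytes is hypothetical; it takes the raw file bytes and returns the decoded text.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def decode_con_bytes(raw):
    """Return CON text from raw bytes, decompressing when gzip is detected."""
    # Check the first two bytes rather than trusting the file extension.
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")
```

A caller would read the file in binary mode and pass the bytes in, so the same code path handles .con and .con.gz inputs regardless of how the file is named.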

12 Constraints and limits

  • Atoms MUST appear grouped by type, in header-declared order.

  • Component numbering starts at 1.

  • Total atom count equals the sum of line 8 values.

  • Symbol strings SHOULD match IUPAC element symbols.

  • No upper limit on atom types or atom count is imposed.

14 Version history

Version  Date        Changes
1        (original)  De facto format from eOn. Column 5 present, undefined.
2        2026-03-25  JSON metadata. atom_id semantics. Per-direction constraints.
                     Declared sections. Force blocks. Compression.

15 Detecting the spec version

Read line 2. If it starts with {, parse as JSON and extract con_spec_version. Otherwise the file predates this specification (implicit version 1).

16 Examples

16.1 Minimal v2 file

Generated by eOn
{"con_spec_version":2}
10.000000 10.000000 10.000000
90.000000 90.000000 90.000000


1
2
63.546000
Cu
Coordinates of Component 1
0.000000 0.000000 0.000000 7 0
5.000000 5.000000 5.000000 0 1

16.2 File with velocities and forces

Generated by eOn
{"con_spec_version":2,"sections":["velocities","forces"],"energy":-42.5,"potential":{"type":"EMT","params":{}}}
15.345600 21.702000 100.000000
90.000000 90.000000 90.000000
0 0
218 0 1
2
2 2
63.546000 1.007930
Cu
Coordinates of Component 1
   0.639400    0.904500    6.975300 7    0
   3.196900    0.904500    6.975300 7    1
H
Coordinates of Component 2
   8.682300    9.947000   11.733000 0  2
   7.942100    9.947000   11.733000 0  3

Cu
Velocities of Component 1
   0.001234    0.002345   -0.003456 7    0
   0.004567   -0.005678    0.006789 7    1
H
Velocities of Component 2
  -0.012345    0.023456    0.034567 0  2
   0.045678   -0.056789   -0.067890 0  3

Cu
Forces of Component 1
   0.123456    0.234567   -0.345678 7    0
   0.456789   -0.567890    0.678901 7    1
H
Forces of Component 2
  -1.234567    2.345678    3.456789 0  2
   4.567890   -5.678901   -6.789012 0  3

16.3 Trajectory frame with metadata

Generated by eOn 3.1
{"con_spec_version":2,"generator":"eOn 3.1","units":{"length":"angstrom","energy":"eV"},"frame_index":5,"time":2.5,"timestep":0.5}
15.345600 21.702000 100.000000
90.000000 90.000000 90.000000


2
2 2
63.546000 1.007930
Cu
Coordinates of Component 1
   0.639400    0.904500    6.975300 7    0
   3.196900    0.904500    6.975300 7    1
H
Coordinates of Component 2
   8.682300    9.947000   11.733000 0  2
   7.942100    9.947000   11.733000 0  3

16.4 Legacy (pre-v2) file

Random Number Seed
0.0000 TIME
15.345600 21.702000 100.000000
90.000000 90.000000 90.000000
0 0
0 0 0
2
216 2
63.546 1.00793
Cu
Coordinates of Component 1
   0.639400    0.904500   -0.000100 1    0
   3.197000    0.904500   -0.000100 1    1
...

A conforming v2 reader processes this without error, assigning spec_version = 1 because line 2 does not start with {.