C++ pybind11 extension for ASON (Array-Schema Object Notation).
Provides 7 functions; the encoders infer the schema, so no hand-written schema string is needed for encoding:
encode, encodeTyped, encodePretty, encodePrettyTyped, decode, encodeBinary, decodeBinary.
The wheel also ships ason.pyi and py.typed, so editors and static type checkers can understand the extension module without a separate stub package.
| Tool | Version |
|---|---|
| g++ | ≥ 11 (C++17) |
| python3-dev | any (provides Python.h) |
| Python | ≥ 3.8 |
pybind11 2.13.6 headers are vendored in vendor/pybind11/ — no separate installation needed.
# Option A — shell script (auto-installs python3-dev via sudo if missing)
bash build.sh
# Option B — Makefile
make
# Option C — CMake
cmake -B build && cmake --build build

| Python value | Inferred ASON type |
|---|---|
| bool | bool |
| int | int |
| float | float |
| str | str |
| None | optional (e.g. str?, int?) |
Cross-row type merging for lists: When encoding a list, all rows are scanned to compute the final type:
- A field that is non-None in row 0 but None in some later row is promoted to optional (e.g. str → str?, int → int?).
- Type conflicts between non-None values (e.g. int in row 0, str in row 1) fall back to str.
This means encodeTyped is safe to use even when only some rows have None for a given field.
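The merging rules above can be sketched in pure Python. This is only an illustration of the described behavior, not the extension's actual C++ code: the names scalar_type, merge_field_types and typed_header are hypothetical, and two details are assumptions not stated in this README (a field that is None in every row becomes str?, and a key missing from a row is treated like None).

```python
from typing import Any, Dict, List, Optional


def scalar_type(value: Any) -> Optional[str]:
    """Map one Python value to an ASON type name; None carries no type."""
    if value is None:
        return None
    if isinstance(value, bool):  # bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    return "str"


def merge_field_types(rows: List[Dict[str, Any]], field: str) -> str:
    """Scan every row and merge the types seen for one field."""
    base: Optional[str] = None
    optional = False
    for row in rows:
        t = scalar_type(row.get(field))  # assumption: missing key acts like None
        if t is None:
            optional = True              # any None promotes the field to optional
        elif base is None:
            base = t                     # first concrete type seen
        elif base != t:
            base = "str"                 # conflicting types fall back to str
    if base is None:
        base = "str"                     # assumption: all-None field becomes str?
    return base + ("?" if optional else "")


def typed_header(rows: List[Dict[str, Any]]) -> str:
    """Build a typed list header, taking field order from the first row."""
    fields = ",".join(f"{name}@{merge_field_types(rows, name)}" for name in rows[0])
    return "[{" + fields + "}]"


print(typed_header([{"id": 1, "tag": "hello"}, {"id": 2, "tag": None}]))
# → [{id@int,tag@str?}]
```

Checking bool before int matters because isinstance(True, int) is true in Python; without that ordering every bool field would be reported as int.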
ason.encode({"id": 1, "name": "Alice"})
# → '{id,name}:\n(1,Alice)\n'
ason.encode([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
# → '[{id,name}]:\n(1,Alice),\n(2,Bob)\n'

Untyped decode semantics: When decoded with decode(), all field values are returned as strings because the untyped schema carries no type information. Use encodeTyped when you need a type-preserving round-trip.
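To see why an untyped header can only yield strings, here is a deliberately tiny pure-Python sketch of a single-row decoder. It is not the extension's parser: parse_header, decode_row and CASTS are hypothetical names, and the sketch ignores optional types, quoting, nesting and comments.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical cast table for the scalar type names used in this README.
CASTS: Dict[str, Callable[[str], Any]] = {
    "int": int,
    "float": float,
    "bool": lambda s: s == "true",
    "str": str,
}


def parse_header(header: str) -> List[Tuple[str, Optional[str]]]:
    """Split '{id@int,name@str}' or '{id,name}' into (field, type) pairs."""
    inner = header.strip().strip("{}")
    fields = []
    for part in inner.split(","):
        name, _, type_name = part.strip().partition("@")
        fields.append((name, type_name or None))  # None means untyped
    return fields


def decode_row(header: str, row: str) -> Dict[str, Any]:
    """Decode one '(a,b,...)' row against a header."""
    cells = row.strip().strip("()").split(",")
    record: Dict[str, Any] = {}
    for (name, type_name), cell in zip(parse_header(header), cells):
        cell = cell.strip()
        if type_name is None:
            record[name] = cell                    # no type info: keep the string
        else:
            record[name] = CASTS[type_name](cell)  # typed: restore the Python type
    return record


print(decode_row("{id@int, name@str}", "(1,Alice)"))  # {'id': 1, 'name': 'Alice'}
print(decode_row("{id,name}", "(1,Alice)"))           # {'id': '1', 'name': 'Alice'}
```

With the untyped header there is nothing to tell the decoder that 1 was ever an integer, which is why the typed form is the one to use for round-trips.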
Type is inferred from all rows (not just the first). A field that is None in any row is made optional:
ason.encodeTyped({"id": 1, "name": "Alice", "active": True})
# → '{id@int,name@str,active@bool}:\n(1,Alice,true)\n'
# Optional field inferred from cross-row merging:
ason.encodeTyped([{"id": 1, "tag": "hello"}, {"id": 2, "tag": None}])
# → '[{id@int,tag@str?}]:\n(1,hello),\n(2,)\n'

pretty = ason.encodePretty(rows)

pretty = ason.encodePrettyTyped(rows)

Decodes both typed and untyped schemas embedded in the text:
# typed schema → values restored as Python types
rec = ason.decode('{id@int, name@str}:\n(1,Alice)\n') # {'id': 1, 'name': 'Alice'}
rows = ason.decode('[{id@int, name@str}]:\n(1,Alice),\n(2,Bob)\n')
# untyped schema → all values returned as strings
rec2 = ason.decode('{id,name}:\n(1,Alice)\n') # {'id': '1', 'name': 'Alice'}

Block comments are supported anywhere whitespace is allowed:
rec = ason.decode('/* top */ {id@int,name@str}: /* row */ (1, /* name */ Alice)')

data = ason.encodeBinary(rows)

decodeBinary requires a schema string because the binary wire format carries no embedded type information:
rows = ason.decodeBinary(data, "[{id@int, name@str}]")

ason-py includes inline typing support for the compiled extension:
from ason import decode
rows = decode("[{id@int, name@str}]:(1,Alice),(2,Bob)")

Type checkers will infer dict[str, Any] | list[dict[str, Any]] for decode results and validate function signatures from the bundled ason.pyi.
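Because the inferred result is a union, code that always wants a list of rows usually has to narrow it first. A minimal sketch of one way to do that follows; as_rows and the Decoded alias are hypothetical helpers, not part of ason, and the typing.Union spelling is used so the snippet also runs on Python 3.8.

```python
from typing import Any, Dict, List, Union

# Mirrors the union described above; the alias name is hypothetical.
Decoded = Union[Dict[str, Any], List[Dict[str, Any]]]


def as_rows(result: Decoded) -> List[Dict[str, Any]]:
    """Normalize a decode result to a list of records."""
    if isinstance(result, list):
        return result   # type checkers narrow this branch to the list form
    return [result]     # a single record is wrapped in a one-element list


print(as_rows({"id": 1, "name": "Alice"}))  # [{'id': 1, 'name': 'Alice'}]
```

The isinstance check is what lets a static type checker treat the two branches as the list and dict members of the union respectively.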
Little-endian layout, identical to ason-rs and ason-go:
| Type | Bytes |
|---|---|
| int | 8 (i64 LE) |
| uint | 8 (u64 LE) |
| float | 8 (f64 LE) |
| bool | 1 |
| str | 4-byte length LE + UTF-8 bytes |
| optional | 1-byte tag (0=null, 1=present) + value |
| slice | 4-byte count LE + elements |
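The table can be read as a recipe for Python's struct module. The sketch below is an illustration, not the extension's encoder: pack_value is a hypothetical helper, the [T] spelling for slice types is assumed here, lengths and counts are assumed to be unsigned 32-bit, and true is assumed to be stored as the byte 1.

```python
import struct
from typing import Any


def pack_value(type_name: str, value: Any) -> bytes:
    """Pack one value following the little-endian layout in the table."""
    if type_name.endswith("?"):                      # optional: 1-byte tag + value
        if value is None:
            return b"\x00"
        return b"\x01" + pack_value(type_name[:-1], value)
    if type_name.startswith("[") and type_name.endswith("]"):
        inner = type_name[1:-1]                      # slice: 4-byte count + elements
        body = b"".join(pack_value(inner, item) for item in value)
        return struct.pack("<I", len(value)) + body
    if type_name == "int":
        return struct.pack("<q", value)              # i64 LE
    if type_name == "uint":
        return struct.pack("<Q", value)              # u64 LE
    if type_name == "float":
        return struct.pack("<d", value)              # f64 LE
    if type_name == "bool":
        return b"\x01" if value else b"\x00"         # assumption: true is byte 1
    if type_name == "str":
        raw = value.encode("utf-8")
        return struct.pack("<I", len(raw)) + raw     # 4-byte length + UTF-8 bytes
    raise ValueError(f"unsupported type: {type_name}")


print(pack_value("str", "Alice"))   # b'\x05\x00\x00\x00Alice'
print(len(pack_value("[int]", [1, 2])))  # 20
```

Note that the string length prefix counts UTF-8 bytes, not characters, so non-ASCII text takes more bytes than len(value) suggests.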
# after building:
python3 -m pytest tests/ -v

import ason
users = [
{"id": 1, "name": "Alice", "score": 9.5},
{"id": 2, "name": "Bob", "score": 7.2},
]
# Schema is inferred automatically—no schema string needed
text = ason.encode(users) # untyped schema
textTyped = ason.encodeTyped(users) # typed schema (use for round-trip)
pretty = ason.encodePrettyTyped(users) # pretty + typed
blob = ason.encodeBinary(users) # binary (schema inferred internally)
assert ason.decode(textTyped) == users # typed round-trip
assert ason.decode(pretty) == users
assert ason.decodeBinary(blob, "[{id@int, name@str, score@float}]") == users

Measured on this machine with:
bash build.sh
PYTHONPATH=. python3 examples/bench.py

Headline numbers:
- Flat 1,000-record dataset: ASON text serialize 118.98ms vs JSON 403.32ms, deserialize 221.21ms vs JSON 441.89ms
- Flat 10,000-record dataset: ASON text serialize 81.70ms vs JSON 293.38ms, deserialize 158.39ms vs JSON 317.44ms
- Size summary for 1,000 flat records: JSON 137,674 B, ASON text 57,761 B (58% smaller), ASON binary 74,454 B (46% smaller vs JSON)
- Throughput summary on 1,000 records: ASON text was 3.58x faster than JSON for serialize and 2.01x faster for deserialize
- Binary mode was even faster: 7.18x faster than JSON on serialization and 4.16x faster on deserialization in the benchmark summary
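The size percentages follow directly from the byte counts quoted above. A short check, using a hypothetical helper named percent_smaller:

```python
def percent_smaller(baseline_bytes: int, candidate_bytes: int) -> float:
    """Return how much smaller the candidate is than the baseline, in percent."""
    return (1 - candidate_bytes / baseline_bytes) * 100


JSON_BYTES = 137_674  # JSON size for 1,000 flat records, from the list above

print(round(percent_smaller(JSON_BYTES, 57_761)))  # ASON text   → 58
print(round(percent_smaller(JSON_BYTES, 74_454)))  # ASON binary → 46
```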