As the simulation is updated to stream Parquet files directly, non-numeric values must be treated at the exact moment the data is generated in memory, before it hits the disk.
Why at this stage: Parquet handles missing or specialized numeric data natively and efficiently using bitmasks for nullability and IEEE 754 floating-point standards for infinity/NaN. Storing these flags as raw text strings (e.g., writing the literal string "NaN" or "inf") causes severe schema degradation, forces columns into resource-intensive string types, and breaks vectorization.
How to treat them: * True Empty/Missing States: Map empty or missing simulation metrics directly to PyArrow / Parquet null entries. Parquet stores null values using a highly optimized validity bitmap, meaning missing data consumes virtually zero disk space and requires no parsing when loaded.
Mathematical Boundaries: Represent actual simulation limits (e.g., division by zero in liquidity ratios) using native IEEE 754 values via float('inf'), float('-inf'), or float('nan'). When written via Arrow, these map perfectly to standard float types without shifting the column schema to a string type.
As the simulation is updated to stream Parquet files directly, non-numeric values must be treated at the exact moment the data is generated in memory, before it hits the disk.