Deal with non-numeric values such as NaN, Inf, Null at the simulation output stage where data goes from memory to Parquet files #3

1. The Native Simulation Output Stage (Strategic / Long-Term)

As the simulation is updated to stream Parquet files directly, non-numeric values must be handled at the moment the data is generated in memory, before it is written to disk.

    Why at this stage: Parquet represents missing and special numeric values natively. Nullability is tracked separately from the values themselves, and infinity and NaN are ordinary IEEE 754 floating-point values. Writing these states as raw text (e.g., the literal string "NaN" or "inf") degrades the schema: the whole column is forced into a string type, which costs more to store and parse and prevents vectorized numeric operations.

    How to treat them:

        True Empty/Missing States: Map empty or missing simulation metrics directly to PyArrow / Parquet null entries. Arrow tracks nulls in memory with a validity bitmap, and Parquet records them in compactly encoded definition levels rather than in the data itself, so missing data takes almost no disk space and requires no parsing when loaded.

        Mathematical Boundaries: Represent actual simulation limits (e.g., division by zero in liquidity ratios) using native IEEE 754 values via float('inf'), float('-inf'), or float('nan'). When written via Arrow, these are stored as ordinary floating-point values, so the column keeps its float type instead of shifting to a string type.
