Skip to content

open-meteo/python-omfiles

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

96 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Python Bindings for Open Meteo File Format

Python 3.10 | 3.11 | 3.12 | 3.13 | 3.14 PyPI version Build and Test

Features

  • Read Open-Meteo (.om) files directly from cloud storage using Python
  • Traverse the hierarchical data structure
  • Arrays/array slices are returned directly as NumPy arrays
  • Support for fsspec and xarray
  • Chunked data access behind the scenes

Installation

pip install omfiles

Pre-Built Wheels & Platform Support

We provide pre-built wheels for the following platforms:

  • Linux x86_64 (manylinux_2_28_x86_64)
  • Linux aarch64 (manylinux_2_28_aarch64)
  • Linux musl x86_64 (musllinux_1_2_x86_64)
  • Windows x86_64 (win_amd64)
  • Windows ARM64 (win_arm64)
  • macOS x86_64 (macosx_10_12_x86_64)
  • macOS ARM64 (macosx_11_0_arm64)

Reading

Reading Files without Hierarchy

OM files are structured like a tree of variables. The following example assumes that the file test_file.om contains an array variable as a root variable which has a dimensionality greater than 2 and a size of at least 2x100:

from omfiles import OmFileReader

reader = OmFileReader("test_file.om")
data = reader[0:2, 0:100, ...]
reader.close() # Close the reader to release resources

Reading Hierarchical Files, e.g. S3 Spatial Files

import datetime as dt

import fsspec
import numpy as np
from omfiles import OmFileReader

# Example: URI for a spatial data file in the `data_spatial` S3 bucket
# See data organization details: https://github.com/open-meteo/open-data?tab=readme-ov-file#data-organization
MODEL_DOMAIN = "dwd_icon"
# Note: Spatial data is only retained for 7 days. The script uses one file within this period.
date_time = dt.datetime.now(dt.timezone.utc) - dt.timedelta(days=2)
S3_URI = (
    f"s3://openmeteo/data_spatial/{MODEL_DOMAIN}/{date_time.year}/"
    f"{date_time.month:02}/{date_time.day:02}/0000Z/"
    f"{date_time.strftime('%Y-%m-%d')}T0000.om"
)
print(f"Using om file: {S3_URI}")

# Create and open filesystem, wrapping it in a blockcache
backend = fsspec.open(
    f"blockcache::{S3_URI}",
    mode="rb",
    s3={"anon": True, "default_block_size": 65536},  # s3 settings
    blockcache={"cache_storage": "cache"},  # blockcache settings
)
# Create reader from the fsspec file object using a context manager.
# This will automatically close the file when the block is exited.
with OmFileReader(backend) as root:
    # We are at the root of the data hierarchy!
    # What type of node is this?
    print(f"root.is_array: {root.is_array}")  # False
    print(f"root.is_scalar: {root.is_scalar}")  # False
    print(f"root.is_group: {root.is_group}")  # True

    temperature_reader = root.get_child_by_name("temperature_2m")
    print(f"temperature_reader.is_array: {temperature_reader.is_array}")  # True
    print(f"temperature_reader.is_scalar: {temperature_reader.is_scalar}")  # False
    print(f"temperature_reader.is_group: {temperature_reader.is_group}")  # False

    # What shape does the stored array have?
    print(f"temperature_reader.shape: {temperature_reader.shape}")  # (1441, 2879)

    # Read all data from the array
    temperature_data = temperature_reader.read_array((...))
    print(f"temperature_data.shape: {temperature_data.shape}")  # (1441, 2879)

    # It's also possible to read any subset of the array
    temperature_data_subset1 = temperature_reader.read_array((slice(0, 10), slice(0, 10)))
    print(temperature_data_subset1)
    print(f"temperature_data_subset1.shape: {temperature_data_subset1.shape}")  # (10, 10)

    # Numpy basic indexing is supported for direct access if the reader is an array.
    temperature_data_subset2 = temperature_reader[0:10, 0:10]
    print(temperature_data_subset2)
    print(f"temperature_data_subset2.shape: {temperature_data_subset2.shape}")  # (10, 10)

    # Compare the two temperature subsets and verify that they are the same
    are_equal = np.array_equal(temperature_data_subset1, temperature_data_subset2, equal_nan=True)
    print(f"Are the two temperature subsets equal? {are_equal}")

Writing

Single Array

import numpy as np
from omfiles import OmFileWriter

# Create sample data
data = np.random.rand(100, 100).astype(np.float32)

# Initialize writer
writer = OmFileWriter("simple.om")

# Write array with compression
array_variable = writer.write_array(
    data,
    chunks=[50, 50],
    scale_factor=1.0,
    add_offset=0.0,
    compression="pfor_delta_2d",
    name="data"
)

# Finalize the file using array_variable as entry-point
writer.close(array_variable)

Hierarchical Structure

import numpy as np
from omfiles import OmFileWriter

# Create sample data
features = np.random.rand(1000, 64).astype(np.float32)
labels = np.random.randint(0, 10, size=(1000,), dtype=np.int32)

# Initialize writer
writer = OmFileWriter("hierarchical.om")

# Write child arrays first
features_var = writer.write_array(features, chunks=[100, 64], name="features", compression="pfor_delta_2d")
labels_var = writer.write_array(labels, chunks=[100], name="labels")
metadata_var = writer.write_scalar(42, name="metadata")

# Create root group with children
root_var = writer.write_group(
    name="root",
    children=[features_var, labels_var, metadata_var],
)
# Finalize the file using root_var as entry-point into the hierarchy
writer.close(root_var)

Development

# install the required dependencies in .venv directory
uv sync
# to run the tests
uv run pytest tests/
# to build the wheels
uv run build
# or to trigger maturin directly:
# maturin develop

Tests

cargo test

runs rust tests.

uv run pytest tests/

runs Python tests.

Python Type Stubs

Can be generated from the rust doc comments via

cargo run stub_gen

About

Python bindings for the Open-Meteo file format

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •