
Release v1.1.0: Pyproject migration, Pixi support, and CMIP7 CVs#231

Draft
pgierz wants to merge 286 commits into main from prep-release

Conversation


@pgierz commented Nov 5, 2025

Summary

This PR consolidates several major improvements for the pycmor 1.1.0 release:

Major Changes

✅ PR #212 - Pyproject Migration

  • Migrates from setup.py/setup.cfg to modern pyproject.toml configuration
  • Consolidates all project metadata and dependencies
  • Maintains versioneer for version management
  • Updates CI configuration to exclude CMIP7 DReq submodule from linting

✅ PR #224 - Pixi Support

  • Adds pixi package manager support with pixi.lock for reproducible environments
  • Configures pixi in pyproject.toml with conda-forge channel
  • Supports Python 3.10-3.12 on osx-arm64, osx-64, linux-64
  • Installs pycmor as editable PyPI dependency
  • Includes dev environment with pytest
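The pixi bullets above could correspond to a `pyproject.toml` section roughly like the following. This is a hedged sketch: the table names and keys follow pixi's documented pyproject integration, but the exact contents of this PR's configuration are not shown here.

```toml
[tool.pixi.project]
channels = ["conda-forge"]
platforms = ["linux-64", "osx-64", "osx-arm64"]

# pycmor itself installed as an editable PyPI dependency
[tool.pixi.pypi-dependencies]
pycmor = { path = ".", editable = true }

# dev feature providing pytest
[tool.pixi.feature.dev.dependencies]
pytest = "*"

[tool.pixi.environments]
dev = ["dev"]
```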

✅ PR #222 - CMIP7 Controlled Vocabularies Implementation

  • Adds CMIP7-CVs as git submodule (WCRP-CMIP/CMIP7-CVs, src-data branch)
  • Enhances ControlledVocabularies class to support both CMIP6 and CMIP7
  • Adds comprehensive unit tests for CV functionality
  • Updates yamllint config to exclude CMIP7-CVs directory
  • Adds documentation for CMIP7 CV implementation

Already Incorporated

The following PRs were already merged into prep-release:

Breaking Changes

  • Minimum Python version may change based on pixi configuration
  • Package now uses modern pyproject.toml (backwards compatible for users)

Testing

  • CI will run on Python 3.9, 3.10, 3.11, 3.12
  • All linting checks (black, isort, flake8, yamllint)
  • Full test suite including integration tests

Checklist

pgierz and others added 30 commits November 5, 2025 22:24
- Add missing imports to config.py doctest examples
- Convert file-writing example to code-block to avoid side effects
- Add proper imports (xarray, numpy) to bounds.py doctest examples
- Add Rule object creation in __init__.py doctest example for add_vertical_bounds
- Change print() assertions to direct boolean checks for cleaner output
- Update config.py expected xarray_engine from netcdf4 to h5netcdf (matches Dockerfile env)
- Add +ELLIPSIS directive to bounds functions to ignore INFO log output
- Keeps tests strict on actual functionality while allowing log format variations
…n pipelines

- Replace matrix-based jobs with individual named jobs per Python version
- Each version now flows independently: build-X-Y → meta-X-Y → [unit, integration, doctest]-X-Y
- Python 3.9 can complete entire pipeline while 3.12 is still building
- Reduces pipeline latency and improves parallelization
- Total jobs: 4 builds + 16 tests (4 versions × 4 test types)
- Add CMIP7_DReq_Software and cmip6-cmor-tables to flake8 exclusions
- Add both submodules to isort skip list
- Update black exclude pattern to cover all three submodules
- Prevents linting failures from third-party code in git submodules
- Use ellipsis wildcards in expected output lines instead of bare '...'
- Match actual logging output structure with '...INFO → message...'
- Avoids doctest ambiguity where '...' is interpreted as continuation prompt
- Properly validates that bounds are added while allowing variable formatting
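The `+ELLIPSIS` directive referenced above lets a doctest's expected output use `...` as a wildcard for parts that vary between runs. A minimal self-contained illustration (not the actual `bounds.py` doctests):

```python
import doctest

def describe(x):
    """Return a description containing a memory address that varies per run.

    With +ELLIPSIS enabled, the ``...`` in the expected output matches the
    variable hex digits instead of being read as a continuation prompt:

    >>> describe(42)  # doctest: +ELLIPSIS
    'int at 0x...'
    """
    return f"{type(x).__name__} at {hex(id(x))}"

# Run the docstring examples in this module; zero failures expected.
results = doctest.testmod()
print(results.failed)
```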
- Set PYTHONLOGLEVEL=CRITICAL for all doctest jobs in CI
- Prevents logging output from interfering with doctest expected output
- Cleaner solution than modifying doctest examples or pytest config
- Applies to all four Python versions (3.9, 3.10, 3.11, 3.12)
- Add Docker login step to authenticate with ghcr.io
- Push images with two tags per Python version:
  - ghcr.io/esm-tools/pycmor-testground:py3.X-<commit-sha>
  - ghcr.io/esm-tools/pycmor-testground:py3.X-<branch-name>
- Upload images as workflow artifacts for same-run access
- Enables reproducible test environments via container registry
- Infrastructure as Code: Dockerfile.test defines test infrastructure
Document the Infrastructure as Code approach for test environments:
- Container image publishing to GitHub Container Registry
- Tagging scheme for reproducibility (commit SHA, branch, semver)
- CI/CD workflow for building and distributing testgrounds
- Local usage examples for developers
- Future improvements (conditional publishing, cleanup policies, multi-arch)
- Infrastructure as Code principles and traceability
- Troubleshooting guide for common issues

The testground system treats test infrastructure as code, with Dockerfile.test
as the declarative specification and container images as infrastructure artifacts.
- Use substring(github.sha, 0, 7) for 7-character short SHA
- Use github.head_ref || github.ref_name to get actual branch name
  in both PR and push contexts (avoids '231/merge' format)

This fixes the invalid tag error caused by github.ref_name returning
'231/merge' for pull requests instead of the source branch name.
Remove tar-based artifact workflow in favor of direct GHCR pulls.

Changes:
- Remove load: true and local unprefixed tags from all build jobs
- Remove tar export, cache, and artifact upload steps
- Update all test jobs to pull directly from ghcr.io
- Add GHCR login to all test jobs

Benefits:
- Fixes Docker Hub authentication error (no unprefixed tags)
- Simplifies workflow (-60 lines)
- Better performance (GHCR layer caching vs tar artifacts)
- Perfect CI/local parity - same images available locally
Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.9.0 to 1.13.0.
- [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases)
- [Commits](pypa/gh-action-pypi-publish@v1.9.0...v1.13.0)

---
updated-dependencies:
- dependency-name: pypa/gh-action-pypi-publish
  dependency-version: 1.13.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
…github/workflows/pypa/gh-action-pypi-publish-1.13.0

chore(deps): bump pypa/gh-action-pypi-publish from 1.9.0 to 1.13.0 in /.github/workflows
The >>> prompts in the .. code-block:: python directives were being
interpreted by pytest's --doctest-modules as actual doctests, even though
they're inside Sphinx code-block directives. This caused doctest failures due
to the ... continuation lines being misinterpreted as doctest continuation
markers.

Changed the examples to plain Python code without the interactive prompts,
which is more appropriate for Sphinx code-block directives anyway. The examples
are now for documentation purposes only, not executable doctests.

Fixes doctest errors in:
- add_bounds_from_coords()
- add_vertical_bounds()
The logger was hardcoded to INFO level, which meant that legitimate INFO
log statements (side effects of normal operation) would appear in doctest
output even when PYTHONLOGLEVEL=CRITICAL was set in CI.

Now the logger respects the PYTHONLOGLEVEL environment variable, allowing
doctests to run with logging suppressed while keeping the logging statements
in the actual code (which is correct - logging is a valid side effect).

Changes:
- Read PYTHONLOGLEVEL from environment, default to INFO if not set
- Apply the log level when configuring the RichHandler
- This allows CI doctest runs to suppress all logs below CRITICAL
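The change described above can be sketched as follows. This is an illustrative reconstruction, not pycmor's actual code: the handler level is read from `PYTHONLOGLEVEL` instead of being hardcoded to `INFO` (pycmor uses a `RichHandler`; a plain `StreamHandler` stands in here).

```python
import logging
import os

def make_logger(name="pycmor.sketch"):
    # Read PYTHONLOGLEVEL from the environment, defaulting to INFO if unset
    level_name = os.environ.get("PYTHONLOGLEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.StreamHandler()  # pycmor uses RichHandler instead
    handler.setLevel(level)
    logger.handlers = [handler]
    return logger

# With PYTHONLOGLEVEL=CRITICAL (as set for CI doctest jobs), INFO calls are
# suppressed while the logging statements stay in the code.
os.environ["PYTHONLOGLEVEL"] = "CRITICAL"
logger = make_logger()
logger.info("suppressed in doctest runs")  # below CRITICAL: not emitted
print(logger.level == logging.CRITICAL)
```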
Re-added doctest prompts (>>>) to bounds.py examples now that logging is
properly suppressed via PYTHONLOGLEVEL. The examples now show both input
and output datasets with structured representations, making it much easier
to understand what the functions do.

Changes:
- Restored >>> prompts for executable doctests
- Added print() statements for input datasets before transformation
- Added print() statements for output datasets after transformation
- Used doctest directives (+ELLIPSIS, +NORMALIZE_WHITESPACE) for flexibility
- Shows full xarray Dataset structure: dimensions, coordinates, data variables

This provides clear before/after visualization while maintaining executable
tests that verify the functions work correctly.
ARM64 builds take 3-4x longer due to QEMU emulation, so make them optional
to speed up CI. Builds now default to linux/amd64 only.

To build ARM64 images:
1. Go to Actions tab in GitHub
2. Select 'Run Basic Tests' workflow
3. Click 'Run workflow'
4. Check the 'Build ARM64 images' option

This allows:
- Fast CI for most PRs and commits (amd64 only)
- Manual ARM64 builds when needed for M1/M2/M3 Mac users
- ARM64 builds still happen on tags (for releases)

Changes:
- Add workflow_dispatch trigger with build_arm64 boolean input
- Conditionally set platforms based on input (defaults to amd64 only)
- Applied to all 4 Python version build jobs
- Implement three chunking algorithms (simple, even_divisor, iterative)
  inspired by dynamic_chunks library
- Add chunking module (src/pycmor/std_lib/chunking.py) with functions for
  calculating optimal chunk sizes based on target size and access patterns
- Integrate chunking into save_dataset() with automatic encoding generation
- Add 7 new configuration options for chunking and compression control
- Support global and per-rule chunking configuration via YAML
- Include comprehensive test suite (13 tests, all passing)
- Add user documentation with examples and troubleshooting guide
- Default: 100MB chunks, time-dimension preference, level 4 compression

This enables users to optimize NetCDF file I/O performance by configuring
internal chunking strategies that match their data access patterns.
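The "even_divisor" strategy named above can be sketched like this (an illustrative reconstruction, not pycmor's implementation in `std_lib/chunking.py`): choose a chunk length along the preferred dimension that divides the dimension evenly and brings the chunk's byte size closest to the target (100 MB by default).

```python
def even_divisor_chunk(dim_len, bytes_per_step, target_bytes=100 * 1024**2):
    """Pick an even divisor of dim_len whose chunk size is nearest the target.

    bytes_per_step is the size in bytes of one slice along the chunked
    dimension (e.g. one time step of the variable).
    """
    divisors = [d for d in range(1, dim_len + 1) if dim_len % d == 0]
    # A chunk length of d yields chunks of d * bytes_per_step bytes
    return min(divisors, key=lambda d: abs(d * bytes_per_step - target_bytes))

# 3650 time steps of ~1 MB each: aim for ~100 steps per chunk.
# Divisors of 3650 near 100 are 73 and 146; 73 is closer to the target.
chunk = even_divisor_chunk(3650, 1024**2)
print(chunk)  # -> 73
```

Even divisors avoid a small ragged final chunk, at the cost of sometimes landing further from the byte target than the "simple" strategy would.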
pgierz and others added 30 commits December 12, 2025 09:54
…ss()

- Replace auto-import with enable_xarray_accessor() for lazy registration
- Add _build_rule() helper for interactive Rule construction
- Add StdLibAccessor with tab-completable std_lib steps via ds.pycmor.stdlib
- Add .process() method for running full pipelines interactively
- Add BaseModelRun ABC in pycmor.tutorial for test infrastructure
- Update existing tests to use enable_xarray_accessor()
- Add comprehensive test suite in test_accessor_api.py
# Conflicts:
#	src/pycmor/core/cmorizer.py
- Add required compound_name field to all CMIP7 test config rules
  (validator requires it for cmor_version=CMIP7)
- Add setuptools to Dockerfile.test (pyfesom2 imports pkg_resources)
The vendored all_var_info.json does not populate cmip7_compound_name or
cmip6_compound_name on DRVs. So variable_id falls back to the short
name (e.g., "tas"). The matching logic compared the full compound name
"Amon.tas" against the plain "tas" when only one side had a dot,
which always failed.

Fix: always extract the short name from compound_name for comparison,
regardless of whether the DRV also has dots. Also add a fallback match
against drv.name directly.

Add CMIP7 DRV fixtures (dr_cmip7_tas, dr_cmip7_thetao) for testing.
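The matching fix described above can be illustrated with a small sketch (hypothetical helper names; not pycmor's exact code): always compare short names extracted from compound names, with a fallback against the DRV's own name.

```python
def short_name(name):
    """'Amon.tas' -> 'tas'; a plain 'tas' is returned unchanged."""
    return name.rsplit(".", 1)[-1]

def matches(compound_name, drv_variable_id, drv_name=None):
    # Compare short names on both sides, so 'Amon.tas' vs 'tas' succeeds
    wanted = short_name(compound_name)
    if wanted == short_name(drv_variable_id):
        return True
    # Fallback: match against the DRV's name directly
    return drv_name is not None and wanted == short_name(drv_name)

print(matches("Amon.tas", "tas"))      # True: the case that previously failed
print(matches("Omon.thetao", "tas"))   # False
```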
Pipeline._run_prefect() now uses return_state=True and checks for
failures, re-raising the original exception. Previously, Prefect
swallowed exceptions via on_failure callbacks that only logged.

CMORizer._parallel_process_prefect() also checks both the flow-level
state and individual rule future states for failures.

This ensures integration tests correctly fail when pipeline steps
raise exceptions.
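The failure-propagation pattern described above can be sketched generically, without the Prefect dependency (the `State` class here mimics the idea of `return_state=True`; it is not Prefect's API): capture per-step states, then re-raise the original exception instead of only logging it.

```python
class State:
    """Minimal stand-in for a flow/task state object."""
    def __init__(self, failed=False, exception=None):
        self.failed, self.exception = failed, exception

    def is_failed(self):
        return self.failed

def run_steps(steps):
    states = []
    for step in steps:
        try:
            step()
            states.append(State())
        except Exception as exc:  # capture instead of letting it escape
            states.append(State(failed=True, exception=exc))
    # After the run, surface the first failure rather than swallowing it
    for state in states:
        if state.is_failed():
            raise state.exception
    return states

def ok():
    pass

def boom():
    raise ValueError("pipeline step failed")

try:
    run_steps([ok, boom])
except ValueError as e:
    print(e)  # the original exception reaches the caller
```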
DefaultPipeline had both handle_unit_conversion (correct pipeline step
taking data+rule) and units.convert (low-level function taking
da+from_unit+to_unit). The latter was called with (data, rule) args,
causing ParameterBindError: missing required argument 'to_unit'.

handle_unit_conversion already calls convert() internally, so the
duplicate step was both wrong and redundant.
- dimension_mapping.py: use getattr(rule, "dimension_mapping") instead
  of rule._pycmor_cfg("dimension_mapping", default={}) -- dimension_mapping
  is a rule attribute, not a config option, and everett rejects non-string
  defaults
- CMIP7 test configs: add activity_id="CMIP" to rules that need it for
  global attribute generation
- cmorizer.py: fix parallel error checking to handle both PrefectFuture
  and State objects from different Prefect versions
…_run

- dimension_mapping.py: check isinstance(user_mapping, dict) to handle
  Mock objects in tests (getattr on Mock returns Mock, not None)
- base_model_run.py: convert doctest example to code-block to prevent
  pytest from trying to execute it
Cherry-picked from PR #194 by @mzapponi (adapted for src/pycmor/ paths):
- gather_inputs.py: if rule has time_dimname and dataset uses that
  dimension instead of "time", rename it automatically on load
- pipeline.py: defensive getattr for _cluster attribute

Co-authored-by: Martina Zapponi <mzapponi@users.noreply.github.com>
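The rename-on-load behavior from the cherry-picked change can be sketched as follows. This is illustrative only: the `time_dimname` name follows the commit message, but the function operates on a plain list of dimension names rather than an xarray dataset.

```python
def normalize_time_dim(dims, time_dimname=None):
    """Rename a rule-specified time dimension to 'time' if present.

    Mirrors the described gather_inputs.py behavior: if the rule declares
    time_dimname and the dataset uses that dimension instead of 'time',
    rename it automatically on load.
    """
    if time_dimname and time_dimname in dims and "time" not in dims:
        return ["time" if d == time_dimname else d for d in dims]
    return list(dims)

print(normalize_time_dim(["T", "lat", "lon"], time_dimname="T"))
# -> ['time', 'lat', 'lon']
```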
fix: accessor API with lazy registration and BaseModelRun infrastructure