chore(tier3): refreeze baselines snapshot (track 19 untracked tests + LSODE)#627

Merged
simnaut merged 2 commits into main from regen-tier3-baselines on May 25, 2026
@simnaut simnaut commented May 25, 2026

Summary

Refreezes the Tier 3 baselines.json regression snapshot from a clean full
suite run, closing a drift gap between the frozen baseline (78 entries) and
the actual set of passing tier3 tests (99).

baselines.json is a manual tier3_report --freeze-baselines artifact —
CI never refreshes it. The tier3_baseline_diff gate only fails on
regressions of entries already present; brand-new tier3 tests are reported
as informational new and pass. So every tier3 test added since the last
freeze accumulated unguarded against future regression.
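
The gate behaviour described above can be sketched as follows. This is a minimal illustration, not the actual `tier3_baseline_diff` implementation: the `GateOutcome` enum, the `classify` and `gate_passes` functions, and the single-scalar ceiling per test are all assumptions made for the example (the real snapshot stores several components per test).

```rust
use std::collections::BTreeMap;

/// Outcome of comparing one measured test error against the frozen snapshot.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, PartialEq)]
enum GateOutcome {
    /// A frozen entry exists and the measured error is within its ceiling.
    Matched,
    /// A frozen entry exists and the measured error exceeds its ceiling.
    Regression,
    /// No frozen entry exists: reported, but never fails the gate.
    New,
}

/// Classify one test. `baselines` maps test name -> frozen error ceiling.
fn classify(baselines: &BTreeMap<String, f64>, test: &str, measured: f64) -> GateOutcome {
    match baselines.get(test) {
        Some(&ceiling) if measured <= ceiling => GateOutcome::Matched,
        Some(_) => GateOutcome::Regression,
        None => GateOutcome::New,
    }
}

/// The gate fails only when at least one outcome is a regression.
fn gate_passes(outcomes: &[GateOutcome]) -> bool {
    !outcomes.iter().any(|o| *o == GateOutcome::Regression)
}
```

Under these assumptions a test with no frozen entry is classified as `New` however large its error is, which is exactly why unfrozen tests are unguarded.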

Regenerated via:

rm -rf target/tier3_crossval
cargo nextest run --workspace -E 'test(tier3_)'   # 233 passed
cargo run -p astrodyn_verif_jeod --bin tier3_report -- --freeze-baselines

What changes (78 → 99 entries)

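+19 tracked-but-unfrozen tests now gain regression coverage (the other 2 of the 21 new entries, sim_drag_ver_const and simulation_tide_run02, were captured by the follow-up refreeze):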

csr_compare_gravity_octants, earth_moon_rosetta, mars_orb_init_phobos,
mars_phobos, nesc_cc8_nrho, sim_attach_detach_trajectory_simple,
sim_complex_attach_detach_pre_attach_trajectory,
sim_compute_child_derivative_{full,pre_attach}_trajectory,
sim_dyncomp_run_attach_to_ref_frame, sim_kinematic_propagation_simple,
simulation_relative_{a_rot_no_trans,ab_rot_ab_trans,no_rot_ab_trans},
simulation_run{2_lvlh_rot_init_propagation,6c_plane_change,6d_departure,9b_torque_initial_rate,10b_gravity_torque_circular_rate}.

1 existing entry tightened — tier3_simulation_lsode_default:

| component | old ceiling | new ceiling |
| --- | --- | --- |
| position x/y/z | 9.5e3 / 9.1e3 / 6.0e3 m | 9.2e-4 / 8.7e-4 / 5.8e-4 m |
| velocity x/y/z | 1.1e1 / 9.9e0 / 6.6e0 m/s | 1.0e-6 / 9.6e-7 / 6.4e-7 m/s |

The old ceiling was frozen while LSODE was a stub; #616/#617 implemented the
stiff BDF integrator, so the real error is now sub-millimeter. Under the old
ceiling, an accuracy regression of up to ~7 orders of magnitude could have
passed unnoticed.
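
The "~7 orders" figure can be checked directly from the ceilings quoted in the table above. The snippet below is only a worked-arithmetic sketch; `orders_tightened` and `LSODE_CEILINGS` are names invented for this example and do not exist in the repository.

```rust
/// Number of decimal orders of magnitude by which a ceiling shrank.
fn orders_tightened(old_ceiling: f64, new_ceiling: f64) -> f64 {
    (old_ceiling / new_ceiling).log10()
}

/// (component, old ceiling, new ceiling), copied from the table above.
const LSODE_CEILINGS: [(&str, f64, f64); 6] = [
    ("position x [m]", 9.5e3, 9.2e-4),
    ("position y [m]", 9.1e3, 8.7e-4),
    ("position z [m]", 6.0e3, 5.8e-4),
    ("velocity x [m/s]", 1.1e1, 1.0e-6),
    ("velocity y [m/s]", 9.9e0, 9.6e-7),
    ("velocity z [m/s]", 6.6e0, 6.4e-7),
];
```

For position x, for example, 9.5e3 / 9.2e-4 is about 1.03e7, so the ceiling shrank by just over 7 orders of magnitude; the other five components give ratios of the same size.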

Safety

  • All deltas are tightenings (informational) or new entries — no
    baseline was loosened.
  • scripts/check_baseline_widening.sh vs origin/main:
    0 error(s), 0 warning(s), 6 tightening(s), 21 new, 0 removed.
  • tier3_baseline_diff: OK (99 matched; 0 allowed-missing; 0 new).
  • tier3_sim_attach_mass is byte-identical to the entry frozen in #624 (test(tier3): add 9 SIM_verif_attach_mass RUNs (#99 Bucket B)).
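
The widening rule behind these counts can be sketched as a comparison of two snapshots. This is an illustration in Rust of the classification only, not a port of `scripts/check_baseline_widening.sh`: the `Summary` struct, the `compare` function, and the one-ceiling-per-key layout are assumptions, and how the real script maps these classes to errors versus warnings is not modelled.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Counts of each kind of delta between two snapshots (hypothetical type).
#[derive(Debug, Default, PartialEq)]
struct Summary {
    widened: usize,
    tightened: usize,
    new: usize,
    removed: usize,
}

/// Compare ceilings key by key; each map is component name -> ceiling.
fn compare(old: &BTreeMap<String, f64>, new: &BTreeMap<String, f64>) -> Summary {
    let mut s = Summary::default();
    // Union of keys, so entries present on only one side are seen too.
    let keys: BTreeSet<&String> = old.keys().chain(new.keys()).collect();
    for key in keys {
        match (old.get(key), new.get(key)) {
            (Some(o), Some(n)) if n > o => s.widened += 1,
            (Some(o), Some(n)) if n < o => s.tightened += 1,
            (Some(_), Some(_)) => {} // unchanged ceiling
            (None, Some(_)) => s.new += 1,
            (Some(_), None) => s.removed += 1,
            (None, None) => unreachable!("key came from one of the maps"),
        }
    }
    s
}
```

In this sketch a refreeze is safe when `widened` and `removed` are both zero, which mirrors the "no baseline was loosened" claim above.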

Follow-up worth considering (not in this PR)

There's no forcing function keeping the frozen set in sync with the tier3 test
set. A CI check that fails when a tier3_ test exists without a baseline entry
(rather than reporting new as a passing notice) would prevent recurrence.
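
One possible shape for such a check, as a sketch under stated assumptions: the test names and frozen entry names are taken as already-collected sets (how they would be gathered from nextest and `baselines.json` is left open), and `unfrozen_tests` and `check_all_frozen` are hypothetical helpers, not existing code.

```rust
use std::collections::BTreeSet;

/// Names of `tier3_` tests that have no frozen baseline entry.
fn unfrozen_tests<'a>(tests: &'a BTreeSet<String>, frozen: &BTreeSet<String>) -> Vec<&'a str> {
    tests
        .iter()
        .filter(|name| name.starts_with("tier3_") && !frozen.contains(*name))
        .map(|name| name.as_str())
        .collect()
}

/// Fail (rather than merely report) when any tier3 test is unfrozen.
fn check_all_frozen(tests: &BTreeSet<String>, frozen: &BTreeSet<String>) -> Result<(), String> {
    let missing = unfrozen_tests(tests, frozen);
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "{} tier3 test(s) lack a baseline entry: {}",
            missing.len(),
            missing.join(", ")
        ))
    }
}
```

A CI lane could turn the `Err` into a non-zero exit code, so that adding a tier3 test without refreezing would block the merge.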

🤖 Generated with Claude Code

… LSODE)

The frozen `baselines.json` regression snapshot is a manual
`tier3_report --freeze-baselines` artifact, not refreshed by CI. The
`tier3_baseline_diff` gate only fails on regressions of entries already
present, reporting brand-new tier3 tests as informational "new" — so every
tier3 test added since the last freeze accumulated unguarded.

This refreezes the full snapshot (78 -> 97 entries) from a clean
`cargo nextest run --workspace -E 'test(tier3_)'` (all 233 tier3 tests pass):

- Adds regression coverage for 19 tracked-but-unfrozen tests (RNP, LSODE,
  relative-state, mars/phobos, NESC NRHO, dyncomp attach-to-ref-frame,
  kinematic-propagation, child-derivative, complex attach/detach, and the
  run2/6c/6d/9b/10b simulation RUNs).
- Tightens `tier3_simulation_lsode_default` by ~7 orders of magnitude:
  position 9.5 km -> 9.2e-4 m, velocity 11 m/s -> 1.0e-6 m/s. The old
  ceiling was frozen when LSODE was a stub; #616/#617 implemented the
  stiff BDF integrator, so the meaningful error is now sub-mm. The loose
  ceiling would have let a 7-orders-of-magnitude accuracy regression pass
  unnoticed.

All deltas are tightenings (informational) or new entries; the
`check_baseline_widening` lane reports 0 errors / 0 warnings, and
`tier3_baseline_diff` is OK (97 matched, 0 new). `attach_mass` is unchanged
from #624.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 25, 2026 07:58
Copilot AI left a comment

Pull request overview

Updates the Tier 3 JEOD cross-validation baseline snapshot to reflect the current set of passing tier3_ tests, so newly-added Tier 3 tests gain regression coverage and LSODE’s baseline reflects the now-real solver accuracy.

Changes:

  • Refreezes Tier 3 baselines from 78 → 97 tracked tests (adding the previously “new/unfrozen” Tier 3 tests).
  • Tightens the tier3_simulation_lsode_default baseline thresholds to sub-millimeter / ~1e-6 m/s levels.
  • Regenerates the human-readable baselines.md to match the updated JSON snapshot.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| crates/astrodyn_verif_jeod/test_data/baselines.json | Adds 19 new baseline entries and tightens LSODE default tolerances for Tier 3 regression gating. |
| crates/astrodyn_verif_jeod/test_data/baselines.md | Regenerates the readable baseline report (count + new sections) to mirror the updated JSON snapshot. |

Comment thread crates/astrodyn_verif_jeod/test_data/baselines.json
…apture 2 new tests

The previous freeze failed the fast-bucket baseline-invariance gate:

- tier3_earth_moon_rosetta was added as a baseline entry but the fast CI
  bucket excludes the earth_moon suite, so it reported as missing. Add it to
  .github/tier3-allow-missing.txt alongside tier3_earth_moon_clem.
- sim_drag_ver_const (#621) and simulation_tide_run02 (#625) landed on main
  after the freeze and were unguarded; refreeze captures both.
- Rename tier3_sim_dyncomp_run_attach_to_ref_frame extras
  *_max_quat_angle/*_max_ang_vel -> *_..._err to match the schema used by
  every other tier3 report (Copilot review). Values byte-identical.

Verified: baseline_diff OK (99 matched; 0 allowed-missing; 0 new);
widening check 0 error(s), 0 removed; fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@simnaut simnaut enabled auto-merge (squash) May 25, 2026 18:32
@simnaut simnaut merged commit bd82437 into main May 25, 2026
18 checks passed
@simnaut simnaut deleted the regen-tier3-baselines branch May 25, 2026 18:44