chore(tier3): refreeze baselines snapshot (track 19 untracked tests + LSODE) #627
Merged
… LSODE)

The frozen `baselines.json` regression snapshot is a manual `tier3_report --freeze-baselines` artifact, not refreshed by CI. The `tier3_baseline_diff` gate only fails on regressions of entries already present, reporting brand-new tier3 tests as informational "new", so every tier3 test added since the last freeze accumulated unguarded.

This refreezes the full snapshot (78 -> 97 entries) from a clean `cargo nextest run --workspace -E 'test(tier3_)'` (all 233 tier3 tests pass):

- Adds regression coverage for 19 tracked-but-unfrozen tests (RNP, LSODE, relative-state, mars/phobos, NESC NRHO, dyncomp attach-to-ref-frame, kinematic-propagation, child-derivative, complex attach/detach, and the run2/6c/6d/9b/10b simulation RUNs).
- Tightens `tier3_simulation_lsode_default` by ~7 orders of magnitude: position 9.5 km -> 9.2e-4 m, velocity 11 m/s -> 1.0e-6 m/s. The old ceiling was frozen when LSODE was a stub; #616/#617 implemented the stiff BDF integrator, so the meaningful error is now sub-mm. The loose ceiling let a 7-orders accuracy regression pass unnoticed.

All deltas are tightenings (informational) or new entries; the `check_baseline_widening` lane reports 0 errors / 0 warnings, and `tier3_baseline_diff` is OK (97 matched, 0 new). `attach_mass` is unchanged from #624.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Updates the Tier 3 JEOD cross-validation baseline snapshot to reflect the current set of passing tier3_ tests, so newly-added Tier 3 tests gain regression coverage and LSODE’s baseline reflects the now-real solver accuracy.
Changes:
- Refreezes Tier 3 baselines from 78 → 97 tracked tests (adding the previously “new/unfrozen” Tier 3 tests).
- Tightens the `tier3_simulation_lsode_default` baseline thresholds to sub-millimeter / ~1e-6 m/s levels.
- Regenerates the human-readable `baselines.md` to match the updated JSON snapshot.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/astrodyn_verif_jeod/test_data/baselines.json | Adds 19 new baseline entries and tightens LSODE default tolerances for Tier 3 regression gating. |
| crates/astrodyn_verif_jeod/test_data/baselines.md | Regenerates the readable baseline report (count + new sections) to mirror the updated JSON snapshot. |
…apture 2 new tests

The previous freeze failed the fast-bucket baseline-invariance gate:

- tier3_earth_moon_rosetta was added as a baseline entry but the fast CI bucket excludes the earth_moon suite, so it reported as missing. Add it to .github/tier3-allow-missing.txt alongside tier3_earth_moon_clem.
- sim_drag_ver_const (#621) and simulation_tide_run02 (#625) landed on main after the freeze and were unguarded; refreeze captures both.
- Rename tier3_sim_dyncomp_run_attach_to_ref_frame extras `*_max_quat_angle`/`*_max_ang_vel` -> `*_..._err` to match the schema used by every other tier3 report (Copilot review). Values byte-identical.

Verified: baseline_diff OK (99 matched; 0 allowed-missing; 0 new); widening check 0 error(s), 0 removed; fmt + clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Refreezes the Tier 3 `baselines.json` regression snapshot from a clean full-suite run, closing a drift gap between the frozen baseline (78 entries) and the actual set of passing tier3 tests (99).

`baselines.json` is a manual `tier3_report --freeze-baselines` artifact; CI never refreshes it. The `tier3_baseline_diff` gate only fails on regressions of entries already present; brand-new tier3 tests are reported as informational `new` and pass. So every tier3 test added since the last freeze accumulated unguarded against future regression.
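The gap described above can be illustrated with a minimal sketch. It assumes a simplified model in which each baseline is a single error ceiling per test name; the `Verdict` enum, `classify`, and `gate_passes` are hypothetical names for illustration and are not taken from the actual `tier3_baseline_diff` implementation.

```rust
use std::collections::BTreeMap;

/// Outcome of comparing one test's measured error against the frozen snapshot.
#[derive(Debug, PartialEq)]
enum Verdict {
    Ok,
    Regression,
    New,
}

/// Hypothetical comparison rule (assumption): a test regresses when its
/// measured error exceeds the frozen ceiling; a test with no frozen entry
/// is only reported as `New`.
fn classify(baselines: &BTreeMap<String, f64>, test: &str, measured: f64) -> Verdict {
    match baselines.get(test) {
        Some(&ceiling) if measured > ceiling => Verdict::Regression,
        Some(_) => Verdict::Ok,
        None => Verdict::New,
    }
}

/// The gate fails only on `Regression`; `New` entries pass as informational.
fn gate_passes(baselines: &BTreeMap<String, f64>, results: &[(&str, f64)]) -> bool {
    results
        .iter()
        .all(|(name, err)| classify(baselines, name, *err) != Verdict::Regression)
}

fn main() {
    let mut baselines = BTreeMap::new();
    baselines.insert("tier3_frozen_case".to_string(), 1.0e-3);

    // The second test has no frozen entry, so even a huge error is not caught.
    let results = [("tier3_frozen_case", 5.0e-4), ("tier3_added_later", 1.0e3)];
    for (name, err) in results {
        println!("{name}: {:?}", classify(&baselines, name, err));
    }
    println!("gate passes: {}", gate_passes(&baselines, &results));
}
```

Under this model the unfrozen test passes the gate regardless of its error, which is the drift this PR closes by freezing every currently passing test.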
Regenerated via `tier3_report --freeze-baselines` from a clean `cargo nextest run --workspace -E 'test(tier3_)'`.
What changes (78 → 99 entries)
+19 tracked-but-unfrozen tests now gain regression coverage: `csr_compare_gravity_octants`, `earth_moon_rosetta`, `mars_orb_init_phobos`, `mars_phobos`, `nesc_cc8_nrho`, `sim_attach_detach_trajectory_simple`, `sim_complex_attach_detach_pre_attach_trajectory`, `sim_compute_child_derivative_{full,pre_attach}_trajectory`, `sim_dyncomp_run_attach_to_ref_frame`, `sim_kinematic_propagation_simple`, `simulation_relative_{a_rot_no_trans,ab_rot_ab_trans,no_rot_ab_trans}`, `simulation_run{2_lvlh_rot_init_propagation,6c_plane_change,6d_departure,9b_torque_initial_rate,10b_gravity_torque_circular_rate}`.

1 existing entry tightened, `tier3_simulation_lsode_default`: position 9.5 km → 9.2e-4 m, velocity 11 m/s → 1.0e-6 m/s.

The old ceiling was frozen while LSODE was a stub; #616/#617 implemented the stiff BDF integrator, so the real error is now sub-millimeter. A ~7-order accuracy regression could previously pass unnoticed.
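As a quick check of the "~7 orders" figure, the snippet below recomputes it from the ceilings quoted in the commit message (position 9.5 km to 9.2e-4 m, velocity 11 m/s to 1.0e-6 m/s). The helper name `orders_tightened` is made up for this illustration.

```rust
/// Decimal orders of magnitude by which an error ceiling shrank.
fn orders_tightened(old: f64, new: f64) -> f64 {
    (old / new).log10()
}

fn main() {
    // Ceilings quoted in the commit message, converted to SI units.
    let position = orders_tightened(9.5e3, 9.2e-4); // 9.5 km -> 9.2e-4 m
    let velocity = orders_tightened(11.0, 1.0e-6); // 11 m/s -> 1.0e-6 m/s
    println!("position: {position:.2} orders, velocity: {velocity:.2} orders");
}
```

Both ratios come out slightly above 1e7, consistent with the stated ~7 orders of magnitude.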
Safety
No baseline was loosened.

- `scripts/check_baseline_widening.sh` vs `origin/main`: `0 error(s), 0 warning(s), 6 tightening(s), 21 new, 0 removed`.
- `tier3_baseline_diff`: `OK (99 matched; 0 allowed-missing; 0 new)`.
- `tier3_sim_attach_mass` is unchanged from #624 (test(tier3): add 9 SIM_verif_attach_mass RUNs (#99 Bucket B)) and is byte-identical.

Follow-up worth considering (not in this PR)
There's no forcing function keeping the frozen set in sync with the tier3 test set. A CI check that fails when a `tier3_` test exists without a baseline entry (rather than reporting `new` as a passing notice) would prevent recurrence.

🤖 Generated with Claude Code
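The follow-up check suggested above might look roughly like the sketch below. It assumes the list of test names and the set of frozen baseline keys are already available in memory; `unfrozen_tests` and `check` are hypothetical helpers, not part of the existing tooling, and a real CI lane would turn the `Err` into a non-zero exit code.

```rust
use std::collections::BTreeSet;

/// Returns the `tier3_` test names that have no frozen baseline entry.
fn unfrozen_tests<'a>(tests: &[&'a str], baselines: &BTreeSet<&str>) -> Vec<&'a str> {
    tests
        .iter()
        .copied()
        .filter(|name| name.starts_with("tier3_") && !baselines.contains(*name))
        .collect()
}

/// Strict variant of the gate: any unfrozen tier3 test is an error
/// instead of an informational "new" notice.
fn check(tests: &[&str], baselines: &BTreeSet<&str>) -> Result<(), String> {
    let missing = unfrozen_tests(tests, baselines);
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "tier3 tests without a baseline entry: {}",
            missing.join(", ")
        ))
    }
}

fn main() {
    // Illustrative names only; not real entries from baselines.json.
    let tests = ["tier3_frozen_case", "tier3_added_later"];
    let baselines: BTreeSet<&str> = ["tier3_frozen_case"].into_iter().collect();

    match check(&tests, &baselines) {
        Ok(()) => println!("all tier3 tests are frozen"),
        Err(msg) => println!("would fail CI: {msg}"),
    }
}
```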