fix(tests): repair default test failures#942
Open
smithfabian wants to merge 2 commits intojundot:mainfrom
Open
fix(tests): repair default test failures#942smithfabian wants to merge 2 commits intojundot:mainfrom
smithfabian wants to merge 2 commits intojundot:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes several “fresh clone” test failures and improves default test-suite consistency for new contributors by aligning tests with current constants, completing admin model-settings responses, strengthening mocked streaming infrastructure, addressing a boundary snapshot cleanup race, and ensuring required FastAPI multipart support is present in dev installs.
Changes:
- Make accuracy benchmark test expectations derive from
VALID_BENCHMARKSinstead of a hard-coded count. - Expose missing model settings fields in the admin “list models” response and classify
preserve_thinkingas a universal profile field. - Fix streaming test mocking gaps, harden
BoundarySnapshotSSDStore.cleanup_all()against in-flight writes, and addpython-multipartto dev dependencies.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_accuracy_benchmark.py | Removes stale hard-coded benchmark count by asserting against len(VALID_BENCHMARKS). |
| tests/integration/test_e2e_streaming.py | Extends the mock engine pool with get_entry() to satisfy server code paths during streaming tests. |
| pyproject.toml | Adds python-multipart to dev dependency sets to prevent FastAPI import-time failures for form/file routes in tests. |
| omlx/model_profiles.py | Adds preserve_thinking to UNIVERSAL_PROFILE_FIELDS for correct profile/template field classification. |
| omlx/cache/boundary_snapshot_store.py | Ensures queue tasks are marked done and uses join() to avoid cleanup races with in-flight background writes. |
| omlx/admin/routes.py | Includes preserve_thinking and turboquant_skip_last in the admin model settings response payload. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes a few test found from a fresh fork + clone setup. The fixes make the current default test suite behave consistently for new contributors.
The failing tests were:
tests/test_accuracy_benchmark.py::TestAccuracyBenchmarkRequest::test_all_valid_benchmarkstests/test_admin_api_key.py::TestListModelsSettings::test_list_models_includes_all_model_settings_fieldstests/test_admin_profiles_api.py::test_all_model_settings_fields_classifiedPlus two additional failures found while validating the test suite:
tests/integration/test_e2e_streaming.pytests/test_boundary_snapshot_store.py::TestBoundarySnapshotSSDStore::test_cleanup_all_drains_queueThis also fixes:
Changes
len(VALID_BENCHMARKS)instead of a stale hard-coded integer. Fixes failing test 1.preserve_thinkingandturboquant_skip_lastin the admin model settings response. Fixes failing test 2.preserve_thinkingas a universal profile field inUNIVERSAL_PROFILE_FIELDS. Fixes failing test 3.MockEnginePool.get_entry()for mocked streaming tests. Fixes validation gap 4.BoundarySnapshotSSDStore.cleanup_all()race where cleanup could miss an in-flight background write. Fixes validation gap 5.python-multipartdev dependency needed by FastAPI form/file routes in the test suite. Fixes failure 6, for freshpip install -e ".[dev]".Test validation