TEST: Moving dataset tests to end-to-end #1589
Merged
rlundeen2 merged 7 commits into microsoft:main on Apr 11, 2026
Replace TestSeedDatasetProviderIntegration (parameterized over all 58 providers) with TestSeedDatasetSmoke testing 3 representative providers:
- 1 local YAML provider (no network)
- 1 remote URL-based provider (_XSTestDataset)
- 1 remote HuggingFace provider (_SimpleSafetyTestsDataset)
Move the full parametric test to tests/end_to_end/test_all_datasets.py for daily CI runs, avoiding flaky HF rate-limiting failures on every PR.
Add tests/integration/datasets/test_load_default_datasets_integration.py to verify the LoadDefaultDatasets initializer pipeline with real data.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Retry up to 3 times with exponential backoff (5s, 10s, 60s max) on transient network errors (OSError, ConnectionError, TimeoutError)
- 5-minute per-test timeout to prevent hung downloads from blocking CI
- Uses tenacity (already a project dependency)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
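The retry policy described in this commit can be sketched roughly as follows. The PR itself uses tenacity; this is a standard-library stand-in, not the PR's code, and the names `backoff_delays` and `retry_with_backoff` and their parameters are hypothetical. The injectable `sleep` argument is an assumption added so the sketch can be exercised without real waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Transient error types named in the commit message. ConnectionError and
# TimeoutError are already subclasses of OSError; they are listed for clarity.
TRANSIENT_ERRORS = (OSError, ConnectionError, TimeoutError)


def backoff_delays(attempts: int, base: float = 5.0, cap: float = 60.0) -> List[float]:
    """Waits between attempts: base, 2*base, 4*base, ... each capped at `cap`."""
    return [min(base * 2**i, cap) for i in range(attempts - 1)]


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base: float = 5.0,
    cap: float = 60.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call `fn`, retrying on transient errors; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts, base, cap)
    for attempt in range(attempts):
        try:
            return fn()
        except TRANSIENT_ERRORS:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With 3 attempts there are only two waits (5s, then 10s); the 60s cap would only matter if the attempt count were raised. Non-transient exceptions such as ValueError propagate immediately without a retry.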
The second test (test_all_scenario_datasets_are_fetchable) duplicated the work of the first: LoadDefaultDatasets.initialize_async() already fetches all scenario datasets and will fail if any are missing.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use memory.get_seed_dataset_names() (sync, returns list[str]) instead of the nonexistent memory.get_seed_datasets_async().
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ests
Root causes for the 3 failures:
- _PromptIntelDataset: requires PROMPTINTEL_API_KEY (paid API) -> skip when credential is missing
- _HarmBenchMultimodalDataset / _VLSUMultimodalDataset: download remote images one-by-one; when all fetches fail due to rate-limiting, the provider returns an empty dataset -> skip with explanation rather than failing the test infrastructure
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Root cause: multimodal dataset providers (HarmBench, VLSU) use DataTypeSerializer, which requires CentralMemory to save downloaded images. Without memory initialized, every image save fails silently, producing empty datasets.
Fix:
- Add conftest.py with initialize_pyrit_async(IN_MEMORY) for e2e tests
- Skip PromptIntelDataset when PROMPTINTEL_API_KEY is not set
- Gracefully skip multimodal providers if all image downloads fail
- Clean up test structure (remove redundant try/except wrapping)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
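The two skip rules in this fix might look roughly like the sketch below. The provider class names and the PROMPTINTEL_API_KEY variable come from the commit messages; the helper functions `credential_skip_reason` and `empty_multimodal_skip_reason` are hypothetical and are not taken from the PR's test file.

```python
from typing import Mapping, Optional

# Provider names and the env var are quoted from the commit messages;
# the lookup tables and helper functions are illustrative assumptions.
CREDENTIAL_ENV_VARS = {"_PromptIntelDataset": "PROMPTINTEL_API_KEY"}
MULTIMODAL_PROVIDERS = {"_HarmBenchMultimodalDataset", "_VLSUMultimodalDataset"}


def credential_skip_reason(provider_name: str, env: Mapping[str, str]) -> Optional[str]:
    """Before fetching: return a skip reason if a required credential is unset."""
    required = CREDENTIAL_ENV_VARS.get(provider_name)
    if required and not env.get(required):
        return f"{provider_name} requires {required}, which is not set"
    return None


def empty_multimodal_skip_reason(provider_name: str, seed_count: int) -> Optional[str]:
    """After fetching: return a skip reason if a multimodal provider came back empty."""
    if provider_name in MULTIMODAL_PROVIDERS and seed_count == 0:
        return f"{provider_name} returned no seeds; all image downloads may have failed"
    return None
```

In a pytest test, each helper would be called with `os.environ` or the fetched seed count, and a non-None result passed to `pytest.skip(reason)`. An empty dataset from any other provider still counts as a failure, since only the multimodal providers are exempted.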
Move initialize_pyrit_async call from conftest.py (which would affect all e2e tests) into a module-scoped autouse fixture within the dataset test file itself.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
romanlutz approved these changes on Apr 11, 2026
We don't need to run all dataset tests on every PR.
All tests pass manually.