TEST: Moving dataset tests to end-to-end by rlundeen2 · Pull Request #1589 · microsoft/PyRIT

rlundeen2 · 2026-04-10T19:14:22Z

We don't need to run all dataset tests on every PR.

Adding some basic dataset tests to integration.
Adding comprehensive dataset tests to end-to-end (which runs daily), and adding resiliency to them.

All tests pass manually.

Replace TestSeedDatasetProviderIntegration (parameterized over all 58 providers) with TestSeedDatasetSmoke testing 3 representative providers:
- 1 local YAML provider (no network)
- 1 remote URL-based provider (_XSTestDataset)
- 1 remote HuggingFace provider (_SimpleSafetyTestsDataset)

Move the full parametric test to tests/end_to_end/test_all_datasets.py for daily CI runs, avoiding flaky HF rate-limiting failures on every PR.

Add tests/integration/datasets/test_load_default_datasets_integration.py to verify the LoadDefaultDatasets initializer pipeline with real data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Retry up to 3 times with exponential backoff (5s, 10s, 60s max) on transient network errors (OSError, ConnectionError, TimeoutError)
- 5-minute per-test timeout to prevent hung downloads from blocking CI
- Uses tenacity (already a project dependency)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
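For readers who want to see the retry policy spelled out, here is a minimal, dependency-free sketch. The commit itself uses tenacity; this stand-in uses only the standard library so the policy is visible in a few lines. The names `fetch_with_retry` and `backoff_delay` are hypothetical, and reading "5s, 10s, 60s max" as a doubling delay starting at 5 seconds and capped at 60 seconds is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Transient error types named in the commit message. ConnectionError and
# TimeoutError are already subclasses of OSError; they are listed for clarity.
TRANSIENT_ERRORS = (OSError, ConnectionError, TimeoutError)


def backoff_delay(attempt: int, base: float = 5.0, cap: float = 60.0) -> float:
    """Delay after the given failed attempt (1-based), doubling up to the cap."""
    return min(base * 2 ** (attempt - 1), cap)


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on transient network errors (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TRANSIENT_ERRORS:
            # Re-raise once the final attempt has failed.
            if attempt == max_attempts:
                raise
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets a unit test record the delays instead of actually waiting. Errors outside `TRANSIENT_ERRORS` propagate immediately without a retry.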

The second test (test_all_scenario_datasets_are_fetchable) duplicated the work of the first — LoadDefaultDatasets.initialize_async() already fetches all scenario datasets and will fail if any are missing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Use memory.get_seed_dataset_names() (sync, returns list[str]) instead of the nonexistent memory.get_seed_datasets_async(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ests

Root causes for the 3 failures:
- _PromptIntelDataset: requires PROMPTINTEL_API_KEY (paid API) -> skip when credential is missing
- _HarmBenchMultimodalDataset / _VLSUMultimodalDataset: download remote images one-by-one; when all fetches fail due to rate-limiting, the provider returns an empty dataset -> skip with explanation rather than failing the test infrastructure

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
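The two skip conditions above can be sketched as a small decision helper. This is an illustration under stated assumptions, not the code in the PR: `skip_reason`, `REQUIRED_ENV`, and `MULTIMODAL_PROVIDERS` are hypothetical names, and only the provider names and the environment variable come from the commit message. A test would typically pass a non-None result to `pytest.skip(...)`.

```python
import os
from typing import Mapping, Optional

# Hypothetical lookup tables, filled in from the commit message above.
REQUIRED_ENV = {"_PromptIntelDataset": "PROMPTINTEL_API_KEY"}
MULTIMODAL_PROVIDERS = {"_HarmBenchMultimodalDataset", "_VLSUMultimodalDataset"}


def skip_reason(
    provider_name: str,
    seed_count: Optional[int] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return a skip explanation, or None if the test should run and assert."""
    # Case 1: the provider needs a credential that is not configured.
    required = REQUIRED_ENV.get(provider_name)
    if required and not env.get(required):
        return f"{provider_name} requires {required}, which is not set"
    # Case 2: a multimodal provider came back empty, which the commit
    # attributes to every image download failing.
    if provider_name in MULTIMODAL_PROVIDERS and seed_count == 0:
        return f"{provider_name} returned no seeds; all image downloads likely failed"
    return None
```

An empty dataset from any other provider returns None here, so it would still fail the test instead of being skipped.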

Root cause: multimodal dataset providers (HarmBench, VLSU) use DataTypeSerializer which requires CentralMemory to save downloaded images. Without memory initialized, every image save fails silently, producing empty datasets.

Fix:
- Add conftest.py with initialize_pyrit_async(IN_MEMORY) for e2e tests
- Skip PromptIntelDataset when PROMPTINTEL_API_KEY is not set
- Gracefully skip multimodal providers if all image downloads fail
- Clean up test structure (remove redundant try/except wrapping)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Move initialize_pyrit_async call from conftest.py (which would affect all e2e tests) into a module-scoped autouse fixture within the dataset test file itself. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rlundeen2 and others added 7 commits April 10, 2026 12:11

Fix LoadDefaultDatasets integration test to use correct memory API

bcd505a

romanlutz approved these changes Apr 11, 2026

rlundeen2 merged commit 71eaa26 into microsoft:main Apr 11, 2026
38 checks passed

rlundeen2 merged 7 commits into microsoft:main from rlundeen2:users/rlundeen/2026_04_10_integration