
TEST: Moving dataset tests to end-to-end#1589

Merged
rlundeen2 merged 7 commits into microsoft:main from rlundeen2:users/rlundeen/2026_04_10_integration
Apr 11, 2026

Conversation

Contributor

@rlundeen2 rlundeen2 commented Apr 10, 2026

We don't need to run all dataset tests on every PR:

  • Adding some basic dataset tests to the integration suite
  • Adding the comprehensive dataset tests to end-to-end (which runs daily), with added resiliency

All tests pass manually.

rlundeen2 and others added 7 commits April 10, 2026 12:11
Replace TestSeedDatasetProviderIntegration (parameterized over all 58
providers) with TestSeedDatasetSmoke testing 3 representative providers:
- 1 local YAML provider (no network)
- 1 remote URL-based provider (_XSTestDataset)
- 1 remote HuggingFace provider (_SimpleSafetyTestsDataset)

Move the full parametric test to tests/end_to_end/test_all_datasets.py
for daily CI runs, avoiding flaky HF rate-limiting failures on every PR.

Add tests/integration/datasets/test_load_default_datasets_integration.py
to verify the LoadDefaultDatasets initializer pipeline with real data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
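The three-provider smoke test might be sketched as below. The provider classes here are simplified stand-ins: the real _XSTestDataset and _SimpleSafetyTestsDataset providers fetch over the network, and their actual interfaces live in PyRIT's dataset module.

```python
import pytest

# Simplified stand-ins for the three representative providers.
class _LocalYamlDataset:          # local YAML provider, no network
    def fetch(self):
        return ["local yaml prompt"]

class _XSTestDataset:             # remote URL-based provider
    def fetch(self):
        return ["xstest prompt"]

class _SimpleSafetyTestsDataset:  # remote HuggingFace provider
    def fetch(self):
        return ["simple safety prompt"]

SMOKE_PROVIDERS = [_LocalYamlDataset, _XSTestDataset, _SimpleSafetyTestsDataset]

class TestSeedDatasetSmoke:
    @pytest.mark.parametrize("provider_cls", SMOKE_PROVIDERS)
    def test_provider_returns_prompts(self, provider_cls):
        # Each representative provider should yield a non-empty dataset.
        assert provider_cls().fetch()
```

Parametrizing over three representatives instead of all 58 providers keeps per-PR coverage of each provider *kind* (local, URL, HuggingFace) while moving the exhaustive sweep to the daily run.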
- Retry up to 3 times with exponential backoff (5s, 10s, 60s max) on
  transient network errors (OSError, ConnectionError, TimeoutError)
- 5-minute per-test timeout to prevent hung downloads from blocking CI
- Uses tenacity (already a project dependency)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The second test (test_all_scenario_datasets_are_fetchable) duplicated
the work of the first — LoadDefaultDatasets.initialize_async() already
fetches all scenario datasets and will fail if any are missing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use memory.get_seed_dataset_names() (sync, returns list[str]) instead
of the nonexistent memory.get_seed_datasets_async().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ests

Root causes for the 3 failures:
- _PromptIntelDataset: requires PROMPTINTEL_API_KEY (paid API) -> skip
  when credential is missing
- _HarmBenchMultimodalDataset / _VLSUMultimodalDataset: download remote
  images one-by-one; when all fetches fail due to rate-limiting, the
  provider returns an empty dataset -> skip with explanation rather
  than failing the test infrastructure

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
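The two skip behaviors can be sketched with small pytest helpers. The helper names are illustrative; the env-var name and the empty-dataset condition are taken from the commit message.

```python
import os

import pytest

def require_promptintel_key():
    # PROMPTINTEL_API_KEY gates a paid API; without it, skip rather than fail.
    if not os.environ.get("PROMPTINTEL_API_KEY"):
        pytest.skip("PROMPTINTEL_API_KEY not set; skipping paid-API dataset test")

def check_multimodal_dataset(prompts):
    # Multimodal providers download remote images one by one; an empty result
    # means every fetch failed (e.g. rate-limiting), which is an infrastructure
    # issue, not a dataset bug -> skip with an explanation instead of failing.
    if not prompts:
        pytest.skip("all image downloads failed (likely rate-limiting)")
    assert len(prompts) > 0
```

Skipping with an explicit reason keeps the daily run green on transient infrastructure failures while still recording why each test did not execute.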
Root cause: multimodal dataset providers (HarmBench, VLSU) use
DataTypeSerializer which requires CentralMemory to save downloaded
images. Without memory initialized, every image save fails silently,
producing empty datasets.

Fix:
- Add conftest.py with initialize_pyrit_async(IN_MEMORY) for e2e tests
- Skip PromptIntelDataset when PROMPTINTEL_API_KEY is not set
- Gracefully skip multimodal providers if all image downloads fail
- Clean up test structure (remove redundant try/except wrapping)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move initialize_pyrit_async call from conftest.py (which would affect
all e2e tests) into a module-scoped autouse fixture within the dataset
test file itself.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rlundeen2 rlundeen2 merged commit 71eaa26 into microsoft:main Apr 11, 2026
38 checks passed