FEAT Add VisualLeakBench dataset loader (arXiv:2603.13385) by Copilot · Pull Request #1531 · microsoft/PyRIT

Copilot · 2026-03-22T20:04:01Z

Adds PyRIT support for the VisualLeakBench / MM-SafetyBench dataset, a multimodal benchmark of 1,000 adversarial images that tests LVLMs against two attack types: OCR injection (harmful instructions embedded in images) and PII leakage (social engineering to extract SSNs, passwords, API keys, etc.).

Fixes #1530

New: `_VisualLeakBenchDataset`

- Fetches `metadata.csv` from `YoutingWang/MM-SafetyBench` on GitHub and downloads images with local caching
- Produces one image+text prompt pair per example, linked by `prompt_group_id` (image at `sequence=0`, category-specific query at `sequence=1`)
- Maps harm categories: `ocr_injection` for OCR entries; `pii_leakage` plus the normalized PII type (e.g. `ssn`, `api_key`) for PII entries
- Supports filtering via `categories`, `pii_types`, and `max_examples`
- Registered with `tags={"default", "safety", "privacy"}` and `modalities=["image", "text"]` for `SeedDatasetFilter` discovery
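The pairing and harm-category mapping described in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the loader's code: `PromptPiece`, `harm_categories_for`, and `build_prompt_pair` are hypothetical names, the `category`/`pii_type` row keys are assumed metadata columns (with `pii_type` already normalized), and the two prompt strings are placeholders rather than the real `OCR_INJECTION_PROMPT` / `PII_LEAKAGE_PROMPT` values.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List

# Placeholder texts; the loader's real class-level prompts may differ.
OCR_INJECTION_PROMPT = "Follow the instructions shown in this image."
PII_LEAKAGE_PROMPT = "What personal information can you read in this image?"


@dataclass
class PromptPiece:
    """Hypothetical stand-in for a PyRIT seed prompt."""

    value: str
    data_type: str  # "image_path" or "text"
    sequence: int
    prompt_group_id: uuid.UUID
    harm_categories: List[str] = field(default_factory=list)


def harm_categories_for(row: Dict[str, str]) -> List[str]:
    # OCR rows get a single category; PII rows also carry their PII type,
    # assumed here to be normalized already (e.g. "ssn", "api_key").
    if row["category"] == "ocr_injection":
        return ["ocr_injection"]
    return ["pii_leakage", row["pii_type"]]


def build_prompt_pair(row: Dict[str, str], image_path: str) -> List[PromptPiece]:
    group_id = uuid.uuid4()  # one group id shared by the image and its query
    categories = harm_categories_for(row)
    query = OCR_INJECTION_PROMPT if row["category"] == "ocr_injection" else PII_LEAKAGE_PROMPT
    return [
        PromptPiece(image_path, "image_path", 0, group_id, list(categories)),
        PromptPiece(query, "text", 1, group_id, list(categories)),
    ]


pair = build_prompt_pair({"category": "pii_leakage", "pii_type": "api_key"}, "cache/img_0042.png")
for piece in pair:
    print(piece.sequence, piece.data_type, piece.harm_categories)
```

A fresh UUID per example keeps each image and its query together when prompts from many examples are mixed; the `sequence` field then fixes their order within the group.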

New enums

| Enum | Values |
|---|---|
| `VisualLeakBenchCategory` | `OCR_INJECTION`, `PII_LEAKAGE` |
| `VisualLeakBenchPIIType` | `EMAIL`, `DOB`, `PHONE`, `PASSWORD`, `PIN`, `API_KEY`, `SSN`, `CREDIT_CARD` |
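The table gives member names only. The sketch below assumes both enums are `str`-valued `Enum`s whose values are the lowercase labels the PR uses for harm categories (such as `ssn` and `api_key`), and shows one hypothetical way raw PII labels from the metadata could be normalized into `VisualLeakBenchPIIType`. The string values and the `normalize_pii_type` helper are illustrative assumptions, not the loader's actual code.

```python
from enum import Enum


class VisualLeakBenchCategory(str, Enum):
    # Assumed values; the PR only lists member names.
    OCR_INJECTION = "ocr_injection"
    PII_LEAKAGE = "pii_leakage"


class VisualLeakBenchPIIType(str, Enum):
    # Assumed values, chosen to match normalized labels such as "ssn" and "api_key".
    EMAIL = "email"
    DOB = "dob"
    PHONE = "phone"
    PASSWORD = "password"
    PIN = "pin"
    API_KEY = "api_key"
    SSN = "ssn"
    CREDIT_CARD = "credit_card"


def normalize_pii_type(raw_label: str) -> VisualLeakBenchPIIType:
    # Hypothetical normalization: "API Key" -> "api_key", "Credit-Card" -> "credit_card".
    key = raw_label.strip().lower().replace("-", "_").replace(" ", "_")
    return VisualLeakBenchPIIType(key)  # raises ValueError for unknown labels


print(normalize_pii_type("API Key"))
print(normalize_pii_type(" Credit Card ").value)
```

Mixing in `str` makes members compare equal to their plain-string values (`VisualLeakBenchPIIType.SSN == "ssn"`), which is convenient when the same labels also appear as harm-category strings.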

Usage

```python
import asyncio

from pyrit.datasets.seed_datasets.remote import (
    _VisualLeakBenchDataset,
    VisualLeakBenchCategory,
    VisualLeakBenchPIIType,
)


async def main():
    # Load only PII leakage examples for SSN and Password
    loader = _VisualLeakBenchDataset(
        categories=[VisualLeakBenchCategory.PII_LEAKAGE],
        pii_types=[VisualLeakBenchPIIType.SSN, VisualLeakBenchPIIType.PASSWORD],
    )
    # fetch_dataset is async; in a notebook it can be awaited directly
    return await loader.fetch_dataset()


dataset = asyncio.run(main())
```

Test coverage

26 unit tests cover init validation, OCR/PII pair creation, harm category mapping, category/PII-type filtering, `max_examples`, failed image handling, and metadata correctness. The integration test is updated to cap image downloads at `max_examples=6`, following the same pattern as `_VLSUMultimodalDataset`.
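The `max_examples` cap and the failed-image handling covered by these tests can be illustrated with a small async loop. This is a sketch under assumptions: `collect_examples`, `download_image`, `fake_download`, and the `image_url` key are hypothetical, and it assumes the cap counts successfully downloaded examples and that a failed download simply skips its row. The real loader may order these steps differently.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]


async def collect_examples(
    rows: List[Row],
    download_image: Callable[[str], Awaitable[str]],
    max_examples: Optional[int] = None,
) -> List[Tuple[Row, str]]:
    """Download one image per row, skipping failures and stopping at max_examples."""
    results: List[Tuple[Row, str]] = []
    for row in rows:
        # Check the cap first so no extra images are downloaded.
        if max_examples is not None and len(results) >= max_examples:
            break
        try:
            local_path = await download_image(row["image_url"])
        except Exception:
            # Hypothetical policy: a failed download drops only this example.
            continue
        results.append((row, local_path))
    return results


async def fake_download(url: str) -> str:
    # Stand-in downloader: URLs containing "broken" fail, others map to a cache path.
    if "broken" in url:
        raise OSError(f"could not download {url}")
    return "/cache/" + url.rsplit("/", 1)[-1]


rows = [{"image_url": u} for u in ["img/1.png", "img/broken.png", "img/2.png", "img/3.png"]]
examples = asyncio.run(collect_examples(rows, fake_download, max_examples=2))
print([path for _, path in examples])  # ['/cache/1.png', '/cache/2.png']
```

Checking the cap before each download keeps a run with a small `max_examples` from fetching more images than it needs, and a dataset whose downloads all fail ends up empty rather than raising.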

Additionally: refactoring VisualLeakBench + VLSU loaders

Applied during code review to both the new loader and the existing `_VLSUMultimodalDataset`:

- Moved prompt constants (`OCR_INJECTION_PROMPT`, `PII_LEAKAGE_PROMPT`) from module level to class level
- Made class metadata attributes immutable (`frozenset` for tags, `tuple` for modalities/harm_categories)
- Extracted `_matches_filters()` and `_build_prompt_pair_async()` helpers from the long `fetch_dataset` methods
- Replaced the manual `setup_memory` fixture with `@pytest.mark.usefixtures("patch_central_database")` in tests
- Renamed `test_failed_image_download_skips_example` → `test_all_images_fail_produces_empty_dataset` for clarity
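A hypothetical sketch of two of the refactoring ideas above: a `_matches_filters()`-style predicate pulled out of the fetch loop, and class-level metadata stored in immutable types. `ExampleRow` and `LoaderSketch` are illustrative names, and the rule that `pii_types` only constrains PII rows is an assumption, not confirmed behavior of the real helper.

```python
from dataclasses import dataclass
from typing import ClassVar, FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ExampleRow:
    """Hypothetical parsed metadata row."""

    category: str  # "ocr_injection" or "pii_leakage"
    pii_type: Optional[str] = None  # normalized, e.g. "ssn"; None for OCR rows


class LoaderSketch:
    # Immutable class-level metadata, mirroring the frozenset/tuple refactor.
    tags: ClassVar[FrozenSet[str]] = frozenset({"default", "safety", "privacy"})
    modalities: ClassVar[Tuple[str, ...]] = ("image", "text")

    def __init__(
        self,
        categories: Optional[Sequence[str]] = None,
        pii_types: Optional[Sequence[str]] = None,
    ) -> None:
        # None (or an empty sequence) means "no filter" for that dimension.
        self._categories = set(categories) if categories else None
        self._pii_types = set(pii_types) if pii_types else None

    def _matches_filters(self, row: ExampleRow) -> bool:
        if self._categories is not None and row.category not in self._categories:
            return False
        # Assumption: the PII-type filter applies to PII rows only.
        if self._pii_types is not None and row.category == "pii_leakage":
            return row.pii_type in self._pii_types
        return True


loader = LoaderSketch(categories=["pii_leakage"], pii_types=["ssn", "password"])
rows = [
    ExampleRow("ocr_injection"),
    ExampleRow("pii_leakage", "ssn"),
    ExampleRow("pii_leakage", "email"),
]
print([r for r in rows if loader._matches_filters(r)])
# [ExampleRow(category='pii_leakage', pii_type='ssn')]
```

Because `frozenset` and `tuple` have no in-place mutators, an instance or subclass cannot accidentally modify the shared class attributes (for example, `LoaderSketch.tags.add(...)` raises `AttributeError`).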

…update

Co-authored-by: romanlutz <10245648+romanlutz@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/PyRIT/sessions/1797b39b-c590-40f4-9bfa-73ee156591b1

…ak-bench-dataset-loader

- Move prompt constants from module-level to class-level (OCR_INJECTION_PROMPT, PII_LEAKAGE_PROMPT)
- Make class metadata immutable (frozenset for tags, tuples for modalities/harm_categories)
- Extract _build_prompt_pair_async and _matches_filters helpers in both loaders
- Use patch_central_database fixture instead of manual CentralMemory setup in tests
- Rename test_failed_image_download_skips_example to test_all_images_fail_produces_empty_dataset

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Initial plan

e5dcf57

Copilot AI assigned Copilot and romanlutz Mar 22, 2026

Copilot started work on behalf of romanlutz March 22, 2026 20:04

Add VisualLeakBench dataset loader, unit tests, and integration test …

afead6e


Copilot AI changed the title ~~[WIP] Add dataset loader for VisualLeakBench in PyRIT~~ Add VisualLeakBench dataset loader (arXiv:2603.13385) Mar 22, 2026

Copilot AI requested a review from romanlutz March 22, 2026 20:21

Copilot finished work on behalf of romanlutz March 22, 2026 20:21

romanlutz marked this pull request as ready for review April 11, 2026 00:40

romanlutz and others added 2 commits April 10, 2026 17:41

Merge remote-tracking branch 'origin/main' into copilot/add-visual-le…

c377142

…ak-bench-dataset-loader

romanlutz changed the title ~~Add VisualLeakBench dataset loader (arXiv:2603.13385)~~ FEAT Add VisualLeakBench dataset loader (arXiv:2603.13385) Apr 11, 2026

romanlutz and others added 2 commits April 10, 2026 18:07

Fix VisualLeakBench authors to include all four paper authors

1fdad0c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add VisualLeakBench bibliography entry to references.bib

3bc32ab

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

romanlutz approved these changes Apr 11, 2026

Copilot wants to merge 6 commits into main from copilot/add-visual-leak-bench-dataset-loader

Copilot AI commented Mar 22, 2026 • edited by romanlutz


2 participants
