Repository status: we are preparing the public release. Code, evaluation scripts, prompts, and additional assets will be released progressively.
Multimodal Large Reasoning Models (MLRMs) can correctly perceive risk-relevant visual cues, yet still fail to enforce safety constraints when harmful objectives are embedded in seemingly benign contexts. We term this failure mode Safety Context Amnesia (SCA): during reasoning, the model over-prioritizes contextual coherence and narrative alignment, causing latent risk signals to be suppressed.
Across multiple multimodal safety benchmarks, IGSR substantially improves defense success rates while largely preserving utility.
This project studies multimodal safety failures and defenses. As a result, the paper materials contain unsafe or harmful examples, which are used strictly for research and evaluation.