The List of All Adversarial Example Papers appears to have been crashing over the past few days. Without that valuable resource, keeping up with the latest research in this field has become difficult, so I created this repository to aggregate and maintain the most recent papers in the area. It may not cover every paper, but I have tried to be thorough; if you find any papers we have missed, please drop me an email. The data from the List of All Adversarial Example Papers is included up to 2023-09-01. We also provide a list of papers about transfer-based attacks here.
-
R-Debater: Retrieval-Augmented Debate Generation through Argumentative Memory
Maoyuan Li, Zhongsheng Wang, Haoyuan Li, Jiamou Liu
-
PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI
Srija Mukhopadhyay, Sathwik Reddy, Shruthi Muthukumar, Jisun An, Ponnurangam Kumaraguru
-
Muhammad Abdullahi Said, Muhammad Sammani Sani
-
Takeru Kusakabe, Yudai Hirose, Mashiho Mukaida, Satoshi Ono
-
CPR: Causal Physiological Representation Learning for Robust ECG Analysis under Distribution Shifts
Shunbo Jia, Caizhi Liao
-
HeteroHBA: A Generative Structure-Manipulating Backdoor Attack on Heterogeneous Graphs
Honglin Gao, Lan Zhao, Junhao Ren, Xiang Li, Gaoxi Xiao
-
Sparse Offline Reinforcement Learning with Corruption Robustness
Nam Phuong Tran, Andi Nika, Goran Radanovic, Long Tran-Thanh, Debmalya Mandal
-
Secure Digital Semantic Communications: Fundamentals, Challenges, and Opportunities
Weixuan Chen, Qianqian Yang, Yuanyuan Jia, Junyu Pan, Shuo Shao, Jincheng Dai, Meixia Tao, Ping Zhang
-
Towards Provably Secure Generative AI: Reliable Consensus Sampling
Yu Cui, Hang Fu, Sicheng Pan, Zhuoyu Sun, Yifei Liu, Yuhong Nie, Bo Ran, Baohan Huang, Xufeng Zhang, Haibin Zhang, Cong Zuo, Licheng Wang
-
BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts
Hengli Li, Zhaoxin Yu, Qi Shen, Chenxi Li, Mengmeng Wang, Tinglang Wu, Yipeng Kang, Yuxuan Wang, Song-Chun Zhu, Zixia Jia, Zilong Zheng
-
Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?
Yuan Xin, Dingfan Chen, Linyi Yang, Michael Backes, Xiao Zhang
-
Privacy-Preserving Semantic Communications via Multi-Task Learning and Adversarial Perturbations
Yalin E. Sagduyu, Tugba Erpek, Aylin Yener, Sennur Ulukus
-
T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models
Changzhen Li, Yuecong Min, Jie Zhang, Zheng Yuan, Shiguang Shan, Xilin Chen
-
Bridging Structure and Appearance: Topological Features for Robust Self-Supervised Segmentation
Haotang Li, Zhenyu Qi, Hao Qin, Huanrui Yang, Sen He, Kebin Peng
-
Yongtao Chen, Yanbo Wang, Wentao Zhao, Guole Shen, Tianchen Deng, Jingchuan Wang
-
Bayesian Self-Distillation for Image Classification
Anton Adelöw, Matteo Gamba, Atsuto Maki
-
Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention
Haijing Liu, Zhiyuan Song, Hefeng Wu, Tao Pu, Keze Wang, Liang Lin
-
Vladimir Frants, Sos Agaian
-
Kacem Khaled, Felipe Gohring de Magalhães, Gabriela Nicolescu
-
Assured Autonomy: How Operations Research Powers and Orchestrates Generative AI Systems
Tinglong Dai, David Simchi-Levi, Michelle Xiao Wu, Yao Xie
-
Ruixuan Huang, Qingyue Wang, Hantao Huang, Yudong Gao, Dong Chen, Shuai Wang, Wei Wang
-
SourceBroken: A large-scale analysis on the (un)reliability of SourceRank in the PyPI ecosystem
Biagio Montaruli, Serena Elisa Ponta, Luca Compagna, Davide Balzarotti
-
Jingyu Zhang
-
Sina Jahromi, Farshid Hajati, Alireza Rezaee, Javaher Nourian
-
It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents
Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki, Piotr Błaszczyk, Will Howard, Lukas Aichberger, Chris Russell, Philip H.S. Torr, Adam Mahdi, Adel Bibi
-
Zhen Liang, Hai Huang, Zhengkui Chen
-
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents
Jiafeng Liang, Hao Li, Chang Li, Jiaqi Zhou, Shixin Jiang, Zekun Wang, Changkai Ji, Zhihao Zhu, Runxuan Liu, Tao Ren, Jinlan Fu, See-Kiong Ng, Xia Liang, Ming Liu, Bing Qin
-
Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks
Toqeer Ali Syed, Mishal Ateeq Almutairi, Mahmoud Abdel Moaty
-
Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing
Panagiotis Theocharopoulos, Ajinkya Kulkarni, Mathew Magimai.-Doss
-
NeXT-IMDL: Build Benchmark for NeXT-Generation Image Manipulation Detection & Localization
Yifei Li, Haoyuan He, Yu Zheng, Bingyao Yu, Wenzhao Zheng, Lei Chen, Jie Zhou, Jiwen Lu
-
ProGuard: Towards Proactive Multimodal Safeguard
Shaohan Yu, Lijun Li, Chenyang Si, Lu Sheng, Jing Shao
-
Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems
Armstrong Foundjem, Lionel Nganyewou Tidjon, Leuson Da Silva, Foutse Khomh
-
Yu Jiang, Xindi Tong, Ziyao Liu, Xiaoxi Zhang, Kwok-Yan Lam, Chee Wei Tan
-
Calibrated Multi-Level Quantile Forecasting
Tiffany Ding, Isaac Gibbs, Ryan J. Tibshirani
-
RobustMask: Certified Robustness against Adversarial Neural Ranking Attack via Randomized Masking
Jiawei Liu, Zhuo Chen, Rui Zhu, Miaokun Chen, Yuyang Gong, Wei Lu, Xiaofeng Wang
-
A Privacy Protocol Using Ephemeral Intermediaries and a Rank-Deficient Matrix Power Function (RDMPF)
Eduardo Salazar
-
Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark
Manu, Yi Guo, Jo Plested, Tim Lynar, Kanchana Thilakarathna, Nirhoshan Sivaroopan, Jack Yang, Wangli Yang
-
Zero-Trust Agentic Federated Learning for Secure IIoT Defense Systems
Samaresh Kumar Singh, Joyjit Roy, Martin So
-
Improved Bounds for Private and Robust Alignment
Wenqian Weng, Yi He, Xingyu Zhou
-
Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation
Kaustubh Dhole
-
Roee Ziv, Raz Lapid, Moshe Sipper
-
DECEPTICON: How Dark Patterns Manipulate Web Agents
Phil Cuvin, Hao Zhu, Diyi Yang
-
Ju-Hsuan Weng, Jia-Wei Liao, Cheng-Fu Chou, Jun-Cheng Chen
-
Fundamental Novel Consistency Theory: $H$-Consistency Bounds
Yutao Zhong
-
Soham Padia, Dhananjay Vaidya, Ramchandra Mangrulkar
-
Hierarchical Pedagogical Oversight: A Multi-Agent Adversarial Framework for Reliable AI Tutoring
Saisab Sadhu, Ashim Dhor
-
Towards Reliable Evaluation of Adversarial Robustness for Spiking Neural Networks
Jihang Wang, Dongcheng Zhao, Ruolin Chen, Qian Zhang, Yi Zeng
-
Verifiable Dropout: Turning Randomness into a Verifiable Claim
Kichang Lee, Sungmin Lee, Jaeho Jin, JeongGil Ko
-
Secure and Explainable Fraud Detection in Finance via Hierarchical Multi-source Dataset Distillation
Yiming Qian, Thorsten Neumann, Xueyining Huang, David Hardoon, Fei Gao, Yong Liu, Siow Mong Rick Goh
-
StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars
Zhiyao Sun, Ziqiao Peng, Yifeng Ma, Yi Chen, Zhengguang Zhou, Zixiang Zhou, Guozhen Zhang, Youliang Zhang, Yuan Zhou, Qinglin Lu, Yong-Jin Liu
-
Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models
Mengqi He, Xinyu Tian, Xin Shen, Jinhong Ni, Shu Zou, Zhaoyuan Yang, Jing Zhang
-
Dunyuan XU, Xikai Yang, Yaoqian Li, Juzheng Miao, Jinpeng Li, Pheng-Ann Heng
-
Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs
Jiayu Hu, Beibei Li, Jiangwei Xia, Yanjun Qin, Bing Ji, Zhongshi He
-
Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models
Zongmin Zhang, Zhen Sun, Yifan Liao, Wenhan Dong, Xinlei He, Xingshuo Han, Shengmin Xu, Xinyi Huang
-
Scaling Adversarial Training via Data Selection
Youran Ye, Dejin Wang, Ajinkya Bhandare
-
Attack-Aware Deepfake Detection under Counter-Forensic Manipulations
Noor Fatima, Hasan Faraz Khan, Muzammil Behzad
-
LLA: Enhancing Security and Privacy for Generative Models with Logic-Locked Accelerators
You Li, Guannan Zhao, Yuhao Ju, Yunqi He, Jie Gu, Hai Zhou
-
Mohammad Zakaria Haider, Amit Kumar Podder, Prabin Mali, Aranya Chakrabortty, Sumit Paudyal, Mohammad Ashiqur Rahman
-
Yunguo Yu
-
Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought
Yuyi Zhang, Boyu Tang, Tianjie Ju, Sufeng Duan, Gongshen Liu
-
First Provable Guarantees for Practical Private FL: Beyond Restrictive Assumptions
Egor Shulgin, Grigory Malinovsky, Sarit Khirirat, Peter Richtárik
-
Dictionary-Transform Generative Adversarial Networks
Angshul Majumdar
-
Assessing the Effectiveness of Membership Inference on Generative Music
Kurtis Chow, Omar Samiullah, Vinesh Sridhar, Hewen Zhang
-
Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation
Tian Li, Bo Lin, Shangwen Wang, Yusong Tan
-
Machine Learning Power Side-Channel Attack on SNOW-V
Deepak, Rahul Balout, Anupam Golder, Suparna Kundu, Angshuman Karmakar, Debayan Das
-
Learning from Negative Examples: Why Warning-Framed Training Data Teaches What It Warns Against
Tsogt-Ochir Enkhbayar
-
Beyond Context: Large Language Models Failure to Grasp Users Intent
Ahmed M. Hussain, Salahuddin Salahuddin, Panos Papadimitratos
-
Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking
Yifan Huang, Xiaojun Jia, Wenbo Guo, Yuqiang Sun, Yihao Huang, Chong Wang, Yang Liu
-
Improving the Convergence Rate of Ray Search Optimization for Query-Efficient Hard-Label Attacks
Xinjie Xu, Shuyu Cheng, Dongwei Xu, Qi Xuan, Chen Ma
-
Efficient and Robust Video Defense Framework against 3D-field Personalized Talking Face
Rui-qing Sun, Xingshan Yao, Tian Lan, Hui-Yang Zhao, Jia-Ling Shi, Chen-Hao Cui, Zhijing Wu, Chen Yang, Xian-Ling Mao
-
Robustness Certificates for Neural Networks against Adversarial Attacks
Sara Taheri, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar, Majid Zamani
-
Time-Efficient Evaluation and Enhancement of Adversarial Robustness in Deep Neural Networks
Runqi Lin
-
Clever Hans in Chemistry: Chemist Style Signals Confound Activity Prediction on Public Benchmarks
Andrew D. Blevins, Ian K. Quigley
-
zkFL-Health: Blockchain-Enabled Zero-Knowledge Federated Learning for Medical AI Privacy
Savvy Sharma, George Petrovic, Sarthak Kaushik
-
Ji Hyuk Jung, Ji Won Yoon
-
AegisAgent: An Autonomous Defense Agent Against Prompt Injection Attacks in LLM-HARs
Yihan Wang, Huanqi Yang, Shantanu Pal, Weitao Xu
-
GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs
Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Stjepan Picek, Ahmad-Reza Sadeghi
-
CoTDeceptor: Adversarial Code Obfuscation Against CoT-Enhanced LLM Code Agents
Haoyang Li, Mingjin Li, Jinxin Zuo, Siqi Li, Xiao Li, Hao Wu, Yueming Lu, Xiaochuan He
-
RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
Le Wang, Zonghao Ying, Xiao Yang, Quanchen Zou, Zhenfei Yin, Tianlin Li, Jian Yang, Yaodong Yang, Aishan Liu, Xianglong Liu
-
LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors
Tianwei Lan, Farid Naït-Abdesselam
-
The Imitation Game: Using Large Language Models as Chatbots to Combat Chat-Based Cybercrimes
Yifan Yao, Baojuan Wang, Jinhao Duan, Kaidi Xu, ChuanKai Guo, Zhibo Eric Sun, Yue Zhang
-
IoT-based Android Malware Detection Using Graph Neural Network With Adversarial Defense
Rahul Yumlembam, Biju Issac, Seibu Mary Jacob, Longzhi Yang
-
Honglin Mu, Jinghao Liu, Kaiyang Wan, Rui Xing, Xiuying Chen, Timothy Baldwin, Wanxiang Che
-
Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography
Songze Li, Jiameng Cheng, Yiming Li, Xiaojun Jia, Dacheng Tao
-
Evasion-Resilient Detection of DNS-over-HTTPS Data Exfiltration: A Practical Evaluation and Toolkit
Adam Elaoumari
-
Jaykumar Kasundra, Anjaneya Praharaj, Sourabh Surana, Lakshmi Sirisha Chodisetty, Sourav Sharma, Abhigya Verma, Abhishek Bhardwaj, Debasish Kanhar, Aakash Bhagat, Khalil Slimi, Seganrasan Subramanian, Sathwik Tejaswi Madhusudhan, Ranga Prasad Chenna, Srinivas Sunkara
-
Jixiao Yang, Jinyu Chen, Zixiao Huang, Chengda Xu, Chi Zhang, Sijia Li
-
Yuanjian Xu, Yuan Shuai, Jianing Hao, Guang Zhang
-
Ipek Sena Yilmaz, Onur G. Tuncer, Zeynep E. Aksoy, Zeynep Yağmur Baydemir
-
ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected
Kanchon Gharami, Sanjiv Kumar Sarkar, Yongxin Liu, Shafika Showkat Moni
-
Safety Alignment of LMs via Non-cooperative Games
Anselm Paulus, Ilia Kulikov, Brandon Amos, Rémi Munos, Ivan Evtimov, Kamalika Chaudhuri, Arman Zharmagambetov
-
Bridging Efficiency and Safety: Formal Verification of Neural Networks with Early Exits
Yizhak Yisrael Elboher, Avraham Raviv, Amihay Elboher, Zhouxing Shi, Omri Azencot, Hillel Kugler, Guy Katz
-
Adversarial Training for Failure-Sensitive User Simulation in Mental Health Dialogue Optimization
Ziyi Zhu, Olivier Tieleman, Caitlin A. Stamatis, Luka Smyth, Thomas D. Hull, Daniel R. Cahn, Matteo Malgaroli
-
Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?
Zhengyang Shan, Aaron Mueller
-
Semantic Deception: When Reasoning Models Can't Compute an Addition
Nathaniël de Leeuw, Marceau Nahon, Mathis Reymond, Raja Chatila, Mehdi Khamassi
-
Mohammadreza Rostami, Solmaz S. Kia
-
Defending against adversarial attacks using mixture of experts
Mohammad Meymani, Roozbeh Razavi-Far
-
Real-World Adversarial Attacks on RF-Based Drone Detectors
Omer Gazit, Yael Itzhakev, Yuval Elovici, Asaf Shabtai
-
Investigating Model Editing for Unlearning in Large Language Models
Shariqah Hossain, Lalana Kagal
-
SemCovert: Secure and Covert Video Transmission via Deep Semantic-Level Hiding
Zhihan Cao, Xiao Yang, Gaolei Li, Jun Wu, Jianhua Li, Yuchen Liu
-
Failure Analysis of Safety Controllers in Autonomous Vehicles Under Object-Based LiDAR Attacks
Daniyal Ganiuly, Nurzhau Bolatbek, Assel Smaiyl
-
A.A. Gde Yogi Pramana, Jason Ray, Anthony Jaya, Michael Wijaya
-
Konstantin Kaulen, Tobias Ladner, Stanley Bak, Christopher Brix, Hai Duong, Thomas Flinkow, Taylor T. Johnson, Lukas Koller, Edoardo Manino, ThanhVu H Nguyen, Haoze Wu
-
Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline
Akshaj Prashanth Rao, Advait Singh, Saumya Kumaar Saksena, Dhruv Kumar
-
The Erasure Illusion: Stress-Testing the Generalization of LLM Forgetting Evaluation
Hengrui Jia, Taoran Li, Jonas Guan, Varun Chandrasekaran
-
Causal-Guided Detoxify Backdoor Attack of Open-Weight LoRA Models
Linzhi Chen, Yang Sun, Hongru Wei, Yuqi Chen
-
Lorenzo Capelli, Leandro de Souza Rosa, Gianluca Setti, Mauro Mangia, Riccardo Rovatti
-
Decoupled Generative Modeling for Human-Object Interaction Synthesis
Hwanhee Jung, Seunggwan Lee, Jeongyoon Yoon, SeungHyeon Kim, Giljoo Nam, Qixing Huang, Sangpil Kim
-
6DAttack: Backdoor Attacks in the 6DoF Pose Estimation
Jihui Guo, Zongmin Zhang, Zhen Sun, Yuhao Yang, Jinlin Wu, Fu Zhang, Xinlei He
-
Optimizer Dynamics at the Edge of Stability with Differential Privacy
Ayana Hussain, Ricky Fang
-
GShield: Mitigating Poisoning Attacks in Federated Learning
Sameera K. M., Serena Nicolazzo, Antonino Nocera, Vinod P., Rafidha Rehiman K. A.
-
DREAM: Dynamic Red-teaming across Environments for AI Models
Liming Lu, Xiang Gu, Junyu Huang, Jiawei Du, Yunhuai Liu, Yongbin Zhou, Shuchao Pang
-
Conditional Adversarial Fragility in Financial Machine Learning under Macroeconomic Stress
Samruddhi Baviskar
-
Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning
Jiayun Wu, Jiashuo Liu, Zhiyuan Zeng, Tianyang Zhan, Tianle Cai, Wenhao Huang
-
Farjana Yesmin, Romana Akter
-
MEEA: Mere Exposure Effect-Driven Confrontational Optimization for LLM Jailbreaking
Jianyi Zhang, Shizhao Liu, Ziyin Zhou, Zhen Li
-
Gökdeniz Gülmez
-
DASH: Deception-Augmented Shared Mental Model for a Human-Machine Teaming System
Zelin Wan, Han Jun Yoon, Nithin Alluru, Terrence J. Moore, Frederica F. Nelson, Seunghyun Yoon, Hyuk Lim, Dan Dongseong Kim, Jin-Hee Cho
-
Junjun Pan, Yixin Liu, Rui Miao, Kaize Ding, Yu Zheng, Quoc Viet Hung Nguyen, Alan Wee-Chung Liew, Shirui Pan
-
FedVideoMAE: Efficient Privacy-Preserving Federated Video Moderation
Ziyuan Tao, Chuanzhi Xu, Sandaru Jayawardana, Wei Bao, Kanchana Thilakarathna, Teng Joon Lim
-
Zhiyuan Peng, Zihan Ye, Shreyank N Gowda, Yuping Yan, Haotian Xu, Ling Shao
-
SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models
Pengcheng Li, Qiang Fang, Tong Zhao, Yixing Lan, Xin Xu
-
Generating Risky Samples with Conformity Constraints via Diffusion Models
Han Yu, Hao Zou, Xingxuan Zhang, Zhengyi Wang, Yue He, Kehan Li, Peng Cui
-
Ni Ding, Songpei Lu, Wenjing Yang, Zijian Zhang
-
Khondokar Fida Hasan, Hasibul Hossain Shajeeb, Chathura Abeydeera, Benjamin Turnbull, Matthew Warren
-
Zhang Wei, Peilu Hu, Shengning Lang, Hao Yan, Li Mei, Yichao Zhang, Chen Yang, Junfeng Hao, Zhimo Han
-
Zehao Liu, Xi Lin
-
Who Can See Through You? Adversarial Shielding Against VLM-Based Attribute Inference Attacks
Yucheng Fan, Jiawei Chen, Yu Tian, Zhaoxia Yin
-
AL-GNN: Privacy-Preserving and Replay-Free Continual Graph Learning via Analytic Learning
Xuling Zhang, Jindong Li, Yifei Zhang, Menglin Yang
-
SoK: Understanding (New) Security Issues Across AI4Code Use Cases
Qilong Wu, Taoran Li, Tianyang Zhou, Varun Chandrasekaran
-
Rahul Yumlembam, Biju Issac, Nauman Aslam, Eaby Kollonoor Babu, Josh Collyer, Fraser Kennedy
-
Robust TTS Training via Self-Purifying Flow Matching for the WildSpoof 2026 TTS Track
June Young Yi, Hyeongju Kim, Juheon Lee
-
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
Jiaqi Tang, Jianmin Chen, Wei Wei, Xiaogang Xu, Runtao Liu, Xiangyu Wu, Qipeng Xie, Jiafei Wu, Lei Zhang, Qifeng Chen
-
Adversarial Robustness of Vision in Open Foundation Models
Jonathon Fox, William J Buchanan, Pavlos Papadopoulos
-
AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens
Tung-Ling Li, Yuhao Wu, Hongliang Liu
-
EMMA: Concept Erasure Benchmark with Comprehensive Semantic Metrics and Diverse Categories
Lu Wei, Yuta Nakashima, Noa Garcia
-
Visually Prompted Benchmarks Are Surprisingly Fragile
Haiwen Feng, Long Lian, Lisa Dunlap, Jiahao Shu, XuDong Wang, Renhao Wang, Trevor Darrell, Alane Suhr, Angjoo Kanazawa
-
Adversarially Robust Detection of Harmful Online Content: A Computational Design Science Approach
Yidong Chai, Yi Liu, Mohammadreza Ebrahimi, Weifeng Li, Balaji Padmanabhan
-
DeepShare: Sharing ReLU Across Channels and Layers for Efficient Private Inference
Yonathan Bornfeld, Shai Avidan
-
Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors
Huixin Zhan
-
Practical Framework for Privacy-Preserving and Byzantine-robust Federated Learning
Baolei Zhang, Minghong Fang, Zhuqing Liu, Biao Yi, Peizhao Zhou, Yuan Wang, Tong Li, Zheli Liu
-
Timely Information Updating for Mobile Devices Without and With ML Advice
Yu-Pin Hsu, Yi-Hsuan Tseng
-
Cryptanalysis of Pseudorandom Error-Correcting Codes
Tianrui Wang, Anyu Wang, Tianshuo Cong, Delong Ran, Jinyuan Liu, Xiaoyun Wang
-
Securing Agentic AI Systems -- A Multilayer Security Framework
Sunil Arora, John Hastings
-
Towards Benchmarking Privacy Vulnerabilities in Selective Forgetting with Large Language Models
Wei Qian, Chenxu Zhao, Yangyi Li, Mengdi Huai
-
PermuteV: A Performant Side-channel-Resistant RISC-V Core Securing Edge AI Inference
Nuntipat Narkthong, Xiaolin Xu
-
From Fake Focus to Real Precision: Confusion-Driven Adversarial Attention Learning in Transformers
Yawei Liu
-
Aniruddha Roy, Jyoti Patel, Aman Chadha, Vinija Jain, Amitava Das
-
StarCraft+: Benchmarking Multi-agent Algorithms in Adversary Paradigm
Yadong Li, Tong Zhang, Bo Huang, Zhen Cui
-
Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
Qihao Liu, Luoxin Ye, Wufei Ma, Yu-Cheng Chou, Alan Yuille
-
Autoencoder-based Denoising Defense against Adversarial Attacks on Object Detection
Min Geun Song, Gang Min Kim, Woonmin Kim, Yongsik Kim, Jeonghyun Sim, Sangbeom Park, Huy Kang Kim
-
C-DGPA: Class-Centric Dual-Alignment Generative Prompt Adaptation
Chao Li, Dasha Hu, Chengyang Li, Yuming Jiang, Yuncheng Shen
-
Domain-Agnostic Causal-Aware Audio Transformer for Infant Cry Classification
Geofrey Owino, Bernard Shibwabo Kasamani, Ahmed M. Abdelmoniem, Edem Wornyo
-
Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks
Safwan Shaheer, G.M. Refatul Islam, Mohammad Rafid Hamid, Tahsin Zaman Jilan
-
Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation
Yuxuan Qiao, Dongqin Liu, Hongchang Yang, Wei Zhou, Songlin Hu
-
TTP: Test-Time Padding for Adversarial Detection and Robust Adaptation on Vision-Language Models
Zhiwei Li, Yitian Pang, Weining Wang, Zhenan Sun, Qi Li
-
Ripan Kumar Kundu, Istiak Ahmed, Khaza Anuarul Hoque
-
Pixel Seal: Adversarial-only training for invisible image and video watermarking
Tomáš Souček, Pierre Fernandez, Hady Elsahar, Sylvestre-Alvise Rebuffi, Valeriu Lacatusu, Tuan Tran, Tom Sander, Alexandre Mourachko
-
Hacking Neural Evaluation Metrics with Single Hub Text
Hiroyuki Deguchi, Katsuki Chousa, Yusuke Sakai
-
ContextLeak: Auditing Leakage in Private In-Context Learning Methods
Jacob Choi, Shuying Cao, Xingjian Dong, Wang Bill Zhu, Robin Jia, Sai Praneeth Karimireddy
-
Hao Li, Yubing Ren, Yanan Cao, Yingjie Li, Fang Fang, Shi Wang, Li Guo
-
Jiaheng Geng, Jiatong Du, Xinyu Zhang, Ye Li, Panqu Wang, Yanjun Huang
-
Pixel Super-Resolved Fluorescence Lifetime Imaging Using Deep Learning
Paloma Casteleiro Costa, Parnian Ghapandar Kashani, Xuhui Liu, Alexander Chen, Ary Portes, Julien Bec, Laura Marcu, Aydogan Ozcan
-
Adaptive Frequency Domain Alignment Network for Medical image segmentation
Zhanwei Li, Liang Li, Jiawan Zhang
-
DeContext as Defense: Safe Image Editing in Diffusion Transformers
Linghui Shen, Mingyue Cui, Xingyi Yang
-
Detecting Localized Deepfakes: How Well Do Synthetic Image Detectors Handle Inpainting?
Serafino Pandolfini, Lorenzo Pellegrini, Matteo Ferrara, Davide Maltoni
-
Dual-View Inference Attack: Machine Unlearning Amplifies Privacy Exposure
Lulu Xue, Shengshan Hu, Linqiang Qian, Peijin Guo, Yechao Zhang, Minghui Li, Yanjun Zhang, Dayong Ye, Leo Yu Zhang
-
Privacy Blur: Quantifying Privacy and Utility for Image Data Release
Saeed Mahloujifar, Narine Kokhlikyan, Chuan Guo, Kamalika Chaudhuri
-
In-Context Probing for Membership Inference in Fine-Tuned Language Models
Zhexi Lu, Hongliang Chi, Nathalie Baracaldo, Swanand Ravindra Kadhe, Yuseok Jeon, Lei Yu
-
A Systematic Study of Code Obfuscation Against LLM-based Vulnerability Detection
Xiao Li, Yue Li, Hao Wu, Yue Zhang, Yechao Zhang, Fengyuan Xu, Sheng Zhong
-
Empirical Evaluation of Structured Synthetic Data Privacy Metrics: Novel experimental framework
Milton Nicolás Plasencia Palacios, Alexander Boudewijn, Sebastiano Saccani, Andrea Filippo Ferraris, Diana Sofronieva, Giuseppe D'Acquisto, Filiberto Brozzetti, Daniele Panfilo, Luca Bortolussi
-
Security Risks of Agentic Vehicles: A Systematic Analysis of Cognitive and Cross-Layer Threats
Ali Eslami, Jiangbo Yu
-
MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval
Saksham Sahai Srivastava, Haoyu He
-
Istiak Ahmed, Ripan Kumar Kundu, Khaza Anuarul Hoque
-
Perturb Your Data: Paraphrase-Guided Training Data Watermarking
Pranav Shetty, Mirazul Haque, Petr Babkin, Zhiqiang Ma, Xiaomo Liu, Manuela Veloso
-
Privacy-Aware Sharing of Raw Spatial Sensor Data for Cooperative Perception
Bangya Liu, Chengpo Yan, Chenghao Jiang, Suman Banerjee, Akarsh Prabhakara
-
BitFlipScope: Scalable Fault Localization and Recovery for Bit-Flip Corruptions in LLMs
Muhammad Zeeshan Karamat, Sadman Saif, Christiana Chamon Garcia
-
SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification
Hongbo Wang, MaungMaung AprilPyone, Isao Echizen
-
The Meta-Prompting Protocol: Orchestrating LLMs via Adversarial Feedback Loops
Fanzhe Fu
-
Quantifying Return on Security Controls in LLM Systems
Richard Helder Moulton, Austin O'Brien, John D. Hastings
-
Xuanjun Zong, Zhiqi Shen, Lei Wang, Yunshi Lan, Chao Yang
-
Quantum Machine Learning for Cybersecurity: A Taxonomy and Future Directions
Siva Sai, Ishika Goyal, Shubham Sharma, Sri Harshita Manuri, Vinay Chamola, Rajkumar Buyya
-
Adversarial Versification in Portuguese as a Jailbreak Operator in LLMs
Joao Queiroz
-
How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code?
Hua Yang, Alejandro Velasco, Thanh Le-Cong, Md Nazmul Haque, Bowen Xu, Denys Poshyvanyk
-
BashArena: A Control Setting for Highly Privileged AI Agents
Adam Kaufman, James Lucassen, Tyler Tracy, Cody Rushing, Aryan Bhatt
-
Robust and Calibrated Detection of Authentic Multimedia Content
Sarim Hashmi, Abdelrahman Elsayed, Mohammed Talha Alam, Samuele Poppi, Nils Lukas
-
CLIP-FTI: Fine-Grained Face Template Inversion via CLIP-Driven Attribute Conditioning
Longchen Dai, Zixuan Shen, Zhiheng Zhou, Peipeng Yu, Zhihua Xia
-
Mukur Gupta, Niharika Gupta, Saifur Rahman, Shantanu Pal, Chandan Karmakar
-
An Efficient Gradient-Based Inference Attack for Federated Learning
Pablo Montaña-Fernández, Ines Ortega-Fernandez
-
Bits for Privacy: Evaluating Post-Training Quantization via Membership Inference
Chenxiang Zhang, Tongxi Qu, Zhong Li, Tian Zhang, Jun Pang, Sjouke Mauw
-
Xiangrui Xu, Zhize Li, Yufei Han, Bin Wang, Jiqiang Liu, Wei Wang
-
Adrián Detavernier, Jasper De Bock
-
Time-Varying Audio Effect Modeling by End-to-End Adversarial Training
Yann Bourdin, Pierrick Legrand, Fanny Roche
-
Ratang Sedimo, Ivoline C. Ngong, Jami Lashua, Joseph P. Near
-
Vahideh Zolfaghari
-
The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs
Tejas Anvekar, Fenil Bardoliya, Pavan K. Turaga, Chitta Baral, Vivek Gupta
-
MCR-VQGAN: A Scalable and Cost-Effective Tau PET Synthesis Approach for Alzheimer's Disease Imaging
Jin Young Kim, Jeremy Hudson, Jeongchul Kim, Qing Lyu, Christopher T. Whitlow
-
Unveiling the Attribute Misbinding Threat in Identity-Preserving Models
Junming Fu, Jishen Zeng, Yi Jiang, Peiyu Zhuang, Baoying Chen, Siyu Lu, Jianquan Yang
-
The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems
Debu Sinha
-
Seeing Beyond the Scene: Analyzing and Mitigating Background Bias in Action Recognition
Ellie Zhou, Jihoon Chung, Olga Russakovsky
-
ArcGen: Generalizing Neural Backdoor Detection Across Diverse Architectures
Zhonghao Yang, Cheng Luo, Daojing He, Yiming Li, Yu Li
-
Jiesong Lian, Ruizhe Zhong, Zixiang Zhou, Xiaoyue Mi, Yixue Hao, Yuan Zhou, Qinglin Lu, Long Hu, Junchi Yan
-
IntentMiner: Intent Inversion Attack via Tool Call Analysis in the Model Context Protocol
Yunhao Yao, Zhiqiang Wang, Haoran Cheng, Yihang Cheng, Haohua Du, Xiang-Yang Li
-
Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity
Shuai Dong, Jie Zhang, Guoying Zhao, Shiguang Shan, Xilin Chen
-
Dual Attention Guided Defense Against Malicious Edits
Jie Zhang, Shuai Dong, Shiguang Shan, Xilin Chen
-
Towards Transferable Defense Against Malicious Image Edits
Jie Zhang, Shuai Dong, Shiguang Shan, Xilin Chen
-
Xingfu Zhou, Pengfei Wang
-
Erasing CLIP Memories: Non-Destructive, Data-Free Zero-Shot class Unlearning in CLIP Models
Ashish Mishra, Tarun Kumar, Gyanaranjan Nayak, Arpit Shah, Suparna Bhattacharya, Martin Foltin
-
CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World
Shuxin Zhao, Bo Lang, Nan Xiao, Yilang Zhang
-
Mimicking Human Visual Development for Learning Robust Image Representations
Ankita Raj, Kaashika Prajaapat, Tapan Kumar Gandhi, Chetan Arora
-
LCMem: A Universal Model for Robust Image Memorization Detection
Mischa Dombrowski, Felix Nützel, Bernhard Kainz
-
Yiheng Huang, Junhong Chen, Anqi Ning, Zhanhong Liang, Nick Michiels, Luc Claesen, Wenyin Liu
-
On Improving Deep Active Learning with Formal Verification
Jonathan Spiegelman, Guy Amir, Guy Katz
-
Optimizing the Adversarial Perturbation with a Momentum-based Adaptive Matrix
Wei Tao, Sheng Long, Xin Liu, Wei Li, Qing Tao
-
Black-Box Auditing of Quantum Model: Lifted Differential Privacy with Quantum Canaries
Baobao Song, Shiva Raj Pokhrel, Athanasios V. Vasilakos, Tianqing Zhu, Gang Li
-
PerProb: Indirectly Evaluating Memorization in Large Language Models
Yihan Liao, Jacky Keung, Xiaoxue Ma, Jingyu Zhang, Yicheng Sun
-
Unai Laskurain, Aitor Aguirre-Ortuzar, Urko Zurutuza
-
Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks
Viet K. Nguyen, Mohammad I. Husain
-
ComMark: Covert and Robust Black-Box Model Watermarking with Compressed Samples
Yunfei Yang, Xiaojun Chen, Zhendong Zhao, Yu Zhou, Xiaoyan Gu, Juan Cao
-
Cybercrime and Computer Forensics in Epoch of Artificial Intelligence in India
Sahibpreet Singh, Shikha Dhiman
-
CTIGuardian: A Few-Shot Framework for Mitigating Privacy Leakage in Fine-Tuned LLMs
Shashie Dilhara Batan Arachchige, Benjamin Zi Hao Zhao, Hassan Jameel Asghar, Dinusha Vatsalan, Dali Kaafar
-
Cisco Integrated AI Security and Safety Framework Report
Amy Chang, Tiffany Saade, Sanket Mendapara, Adam Swanda, Ankit Garg
-
Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning
Amin Jalal Aghdasian, Farzaneh Abdollahi, Ali Kamali Iglie
-
Calibrating Uncertainty for Zero-Shot Adversarial CLIP
Wenjing Lu, Zerui Tao, Dongping Zhang, Yuning Qiu, Yang Yang, Qibin Zhao
-
SSAS: Cross-subject EEG-based Emotion Recognition through Source Selection with Adversarial Strategy
Yici Liu, Qi Wei Oung, Hoi Leong Lee
-
Behavior-Aware and Generalizable Defense Against Black-Box Adversarial Attacks for ML-Based IDS
Sabrine Ennaji, Elhadj Benkhelifa, Luigi Vincenzo Mancini
-
Leonard Bereska, Zoe Tzifa-Kratira, Reza Samavi, Efstratios Gavves
-
Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation
Richard J. Young
-
On the Effectiveness of Membership Inference in Targeted Data Extraction from Large Language Models
Ali Al Sahili, Ali Chehab, Razane Tajeddine
-
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?
Jiaqi Wang, Weijia Wu, Yi Zhan, Rui Zhao, Ming Hu, James Cheng, Wei Liu, Philip Torr, Kevin Qinghong Lin
-
Learning to Generate Cross-Task Unexploitable Examples
Haoxuan Qu, Qiuchi Xiang, Yujun Cai, Yirui Wu, Majid Mirmehdi, Hossein Rahmani, Jun Liu
-
Test-Time Modification: Inverse Domain Transformation for Robust Perception
Arpit Jadon, Joshua Niemeijer, Yuki M. Asano
-
Evaluating Adversarial Attacks on Federated Learning for Temperature Forecasting
Karina Chichifoi, Fabio Merizzi, Michele Colajanni
-
Dual-Phase Federated Deep Unlearning via Weight-Aware Rollback and Reconstruction
Changjun Zhou, Jintao Zheng, Leyou Yang, Pengfei Wang
-
Chethana Prasad Kabgere, Shylaja S S
-
Async Control: Stress-testing Asynchronous Control Measures for LLM Agents
Asa Cooper Stickland, Jan Michelfeit, Arathi Mani, Charlie Griffin, Ollie Matthews, Tomek Korbak, Rogan Inglis, Oliver Makins, Alan Cooney
-
Less Is More: Sparse and Cooperative Perturbation for Point Cloud Attacks
Keke Tang, Tianyu Hao, Xiaofei Wang, Weilong Peng, Denghui Zhang, Peican Zhu, Zhihong Tian
-
MURIM: Multidimensional Reputation-based Incentive Mechanism for Federated Learning
Sindhuja Madabushi, Dawood Wasif, Jin-Hee Cho
-
The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces
Subramanyam Sahoo, Jared Junkin
-
Topologically-Stabilized Graph Neural Networks: Empirical Robustness Across Domains
Jelena Losic
-
Stability-Drift Early Warning for Cyber-Physical Systems Under Degradation Attacks
Daniyal Ganiuly, Nurzhau Bolatbek, Assel Smaiyl
-
Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT)
Akhil Sharma, Shaikh Yaser Arafat, Jai Kumar Sharma, Ken Huang
-
PHANTOM: PHysical ANamorphic Threats Obstructing Connected Vehicle Mobility
Md Nahid Hasan Shuvo, Moinul Hossain
-
Saad Alqithami
-
Detecting Prompt Injection Attacks Against Application Using Classifiers
Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid, Md. Abrar Faiaz Khan, Md. Omar Faruk, Yaseen Nur
-
PRIVEE: Privacy-Preserving Vertical Federated Learning Against Feature Inference Attacks
Sindhuja Madabushi, Ahmad Faraz Khan, Haider Ali, Ananthram Swami, Rui Ning, Hongyi Wu, Jin-Hee Cho
-
StegaVAR: Privacy-Preserving Video Action Recognition via Steganographic Domain Analysis
Lixin Chen, Chaomeng Chen, Jiale Zhou, Zhijian Wu, Xun Lin
-
GradID: Adversarial Detection via Intrinsic Dimensionality of Gradients
Mohammad Mahdi Razmjoo, Mohammad Mahdi Sharifian, Saeed Bagheri Shouraki
-
Animesh Mishra
-
Iterative Sampling Methods for Sinkhorn Distributionally Robust Optimization
Jie Wang
-
Ahmed Ryan, Junaid Mansur Ifti, Md Erfan, Akond Ashfaque Ur Rahman, Md Rayhanur Rahman
-
The Laminar Flow Hypothesis: Detecting Jailbreaks via Semantic Turbulence in Large Language Models
Md. Hasib Ur Rahman
-
One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs
Yixin Tan, Zhe Yu, Jun Sakuma
-
Samruddhi Baviskar
-
Auto-Tuning Safety Guardrails for Black-Box Large Language Models
Perry Abdulkadir
-
RAMBO: Reliability Analysis for Mamba through Bit-flip attack Optimization
Sanjay Das, Swastik Bhattacharya, Shamik Kundu, Arnab Raha, Souvik Kundu, Kanad Basu
-
Feeling the Strength but Not the Source: Partial Introspection in LLMs
Ely Hahami, Lavik Jain, Ishaan Sinha
-
Dynamic Homophily with Imperfect Recall: Modeling Resilience in Adversarial Networks
Saad Alqithami
-
Eventually LIL Regret: Almost Sure $\ln\ln T$ Regret for a sub-Gaussian Mixture on Unbounded Data
Shubhada Agrawal, Aaditya Ramdas
-
Hua Ma, Ruoxi Sun, Minhui Xue, Xingliang Yuan, Carsten Rudolph, Surya Nepal, Ling Liu
-
Hellinger loss function for Generative Adversarial Networks
Giovanni Saraceno, Anand N. Vidyashankar, Claudio Agostinelli
-
Minfeng Qi, Qin Wang, Ruiqiang Li, Tianqing Zhu, Shiping Chen
-
Sim2Real Reinforcement Learning for Soccer skills
Jonathan Spraggett
-
Towards Privacy-Preserving Code Generation: Differentially Private Code Language Models
Melih Catal, Pooja Rani, Harald C. Gall
-
Björn Deiseroth, Max Henning Höth, Kristian Kersting, Letitia Parcalabescu
-
Smudged Fingerprints: A Systematic Evaluation of the Robustness of AI Image Fingerprints
Kai Yao, Marc Juarez
-
Super Suffixes: Bypassing Text Generation Alignment and Guard Models Simultaneously
Andrew Adiletta, Kathryn Adiletta, Kemal Derya, Berk Sunar
-
CLINIC: Evaluating Multilingual Trustworthiness in Language Models for Healthcare
Akash Ghosh, Srivarshinee Sridhar, Raghav Kaushik Ravi, Muhsin Muhsin, Sriparna Saha, Chirag Agarwal
-
Beyond Memorization: Gradient Projection Enables Selective Learning in Diffusion Models
Divya Kothandaraman, Jaclyn Pytlarz
-
CAT: Can Trust be Predicted with Context-Awareness in Dynamic Heterogeneous Networks?
Jie Wang, Zheng Yan, Jiahe Lan, Xuyan Li, Elisa Bertino
-
Attacking and Securing Community Detection: A Game-Theoretic Framework
Yifan Niu, Aochuan Chen, Tingyang Xu, Jia Li
-
SpectralKrum: A Spectral-Geometric Defense Against Byzantine Attacks in Federated Learning
Aditya Tripathi, Karan Sharma, Rahul Mishra, Tapas Kumar Maiti
-
Peichun Hua, Hao Li, Shanghao Shi, Zhiyuan Yu, Ning Zhang
-
Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors
Max McGuinness, Alex Serrano, Luke Bailey, Scott Emmons
-
Kaichuang Zhang, Wei Yin, Jinghao Yang, Ping Xu
-
CLOAK: Contrastive Guidance for Latent Diffusion-Based Data Obfuscation
Xin Yang, Omid Ardakanian
-
Adversarial Attacks Against Deep Learning-Based Radio Frequency Fingerprint Identification
Jie Ma, Junqing Zhang, Guanxiong Shen, Alan Marshall, Chip-Hong Chang
-
Junling Fan, George Rushevich, Giorgio Rusconi, Mengdi Zhu, Reiner Dizon-Paradis, Domenic Forte
-
Persistent Backdoor Attacks under Continual Fine-Tuning of LLMs
Jing Cui, Yufei Han, Jianbin Jiao, Junge Zhang
-
Factor(U,T): Controlling Untrusted AI by Monitoring their Plans
Edward Lue Chee Lip, Anthony Channg, Diana Kim, Aaron Sandoval, Kevin Zhu
-
PHANTOM: Progressive High-fidelity Adversarial Network for Threat Object Modeling
Jamal Al-Karaki, Muhammad Al-Zafar Khan, Rand Derar Mohammad Al Athamneh
-
Data-Chain Backdoor: Do You Trust Diffusion Models as Generative Data Supplier?
Junchi Lu, Xinke Li, Yuheng Liu, Qi Alfred Chen
-
Wenhan Wu, Zhili He, Huanghuang Liang, Yili Gong, Jiawei Jiang, Chuang Hu, Dazhao Cheng
-
Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention
Yang Yu, Zhuangzhuang Chen, Siqi Wang, Lanqing Li, Xiaomeng Li
-
Targeted Data Protection for Diffusion Model by Matching Training Trajectory
Hojun Lee, Mijin Koo, Yeji Song, Nojun Kwak
-
Devanshu Sahoo, Manish Prasad, Vasudev Majhi, Jahnvi Singh, Vinay Chamola, Yash Sinha, Murari Mandal, Dhruv Kumar
-
CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models
Tong Zhang, Carlos Hinojosa, Bernard Ghanem
-
FLARE: A Wireless Side-Channel Fingerprinting Attack on Federated Learning
Md Nahid Hasan Shuvo, Moinul Hossain, Anik Mallik, Jeffrey Twigg, Fikadu Dagefu
-
A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale
Vinoth Punniyamoorthy, Ashok Gadi Parthi, Mayilsamy Palanigounder, Ravi Kiran Kodali, Bikesh Kumar, Kabilan Kannan
-
Yash Srivastava, Shalin Jain, Sneha Awathare, Nitin Awathare
-
The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks
Zhou Feng, Jiahao Chen, Chunyi Zhou, Yuwen Pu, Tianyu Du, Jinbao Li, Jianhai Chen, Shouling Ji
-
How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation
Devanshu Sahoo, Vasudev Majhi, Arjun Neekhra, Yash Sinha, Murari Mandal, Dhruv Kumar
-
UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning
Jiaxi Wu, Tiantian Zhang, Yuxing Wang, Yongzhe Chang, Xueqian Wang
-
Empirical evaluation of the Frank-Wolfe methods for constructing white-box adversarial attacks
Kristina Korotkova, Aleksandr Katrutsa
-
Agniva Maiti, Prajwal Panth, Suresh Chandra Satapathy
-
Watermarks for Language Models via Probabilistic Automata
Yangkun Wang, Jingbo Shang
-
Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation
Hongsin Lee, Hye Won Chung
-
Authority Backdoor: A Certifiable Backdoor Mechanism for Authoring DNNs
Han Yang, Shaofeng Li, Tian Dong, Xiangyu Xu, Guangchi Liu, Zhen Ling
-
Neha, Tarunpreet Bhatia
-
Virtual camera detection: Catching video injection attacks in remote biometric systems
Daniyar Kurmankhojayev, Andrei Shadrikov, Dmitrii Gordin, Mikhail Shkorin, Danijar Gabdullin, Aigerim Kambetbayeva, Kanat Kuatov
-
PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents
Yuqun Zhang, Yuxuan Zhao, Sijia Chen
-
FBA$^2$D: Frequency-based Black-box Attack for AI-generated Image Detection
Xiaojing Chen, Dan Li, Lijun Peng, Jun Yan, Zhiqing Guo, Junyang Chen, Xiao Lan, Zhongjie Ba, Yunfeng Diao
-
Privacy-Preserving Computer Vision for Industry: Three Case Studies in Human-Centric Manufacturing
Sander De Coninck, Emilio Gamba, Bart Van Doninck, Abdellatif Bey-Temsamani, Sam Leroux, Pieter Simoens
-
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
Jan Betley, Jorio Cocola, Dylan Feng, James Chua, Andy Arditi, Anna Sztyber-Betley, Owain Evans
-
MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI
Fengli Wu, Vaidehi Patil, Jaehong Yoon, Yue Zhang, Mohit Bansal
-
FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning
Khurram Khalil, Khaza Anuarul Hoque
-
Unconsciously Forget: Mitigating Memorization Without Knowing What is being Memorized
Er Jin, Yang Zhang, Yongli Mou, Yanfei Dong, Stefan Decker, Kenji Kawaguchi, Johannes Stegmaier
-
A Distributed Framework for Privacy-Enhanced Vision Transformers on the Edge
Zihao Ding, Mufeng Zhu, Zhongze Tang, Sheng Wei, Yao Liu
-
Goal inference with Rao-Blackwellized Particle Filters
Yixuan Wang, Dan P. Guralnik, Warren E. Dixon
-
Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs
Sohely Jahan, Ruimin Sun
-
Membership and Dataset Inference Attacks on Large Audio Generative Models
Jakub Proboszcz, Paweł Kochanski, Karol Korszun, Donato Crisostomi, Giorgio Strano, Emanuele Rodolà, Kamil Deja, Jan Dubinski
-
Robust and Sparse Estimation of Unbounded Density Ratio under Heavy Contamination
Ryosuke Nagumo, Hironori Fujisawa
-
Weiyi He, Yue Xing
-
Estimation of Stochastic Optimal Transport Maps
Sloan Nietert, Ziv Goldfeld
-
ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data
Ruiqi Wang, Yuqi Jia, Neil Zhenqiang Gong
-
Reference Recommendation based Membership Inference Attack against Hybrid-based Recommender Systems
Xiaoxiao Chi, Xuyun Zhang, Yan Wang, Hongsheng Hu, Wanchun Dou
-
ByteShield: Adversarially Robust End-to-End Malware Detection through Byte Masking
Daniel Gibert, Felip Manyà
-
Robust AI Security and Alignment: A Sisyphean Endeavor?
Apostol Vassilev
-
Zhongjie Jiang
-
Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning
Lama Alssum, Hani Itani, Hasan Abed Al Kader Hammoud, Philip Torr, Adel Bibi, Bernard Ghanem
-
TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0
Jinyu Chen, Long Shi, Taotao Wang, Jiaheng Wang, Wei Zhang
-
LLM-PEA: Leveraging Large Language Models Against Phishing Email Attacks
Najmul Hassan, Prashanth BusiReddyGari, Haitao Zhao, Yihao Ren, Jinsheng Xu, Shaohu Zhang
-
SCOUT: A Defense Against Data Poisoning Attacks in Fine-Tuned Language Models
Mohamed Afane, Abhishek Satyam, Ke Chen, Tao Li, Junaid Farooq, Juntao Chen
-
Futa Waseda, Shojiro Yamabe, Daiki Shiono, Kento Sasaki, Tsubasa Takahashi
-
Yong-Woon Kim
-
Yiming Lu
-
Jinghao Wang, Ping Zhang, Carter Yagemann
-
Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem
Shiva Gaire, Srijan Gyawali, Saroj Mishra, Suman Niroula, Dilip Thakur, Umesh Yadav
-
Interpreting Structured Perturbations in Image Protection Methods for Diffusion Models
Michael R. Martin, Garrick Chan, Kwan-Liu Ma
-
Waleed Razzaq, Yun-Bo Zhao
-
Jiaming Zhang, Che Wang, Yang Cao, Longtao Huang, Wei Yang Bryan Lim
-
A Novel Wasserstein Quaternion Generative Adversarial Network for Color Image Generation
Zhigang Jia, Duan Wang, Hengkai Wang, Yajun Xie, Meixiang Zhao, Xiaoyu Zhao
-
Yi Liu, Weixiang Han, Chengjun Cai, Xingliang Yuan, Cong Wang
-
Differentially Private Synthetic Data Generation Using Context-Aware GANs
Anantaa Kotal, Anupam Joshi
-
Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents
Xiang Chen, Yuling Shi, Qizhen Lan, Yuchao Qiu, Xiaodong Gu
-
When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation
Joshua Ward, Bochao Gu, Chi-Hua Wang, Guang Cheng
-
Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation
Sampriti Soor, Suklav Ghosh, Arijit Sur
-
Sampriti Soor, Suklav Ghosh, Arijit Sur
-
Keito Inoshita
-
Improving the Sensitivity of Backdoor Detectors via Class Subspace Orthogonalization
Guangmingmei Yang, David J. Miller, George Kesidis
-
Robust Agents in Open-Ended Worlds
Mikayel Samvelyan
-
Fully Decentralized Certified Unlearning
Hithem Lamri, Michail Maniatakos
-
Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning
Junnan Qiu, Jie Li
-
Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search
Manos Plitsis, Giorgos Bouritsas, Vassilis Katsouros, Yannis Panagakis
-
Forecasting Fails: Unveiling Evasion Attacks in Weather Prediction Models
Huzaifa Arif, Pin-Yu Chen, Alex Gittens, James Diffenderfer, Bhavya Kailkhura
-
Worst-case generation via minimax optimization in Wasserstein space
Xiuyuan Cheng, Yao Xie, Linglingzhi Zhu, Yunqin Zhu
-
Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks
Thai Duong Nguyen, Ngoc-Tan Nguyen, Thanh-Dao Nguyen, Nguyen Van Huynh, Dinh-Hieu Tran, Symeon Chatzinotas
-
Secure and Privacy-Preserving Federated Learning for Next-Generation Underground Mine Safety
Mohamed Elmahallawy, Sanjay Madria, Samuel Frimpong
-
MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks
Tailun Chen, Yu He, Yan Wang, Shuo Shao, Haolun Zheng, Zhihao Liu, Jinfeng Li, Yuefeng Chen, Zhixuan Chu, Zhan Qin
-
Exposing and Defending Membership Leakage in Vulnerability Prediction Models
Yihan Liao, Jacky Keung, Xiaoxue Ma, Jingyu Zhang, Yicheng Sun
-
Developing a Strong CPS Defender: An Evolutionary Approach
Qingyuan Hu, Christopher M. Poskitt, Jun Sun, Yuqi Chen
-
Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs
Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, Wenyuan Xu
-
WOLF: Werewolf-based Observations for LLM Deception and Falsehoods
Mrinal Agarwal, Saad Rana, Theo Sundoro, Hermela Berhe, Spencer Kim, Vasu Sharma, Sean O'Brien, Kevin Zhu
-
Natural Geometry of Robust Data Attribution: From Convex Models to Deep Networks
Shihao Li, Jiachen Li, Dongmei Chen
-
Anirudh Nakra, Nayeeb Rashid, Chau-Wai Wong, Min Wu
-
ZK-APEX: Zero-Knowledge Approximate Personalized Unlearning with Executable Proofs
Mohammad M Maheri, Sunil Cotterill, Alex Davidson, Hamed Haddadi
-
How a Bit Becomes a Story: Semantic Steering via Differentiable Fault Injection
Zafaryab Haider, Md Hafizur Rahman, Shane Moeykens, Vijay Devabhaktuni, Prabuddha Chakraborty
-
Hybrid Attribution Priors for Explainable and Robust Model Training
Zhuoran Zhang, Feng Zhang, Shangyuan Li, Yang Shi, Yuanxing Zhang, Wei Chen, Tengjiao Wang, Kam-Fai Wong
-
HarmTransform: Transforming Explicit Harmful Queries into Stealthy via Multi-Agent Debate
Shenzhe Zhu
-
Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach
Hua Yang, Alejandro Velasco, Sen Fang, Bowen Xu, Denys Poshyvanyk
-
Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
Qiwei Tian, Chenhao Lin, Zhengyu Zhao, Chao Shen
-
CAMO: Causality-Guided Adversarial Multimodal Domain Generalization for Crisis Classification
Pingchuan Ma, Chengshuai Zhao, Bohan Jiang, Saketh Vishnubhatla, Ujun Jeong, Alimohammad Beigi, Adrienne Raglin, Huan Liu
-
AgentCrypt: Advancing Privacy and (Secure) Computation in AI Agent Collaboration
Harish Karthikeyan, Yue Guo, Leo de Castro, Antigoni Polychroniadou, Leo Ardon, Udari Madhushani Sehwag, Sumitra Ganesh, Manuela Veloso
-
Optimization-Guided Diffusion for Interactive Scene Generation
Shihao Li, Naisheng Ye, Tianyu Li, Kashyap Chitta, Tuo An, Peng Su, Boyang Wang, Haiou Liu, Chen Lv, Hongyang Li
-
Xiqiao Xiong, Ouxiang Li, Zhuo Liu, Moxin Li, Wentao Shi, Fuli Feng, Xiangnan He
-
Auditing Games for Sandbagging
Jordan Taylor, Sid Black, Dillon Bowen, Thomas Read, Satvik Golechha, Alex Zelenka-Martin, Oliver Makins, Connor Kissane, Kola Ayonrinde, Jacob Merizian, Samuel Marks, Chris Cundy, Joseph Bloom
-
ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking
Yunzhe Li, Jianan Wang, Hongzi Zhu, James Lin, Shan Chang, Minyi Guo
-
Towards Robust Protective Perturbation against DeepFake Face Swapping
Hengyang Yao, Lin Li, Ke Sun, Jianing Qiu, Huiping Chen
-
When normalization hallucinates: unseen risks in AI-powered whole slide image processing
Karel Moens, Matthew B. Blaschko, Tinne Tuytelaars, Bart Diricx, Jonas De Vylder, Mustafa Yousif
-
Forget and Explain: Transparent Verification of GNN Unlearning
Imran Ahsan (1), Hyunwook Yu (2), Jinsung Kim (2), Mucheol Kim (2) ((1) Department of Smart Cities, Chung-Ang University, (2) Department of Computer Science and Engineering, Chung-Ang University)
-
Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models
Kassoum Sanogo, Renzo Ardiccioni
-
Richard Young
-
Fenghua Weng, Chaochao Lu, Xia Hu, Wenqi Shao, Wenjie Wang
-
Siyuan Xu, Yibing Liu, Peilin Chen, Yung-Hui Li, Shiqi Wang, Sam Kwong
-
Ziming Hong, Tianyu Huang, Runnan Chen, Shanshan Ye, Mingming Gong, Bo Han, Tongliang Liu
-
How Far are Modern Trackers from UAV-Anti-UAV? A Million-Scale Benchmark and New Baseline
Chunhui Zhang, Li Liu, Zhipeng Zhang, Yong Wang, Hao Wen, Xi Zhou, Shiming Ge, Yanfeng Wang
-
Chih-Chung Hsu, Shao-Ning Chen, Chia-Ming Lee, Yi-Fang Wang, Yi-Shiuan Chou
-
Zhibo Liang, Tianze Hu, Zaiye Chen, Mingjie Tang
-
GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering
Jehyeok Yeon, Federico Cinus, Yifan Wu, Luca Luceri
-
Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods
Panagiota Kiourti, Anu Singh, Preeti Duraipandian, Weichao Zhou, Wenchao Li
-
RDSplat: Robust Watermarking Against Diffusion Editing for 3D Gaussian Splatting
Longjie Zhao, Ziming Hong, Zhenyang Ren, Runnan Chen, Mingming Gong, Tongliang Liu
-
SoK: Trust-Authorization Mismatch in LLM Agent Interactions
Guanquan Shi, Haohua Du, Zhiqiang Wang, Xiaoyu Liang, Weiwenpei Liu, Song Bian, Zhenyu Guan
-
FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinations
Mayank Ravishankara
-
George Mikros
-
Look Twice before You Leap: A Rational Agent Framework for Localized Adversarial Anonymization
Donghang Duan, Xu Zheng, Yuefeng He, Chong Mu, Leyi Cai, Lizong Zhang
-
MATEX: A Multi-Agent Framework for Explaining Ethereum Transactions
Zifan Peng
-
RunawayEvil: Jailbreaking the Image-to-Video Generative Models
Songping Wang, Rufan Qian, Yueming Lyu, Qinglong Liu, Linzhuang Zou, Jie Qin, Songhua Liu, Caifeng Shan
-
Toward Reliable Machine Unlearning: Theory, Algorithms, and Evaluation
Ali Ebrahimpour-Boroojeny
-
Metaphor-based Jailbreaking Attacks on Text-to-Image Models
Chenyu Zhang, Yiwen Ma, Lanjun Wang, Wenhui Li, Yi Tu, An-An Liu
-
Protecting Bystander Privacy via Selective Hearing in Audio LLMs
Xiao Zhan, Guangzhi Sun, Jose Such, Phil Woodland
-
Delete and Retain: Efficient Unlearning for Document Classification
Aadya Goel, Mayuri Sridhar
-
Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation
Xining Song, Zhihua Wei, Rui Wang, Haixiao Hu, Yanxiang Chen, Meng Han
-
Web Technologies Security in the AI Era: A Survey of CDN-Enhanced Defenses
Mehrab Hosain, Sabbir Alom Shuvo, Matthew Ogbe, Md Shah Jalal Mazumder, Yead Rahman, Md Azizul Hakim, Anukul Pandey
-
Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks
Saeid Jamshidi, Kawser Wazed Nafi, Arghavan Moradi Dakhel, Negar Shahabi, Foutse Khomh, Naser Ezzati-Jivan
-
Spoofing-aware Prompt Learning for Unified Physical-Digital Facial Attack Detection
Jiabao Guo, Yadian Wang, Hui Ma, Yuhao Fu, Ju Jia, Hui Liu, Shengeng Tang, Lechao Cheng, Yunfeng Diao, Ajian Liu
-
AGORA: Adversarial Generation Of Real-time Animatable 3D Gaussian Head Avatars
Ramazan Fazylov, Sergey Zagoruyko, Aleksandr Parkin, Stamatis Lefkimmiatis, Ivan Laptev
-
OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation
Xiaojun Jia, Jie Liao, Qi Guo, Teng Ma, Simeng Qin, Ranjie Duan, Tianlin Li, Yihao Huang, Zhitao Zeng, Dongxian Wu, Yiming Li, Wenqi Ren, Xiaochun Cao, Yang Liu
-
Quantization Blindspots: How Model Compression Breaks Backdoor Defenses
Rohan Pandey, Eric Ye
-
Privacy Loss of Noise Perturbation via Concentration Analysis of A Product Measure
Shuainan Liu, Tianxi Ji, Zhongshuo Fang, Lu Wei, Pan Li
-
Mitigating Self-Preference by Authorship Obfuscation
Taslim Mahbub, Shi Feng
-
Hua Wang, Jinghao Lu, Fan Zhang
-
Matching Ranks Over Probability Yields Truly Deep Safety Alignment
Jason Vega, Gagandeep Singh
-
Sadat Shahriar, Navid Ayoobi, Arjun Mukherjee, Mostafa Musharrat, Sai Vishnu Vamsi
-
Experts-Guided Unbalanced Optimal Transport for ISP Learning from Unpaired and/or Paired Data
Georgy Perevozchikov, Nancy Mehta, Egor Ershov, Radu Timofte
-
VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack
Shiji Zhao, Shukun Xiong, Yao Huang, Yan Jin, Zhenyu Wu, Jiyang Guan, Ranjie Duan, Jialing Tao, Hui Xue, Xingxing Wei
-
Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation
Fabian Konstantinidis, Moritz Sackmann, Ulrich Hofmann, Christoph Stiller
-
Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models
Mahesh Kumar Nandwana, Youngwan Lim, Joseph Liu, Alex Yang, Varun Notibala, Nishchaie Khanna
-
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs
Igor Shilov, Alex Cloud, Aryo Pradipta Gema, Jacob Goldman-Wetzler, Nina Panickssery, Henry Sleight, Erik Jones, Cem Anil
-
LDLT $\mathcal{L}$-Lipschitz Network: Generalized Deep End-To-End Lipschitz Network Construction
Marius F.R. Juston, Ramavarapu S. Sreenivas, Dustin Nottage, Ahmet Soylemezoglu
-
On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
Neil G. Marchant, Andrew C. Cullen, Feng Liu, Sarah M. Erfani
-
PrivCode: When Code Generation Meets Differential Privacy
Zheng Liu, Chen Gong, Terry Yue Zhuo, Kecen Li, Weichen Yu, Matt Fredrikson, Tianhao Wang
-
TeleAI-Safety: A comprehensive LLM jailbreaking benchmark towards attacks, defenses, and evaluations
Xiuyuan Chen, Jian Zhao, Yuxiang He, Yuan Xun, Xinwei Liu, Yanshu Li, Huilin Zhou, Wei Cai, Ziyan Shi, Yuchen Yuan, Tianle Zhang, Chi Zhang, Xuelong Li
-
Ana-Maria Cretu, Klim Kireev, Amro Abdalla, Wisdom Obinna, Raphael Meier, Sarah Adel Bargal, Elissa M. Redmiles, Carmela Troncoso
-
Weikai Lu, Ziqian Zeng, Kehua Zhang, Haoran Li, Huiping Zhuang, Ruidong Wang, Cen Chen, Hao Peng
-
Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation
Ju-Young Kim, Ji-Hong Park, Myeongjun Kim, Gun-Woo Kim
-
Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models
Fan Yang
-
Auto-SPT: Automating Semantic Preserving Transformations for Code
Ashish Hooda, Mihai Christodorescu, Chuangang Ren, Aaron Wilson, Kassem Fawaz, Somesh Jha
-
When Privacy Isn't Synthetic: Hidden Data Leakage in Generative AI Models
S.M. Mustaqim, Anantaa Kotal, Paul H. Yi
-
Qilin Zhou, Zhengyuan Wei, Haipeng Wang, Zhuo Wang, W.K. Chan
-
Sheng Liu, Panos Papadimitratos
-
SPOOF: Simple Pixel Operations for Out-of-Distribution Fooling
Ankit Gupta, Christoph Adami, Emily Dolson (Michigan State University)
-
M Zeeshan, Saud Satti
-
Adversarial Limits of Quantum Certification: When Eve Defeats Detection
Davut Emre Tasar
-
RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS
Cong Wang, Changfeng Gao, Yang Xiang, Zhihao Du, Keyu An, Han Zhao, Qian Chen, Xiangang Li, Yingming Gao, Ya Li
-
Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
Jinbo Liu, Defu Cao, Yifei Wei, Tianyao Su, Yuan Liang, Yushun Dong, Yue Zhao, Xiyang Hu
-
SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security
Wei Zhao, Zhe Li, Jun Sun
-
The Universal Weight Subspace Hypothesis
Prakhar Kaushik, Shravan Chaudhari, Ankit Vaidya, Rama Chellappa, Alan Yuille
-
L. D. M. S. Sai Teja, N. Siva Gopala Krishna, Ufaq Khan, Muhammad Haris Khan, Partha Pakray, Atul Mishra
-
Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering
Marco Pintore, Maura Pintor, Dimosthenis Karatzas, Battista Biggio
-
Sheng Hang, Chaoxiang He, Hongsheng Hu, Hanqing Hu, Bin Benjamin Zhu, Shi-Feng Sun, Dawu Gu, Shuo Wang
-
Guanchen Du, Jianlong Xu, Wei Wei
-
Physics-Guided Deepfake Detection for Voice Authentication Systems
Alireza Mohammadi, Keshav Sood, Dhananjay Thiruvady, Asef Nazari
-
Wei Chee Yew, Hailun Xu, Sanjay Saha, Xiaotian Fan, Hiok Hian Ong, David Yuchen Wang, Kanchan Sarkar, Zhenheng Yang, Danhui Guan
-
SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting
Hanxiu Zhang, Yue Zheng
-
Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs
Tengyun Ma, Jiaqi Yao, Daojing He, Shihao Peng, Yu Li, Shaohui Liu, Zhuotao Tian
-
Out-of-the-box: Black-box Causal Attacks on Object Detectors
Melane Navaratnarajah, David A. Kelly, Hana Chockler
-
In-Context Representation Hijacking
Itay Yona, Amir Sarid, Michael Karasik, Yossi Gandelsman
-
TARA Test-by-Adaptive-Ranks for Quantum Anomaly Detection with Conformal Prediction Guarantees
Davut Emre Tasar, Ceren Ocal Tasar
-
Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits
Robert Dilworth
-
Zhigang Yang, Yuan Liu, Jiawei Zhang, Puning Zhang, Xinqiang Ma
-
Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning
Ge-Peng Ji, Jingyi Liu, Deng-Ping Fan, Nick Barnes
-
Towards Irreversible Machine Unlearning for Diffusion Models
Xun Yuan, Zilong Zhao, Jiayu Li, Aryan Pasikhani, Prosanta Gope, Biplab Sikdar
-
Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models
Haidong Kang, Wei Wu, Hanling Wang
-
Training-Free Policy Violation Detection via Activation-Space Whitening in LLMs
Oren Rachmil, Roy Betser, Itay Gershon, Omer Hofman, Nitay Yakoby, Yuval Meron, Idan Yankelev, Asaf Shabtai, Yuval Elovici, Roman Vainshtein
-
Immunity memory-based jailbreak detection: multi-agent adaptive guard for large language models
Jun Leng, Litian Zhang, Xi Zhang
-
Rethinking Security in Semantic Communication: Latent Manipulation as a New Threat
Zhiyuan Xi, Kun Zhu
-
Towards Privacy-Preserving Range Queries with Secure Learned Spatial Index over Encrypted Data
Zuan Wang, Juntao Lu, Jiazhuang Wu, Youliang Tian, Wei Song, Qiuxian Li, Duo Zhang
-
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
Huy Nghiem, Swetasudha Panda, Devashish Khatwani, Huy V. Nguyen, Krishnaram Kenthapadi, Hal Daumé III
-
Peter B. Walker, Hannah Davidson, Aiden Foster, Matthew Lienert, Thomas Pardue, Dale Russell
-
Leon Mayer, Piotr Kalinowski, Caroline Ebersbach, Marcel Knopp, Tim Rädsch, Evangelia Christodoulou, Annika Reinke, Fiona R. Kolbinger, Lena Maier-Hein
-
Studying Various Activation Functions and Non-IID Data for Machine Learning Model Robustness
Long Dang, Thushari Hapuarachchi, Kaiqi Xiong, Jing Lin
-
One Detector Fits All: Robust and Adaptive Detection of Malicious Packages from PyPI to Enterprises
Biagio Montaruli, Luca Compagna, Serena Elisa Ponta, Davide Balzarotti
-
Tipping the Dominos: Topology-Aware Multi-Hop Attacks on LLM-Based Multi-Agent Systems
Ruichao Liang, Le Yin, Jing Chen, Cong Wu, Xiaoyu Zhang, Huangpeng Gu, Zijian Zhang, Yang Liu
-
WildCode: An Empirical Analysis of Code Generated by ChatGPT
Kobra Khanmohammadi, Pooria Roy, Raphael Khoury, Abdelwahab Hamou-Lhadj, Wilfried Patrick Konan
-
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Zhenglin Cheng, Peng Sun, Jianguo Li, Tao Lin
-
Fast and Flexible Robustness Certificates for Semantic Segmentation
Thomas Massena (IRIT-MISFIT, DTIPG - SNCF, UT3), Corentin Friedrich, Franck Mamalet, Mathieu Serrurier (IRIT-MISFIT)
-
Yubo Hou, Mohamed Ragab, Min Wu, Chee-Keong Kwoh, Xiaoli Li, Zhenghua Chen
-
Invasive Context Engineering to Control Large Language Models
Thomas Rivasseau
-
COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers
Junyu Wang, Changjia Zhu, Yuanbo Zhou, Lingyao Li, Xu He, Junjie Xiong
-
VACoT: Rethinking Visual Data Augmentation with VLMs
Zhengzhuo Xu, Chong Sun, SiNan Du, Chen Li, Jing Lyu, Chun Yuan
-
When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents
Tsimur Hadeliya, Mohammad Ali Jauhar, Nidhi Sakpal, Diogo Cruz
-
ADORE: Autonomous Domain-Oriented Relevance Engine for E-commerce
Zheng Fang, Donghao Xie, Ming Pang, Chunyuan Yuan, Xue Jiang, Changping Peng, Zhangang Lin, Zheng Luo
-
CryptoQA: A Large-scale Question-answering Dataset for AI-assisted Cryptography
Mayar Elfares, Pascal Reisert, Tilman Dietz, Manpa Barman, Ahmed Zaki, Ralf Küsters, Andreas Bulling
-
Reasoning-Aware Multimodal Fusion for Hateful Video Detection
Shuonan Yang, Tailin Chen, Jiangbei Yue, Guangliang Cheng, Jianbo Jiao, Zeyu Fu
-
Defense That Attacks: How Robust Models Become Better Attackers
Mohamed Awad, Mahmoud Akrm, Walid Gomaa
-
GraphMatch: Fusing Language and Graph Representations in a Dynamic Two-Sided Work Marketplace
Mikołaj Sacha, Hammad Jafri, Mattie Terzolo, Ayan Sinha, Andrew Rabinovich
-
Lumos: Let there be Language Model System Certification
Isha Chaudhary, Vedaant Jain, Avaljot Singh, Kavya Sachdeva, Sayan Ranu, Gagandeep Singh
-
LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems
Yuanhe Zhang, Weiliu Wang, Zhenhong Zhou, Kun Wang, Jie Zhang, Li Sun, Yang Liu, Sen Su
-
Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities
Yuan Xiong, Ziqi Miao, Lijun Li, Chen Qian, Jie Li, Jing Shao
-
SAGE: Style-Adaptive Generalization for Privacy-Constrained Semantic Segmentation Across Domains
Qingmei Li, Yang Zhang, Peifeng Zhang, Haohuan Fu, Juepeng Zheng
-
Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts
Zhongjian Qiao, Rui Yang, Jiafei Lyu, Xiu Li, Zhongxiang Dai, Zhuoran Yang, Siyang Gao, Shuang Qiu
-
FGC-Comp: Adaptive Neighbor-Grouped Attribute Completion for Graph-based Anomaly Detection
Junpeng Wu, Pinheng Zong
-
Adversarial Jamming for Autoencoder Distribution Matching
Waleed El-Geresy, Deniz Gündüz
-
FiMMIA: scaling semantic perturbation-based membership inference across modalities
Anton Emelyanov, Sergei Kudriashov, Alena Fenogenova
-
Adaptive Decentralized Federated Learning for Robust Optimization
Shuyuan Wu, Feifei Wang, Yuan Gao, Hansheng Wang
-
Quantum Vanguard: Server Optimized Privacy Fortified Federated Intelligence for Future Vehicles
Dev Gurung, Shiva Raj Pokhrel
-
Ziyi Tong, Feifei Sun, Le Minh Nguyen
-
HydroDCM: Hydrological Domain-Conditioned Modulation for Cross-Reservoir Inflow Prediction
Pengfei Hu, Fan Ming, Xiaoxue Han, Chang Lu, Yue Ning, Dan Lu
-
Robust Tabular Foundation Models
Matthew Peroni, Franck Le, Vadim Sheinin
-
Randomized Masked Finetuning: An Efficient Way to Mitigate Memorization of PIIs in LLMs
Kunj Joshi, David A. Smith
-
Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks
Songwen Zhao, Danqing Wang, Kexun Zhang, Jiaxuan Luo, Zhuo Li, Lei Li
-
DPAC: Distribution-Preserving Adversarial Control for Diffusion Sampling
Han-Jin Lee, Han-Ju Lee, Jin-Seong Kim, Seok-Hwan Choi
-
Yongxin Zhou, Philippe Mulhem, Didier Schwab
-
Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation
Zirui Zhao, Boye Niu, David Hsu, Wee Sun Lee
-
EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations
Xinyun Zhou, Xinfeng Li, Yinan Peng, Ming Xu, Xuanwang Zhang, Miao Yu, Yidong Wang, Xiaojun Jia, Kun Wang, Qingsong Wen, XiaoFeng Wang, Wei Dong
-
Dual Randomized Smoothing: Beyond Global Noise Variance
Chenhao Sun, Yuhao Mao, Martin Vechev
-
Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability
Jinghan Jia, Nathalie Baracaldo, Sijia Liu
-
Securing Large Language Models (LLMs) from Prompt Injection Attacks
Omar Farooq Khan Suri, John McCrae
-
Physical ID-Transfer Attacks against Multi-Object Tracking via Adversarial Trajectory
Chenyi Wang, Yanmao Man, Raymond Muller, Ming Li, Z. Berkay Celik, Ryan Gerdes, Jonathan Petit
-
On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
Haoran Li, Jiayu Lv, Congying Han, Zicheng Zhang, Anqi Li, Yan Liu, Tiande Guo, Nan Jiang
-
Ali Nafisi, Sina Asghari, Mohammad Saeed Arvenaghi, Hossein Shakibania
-
Label Forensics: Interpreting Hard Labels in Black-Box Text Classifier
Mengyao Du, Gang Yang, Han Fang, Quanjun Yin, Ee-chien Chang
-
SA-ADP: Sensitivity-Aware Adaptive Differential Privacy for Large Language Models
Stella Etuk, Ashraf Matrawy
-
On the Unreasonable Effectiveness of Last-layer Retraining
John C. Hill, Tyler LaBonte, Xinchen Zhang, Vidya Muthukumar
-
Jimin Choi, Max Z. Li
-
Differentially Private and Federated Structure Learning in Bayesian Networks
Ghita Fassy El Fehri, Aurélien Bellet, Philippe Bastien
-
Zihao Wang, Kar Wai Fok, Vrizlynn L. L. Thing
-
Rongzhe Wei, Peizhi Niu, Xinjie Shen, Tony Tu, Yifan Li, Ruihan Wu, Eli Chien, Olgica Milenkovic, Pan Li
-
TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?
Lewen Yan, Jilin Mei, Tianyi Zhou, Lige Huang, Jie Zhang, Dongrui Liu, Jing Shao
-
Factor(T,U): Factored Cognition Strengthens Monitoring of Untrusted AI
Aaron Sandoval, Cody Rushing
-
Adversarial Robustness of Traffic Classification under Resource Constraints: Input Structure Matters
Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino
-
Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks
Haowei Fu, Bo Ni, Han Xu, Kunpeng Liu, Dan Lin, Tyler Derr
-
Many-to-One Adversarial Consensus: Exposing Multi-Agent Collusion Risks in AI-Based Healthcare
Adeela Bashir, The Anh Han, Zia Ush Shamszaman
-
CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing
Zixia Wang, Gaojie Jin, Jia Hu, Ronghui Mu
-
Cen Lu, Yung-Chen Tang, Andrea Cavallaro
-
Concept-Guided Backdoor Attack on Vision Language Models
Haoyu Shen, Weimin Lyu, Haotian Xu, Tengfei Ma
-
Graph Data Augmentation with Contrastive Learning on Covariate Distribution Shift
Fanlong Zeng, Wensheng Gan
-
Bias Injection Attacks on RAG Databases and Sanitization Defenses
Hao Wu, Prateek Saxena
-
World Model Robustness via Surprise Recognition
Geigh Zollicoffer, Tanush Chopra, Mingkuan Yan, Xiaoxu Ma, Kenneth Eaton, Mark Riedl
-
Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios
Jianxiang Zang, Yongda Wei, Ruxue Bai, Shiyu Jiang, Nijia Mo, Binhong Li, Qiang Sun, Hui Liu
-
The Outline of Deception: Physical Adversarial Attacks on Traffic Signs Using Edge Patches
Haojie Jia, Te Hu, Haowen Li, Long Jin, Chongshi Xin, Yuchi Yao, Jiarui Xiao
-
Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis
Mintong Kang, Chong Xiang, Sanjay Kariyappa, Chaowei Xiao, Bo Li, Edward Suh
-
Tao Zhang, Yevgeniy Vorobeychik
-
Chenyi Zhang, Tao Shang, Chao Guo, Ruohan He
-
SEA: Spectral Edge Attacks on Graph Neural Networks
Yongyu Wang
-
When Safety Blocks Sense: Measuring Semantic Confusion in LLM Refusals
Riad Ahmed Anonto, Md Labid Al Nahiyan, Md Tanvir Hassan
-
Teleportation-Based Defenses for Privacy in Approximate Machine Unlearning
Mohammad M Maheri, Xavier Cadet, Peter Chin, Hamed Haddadi
-
Gradient Inversion in Federated Reinforcement Learning
Shenghong He
-
Adversarial Signed Graph Learning with Differential Privacy
Haobin Ke, Sen Zhang, Qingqing Ye, Xun Ran, Haibo Hu
-
Red Teaming Large Reasoning Models
Jiawei Chen, Yang Yang, Chao Yu, Yu Tian, Zhi Cao, Linghao Li, Hang Su, Zhaoxia Yin
-
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
Junyan Ye, Leiqi Zhu, Yuncheng Guo, Dongzhi Jiang, Zilong Huang, Yifan Zhang, Zhiyuan Yan, Haohuan Fu, Conghui He, Weijia Li
-
IslandRun: Privacy-Aware Multi-Objective Orchestration for Distributed AI Inference
Bala Siva Sai Akhil Malepati
-
Goutham Nalagatla
-
Razieh Ghaedi, AmirReza BabaAhmadi, Reyer Zwiggelaar, Xinqi Fan, Nashid Alam
-
Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
Zhongqi Wang, Jie Zhang, Shiguang Shan, Xilin Chen
-
Yongkang Hu, Yu Cheng, Yushuo Zhang, Yuan Xie, Zhaoxia Yin
-
Sample-Efficient Tabular Self-Play for Offline Robust Reinforcement Learning
Na Li, Zewu Zheng, Wei Ni, Hangguan Shan, Wenjie Zhang, Xinyu Li
-
Solving Neural Min-Max Games: The Role of Architecture, Initialization & Dynamics
Deep Patel, Emmanouil-Vasileios Vlatakis-Gkaragkounis
-
Benjamin D. Ballyk, Ankit Gupta, Sujay Konda, Kavitha Subramanian, Chris Landon, Ahmed Ammar Naseer, Georg Maierhofer, Sumanth Swaminathan, Vasudevan Venkateshwaran
-
Privacy Preserving Diffusion Models for Mixed-Type Tabular Data Generation
Timur Sattarov, Marco Schreyer, Damian Borth
-
RECTor: Robust and Efficient Correlation Attack on Tor
Binghui Wu, Dinil Mon Divakaran, Levente Csikor, Mohan Gurusamy
-
TrojanLoC: LLM-based Framework for RTL Trojan Localization
Weihua Xiao, Zeng Wang, Minghao Shao, Raghu Vamshi Hemadri, Ozgur Sinanoglu, Muhammad Shafique, Johann Knechtel, Siddharth Garg, Ramesh Karri
-
Superpixel Attack: Enhancing Black-box Adversarial Attack with Image-driven Division Areas
Issa Oe, Keiichiro Yamamura, Hiroki Ishikura, Ryo Hamahira, Katsuki Fujisawa
-
Yang Li, Chong Ma, Yuanzheng Li, Sen Li, Yanbo Chen, Zhaoyang Dong
-
Pruning Graphs by Adversarial Robustness Evaluation to Strengthen GNN Defenses
Yongyu Wang
-
Adversarial Training for Process Reward Models
Gurusha Juneja, Deepak Nathani, William Yang Wang
-
AgentShield: Make MAS more secure and efficient
Kaixiang Wang, Zhaojiacheng Zhou, Bunyod Suvonov, Jiong Lou, Jie Li
-
Are LLMs Good Safety Agents or a Propaganda Engine?
Neemesh Yadav, Francesco Ortu, Jiarui Liu, Joeun Yook, Bernhard Schölkopf, Rada Mihalcea, Alberto Cazzaniga, Zhijing Jin
-
Fault-Tolerant MARL for CAVs under Observation Perturbations for Highway On-Ramp Merging
Yuchen Shi, Huaxin Pei, Yi Zhang, Danya Yao
-
A Game-Theoretic Approach for Adversarial Information Fusion in Distributed Sensor Networks
Kassem Kallas
-
Hoang Khang Phan, Nhat Tan Le
-
Reasoning Under Pressure: How do Training Incentives Influence Chain-of-Thought Monitorability?
Matt MacDermott, Qiyao Wei, Rada Djoneva, Francis Rhys Ward
-
DeFi TrustBoost: Blockchain and AI for Trustworthy Decentralized Financial Decisions
Swati Sachan, Dale S. Fickett
-
Does Self-Evaluation Enable Wireheading in Language Models?
David Demitri Africa, Hans Ethan Ting
-
Pirzada Suhail, Rehna Afroz, Amit Sethi
-
An Empirical Study on the Security Vulnerabilities of GPTs
Tong Wu, Weibin Wu, Zibin Zheng
-
Watermarks for Embeddings-as-a-Service Large Language Models
Anudeex Shetty
-
AI Deception: Risks, Dynamics, and Controls
Boyuan Chen, Sitong Fang, Jiaming Ji, Yanxu Zhu, Pengcheng Wen, Jinzhou Wu, Yingshui Tan, Boren Zheng, Mengying Yuan, Wenqi Chen, Donghai Hong, Alex Qiu, Xin Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Borong Zhang, Tianzhuo Yang, Saad Siddiqui, Isabella Duan, Yawen Duan, Brian Tse, Jen-Tse (Jay) Huang, Kun Wang, Baihui Zheng, Jiaheng Liu, Jian Yang, Yiming Li, Wenting Chen, Dongrui Liu, Lukas Vierling, Zhiheng Xi, Haobo Fu, Wenxuan Wang, Jitao Sang, Zhengyan Shi, Chi-Min Chan, Eugenie Shi, Simin Li, Juncheng Li, Wei Ji, Dong Li, Jun Song, Yinpeng Dong, Jie Fu, Bo Zheng, Min Yang, Yike Guo, Philip Torr, Zhongyuan Wang, Yaodong Yang, Tiejun Huang, Ya-Qin Zhang, Hongjiang Zhang, Andrew Yao
-
A Safety and Security Framework for Real-World Agentic Systems
Shaona Ghosh, Barnaby Simkin, Kyriacos Shiarlis, Soumili Nandi, Dan Zhao, Matthew Fiedler, Julia Bazinska, Nikki Pope, Roopa Prabhu, Daniel Rohrer, Michael Demoret, Bartley Richardson
-
Tianyu Zhang, Zihang Xi, Jingyu Hua, Sheng Zhong
-
Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs
Daniel Agyei Asante, Md Mokarram Chowdhury, Yang Li
-
RemedyGS: Defend 3D Gaussian Splatting against Computation Cost Attacks
Yanping Li, Zhening Liu, Zijian Li, Zehong Lin, Jun Zhang
-
GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents
Xinyu Zhang, Yixin Wu, Boyang Zhang, Chenhao Lin, Chao Shen, Michael Backes, Yang Zhang
-
PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration
Junfei Zhan, Haoxun Shen, Zheng Lin, Tengjiao He
-
Mingzhe Li, Renhao Zhang, Zhiyang Wen, Siqi Pan, Bruno Castro da Silva, Juan Zhai, Shiqing Ma
-
Shoe Style-Invariant and Ground-Aware Learning for Dense Foot Contact Estimation
Daniel Sungho Jung, Kyoung Mu Lee
-
Creating Blank Canvas Against AI-enabled Image Forgery
Qi Song, Ziyuan Luo, Renjie Wan
-
Can Protective Watermarking Safeguard the Copyright of 3D Gaussian Splatting?
Wenkai Huang, Yijia Guo, Gaolei Li, Lei Ma, Hang Zhang, Liwen Hu, Jiazheng Wang, Jianhua Li, Tiejun Huang
-
ABounD: Adversarial Boundary-Driven Few-Shot Learning for Multi-Class Anomaly Detection
Runzhi Deng, Yundi Hu, Xinshuang Zhang, Zhao Wang, Xixi Liu, Wang-Zhou Dai, Caifeng Shan, Fang Zhao
-
Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan
-
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning
Yuan Yao, Lixu Wang, Jiaqi Wu, Jin Song, Simin Chen, Zehua Wang, Zijian Tian, Wei Chen, Huixia Li, Xiaoxiao Li
-
Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges
Guanxi Lu, Hao Mark Chen, Zhiqiang Que, Wayne Luk, Hongxiang Fan
-
Privacy-Utility-Bias Trade-offs for Privacy-Preserving Recommender Systems
Shiva Parsarad, Isabel Wagner
-
Difficulties with Evaluating a Deception Detector for AIs
Lewis Smith, Bilal Chughtai, Neel Nanda
-
An Efficient Privacy-preserving Intrusion Detection Scheme for UAV Swarm Networks
Kanchon Gharami, Shafika Showkat Moni
-
Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks
Richard J. Young
-
Minghui Min, Yulu Li, Gang Li, Meng Li, Hongliang Zhang, Miao Pan, Dusit Niyato, Zhu Han
-
Exposing Vulnerabilities in RL: A Novel Stealthy Backdoor Attack through Reward Poisoning
Bokang Zhang, Chaojun Lu, Jianhui Li, Junfeng Wu
-
CacheTrap: Injecting Trojans in LLMs without Leaving any Traces in Inputs or Weights
Mohaiminul Al Nahian (1), Abeer Matar A. Almalky (1), Gamana Aragonda (2), Ranyang Zhou (2), Sabbir Ahmed (1), Dmitry Ponomarev (1), Li Yang (3), Shaahin Angizi (2), Adnan Siraj Rakin (1) ((1) SUNY Binghamton, (2) New Jersey Institute of Technology, (3) UNC Charlotte)
-
Ghosting Your LLM: Without The Knowledge of Your Gradient and Data
Abeer Matar A. Almalky (1), Ziyan Wang (2), Mohaiminul Al Nahian (1), Li Yang (2), Adnan Siraj Rakin (1) ((1) Binghamton University, (2) UNC Charlotte)
-
NetDeTox: Adversarial and Efficient Evasion of Hardware-Security GNNs via RL-LLM Orchestration
Zeng Wang, Minghao Shao, Akashdeep Saha, Ramesh Karri, Johann Knechtel, Muhammad Shafique, Ozgur Sinanoglu
-
Towards Trustworthy Legal AI through LLM Agents and Formal Reasoning
Linze Chen, Yufan Cai, Zhe Hou, Jinsong Dong
-
Resilient Charging Infrastructure via Decentralized Coordination of Electric Vehicles at Scale
Chuhao Qin, Alexandru Sorici, Andrei Olaru, Evangelos Pournaras, Adina Magda Florea
-
GuardTrace-VL: Detecting Unsafe Multimodal Reasoning via Iterative Safety Supervision
Yuxiao Xiang, Junchi Chen, Zhenchao Jin, Changtao Miao, Haojie Yuan, Qi Chu, Tao Gong, Nenghai Yu
-
Taehoon Kang, Taeyong Kim
-
Dongkyu Derek Cho, Huan Song, Arijit Ghosh Chowdhury, Haotian An, Yawei Wang, Rohit Thekkanal, Negin Sokhandan, Sharlina Keshava, Hannah Marlowe
-
CAHS-Attack: CLIP-Aware Heuristic Search Attack Method for Stable Diffusion
Shuhan Xia, Jing Dai, Hui Ouyang, Yadong Shang, Dongxiao Zhao, Peipei Li
-
Privacy in Federated Learning with Spiking Neural Networks
Dogukan Aksu, Jesus Martinez del Rincon, Ihsen Alouani
-
When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models
Hui Lu, Yi Yu, Yiming Yang, Chenyu Yi, Qixin Zhang, Bingquan Shen, Alex C. Kot, Xudong Jiang
-
Self-Guided Defense: Adaptive Safety Alignment for Reasoning Models via Synthesized Guidelines
Yuhang Wang, Yanxu Zhu, Dongyuan Lu, Jitao Sang
-
Multimodal Robust Prompt Distillation for 3D Point Cloud Models
Xiang Gu, Liming Lu, Xu Zheng, Anan Du, Yongbin Zhou, Shuchao Pang
-
HarmonicAttack: An Adaptive Cross-Domain Audio Watermark Removal
Kexin Li, Xiao Hu, Ilya Grishchenko, David Lie
-
Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models
Naifu Zhang, Wei Tao, Xi Xiao, Qianpu Sun, Yuxin Zheng, Wentao Mo, Peiqiang Wang, Nan Zhang
-
Escaping the Verifier: Learning to Reason via Demonstrations
Locke Cai, Ivan Provilkov
-
Al Amin, Kamrul Hasan, Liang Hong, Sharif Ullah
-
TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models
Jiaming He, Guanyu Hou, Hongwei Li, Zhicong Huang, Kangjie Chen, Yi Yu, Wenbo Jiang, Guowen Xu, Tianwei Zhang
-
MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training
Haotian Xue, Qi Chen, Zhonghao Wang, Xun Huang, Eli Shechtman, Jinrong Xie, Yongxin Chen
-
Yaw Osei Adjei (Kwame Nkrumah University of Science and Technology)
-
Dataset Poisoning Attacks on Behavioral Cloning Policies
Akansha Kalra, Soumil Datta, Ethan Gilmore, Duc La, Guanhong Tao, Daniel S. Brown
-
Computing Strategic Responses to Non-Linear Classifiers
Jack Geary, Boyan Gao, Henry Gouk
-
EvilGenie: A Reward Hacking Benchmark
Jonathan Gabor, Jayson Lynch, Jonathan Rosenfeld
-
Data Exfiltration by Compression Attack: Definition and Evaluation on Medical Image Data
Huiyu Li, Nicholas Ayache, Hervé Delingette
-
Illuminating the Black Box: Real-Time Monitoring of Backdoor Unlearning in CNNs via Explainable AI
Tien Dat Hoang
-
Active Learning for GCN-based Action Recognition
Hichem Sahbi
-
Deceptron: Learned Local Inverses for Fast and Stable Physics Inversion
Aaditya L. Kachhadiya
-
Standardized Threat Taxonomy for AI Security, Governance, and Regulatory Compliance
Hernan Huwyler
-
Exploring Dynamic Properties of Backdoor Training Through Information Bottleneck
Xinyu Liu, Xu Zhang, Can Chen, Ren Wang
-
ABLE: Using Adversarial Pairs to Construct Local Models for Explaining Model Predictions
Krishna Khadka, Sunny Shree, Pujan Budhathoki, Yu Lei, Raghu Kacker, D. Richard Kuhn
-
The Double-Edged Nature of the Rashomon Set for Trustworthy Machine Learning
Ethan Hsu, Harry Chen, Chudi Zhong, Lesia Semenova
-
Fatemeh Akbarian, Anahita Baninajjar, Yingyi Zhang, Ananth Balashankar, Amir Aminifar
-
SA^2GFM: Enhancing Robust Graph Foundation Models with Structure-Aware Semantic Augmentation
Junhua Shi, Qingyun Sun, Haonan Yuan, Xingcheng Fu
-
Self-Transparency Failures in Expert-Persona LLMs: How Instruction-Following Overrides Disclosure
Alex Diep
-
Towards Benign Memory Forgetting for Selective Multimodal Large Language Model Unlearning
Zhen Zeng, Leijiang Gu, Zhangling Duan, Feng Li, Zenglin Shi, Cees G. M. Snoek, Meng Wang
-
Data Augmentation Techniques to Reverse-Engineer Neural Network Weights from Input-Output Queries
Alexander Beiser, Flavio Martinelli, Wulfram Gerstner, Johanni Brea
-
Quantifying the Privacy Implications of High-Fidelity Synthetic Network Traffic
Van Tran, Shinan Liu, Tian Li, Nick Feamster
-
PaTAS: A Parallel System for Trust Propagation in Neural Networks Using Subjective Logic
Koffi Ismael Ouattara, Ioannis Krontiris, Theo Dimitrakos, Dennis Eisermann, Frank Kargl
-
Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains
Arun Chowdary Sanna
-
Zero-Knowledge Proof Based Verifiable Inference of Models
Yunxiao Wang
-
On the Feasibility of Hijacking MLLMs' Decision Chain via One Perturbation
Changyue Li, Jiaying Li, Youliang Yuan, Jiaming He, Zhicong Huang, Pinjia He
-
Sidahmed Benabderrahmane, James Cheney, Talal Rahwan
-
BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents
Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley, Jerry Ma, Denis Yarats, Ninghui Li
-
Adversarial Confusion Attack: Disrupting Multimodal Large Language Models
Jakub Hoscilowicz, Artur Janicki
-
GFT-GCN: Privacy-Preserving 3D Face Mesh Recognition with Spectral Diffusion
Hichem Felouat, Hanrui Wang, Isao Echizen
-
V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs
Sen Nie, Jie Zhang, Jianxin Yan, Shiguang Shan, Xilin Chen
-
The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation
Weijia Mao, Hao Chen, Zhenheng Yang, Mike Zheng Shou
-
TReFT: Taming Rectified Flow Models For One-Step Image Translation
Shengqian Li, Ming Gao, Yi Liu, Zuzeng Lin, Feng Wang, Feng Dai
-
GS-Checker: Tampering Localization for 3D Gaussian Splatting
Haoliang Han, Ziyuan Luo, Jun Qi, Anderson Rocha, Renjie Wan
-
Frequency Bias Matters: Diving into Robust and Generalized Deep Image Forgery Detection
Chi Liu, Tianqing Zhu, Wanlei Zhou, Wei Zhao
-
Jun Jia, Hongyi Miao, Yingjie Zhou, Linhan Cao, Yanwei Jiang, Wangqiu Zhou, Dandan Zhu, Hua Yang, Wei Sun, Xiongkuo Min, Guangtao Zhai
-
Latent Diffusion Inversion Requires Understanding the Latent Space
Mingxing Rao, Bowen Qu, Daniel Moyer
-
Shreevanth Krishnaa Gopalakrishnan, Stephen Hailes
-
Learning from Risk: LLM-Guided Generation of Safety-Critical Scenarios with Prior Knowledge
Yuhang Wang, Heye Huang, Zhenhua Xu, Kailai Sun, Baoshen Guo, Jinhua Zhao
-
Xiaojiao Xiao, Qinmin Vivian Hu, Tae Hyun Kim, Guanghui Wang
-
Trung Cuong Dang, David Mohaisen
-
Supporting Students in Navigating LLM-Generated Insecure Code
Jaehwan Park, Kyungchan Lim, Seonhye Park, Doowon Kim
-
Securing the Model Context Protocol (MCP): Risks, Controls, and Governance
Herman Errico, Jiquan Ngiam, Shanita Sojan
-
Categorical Framework for Quantum-Resistant Zero-Trust AI Security
I. Cherkaoui, C. Clarke, J. Horgan, I. Dey
-
Jun Jia, Hongyi Miao, Yingjie Zhou, Wangqiu Zhou, Jianbo Zhang, Linhan Cao, Dandan Zhu, Hua Yang, Xiongkuo Min, Wei Sun, Guangtao Zhai
-
FlowSteer: Guiding Few-Step Image Synthesis with Authentic Trajectories
Lei Ke, Hubery Yin, Gongye Liu, Zhengyao Lv, Jingcai Guo, Chen Li, Wenhan Luo, Yujiu Yang, Jing Lyu
-
Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations
Ryan Wong (1), Hosea David Yu Fei Ng (1), Dhananjai Sharma (1), Glenn Jun Jie Ng (1), Kavishvaran Srinivasan (1) ((1) National University of Singapore)
-
Learning to Compress Graphs via Dual Agents for Consistent Topological Robustness Evaluation
Qisen Chai, Yansong Wang, Junjie Huang, Tao Jia
-
Xurui Li, Kaisong Song, Rui Zhu, Pin-Yu Chen, Haixu Tang
-
Mohamed Rissal Hedna, Sesugh Samuel Nder
-
Yingjia Shang, Yi Liu, Huimin Wang, Furong Li, Wenfang Sun, Wu Chengyu, Yefeng Zheng
-
Open-weight genome language model safeguards: Assessing robustness via adversarial fine-tuning
James R. M. Black, Moritz S. Hanke, Aaron Maiwald, Tina Hernandez-Boussard, Oliver M. Crook, Jaspreet Pannu
-
UniGame: Turning a Unified Multimodal Model Into Its Own Adversary
Zhaolong Su, Wang Lu, Hao Chen, Sharon Li, Jindong Wang
-
Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
Junbo Zhang, Ran Chen, Qianli Zhou, Xinyang Deng, Wen Jiang
-
Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation
Shristi Das Biswas, Arani Roy, Kaushik Roy
-
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
Juncheng Li, Yige Li, Hanxun Huang, Yunhao Chen, Xin Wang, Yixu Wang, Xingjun Ma, Yu-Gang Jiang
-
Leveraging Adversarial Learning for Pathological Fidelity in Virtual Staining
José Teixeira, Pascal Klöckner, Diana Montezuma, Melis Erdal Cesur, João Fraga, Hugo M. Horlings, Jaime S. Cardoso, Sara P. Oliveira
-
Beilin Chu, Weike You, Mengtao Li, Tingting Zheng, Kehan Zhao, Xuan Xu, Zhigao Lu, Jia Song, Moxuan Xu, Linna Zhou
-
Three-Dimensional Anatomical Data Generation Based on Artificial Neural Networks
Ann-Sophia Müller, Moonkwang Jeong, Meng Zhang, Jiyuan Tian, Arkadiusz Miernik, Stefanie Speidel, Tian Qiu
-
Robust and Generalizable GNN Fine-Tuning via Uncertainty-aware Adapter Learning
Bo Jiang, Weijun Zhao, Beibei Wang, Xiao Wang, Jin Tang
-
FedPoisonTTP: A Threat Model and Poisoning Attack for Federated Test-Time Personalization
Md Akil Raihan Iftee, Syed Md. Ahnaf Hasan, Amin Ahsan Ali, AKM Mahbubur Rahman, Sajib Mistry, Aneesh Krishna
-
Subtract the Corruption: Training-Data-Free Corrective Machine Unlearning using Task Arithmetic
Mostafa Mozafari, Farooq Ahmad Wani, Maria Sofia Bucarelli, Fabrizio Silvestri
-
Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM
Adarsh Kumarappan, Ayushi Mehrotra
-
Hi-SAFE: Hierarchical Secure Aggregation for Lightweight Federated Learning
Hyeong-Gun Joo, Songnam Hong, Seunghwan Lee, Dong-Joon Shin
-
Targeted Manipulation: Slope-Based Attacks on Financial Time-Series Data
Dominik Luszczynski
-
RoguePrompt: Dual-Layer Ciphering for Self-Reconstruction to Circumvent LLM Moderation
Benyamin Tafreshian
-
Yu Cui, Yifei Liu, Hang Fu, Sicheng Pan, Haibin Zhang, Cong Zuo, Licheng Wang
-
AttackPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents
Yixin Wu, Rui Wen, Chi Cui, Michael Backes, Yang Zhang
-
Synthetic Data: AI's New Weapon Against Android Malware
Angelo Gaspar Diniz Nogueira, Kayua Oleques Paim, Hendrio Bragança, Rodrigo Brandão Mansilha, Diego Kreutz
-
Steven Peh
-
Maria Thoma, Michalis A. Savelonas, Dimitris K. Iakovidis
-
Automating Deception: Scalable Multi-Turn LLM Jailbreaks
Adarsh Kumarappan, Ananya Mujoo
-
An Invariant Latent Space Perspective on Language Model Inversion
Wentao Ye, Jiaqi Hu, Haobo Wang, Xinpeng Ti, Zhiqing Xiao, Hao Chen, Liyao Li, Lei Feng, Sai Wu, Junbo Zhao
-
DISCO: A Browser-Based Privacy-Preserving Framework for Distributed Collaborative Learning
Julien T. T. Vignoud, Valérian Rousset, Hugo El Guedj, Ignacio Aleman, Walid Bennaceur, Batuhan Faik Derinbay, Eduard Ďurech, Damien Gengler, Lucas Giordano, Felix Grimberg, Franziska Lippoldt, Christina Kopidaki, Jiafan Liu, Lauris Lopata, Nathan Maire, Paul Mansat, Martin Milenkoski, Emmanuel Omont, Güneş Özgün, Mina Petrović, Francesco Posa, Morgan Ridel, Giorgio Savini, Marcel Torne, Lucas Trognon, Alyssa Unell, Olena Zavertiaieva, Sai Praneeth Karimireddy, Tahseen Rabbani, Mary-Anne Hartley, Martin Jaggi
-
EAGER: Edge-Aligned LLM Defense for Robust, Efficient, and Accurate Cybersecurity Question Answering
Onat Gungor, Roshan Sood, Jiasheng Zhou, Tajana Rosing
-
David Amebley, Sayanton Dibbo
-
Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs
Andrew Maranhão Ventura D'addario
-
Natural Emergent Misalignment from Reward Hacking in Production RL
Monte MacDiarmid, Benjamin Wright, Jonathan Uesato, Joe Benton, Jon Kutasov, Sara Price, Naia Bouscal, Sam Bowman, Trenton Bricken, Alex Cloud, Carson Denison, Johannes Gasteiger, Ryan Greenblatt, Jan Leike, Jack Lindsey, Vlad Mikulik, Ethan Perez, Alex Rodrigues, Drake Thomas, Albert Webson, Daniel Ziegler, Evan Hubinger
-
Xiaoqing Wang, Keman Huang, Bin Liang, Hongyu Li, Xiaoyong Du
-
Evaluating perturbation robustness of generative systems that use COBOL code inputs
Samuel Ackerman, Wesam Ibraheem, Orna Raz, Marcel Zalmanovici
-
Syed Mohaiminul Hoque, Naimur Rahman, Md Sakhawat Hossain
-
Richard J. Young
-
Hao Shen, Jikang Cheng, Renye Yan, Zhongyuan Wang, Wei Peng, Baojin Huang
-
Robust Physical Adversarial Patches Using Dynamically Optimized Clusters
Harrison Bagley, Will Meakin, Simon Lucey, Yee Wei Law, Tat-Jun Chin
-
Generative Myopia: Why Diffusion Models Fail at Structure
Milad Siami
-
Ensuring Calibration Robustness in Split Conformal Prediction Under Adversarial Attacks
Xunlei Qian, Yue Xing
-
Differential privacy with dependent data
Valentin Roth, Marco Avella-Medina
-
TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia
-
Building Resilient Information Ecosystems: Large LLM-Generated Dataset of Persuasion Attacks
Hsien-Te Kao, Aleksey Panasyuk, Peter Bautista, William Dupree, Gabriel Ganberg, Jeffrey M. Beaubien, Laura Cassani, Svitlana Volkova
-
Yi Zhang, Tianxiang Xu, Zijian Li, Chao Zhang, Kunyu Zhang, Zhan Gao, Meinuo Li, Xiaohan Zhang, Qichao Qi, Bing Chen
-
Position: The Complexity of Perfect AI Alignment -- Formalizing the RLHF Trilemma
Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary
-
Svitlana Volkova, Will Dupree, Hsien-Te Kao, Peter Bautista, Gabe Ganberg, Jeff Beaubien, Laura Cassani
-
Yanxi Li, Ruocheng Shan
-
Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models
Jiayi Luo, Qingyun Sun, Lingjuan Lyu, Ziwei Zhang, Haonan Yuan, Xingcheng Fu, Jianxin Li
-
Privacy Auditing of Multi-domain Graph Pre-trained Model under Membership Inference Attacks
Jiayi Luo, Qingyun Sun, Yuecen Wei, Haonan Yuan, Xingcheng Fu, Jianxin Li
-
H. Zhang, L. Zhang, G. Epiphaniou, C. Maple
-
Adversarial Pseudo-replay for Exemplar-free Class-incremental Learning
Hiroto Honda
-
Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
Yusong Wu, Stephen Brade, Teng Ma, Tia-Jane Fowler, Enning Yang, Berker Banar, Aaron Courville, Natasha Jaques, Cheng-Zhi Anna Huang
-
Federated Anomaly Detection and Mitigation for EV Charging Forecasting Under Cyberattacks
Oluleke Babayomi, Dong-Seong Kim
-
Understanding Private Learning From Feature Perspective
Meng Ding, Mingxi Lei, Shaopeng Fu, Shaowei Wang, Di Wang, Jinhui Xu
-
Curvature-Aware Safety Restoration In LLMs Fine-Tuning
Thong Bach, Thanh Nguyen-Tang, Dung Nguyen, Thao Minh Le, Truyen Tran
-
Vulnerability-Aware Robust Multimodal Adversarial Training
Junrui Zhang, Xinyu Zhao, Jie Peng, Chenjie Wang, Jianmin Ji, Tianlong Chen
-
Beyond Jailbreak: Unveiling Risks in LLM Applications Arising from Blurred Capability Boundaries
Yunyi Zhang, Shibo Cui, Baojun Liu, Jingkai Yu, Min Zhang, Fan Shi, Han Zheng
-
ASTRA: Agentic Steerability and Risk Assessment Framework
Itay Hazan, Yael Mathov, Guy Shtar, Ron Bitton, Itsik Mantin
-
Exploiting the Experts: Unauthorized Compression in MoE-LLMs
Pinaki Prasad Guha Neogi, Ahmad Mohammadshirazi, Dheeraj Kulshrestha, Rajiv Ramnath
-
Geometric-Disentanglement Unlearning
Duo Zhou, Yuji Zhang, Tianxin Wei, Ruizhong Qiu, Ke Yang, Xiao Lin, Cheng Qian, Jingrui He, Hanghang Tong, Heng Ji, Huan Zhang
-
Don't Learn, Ground: A Case for Natural Language Inference with Visual Grounding
Daniil Ignatev, Ayman Santeer, Albert Gatt, Denis Paperno
-
Vision Language Models are Confused Tourists
Patrick Amadeus Irawan, Ikhlasul Akmal Hanif, Muhammad Dehan Al Kautsar, Genta Indra Winata, Fajri Koto, Alham Fikri Aji
-
MultiPriv: Benchmarking Individual-Level Privacy Reasoning in Vision-Language Models
Xiongtao Sun, Hui Li, Jiaming Zhang, Yujie Yang, Kaili Liu, Ruxin Feng, Wen Jun Tan, Wei Yang Bryan Lim
-
One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution
Yushun Fang, Yuxiang Chen, Shibo Yin, Qiang Hu, Jiangchao Yao, Ya Zhang, Xiaoyun Zhang, Yanfeng Wang
-
ATAC: Augmentation-Based Test-Time Adversarial Correction for CLIP
Linxiang Su, András Balogh
-
Zheng Wang, Yi Zhang, Siddartha Khastgir, Carsten Maple, Xingyu Zhao
-
MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models
Yuqi Li, Junhao Dong, Chuanguang Yang, Shiping Wen, Piotr Koniusz, Tingwen Huang, Yingli Tian, Yew-Soon Ong
-
Steering in the Shadows: Causal Amplification for Activation Space Attacks in Large Language Models
Zhiyuan Xu, Stanislav Abaimov, Joseph Gardiner, Sana Belguith
-
Cognitive Inception: Agentic Reasoning against Visual Deceptions by Injecting Skepticism
Yinjie Zhao, Heng Zhao, Bihan Wen, Joey Tianyi Zhou
-
Evaluating Adversarial Vulnerabilities in Modern Large Language Models
Tom Perel
-
MURMUR: Using cross-user chatter to break collaborative language agents in groups
Atharv Singh Patlan, Peiyao Sheng, S. Ashwin Hebbar, Prateek Mittal, Pramod Viswanath
-
Enhancing Adversarial Transferability through Block Stretch and Shrink
Quan Liu, Feng Ye, Chenhao Lu, Shuming Zhen, Guanliang Huang, Lunzhe Chen, Xudong Ke
-
AEGIS: Preserving privacy of 3D Facial Avatars with Adversarial Perturbations
Dawid Wolkiewicz, Anastasiya Pechko, Przemysław Spurek, Piotr Syga
-
GANGR: GAN-Assisted Scalable and Efficient Global Routing Parallelization
Hadi Khodaei Jooshin, Inna Partin-Vaisband
-
Jithin Krishnan
-
Detecting Sleeper Agents in Large Language Models via Semantic Drift Analysis
Shahin Zanbaghi, Ryan Rostampour, Farhan Abid, Salim Al Jarmakani
-
Hiding in the AI Traffic: Abusing MCP for LLM-Powered Agentic Red Teaming
Strahinja Janjuesvic, Anna Baron Garcia, Sohrob Kazerounian
-
KeFan Li, Mengfei Wang, Hengzhi Zhang, Zhichao Li, Yuan Yuan, Mu Li, Xiang Gao, Hailong Sun, Chunming Hu, Weifeng Lv
-
Physically Realistic Sequence-Level Adversarial Clothing for Robust Human-Detection Evasion
Dingkun Zhou, Patrick P. K. Chan, Hengxu Wu, Shikang Zheng, Ruiqi Huang, Yuanjie Zhao
-
When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models
Yuping Yan, Yuhan Xie, Yinxin Zhang, Lingjuan Lyu, Yaochu Jin
-
Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security
Wei Zhao, Zhe Li, Yige Li, Jun Sun
-
"To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios
Zhen Sun, Zongmin Zhang, Deqi Liang, Han Sun, Yule Liu, Yun Shen, Xiangshan Gao, Yilong Yang, Shuai Liu, Yutao Yue, Xinlei He
-
Sayak Mukherjee, Samrat Chatterjee, Emilie Purvine, Ted Fujimoto, Tegan Emerson
-
PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization
Huseein Jawad, Nicolas Brunel
-
Layer-wise Noise Guided Selective Wavelet Reconstruction for Robust Medical Image Segmentation
Yuting Lu, Ziliang Wang, Weixin Xu, Wei Zhang, Yongqiang Zhao, Yang Yu, Xiaohong Zhang
-
An Image Is Worth Ten Thousand Words: Verbose-Text Induction Attacks on VLMs
Zhi Luo, Zenghui Yuan, Wenqi Wei, Daizong Liu, Pan Zhou
-
Erase to Retain: Low Rank Adaptation Guided Selective Unlearning in Medical Segmentation Networks
Nirjhor Datta, Md. Golam Rabiul Alam
-
Loss Functions Robust to the Presence of Label Errors
Nicholas Pellegrino, David Szczecina, Paul Fieguth
-
Rate-optimal community detection near the KS threshold via node-robust algorithms
Jingqiu Ding, Yiding Hua, Kasper Lindberg, David Steurer, Aleksandr Storozhenko
-
Yijun Yang, Lichao Wang, Jianping Zhang, Chi Harold Liu, Lanqing Hong, Qiang Xu
-
Chunyang Li, Zifeng Kang, Junwei Zhang, Zhuo Ma, Anda Cheng, Xinghua Li, Jianfeng Ma
-
AutoBackdoor: Automating Backdoor Attacks via LLM Agents
Yige Li, Zhe Li, Wei Zhao, Nay Myat Min, Hanxun Huang, Xingjun Ma, Jun Sun
-
SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge
Adeel Yousaf, Joseph Fioresi, James Beetham, Amrit Singh Bedi, Mubarak Shah
-
PEPPER: Perception-Guided Perturbation for Robust Backdoor Defense in Text-to-Image Diffusion Models
Oscar Chew, Po-Yi Lu, Jayden Lin, Kuan-Hao Huang, Hsuan-Tien Lin
-
Membership Inference Attacks Beyond Overfitting
Mona Khalil, Alberto Blanco-Justicia, Najeeb Jebreel, Josep Domingo-Ferrer
-
As If We've Met Before: LLMs Exhibit Certainty in Recognizing Seen Files
Haodong Li, Jingqi Zhang, Xiao Cheng, Peihua Mai, Haoyu Wang, Yang Pan
-
Effective Code Membership Inference for Code Completion Models via Adversarial Prompts
Yuan Jiang, Zehao Li, Shan Huang, Christoph Treude, Xiaohong Su, Tiantian Wang
-
Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks
Zimo Ji, Xunguang Wang, Zongjie Li, Pingchuan Ma, Yudong Gao, Daoyuan Wu, Xincheng Yan, Tian Tian, Shuai Wang
-
Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models
Piercosma Bisconti, Matteo Prandi, Federico Pierucci, Francesco Giarrusso, Marcantonio Bracale, Marcello Galisai, Vincenzo Suriani, Olga Sorokoletova, Federico Sartore, Daniele Nardi
-
HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation
Linyin Luo, Yujuan Ding, Yunshan Ma, Wenqi Fan, Hanjiang Lai
-
What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs
Zhihan Ren, Lijun He, Jiaxi Liang, Xinzhu Fu, Haixia Bi, Fan Li
-
Transferable Dual-Domain Feature Importance Attack against AI-Generated Image Detector
Weiheng Zhu, Gang Cao, Jing Liu, Lifang Yu, Shaowei Weng
-
Robust Bayesian Optimisation with Unbounded Corruptions
Abdelhamid Ezzerg, Ilija Bogunovic, Jeremias Knoblauch
-
Critical Evaluation of Quantum Machine Learning for Adversarial Robustness
Saeefa Rubaiyet Nowmi, Jesus Lopez, Md Mahmudul Alam Imon, Shahrooz Pouryouse, Mohammad Saidur Rahman
-
Trustworthy GenAI over 6G: Integrated Applications and Security Frameworks
Bui Duc Son, Trinh Van Chien, Dong In Kim
-
Privacy-Preserving IoT in Connected Aircraft Cabin
Nilesh Vyas, Benjamin Zhao, Aygün Baltaci, Gustavo de Carvalho Bertoli, Hassan Asghar, Markus Klügel, Gerrit Schramm, Martin Kubisch, Dali Kaafar
-
Securing AI Agents Against Prompt Injection Attacks
Badrinath Ramakrishnan, Akshaya Balaji
-
TopoReformer: Mitigating Adversarial Attacks Using Topological Purification in OCR Models
Bhagyesh Kumar, A S Aravinthakashan, Akshat Satyanarayan, Ishaan Gakhar, Ujjwal Verma
-
Boundary-Aware Adversarial Filtering for Reliable Diagnosis under Extreme Class Imbalance
Yanxuan Yu, Michael S. Hughes, Julien Lee, Jiacheng Zhou, Andrew F. Laine
-
When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers
Zhaoxin Zhang, Borui Chen, Yiming Hu, Youyang Qu, Tianqing Zhu, Longxiang Gao
-
When Words Change the Model: Sensitivity of LLMs for Constraint Programming Modelling
Alessio Pellegrino, Jacopo Mauro
-
From Narrow Unlearning to Emergent Misalignment: Causes, Consequences, and Containment in LLMs
Erum Mushtaq, Anil Ramakrishna, Satyapriya Krishna, Sattvik Sahai, Prasoon Goyal, Kai-Wei Chang, Tao Zhang, Rahul Gupta
-
Yule Liu, Heyi Zhang, Jinyi Zheng, Zhen Sun, Zifan Peng, Tianshuo Cong, Yilong Yang, Xinlei He, Zhuo Ma
-
FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration
Jingren Liu, Shuning Xu, Qirui Yang, Yun Wang, Xiangyu Chen, Zhong Ji
-
Certified Signed Graph Unlearning
Junpeng Zhao, Lin Li, Kaixi Hu, Kaize Shi, Jingling Yuan
-
Kangqiao Zhao, Shuo Huai, Xurui Song, Jun Luo
-
Sigil: Server-Enforced Watermarking in U-Shaped Split Federated Learning via Gradient Injection
Zhengchunmin Dai, Jiaxiong Tang, Peng Sun, Honglong Chen, Liantao Wu
-
Abolfazl Younesi, Leon Kiss, Zahra Najafabadi Samani, Juan Aznar Poveda, Thomas Fahringer
-
Stealth Fine-Tuning: Efficiently Breaking Alignment in RVLMs Using Self-Generated CoT
Le Yu, Zhengyue Zhao, Yawen Zheng, Yunhao Liu
-
Unified Defense for Large Language Models against Jailbreak and Fine-Tuning Attacks in Education
Xin Yi, Yue Li, Dongsheng Shi, Linlin Wang, Xiaoling Wang, Liang He
-
Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion
Eric Xue, Ruiyi Zhang, Zijun Zhang, Pengtao Xie
-
Coffee: Controllable Diffusion Fine-tuning
Ziyao Zeng, Jingcheng Ni, Ruyi Liu, Alex Wong
-
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
Huiyi Chen, Jiawei Peng, Dehai Min, Changchang Sun, Kaijie Chen, Yan Yan, Xu Yang, Lu Cheng
-
Certified but Fooled! Breaking Certified Defences with Ghost Certificates
Quoc Viet Vo, Tashreque M. Haq, Paul Montague, Tamas Abraham, Ehsan Abbasnejad, Damith C. Ranasinghe
-
Observational Auditing of Label Privacy
Iden Kalemaj, Luca Melis, Maxime Boucher, Ilya Mironov, Saeed Mahloujifar
-
N-GLARE: A Non-Generative Latent Representation-Efficient LLM Safety Evaluator
Zheyu Lin, Jirui Yang, Hengqi Guo, Yubing Bao, Yao Guan
-
Watch Out for the Lifespan: Evaluating Backdoor Attacks Against Federated Model Adaptation
Bastien Vuillod, Pierre-Alain Moellic, Jean-Max Dutertre
-
Zifan Wang, Georgios Pantazis, Sergio Grammatico, Michael M. Zavlanos, Karl H. Johansson
-
Dynamic Black-box Backdoor Attacks on IoT Sensory Data
Ajesh Koyatan Chathoth, Stephen Lee
-
Privis: Towards Content-Aware Secure Volumetric Video Delivery
Kaiyuan Hu, Hong Kang, Yili Jin, Junhua Liu, Chengming Hu, Haolun Wu, Xue Liu
-
Beyond Fixed and Dynamic Prompts: Embedded Jailbreak Templates for Advancing LLM Security
Hajun Kim, Hyunsik Na, Daeseon Choi
-
SecureSign: Bridging Security and UX in Mobile Web3 through Emulated EIP-6963 Sandboxing
Charles Cheng Ji, Brandon Kong
-
Mathieu Dufour, Andrew Duncan
-
Henry Wong, Clement Fung, Weiran Lin, Karen Li, Stanley Chen, Lujo Bauer
-
Yuwen Zhang, Viet Tran, Paul Weng
-
Jailbreaking Large Vision Language Models in Intelligent Transportation Systems
Badhan Chandra Das, Md Tasnim Jawad, Md Jueal Mia, M. Hadi Amini, Yanzhao Wu
-
Find the Leak, Fix the Split: Cluster-Based Method to Prevent Leakage in Video-Derived Datasets
Noam Glazner, Noam Tsfaty, Sharon Shalev, Avishai Weizman
-
Accuracy is Not Enough: Poisoning Interpretability in Federated Learning via Color Skew
Farhin Farhad Riya, Shahinul Hoque, Jinyuan Stella Sun, Olivera Kotevska
-
The Battle of Metasurfaces: Understanding Security in Smart Radio Environments
Paul Staat, Christof Paar, Swarun Kumar
-
What Color Is It? A Text-Interference Multimodal Hallucination Benchmark
Jinkun Zhao, Lei Huang, Haixin Ge, Wenjun Wu
-
Pascal Zimmer, Ghassan Karame
-
VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language
Zonghao Ying, Moyang Chen, Nizhang Li, Zhiqiang Wang, Wenxin Zhang, Quanchen Zou, Zonglei Jing, Aishan Liu, Xianglong Liu
-
Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments
Samuel Nathanson, Rebecca Williams, Cynthia Matuszek
-
Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks
Haotian Jin, Yang Li, Haihui Fan, Lin Shen, Xiangfang Li, Bo Li
-
Falsely Accused: How AI Detectors Misjudge Slightly Polished Arabic Articles
Saleh Almohaimeed, Saad Almohaimeed, Mousa Jari, Khaled A. Alobaid, Fahad Alotaibi
-
Evaluating Autoformalization Robustness via Semantically Similar Paraphrasing
Hayden Moore, Asfahan Shah
-
Shaowei Guan, Yu Zhai, Zhengyu Zhang, Yanze Wang, Hin Chi Kwok
-
Runhao Jiang, Chengzhi Jiang, Rui Yan, Huajin Tang
-
Model Inversion Attack Against Deep Hashing
Dongdong Zhao, Qiben Xu, Ranxin Fang, Baogang Song
-
Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio
Guangke Chen, Yuhui Wang, Shouling Ji, Xiapu Luo, Ting Wang
-
GraphToxin: Reconstructing Full Unlearned Graphs from Graph Unlearning
Ying Song, Balaji Palanisamy
-
Exposing Weak Links in Multi-Agent Systems under Adversarial Prompting
Nirmit Arora, Sathvik Joel, Ishan Kavathekar, Palak, Rohan Gandhi, Yash Pandya, Tanuja Ganu, Aditya Kanade, Akshay Nambi
-
Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis
Farhad Abtahi, Fernando Seoane, Iván Pau, Mario Vega-Barbas
-
HealSplit: Towards Self-Healing through Adversarial Distillation in Split Federated Learning
Yuhan Xie, Chen Lyu
-
AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models
Haokun Chen, Jianing Li, Yao Zhang, Jinhe Bi, Yan Xia, Jindong Gu, Volker Tresp
-
Shaowei Guan, Hin Chi Kwok, Ngai Fong Law, Gregor Stiglic, Vivian Hui
-
Private Frequency Estimation Via Residue Number Systems
Héber H. Arcolezi
-
LANE: Lexical Adversarial Negative Examples for Word Sense Disambiguation
Jader Martins Camboim de Sá, Jooyoung Lee, Cédric Pruski, Marcos Da Silveira
-
Redwan Hussain, Mizanur Rahman, Prithwiraj Bhattacharjee
-
Questioning the Stability of Visual Question Answering
Amir Rosenfeld, Neta Glazer, Ethan Fetaya
-
One-to-N Backdoor Attack in 3D Point Cloud via Spherical Trigger
Dongmei Shan, Wei Lian, Chongxia Wang
-
Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing
Cong Cao, Yujie Xu, Xiaodong Xu
-
SimuFreeMark: A Noise-Simulation-Free Robust Watermarking Against Image Editing
Yichao Tang, Mingyang Li, Di Miao, Sheng Li, Zhenxing Qian, Xinpeng Zhang
-
Unsupervised Robust Domain Adaptation: Paradigm, Theory and Algorithm
Fuxiang Huang, Xiaowei Fu, Shiyu Ye, Lina Ma, Wen Li, Xinbo Gao, David Zhang, Lei Zhang
-
Adaptive Symmetrization of the KL Divergence
Omri Ben-Dov, Luiz F.O. Chamon
-
Armadillo: Robust Single-Server Secure Aggregation for Federated Learning with Input Validation
Yiping Ma, Yue Guo, Harish Karthikeyan, Antigoni Polychroniadou
-
On the Information-Theoretic Fragility of Robust Watermarking under Diffusion Editing
Yunyi Ni, Ziyu Yang, Ze Niu, Emily Davis, Finn Carter
-
SEAL: Subspace-Anchored Watermarks for LLM Ownership
Yanbo Dai, Zongjie Li, Zhenlan Ji, Shuai Wang
-
Robustness of LLM-enabled vehicle trajectory prediction under data security threats
Feilong Wang, Fuqiang Liu
-
MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm
Xiao Fan, Jingyan Jiang, Zhaoru Chen, Fanding Huang, Xiao Chen, Qinting Jiang, Bowen Zhang, Xing Tang, Zhi Wang
-
Learning Fair Representations with Kolmogorov-Arnold Networks
Amisha Priyadarshini, Sergio Gago-Masague
-
Better LLM Reasoning via Dual-Play
Zhengxin Zhang, Chengyu Huang, Aochong Oliver Li, Claire Cardie
-
Le Xu, Jiayu Chen
-
NegBLEURT Forest: Leveraging Inconsistencies for Detecting Jailbreak Attacks
Lama Sleem, Jerome Francois, Lujun Li, Nathan Foucher, Niccolo Gentile, Radu State
-
Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging
Qinfeng Li, Miao Pan, Jintao Chen, Fu Teng, Zhiqiang Shen, Ge Su, Hao Peng, Xuhong Zhang
-
BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models
Shuaitong Liu, Renjue Li, Lijia Yu, Lijun Zhang, Zhiming Liu, Gaojie Jin
-
PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
Runpeng Geng, Yanting Wang, Chenlong Yin, Minhao Cheng, Ying Chen, Jinyuan Jia
-
Optimal Welfare in Noncooperative Network Formation under Attack
Natan Doubez, Pascal Lenzner, Marcus Wunderlich
-
Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard
Yudong Yang, Xuezhen Zhang, Zhifeng Han, Siyin Wang, Jimin Zhuang, Zengrui Jin, Jing Shao, Guangzhi Sun, Chao Zhang
-
destroR: Attacking Transfer Models with Obfuscous Examples to Discard Perplexity
Saadat Rafid Ahmed, Rubayet Shareen, Radoan Sharkar, Nazia Hossain, Mansur Mahi, Farig Yousuf Sadeque
-
Nikolaos Tsagkas, Andreas Sochopoulos, Duolikun Danier, Sethu Vijayakumar, Alexandros Kouris, Oisin Mac Aodha, Chris Xiaoxuan Lu
-
DeepDefense: Layer-Wise Gradient-Feature Alignment for Building Robust Neural Networks
Ci Lin, Tet Yeap, Iluju Kiringa, Biwei Zhang
-
CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D
Francis Rhys Ward, Teun van der Weij, Hanna Gábor, Sam Martin, Raja Mehta Moreno, Harel Lidar, Louis Makower, Thomas Jodrell, Lauren Robson
-
Black-Box On-Policy Distillation of Large Language Models
Tianzhu Ye, Li Dong, Zewen Chi, Xun Wu, Shaohan Huang, Furu Wei
-
Phantom Menace: Exploring and Enhancing the Robustness of VLA Models Against Physical Sensor Attacks
Xuancun Lu, Jiaxiang Chen, Shilin Xiao, Zizhi Jin, Zhangrui Chen, Hanwen Yu, Bohan Qian, Ruochen Zhou, Xiaoyu Ji, Wenyuan Xu
-
Consensus Sampling for Safer Generative AI
Adam Tauman Kalai, Yael Tauman Kalai, Or Zamir
-
Robust and Diverse Multi-Agent Learning via Rational Policy Gradient
Niklas Lauffer, Ameesh Shah, Micah Carroll, Sanjit A. Seshia, Stuart Russell, Michael Dennis
-
FAST-CAD: A Fairness-Aware Framework for Non-Contact Stroke Diagnosis
Tianming Sha, Zechuan Chen, Zhan Cheng, Haotian Zhai, Xuwei Ding, Junnan Li, Haixiang Tang, Zaoting Sun, Yanchuan Tang, Yongzhe Yi, Yanjie Huang, Anhao Li, Yuan Gao, Keze Wang
-
iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
Zixun Xiong, Gaoyi Wu, Qingyang Yu, Mingyu Derek Ma, Lingfeng Yao, Miao Pan, Xiaojiang Du, Hao Wang
-
From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model
Hanbo Cheng, Peng Wang, Kaixiang Lei, Qi Li, Zhen Zou, Pengfei Hu, Jun Du
-
Improving Sustainability of Adversarial Examples in Class-Incremental Learning
Taifeng Liu, Xinjing Liu, Liangqiu Dong, Yang Liu, Yilong Yang, Zhuo Ma
-
Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment
Shigeki Kusaka, Keita Saito, Mikoto Kudo, Takumi Tanabe, Akifumi Wachi, Youhei Akimoto
-
Differentially Private Rankings via Outranking Methods and Performance Data Aggregation
Luis Del Vasto-Terrientes
-
Jian Wang, Hong Shen, Chan-Tong Lam
-
GuardFed: A Trustworthy Federated Learning Framework Against Dual-Facet Attacks
Yanli Li, Yanan Zhou, Zhongliang Guo, Nan Yang, Yuning Zhang, Huaming Chen, Dong Yuan, Weiping Ding, Witold Pedrycz
-
Jiajie Su, Zihan Nan, Yunshan Ma, Xiaobo Xia, Xiaohua Feng, Weiming Liu, Xiaolin Zheng, Chaochao Chen
-
Spatio-Temporal Graph Unlearning
Qiming Guo, Wenbo Sun, Wenlu Wang
-
AdaptDel: Adaptable Deletion Rate Randomized Smoothing for Certified Robustness
Zhuoqun Huang, Neil G. Marchant, Olga Ohrimenko, Benjamin I. P. Rubinstein
-
Boosting Adversarial Transferability via Ensemble Non-Attention
Yipeng Zou, Qin Liu, Jie Wu, Yu Peng, Guo Chen, Hui Zhou, Guanghui Ye
-
Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference
Chengze Jiang, Minjing Dong, Xinli Shi, Jie Gui
-
DBINDS - Can Initial Noise from Diffusion Model Inversion Help Reveal AI-Generated Videos?
Yanlin Wu, Xiaogang Yuan, Dezhi An
-
Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation
Kazuki Iwahana, Yusuke Yamasaki, Akira Ito, Takayuki Miura, Toshiki Shibahara
-
Fairness-Aware Few-Shot Learning for Audio-Visual Stress Detection
Anushka Sanjay Shelke, Aditya Sneh, Arya Adyasha, Haroon R. Lone
-
Philipp Dingfelder, Christian Riess
-
Philip Sosnin, Matthew Wicker, Josh Collyer, Calvin Tsay
-
DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks
Yunfei Yang, Xiaojun Chen, Yuexin Xuan, Zhendong Zhao, Xin Zhao, He Li
-
Adversarially and Distributionally Robust Virtual Energy Storage Systems via the Scenario Approach
Georgios Pantazis, Nicola Mignoni, Raffaele Carli, Mariagrazia Dotoli, Sergio Grammatico
-
Evaluating from Benign to Dynamic Adversarial: A Squid Game for Large Language Models
Zijian Chen, Wenjun Zhang, Guangtao Zhai
-
Transferable Hypergraph Attack via Injecting Nodes into Pivotal Hyperedges
Meixia He, Peican Zhu, Le Cheng, Yangming Guo, Manman Yuan, Keke Tang
-
Convergence dynamics of Agent-to-Agent Interactions with Misaligned objectives
Romain Cosentino, Sarath Shekkizhar, Adam Earle
-
3D Guard-Layer: An Integrated Agentic AI Safety System for Edge Artificial Intelligence
Eren Kurshan, Yuan Xie, Paul Franzon
-
Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding
Joseph Fioresi, Ishan Rajendrakumar Dave, Mubarak Shah
-
Jingjie He, Weijie Liang, Zihan Shan, Matthew Caesar
-
Automated Hardware Trojan Insertion in Industrial-Scale Designs
Yaroslav Popryho, Debjit Pal, Inna Partin-Vaisband
-
Jian Wang, Lijun He, Yixing Yong, Haixia Bi, Fan Li
-
A methodological analysis of prompt perturbations and their effect on attack success rates
Tiago Machado, Maysa Malfiza Garcia de Macedo, Rogerio Abreu de Paula, Marcelo Carpinette Grave, Aminat Adebiyi, Luan Soares de Souza, Enrico Santarelli, Claudio Pinhanez
-
SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought
Shourya Batra, Pierce Tillman, Samarth Gaggar, Shashank Kesineni, Kevin Zhu, Sunishchal Dev, Ashwinee Panda, Vasu Sharma, Maheep Chaudhary
-
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
Chloe Li, Mary Phuong, Daniel Tan
-
A Theoretical Analysis of Detecting Large Model-Generated Time Series
Junji Hou, Junzhou Zhao, Shuo Zhang, Pinghui Wang
-
Liang Shan, Kaicheng Shen, Wen Wu, Zhenyu Ying, Chaochao Lu, Guangze Ye, Liang He
-
Chun-Ming Huang, Li-Heng Chang, I-Hsin Chang, An-Sheng Lee, Hao Kuo-Chen
-
Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models
Kunhao Li, Wenhao Li, Di Wu, Lei Yang, Jun Bai, Ju Jia, Jason Xue
-
Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment
Peng Zhang, Peijie Sun
-
FoCLIP: A Feature-Space Misalignment Framework for CLIP-Based Image Manipulation and Detection
Yulin Chen, Zeyuan Wang, Tianyuan Yu, Yingmei Wei, Liang Bai
-
E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
Zhisheng Zhang, Derui Wang, Yifan Mi, Zhiyong Wu, Jie Gao, Yuxin Cao, Kai Ye, Minhui Xue, Jie Hao
-
More Agents Helps but Adversarial Robustness Gap Persists
Khashayar Alavi, Zhastay Yeltay, Lucie Flek, Akbar Karimi
-
Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization
Sayambhu Sen, Shalabh Bhatnagar
-
Verifying rich robustness properties for neural networks
Mohammad Afzal, S. Akshay, Ashutosh Gupta
-
LoReTTA: A Low Resource Framework To Poison Continuous Time Dynamic Graphs
Himanshu Pal, Venkata Sai Pranav Bachina, Ankit Gangwal, Charu Sharma
-
Language Generation with Infinite Contamination
Anay Mehrotra, Grigoris Velegkas, Xifan Yu, Felix Zhou
-
Ruiheng Liu, XiaoBing Chen, Jinyu Zhang, Qiongwen Zhang, Yu Zhang, Bailong Yang
-
Yilin Jiang, Mingzi Zhang, Xuanyu Yin, Sheng Jin, Suyu Lu, Zuocan Ying, Zengyi Yu, Xiangjie Kong
-
HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection
Fangqi Dai, Xingjian Jiang, Zizhuang Deng
-
SCOPE: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLMs
Zhenliang Zhang, Xinyu Hu, Xiaojun Wan
-
Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents
Hanlin Cai, Houtianfu Wang, Haofan Dong, Kai Li, Ozgur B. Akan
-
Certified L2-Norm Robustness of 3D Point Cloud Recognition in the Frequency Domain
Liang Zhou, Qiming Wang, Tianze Chen
-
3D-ANC: Adaptive Neural Collapse for Robust 3D Point Cloud Recognition
Yuanmin Huang, Wenxuan Li, Mi Zhang, Xiaohan Zhang, Xiaoyu You, Min Yang
-
From Pretrain to Pain: Adversarial Vulnerability of Video Foundation Models Without Task Knowledge
Hui Lu, Yi Yu, Song Xia, Yiming Yang, Deepu Rajan, Boon Poh Ng, Alex Kot, Xudong Jiang
-
Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation
Yuxuan Zhou, Tao Yu, Wen Huang, Yuheng Zhang, Tao Dai, Shu-Tao Xia
-
Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization
Binyan Xu, Fan Yang, Di Tang, Xilin Dai, Kehuan Zhang
-
PlanT 2.0: Exposing Biases and Structural Flaws in Closed-Loop Driving
Simon Gerstenecker, Andreas Geiger, Katrin Renz
-
Non-Rival Data as Rival Products: An Encapsulation-Forging Approach for Data Synthesis
Kaidong Wang, Jiale Li, Shao-Bo Lin, Yao Wang
-
Beyond Uniform Deletion: A Data Value-Weighted Framework for Certified Machine Unlearning
Lisong He, Yi Yang, Xiangyu Chang
-
Breaking Privacy in Federated Clustering: Perfect Input Reconstruction via Temporal Correlations
Guang Yang, Lixia Luo, Qiongxiu Li
-
On Stealing Graph Neural Network Models
Marcin Podhajski, Jan Dubiński, Franziska Boenisch, Adam Dziedzic, Agnieszka Pręgowska, Tomasz P. Michalak
-
A Fully Polynomial-Time Algorithm for Robustly Learning Halfspaces over the Hypercube
Gautam Chandrasekaran, Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan
-
Adam Piaseczny, Eric Ruzomberka, Rohit Parasnis, Christopher G. Brinton
-
Uncovering Pretraining Code in LLMs: A Syntax-Aware Attribution Approach
Yuanheng Li, Zhuoyang Chen, Xiaoyun Liu, Yuhao Wang, Mingwei Liu, Yang Shi, Kaifeng Huang, Shengjie Zhao
-
Privacy on the Fly: A Predictive Adversarial Transformation Network for Mobile Sensor Data
Tianle Song, Chenhao Lin, Yang Cao, Zhengyu Zhao, Jiahao Sun, Chong Zhang, Le Yang, Chao Shen
-
JPRO: Automated Multimodal Jailbreaking via Multi-Agent Collaboration Framework
Yuxuan Zhou, Yang Bai, Kuofeng Gao, Tao Dai, Shu-Tao Xia
-
Private Sketches for Linear Regression
Shrutimoy Das, Debanuj Nayak, Anirban Dasgupta
-
Shuangqing Xu, Yifeng Zheng, Zhongyun Hua
-
Qiang Wang, Liying Yang, Jiayun Song, Yifan Bai, Jingtao Du
-
Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models
Asia Belfiore, Jonathan Passerat-Palmbach, Dmitrii Usynin
-
Efficient LLM Safety Evaluation through Multi-Agent Debate
Dachuan Lin, Guobin Shen, Zihao Yang, Tianrong Liu, Dongcheng Zhao, Yi Zeng
-
Dilli Prasad Sharma, Liang Xue, Xiaowei Sun, Xiaodong Lin, Pulei Xiong
-
RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework
Seif Ikbarieh, Kshitiz Aryal, Maanak Gupta
-
Mojtaba Noghabaei
-
Privacy-Preserving Federated Learning for Fair and Efficient Urban Traffic Optimization
Rathin Chandra Shit, Sharmila Subudhi
-
EchoMark: Perceptual Acoustic Environment Transfer with Watermark-Embedded Room Impulse Response
Chenpei Huang, Lingfeng Yao, Kyu In Lee, Lan Emily Zhang, Xun Chen, Miao Pan
-
TriShGAN: Enhancing Sparsity and Robustness in Multivariate Time Series Counterfactuals Explanation
Hongnan Ma, Yiwei Shi, Guanxiong Sun, Mengyue Yang, Weiru Liu
-
Robust Nearest Neighbour Retrieval Using Targeted Manifold Manipulation
B. Ghosh, H. Harikumar, S. Rana
-
Probably Approximately Global Robustness Certification
Peter Blohm, Patrick Indri, Thomas Gärtner, Sagar Malhotra
-
EASE: Practical and Efficient Safety Alignment for Small Language Models
Haonan Shi, Guoli Wang, Tu Ouyang, An Wang
-
When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins
Yigitcan Kaya, Anton Landerer, Stijn Pletinckx, Michelle Zimmermann, Christopher Kruegel, Giovanni Vigna
-
CGCE: Classifier-Guided Concept Erasure in Generative Models
Viet Nguyen, Vishal M. Patel
-
Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs
Alina Fastowski, Bardh Prenkaj, Yuxiao Li, Gjergji Kasneci
-
Runtime Safety Monitoring of Deep Neural Networks for Perception: A Survey
Albert Schotschneider, Svetlana Pavlitska, J. Marius Zöllner
-
A Privacy-Preserving Federated Learning Method with Homomorphic Encryption in Omics Data
Yusaku Negoya, Feifei Cui, Zilong Zhang, Miao Pan, Tomoaki Ohtsuki, Aohan Li
-
MCP-RiskCue: Can LLM infer risk information from MCP server System Logs?
Jiayi Fu, Qiyao Sun
-
Identity Card Presentation Attack Detection: A Systematic Review
Esteban M. Ruiz, Juan E. Tapia, Reinel T. Soto, Christoph Busch
-
Catching Contamination Before Generation: Spectral Kill Switches for Agents
Valentin Noël
-
CatBack: Universal Backdoor Attacks on Tabular Data via Categorical Encoding
Behrad Tajalli, Stefanos Koffas, Stjepan Picek
-
Enhancing Robustness of Graph Neural Networks through p-Laplacian
Anuj Kumar Sirohi, Subhanu Halder, Kabir Kumar, Sandeep Kumar
-
IndirectAD: Practical Data Poisoning Attacks against Recommender Systems for Item Promotion
Zihao Wang, Tianhao Mao, XiaoFeng Wang, Di Tang, Xiaozhong Liu
-
Perturbation-mitigated USV Navigation with Distributionally Robust Reinforcement Learning
Zhaofan Zhang, Minghao Yang, Sihong Xie, Hui Xiong
-
Can Fine-Tuning Erase Your Edits? On the Fragile Coexistence of Knowledge Editing and Adaptation
Yinjie Cheng, Paul Youssef, Christin Seifert, Jörg Schlötterer, Zhixue Zhao
-
Tharindu Fernando, Clinton Fookes, Sridha Sridharan
-
Learning Fourier shapes to probe the geometric world of deep neural networks
Jian Wang, Yixing Yong, Haixia Bi, Lijun He, Fan Li
-
Pluralistic Behavior Suite: Stress-Testing Multi-Turn Adherence to Custom Behavioral Policies
Prasoon Varshney, Makesh Narsimhan Sreedhar, Liwei Jiang, Traian Rebedea, Christopher Parisien
-
Deep learning models are vulnerable, but adversarial examples are even more vulnerable
Jun Li, Yanwei Xu, Keran Li, Xiaoli Zhang
-
TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems
Ishan Kavathekar, Hemang Jain, Ameya Rathod, Ponnurangam Kumaraguru, Tanuja Ganu
-
Yiting He, Zhishuai Liu, Weixin Wang, Pan Xu
-
Steering Language Models with Weight Arithmetic
Constanza Fierro, Fabien Roger
-
ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations
Amr Gomaa, Ahmed Salem, Sahar Abdelnabi
-
Quantifying the Risk of Transferred Black Box Attacks
Disesdi Susanna Cox, Niklas Bunzel
-
Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding
Hadi Reisizadeh, Jiajun Ruan, Yiwei Chen, Soumyadeep Pal, Sijia Liu, Mingyi Hong
-
Associative Poisoning to Generative Machine Learning
Mathias Lundteigen Mohus, Jingyue Li, Zhirong Yang
-
Marius Fracarolli, Michael Staniek, Stefan Riezler
-
Janet Jenq, Hongda Shen
-
Adversarially Robust Multitask Adaptive Control
Kasra Fallah, Leonardo F. Toso, James Anderson
-
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy
-
TRICK: Time and Range Integrity ChecK using Low Earth Orbiting Satellite for Securing GNSS
Arslan Mumtaz, Mridula Singh
-
A Secured Intent-Based Networking (sIBN) with Data-Driven Time-Aware Intrusion Detection
Urslla Uchechi Izuazu, Mounir Bensalem, Admela Jukan
-
VMDT: Decoding the Trustworthiness of Video Foundation Models
Yujin Potter, Zhun Wang, Nicholas Crispino, Kyle Montgomery, Alexander Xiong, Ethan Y. Chang, Francesco Pinto, Yuqi Chen, Rahul Gupta, Morteza Ziyadi, Christos Christodoulopoulos, Bo Li, Chenguang Wang, Dawn Song
-
Distributionally Robust Self Paced Curriculum Reinforcement Learning
Anirudh Satheesh, Keenan Powell, Vaneet Aggarwal
-
Distributionally Robust Multimodal Machine Learning
Peilin Yang, Yu Ma
-
AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research
Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann
-
DeNoise: Learning Robust Graph Representations for Unsupervised Graph-Level Anomaly Detection
Qingfeng Chen, Haojin Zeng, Jingyi Jie, Shichao Zhang, Debo Cheng
-
On the Brittleness of CLIP Text Encoders
Allie Tran, Luca Rossetto
-
Differentially Private In-Context Learning with Nearest Neighbor Search
Antti Koskela, Tejas Kulkarni, Laith Zumot
-
RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning
Xinyuan Li, Murong Xu, Wenbiao Tao, Hanlun Zhu, Yike Zhao, Jipeng Zhang, Yunshi Lan
-
Black-Box Guardrail Reverse-engineering Attack
Hongwei Yao, Yun Xia, Shuo Shao, Haoran Shi, Tong Qiao, Cong Wang
-
PrivacyCD: Hierarchical Unlearning for Protecting Student Privacy in Cognitive Diagnosis
Mingliang Hou, Yinuo Wang, Teng Guo, Zitao Liu, Wenzhou Dou, Jiaqi Zheng, Renqiang Luo, Mi Tian, Weiqi Luo
-
A Parallel Region-Adaptive Differential Privacy Framework for Image Pixelization
Ming Liu
-
Adversarially Robust and Interpretable Magecart Malware Detection
Pedro Pereira, José Gouveia, João Vitorino, Eva Maia, Isabel Praça
-
P-MIA: A Profiled-Based Membership Inference Attack on Cognitive Diagnosis Models
Mingliang Hou, Yinuo Wang, Teng Guo, Zitao Liu, Wenzhou Dou, Jiaqi Zheng, Renqiang Luo, Mi Tian, Weiqi Luo
-
Prompt-Based Safety Guidance Is Ineffective for Unlearned Text-to-Image Diffusion Models
Jiwoo Shin, Byeonghu Na, Mina Kang, Wonhyeok Choi, Il-chul Moon
-
Security Evaluation of Quantum Circuit Split Compilation under an Oracle-Guided Attack
Hongyu Zhang, Yuntao Liu
-
Fooling Algorithms in Non-Stationary Bandits using Belief Inertia
Gal Mendelson, Eyal Tadmor
-
Guarding the Meaning: Self-Supervised Training for Semantic Robustness in Guard Models
Cristina Pinneri, Christos Louizos
-
Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models
Gahyeon Kim, Sohee Kim, Seokju Lee
-
Whisper Leak: a side-channel attack on Large Language Models
Geoff McDonald, Jonathan Bar Or
-
From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation
Najrin Sultana, Md Rafi Ur Rashid, Kang Gu, Shagufta Mehnaz
-
Yize Liu, Yunyun Hou, Aina Sui
-
A Lightweight 3D-CNN for Event-Based Human Action Recognition with Privacy-Preserving Potential
Mehdi Sefidgar Dilmaghani, Francis Fowley, Peter Corcoran
-
Byzantine-Robust Federated Learning with Learnable Aggregation Weights
Javad Parsa, Amir Hossein Daghestani, André M. H. Teixeira, Mikael Johansson
-
Death by a Thousand Prompts: Open Model Vulnerability Analysis
Amy Chang, Nicholas Conley, Harish Santhanalakshmi Ganesan, Adam Swanda
-
Bayesian Advantage of Re-Identification Attack in the Shuffle Model
Pengcheng Su, Haibo Cheng, Ping Wang
-
Auditing M-LLMs for Privacy Risks: A Synthetic Benchmark and Evaluation Framework
Junhao Li, Jiahao Chen, Zhou Feng, Chunyi Zhou
-
Desert Waste Detection and Classification Using Data-Based and Model-Based Enhanced YOLOv12 DL Model
Abdulmumin Sa'ad, Sulaimon Oyeniyi Adebayo, Abdul Jabbar Siddiqui
-
Contamination Detection for VLMs using Multi-Modal Semantic Perturbation
Jaden Park, Mu Cai, Feng Yao, Jingbo Shang, Soochahn Lee, Yong Jae Lee
-
Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena, Ziqian Zhong, Alexander Robey, Aditi Raghunathan
-
SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking
Wenyuan Yang, Yichen Sun, Changzheng Chen, Zhixuan Chu, Jiaheng Zhang, Yiming Li, Dacheng Tao
-
Diffusion-Based Image Editing: An Unforeseen Adversary to Robust Invisible Watermarks
Wenkai Fu, Finn Carter, Yue Wang, Emily Davis, Bo Zhang
-
When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning
Chenyu Zhang, Minsol Kim, Shohreh Ghorbani, Jingyao Wu, Rosalind Picard, Patricia Maes, Paul Pu Liang
-
Optimizing AI Agent Attacks With Synthetic Data
Chloe Loughridge, Paul Colognese, Avery Griffin, Tyler Tracy, Jon Kutasov, Joe Benton
-
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
Aashray Reddy, Andrew Zagula, Nicholas Saban
-
On The Dangers of Poisoned LLMs In Security Automation
Patrick Karlsen, Even Eilertsen
-
Ferhat Ozgur Catak, Jungwon Seo, Umit Cali
-
LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context
Yudong Li, Zhongliang Yang, Kejiang Chen, Wenxuan Wang, Tianxin Zhang, Sifang Wan, Kecheng Wang, Haitian Li, Xu Wang, Lefan Cheng, Youdan Yang, Baocheng Chen, Ziyu Liu, Yufei Sun, Liyan Wu, Wenya Wen, Xingchi Gu, Peiru Yang
-
Hao Li, Daiwei Lu, Jesse d'Almeida, Dilara Isik, Ehsan Khodapanah Aghdam, Nick DiSanto, Ayberk Acar, Susheela Sharma, Jie Ying Wu, Robert J. Webster III, Ipek Oguz
-
Robust Face Liveness Detection for Biometric Authentication using Single Image
Poulami Raha, Yeongnam Chae
-
A Non-Adversarial Approach to Idempotent Generative Modelling
Mohammed Al-Jaff, Giovanni Luca Marchetti, Michael C Welle, Jens Lundell, Mats G. Gustafsson, Gustav Eje Henter, Hossein Azizpour, Danica Kragic
-
Nesterov-Accelerated Robust Federated Learning Over Byzantine Adversaries
Lihan Xu, Yanjie Dong, Gang Wang, Runhao Zeng, Xiaoyi Fan, Xiping Hu
-
Enhancing Federated Learning Privacy with QUBO
Andras Ferenczi, Sutapa Samanta, Dagen Wang, Todd Hodges
-
Nicolas Riccieri Gardin Assumpcao, Leandro Villas
-
PrivGNN: High-Performance Secure Inference for Cryptographic Graph Neural Networks
Fuyi Wang, Zekai Chen, Mingyuan Fan, Jianying Zhou, Lei Pan, Leo Yu Zhang
-
An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks
Xu Liu, Yan Chen, Kan Ling, Yichi Zhu, Hengrun Zhang, Guisheng Fan, Huiqun Yu
-
Verifying LLM Inference to Prevent Model Weight Exfiltration
Roy Rinberg, Adam Karvonen, Alex Hoover, Daniel Reuter, Keri Warr
-
Evaluating Control Protocols for Untrusted AI Agents
Jon Kutasov, Chloe Loughridge, Yuqi Sun, Henry Sleight, Buck Shlegeris, Tyler Tracy, Joe Benton
-
W.K.M Mithsara, Ning Yang, Ahmed Imteaj, Hussein Zangoti, Abdur R. Shahid
-
Online Learning to Rank under Corruption: A Robust Cascading Bandits Approach
Fatemeh Ghaffari, Siddarth Sitaraman, Xutong Liu, Xuchuang Wang, Mohammad Hajiesmaili
-
PrivyWave: Privacy-Aware Wireless Sensing of Heartbeat
Yixuan Gao, Tanvir Ahmed, Zekun Chang, Thijs Roumen, Rajalakshmi Nandakumar
-
Aheer Sravon, Devdyuti Mazumder, Md. Ibrahim
-
Bayesian Evaluation of Large Language Model Behavior
Rachel Longjohn, Shang Wu, Saatvik Kher, Catarina Belém, Padhraic Smyth
-
Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing
Jinhua Yin, Peiru Yang, Chen Yang, Huili Wang, Zhiyang Hu, Shangguang Wang, Yongfeng Huang, Tao Qi
-
RobustFSM: Submodular Maximization in Federated Setting with Malicious Clients
Duc A. Tran, Dung Truong, Duy Le
-
CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing
Yifan Zhou, Tianshi Xu, Jue Hong, Ye Wu, Meng Li
-
Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models
Daniyal Ganiuly, Assel Smaiyl
-
Probabilistic Robustness for Free? Revisiting Training via a Benchmark
Yi Zhang, Zheng Wang, Zhen Chen, Wenjie Ruan, Qing Guo, Siddartha Khastgir, Carsten Maple, Xingyu Zhao
-
RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models
Hongyin Zhang, Shuo Zhang, Junxi Jin, Qixin Zeng, Runze Li, Donglin Wang
-
EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
Abhiram Kusumba, Maitreya Patel, Kyle Min, Changhoon Kim, Chitta Baral, Yezhou Yang
-
Runyu Lu, Peng Zhang, Ruochuan Shi, Yuanheng Zhu, Dongbin Zhao, Yang Liu, Dong Wang, Cesare Alippi
-
Do Methods to Jailbreak and Defend LLMs Generalize Across Languages?
Berk Atil, Rebecca J. Passonneau, Fred Morstatter
-
Ruofan Liu, Yun Lin, Zhiyong Huang, Jin Song Dong
-
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
Qiusi Zhan, Hyeonjeong Ha, Rui Yang, Sirui Xu, Hanyang Chen, Liang-Yan Gui, Yu-Xiong Wang, Huan Zhang, Heng Ji, Daniel Kang
-
Consistency Training Helps Stop Sycophancy and Jailbreaks
Alex Irpan, Alexander Matt Turner, Mark Kurzeja, David K. Elson, Rohin Shah
-
ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding
Haonan Wang, Jingyu Lu, Hongrui Li, Xiaomeng Li
-
Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler
Zixuan Hu, Li Shen, Zhenyi Wang, Yongxian Wei, Dacheng Tao
-
Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications
Zixuan Hu, Yongxian Wei, Li Shen, Zhenyi Wang, Lei Li, Chun Yuan, Dacheng Tao
-
Contrastive Knowledge Transfer and Robust Optimization for Secure Alignment of Large Language Models
Jiasen Zheng, Huajun Zhang, Xu Yan, Ran Hao, Chong Peng
-
SilhouetteTell: Practical Video Identification Leveraging Blurred Recordings of Video Subtitles
Guanchong Huang, Song Fang
-
Alik Pramanick, Mayank Bansal, Utkarsh Srivastava, Suklav Ghosh, Arijit Sur
-
C-LEAD: Contrastive Learning for Enhanced Adversarial Defense
Suklav Ghosh, Sonal Kumar, Arijit Sur
-
Rethinking Robust Adversarial Concept Erasure in Diffusion Models
Qinghong Yin, Yu Tian, Yue Zhang
-
A Hybrid Deep Learning and Forensic Approach for Robust Deepfake Detection
Sales Aribe Jr
-
Samarup Bhattacharya, Anubhab Bhattacharya, Abir Chakraborty
-
Chenghao Du, Quanfeng Huang, Tingxuan Tang, Zihao Wang, Yue Xiao
-
Prevalence of Security and Privacy Risk-Inducing Usage of AI-based Conversational Agents
Kathrin Grosse, Nico Ebert
-
Arka Dutta, Sujan Dutta, Rijul Magu, Soumyajit Datta, Munmun De Choudhury, Ashiqur R. KhudaBukhsh
-
Self-HarmLLM: Can Large Language Model Harm Itself?
Heehwan Kim, Sungjune Park, Daeseon Choi
-
Jianli Zhao, Tingchen Fu, Rylan Schaeffer, Mrinank Sharma, Fazl Barez
-
The Oversight Game: Learning to Cooperatively Balance an AI Agent's Safety and Autonomy
William Overman, Mohsen Bayati
-
SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
Kaiwen Zhou, Ahmed Elgohary, A S M Iftekhar, Amin Saied
-
Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods
Jiali Cheng, Chirag Agarwal, Hadi Amiri
-
Security Risk of Misalignment between Text and Image in Multi-modal Model
Xiaosen Wang, Zhijin Ge, Shaokang Wang
-
SSCL-BW: Sample-Specific Clean-Label Backdoor Watermarking for Dataset Ownership Verification
Yingjia Wang, Ting Qiao, Xing Liu, Chongzuo Li, Sixing Wu, Jianbin Li
-
Robust Graph Condensation via Classification Complexity Mitigation
Jiayi Luo, Qingyun Sun, Beining Yang, Haonan Yuan, Xingcheng Fu, Yanbiao Ma, Jianxin Li, Philip S. Yu
-
Model Inversion with Layer-Specific Modeling and Alignment for Data-Free Continual Learning
Ruilin Tong, Haodong Lu, Yuhang Liu, Dong Gong
-
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko
-
On Measuring Localization of Shortcuts in Deep Networks
Nikita Tsoy, Nikola Konstantinov
-
ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, Derui Wang
-
PEEL: A Poisoning-Exposing Encoding Theoretical Framework for Local Differential Privacy
Lisha Shuai, Jiuling Dong, Nan Zhang, Shaofeng Tan, Haokun Zhang, Zilong Song, Gaoya Dong, Xiaolong Yang
-
A DRL-Empowered Multi-Level Jamming Approach for Secure Semantic Communication
Weixuan Chen, Qianqian Yang
-
Accurate Target Privacy Preserving Federated Learning Balancing Fairness and Utility
Kangkang Sun, Jun Wu, Minyi Guo, Jianhua Li, Jianwei Huang
-
Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token
Shaked Zychlinski, Yuval Kainan
-
Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
Zhichao Hou, Weizhi Gao, Xiaorui Liu
-
Unvalidated Trust: Cross-Stage Vulnerabilities in Large Language Model Architectures
Dominik Schwarz
-
Semantically-Aware LLM Agent to Enhance Privacy in Conversational AI Services
Jayden Serenari, Stephen Lee
-
PF-DAformer: Proximal Femur Segmentation via Domain Adaptive Transformer for Dual-Center QCT
Rochak Dhakal, Chen Zhao, Zixin Shi, Joyce H. Keyak, Tadashi S. Kaneko, Kuan-Jui Su, Hui Shen, Hong-Wen Deng, Weihua Zhou
-
Reasoning Up the Instruction Ladder for Controllable Language Models
Zishuo Zheng, Vidhisha Balachandran, Chan Young Park, Faeze Brahman, Sachin Kumar
-
Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models
Juan Ren, Mark Dras, Usman Naseem
-
Lipschitz-aware Linearity Grafting for Certified Robustness
Yongjin Han, Suhyun Kim
-
Hasan Akgul, Mari Eplik, Javier Rojas, Aina Binti Abdullah, Pieter van der Merwe
-
DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis
Yinqi Cai, Jichang Li, Zhaolun Li, Weikai Chen, Rushi Lan, Xi Xie, Xiaonan Luo, Guanbin Li
-
A Unified Bilevel Model for Adversarial Learning and A Case Study
Yutong Zheng, Qingna Li
-
On the Stability of Neural Networks in Deep Learning
Blaise Delattre
-
Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy
Phuc Tran, Nisheeth K. Vishnoi, Van H. Vu
-
Model Inversion Attacks Meet Cryptographic Fuzzy Extractors
Mallika Prabhakar, Louise Xu, Prateek Saxena
-
NetEcho: From Real-World Streaming Side-Channels to Full LLM Conversation Recovery
Zheng Zhang, Guanlong Wu, Sen Deng, Shuai Wang, Yinqian Zhang
-
Emily Herron, Junqi Yin, Feiyi Wang
-
FakeZero: Real-Time, Privacy-Preserving Misinformation Detection for Facebook and X
Soufiane Essahli, Oussama Sarsar, Imane Fouad, Anas Motii, Ahmed Bentajer
-
Robust GNN Watermarking via Implicit Perception of Topological Invariants
Jipeng Li, Yanning Shen
-
Adversarial Pre-Padding: Generating Evasive Network Traffic Against Transformer-Based Classifiers
Quanliang Jing, Xinxin Fan, Yanyan Liu, Jingping Bi
-
Simon Yu, Peilin Yu, Hongbo Zheng, Huajie Shao, Han Zhao, Lui Sha
-
Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning
Svetlana Churina, Niranjan Chebrolu, Kokil Jaidka
-
Guangzhi Su, Shuchang Huang, Yutong Ke, Zhuohang Liu, Long Qian, Kaizhu Huang
-
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
Qiushi Sun, Mukai Li, Zhoumianze Liu, Zhihui Xie, Fangzhi Xu, Zhangyue Yin, Kanzhi Cheng, Zehao Li, Zichen Ding, Qi Liu, Zhiyong Wu, Zhuosheng Zhang, Ben Kao, Lingpeng Kong
-
Causal-Aware Generative Adversarial Networks with Reinforcement Learning
Tu Anh Hoang Nguyen, Dang Nguyen, Tri-Nhan Vo, Thuc Duy Le, Sunil Gupta
-
The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets
Yujun Kim, Chaewon Moon, Chulhee Yun
-
Viktoriia Zinkovich, Anton Antonov, Andrei Spiridonov, Denis Shepelev, Andrey Moskalenko, Daria Pugacheva, Elena Tutubalina, Andrey Kuznetsov, Vlad Shakhuro
-
Relative Scaling Laws for LLMs
William Held, David Hall, Percy Liang, Diyi Yang
-
SPICE: Self-Play In Corpus Environments Improves Reasoning
Bo Liu, Chuanyang Jin, Seungone Kim, Weizhe Yuan, Wenting Zhao, Ilia Kulikov, Xian Li, Sainbayar Sukhbaatar, Jack Lanchantin, Jason Weston
-
Heethanjan Kanagalingam, Thenukan Pathmanathan, Mokeeshan Vathanakumar, Tharmakulasingam Mukunthan
-
AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts
Yufan Liu, Wanqian Zhang, Huashan Chen, Lin Wang, Xiaojun Jia, Zheng Lin, Weiping Wang
-
Enhancing CLIP Robustness via Cross-Modality Alignment
Xingyu Zhu, Beier Zhu, Shuo Wang, Kesen Zhao, Hanwang Zhang
-
Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
Ziqi Zhou, Yifan Hu, Yufei Song, Zijing Li, Shengshan Hu, Leo Yu Zhang, Dezhong Yao, Long Zheng, Hai Jin
-
A Dual-Branch CNN for Robust Detection of AI-Generated Facial Forgeries
Xin Zhang, Yuqi Song, Fei Zuo
-
A Pragmatic Way to Measure Chain-of-Thought Monitorability
Scott Emmons, Roland S. Zimmermann, David K. Elson, Rohin Shah
-
Mitigating Negative Transfer via Reducing Environmental Disagreement
Hui Sun, Zheng Xie, Hao-Yuan He, Ming Li
-
SPEAR++: Scaling Gradient Inversion via Sparsely-Used Dictionary Learning
Alexander Bakarsky, Dimitar I. Dimitrov, Maximilian Baader, Martin Vechev
-
PRIVET: Privacy Metric Based on Extreme Value Theory
Antoine Szatkownik, Aurélien Decelle, Beatriz Seoane, Nicolas Bereux, Léo Planche, Guillaume Charpiat, Burak Yelmen, Flora Jay, Cyril Furtlehner
-
A Comprehensive Evaluation Framework for Synthetic Trip Data Generation in Public Transport
Yuanyuan Wu, Zhenlin Qin, Zhenliang Ma
-
A Novel XAI-Enhanced Quantum Adversarial Networks for Velocity Dispersion Modeling in MaNGA Galaxies
Sathwik Narkedimilli, N V Saran Kumar, Aswath Babu H, Manjunath K Vanahalli, Manish M, Vinija Jain, Aman Chadha
-
Self-Concordant Perturbations for Linear Bandits
Lucas Lévy, Jean-Lou Valeau, Arya Akhavan, Patrick Rebeschini
-
Vishal Halder, Alexandre Reiffers-Masson, Abdeldjalil Aïssa-El-Bey, Gugan Thoppe
-
Attack on a PUF-based Secure Binary Neural Network
Bijeet Basak, Nupur Patil, Kurian Polachan, Srinivas Vivek
-
Cybersecurity AI Benchmark (CAIBench): A Meta-Benchmark for Evaluating Cybersecurity AI Agents
María Sanz-Gómez, Víctor Mayoral-Vilches, Francesco Balassone, Luis Javier Navarrete-Lozano, Cristóbal R. J. Veas Chavez, Maite del Mundo de Torres
-
Yan Meng, Jiachun Li, Matthew Pillari, Arjun Deopujari, Liam Brennan, Hafsah Shamsie, Haojin Zhu, Yuan Tian
-
Hammering the Diagnosis: Rowhammer-Induced Stealthy Trojan Attacks on ViT-Based Medical Imaging
Banafsheh Saber Latibari, Najmeh Nazari, Hossein Sayadi, Houman Homayoun, Abhijit Mahalanobis
-
Najmeh Nazari, Banafsheh Saber Latibari, Elahe Hosseini, Fatemeh Movafagh, Chongzhou Fang, Hosein Mohammadi Makrani, Kevin Immanuel Gubbi, Abhijit Mahalanobis, Setareh Rafatirad, Hossein Sayadi, Houman Homayoun
-
Learning to Attack: Uncovering Privacy Risks in Sequential Data Releases
Ziyao Cui, Minxing Zhang, Jian Pei
-
scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration
Jianle Sun, Chaoqi Liang, Ran Wei, Peng Zheng, Lei Bai, Wanli Ouyang, Hongliang Yan, Peng Ye
-
Secure Retrieval-Augmented Generation against Poisoning Attacks
Zirui Cheng, Jikai Sun, Anjun Gao, Yueyang Quan, Zhuqing Liu, Xiaohua Hu, Minghong Fang
-
Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability
Eline M. Bovy, Caleb Probine, Marnix Suilen, Ufuk Topcu, Nils Jansen
-
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
Shrestha Datta, Shahriar Kabir Nahin, Anshuman Chhabra, Prasant Mohapatra
-
MCPGuard: Automatically Detecting Vulnerabilities in MCP Servers
Bin Wang, Zexin Liu, Hao Yu, Ao Yang, Yenan Huang, Jing Guo, Huangsheng Cheng, Hui Li, Huiyu Wu
-
QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents
Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
-
Aryan Mathur, Asaduddin Ahmed, Pushti Amit Vasoya, Simeon Kandan Sonar, Yasir Z, Madesh Kuppusamy
-
Differential Privacy: Gradient Leakage Attacks in Federated Learning Environments
Miguel Fernandez-de-Retana, Unai Zulaika, Rubén Sánchez-Corcuera, Aitor Almeida
-
Cross-Lingual Summarization as a Black-Box Watermark Removal Attack
Gokul Ganesan
-
Hao Liang, Haifeng Wen, Kaishun Wu, Hong Xing
-
CompressionAttack: Exploiting Prompt Compression as a New Attack Surface in LLM-Powered Agents
Zesen Liu, Zhixiang Zhang, Yuchong Xie, Dongdong She
-
Retracing the Past: LLMs Emit Training Data When They Get Lost
Myeongseob Ko, Nikhil Reddy Billa, Adam Nguyen, Charles Fleming, Ming Jin, Ruoxi Jia
-
T2I-RiskyPrompt: A Benchmark for Safety Evaluation, Attack, and Defense on Text-to-Image Model
Chenyu Zhang, Tairen Zhang, Lanjun Wang, Ruidong Chen, Wenhui Li, Anan Liu
-
Hanyu Zhu, Lance Fiondella, Jiawei Yuan, Kai Zeng, Long Jiao
-
Yingzhi Mao, Chunkang Zhang, Junxiang Wang, Xinyan Guan, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
-
Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference
Stephen Zhao, Aidan Li, Rob Brekelmans, Roger Grosse
-
SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation
Alec Helbling, Shruti Palaskar, Kundan Krishna, Polo Chau, Leon Gatys, Joseph Yitan Cheng
-
DictPFL: Efficient and Private Federated Learning on Encrypted Gradients
Jiaqi Xue, Mayank Kumar, Yuzhang Shang, Shangqian Gao, Rui Ning, Mengxin Zheng, Xiaoqian Jiang, Qian Lou
-
How Hard is it to Confuse a World Model?
Waris Radji, Odalric-Ambrym Maillard
-
PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling
Andrea Bonfanti, Ismael Medina, Roman List, Björn Staeves, Roberto Santana, Marco Ellero
-
Probe-based Fine-tuning for Reducing Toxicity
Jan Wehner, Mario Fritz
-
FrameShield: Adversarially Robust Video Anomaly Detection
Mojtaba Nafez, Mobina Poulaei, Nikan Vasei, Bardia Soltani Moakhar, Mohammad Sabokrou, MohammadHossein Rohban
-
Soft Instruction De-escalation Defense
Nils Philipp Walter, Chawin Sitawarin, Jamie Hayes, David Stutz, Ilia Shumailov
-
Doubly-Regressing Approach for Subgroup Fairness
Kyungseon Lee, Kunwoong Kim, Jihu Lee, Dongyoon Yang, Yongdai Kim
-
Jie Zhang, Xiaohong Li, Mengke Zhang, Ruitao Feng, Shanshan Xu, Zhe Hou, Guangdong Bai
-
Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang
-
The Trojan Example: Jailbreaking LLMs through Template Filling and Unsafety Reasoning
Mingrui Liu, Sixiao Zhang, Cheng Long, Kwok Yan Lam
-
Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses
Xingwei Zhong, Kar Wai Fok, Vrizlynn L.L. Thing
-
Spatio-Temporal Attention Network for Epileptic Seizure Prediction
Zan Li, Kyongmin Yeo, Wesley Gifford, Lara Marcuse, Madeline Fields, Bülent Yener
-
SAID: Empowering Large Language Models with Self-Activating Internal Defense
Yulong Chen, Yadong Liu, Jiawen Zhang, Mu Li, Chao Huang, Jie Wen
-
Wu Yichao, Wang Yirui, Ding Panpan, Wang Hailong, Zhu Bingqian, Liu Chun
-
Chiyu Chen, Xinhao Song, Yunkai Chai, Yang Yao, Haodong Zhao, Lijun Li, Jie Li, Yan Teng, Gongshen Liu, Yingchun Wang
-
Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models
Tomáš Souček, Sylvestre-Alvise Rebuffi, Pierre Fernandez, Nikola Jovanović, Hady Elsahar, Valeriu Lacatusu, Tuan Tran, Alexandre Mourachko
-
Steering Evaluation-Aware Language Models To Act Like They Are Deployed
Tim Tian Hua, Andrew Qin, Samuel Marks, Neel Nanda
-
AdaDoS: Adaptive DoS Attack via Deep Adversarial Reinforcement Learning in SDN
Wei Shao, Yuhao Wang, Rongguang He, Muhammad Ejaz Ahmed, Seyit Camtepe
-
RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines
Austin Jia, Avaneesh Ramesh, Zain Shamsi, Daniel Zhang, Alex Liu
-
Francesca Padovani, Bastian Bunzeck, Manar Ali, Omar Momen, Arianna Bisazza, Hendrik Buschmeier, Sina Zarrieß
-
BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation
Liang Ye, Shengqin Chen, Jiazhu Dai
-
Causal Debiasing for Visual Commonsense Reasoning
Jiayi Zou, Gengyun Jia, Bing-Kun Bao
-
Dino-Diffusion Modular Designs Bridge the Cross-Domain Gap in Autonomous Parking
Zixuan Wu, Hengyuan Zhang, Ting-Hsuan Chen, Yuliang Guo, David Paz, Xinyu Huang, Liu Ren
-
MEIcoder: Decoding Visual Stimuli from Neural Activity by Leveraging Most Exciting Inputs
Jan Sobotka, Luca Baroni, Ján Antolík
-
H-SPLID: HSIC-based Saliency Preserving Latent Information Decomposition
Lukas Miklautz, Chengzhi Shi, Andrii Shkabrii, Theodoros Thirimachos Davarakis, Prudence Lam, Claudia Plant, Jennifer Dy, Stratis Ioannidis
-
Adversary-Aware Private Inference over Wireless Channels
Mohamed Seif, Malcolm Egan, Andrea J. Goldsmith, H. Vincent Poor
-
Divyanshu Kumar, Shreyas Jena, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi
-
HHEML: Hybrid Homomorphic Encryption for Privacy-Preserving Machine Learning on Edge
Yu Hin Chan, Hao Yang, Shiyu Shen, Xingyu Fan, Shengzhe Lyu, Patrick S. Y. Hung, Ray C. C. Cheung
-
NeuPerm: Disrupting Malware Hidden in Neural Network Parameters by Leveraging Permutation Symmetry
Daniel Gilkarov, Ran Dubin
-
An Experimental Study of Trojan Vulnerabilities in UAV Autonomous Landing
Reza Ahmari, Ahmad Mohammadi, Vahid Hemmati, Mohammed Mynuddin, Mahmoud Nabil Mahmoud, Parham Kebria, Abdollah Homaifar, Mehrdad Saif
-
Fair Representation Learning with Controllable High Confidence Guarantees via Adversarial Inference
Yuhong Luo, Austin Hoag, Xintong Wang, Philip S. Thomas, Przemyslaw A. Grabowicz
-
Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization
Antônio H. Ribeiro, David Vävinggren, Dave Zachariah, Thomas B. Schön, Francis Bach
-
Can Current Detectors Catch Face-to-Voice Deepfake Attacks?
Nguyen Linh Bao Nguyen, Alsharif Abuadbba, Kristen Moore, Tingming Wu
-
A new measure for dynamic leakage based on quantitative information flow
Luigi D. C. Soares, Mário S. Alvim, Natasha Fernandes
-
A Reinforcement Learning Framework for Robust and Secure LLM Watermarking
Li An, Yujian Liu, Yepeng Liu, Yuheng Bu, Yang Zhang, Shiyu Chang
-
Adversarially-Aware Architecture Design for Robust Medical AI Systems
Alyssa Gerhart, Balaji Iyangar
-
Brent Winslow, Jacqueline Shreibati, Javier Perez, Hao-Wei Su, Nichole Young-Lin, Nova Hammerquist, Daniel McDuff, Jason Guss, Jenny Vafeiadou, Nick Cain, Alex Lin, Erik Schenck, Shiva Rajagopal, Jia-Ru Chung, Anusha Venkatakrishnan, Amy Armento Lee, Maryam Karimzadehgan, Qingyou Meng, Rythm Agarwal, Aravind Natarajan, Tracy Giest
-
LAPRAD: LLM-Assisted PRotocol Attack Discovery
R. Can Aygun, Yehuda Afek, Anat Bremler-Barr, Leonard Kleinrock
-
Collaborative penetration testing suite for emerging generative AI algorithms
Petar Radanliev
-
A New Type of Adversarial Examples
Xingyang Nie, Guojie Xiao, Su Pan, Biao Wang, Huilin Ge, Tao Fang
-
Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation
Chengcan Wu, Zhixin Zhang, Mingqian Xu, Zeming Wei, Meng Sun
-
Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent
Yangshijie Zhang, Xinda Wang, Jialin Liu, Wenqiang Wang, Zhicong Ma, Xingxing Jia
-
Machine Text Detectors are Membership Inference Attacks
Ryuto Koike, Liam Dugan, Masahiro Kaneko, Chris Callison-Burch, Naoaki Okazaki
-
Hubble: a Model Suite to Advance the Study of LLM Memorization
Johnny Tian-Zheng Wei, Ameya Godbole, Mohammad Aflah Khan, Ryan Wang, Xiaoyuan Zhu, James Flemings, Nitya Kashyap, Krishna P. Gummadi, Willie Neiswanger, Robin Jia
-
OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform
Thomas Wang, Haowen Li
-
LLM Unlearning with LLM Beliefs
Kemou Li, Qizhou Wang, Yue Wang, Fengpeng Li, Jun Liu, Bo Han, Jiantao Zhou
-
Blackbox Model Provenance via Palimpsestic Membership Inference
Rohith Kuditipudi, Jing Huang, Sally Zhu, Diyi Yang, Christopher Potts, Percy Liang
-
Woo Jae Kim, Kyu Beom Han, Yoonki Cho, Youngju Na, Junsik Jung, Sooel Son, Sung-eui Yoon
-
Can You Trust What You See? Alpha Channel No-Box Attacks on Video Object Detection
Ariana Yi, Ce Zhou, Liyang Xiao, Qiben Yan
-
Subliminal Corruption: Mechanisms, Thresholds, and Interpretability
Reya Vir, Sarvesh Bhatnagar
-
ConvXformer: Differentially Private Hybrid ConvNeXt-Transformer for Inertial Navigation
Omer Tariq, Muhammad Bilal, Muneeb Ul Hassan, Dongsoo Han, Jon Crowcroft
-
Revisiting the Relation Between Robustness and Universality
M. Klabunde, L. Caspari, F. Lemmerich
-
Euodia Dodd, Nataša Krčo, Igor Shilov, Yves-Alexandre de Montjoye
-
HAMLOCK: HArdware-Model LOgically Combined attacK
Sanskar Amgain, Daniel Lobo, Atri Chatterjee, Swarup Bhunia, Fnu Suya
-
Exploring the Effect of DNN Depth on Adversarial Attacks in Network Intrusion Detection Systems
Mohamed ElShehaby, Ashraf Matrawy
-
Defending Against Prompt Injection with DataFilter
Yizhu Wang, Sizhe Chen, Raghad Alkhudair, Basel Alomair, David Wagner
-
AegisMCP: Online Graph Intrusion Detection for Tool-Augmented LLMs on Edge Devices
Zhonghao Zhan, Amir Al Sadi, Krinos Li, Hamed Haddadi
-
Privacy-Preserving Spiking Neural Networks: A Deep Dive into Encryption Parameter Optimisation
Mahitha Pulivathi
-
CircuitGuard: Mitigating LLM Memorization in RTL Code Generation Against IP Leakage
Nowfel Mashnoor, Mohammad Akyash, Hadi Kamali, Kimia Azar
-
LLMs can hide text in other text of the same length.ipynb
Antonio Norelli, Michael Bronstein
-
Ask What Your Country Can Do For You: Towards a Public Red Teaming Model
Wm. Matthew Kennedy, Cigdem Patlak, Jayraj Dave, Blake Chambers, Aayush Dhanotiya, Darshini Ramiah, Reva Schwartz, Jack Hagen, Akash Kundu, Mouni Pendharkar, Liam Baisley, Theodora Skeadas, Rumman Chowdhury
-
Xiang Li, Buxin Su, Chendi Wang, Qi Long, Weijie J. Su
-
Towards Strong Certified Defense with Universal Asymmetric Randomization
Hanbin Hong, Ashish Kundu, Ali Payani, Binghui Wang, Yuan Hong
-
Tushar Nayan, Ziqi Zhang, Ruimin Sun
-
Jia Deng, Jin Li, Zhenhua Zhao, Shaowei Wang
-
Rectifying Shortcut Behaviors in Preference-based Reward Learning
Wenqian Ye, Guangtao Zheng, Aidong Zhang
-
DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code
Shriyansh Agrawal, Aidan Lau, Sanyam Shah, Ahan M R, Kevin Zhu, Sunishchal Dev, Vasu Sharma
-
FeatureFool: Zero-Query Fooling of Video Models via Feature Map
Duoxun Tang, Xi Xiao, Guangwu Hu, Kangkang Sun, Xiao Yang, Dongyang Chen, Qing Li, Yongjie Yin, Jiyao Wang
-
Yifei Sun
-
Kuai Yu, Xiaoyu Wu, Peishen Yan, Qingqian Yang, Linshan Jiang, Hao Wang, Yang Hua, Tao Song, Haibing Guan
-
Thomas Hofweber, Jefrey Bergl, Ian Reyes, Amir Sadovnik
-
Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability
Artur Zolkowski, Wen Xing, David Lindner, Florian Tramèr, Erik Jenner
-
Extracting alignment data in open models
Federico Barbero, Xiangming Gu, Christopher A. Choquette-Choo, Chawin Sitawarin, Matthew Jagielski, Itay Yona, Petar Veličković, Ilia Shumailov, Jamie Hayes
-
PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits
Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar
-
CourtGuard: A Local, Multiagent Prompt Injection Classifier
Isaac Wu, Michael Maslowski
-
GUIDE: Enhancing Gradient Inversion Attacks in Federated Learning with Denoising Models
Vincenzo Carletti, Pasquale Foggia, Carlo Mazzocca, Giuseppe Parrella, Mario Vento
-
Structured Debate Improves Corporate Credit Reasoning in Financial AI
Yoonjin Lee, Munhee Kim, Hanbi Choi, Juhyeon Park, Seungho Lyoo, Woojin Park
-
SafeSearch: Do Not Trade Safety for Utility in LLM Search Agents
Qiusi Zhan, Angeline Budiman-Chan, Abdelrahman Zayed, Xingzhi Guo, Daniel Kang, Joo-Kyung Kim
-
Chenxu Li, Zhicai Wang, Yuan Sheng, Xingyu Zhu, Yanbin Hao, Xiang Wang
-
BreakFun: Jailbreaking LLMs via Schema Exploitation
Amirkia Rafiei Oskooei, Mehmet S. Aktas
-
Fit for Purpose? Deepfake Detection in the Real World
Guangyu Lin, Li Lin, Christina P. Walker, Daniel S. Schiff, Shu Hu
-
Ryoto Miyamoto, Xin Fan, Fuyuko Kido, Tsuneo Matsumoto, Hayato Yamana
-
Colliding with Adversaries at ECML-PKDD 2025 Model Robustness Competition 1st Prize Solution
Dimitris Stefanopoulos, Andreas Voskou
-
When Intelligence Fails: An Empirical Study on Why LLMs Struggle with Password Cracking
Mohammad Abdul Rehman, Syed Imad Ali Shah, Abbas Anwar, Noor Islam, Hamid Khan
-
DRO-InstructZero: Distributionally Robust Prompt Optimization for Large Language Models
Yangyang Li
-
Ting Qiao, Xing Liu, Wenke Huang, Jianbin Li, Zhaoxin Fan, Yiming Li
-
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Shuang Liang, Zhihao Xu, Jialing Tao, Hui Xue, Xiting Wang
-
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong
-
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
Yao Huang, Yitong Sun, Yichi Zhang, Ruochen Zhang, Yinpeng Dong, Xingxing Wei
-
Language Models are Injective and Hence Invertible
Giorgos Nikolaou, Tommaso Mencattini, Donato Crisostomi, Andrea Santilli, Yannis Panagakis, Emanuele Rodolà
-
Unmasking Facial DeepFakes: A Robust Multiview Detection Framework for Natural Images
Sami Belguesmia, Mohand Saïd Allili, Assia Hamadene
-
Stress-Aware Learning under KL Drift via Trust-Decayed Mirror Descent
Gabriel Nixon Raj
-
Yuyuan Feng, Bin Ma, Enyan Dai
-
Adversary-Free Counterfactual Prediction via Information-Regularized Representations
Shiqin Tang, Rong Feng, Shuxin Zhuang, Hongzong Li, Youzhi Zhang
-
Constrained Adversarial Perturbation
Virendra Nishad, Bhaskar Mukhoty, Hilal AlQuabeh, Sandeep K. Shukla, Sayak Ray Chowdhury
-
Blackwell's Approachability for Sequential Conformal Inference
Guillaume Principato, Gilles Stoltz
-
HarmRLVR: Weaponizing Verifiable Rewards for Harmful LLM Alignment
Yuexiao Liu, Lijun Li, Xingjun Wang, Jing Shao
-
Towards Proactive Defense Against Cyber Cognitive Attacks
Bonnie Rushing, Mac-Rufus Umeokolo, Shouhuai Xu
-
Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness
Longwei Wang, Ifrat Ikhtear Uddin, KC Santosh, Chaowei Zhang, Xiao Qin, Yang Zhou
-
Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks
Trilok Padhi, Pinxian Lu, Abdulkadir Erol, Tanmay Sutar, Gauri Sharma, Mina Sonmez, Munmun De Choudhury, Ugur Kursuncu
-
Bingjie Zhang, Yibo Yang, Renzhe, Dandan Guo, Jindong Gu, Philip Torr, Bernard Ghanem
-
Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies
Mason Nakamura, Abhinav Kumar, Saaduddin Mahmud, Sahar Abdelnabi, Shlomo Zilberstein, Eugene Bagdasarian
-
Jingwen Gu, Yiting He, Zhishuai Liu, Pan Xu
-
TED++: Submanifold-Aware Backdoor Detection via Layerwise Tubular-Neighbourhood Screening
Nam Le, Leo Yu Zhang, Kewen Liao, Shirui Pan, Wei Luo
-
BinCtx: Multi-Modal Representation Learning for Robust Android App Behavior Detection
Zichen Liu, Shao Yang, Xusheng Xiao
-
Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers
Andrew Zhao, Reshmi Ghosh, Vitor Carvalho, Emily Lawton, Keegan Hines, Gao Huang, Jack W. Stokes
-
Stealthy Dual-Trigger Backdoors: Attacking Prompt Tuning in LM-Empowered Graph Foundation Models
Xiaoyu Xue, Yuni Lai, Chenxi Huang, Yulin Zhu, Gaolei Li, Xiaoge Zhang, Kai Zhou
-
Galaxy Morphology Classification with Counterfactual Explanation
Zhuo Cao, Lena Krieger, Hanno Scharr, Ira Assent
-
On the Ability of LLMs to Handle Character-Level Perturbations: How Well and How?
Anyun Zhuo, Xuefei Ning, Ningyuan Li, Yu Wang, Pinyan Lu
-
A Multi-domain Image Translative Diffusion StyleGAN for Iris Presentation Attack Detection
Shivangi Yadav, Arun Ross
-
Structured Universal Adversarial Attacks on Object Detection for Video Sequences
Sven Jacob, Weijia Shao, Gjergji Kasneci
-
Keima Abe, Hayato Muraki, Shuhei Tomoshige, Kenichi Oishi, Hitoshi Iyatomi
-
SteeringTTA: Guiding Diffusion Trajectories for Robust Test-Time-Adaptation
Jihyun Yu, Yoojin Oh, Wonho Bae, Mingyu Kim, Junhyug Noh
-
Backdoor Unlearning by Linear Task Decomposition
Amel Abdelraheem, Alessandro Favero, Gerome Bovet, Pascal Frossard
-
When Flatness Does (Not) Guarantee Adversarial Robustness
Nils Philipp Walter, Linara Adilova, Jilles Vreeken, Michael Kamp
-
Guillaume Rongier, Luk Peeters
-
Redundancy-Aware Test-Time Graph Out-of-Distribution Detection
Yue Hou, He Zhu, Ruomei Liu, Yingke Su, Junran Wu, Ke Xu
-
An Information Asymmetry Game for Trigger-based DNN Model Watermarking
Chaoyue Huang, Gejian Zhao, Hanzhou Wu, Zhihua Xia, Asad Malik
-
Fanchao Meng, Jiaping Gui, Yunbo Li, Yue Wu
-
Certifying optimal MEV strategies with Lean
Massimo Bartoletti, Riccardo Marchesin, Roberto Zunino
-
Lexo: Eliminating Stealthy Supply-Chain Attacks via LLM-Assisted Program Regeneration
Evangelos Lamprou, Julian Dai, Grigoris Ntousakis, Martin C. Rinard, Nikos Vasilakis
-
A Hard-Label Black-Box Evasion Attack against ML-based Malicious Traffic Detection Systems
Zixuan Liu, Yi Zhao, Zhuotao Liu, Qi Li, Chuanpu Fu, Guangmeng Zhou, Ke Xu
-
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
Mor Ventura, Michael Toker, Or Patashnik, Yonatan Belinkov, Roi Reichart
-
Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks
ChenYu Wu, Yi Wang, Yang Liao
-
Deyue Zhang, Dongdong Yang, Junjie Mu, Quancheng Zou, Zonghao Ying, Wenzhuo Xu, Zhao Liu, Xuan Wang, Xiangzheng Zhang
-
Targeted Attacks and Defenses for Distributed Federated Learning in Vehicular Networks
Utku Demir, Tugba Erpek, Yalin E. Sagduyu, Sastry Kompella, Mengran Xue
-
MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation
Gurusha Juneja, Jayanth Naga Sai Pasupulati, Alon Albalak, Wenyue Hua, William Yang Wang
-
PoTS: Proof-of-Training-Steps for Backdoor Detection in Large Language Models
Issam Seddik, Sami Souihi, Mohamed Tamaazousti, Sara Tucci Piergiovanni
-
SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling
Georgi Ganev, Reza Nazari, Rees Davison, Amir Dizche, Xinmin Wu, Ralph Abbey, Jorge Silva, Emiliano De Cristofaro
-
Aofan Liu, Shiyuan Song, Haoxuan Li, Cehao Yang, Yiyan Qi
-
SAJA: A State-Action Joint Attack Framework on Multi-Agent Deep Reinforcement Learning
Weiqi Guo, Guanjun Liu, Ziyuan Zhou
-
TRUSTVIS: A Multi-Dimensional Trustworthiness Evaluation Framework for Large Language Models
Ruoyu Sun, Da Song, Jiayang Song, Yuheng Huang, Lei Ma
-
Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning
Baogang Song, Dongdong Zhao, Jianwen Xiang, Qiben Xu, Zizhuo Yu
-
Personal Attribute Leakage in Federated Speech Models
Hamdan Al-Ali, Ali Reza Ghavamipour, Tommaso Caselli, Fatih Turkmen, Zeerak Talat, Hanan Aldarmaki
-
Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control
Shingo Ayabe, Hiroshi Kera, Kazuhiko Kawamoto
-
Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training
Yisen Wang, Yichuan Mo, Hongjun Wang, Junyi Li, Zhouchen Lin
-
In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers
Avihay Cohen
-
Ziqing Lu, Lifeng Lai, Weiyu Xu
-
SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs
Juan Ren, Mark Dras, Usman Naseem
-
Taming the Fragility of KV Cache Eviction in LLM Inference
Yuan Feng, Haoyu Guo, JunLin Lv, S. Kevin Zhou, Xike Xie
-
GAPS: A Clinically Grounded, Automated Benchmark for Evaluating AI Clinicians
Xiuyuan Chen, Tao Sun, Dexin Su, Ailing Yu, Junwei Liu, Zhe Chen, Gangzeng Jin, Xin Wang, Jingnan Liu, Hansong Xiao, Hualei Zhou, Dongjie Tao, Chunxiao Guo, Minghui Yang, Yuan Xia, Jing Zhao, Qianrui Fan, Yanyun Wang, Shuai Zhen, Kezhong Chen, Jun Wang, Zewen Sun, Heng Zhao, Tian Guan, Shaodong Wang, Geyun Chang, Jiaming Deng, Hongchengcheng Chen, Kexin Feng, Ruzhen Li, Jiayi Geng, Changtai Zhao, Jun Wang, Guihu Lin, Peihao Li, Liqi Liu, Peng Wei, Jian Wang, Jinjie Gu, Ping Wang, Fan Yang
-
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, Jinlan Fu, Jingjing Gong, Xipeng Qiu
-
Model-agnostic Adversarial Attack and Defense for Vision-Language-Action Models
Haochuan Xu, Yun Sing Koh, Shuhuai Huang, Zirun Zhou, Di Wang, Jun Sakuma, Jingfeng Zhang
-
Akib Mohammed Khan, Bartosz Krawczyk
-
Risk-adaptive Activation Steering for Safe Multimodal Large Language Models
Jonghyun Park, Minhyuk Seo, Jonghyun Choi
-
Selective Adversarial Attacks on LLM Benchmarks
Ivan Dubrovsky, Anastasia Orlova, Illarion Iov, Nina Gubina, Irena Gureeva, Alexey Zaytsev
-
Robust Minimax Boosting with Performance Guarantees
Santiago Mazuelas, Veronica Alvarez
-
From base cases to backdoors: An Empirical Study of Unnatural Crypto-API Misuse
Victor Olaiya, Adwait Nadkarni
-
Tan Le, Van Le, Sachin Shetty
-
Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts
Li Bai, Qingqing Ye, Xinwei Zhang, Sen Zhang, Zi Liang, Jianliang Xu, Haibo Hu
-
Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers
Xin Zhao, Xiaojun Chen, Bingshan Liu, Haoyu Gao, Zhendong Zhao, Yilong Chen
-
Cyber-Resilient System Identification for Power Grid through Bayesian Integration
Shimiao Li, Guannan Qu, Bryan Hooi, Vyas Sekar, Soummya Kar, Larry Pileggi
-
Every Language Model Has a Forgery-Resistant Signature
Matthew Finlayson, Xiang Ren, Swabha Swayamdipta
-
Robust or Suggestible? Exploring Non-Clinical Induction in LLM Drug-Safety Decisions
Siying Liu, Shisheng Zhang, Indu Bala
-
NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations
Junjie Nan, Jianing Li, Wei Chen, Mingkun Zhang, Xueqi Cheng
-
Signature in Code Backdoor Detection, how far are we?
Quoc Hung Le, Thanh Le-Cong, Bach Le, Bowen Xu
-
PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features
Wei Zou, Yupei Liu, Yanting Wang, Ying Chen, Neil Gong, Jinyuan Jia
-
Wissam Salhab, Darine Ameyed, Hamid Mcheick, Fehmi Jaafar
-
SafeMT: Multi-turn Safety for Multimodal Language Models
Han Zhu, Juntao Dai, Jiaming Ji, Haoran Li, Chengkun Cai, Pengcheng Wen, Chi-Min Chan, Boyuan Chen, Yaodong Yang, Sirui Han, Yike Guo
-
PromptLocate: Localizing Prompt Injection Attacks
Yuqi Jia, Yupei Liu, Zedian Shao, Jinyuan Jia, Neil Gong
-
Shallow Robustness, Deep Vulnerabilities: Multi-Turn Evaluation of Medical LLMs
Blazej Manczak, Eric Lin, Francisco Eiras, James O'Neill, Vaikkunth Mugunthan
-
LLM-REVal: Can We Trust LLM Reviewers Yet?
Rui Li, Jia-Chen Gu, Po-Nien Kung, Heming Xia, Junfeng Liu, Xiangwen Kong, Zhifang Sui, Nanyun Peng
-
Lang Gao, Xuhui Li, Chenxi Wang, Mingzhe Li, Wei Liu, Zirui Song, Jinghui Zhang, Rui Yan, Preslav Nakov, Xiuying Chen
-
StyleDecipher: Robust and Explainable Detection of LLM-Generated Texts with Stylistic Analysis
Siyuan Li, Aodu Wulianghai, Xi Lin, Guangyan Li, Xiang Chen, Jun Wu, Jianhua Li
-
Vision Language Models Map Logos to Text via Semantic Entanglement in the Visual Projector
Sifan Li, Hongkai Chen, Yujun Cai, Qingwen Ye, Liyang Chen, Junsong Yuan, Yiwei Wang
-
Content Anonymization for Privacy in Long-form Audio
Cristina Aggazzotti, Ashi Garg, Zexin Cai, Nicholas Andrews
-
ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation
Ziyuan Luo, Yangyi Zhao, Ka Chun Cheung, Simon See, Renjie Wan
-
MS-GAGA: Metric-Selective Guided Adversarial Generation Attack
Dion J. X. Ho, Gabriel Lee Jun Rong, Niharika Shrivastava, Harshavardhan Abichandani, Pai Chet Ng, Xiaoxiao Miao
-
Fairness-Constrained Optimization Attack in Federated Learning
Harsh Kasyap, Minghong Fang, Zhuqing Liu, Carsten Maple, Somanath Tripathy
-
Bowen Fan, Zhilin Guo, Xunkai Li, Yihan Zhou, Bing Zhou, Zhenjun Li, Rong-Hua Li, Guoren Wang
-
Keep Calm and Avoid Harmful Content: Concept Alignment and Latent Manipulation Towards Safer Answers
Ruben Belo, Claudia Soares, Marta Guimaraes
-
KoALA: KL-L0 Adversarial Detector via Label Agreement
Siqi Li, Yasser Shoukry
-
Sample-Efficient Omniprediction for Proper Losses
Isaac Gibbs, Ryan J. Tibshirani
-
Daniel Pulido-Cortázar, Daniel Gibert, Felip Manyà
-
Leaking Queries On Secure Stream Processing Systems
Hung Pham, Viet Vo, Tien Tuan Anh Dinh, Duc Tran, Shuhao Zhang
-
Ye Tian, Yanqiu Yu, Liangliang Song, Zhiquan Liu, Yanbin Wang, Jianguo Sun
-
Targeted Pooled Latent-Space Steganalysis Applied to Generative Steganography, with a Fix
Etienne Levecque, Aurélien Noirault, Tomáš Pevný, Jan Butora, Patrick Bas, Rémi Cogranne
-
Who's Asking? Evaluating LLM Robustness to Inquiry Personas in Factual Question Answering
Nil-Jana Akpinar, Chia-Jung Lee, Vanessa Murdock, Pietro Perona
-
João A. Leite, Arnav Arora, Silvia Gargova, João Luz, Gustavo Sampaio, Ian Roberts, Carolina Scarton, Kalina Bontcheva
-
Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning
James Pedley, Benjamin Etheridge, Stephen J. Roberts, Francesco Quinzan
-
An Investigation of Memorization Risk in Healthcare Foundation Models
Sana Tonekaboni, Lena Stempfle, Adibvafa Fallahpour, Walter Gerych, Marzyeh Ghassemi
-
Reference-Specific Unlearning Metrics Can Hide the Truth: A Reality Check
Sungjun Cho, Dasol Hwang, Frederic Sala, Sangheum Hwang, Kyunghyun Cho, Sungmin Cha
-
Rithwik Gupta, Daniel Muthukrishna, Jeroen Audenaert
-
Local Differential Privacy for Federated Learning with Fixed Memory Usage and Per-Client Privacy
Rouzbeh Behnia, Jeremiah Birrell, Arman Riasi, Reza Ebrahimi, Kaushik Dutta, Thang Hoang
-
Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection
Olga E. Sorokoletova, Francesco Giarrusso, Vincenzo Suriani, Daniele Nardi
-
RAID: Refusal-Aware and Integrated Decoding for Jailbreaking LLMs
Tuan T. Nguyen, John Le, Thai T. Vu, Willy Susilo, Heath Cooper
-
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
Ruida Wang, Jiarui Yao, Rui Pan, Shizhe Diao, Tong Zhang
-
PHANTOM RECALL: When Familiar Puzzles Fool Smart Models
Souradeep Mukhopadhyay, Rishabh Baral, Nimeesh Mahajan, Samhitha Harish, Aswin RRV, Mihir Parmar, Mutsumi Nakamura, Chitta Baral
-
BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing
Caelin Kaplan, Alexander Warnecke, Neil Archibald
-
Countermind: A Multi-Layered Security Architecture for Large Language Models
Dominik Schwarz
-
LLM Knowledge is Brittle: Truthfulness Representations Rely on Superficial Resemblance
Patrick Haller, Mark Ibrahim, Polina Kirichenko, Levent Sagun, Samuel J. Bell
-
Don't Walk the Line: Boundary Guidance for Filtered Generation
Sarah Ball, Andreas Haupt
-
Deep Research Brings Deeper Harm
Shuo Chen, Zonggen Li, Zhen Han, Bailan He, Tong Liu, Haokun Chen, Georg Groh, Philip Torr, Volker Tresp, Jindong Gu
-
Robust Adversarial Reinforcement Learning in Stochastic Games via Sequence Modeling
Xiaohang Tang, Zhuowen Cheng, Satyabrat Kumar
-
High-Probability Bounds For Heterogeneous Local Differential Privacy
Maryam Aliakbarpour, Alireza Fallah, Swaha Roy, Ria Stevens
-
Yuwen Cui, Guangjing Wang, Khanh Vu, Kai Wei, Kehan Shen, Zhengyuan Jiang, Xiao Han, Ning Wang, Zhuo Lu, Yao Liu
-
Deeksha Hareesha Kulal, Chidozie Princewill Arannonu, Afsah Anwar, Nidhi Rastogi, Quamar Niyaz
-
LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings
Ting Li, Yang Yang, Yipeng Yu, Liang Yao, Guoqing Chao, Ruifeng Xu
-
TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models
Zonghuan Xu, Xiang Zheng, Xingjun Ma, Yu-Gang Jiang
-
Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization
Zihan Wang, Zhiyong Ma, Zhongkui Ma, Shuofeng Liu, Akide Liu, Derui Wang, Minhui Xue, Guangdong Bai
-
DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation
Hyeseon Ahn, Shinwoo Park, Yo-Sub Han
-
PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System
Huayi Wang, Wentao Zhang, Runyi Yu, Tao Huang, Junli Ren, Feiyu Jia, Zirui Wang, Xiaojie Niu, Xiao Chen, Jiahe Chen, Qifeng Chen, Jingbo Wang, Jiangmiao Pang
-
RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation
Vasilije Stambolic, Aritra Dhar, Lukas Cavigelli
-
Attacks by Content: Automated Fact-checking is an AI Security Issue
Michael Schlichtkrull
-
Large Language Models Are Effective Code Watermarkers
Rui Xu, Jiawei Chen, Zhaoxia Yin, Cong Kong, Xinpeng Zhang
-
Multi-View Graph Feature Propagation for Privacy Preservation and Feature Sparsity
Etzion Harari, Moshe Unger
-
Early Detection and Reduction of Memorisation for Domain Adaptation and Instruction Tuning
Dean L. Slack, Noura Al Moubayed
-
Living Off the LLM: How LLMs Will Change Adversary Tactics
Sean Oesch, Jack Hutchins, Luke Koch, Kevin Kurian
-
PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities
Zicheng Liu, Lige Huang, Jie Zhang, Dongrui Liu, Yuan Tian, Jing Shao
-
Adversarial Attacks Leverage Interference Between Features in Superposition
Edward Stevinson, Lucas Prieto, Melih Barsbey, Tolga Birdal
-
Information-Preserving Reformulation of Reasoning Traces for Antidistillation
Jiayu Ding, Lei Cui, Li Dong, Nanning Zheng, Furu Wei
-
Bag of Tricks for Subverting Reasoning-based Safety Guardrails
Shuo Chen, Zhen Han, Haokun Chen, Bailan He, Shengyun Si, Jingpei Wu, Philip Torr, Volker Tresp, Jindong Gu
-
ROFI: A Deep Learning-Based Ophthalmic Sign-Preserving and Reversible Patient Face Anonymizer
Yuan Tian, Min Zhou, Yitong Chen, Fang Li, Lingzi Qi, Shuo Wang, Xieyang Xu, Yu Yu, Shiqiong Xu, Chaoyu Lei, Yankai Jiang, Rongzhao Zhang, Jia Tan, Li Wu, Hong Chen, Xiaowei Liu, Wei Lu, Lin Li, Huifang Zhou, Xuefei Song, Guangtao Zhai, Xianqun Fan
-
CoDefend: Cross-Modal Collaborative Defense via Diffusion Purification and Prompt Optimization
Fengling Zhu, Boshi Liu, Jingyu Hua, Sheng Zhong
-
Exploring and Leveraging Class Vectors for Classifier Editing
Jaeik Kim, Jaeyoung Do
-
The Easy Path to Robustness: Coreset Selection using Sample Hardness
Pranav Ramesh, Arjun Roy, Deepak Ravikumar, Kaushik Roy, Gopalakrishnan Srinivasan
-
Quantifying Information Disclosure During Gradient Descent Using Gradient Uniqueness
Mahmoud Abdelghafar, Maryam Aliakbarpour, Chris Jermaine
-
Qizhou Peng, Yang Zheng, Yu Wen, Yanna Wu, Yingying Du
-
Adversarial Robustness in One-Stage Learning-to-Defer
Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
-
CoSPED: Consistent Soft Prompt Targeted Data Extraction and Defense
Yang Zhuochen, Fok Kar Wai, Thing Vrizlynn
-
TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection
Jiahao Liu, Bonan Ruan, Xianglin Yang, Zhiwei Lin, Yan Liu, Yang Wang, Tao Wei, Zhenkai Liang
-
Collaborative Shadows: Distributed Backdoor Attacks in LLM-Based Multi-Agent Systems
Pengyu Zhu, Lijun Li, Yaxing Lyu, Li Sun, Sen Su, Jing Shao
-
Joint Discriminative-Generative Modeling via Dual Adversarial Training
Xuwang Yin, Claire Zhang, Julie Steele, Nir Shavit, Tony T. Wang
-
Generative AI for Biosciences: Emerging Threats and Roadmap to Biosecurity
Zaixi Zhang, Souradip Chakraborty, Amrit Singh Bedi, Emilin Mathew, Varsha Saravanan, Le Cong, Alvaro Velasquez, Sheng Lin-Gibson, Megan Blewett, Dan Hendrycks, Alex John London, Ellen Zhong, Ben Raphael, Adji Bousso Dieng, Jian Ma, Eric Xing, Russ Altman, George Church, Mengdi Wang
-
The Irrational Machine: Neurosis and the Limits of Algorithmic Safety
Daniel Howard
-
SASER: Stego attacks on open-source LLMs
Ming Tan, Wei Li, Hu Tao, Hailong Ma, Aodi Liu, Qian Chen, Zilong Wang
-
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
Subhodip Panda, Dhruv Tarsadiya, Shashwat Sourav, Prathosh A.P, Sai Praneeth Karimireddy
-
From Detection to Mitigation: Addressing Bias in Deep Learning Models for Chest X-Ray Diagnosis
Clemence Mottez, Louisa Fay, Maya Varma, Sophie Ostmeier, Curtis Langlotz
-
Merlin's Whisper: Enabling Efficient Reasoning in LLMs via Black-box Adversarial Prompting
Heming Xia, Cunxiao Du, Rui Li, Chak Tou Leong, Yongqi Li, Wenjie Li
-
DUAL-Bench: Measuring Over-Refusal and Robustness in Vision-Language Models
Kaixuan Ren, Preslav Nakov, Usman Naseem
-
ImpMIA: Leveraging Implicit Bias for Membership Inference Attack under Realistic Scenarios
Yuval Golbari, Navve Wasserman, Gal Vardi, Michal Irani
-
Weiming Zhao, Xulong Wang, Jun Qi, Yun Yang, Po Yang
-
Meng Xi, Sihan Lv, Yechen Jin, Guanjie Cheng, Naibo Wang, Ying Li, Jianwei Yin
-
The Achilles' Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities
Zixuan Qin, Kunlin Lyu, Qingchen Yu, Yifan Sun, Zhaoxin Fan
-
Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning
Guozhi Liu, Qi Mu, Tiansheng Huang, Xinhua Wang, Li Shen, Weiwei Lin, Zhang Li
-
MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation
Wentian Zhu, Zhen Xiang, Wei Niu, Le Guan
-
ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wang, Kuo-Hui Yeh
-
Path Drift in Large Reasoning Models: How First-Person Commitments Override Safety
Yuyi Huang, Runzhe Zhan, Lidia S. Chao, Ailin Tao, Derek F. Wong
-
A-IPO: Adaptive Intent-driven Preference Optimization
Wenqing Wang, Muhammad Asif Ali, Ali Shoker, Ruohan Yang, Junyang Chen, Ying Sha, Huan Wang
-
Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models
Liang Lin, Miao Yu, Moayad Aloqaily, Zhenhong Zhou, Kun Wang, Linsey Pang, Prakhar Mehrotra, Qingsong Wen
-
SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
Zonghao Ying, Yangguang Shao, Jianle Gan, Gan Xu, Junjie Shen, Wenxin Zhang, Quanchen Zou, Junzheng Shi, Zhenfei Yin, Mingchuan Zhang, Aishan Liu, Xianglong Liu
-
Tight Robustness Certificates and Wasserstein Distributional Attacks for Deep Neural Networks
Bach C. Le, Tung V. Dao, Binh T. Nguyen, Hong T.M. Chu
-
Yue Deng, Francisco Santos, Pang-Ning Tan, Lifeng Luo
-
An information theorist's tour of differential privacy
Anand D. Sarwate, Flavio P. Calmon, Oliver Kosut, Lalitha Sankar
-
Scheming Ability in LLM-to-LLM Strategic Interactions
Thao Pham
-
ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking
Yutao Wu, Xiao Liu, Yinghui Li, Yifeng Gao, Yifan Ding, Jiale Ding, Xiang Zheng, Xingjun Ma
-
Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
Shangzhe Li, Dongruo Zhou, Weitong Zhang
-
RO-Bench: Large-scale robustness evaluation of MLLMs with text-driven counterfactual videos
Zixi Yang, Jiapeng Li, Muxi Diao, Yinuo Jing, Kongming Liang
-
On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models
Hoigi Seo, Dong Un Kang, Hyunjin Cho, Joohoon Lee, Se Young Chun
-
Junchao Fan, Xiaolin Chang
-
MemLoss: Enhancing Adversarial Training with Recycling Adversarial Examples
Soroush Mahdi, Maryam Amirmazlaghani, Saeed Saravani, Zahra Dehghanian
-
Zhi Yang, Changwu Huang, Ke Tang, Xin Yao
-
On the Implicit Adversariality of Catastrophic Forgetting in Deep Continual Learning
Ze Peng, Jian Zhang, Jintao Guo, Lei Qi, Yang Gao, Yinghuan Shi
-
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou, Caglar Gulcehre, Maksym Andriushchenko, Ameya Prabhu, Jonas Geiping
-
SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG
Xiaonan Si, Meilin Zhu, Simeng Qin, Lijia Yu, Lijun Zhang, Shuaitong Liu, Xinfeng Li, Ranjie Duan, Yang Liu, Xiaojun Jia
-
Group-Adaptive Adversarial Learning for Robust Fake News Detection Against Malicious Comments
Zhao Tong, Chunlin Gong, Yimeng Gu, Haichao Shi, Qiang Liu, Shu Wu, Xiao-Yu Zhang
-
All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language
Shiyuan Guo, Henry Sleight, Fabien Roger
-
Text Prompt Injection of Vision Language Models
Ruizhe Zhu
-
CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
Nafiseh Nikeghbal, Amir Hossein Kargaran, Jana Diesner
-
Exploration of Incremental Synthetic Non-Morphed Images for Single Morphing Attack Detection
David Benavente-Rios, Juan Ruiz Rodriguez, Gustavo Gatica
-
Robustness and Regularization in Hierarchical Re-Basin
Benedikt Franke, Florian Heinrich, Markus Lange, Arne Raulf
-
Yuki Nii, Futa Waseda, Ching-Chun Chang, Isao Echizen
-
SAJD: Self-Adaptive Jamming Attack Detection in AI/ML Integrated 5G O-RAN Networks
Md Habibur Rahman, Md Sharif Hossen, Nathan H. Stephenson, Vijay K. Shah, Aloizio Da Silva
-
Safety Game: Balancing Safe and Informative Conversations with Blackbox Agentic AI using LP Solvers
Tuan Nguyen, Long Tran-Thanh
-
Profit Mirage: Revisiting Information Leakage in LLM-based Financial Agents
Xiangyu Li, Yawen Zeng, Xiaofen Xing, Jin Xu, Xiangmin Xu
-
VoiceAgentBench: Are Voice Assistants ready for agentic tasks?
Dhruv Jain, Harshit Shukla, Gautam Rajeev, Ashish Kulkarni, Chandra Khatri, Shubham Agarwal
-
Chain-of-Trigger: An Agentic Backdoor that Paradoxically Enhances Agentic Robustness
Jiyang Qiu, Xinbei Ma, Yunqing Xu, Zhuosheng Zhang, Hai Zhao
-
Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs
Man Hu, Xinyi Wu, Zuofeng Suo, Jinbo Feng, Linghui Meng, Yanhao Jia, Anh Tuan Luu, Shuai Zhao
-
Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents
Renhua Ding, Xiao Yang, Zhengwei Fang, Jun Luo, Kun He, Jun Zhu
-
MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation
Weisen Jiang, Sinno Jialin Pan
-
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
Kazuki Egashira, Robin Staab, Thibaud Gloaguen, Mark Vero, Martin Vechev
-
Backdoor Vectors: a Task Arithmetic View on Backdoor Attacks and Defenses
Stanisław Pawlak, Jan Dubiński, Daniel Marczak, Bartłomiej Twardowski
-
Interpreting LLM-as-a-Judge Policies via Verifiable Global Explanations
Jasmina Gajcin, Erik Miehling, Rahul Nair, Elizabeth Daly, Radu Marinescu, Seshu Tirupathi
-
XuHao Hu, Peng Wang, Xiaoya Lu, Dongrui Liu, Xuanjing Huang, Jing Shao
-
Provably Robust Adaptation for Language-Empowered Foundation Models
Yuni Lai, Xiaoyu Xue, Linghui Shen, Yulun Wu, Gaolei Li, Song Guo, Kai Zhou, Bin Xiao
-
SAFER-AiD: Saccade-Assisted Foveal-peripheral vision Enhanced Reconstruction for Adversarial Defense
Jiayang Liu, Daniel Tso, Yiming Bu, Qinru Qiu
-
Deceptive Exploration in Multi-armed Bandits
I. Arda Vurankaya, Mustafa O. Karabag, Wesley A. Suttle, Jesse Milzman, David Fridovich-Keil, Ufuk Topcu
-
CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization
Debeshee Das, Luca Beurer-Kellner, Marc Fischer, Maximilian Baader
-
Ragib Amin Nihal, Rui Wen, Kazuhiro Nakadai, Jun Sakuma
-
Haoran Ou, Kangjie Chen, Xingshuo Han, Gelei Deng, Jie Zhang, Han Qiu, Tianwei Zhang
-
VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands
Aofan Liu, Lulu Tang
-
The Framework That Survives Bad Models: Human-AI Collaboration For Clinical Trials
Yao Chen, David Ohlssen, Aimee Readie, Gregory Ligozio, Ruvie Martin, Thibaud Coroller
-
Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization
Tiancheng Xing, Jerry Li, Yixuan Du, Xiyang Hu
-
Ke Guo, Haochen Liu, Xiaojun Wu, Chen Lv
-
A Multi-Agent Framework for Stateful Inference-Time Search
Arshika Lalan, Rajat Ghosh, Aditya Kolsur, Debojyoti Dutta
-
Cocoon: A System Architecture for Differentially Private Training with Correlated Noises
Donghwan Kim, Xin Gu, Jinho Baek, Timothy Lo, Younghoon Min, Kwangsik Shin, Jongryool Kim, Jongse Park, Kiwan Maeng
-
Do Internal Layers of LLMs Reveal Patterns for Jailbreak Detection?
Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis
-
AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
Boyi Zeng, Lin Chen, Ziwei He, Xinbing Wang, Zhouhan Lin
-
Quantifying Data Contamination in Psychometric Evaluations of LLMs
Jongwook Han, Woojung Song, Jonggeun Lee, Yohan Jo
-
Red-Bandit: Test-Time Adaptation for LLM Red-Teaming via Bandit-Guided LoRA Experts
Christos Ziakas, Nicholas Loo, Nishita Jain, Alessandra Russo
-
XLSR-Kanformer: A KAN-Integrated model for Synthetic Speech Detection
Phuong Tuan Dat, Tran Huy Dat
-
Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)
Junki Mori, Kazuya Kakizaki, Taiki Miyagawa, Jun Sakuma
-
Exposing Citation Vulnerabilities in Generative Engines
Riku Mochizuki, Shusuke Komatsu, Souta Noguchi, Kazuto Ataka
-
RedTWIZ: Diverse LLM Red Teaming via Adaptive Attack Planning
Artur Horal, Daniel Pina, Henrique Paz, Iago Paulo, João Soares, Rafael Ferreira, Diogo Tavares, Diogo Glória-Silva, João Magalhães, David Semedo
-
StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance
Jaeseok Jeong, Junho Kim, Gayoung Lee, Yunjey Choi, Youngjung Uh
-
Label-frugal satellite image change detection with generative virtual exemplar learning
Hichem Sahbi
-
OBJVanish: Physically Realizable Text-to-3D Adv. Generation of LiDAR-Invisible Objects
Bing Li, Wuqi Wang, Yanan Zhang, Jingzheng Li, Haigen Min, Wei Feng, Xingyu Zhao, Jie Zhang, Qing Guo
-
SpecGuard: Spectral Projection-based Advanced Invisible Watermarking
Inzamamul Alam, Md Tanvir Islam, Khan Muhammad, Simon S. Woo
-
Unsupervised Backdoor Detection and Mitigation for Spiking Neural Networks
Jiachen Li, Bang Wu, Xiaoyu Xia, Xiaoning Liu, Xun Yi, Xiuzhen Zhang
-
SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models
Huahui Yi, Kun Wang, Qiankun Li, Miao Yu, Liang Lin, Gongli Xi, Hao Wu, Xuming Hu, Kang Li, Yang Liu
-
Revisiting Mixout: An Overlooked Path to Robust Finetuning
Masih Aminbeidokhti, Heitor Rapela Medeiros, Eric Granger, Marco Pedersoli
-
Is the Hard-Label Cryptanalytic Model Extraction Really Polynomial?
Akira Ito, Takayuki Miura, Yosuke Todo
-
Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
Tavish McDonald, Bo Lei, Stanislav Fort, Bhavya Kailkhura, Brian Bartoldson
-
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
Alexandra Souly, Javier Rando, Ed Chapman, Xander Davies, Burak Hasircioglu, Ezzeldin Shereen, Carlos Mougan, Vasilios Mavroudis, Erik Jones, Chris Hicks, Nicholas Carlini, Yarin Gal, Robert Kirk
-
Falsification-Driven Reinforcement Learning for Maritime Motion Planning
Marlon Müller, Florian Finkeldei, Hanna Krasowski, Murat Arcak, Matthias Althoff
-
Code Agent can be an End-to-end System Hacker: Benchmarking Real-world Threats of Computer-use Agent
Weidi Luo, Qiming Zhang, Tianyu Lu, Xiaogeng Liu, Bin Hu, Hung-Chun Chiu, Siyuan Ma, Yizhe Zhang, Xusheng Xiao, Yinzhi Cao, Zhen Xiang, Chaowei Xiao
-
Yixiang Zhang, Xinhao Deng, Zhongyi Gu, Yihao Chen, Ke Xu, Qi Li, Jianping Wu
-
Yuhua Xu, Wei Sun, Chengpei Tang, Jiaxing Lu, Jingying Zhou, Chen Gu
-
Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race
Xutao Mao, Ke Li, Cameron Baird, Ezra Xuanru Tao, Dan Lin
-
Muhammad Usman, Yugyung Lee
-
PEAR: Planner-Executor Agent Robustness Benchmark
Shen Dong, Mingxuan Zhang, Pengfei He, Li Ma, Bhavani Thuraisingham, Hui Liu, Yue Xing
-
A2AS: Agentic AI Runtime Security and Self-Defense
Eugene Neelou, Ivan Novikov, Max Moroz, Om Narayan, Tiffany Saade, Mika Ayenson, Ilya Kabanov, Jen Ozmen, Edward Lee, Vineeth Sai Narajala, Emmanuel Guilherme Junior, Ken Huang, Huseyin Gulsin, Jason Ross, Marat Vyshegorodtsev, Adelin Travers, Idan Habler, Rahul Jadav
-
RGBD Gaze Tracking Using Transformer for Feature Fusion
Tobias J. Bauer
-
Protecting De-identified Documents from Search-based Linkage Attacks
Pierre Lison, Mark Anderson
-
Geometry-Aware Backdoor Attacks: Leveraging Curvature in Hyperbolic Embeddings
Ali Baheri
-
A Survey on Agentic Security: Applications, Threats and Defenses
Asif Shahriar, Md Nafiu Rahman, Sadif Ahmed, Farig Sadeque, Md Rizwan Parvez
-
Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security
Ali Naseh, Anshuman Suri, Yuefeng Peng, Harsh Chaudhari, Alina Oprea, Amir Houmansadr
-
Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
Yuxuan Bai, Gauri Pradhan, Marlon Tobaben, Antti Honkela
-
Automated Repeatable Adversary Threat Emulation with Effects Language (EL)
Suresh K. Damodaran, Paul D. Rowe
-
Vipul Goyal, Justin Raizes
-
LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback
Raffaele Mura, Giorgio Piras, Kamilė Lukošiūtė, Maura Pintor, Amin Karbasi, Battista Biggio
-
Toward a Safer Web: Multilingual Multi-Agent LLMs for Mitigating Adversarial Misinformation Attacks
Nouar Aldahoul, Yasir Zaki
-
Adversarial-Resilient RF Fingerprinting: A CNN-GAN Framework for Rogue Transmitter Detection
Raju Dhakal, Prashant Shekhar, Laxima Niure Kandel
-
RareAgent: Self-Evolving Reasoning for Drug Repurposing in Rare Diseases
Lang Qin, Zijian Gan, Xu Cao, Pengcheng Jiang, Yankai Jiang, Jiawei Han, Kaishun Wu, Jintai Chen
-
The Role of Federated Learning in Improving Financial Security: A Survey
Cade Houston Kennedy, Amr Hilal, Morteza Momeni
-
Permutation-Invariant Representation Learning for Robust and Privacy-Preserving Feature Selection
Rui Liu, Tao Zhe, Yanjie Fu, Feng Xia, Ted Senator, Dongjie Wang
-
Khartik Uppalapati, Shakeel Abdulkareem, Bora Yimenicioglu
-
A Calibration-Free Fixed Point of Curved Boolean Logic Matching the Fine-Structure Constant
Maximilian R. P. von Liechtenstein
-
AutoDAN-Reasoning: Enhancing Strategies Exploration based Jailbreak Attacks with Test-Time Scaling
Xiaogeng Liu, Chaowei Xiao
-
Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment
Nevan Wichers, Aram Ebtekar, Ariana Azarbal, Victor Gillioz, Christine Ye, Emil Ryd, Neil Rathi, Henry Sleight, Alex Mallen, Fabien Roger, Samuel Marks
-
SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models
Peigui Qi, Kunsheng Tang, Wenbo Zhou, Weiming Zhang, Nenghai Yu, Tianwei Zhang, Qing Guo, Jie Zhang
-
Rounding-Guided Backdoor Injection in Deep Learning Model Quantization
Xiangxiang Chen, Peixin Zhang, Jun Sun, Wenhai Wang, Jingyi Wang
-
Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time
Daniel Tan, Anders Woodruff, Niels Warncke, Arun Jose, Maxime Riché, David Demitri Africa, Mia Taylor
-
Agentic Misalignment: How LLMs Could Be Insider Threats
Aengus Lynch, Benjamin Wright, Caleb Larson, Stuart J. Ritchie, Soren Mindermann, Evan Hubinger, Ethan Perez, Kevin Troy
-
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
Buyun Liang, Liangzu Peng, Jinqi Luo, Darshan Thaker, Kwan Ho Ryan Chan, René Vidal
-
Shahriar Kabir Nahin, Hadi Askari, Muhao Chen, Anshuman Chhabra
-
Detecting Malicious Pilot Contamination in Multiuser Massive MIMO Using Decision Trees
Pedro Ivo da Cruz, Dimitri Silva, Tito Spadini, Ricardo Suyama, Murilo Bellezoni Loiola
-
LoRA Patching: Exposing the Fragility of Proactive Defenses against Deepfakes
Zuomin Qu, Yimao Guo, Qianyue Hu, Wei Lu
-
Adversarial Agent Collaboration for C to Rust Translation
Tianyu Li, Ruishi Li, Bo Wang, Brandon Paulsen, Umang Mathur, Prateek Saxena
-
Uncertainty Quantification In Surface Landmines and UXO Classification Using MC Dropout
Sagar Lekhak, Emmett J. Ientilucci, Dimah Dera, Susmita Ghosh
-
Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Nicolas Chapados, Quentin Cappart, Alexandre Lacoste, Krishnamurthy Dj Dvijotham, Alexandre Drouin
-
Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs
Zhixin Xie, Xurui Song, Jun Luo
-
Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs
Fatmazohra Rezkellah, Ramzi Dakhmouche
-
Xinzhe Huang, Wenjing Hu, Tianhang Zheng, Kedong Xiu, Xiaojun Jia, Di Wang, Zhan Qin, Kui Ren
-
Fine-Tuning on Noisy Instructions: Effects on Generalization and Performance
Ahmed Alajrami, Xingwei Tan, Nikolaos Aletras
-
Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks
Yubo Li, Ramayya Krishnan, Rema Padman
-
How to Set $\beta_1, \beta_2$ in Adam: An Online Learning Perspective
Quan Nguyen
-
Kedong Xiu, Churui Zeng, Tianhang Zheng, Xinzhe Huang, Xiaojun Jia, Di Wang, Puning Zhao, Zhan Qin, Kui Ren
-
Vicinity-Guided Discriminative Latent Diffusion for Privacy-Preserving Domain Adaptation
Jing Wang, Wonho Bae, Jiahong Chen, Wenxu Wang, Junhyug Noh
-
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
Xin-Qiang Cai, Wei Wang, Feng Liu, Tongliang Liu, Gang Niu, Masashi Sugiyama
-
Eliciting Secret Knowledge from Language Models
Bartosz Cywiński, Emil Ryd, Rowan Wang, Senthooran Rajamanoharan, Neel Nanda, Arthur Conmy, Samuel Marks
-
Private Online Learning against an Adaptive Adversary: Realizable and Agnostic Settings
Bo Li, Wei Wang, Peng Ye
-
ZQBA: Zero Query Black-box Adversarial Attack
Joana C. Costa, Tiago Roxo, Hugo Proença, Pedro R. M. Inácio
-
Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors
Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen
-
Baseline Systems For The 2025 Low-Resource Audio Codec Challenge
Yusuf Ziya Isik, Rafał Łaganowski
-
A Generalized Information Bottleneck Theory of Deep Learning
Charles Westphal, Stephen Hailes, Mirco Musolesi
-
Less is More: Towards Simple Graph Contrastive Learning
Yanan Zhao, Feng Ji, Jingyang Dai, Jiaze Ma, Wee Peng Tay
-
Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification
Xiaobao Wang, Ruoxiao Sun, Yujun Zhang, Bingdao Feng, Dongxiao He, Luzhi Wang, Di Jin
-
Xiang Zhang, Kun Wei, Xu Yang, Chenghao Xu, Su Yan, Cheng Deng
-
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
Zihao Zhu, Xinyu Wu, Gehan Hu, Siwei Lyu, Ke Xu, Baoyuan Wu
-
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Yichi Zhang, Yue Ding, Jingwen Yang, Tianwei Luo, Dongbai Li, Ranjie Duan, Qiang Liu, Hang Su, Yinpeng Dong, Jun Zhu
-
UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following
FaQiang Qian, WeiKun Zhang, Ziliang Wang, Kang An, Xuhui Zheng, Liangjian Wen, Mengya Gao, Yong Dai, Yichao Wu
-
Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs
Arpit Garg, Hemanth Saratchandran, Ravi Garg, Simon Lucey
-
Metamorphic Testing for Audio Content Moderation Software
Wenxuan Wang, Yongjiang Wu, Junyuan Zhang, Shuqing Li, Yun Peng, Wenting Chen, Shuai Wang, Michael R. Lyu
-
Adversarial Reinforcement Learning Framework for ESP Cheater Simulation
Inkyu Park, Jeong-Gwan Lee, Taehwan Kwon, Juheon Choi, Seungku Kim, Junsu Kim, Kimin Lee
-
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
Zherui Li, Zheng Nie, Zhenhong Zhou, Yufei Guo, Yue Liu, Yitong Zhang, Yu Cheng, Qingsong Wen, Kun Wang, Jiaheng Zhang
-
HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment
Langqi Yang, Tianhang Zheng, Kedong Xiu, Yixuan Chen, Di Wang, Puning Zhao, Zhan Qin, Kui Ren
-
Community detection robustness of graph neural networks
Jaidev Goel, Pablo Moriano, Ramakrishnan Kannan, Yulia R. Gel
-
Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
Longxiang He, Deheng Ye, Junbo Tan, Xueqian Wang, Li Shen
-
Scalable GANs with Transformers
Sangeek Hyun, MinKyu Lee, Jae-Pil Heo
-
SecInfer: Preventing Prompt Injection via Inference-time Scaling
Yupei Liu, Yanting Wang, Yuqi Jia, Jinyuan Jia, Neil Zhenqiang Gong
-
GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
Aryan Yazdan Parast, Parsa Hosseini, Hesam Asadollahzadeh, Arshia Soltani Moakhar, Basim Azam, Soheil Feizi, Naveed Akhtar
-
Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models
Wenjie Fu, Huandong Wang, Junyao Gao, Guoan Wan, Tao Jiang
-
SemanticShield: LLM-Powered Audits Expose Shilling Attacks in Recommender Systems
Kaihong Li, Huichi Zhou, Bin Ma, Fangjun Huang
-
DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
Amira Guesmi, Muhammad Shafique
-
TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models
Zhifang Zhang, Qiqi Tao, Jiaqi Lv, Na Zhao, Lei Feng, Joey Tianyi Zhou
-
VAGUEGAN: Stealthy Poisoning and Backdoor Attacks on Image Generative Pipelines
Mostafa Mohaimen Akand Faisal, Rabeya Amin Jhuma
-
MANI-Pure: Magnitude-Adaptive Noise Injection for Adversarial Purification
Xiaoyi Huang, Junwei Wu, Kejia Zhang, Carl Yang, Zhiming Luo
-
Score-based Membership Inference on Diffusion Models
Mingxing Rao, Bowen Qu, Daniel Moyer
-
H+: An Efficient Similarity-Aware Aggregation for Byzantine Resilient Federated Learning
Shiyuan Zuo, Rongfei Fan, Cheng Zhan, Jie Xu, Puning Zhao, Han Hu
-
Distributionally Robust Federated Learning with Outlier Resilience
Zifan Wang, Xinlei Yi, Xenia Konti, Michael M. Zavlanos, Karl H. Johansson
-
Guided Uncertainty Learning Using a Post-Hoc Evidential Meta-Model
Charmaine Barker, Daniel Bethell, Simos Gerasimou
-
Learning in an Echo Chamber: Online Learning with Replay Adversary
Daniil Dmitriev, Harald Eskelund Franck, Carolin Heinzler, Amartya Sanyal
-
FuncPoison: Poisoning Function Library to Hijack Multi-agent Autonomous Driving Systems
Yuzhen Long, Songze Li
-
Takedown: How It's Done in Modern Coding Agent Exploits
Eunkyu Lee, Donghyeon Kim, Wonyoung Kim, Insu Yun
-
When MCP Servers Attack: Taxonomy, Feasibility, and Mitigation
Weibo Zhao, Jiahao Liu, Bonan Ruan, Shaofei Li, Zhenkai Liang
-
GSPR: Aligning LLM Safeguards as Generalizable Safety Policy Reasoners
Haoran Li, Yulin Chen, Jingru Zeng, Hao Peng, Huihao Jing, Wenbin Hu, Xi Yang, Ziqian Zeng, Sirui Han, Yangqiu Song
-
PRIVMARK: Private Large Language Models Watermarking with MPC
Thomas Fargues, Ye Dong, Tianwei Zhang, Jin-Song Dong
-
Tereza Burianová, Martin Perešíni, Ivan Homoliak
-
Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning
Zhaoqi Wang, Daqing He, Zijian Zhang, Xin Li, Liehuang Zhu, Meng Li, Jiamou Liu
-
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
Jianshuo Dong, Sheng Guo, Hao Wang, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu
-
Quant Fever, Reasoning Blackholes, Schrodinger's Compliance, and More: Probing GPT-OSS-20B
Shuyi Lin, Tian Lu, Zikai Wang, Bo Wen, Yibo Zhao, Cheng Tan
-
Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence
Yuqiao Meng, Luoxi Tang, Feiyang Yu, Jinyuan Jia, Guanhua Yan, Ping Yang, Zhaohan Xi
-
BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images
Cheng Huang, Weizheng Xie, Fan Gao, Yutong Liu, Ruoling Wu, Zeyu Han, Jingxi Qiu, Xiangxiang Wang, Zhenglin Yang, Hao Wang, Yongbin Yu
-
Generalizable Speech Deepfake Detection via Information Bottleneck Enhanced Adversarial Alignment
Pu Huang, Shouguang Wang, Siya Yao, Mengchu Zhou
-
Accuracy-Robustness Trade Off via Spiking Neural Network Gradient Sparsity Trail
Nhan T. Luu
-
HFuzzer: Testing Large Language Models for Package Hallucinations via Phrase-based Fuzzing
Yukai Zhao, Menghan Wu, Xing Hu, Xin Xia
-
Adversarial Diffusion for Robust Reinforcement Learning
Daniele Foffano, Alessio Russo, Alexandre Proutiere
-
Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack
Yukun Chen, Boheng Li, Yu Yuan, Leyi Qi, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren
-
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
Simon Schrodi, Elias Kempf, Fazl Barez, Thomas Brox
-
Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios
Jinghan Xu, Yuyang Zhang, Qixuan Cai, Jiancheng Chen, Keqiu Li
-
Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
Guancheng Wan, Lucheng Fu, Haoxin Liu, Yiqiao Jin, Hui Yi Leong, Eric Hanchen Jiang, Hejia Geng, Jinhe Bi, Yunpu Ma, Xiangru Tang, B. Aditya Prakash, Yizhou Sun, Wei Wang
-
Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models
Beomseok Kang, Niluthpol Chowdhury Mithun, Mikhail Sizintsev, Han-Pang Chiu, Supun Samarasekera
-
Efthymios Tsaprazlis, Tiantian Feng, Anil Ramakrishna, Rahul Gupta, Shrikanth Narayanan
-
Djamel Eddine Boukhari
-
You Zhou, Lijiang Chen, Shuchang Lyu, Guangxia Cui, Wenpei Bai, Zheng Zhou, Meng Li, Guangliang Cheng, Huiyu Zhou, Qi Zhao
-
Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives
Kuanrong Liu, Siyuan Liang, Cheng Qian, Ming Zhang, Xiaochun Cao
-
StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data
Yixu Wang, Yan Teng, Yingchun Wang, Xingjun Ma
-
FedDAPL: Toward Client-Private Generalization in Federated Learning
Soroosh Safari Loaliyan, Jose-Luis Ambite, Paul M. Thompson, Neda Jahanshad, Greg Ver Steeg
-
Merge Now, Regret Later: The Hidden Cost of Model Merging is Adversarial Transferability
Ankit Gangwal, Aaryan Ajay Sharma
-
Visual CoT Makes VLMs Smarter but More Fragile
Chunxue Xu, Yiwei Wang, Yujun Cai, Bryan Hooi, Songze Li
-
Influence-Guided Concolic Testing of Transformer Robustness
Chih-Duo Hong, Yu Wang, Yao-Chen Chang, Fang Yu
-
Sheikh Md Mushfiqur Rahman, Nasir Eisty
-
AutoML in Cybersecurity: An Empirical Study
Sherif Saad, Kevin Shi, Mohammed Mamun, Hythem Elmiligi
-
A First Look at Privacy Risks of Android Task-executable Voice Assistant Applications
Shidong Pan, Yikai Ge, Xiaoyu Sun
-
GPM: The Gaussian Pancake Mechanism for Planting Undetectable Backdoors in Differential Privacy
Haochen Sun, Xi He
-
Binary Diff Summarization using Large Language Models
Meet Udeshi, Venkata Sai Charan Putrevu, Prashanth Krishnamurthy, Prashant Anantharaman, Sean Carrick, Ramesh Karri, Farshad Khorrami
-
Analyzing and Evaluating Unbiased Language Model Watermark
Yihan Wu, Xuehao Cui, Ruibo Chen, Heng Huang
-
Performance of Machine Learning Methods for Gravity Inversion: Successes and Challenges
Vahid Negahdari, Shirin Samadi Bahrami, Seyed Reza Moghadasi, Mohammad Reza Razvan
-
Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia
Davi Bastos Costa, Renato Vicente
-
LLM Watermark Evasion via Bias Inversion
Jeongyeon Hwang, Sangdon Park, Jungseul Ok
-
DPFNAS: Differential Privacy-Enhanced Federated Neural Architecture Search for 6G Edge Intelligence
Yang Lv, Jin Cao, Ben Niu, Zhe Sun, Fengwei Wang, Fenghua Li, Hui Li
-
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
Zi Liang, Qingqing Ye, Xuan Liu, Yanyun Wang, Jianliang Xu, Haibo Hu
-
Patch Rebirth: Toward Fast and Transferable Model Inversion of Vision Transformers
Seongsoo Heo, Dong-Wan Choi
-
Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection
Manjiang Yu, Priyanka Singh, Xue Li, Yang Cao
-
Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
Rohit Chowdhury, Aniruddha Bala, Rohan Jaiswal, Siddharth Roheda
-
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
Wonje Jeung, Sangyeon Yoon, Yoonjun Cho, Dongjae Jeon, Sangwoo Shin, Hyesoo Hong, Albert No
-
Jonas Ngnawé, Maxime Heuillet, Sabyasachi Sahoo, Yann Pequignot, Ola Ahmad, Audrey Durand, Frédéric Precioso, Christian Gagné
-
Dual-Space Smoothness for Robust and Balanced LLM Unlearning
Han Yan, Zheyuan Liu, Meng Jiang
-
Factor Decorrelation Enhanced Data Removal from Deep Predictive Models
Wenhao Yang, Lin Li, Xiaohui Tao, Kaize Shi
-
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
Zeyu Shen, Basileal Imana, Tong Wu, Chong Xiang, Prateek Mittal, Aleksandra Korolova
-
Wonhyuk Lee, Youngchol Kim, Yunjin Park, Junhyung Moon, Dongyoung Jeong, Wanjin Park
-
MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction
Sepideh Abedini, Shubhankar Mohapatra, D. B. Emerson, Masoumeh Shafieinejad, Jesse C. Cresswell, Xi He
-
Zhiqiang Tian, Weigang Li, Chunhua Deng, Junwei Hu, Yongqiang Wang, Wenping Liu
-
Real-World Transferable Adversarial Attack on Face-Recognition Systems
Andrey Kaznacheev, Matvey Mikhalchuk, Andrey Kuznetsov, Aleksandr Petiushko, Anton Razzhigaev
-
Ming-Tsung Hsu, Fang-Yu Hsu, Yi-Ting Lin, Kai-Heng Chien, Jun-Ren Chen, Cheng-Hsiang Su, Yi-Chen Ou, Chiou-Ting Hsu, Pei-Kai Huang
-
Nikolas McNeal, N. Apurva Ratan Murty
-
GuardNet: Graph-Attention Filtering for Jailbreak Defense in Large Language Models
Javad Forough, Mohammad Maheri, Hamed Haddadi
-
CoSIFL: Collaborative Secure and Incentivized Federated Learning with Differential Privacy
Zhanhong Xie, Meifan Zhang, Lihua Yin
-
NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning
Raviteja Anantha, Soheil Hor, Teodor Nicola Antoniu, Layne C. Price
-
Xiangchen Meng, Yangdi Lyu
-
Bartosz Burgiel
-
Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety
Junliang Liu, Jingyu Xiao, Wenxin Tang, Wenxuan Wang, Zhixian Wang, Minrui Zhang, Shuanghe Yu
-
Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models
Miao Yu, Zhenhong Zhou, Moayad Aloqaily, Kun Wang, Biwei Huang, Stephen Wang, Yueming Jin, Qingsong Wen
-
You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors
Bochuan Cao, Changjiang Li, Yuanpu Cao, Yameng Ge, Ting Wang, Jinghui Chen
-
Active Attacks: Red-teaming LLMs via Adaptive Environments
Taeyoung Yun, Pierre-Luc St-Charles, Jinkyoo Park, Yoshua Bengio, Minsu Kim
-
Benchmarking and Mitigating Psychological Sycophancy in Medical Vision-Language Models
Zikun Guo, Xinyue Xu, Pei Xiang, Shu Yang, Xin Han, Di Wang, Lijie Hu
-
Aravindhan G, Yuvaraj Govindarajulu, Parin Shah
-
The Rogue Scalpel: Activation Steering Compromises LLM Safety
Anton Korznikov, Andrey Galichin, Alexey Dontsov, Oleg Y. Rogov, Ivan Oseledets, Elena Tutubalina
-
Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
Wonjun Lee, Haon Park, Doehyeon Lee, Bumsub Ham, Suhyun Kim
-
Evaluating the Limits of Large Language Models in Multilingual Legal Reasoning
Antreas Ioannou, Andreas Shiamishis, Nora Hollenstein, Nezihe Merve Gürel
-
Mixture of Detectors: A Compact View of Machine-Generated Text Detection
Sai Teja Lekkala, Yadagiri Annepaka, Arun Kumar Challa, Samatha Reddy Machireddy, Partha Pakray, Chukhu Chunka
-
Context Parametrization with Compositional Adapters
Josip Jukić, Martin Tutek, Jan Šnajder
-
SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models
Jingkai Guo, Chaitali Chakrabarti, Deliang Fan
-
Deepfakes: we need to re-think the concept of "real" images
Janis Keuper, Margret Keuper
-
FailureAtlas: Mapping the Failure Landscape of T2I Models via Active Exploration
Muxi Chen, Zhaohua Zhang, Chenchen Zhao, Mingyang Chen, Wenyu Jiang, Tianwen Jiang, Jianhuan Zhuo, Yu Tang, Qiuyong Xiao, Jihong Zhang, Qiang Xu
-
RAPID^3: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer
Wangbo Zhao, Yizeng Han, Zhiwei Tang, Jiasheng Tang, Pengfei Zhou, Kai Wang, Bohan Zhuang, Zhangyang Wang, Fan Wang, Yang You
-
Text Adversarial Attacks with Dynamic Outputs
Wenqiang Wang, Siyuan Liang, Xiao Yan, Xiaochun Cao
-
Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
Xinhao Zhong, Yimin Zhou, Zhiqi Zhang, Junhao Li, Yi Sun, Bin Chen, Shu-Tao Xia, Ke Xu
-
Zubov-Net: Adaptive Stability for Neural ODEs Reconciling Accuracy with Robustness
Chaoyang Luo, Yan Zou, Nanjing Huang
-
Concept-SAE: Active Causal Probing of Visual Model Behavior
Jianrong Ding, Muxi Chen, Chenchen Zhao, Qiang Xu
-
Non-Linear Trajectory Modeling for Multi-Step Gradient Inversion Attacks in Federated Learning
Li Xia, Zheng Liu, Sili Huang, Wei Tang, Xuan Liu
-
Countering adversarial evasion in regression analysis
David Benfield, Phan Tu Vuong, Alain Zemkoho
-
A Law of Data Reconstruction for Random Features (and Beyond)
Leonardo Iurada, Simone Bombari, Tatiana Tommasi, Marco Mondelli
-
Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning
Nakyeong Yang, Dong-Kyum Kim, Jea Kwon, Minsung Kim, Kyomin Jung, Meeyoung Cha
-
Nonlinear Optimization with GPU-Accelerated Neural Network Constraints
Robert Parker, Oscar Dowson, Nicole LoGiudice, Manuel Garcia, Russell Bent
-
"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors
Yue Liu, Yanjie Zhao, Yunbo Lyu, Ting Zhang, Haoyu Wang, David Lo
-
Collusion-Driven Impersonation Attack on Channel-Resistant RF Fingerprinting
Zhou Xu, Guyue Li, Zhe Peng, Aiqun Hu
-
Privacy Mechanism Design based on Empirical Distributions
Leonhard Grosse, Sara Saeidian, Mikael Skoglund, Tobias J. Oechtering
-
Gaurav Bagwe, Saket S. Chaturvedi, Xiaolong Ma, Xiaoyong Yuan, Kuang-Ching Wang, Lan Zhang
-
Hassen Dhrif
-
Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment
Jaehan Kim, Minkyoo Song, Seungwon Shin, Sooel Son
-
Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN
Roie Kazoom, Alon Goldberg, Hodaya Cohen, Ofer Hadar
-
Boundary on the Table: Efficient Black-Box Decision-Based Attacks for Structured Data
Roie Kazoom, Yuval Ratzabi, Etamar Rothstein, Ofer Hadar
-
Observation-Free Attacks on Online Learning to Rank
Sameep Chattopadhyay, Nikhil Karamchandani, Sharayu Mohair
-
Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière, Vicky Kalogeiton
-
Unsupervised Speech Enhancement using Data-defined Priors
Dominik Klement, Matthew Maciejewski, Sanjeev Khudanpur, Jan Černocký, Lukáš Burget
-
ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
Hwan Chang, Yonghyun Jun, Hwanhee Lee
-
Concept activation vectors: a unifying view and adversarial attacks
Ekkehard Schnoor, Malik Tiomoko, Jawher Said, Alex Jung, Wojciech Samek
-
Model Context Protocol for Vision Systems: Audit, Security, and Protocol Extensions
Aditi Tiwari, Akshit Bhalla, Darshan Prasad
-
Eduardo Chielle, Manaar Alam, Jinting Liu, Jovan Kascelan, Michail Maniatakos
-
AntiFLipper: A Secure and Efficient Defense Against Label-Flipping Attacks in Federated Learning
Aashnan Rahman, Abid Hasan, Sherajul Arifin, Faisal Haque Bappy, Tahrim Hossain, Tariqul Islam, Abu Raihan Mostofa Kamal, Md. Azam Hossain
-
On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
Jianing Guo, Zhenhong Wu, Chang Tu, Yiyao Ma, Xiangqi Kong, Zhiqian Liu, Jiaming Ji, Shuning Zhang, Yuanpei Chen, Kai Chen, Xianglong Liu, Qi Dou, Yaodong Yang, Huijie Zhao, Weifeng Lv, Simin Li
-
Memory Self-Regeneration: Uncovering Hidden Knowledge in Unlearned Models
Agnieszka Polowczyk, Alicja Polowczyk, Joanna Waczyńska, Piotr Borycki, Przemysław Spurek
-
SAGE: A Realistic Benchmark for Semantic Understanding
Samarth Goel, Reagan J. Lee, Kannan Ramchandran
-
A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks
Adam Swanda, Amy Chang, Alexander Chen, Fraser Burch, Paul Kassianik, Konstantin Berlin
-
Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection
Duc-Tuan Truong, Tianchi Liu, Junjie Li, Ruijie Tao, Kong Aik Lee, Eng Siong Chng
-
DAC-LoRA: Dynamic Adversarial Curriculum for Efficient and Robust Few-Shot Adaptation
Ved Umrajkar
-
Trustworthy Semantic Communication for Vehicular Networks: Challenges and Solutions
Yanghe Pan, Yuntao Wang, Shaolong Guo, Chengyu Yin, Ruidong Li, Zhou Su, Yuan Wu
-
Security-aware Semantic-driven ISAC via Paired Adversarial Residual Networks
Yu Liu, Boxiang He, Fanggang Wang
-
Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools
Ping He, Changjiang Li, Binbin Zhao, Tianyu Du, Shouling Ji
-
The Use of the Simplex Architecture to Enhance Safety in Deep-Learning-Powered Autonomous Systems
Federico Nesti, Niko Salamini, Mauro Marinoni, Giorgio Maria Cicero, Gabriele Serra, Alessandro Biondi, Giorgio Buttazzo
-
Vision Transformers: the threat of realistic adversarial patches
Kasper Cools, Clara Maathuis, Alexander M. van Oers, Claudia S. Hübner, Nikos Deligiannis, Marijke Vandewal, Geert De Cubber
-
Evading Overlapping Community Detection via Proxy Node Injection
Dario Loi, Matteo Silvestri, Fabrizio Silvestri, Gabriele Tolomei
-
No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks
Yehonatan Refael, Guy Smorodinsky, Ofir Lindenbaum, Itay Safran
-
RedHerring Attack: Testing the Reliability of Attack Detection
Jonathan Rusert
-
Overcoming Black-box Attack Inefficiency with Hybrid and Dynamic Select Algorithms
Abhinay Shankar Belde, Rohit Ramkumar, Jonathan Rusert
-
Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search
Shuo Huang, Xingliang Yuan, Gholamreza Haffari, Lizhen Qu
-
Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch
-
Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models
Chantal Shaib, Vinith M. Suriyakumar, Levent Sagun, Byron C. Wallace, Marzyeh Ghassemi
-
Jieli Zhu, Vi Ngoc-Nha Tran
-
Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
Dingzirui Wang, Xuanliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng
-
Wenkai Guo, Xuefeng Liu, Haolin Wang, Jianwei Niu, Shaojie Tang, Jing Yuan
-
CLUE: Conflict-guided Localization for LLM Unlearning Framework
Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang
-
Poisoning Prompt-Guided Sampling in Video Large Language Models
Yuxin Cao, Wei Song, Jingling Xue, Jin Song Dong
-
The Unanticipated Asymmetry Between Perceptual Optimization and Assessment
Jiabei Zhang, Qi Wang, Siyu Wu, Du Chen, Tianhe Wu
-
A Single Neuron Works: Precise Concept Erasure in Text-to-Image Diffusion Models
Qinqin He, Jiaqi Weng, Jialing Tao, Hui Xue
-
The Unwinnable Arms Race of AI Image Detection
Till Aczel, Lorenzo Vettor, Andreas Plesner, Roger Wattenhofer
-
FERD: Fairness-Enhanced Data-Free Robustness Distillation
Zhengxiao Li, Liming Lu, Xu Zheng, Siyuan Liang, Zhenghan Chen, Yongbin Zhou, Shuchao Pang
-
Sparse Representations Improve Adversarial Robustness of Neural Network Classifiers
Killian Steunou, Sigurd Saue, Théo Druilhe
-
The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures
Zhenshan Zhang, Xueping Zhang, Yechen Wang, Liwei Jin, Ming Li
-
FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction
Runqi Lin, Alasdair Paren, Suqin Yuan, Muyang Li, Philip Torr, Adel Bibi, Tongliang Liu
-
EvoMail: Self-Evolving Cognitive Agents for Adaptive Spam and Phishing Email Defense
Wei Huang, De-Tian Chu, Lin-Yuan Bai, Wei Kang, Hai-Tao Zhang, Bo Li, Zhi-Mo Han, Jing Ge, Hai-Feng Lin
-
Optimal Robust Recourse with $L^p$-Bounded Model Change
Phone Kyaw, Kshitij Kayastha, Shahin Jabbari
-
Cryptographic Backdoor for Neural Networks: Boon and Bane
Anh Tu Ngo, Anupam Chattopadhyay, Subhamoy Maitra
-
Are Modern Speech Enhancement Systems Vulnerable to Adversarial Attacks?
Rostislav Makarov, Lea Schönherr, Timo Gerkmann
-
RLCracker: Exposing the Vulnerability of LLM Watermarks with Adaptive RL Attacks
Hanbo Huang, Yiran Zhang, Hao Zheng, Xuan Gong, Yihan Li, Lin Liu, Shiyu Liang
-
TRiCo: Triadic Game-Theoretic Co-Training for Robust Semi-Supervised Learning
Hongyang He, Xinyuan Song, Yangfan He, Zeyu Zhang, Yanshu Li, Haochen You, Lifan Sun, Wenqiao Zhang
-
Saurabh Kataria, Davood Fattahi, Minxiao Wang, Ran Xiao, Matthew Clark, Timothy Ruchti, Mark Mai, Xiao Hu
-
Functional Encryption in Secure Neural Network Training: Data Leakage and Practical Mitigations
Alexandru Ioniţă, Andreea Ioniţă
-
Aurosweta Mahapatra, Ismail Rasim Ulgen, Berrak Sisman
-
Bidirectional Intention Inference Enhances LLMs' Defense Against Multi-Turn Jailbreak Attacks
Haibo Tong, Dongcheng Zhao, Guobin Shen, Xiang He, Dachuan Lin, Feifei Zhao, Yi Zeng
-
Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models
Kang Wei, Xin Yuan, Fushuo Huo, Chuan Ma, Long Yuan, Songze Li, Ming Ding, Dacheng Tao
-
Huizhen Shu, Xuying Li, Zhuo Li
-
CON-QA: Privacy-Preserving QA using cloud LLMs in Contract Domain
Ajeet Kumar Singh, Rajsabi Surya, Anurag Tripathi, Santanu Choudhury, Sudhir Bisane
-
Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
Tong Nie, Yuewen Mei, Yihong Tang, Junlin He, Jie Sun, Haotian Shi, Wei Ma, Jian Sun
-
bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs
Wence Ji, Jiancan Wu, Aiying Li, Shuyi Zhang, Junkang Wu, An Zhang, Xiang Wang, Xiangnan He
-
Zhixiao Wu, Yao Lu, Jie Wen, Hao Sun, Qi Zhou, Guangming Lu
-
Lubos Mjachky, Ivan Homoliak
-
Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization
Wenhan Wu, Zheyuan Liu, Chongyang Gao, Ren Wang, Kaize Ding
-
RAG Security and Privacy: Formalizing the Threat Model and Attack Surface
Atousa Arzanipour, Rouzbeh Behnia, Reza Ebrahimi, Kaushik Dutta
-
Benchmarking Gaslighting Attacks Against Speech Large Language Models
Jinyang Wu, Bin Zhu, Xiandong Zou, Qiquan Zhang, Xu Fang, Pan Zhou
-
Yixun Zhang, Feng Zhou, Jianqin Yin
-
FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
Xin Wang, Jie Li, Zejia Weng, Yixu Wang, Yifeng Gao, Tianyu Pang, Chao Du, Yan Teng, Yingchun Wang, Zuxuan Wu, Xingjun Ma, Yu-Gang Jiang
-
Zhifang Zhang, Jiahan Zhang, Shengjie Zhou, Qi Wei, Shuo He, Feng Liu, Lei Feng
-
Xuekang Zhu, Ji-Zhe Zhou, Kaiwen Feng, Chenfan Qu, Yunfei Wang, Liting Zhou, Jian Liu
-
Smaller is Better: Enhancing Transparency in Vehicle AI Systems via Pruning
Sanish Suwal, Shaurya Garg, Dipkamal Bhusal, Michael Clifford, Nidhi Rastogi
-
Universal Camouflage Attack on Vision-Language Models for Autonomous Driving
Dehong Kong, Sifan Yu, Siyuan Liang, Jiawei Liang, Jianhou Gan, Aishan Liu, Wenqi Ren
-
Puning Zhao, Zhikun Zhang, Bo Sun, Li Shen, Liang Zhang, Shaowei Wang, Zhe Liu
-
On the Fragility of Contribution Score Computation in Federated Learning
Balazs Pejo, Marcell Frank, Krisztian Varga, Peter Veliczky
-
Generative Model Inversion Through the Lens of the Manifold Hypothesis
Xiong Peng, Bo Han, Fengfei Yu, Tongliang Liu, Feng Liu, Mingyuan Zhou
-
Staying on the Manifold: Geometry-Aware Noise Injection
Albert Kjøller Jacobsen, Johanna Marie Gegenfurtner, Georgios Arvanitidis
-
Monitoring Violations of Differential Privacy over Time
Önder Askin, Tim Kutta, Holger Dette
-
FlyTrap: Physical Distance-Pulling Attack Towards Camera-based Autonomous Target Tracking Systems
Shaoyuan Xie, Mohamad Habib Fakih, Junchi Lu, Fayzah Alshammari, Ningfei Wang, Takami Sato, Halima Bouzidi, Mohammad Abdullah Al Faruque, Qi Alfred Chen
-
Are Neural Networks Collision Resistant?
Marco Benedetti, Andrej Bogdanov, Enrico M. Malatesta, Marc Mézard, Gianmarco Perrupato, Alon Rosen, Nikolaj I. Schwartzbach, Riccardo Zecchina
-
Tharcisse Ndayipfukamiye, Jianguo Ding, Doreen Sebastian Sarwatt, Adamu Gaston Philipo, Huansheng Ning
-
Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits
Weixin Chen, Han Zhao
-
Perspectra: Choosing Your Experts Enhances Critical Thinking in Multi-Agent Research Ideation
Yiren Liu, Viraj Shah, Sangho Suh, Pao Siangliulue, Tal August, Yun Huang
-
Every Character Counts: From Vulnerability to Defense in Phishing Detection
Maria Chiper, Radu Tudor Ionescu
-
Bridging Privacy and Utility: Synthesizing anonymized EEG with constraining utility functions
Kay Fuhrmeister, Arne Pelzer, Fabian Radke, Julia Lechinger, Mahzad Gharleghi, Thomas Köllmer, Insa Wolf
-
Efficiently Attacking Memorization Scores
Tue Do, Varun Chandrasekaran, Daniel Alabi
-
Differential Privacy of Network Parameters from a System Identification Perspective
Andrew Campbell, Anna Scaglione, Hang Liu, Victor Elvira, Sean Peisert, Daniel Arnold
-
Ren-Yi Huang, Dumindu Samaraweera, Prashant Shekhar, J. Morris Chang
-
JaiLIP: Jailbreaking Vision-Language Models via Loss Guided Image Perturbation
Md Jueal Mia, M. Hadi Amini
-
Dynamic Dual-level Defense Routing for Continual Adversarial Training
Wenxuan Wang, Chenglei Wang, Xuelin Qian
-
SafeSteer: Adaptive Subspace Steering for Efficient Jailbreak Defense in Vision-Language Models
Xiyu Zeng, Siyuan Liang, Liming Lu, Haotian Zhu, Enguang Liu, Jisheng Dang, Yongbin Zhou, Shuchao Pang
-
Large Language Models for Real-World IoT Device Identification
Rameen Mahmood, Tousif Ahmed, Sai Teja Peddinti, Danny Yuxing Huang
-
TIMED: Adversarial and Autoregressive Refinement of Diffusion-Based Time Series Generation
MohammadReza EskandariNasab, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi
-
The Pareto Frontier of Resilient Jet Tagging
Rikab Gambhir, Matt LeBlanc, Yuanchen Zhou
-
Stochastic Path Planning in Correlated Obstacle Fields
Li Zhou, Elvan Ceyhan
-
Improving Credit Card Fraud Detection through Transformer-Enhanced GAN Oversampling
Kashaf Ul Emaan
-
The Secret Agenda: LLMs Strategically Lie and Our Current Safety Tools Are Blind
Caleb DeLeeuw, Gaurav Chawla, Aniket Sharma, Vanessa Dietze
-
Defending against Stegomalware in Deep Neural Networks with Permutation Symmetry
Birk Torpmann-Hagen, Michael A. Riegler, Pål Halvorsen, Dag Johansen
-
Why Speech Deepfake Detectors Won't Generalize: The Limits of Detection in an Open World
Visar Berisha, Prad Kadambi, Isabella Lenz
-
SAEmnesia: Erasing Concepts in Diffusion Models with Sparse Autoencoders
Enrico Cassano, Riccardo Renzulli, Marco Nurisso, Mirko Zaffaroni, Alan Perotti, Marco Grangetto
-
Localizing Adversarial Attacks To Produce More Imperceptible Noise
Pavan Reddy, Aditya Sanjay Gujral
-
Diversity Boosts AI-Generated Text Detection
Advik Raj Basani, Pin-Yu Chen
-
Uncovering Privacy Vulnerabilities through Analytical Gradient Inversion Attacks
Tamer Ahmed Eltaras, Qutaibah Malluhi, Alessandro Savino, Stefano Di Carlo, Adnan Qayyum
-
Rule Encoding and Compliance in Large Language Models: An Information-Theoretic Analysis
Joachim Diederich
-
DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces
Tianshuo Zhang, Li Gao, Siran Peng, Xiangyu Zhu, Zhen Lei
-
Revealing Adversarial Smart Contracts through Semantic Interpretation and Uncertainty Estimation
Yating Liu, Xing Su, Hao Wu, Sijin Li, Yuxi Cheng, Fengyuan Xu, Sheng Zhong
-
When Ads Become Profiles: Uncovering the Invisible Risk of Web Advertising at Scale with LLMs
Baiyu Chen, Benjamin Tag, Hao Xue, Daniel Angus, Flora Salim
-
Is It Certainly a Deepfake? Reliability Analysis in Detection & Generation Ecosystem
Neslihan Kose, Anthony Rhodes, Umur Aybars Ciftci, Ilke Demir
-
Distributionally Robust Safety Verification of Neural Networks via Worst-Case CVaR
Masako Kishida
-
Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents
Shouju Wang, Fenglin Yu, Xirui Liu, Xiaoting Qin, Jue Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan
-
Saeid Sheikhi, Panos Kostakos, Lauri Loven
-
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
Alexander Panfilov, Evgenii Kortukov, Kristina Nikolić, Matthias Bethge, Sebastian Lapuschkin, Wojciech Samek, Ameya Prabhu, Maksym Andriushchenko, Jonas Geiping
-
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models
Satyapriya Krishna, Andy Zou, Rahul Gupta, Eliot Krzysztof Jones, Nick Winter, Dan Hendrycks, J. Zico Kolter, Matt Fredrikson, Spyros Matsoukas
-
An Unlearning Framework for Continual Learning
Sayanta Adhikari, Vishnuprasadh Kumaravelu, P. K. Srijith
-
Budgeted Adversarial Attack against Graph-Based Anomaly Detection in Sensor Networks
Sanju Xaviar, Omid Ardakanian
-
SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models
Haotian Xu, Qingsong Peng, Jie Shi, Huadi Zheng, Yu Li, Cheng Zhuo
-
Lipschitz-Based Robustness Certification for Recurrent Neural Networks via Convex Relaxation
Paul Hamelbeck, Johannes Schiffer
-
Shilling Recommender Systems by Generating Side-feature-aware Fake User Profiles
Yuanrong Wang, Yingpeng Du
-
TextCrafter: Optimization-Calibrated Noise for Defending Against Text Embedding Inversion
Duoxun Tang, Xinhang Jiang, Jiajun Niu
-
B-Privacy: Defining and Enforcing Privacy in Weighted Voting
Samuel Breckenridge, Dani Vilardell, Andrés Fábrega, Amy Zhao, Patrick McCorry, Rafael Solari, Ari Juels
-
Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis
Joshua Ward, Xiaofeng Lin, Chi-Hua Wang, Guang Cheng
-
Quickest Change Detection in Continuous-Time in Presence of a Covert Adversary
Amir Reza Ramtin, Philippe Nain, Don Towsley
-
Yu-Kai Shih, You-Kai Kang
-
The Illusion of Readiness in Health AI
Yu Gu, Jingjing Fu, Xiaodong Liu, Jeya Maria Jose Valanarasu, Noel CF Codella, Reuben Tan, Qianchu Liu, Ying Jin, Sheng Zhang, Jinyu Wang, Rui Wang, Lei Song, Guanghui Qin, Naoto Usuyama, Cliff Wong, Hao Cheng, HoHin Lee, Praneeth Sanapathi, Sarah Hilado, Tristan Naumann, Javier Alvarez-Valle, Jiang Bian, Mu Wei, Khalil Malik, Lidong Zhou, Jianfeng Gao, Eric Horvitz, Matthew P. Lungren, Doug Burger, Eric Topol, Hoifung Poon, Paul Vozila
-
Ilham Wicaksono, Zekun Wu, Rahul Patel, Theo King, Adriano Koshiyama, Philip Treleaven
-
AdaptiveGuard: Towards Adaptive Runtime Safety for LLM-Powered Software
Rui Yang, Michael Fu, Chakkrit Tantithamthavorn, Chetan Arora, Gunel Gulmammadova, Joey Chua
-
Jiahe Qian, Yaoyu Fang, Ziqiao Weng, Xinkun Wang, Lee A. Cooper, Bo Zhou
-
Localizing Malicious Outputs from CodeLLM
Mayukh Borana, Junyi Liang, Sai Sathiesh Rajan, Sudipta Chattopadhyay
-
SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions
Massa Baali, Sarthak Bisht, Francisco Teixeira, Kateryna Shapovalenko, Rita Singh, Bhiksha Raj
-
TraceHiding: Scalable Machine Unlearning for Mobility Data
Ali Faraji, Manos Papagelis
-
Xuan Chen, Shiwei Feng, Zikang Xiong, Shengwei An, Yunshu Mao, Lu Yan, Guanhong Tao, Wenbo Guo, Xiangyu Zhang
-
Seeing is Deceiving: Mirror-Based LiDAR Spoofing for Autonomous Vehicle Deception
Selma Yahia, Ildi Alla, Girija Bangalore Mohan, Daniel Rau, Mridula Singh, Valeria Loscri
-
Lightweight MobileNetV1+GRU for ECG Biometric Authentication: Federated and Adversarial Evaluation
Dilli Hang Rai, Sabin Kafley
-
MARS: A Malignity-Aware Backdoor Defense in Federated Learning
Wei Wan, Yuxuan Ning, Zhicong Huang, Cheng Hong, Shengshan Hu, Ziqi Zhou, Yechao Zhang, Tianqing Zhu, Wanlei Zhou, Leo Yu Zhang
-
Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models
Xingkai Peng, Jun Jiang, Meng Tong, Shuai Li, Weiming Zhang, Nenghai Yu, Kejiang Chen
-
Can an Individual Manipulate the Collective Decisions of Multi-Agents?
Fengyuan Liu, Rui Zhao, Shuo Chen, Guohao Li, Philip Torr, Lei Han, Jindong Gu
-
Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks
Ashley Kurian, Aydin Aysu
-
V-CECE: Visual Counterfactual Explanations via Conceptual Edits
Nikolaos Spanos, Maria Lymperaiou, Giorgos Filandrianos, Konstantinos Thomas, Athanasios Voulodimos, Giorgos Stamou
-
FakeChain: Exposing Shallow Cues in Multi-Step Deepfake Detection
Minji Heo, Simon S. Woo
-
MoRoVoc: A Large Dataset for Geographical Variation Identification of the Spoken Romanian Language
Andrei-Marius Avram, Ema-Ioana Bănescu, Anda-Teodora Robea, Dumitru-Clementin Cercel, Mihaela-Claudia Cercel
-
Hanting Li, Huaao Tang, Jianhong Han, Tianxiong Zhou, Jiulong Cui, Haizhen Xie, Yan Chen, Jie Hu
-
A Novel Metric for Detecting Memorization in Generative Models for Brain MRI Synthesis
Antonio Scardace, Lemuel Puglisi, Francesco Guarnera, Sebastiano Battiato, Daniele Ravì
-
ADVEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents
Yichen Wang, Hangtao Zhang, Hewen Pan, Ziqi Zhou, Xianlong Wang, Peijin Guo, Lulu Xue, Shengshan Hu, Minghui Li, Leo Yu Zhang
-
SOLAR: Switchable Output Layer for Accuracy and Robustness in Once-for-All Training
Shaharyar Ahmed Khan Tareen, Lei Fan, Xiaojing Yuan, Qin Lin, Bin Hu
-
FairTune: A Bias-Aware Fine-Tuning Framework Towards Fair Heart Rate Prediction from PPG
Lovely Yeswanth Panchumarthi, Saurabh Kataria, Yi Wu, Xiao Hu, Alex Fedorov, Hyunjung Gloria Kwak
-
Delving into Cryptanalytic Extraction of PReLU Neural Networks
Yi Chen, Xiaoyang Dong, Ruijie Ma, Yantian Shen, Anyu Wang, Hongbo Yu, Xiaoyun Wang
-
"Digital Camouflage": The LLVM Challenge in LLM-Based Malware Detection
Ekin Böke, Simon Torka
-
Stress Testing Deliberative Alignment for Anti-Scheming Training
Bronson Schoen, Evgenia Nitishinskaya, Mikita Balesni, Axel Højmark, Felix Hofstätter, Jérémy Scheurer, Alexander Meinke, Jason Wolfe, Teun van der Weij, Alex Lloyd, Nicholas Goldowsky-Dill, Angela Fan, Andrei Matveiakin, Rusheb Shah, Marcus Williams, Amelia Glaese, Boaz Barak, Wojciech Zaremba, Marius Hobbhahn
-
Krati Saxena, Federico Jurado Ruiz, Guido Manzi, Dianbo Liu, Alex Lamb
-
Reward Hacking Mitigation using Verifiable Composite Rewards
Mirza Farhan Bin Tarek, Rahmatollah Beheshti
-
Robust Vision-Language Models via Tensor Decomposition: A Defense Against Adversarial Attacks
Het Patel, Muzammil Allie, Qian Zhang, Jia Chen, Evangelos E. Papalexakis
-
DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm
Xiaowei Zhu, Yubing Ren, Fang Fang, Qingfeng Tan, Shi Wang, Yanan Cao
-
Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models
Tomoya Yamashita, Akira Ito, Yuuki Yamanaka, Masanori Yamada, Takayuki Miura, Toshiki Shibahara
-
SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection
Maithili Joshi, Palash Nandi, Tanmoy Chakraborty
-
Backdoor Mitigation via Invertible Pruning Masks
Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak
-
PolyJuice Makes It Real: Black-Box, Universal Red Teaming for Synthetic Image Detectors
Sepehr Dehdashtian, Mashrur M. Morshed, Jacob H. Seidman, Gaurav Bharaj, Vishnu Naresh Boddeti
-
Zhangqi Jiang, Tingjin Luo, Xu Yang, Xinyan Liang
-
Randomized Smoothing Meets Vision-Language Models
Emmanouil Seferis, Changshun Wu, Stefanos Kollias, Saddek Bensalem, Chih-Hong Cheng
-
Zhengxing Li, Guangmingmei Yang, Jayaram Raghuram, David J. Miller, George Kesidis
-
Adversarially Robust Assembly Language Model for Packed Executables Detection
Shijia Li, Jiang Ming, Lanqing Liu, Longwei Yang, Ni Zhang, Chunfu Jia
-
Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE
Xinpeng Liu, Junming Liu, Peiyu Liu, Han Zheng, Qinying Wang, Mathias Payer, Shouling Ji, Wenhai Wang
-
Inference Attacks on Encrypted Online Voting via Traffic Analysis
Anastasiia Belousova, Francesco Marchiori, Mauro Conti
-
Dongyang Zhan, Kai Tan, Lin Ye, Xiangzhan Yu, Hongli Zhang, Zheng He
-
Secure Confidential Business Information When Sharing Machine Learning Models
Yunfan Yang, Jiarong Xu, Hongzhe Zhang, Xiao Fang
-
Evaluating CxG Generalisation in LLMs via Construction-Based NLI Fine Tuning
Tom Mackintosh, Harish Tayyar Madabushi, Claire Bonial
-
Overfitting in Adaptive Robust Optimization
Karl Zhu, Dimitris Bertsimas
-
Davide Ettori, Nastaran Darabi, Sina Tayebati, Ranganath Krishnan, Mahesh Subedar, Omesh Tickoo, Amit Ranjan Trivedi
-
SynBench: A Benchmark for Differentially Private Text Generation
Yidan Sun, Viktor Schlegel, Srinivasan Nandakumar, Iqra Zahid, Yuping Wu, Yulong Wu, Hao Li, Jie Zhang, Warren Del-Pinto, Goran Nenadic, Siew Kei Lam, Anil Anthony Bharath
-
Enhancing Retrieval Augmentation via Adversarial Collaboration
Letian Zhang, Guanghao Meng, Xudong Ren, Yiming Wang, Shu-Tao Xia
-
Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems
Diego Gosmar, Deborah A. Dahl
-
LLM Jailbreak Detection for (Almost) Free!
Guorui Chen, Yifan Xia, Xiaojun Jia, Zhijiang Li, Philip Torr, Jindong Gu
-
Enterprise AI Must Enforce Participant-Aware Access Control
Shashank Shreedhar Bhatt, Tanmay Rajore, Khushboo Aggarwal, Ganesh Ananthanarayanan, Ranveer Chandra, Nishanth Chandran, Suyash Choudhury, Divya Gupta, Emre Kiciman, Sumit Kumar Pandey, Srinath Setty, Rahul Sharma, Teijia Zhao
-
Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection
Yihao Guo, Haocheng Bian, Liutong Zhou, Ze Wang, Zhaoyi Zhang, Francois Kawala, Milan Dean, Ian Fischer, Yuantao Peng, Noyan Tokgozoglu, Ivan Barrientos, Riyaaz Shaik, Rachel Li, Chandru Venkataraman, Reza Shifteh Far, Moses Pawar, Venkat Sundaranatha, Michael Xu, Frank Chu
-
Reveal and Release: Iterative LLM Unlearning with Self-generated Data
Linxi Xie, Xin Teng, Shichang Ke, Hongyi Wen, Shengjie Wang
-
Siyu Yan, Long Zeng, Xuecheng Wu, Chengcheng Han, Kongcheng Zhang, Chong Peng, Xuezhi Cao, Xunliang Cai, Chenjuan Guo
-
[Re] Improving Interpretation Faithfulness for Vision Transformers
Izabela Kurek, Wojciech Trejter, Stipe Frkovic, Andro Erdelez
-
Discrete optimal transport is a strong audio adversarial attack
Anton Selitskiy, Akib Shahriyar, Jishnuraj Prakasan
-
Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
Simin Li, Zheng Yuwei, Zihao Mao, Linhao Wang, Ruixiao Xu, Chengdong Ma, Xin Yu, Yuqing Ma, Qi Dou, Xin Wang, Jie Luo, Bo An, Yaodong Yang, Weifeng Lv, Xianglong Liu
-
Watermarking and Anomaly Detection in Machine Learning Models for LORA RF Fingerprinting
Aarushi Mahajan, Wayne Burleson
-
Ruijie Hou, Yueyang Jiao, Hanxu Hu, Yingming Li, Wai Lam, Huajian Zhang, Hongyuan Lu
-
AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt
Saket S. Chaturvedi, Gaurav Bagwe, Lan Zhang, Xiaoyong Yuan
-
Edge-Aware Normalized Attention for Efficient and Detail-Preserving Single Image Super-Resolution
Penghao Rao, Tieyong Zeng
-
Geometric Image Synchronization with Deep Watermarking
Pierre Fernandez, Tomáš Souček, Nikola Jovanović, Hady Elsahar, Sylvestre-Alvise Rebuffi, Valeriu Lacatusu, Tuan Tran, Alexandre Mourachko
-
Xingchen Wang, Feijie Wu, Chenglin Miao, Tianchun Li, Haoyu Hu, Qiming Cao, Jing Gao, Lu Su
-
CUFG: Curriculum Unlearning Guided by the Forgetting Gradient
Jiaxing Miao, Liang Hu, Qi Zhang, Lai Zhong Yuan, Usman Naseem
-
STEP: Structured Training and Evaluation Platform for benchmarking trajectory prediction models
Julian F. Schumann, Anna Mészáros, Jens Kober, Arkady Zgonnikov
-
Yigit E. Yildirim, Samet Demir, Zafer Dogan
-
Yuanbo Xie, Yingjie Zhang, Tianyun Liu, Duohe Ma, Tingwen Liu
-
Evil Vizier: Vulnerabilities of LLM-Integrated XR Systems
Yicheng Zhang, Zijian Huang, Sophie Chen, Erfan Shayegani, Jiasi Chen, Nael Abu-Ghazaleh
-
Acoustic Simulation Framework for Multi-channel Replay Speech Detection
Michael Neri, Tuomas Virtanen
-
ORCA: Agentic Reasoning For Hallucination and Adversarial Robustness in Vision-Language Models
Chung-En Johnny Yu, Hsuan-Chih (Neil) Chen, Brian Jalaian, Nathaniel D. Bastian
-
Impact of Phonetics on Speaker Identity in Adversarial Voice Attack
Daniyal Kabir Dar, Qiben Yan, Li Xiao, Arun Ross
-
Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages
Yujia Hu, Ming Shan Hee, Preslav Nakov, Roy Ka-Wei Lee
-
Red Teaming Multimodal Language Models: Evaluating Harm Across Prompt Modalities and Models
Madison Van Doren, Casey Ford, Emily Dix
-
Stochastic Sample Approximations of (Local) Moduli of Continuity
Rodion Nazarov, Allen Gehret, Robert Shorten, Jakub Marecek
-
Adversarial generalization of unfolding (model-based) networks
Vicky Kouni
-
Assessing metadata privacy in neuroimaging
Emilie Kibsgaard, Anita Sue Jwa, Christopher J Markiewicz, David Rodriguez Gonzalez, Judith Sainz Pardo, Russell A. Poldrack, Cyril R. Pernet
-
Benchmarking and Improving LLM Robustness for Personalized Generation
Chimaobi Okite, Naihao Deng, Kiran Bodipati, Huaidian Hou, Joyce Chai, Rada Mihalcea
-
Semantic Representation Attack against Aligned Large Language Models
Jiawei Lian, Jianhong Pan, Lefan Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau
-
DSCC-HS: A Dynamic Self-Reinforcing Framework for Hallucination Suppression in Large Language Models
Xiao Zheng
-
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
Zhaoyang Chu, Yao Wan, Zhikun Zhang, Di Wang, Zhou Yang, Hongyu Zhang, Pan Zhou, Xuanhua Shi, Hai Jin, David Lo
-
Differential Privacy in Federated Learning: Mitigating Inference Attacks with Randomized Response
Ozer Ozturk, Busra Buyuktanir, Gozde Karatas Baydogmus, Kazim Yildiz
-
Privacy-Aware In-Context Learning for Large Language Models
Bishnu Bhusal, Manoj Acharya, Ramneet Kaur, Colin Samplawski, Anirban Roy, Adam D. Cobb, Rohit Chadha, Susmit Jha
-
StyleProtect: Safeguarding Artistic Identity in Fine-tuned Diffusion Models
Qiuyu Tang, Joshua Krinsky, Aparna Bharati
-
Wenkui Yang, Jie Cao, Junxian Duan, Ran He
-
Niruthiha Selvanayagam, Ted Kurti
-
Secure UAV-assisted Federated Learning: A Digital Twin-Driven Approach with Zero-Knowledge Proofs
Md Bokhtiar Al Zami, Md Raihan Uddin, Dinh C. Nguyen
-
ParaAegis: Parallel Protection for Flexible Privacy-preserved Federated Learning
Zihou Wu, Yuecheng Li, Tianchi Liao, Jian Lou, Chuan Chen
-
Differentially private federated learning for localized control of infectious disease dynamics
Raouf Kerkouche, Henrik Zunker, Mario Fritz, Martin J. Kühn
-
Defending Diffusion Models Against Membership Inference Attacks via Higher-Order Langevin Dynamics
Benjamin Sterling, Yousef El-Laham, Mónica F. Bugallo
-
Mert Gürbüzbalaban, Yasa Syed, Necdet Serhat Aybat
-
Baolei Zhang, Haoran Xin, Yuxi Chen, Zhuqing Liu, Biao Yi, Tong Li, Lihai Nie, Zheli Liu, Minghong Fang
-
Cybersecurity AI: Humanoid Robots as Attack Vectors
Víctor Mayoral-Vilches
-
VCBench: Benchmarking LLMs in Venture Capital
Rick Chen, Joseph Ternasky, Afriyie Samuel Kwesi, Ben Griffin, Aaron Ontoyin Yin, Zakari Salifu, Kelvin Amoaba, Xianling Mu, Fuat Alican, Yigit Ihlamur
-
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness
Xuan Luo, Yue Wang, Zefeng He, Geng Tu, Jing Li, Ruifeng Xu
-
RLBind: Adversarial-Invariant Cross-Modal Alignment for Unified Robust Embeddings
Yuhong Lu
-
Bihao Zhan, Jie Zhou, Junsong Li, Yutao Yang, Shilian Chen, Qianjun Pan, Xin Li, Wen Wu, Xingjiao Wu, Qin Chen, Hang Yan, Liang He
-
RepIt: Representing Isolated Targets to Steer Language Models
Vincent Siu, Nathan W. Henry, Nicholas Crispino, Yang Liu, Dawn Song, Chenguang Wang
-
DisorientLiDAR: Physical Attacks on LiDAR-based Localization
Yizhen Lao, Yu Zhang, Ziting Wang, Chengbo Wang, Yifei Xue, Wanpeng Shao
-
CIARD: Cyclic Iterative Adversarial Robustness Distillation
Liming Lu, Shuchao Pang, Xu Zheng, Xiang Gu, Anan Du, Yunhuai Liu, Yongbin Zhou
-
A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs
Kiho Lee, Jungkon Kim, Doowon Kim, Hyoungshick Kim
-
Jinjie Shen, Yaxiong Wang, Lechao Cheng, Nan Pu, Zhun Zhong
-
Defense-to-Attack: Bypassing Weak Defenses Enables Stronger Jailbreaks in Vision-Language Models
Yunhan Zhao, Xiang Zheng, Xingjun Ma
-
Jailbreaking Large Language Models Through Content Concretization
Johan Wahréus, Ahmed Hussain, Panos Papadimitratos
-
Sy-FAR: Symmetry-based Fair Adversarial Robustness
Haneen Najjar, Eyal Ronen, Mahmood Sharif
-
MIA-EPT: Membership Inference Attack via Error Prediction for Tabular Data
Eyal German, Daniel Samira, Yuval Elovici, Asaf Shabtai
-
JANUS: A Dual-Constraint Generative Framework for Stealthy Node Injection Attacks
Jiahao Zhang, Xiaobing Pei, Zhaokun Zhong, Wenqiang Hao, Zhenghao Tang
-
Shaz Furniturewala, Arkaitz Zubiaga
-
Empowering LLMs with Parameterized Skills for Adversarial Long-Horizon Planning
Sijia Cui, Shuai Xu, Aiyao He, Yanna Wang, Bo Xu
-
Do Natural Language Descriptions of Model Activations Convey Privileged Information?
Millicent Li, Alberto Mario Ceballos Arroyo, Giordano Rogers, Naomi Saphra, Byron C. Wallace
-
When Inverse Data Outperforms: Exploring the Pitfalls of Mixed Data in Multi-Stage Fine-Tuning
Mengyi Deng, Xin Li, Tingyu Zhu, Zhicheng Yang, Zhijiang Guo, Wei Wang
-
Agent4FaceForgery: Multi-Agent LLM Framework for Realistic Face Forgery Detection
Yingxin Lai, Zitong Yu, Jun Wang, Linlin Shen, Yong Xu, Xiaochun Cao
-
End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection
Fei Wang, Xuecheng Wu, Zheng Zhang, Danlei Huang, Yuheng Huang, Bo Wang
-
Sanjeda Akter, Ibne Farabi Shihab, Anuj Sharma
-
BAPFL: Exploring Backdoor Attacks Against Prototype-based Federated Learning
Honghong Zeng, Jiong Lou, Zhe Wang, Hefeng Zhou, Chentao Wu, Wei Zhao, Jie Li
-
On the Out-of-Distribution Backdoor Attack for Federated Learning
Jiahao Xu, Zikai Zhang, Rui Hu
-
Zhen Li, Zijian Zhang, Wenjin Yang, Pengbo Wang, Zhaoqi Wang, Meng Li, Yan Wu, Xuyang Liu, Jing Sun, Liehuang Zhu
-
Bridging Threat Models and Detections: Formal Verification via CADP
Dumitru-Bogdan Prelipcean, Cătălin Dima
-
Artem Savkin, Thomas Lapotre, Kevin Strauss, Uzair Akbar, Federico Tombari
-
Valuation of Exotic Options and Counterparty Games Based on Conditional Diffusion
Helin Zhao, Junchi Shen
-
Onat Gungor, Roshan Sood, Harold Wang, Tajana Rosing
-
FedMentor: Domain-Aware Differential Privacy for Heterogeneous Federated LLMs in Mental Health
Nobin Sarwar, Shubhashis Roy Dipta
-
Beyond Data Privacy: New Privacy Risks for Large Language Models
Yuntao Du, Zitao Li, Ninghui Li, Bolin Ding
-
Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal
-
A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks
S M Asif Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, M. F. Mridha, Jungpil Shin
-
Towards mitigating information leakage when evaluating safety monitors
Gerard Boxo, Aman Neelappa, Shivam Raval
-
SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs
Vincent Siu, Nicholas Crispino, David Park, Nathan W. Henry, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang
-
Li Cuihong, Huang Xiaowen, Yin Chuanhuan, Sang Jitao
-
Wei Cai, Shujuan Liu, Jian Zhao, Ziyan Shi, Yusheng Zhao, Yuchen Yuan, Tianle Zhang, Chi Zhang, Xuelong Li
-
Inducing Uncertainty for Test-Time Privacy
Muhammad H. Ashiq, Peter Triantafillou, Hung Yun Tseng, Grigoris G. Chrysos
-
Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check
Chentao Cao, Xiaojun Xu, Bo Han, Hang Li
-
Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning
Filip Sondej, Yushi Yang
-
Navid Hashemi, Samuel Sasaki, Diego Manzanas Lopez, Ipek Oguz, Meiyi Ma, Taylor T. Johnson
-
James C. Ward, Alex Bott, Connor York, Edmund R. Hunt
-
Poison to Detect: Detection of Targeted Overfitting in Federated Learning
Soumia Zohra El Mestari, Maciej Krzysztof Zuziak, Gabriele Lenzini
-
Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference
Synthia Wang, Sai Teja Peddinti, Nina Taft, Nick Feamster
-
A Controllable 3D Deepfake Generation Framework with Gaussian Splatting
Wending Liu, Siyun Liang, Huy H. Nguyen, Isao Echizen
-
Robust Concept Erasure in Diffusion Models: A Theoretical Perspective on Security and Robustness
Zixuan Fu, Yan Ren, Finn Carter, Chenyue Wen, Le Ku, Daheng Yu, Emily Davis, Bo Zhang
-
DRAG: Data Reconstruction Attack using Guided Diffusion
Wa-Kin Lei, Jun-Cheng Chen, Shang-Tse Chen
-
DARD: Dice Adversarial Robustness Distillation against Adversarial Attacks
Jing Zou, Shungeng Zhang, Meikang Qiu, Chong Li
-
From Autoencoders to CycleGAN: Robust Unpaired Face Manipulation via Adversarial Learning
Collin Guo
-
Removal Attack and Defense on AI-generated Content Latent-based Watermarking
De Zhang Lee, Han Fang, Hanyi Wang, Ee-Chien Chang
-
A Practical Adversarial Attack against Sequence-based Deep Learning Malware Classifiers
Kai Tan, Dongyang Zhan, Lin Ye, Hongli Zhang, Binxing Fang
-
NeuroStrike: Neuron-Level Attacks on Aligned LLMs
Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Maximilian Thang, Stjepan Picek, Ahmad-Reza Sadeghi
-
Efficient Byzantine-Robust Privacy-Preserving Federated Learning via Dimension Compression
Xian Qin, Xue Yang, Xiaohu Tang
-
MORABLES: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with Fables
Matteo Marcuzzo, Alessandro Zangari, Andrea Albarelli, Jose Camacho-Collados, Mohammad Taher Pilehvar
-
Geometric Red-Teaming for Robotic Manipulation
Divyam Goel, Yufei Wang, Tiancheng Wu, Guixiu Qiao, Pavel Piliptchak, David Held, Zackory Erickson
-
Amulet: a Python Library for Assessing Interactions Among ML Defenses and Risks
Asim Waheed, Vasisht Duddu, Rui Zhang, Sebastian Szyller, N. Asokan
-
Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time
Yifan Lan, Yuanpu Cao, Weitong Zhang, Lu Lin, Jinghui Chen
-
Secure Human Oversight of AI: Exploring the Attack Surface of Human Oversight
Jonas C. Ditz, Veronika Lazar, Elmar Lichtmeß, Carola Plesch, Matthias Heck, Kevin Baum, Markus Langer
-
Redefining Website Fingerprinting Attacks With Multiagent LLMs
Chuxu Song, Dheekshith Dev Manohar Mekala, Hao Wang, Richard Martin
-
Gustavo Sandoval, Denys Fenchenko, Junyao Chen
-
Free-MAD: Consensus-Free Multi-Agent Debate
Yu Cui, Hang Fu, Haibin Zhang, Licheng Wang, Cong Zuo
-
Membership Inference Attacks on Recommender System: A Survey
Jiajie He, Yuechun Gu, Keke Chen, Xintong Chen
-
ENJ: Optimizing Noise with Genetic Algorithms to Jailbreak LSMs
Yibo Zhang, Liang Lin
-
Feature Space Topology Control via Hopkins Loss
Einari Vaaras, Manu Airaksinen
-
Simin Chen, Jinjun Peng, Yixin He, Junfeng Yang, Baishakhi Ray
-
From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming
Anusha Sinha, Keltin Grimes, James Lucassen, Michael Feffer, Nathan VanHoudnos, Zhiwei Steven Wu, Hoda Heidari
-
When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity
Shiyao Cui, Xijia Feng, Yingkang Wang, Junxiao Yang, Zhexin Zhang, Biplab Sikdar, Hongning Wang, Han Qiu, Minlie Huang
-
RanAT4BIE: Random Adversarial Training for Biomedical Information Extraction
Jian Chen, Shengyi Lv, Leilei Su
-
Beyond Sliders: Mastering the Art of Diffusion-based Image Manipulation
Yufei Tang, Daiheng Gao, Pingyu Wu, Wenbo Zhou, Bang Zhang, Weiming Zhang
-
Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N. Duong
-
Realistic Environmental Injection Attacks on GUI Agents
Yitong Zhang, Ximo Li, Liyi Cai, Jia Li
-
Stabilizing Data-Free Model Extraction
Dat-Thinh Nguyen, Kim-Hung Le, Nhien-An Le-Khac
-
On the Escaping Efficiency of Distributed Adversarial Training Algorithms
Ying Cao, Kun Yuan, Ali H. Sayed
-
Qingzhao Zhang, Shaocheng Luo, Z. Morley Mao, Miroslav Pajic, Michael K. Reiter
-
Doan Minh Trung, Tien Duc Anh Hao, Luong Hoang Minh, Nghi Hoang Khoa, Nguyen Tan Cam, Van-Hau Pham, Phan The Duy
-
Tao Wang, Yushu Zhang, Xiangli Xiao, Kun Xu, Lin Yuan, Wenying Wen, Yuming Fang
-
MAUI: Reconstructing Private Client Data in Federated Transfer Learning
Ahaan Dabholkar, Atul Sharma, Z. Berkay Celik, Saurabh Bagchi
-
Syed Emad Uddin Shubha, Tasnuva Farheen
-
Hybrid Quantum-Classical Model for Image Classification
Muhammad Adnan Shahzad
-
Self-Evolving LLMs via Continual Instruction Tuning
Jiazheng Kang, Le Huang, Cheng Hou, Zhe Zhao, Zhenxiang Yan, Chuan Shi, Ting Bai
-
Pathological Truth Bias in Vision-Language Models
Yash Thube
-
Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding
Seongho Joo, Hyukhun Koh, Kyomin Jung
-
Public Data Assisted Differentially Private In-Context Learning
Seongho Joo, Hyukhun Koh, Kyomin Jung
-
Farhan Sadik, Christopher L. Newman, Stuart J. Warden, Rachel K. Surowiec
-
Robustifying Diffusion-Denoised Smoothing Against Covariate Shift
Ali Hedayatnia, Mostafa Tavassolipour, Babak Nadjar Araabi, Abdol-Hossein Vahabie
-
A Modern Look at Simplicity Bias in Image Classification Tasks
Xiaoguang Chang, Teng Wang, Changyin Sun
-
A Biosecurity Agent for Lifecycle LLM Biosecurity Alignment
Meiyin Meng, Zaixi Zhang
-
Hailong Yang, Renhuo Zhao, Guanjin Wang, Zhaohong Deng
-
Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge
Omar Erak, Omar Alhussein, Hatem Abou-Zeid, Mehdi Bennis, Sami Muhaidat
-
Adversarial robustness through Lipschitz-Guided Stochastic Depth in Neural Networks
Laith Nayal, Mahmoud Mousatat, Bader Rasheed
-
Immunizing Images from Text to Image Editing via Adversarial Cross-Attention
Matteo Trippodo, Federico Becattini, Lorenzo Seidenari
-
Mohammad Hasan Narimani, Mostafa Tavassolipour
-
Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications
Janis Keuper
-
When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review
Changjia Zhu, Junjie Xiong, Renkai Ma, Zhicong Lu, Yao Liu, Lingyao Li
-
Machine Unlearning for Responsible and Adaptive AI in Education
Betty Mayeku, Sandra Hummel, Parisa Memarmoshrefi
-
LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems
Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, Ulf Lindqvist
-
Privacy-Preserving Decentralized Federated Learning via Explainable Adaptive Differential Privacy
Fardin Jalil Piran, Zhiling Chen, Yang Zhang, Qianyu Zhou, Jiong Tang, Farhad Imani
-
Jingyu Tang, Chaoran Chen, Jiawen Li, Zhiping Zhang, Bingcan Guo, Ibrahim Khalilov, Simret Araya Gebreegziabher, Bingsheng Yao, Dakuo Wang, Yanfang Ye, Tianshi Li, Ziang Xiao, Yaxing Yao, Toby Jia-Jun Li
-
Safety and Security Analysis of Large Language Models: Risk Profile and Harm Potential
Charankumar Akiri, Harrison Simpson, Kshitiz Aryal, Aarav Khanna, Maanak Gupta
-
Side-channel Inference of User Activities in AR/VR Using GPU Profiling
Seonghun Son, Chandrika Mukherjee, Reham Mohamed Aburas, Berk Gulmezoglu, Z. Berkay Celik
-
JU-NLP at Touché: Covert Advertisement in Conversational AI-Generation and Detection Strategies
Arka Dutta, Agrik Majumdar, Sombrata Biswas, Dipankar Das, Sivaji Bandyopadhyay
-
Enabling Regulatory Multi-Agent Collaboration: Architecture, Challenges, and Solutions
Qinnan Hu, Yuntao Wang, Yuan Gao, Zhou Su, Linkang Du
-
Towards Confidential and Efficient LLM Inference with Dual Privacy Protection
Honglan Yu, Yibin Wang, Feifei Dai, Dong Liu, Haihui Fan, Xiaoyan Gu
-
Character-Level Perturbations Disrupt LLM Watermarks
Zhaoxi Zhang, Xiaomei Zhang, Yanjun Zhang, He Zhang, Shirui Pan, Bo Liu, Asif Qumer Gill, Leo Yu Zhang
-
Prompt Pirates Need a Map: Stealing Seeds helps Stealing Prompts
Felix Mächtle, Ashwath Shetty, Jonas Sander, Nils Loose, Sören Pirk, Thomas Eisenbarth
-
OpenFake: An Open Dataset and Platform Toward Large-Scale Deepfake Detection
Victor Livernoche, Akshatha Arodi, Andreea Musulan, Zachary Yang, Adam Salvail, Gaétan Marceau Caron, Jean-François Godbout, Reihaneh Rabbany
-
Steering MoE LLMs via Expert (De)Activation
Mohsen Fayyaz, Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Ryan Rossi, Trung Bui, Hinrich Schütze, Nanyun Peng
-
Balancing Utility and Privacy: Dynamically Private SGD with Random Projection
Zhanhong Jiang, Md Zahid Hasan, Nastaran Saadati, Aditya Balu, Chao Liu, Soumik Sarkar
-
ProDiGy: Proximity- and Dissimilarity-Based Byzantine-Robust Federated Learning
Sena Ergisi, Luis Maßny, Rawad Bitar
-
Representation-Aware Distributionally Robust Optimization: A Knowledge Transfer Framework
Zitao Wang, Nian Si, Molei Liu
-
ZORRO: Zero-Knowledge Robustness and Privacy for Split Learning (Full Version)
Nojan Sheybani, Alessandro Pegoraro, Jonathan Knauer, Phillip Rieger, Elissa Mollakuqe, Farinaz Koushanfar, Ahmad-Reza Sadeghi
-
Images in Motion?: A First Look into Video Leakage in Collaborative Deep Learning
Md Fazle Rasul, Alanood Alqobaisi, Bruhadeshwar Bezawada, Indrakshi Ray
-
Chengyu Yang, Rishik Reddy Yesgari, Chengjun Liu
-
Jiaqi Weng, Han Zheng, Hanyu Zhang, Qinqin He, Jialing Tao, Hui Xue, Zhixuan Chu, Xiting Wang
-
The Coding Limits of Robust Watermarking for Generative Models
Danilo Francati, Yevin Nikhel Goonatilake, Shubham Pawar, Daniele Venturi, Giuseppe Ateniese
-
Symmetry-Guided Multi-Agent Inverse Reinforcement Learning
Yongkai Tian, Yirong Qi, Xin Yu, Wenjun Wu, Jie Luo
-
Adversarial Attacks Against Automated Fact-Checking: A Survey
Fanzhen Liu, Alsharif Abuadbba, Kristen Moore, Surya Nepal, Cecile Paris, Jia Wu, Jian Yang, Quan Z. Sheng
-
Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations
Ron F. Del Rosario, Klaudia Krawiecka, Christian Schroeder de Witt
-
X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates
Hyunjun Kim, Junwoo Ha, Sangyoon Yu, Haon Park
-
Vivek Oommen, Siavash Khodakarami, Aniruddha Bora, Zhicheng Wang, George Em Karniadakis
-
Seongho Kim, Sejong Ryu, Hyoukjun You, Je Hyeong Hong
-
Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition
Matthew Nolan, Lina Yao, Robert Davidson
-
Perfectly-Private Analog Secure Aggregation in Federated Learning
Delio Jaramillo-Velez, Charul Rajput, Ragnar Freij-Hollanti, Camilla Hollanti, Alexandre Graell i Amat
-
Shun Takagi, Satoshi Hasegawa
-
Tight Privacy Audit in One Run
Zihang Xiang, Tianhao Wang, Hanshen Xiao, Yuan Tian, Di Wang
-
Approximate Algorithms for Verifying Differential Privacy with Gaussian Distributions
Bishnu Bhusal, Rohit Chadha, A. Prasad Sistla, Mahesh Viswanathan
-
Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates
Sreejeet Maity, Aritra Mitra
-
Group Distributionally Robust Machine Learning under Group Level Distributional Uncertainty
Xenia Konti, Yi Shen, Zifan Wang, Karl Henrik Johansson, Michael J. Pencina, Nicoleta J. Economou-Zavlanos, Michael M. Zavlanos
-
Quantum Error Correction in Adversarial Regimes
Rahul Arvind, Nikhil Bansal, Dax Enshan Koh, Tobias Haug, Kishor Bharti
-
AVEC: Bootstrapping Privacy for Local LLMs
Madhava Gaikwad
-
How Far Are We from True Unlearnability?
Kai Ye, Liangcai Su, Chenxiong Qian
-
Nearest Neighbor Projection Removal Adversarial Training
Himanshu Singh, A. V. Subramanyam, Shivank Rajput, Mohan Kankanhalli
-
Hammer and Anvil: A Principled Defense Against Backdoors in Federated Learning
Lucas Fenaux, Zheng Wang, Jacob Yan, Nathan Chung, Florian Kerschbaum
-
Sketched Gaussian Mechanism for Private Federated Learning
Qiaobo Li, Zhijie Chen, Arindam Banerjee
-
SAGE: Sample-Aware Guarding Engine for Robust Intrusion Detection Against Adversarial Attacks
Jing Chen, Onat Gungor, Zhengli Shang, Tajana Rosing
-
Asynchronous Gossip Algorithms for Rank-Based Statistical Methods
Anna Van Elst, Igor Colin, Stephan Clémençon
-
Kamel Kamel, Hridoy Sankar Dutta, Keshav Sood, Sunil Aryal
-
Meryem Malak Dif, Mouhamed Amine Bouchiha, Abdelaziz Amara Korba, Yacine Ghamri-Doudane
-
When Secure Isn't: Assessing the Security of Machine Learning Model Sharing
Gabriele Digregorio, Marco Di Gennaro, Stefano Zanero, Stefano Longari, Michele Carminati
-
Sequentially Auditing Differential Privacy
Tomás González, Mateo Dulce-Rubio, Aaditya Ramdas, Mónica Ribero
-
From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers
Praneet Suresh, Jack Stanley, Sonia Joseph, Luca Scimeca, Danilo Bzdok
-
Mind Your Server: A Systematic Study of Parasitic Toolchain Attacks on the MCP Ecosystem
Shuli Zhao, Qinsheng Hou, Zihan Zhan, Yanhao Wang, Yuchong Xie, Yu Guo, Libo Chen, Shenghong Li, Zhi Xue
-
Imitative Membership Inference Attack
Yuntao Du, Yuetian Chen, Hanshen Xiao, Bruno Ribeiro, Ninghui Li
-
RetinaGuard: Obfuscating Retinal Age in Fundus Images for Biometric Privacy Preserving
Zhengquan Luo, Chi Liu, Dongfu Xiao, Zhen Yu, Yueye Wang, Tianqing Zhu
-
Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal
Nirmalendu Prakash, Yeo Wei Jie, Amir Abdullah, Ranjan Satapathy, Erik Cambria, Roy Ka Wei Lee
-
Robust DDoS-Attack Classification with 3D CNNs Against Adversarial Methods
Landon Bragg, Nathan Dorsey, Josh Prior, John Ajit, Ben Kim, Nate Willis, Pablo Rivas
-
Uncovering the Vulnerability of Large Language Models in the Financial Domain via Risk Concealment
Gang Cheng, Haibo Jin, Wenbin Zhang, Haohan Wang, Jun Zhuang
-
If generative AI is the answer, what is the question?
Ambuj Tewari
-
AntiDote: Bi-level Adversarial Training for Tamper-Resistant LLMs
Debdeep Sanyal, Manodeep Ray, Murari Mandal
-
EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System
Pavan Reddy, Aditya Sanjay Gujral
-
Exploit Tool Invocation Prompt for Tool Behavior Hijacking in LLM-Based Agentic System
Yu Liu, Yuchong Xie, Mingyu Luo, Zesen Liu, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
-
Decoding Latent Attack Surfaces in LLMs: Prompt Injection via HTML in Web Summarization
Ishaan Verma
-
Yours or Mine? Overwriting Attacks Against Neural Audio Watermarking
Lingfeng Yao, Chenpei Huang, Shengyao Wang, Junpei Xue, Hanqing Guo, Jiang Liu, Phone Lin, Tomoaki Ohtsuki, Miao Pan
-
Zhenhua Xu, Xixiang Zhao, Xubin Yue, Shengwei Tian, Changting Lin, Meng Han
-
Taniya Gidatkar, Oluwaseun Ajao, Matthew Shardlow
-
Contextuality, Holonomy and Discrete Fiber Bundles in Group-Valued Boltzmann Machines
Jean-Pierre Magnot
-
Pre-Forgettable Models: Prompt Learning as a Native Mechanism for Unlearning
Rutger Hendrix, Giovanni Patanè, Leonardo G. Russo, Simone Carnemolla, Giovanni Bellitto, Federica Proietto Salanitri, Concetto Spampinato, Matteo Pennisi
-
A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models
Yanbo Wang, Yongcan Yu, Jian Liang, Ran He
-
NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models
Chuhan Zhang, Ye Zhang, Bowen Shi, Yuyou Gan, Tianyu Du, Shouling Ji, Dazhan Deng, Yingcai Wu
-
Detecting Regional Spurious Correlations in Vision Transformers via Token Discarding
Solha Kang, Esla Timothy Anzaku, Wesley De Neve, Arnout Van Messem, Joris Vankerschaver, Francois Rameau, Utku Ozbulak
-
False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
Cheng Wang, Zeming Wei, Qin Liu, Muhao Chen
-
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
Qinyan Zhang, Xinping Lei, Ruijie Miao, Yu Fu, Haojie Fan, Le Chang, Jiafan Hou, Dingling Zhang, Zhongfei Hou, Ziqiang Yang, Changxin Pu, Fei Hu, Jingkai Liu, Mengyun Liu, Yang Liu, Xiang Gao, Jiaheng Liu, Tong Yang, Zaiyuan Wang, Ge Zhang, Wenhao Huang
-
Shakiba Amirshahi, Amin Bigdeli, Charles L. A. Clarke, Amira Ghenai
-
Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios
Jingen Qu, Lijun Li, Bo Zhang, Yichen Yan, Jing Shao
-
Formal Verification of Local Robustness of a Classification Algorithm for a Spatial Use Case
Delphine Longuet, Amira Elouazzani, Alejandro Penacho Riveiros, Nicola Bastianello
-
Privacy Risks in Time Series Forecasting: User- and Record-Level Membership Inference
Nicolas Johansson, Tobias Olsson, Daniel Nilsson, Johan Östman, Fazeleh Hoseini
-
Qifeng Tan, Shusen Yang, Xuebin Ren, Yikai Zhang
-
Peekaboo, I See Your Queries: Passive Attacks Against DSSE Via Intermittent Observations
Hao Nie, Wei Wang, Peng Xu, Wei Chen, Laurence T. Yang, Mauro Conti, Kaitai Liang
-
An Automated, Scalable Machine Learning Model Inversion Assessment Pipeline
Tyler Shumaker, Jessica Carpenter, David Saranchak, Nathaniel D. Bastian
-
Between a Rock and a Hard Place: Exploiting Ethical Reasoning to Jailbreak LLMs
Shei Pern Chua, Zhen Leng Thai, Teh Kai Jun, Xiao Li, Xiaolin Hu
-
Variational Gaussian Mixture Manifold Models for Client-Specific Federated Personalization
Sai Puppala, Ismail Hossain, Md Jahangir Alam, Sajedul Talukder
-
Xin Tong, Zhi Lin, Jingya Wang, Meng Han, Bo Jin
-
Visible Yet Unreadable: A Systematic Blind Spot of Vision Language Models Across Writing Systems
Jie Zhang, Ting Xu, Gelei Deng, Runyi Hu, Han Qiu, Tianwei Zhang, Qing Guo, Ivor Tsang
-
Yunbo Long, Liming Xu, Lukas Beckenbauer, Yuhan Liu, Alexandra Brintrup
-
Ismail Hossain, Sai Puppala, Md Jahangir Alam, Sajedul Talukder
-
ANNIE: Be Careful of Your Robots
Yiyang Huang, Zixuan Wang, Zishen Wan, Yapeng Tian, Haobo Xu, Yinhe Han, Yiming Gan
-
Alma M. Liezenga, Stefan Wijnja, Puck de Haan, Niels W. T. Brink, Jip J. van Stijn, Yori Kamphuis, Klamer Schutte
-
On the MIA Vulnerability Gap Between Private GANs and Diffusion Models
Ilana Sebag, Jean-Yves Franceschi, Alain Rakotomamonjy, Alexandre Allauzen, Jamal Atif
-
DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling
Yubo Gao, Renbo Tu, Gennady Pekhimenko, Nandita Vijaykumar
-
SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models
Jigang Fan, Zhenghong Zhou, Ruofan Jin, Le Cong, Mengdi Wang, Zaixi Zhang
-
Tzuhsuan Huang, Cheng Yu Yeo, Tsai-Ling Huang, Hong-Han Shuai, Wen-Huang Cheng, Jun-Cheng Chen
-
Background Matters Too: A Language-Enhanced Adversarial Framework for Person Re-Identification
Kaicong Huang, Talha Azfar, Jack M. Reilly, Thomas Guggisberg, Ruimin Ke
-
High Cursive Complex Character Recognition using GAN External Classifier
S M Rafiuddin
-
Backdoor Poisoning Attack Against Face Spoofing Attack Detection Methods
Shota Iwamatsu, Koichi Ito, Takafumi Aoki
-
Hania Ghouse, Muzammil Behzad
-
Kaoru Otsuka, Yuki Takezawa, Makoto Yamada
-
LSAM: Asynchronous Distributed Training with Landscape-Smoothed Sharpness-Aware Minimization
Yunfei Teng, Sixin Zhang
-
Can LLMs Lie? Investigation beyond Hallucination
Haoran Huan, Mihir Prabhudesai, Mengning Wu, Shantanu Jaiswal, Deepak Pathak
-
EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint
Zhenhua Xu, Meng Han, Wenpeng Xing
-
Somiya Chhillar, Mary K. Righi, Rebecca E. Sutter, Evgenios M. Kornaropoulos
-
Federated Learning: An approach with Hybrid Homomorphic Encryption
Pedro Correia, Ivan Silva, Ivone Amorim, Eva Maia, Isabel Praça
-
PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming
Wesley Hanwen Deng, Sunnie S. Y. Kim, Akshita Jha, Ken Holstein, Motahhare Eslami, Lauren Wilcox, Leon A Gatys
-
Learning an Adversarial World Model for Automated Curriculum Generation in MARL
Brennen Hill
-
Stealth by Conformity: Evading Robust Aggregation through Adaptive Poisoning
Ryan McGaughey, Jesus Martinez del Rincon, Ihsen Alouani
-
Silent Until Sparse: Backdoor Attacks on Semi-Structured Sparsity
Wei Guo, Maura Pintor, Ambra Demontis, Battista Biggio
-
Prototype-Guided Robust Learning against Backdoor Attacks
Wei Guo, Maura Pintor, Ambra Demontis, Battista Biggio
-
From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models
Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Linlin Wang
-
Kesen Wang, Daulet Toibazar, Pedro J. Moreno
-
Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models
Sandipana Dowerah, Atharva Kulkarni, Ajinkya Kulkarni, Hoan My Tran, Joonas Kalda, Artem Fedorchenko, Benoit Fauve, Damien Lolive, Tanel Alumäe, Matthew Magimai Doss
-
Halima Bouzidi, Haoyu Liu, Mohammad Abdullah Al Faruque
-
Sai Teja Reddy Adapala
-
Jian Chen, Jiabao Dou, Jinbao Tian, Yunqi Yang, Zhou Li
-
PIR-RAG: A System for Private Information Retrieval in Retrieval-Augmented Generation
Baiqiang Wang, Qian Lou, Mengxin Zheng, Dongfang Zhao
-
Privacy-Preserving Reasoning with Knowledge-Distilled Parametric Retrieval Augmented Generation
Jinwen Chen, Hainan Zhang, Liang Pang, Yongxin Tong, Haibo Zhou, Yuan Zhan, Wei Lin, Zhiming Zheng
-
Distributed Gossip-GAN for Low-overhead CSI Feedback Training in FDD mMIMO-OFDM Systems
Yuwen Cao, Guijun Liu, Tomoaki Ohtsuki, Howard H. Yang, Tony Q. S. Quek
-
Deep opacity and AI: A threat to XAI and to privacy protection mechanisms
Vincent C. Müller
-
Partially Functional Dynamic Backdoor Diffusion-based Causal Model
Xinwen Liu, Lei Qian, Song Xi Chen, Niansheng Tang
-
When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment
Hanqi Yan, Hainiu Xu, Siya Qi, Shu Yang, Yulan He
-
Fengchao Chen, Tingmin Wu, Van Nguyen, Carsten Rudolph
-
GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability
Zhenghao He, Sanchit Sinha, Guangzhi Xiong, Aidong Zhang
-
PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance
Mengxiao Wang, Yuxuan Zhang, Guofei Gu
-
Enhancing Resilience for IoE: A Perspective of Networking-Level Safeguard
Guan-Yan Yang, Jui-Ning Chen, Farn Wang, Kuo-Hui Yeh
-
Safe-Control: A Safety Patch for Mitigating Unsafe Content in Text-to-Image Generation Models
Xiangtao Meng, Yingkai Dong, Ning Yu, Li Wang, Zheng Li, Shanqing Guo
-
Phased One-Step Adversarial Equilibrium for Video Diffusion Models
Jiaxiang Cheng, Bing Ma, Xuhua Ren, Hongyi Henry Jin, Kai Yu, Peng Zhang, Wenyue Li, Yuan Zhou, Tianxiang Zheng, Qinglin Lu
-
FakeParts: a New Family of AI-Generated DeepFakes
Ziyi Liu, Firas Gabetni, Awais Hussain Sani, Xi Wang, Soobash Daiboo, Gaetan Brison, Gianni Franchi, Vicky Kalogeiton
-
Network-Level Prompt and Trait Leakage in Local Research Agents
Hyejun Jeong, Mohammadreza Teymoorianfard, Abhinav Kumar, Amir Houmansadr, Eugene Bagdasarian
-
Pruning Weights but Not Truth: Safeguarding Truthfulness While Pruning LLMs
Yao Fu, Runchao Li, Xianxuan Long, Haotian Yu, Xiaotian Han, Yu Yin, Pan Li
-
Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents
Zhixin Lin, Jungang Li, Shidong Pan, Yibo Shi, Yue Yao, Dongliang Xu
-
AR$^2$: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models
Cheng-Kai Yeh, Hsing-Wang Lee, Chung-Hung Kuo, Hen-Hsen Huang
-
Sheng Liu, Qiang Sheng, Danding Wang, Yang Li, Guang Yang, Juan Cao
-
Language Models Identify Ambiguities and Exploit Loopholes
Jio Choi, Mohit Bansal, Elias Stengel-Eskin
-
AEGIS: Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema
Ting-Chun Liu, Ching-Yu Hsu, Kuan-Yi Lee, Chi-An Fu, Hung-yi Lee
-
Robustness Assessment and Enhancement of Text Watermarking for Google's SynthID
Xia Han, Qi Li, Jianbing Ni, Mohammad Zulkernine
-
Robustness is Important: Limitations of LLMs for Data Fitting
Hejia Liu, Mochen Yang, Gediminas Adomavicius
-
PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality
Nanxi Li, Zhengyue Zhao, Chaowei Xiao
-
Membership Inference Attacks on LLM-based Recommender Systems
Jiajie He, Yuechun Gu, Min-Chun Chen, Keke Chen
-
Auditing Approximate Machine Unlearning for Differentially Private Models
Yuechun Gu, Jiajie He, Keke Chen
-
FLAegis: A Two-Layer Defense Framework for Federated Learning Against Poisoning Attacks
Enrique Mármol Campos, Aurora González Vidal, José Luis Hernández Ramos, Antonio Skarmeta
-
SegReConcat: A Data Augmentation Method for Voice Anonymization Attack
Ridwan Arefeen, Xiaoxiao Miao, Rong Tong, Aik Beng Ng, Simon See
-
Enhancing Model Privacy in Federated Learning with Random Masking and Quantization
Zhibo Xu, Jianhao Zhu, Jingwen Xu, Changze Lv, Zisu Huang, Xiaohua Wang, Muling Wu, Qi Qian, Xiaoqing Zheng, Xuanjing Huang
-
Tackling Federated Unlearning as a Parameter Estimation Problem
Antonio Balordi, Lorenzo Manini, Fabio Stella, Alessio Merlo
-
Attackers Strike Back? Not Anymore - An Ensemble of RL Defenders Awakens for APT Detection
Sidahmed Benabderrahmane, Talal Rahwan
-
SecureV2X: An Efficient and Privacy-Preserving System for Vehicle-to-Everything (V2X) Applications
Joshua Lee, Ali Arastehfard, Weiran Liu, Xuegang Ban, Yuan Hong
-
UniC-RAG: Universal Knowledge Corruption Attacks to Retrieval-Augmented Generation
Runpeng Geng, Yanting Wang, Ying Chen, Jinyuan Jia
-
Stephen Meisenbacher, Alexandra Klymenko, Andreea-Elena Bodea, Florian Matthes
-
Flatness-aware Curriculum Learning via Adversarial Difficulty
Hiroaki Aizawa, Yoshikazu Hayashi
-
A Closer Look at Edema Area Segmentation in SD-OCT Images Using Adversarial Framework
Yuhui Tao, Yizhe Zhang, Qiang Chen
-
Can we make NeRF-based visual localization privacy-preserving?
Maxime Pietrantoni, Martin Humenberger, Torsten Sattler, Gabriela Csurka
-
Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models
Rui Zhang, Zihan Wang, Tianli Yang, Hongwei Li, Wenbo Jiang, Qingchuan Zhao, Yang Liu, Guowen Xu
-
Saddle Hierarchy in Dense Associative Memory
Robin Thériault, Daniele Tantari
-
Get Global Guarantees: On the Probabilistic Nature of Perturbation Robustness
Wenchuan Mu, Kwan Hui Lim
-
A Tight Context-aware Privacy Bound for Histogram Publication
Sara Saeidian, Ata Yavuzyılmaz, Leonhard Grosse, Georg Schuppe, Tobias J. Oechtering
-
Memorization in Graph Neural Networks
Adarsh Jamadandi, Jing Xu, Adam Dziedzic, Franziska Boenisch
-
On Surjectivity of Neural Networks: Can you elicit any behavior from your model?
Haozhe Jiang, Nika Haghtalab
-
Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models
Qiming Guo, Jinwen Tang, Xingran Huang
-
Robustness Feature Adapter for Efficient Adversarial Training
Quanwei Wu, Jun Guo, Wei Wang, Yi Wang
-
Speculative Safety-Aware Decoding
Xuekang Wang, Shengyu Zhu, Xueqi Cheng
-
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo
-
Vocoder-Projected Feature Discriminator
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo
-
Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation
Haijian Ma, Daizong Liu, Xiaowen Cai, Pan Zhou, Yulai Xie
-
ISACL: Internal State Analyzer for Copyrighted Training Data Leakage
Guangwei Zhang, Qisheng Su, Jiateng Liu, Cheng Qian, Yanzhou Pan, Yanjie Fu, Denghui Zhang
-
CATformer: Contrastive Adversarial Transformer for Image Super-Resolution
Qinyi Tian, Spence Cox, Laura E. Dalton
-
SCOUT: Semi-supervised Camouflaged Object Detection by Utilizing Text and Adaptive Data Selection
Weiqi Yan, Lvhai Chen, Shengchuan Zhang, Yan Zhang, Liujuan Cao
-
Does simple trump complex? Comparing strategies for adversarial robustness in DNNs
William Brooks, Marelie H. Davel, Coenraad Mouton
-
FedGreed: A Byzantine-Robust Loss-Based Aggregation Method for Federated Learning
Emmanouil Kritharakis, Antonios Makris, Dusan Jakovetic, Konstantinos Tserpes
-
Quantum-Classical Hybrid Framework for Zero-Day Time-Push GNSS Spoofing Detection
Abyad Enan, Mashrur Chowdhury, Sagar Dasgupta, Mizanur Rahman
-
PhantomLint: Principled Detection of Hidden LLM Prompts in Structured Documents
Toby Murray
-
ClearMask: Noise-Free and Naturalness-Preserving Protection Against Voice Deepfake Attacks
Yuanda Wang, Bocheng Chen, Hanqing Guo, Guangjing Wang, Weikang Ding, Qiben Yan
-
Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails
Kellen Tan Cheng, Anna Lisa Gentile, Chad DeLuca, Guang-Jie Ren
-
Analysis of Machine Unlearning in Medical Image Classification Models
Andreza M. C. Falcao, Filipe R. Cordeiro
-
Training Language Model Agents to Find Vulnerabilities with CTF-Dojo
Terry Yue Zhuo, Dingmin Wang, Hantian Ding, Varun Kumar, Zijian Wang
-
Privacy-Preserving Federated Learning Framework for Risk-Based Adaptive Authentication
Yaser Baseri, Abdelhakim Senhaji Hafid, Dimitrios Makrakis, Hamidreza Fereidouni
-
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Keke Lian, Bin Wang, Lei Zhang, Libo Chen, Junjie Wang, Ziming Zhao, Yujiu Yang, Miaoqian Lin, Haotong Duan, Haoran Zhao, Shuang Liao, Mingda Guo, Jiazheng Quan, Yilu Zhong, Chenhao He, Zichuan Chen, Jie Wu, Haoling Li, Zhaoxuan Li, Jiongchi Yu, Hui Li, Dong Zhang
-
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
Mia Taylor, James Chua, Jan Betley, Johannes Treutlein, Owain Evans
-
Kaiwen Zuo, Zelin Liu, Raman Dutt, Ziyang Wang, Zhongtian Sun, Yeming Wang, Fan Mo, Pietro Liò
-
Exposing Privacy Risks in Graph Retrieval-Augmented Generation
Jiale Liu, Jiahao Zhang, Suhang Wang
-
Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents
Sameer Komoravolu, Khalil Mrini
-
Activation Transport Operators
Andrzej Szablewski, Marek Masiak
-
Bryan Chen Zhengyu Tan, Daniel Wai Kit Chin, Zhengyuan Liu, Nancy F. Chen, Roy Ka-Wei Lee
-
Advancing Weakly-Supervised Change Detection in Satellite Images via Adversarial Class Prompting
Zhenghui Zhao, Chen Wu, Di Wang, Hongruixuan Chen, Cuiqun Chen, Zhuo Zheng, Bo Du, Liangpei Zhang
-
Uncovering and Mitigating Destructive Multi-Embedding Attacks in Deepfake Proactive Forensics
Lixin Jia, Haiyang Sun, Zhiqing Guo, Yunfeng Diao, Dan Ma, Gaobo Yang
-
AdaGAT: Adaptive Guidance Adversarial Training for the Robustness of Deep Neural Networks
Zhenyu Liu, Huizhi Liang, Xinrun Li, Vaclav Snasel, Varun Ojha
-
Defending Deepfake via Texture Feature Perturbation
Xiao Zhang, Changfang Chen, Tianyi Wang
-
Sharpness-Aware Geometric Defense for Robust Out-Of-Distribution Detection
Jeng-Lin Li, Ming-Ching Chang, Wei-Chao Chen
-
MetaFed: Advancing Privacy, Performance, and Sustainability in Federated Metaverse Systems
Muhammet Anil Yagiz, Zeynep Sude Cengiz, Polat Goktas
-
Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias
Shir Bernstein, David Beste, Daniel Ayzenshteyn, Lea Schonherr, Yisroel Mirsky
-
FRAME: Comprehensive Risk Assessment Framework for Adversarial Machine Learning Threats
Avishag Shapira, Simon Shigol, Asaf Shabtai
-
Adversarial Examples Are Not Bugs, They Are Superposition
Liv Gorton, Owen Lewis
-
Risk Assessment and Security Analysis of Large Language Models
Xiaoyan Zhang, Dongyang Lyu, Xiaoqi Li
-
SoK: Cybersecurity Assessment of Humanoid Ecosystem
Priyanka Prakash Surve, Asaf Shabtai, Yuval Elovici
-
LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions
Maojia Song, Tej Deep Pala, Weisheng Jin, Amir Zadeh, Chuan Li, Dorien Herremans, Soujanya Poria
-
WildSpoof Challenge Evaluation Plan
Yihan Wu, Jee-weon Jung, Hye-jin Shim, Xin Cheng, Xin Wang
-
Hyunjun Kim, Junwoo Ha, Sangyoon Yu, Haon Park
-
NAT: Learning to Attack Neurons for Enhanced Adversarial Transferability
Krishna Kanth Nakka, Alexandre Alahi
-
Unveiling the Latent Directions of Reflection in Large Language Models
Fu-Chieh Chang, Yu-Ting Lee, Pei-Yuan Wu
-
Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks
Jack Youstra, Mohammed Mahfoud, Yang Yan, Henry Sleight, Ethan Perez, Mrinank Sharma
-
Carlos Soto
-
SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds
Wuxinlin Cheng, Yupeng Cao, Jinwen Wu, Koduvayur Subbalakshmi, Tian Han, Zhuo Feng
-
STA-GANN: A Valid and Generalizable Spatio-Temporal Kriging Approach
Yujie Li, Zezhi Shao, Chengqing Yu, Tangwen Qian, Zhao Zhang, Yifan Du, Shaoming He, Fei Wang, Yongjun Xu
-
An Investigation of Visual Foundation Models Robustness
Sandeep Gupta, Roberto Passerone
-
From Confidence to Collapse in LLM Factual Robustness
Alina Fastowski, Bardh Prenkaj, Gjergji Kasneci
-
LLMSymGuard: A Symbolic Safety Guardrail Framework Leveraging Interpretable Jailbreak Concepts
Darpan Aswal, Céline Hudelot
-
Yu Yan, Sheng Sun, Zhe Wang, Yijun Lin, Zenghao Duan, Zhifei Zheng, Min Liu, Zhiyi Yin, Jianping Zhang
-
HAMSA: Hijacking Aligned Compact Models via Stealthy Automation
Alexey Krylov, Iskander Vagizov, Dmitrii Korzh, Maryam Douiba, Azidine Guezzaz, Vladimir Kokh, Sergey D. Erokhin, Elena V. Tutubalina, Oleg Y. Rogov
-
Guangyu Yang, Jinghong Chen, Jingbiao Mei, Weizhe Lin, Bill Byrne
-
Domain Adaptation via Feature Refinement
Savvas Karatsiolis, Andreas Kamilaris
-
PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting
Hohyun Na, Seunghoo Hong, Simon S. Woo
-
Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms
Jonathan Nöther, Adish Singla, Goran Radanovic
-
Quality control in sublinear time: a case study via random graphs
Cassandra Marcussen, Ronitt Rubinfeld, Madhu Sudan
-
Evaluating the Defense Potential of Machine Unlearning against Membership Inference Attacks
Aristeidis Sidiropoulos, Christos Chrysanthos Nikolaidis, Theodoros Tsiolakis, Nikolaos Pavlidis, Vasilis Perifanis, Pavlos S. Efraimidis
-
How to Beat Nakamoto in the Race
Shu-Jie Cao, Dongning Guo
-
Guarding Your Conversations: Privacy Gatekeepers for Secure Interactions with Cloud-Based AI Models
GodsGift Uzor, Hasan Al-Qudah, Ynes Ineza, Abdul Serwadda
-
A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems
Kamel Kamel, Keshav Sood, Hridoy Sankar Dutta, Sunil Aryal
-
Aligning Distributionally Robust Optimization with Practical Deep Learning Needs
Dmitrii Feoktistov, Igor Ignashin, Andrey Veprikov, Nikita Borovko, Alexander Bogdanov, Savelii Chezhegov, Aleksandr Beznosikov
-
Nesrine Benchoubane, Olfa Ben Yahia, William Ferguson, Gurkan Gur, Sumit Chakravarty, Gregory Falco, Gunes Karabulut Kurt
-
Conflict-Aware Soft Prompting for Retrieval-Augmented Generation
Eunseong Choi, June Park, Hyeri Lee, Jongwuk Lee
-
Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, Shouling Ji
-
VideoEraser: Concept Erasure in Text-to-Video Diffusion Models
Naen Xu, Jinghuai Zhang, Changjiang Li, Zhi Chen, Chunyi Zhou, Qingming Li, Tianyu Du, Shouling Ji
-
Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation
Yichi Zhang, Yao Huang, Yifan Wang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu
-
Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection
Chengcan Wu, Zeming Wei, Huanran Chen, Yinpeng Dong, Meng Sun
-
Towards a 3D Transfer-based Black-box Attack via Critical Feature Guidance
Shuchao Pang, Zhenghan Chen, Shen Zhang, Liming Lu, Siyuan Liang, Anan Du, Yongbin Zhou
-
A Study of Privacy-preserving Language Modeling Approaches
Pritilata Saha, Abhirup Sinha
-
SafetyFlow: An Agent-Flow System for Automated LLM Safety Benchmarking
Xiangyang Zhu, Yuan Tian, Chunyi Li, Kaiwei Zhang, Wei Sun, Guangtao Zhai
-
SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models
Peng Ding, Wen Sun, Dailin Li, Wei Zou, Jiaming Wang, Jiajun Chen, Shujian Huang
-
Retrieval-Augmented Review Generation for Poisoning Recommender Systems
Shiyi Yang, Xinshu Li, Guanglin Zhou, Chen Wang, Xiwei Xu, Liming Zhu, Lina Yao
-
Adversarial Attacks against Neural Ranking Models via In-Context Learning
Amin Bigdeli, Negar Arabzadeh, Ebrahim Bagheri, Charles L. A. Clarke
-
Adversarial Agent Behavior Learning in Autonomous Driving Using Deep Reinforcement Learning
Arjun Srinivasan, Anubhav Paras, Aniket Bera
-
Fast globally optimal Truncated Least Squares point cloud registration with fixed rotation axis
Ivo Ivanov, Carsten Markgraf
-
DoSReMC: Domain Shift Resilient Mammography Classification using Batch Normalization Adaptation
Uğurcan Akyüz, Deniz Katircioglu-Öztürk, Emre K. Süslü, Burhan Keleş, Mete C. Kaya, Gamze Durhan, Meltem G. Akpınar, Figen B. Demirkazık, Gözde B. Akar
-
SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
Xiangman Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu
-
Mini-Batch Robustness Verification of Deep Neural Networks
Saar Tzour-Shaday, Dana Drachsler Cohen
-
Kiarash Kazari, Ezzeldin Shereen, György Dán
-
BadFU: Backdoor Federated Learning through Adversarial Machine Unlearning
Bingguang Lu, Hongsheng Hu, Yuantian Miao, Shaleeza Sohail, Chaoxiang He, Shuo Wang, Xiao Chen
-
Strategic Sample Selection for Improved Clean-Label Backdoor Attacks in Text Classification
Onur Alp Kirci, M. Emre Gursoy
-
Foe for Fraud: Transferable Adversarial Attacks in Credit Card Fraud Detection
Jan Lum Fok, Qingwen Zeng, Shiping Chen, Oscar Fawkes, Huaming Chen
-
Jiangfan Liu, Yongkang Guo, Fangzhi Zhong, Tianyuan Zhang, Zonglei Jing, Siyuan Liang, Jiakai Wang, Mingchuan Zhang, Aishan Liu, Xianglong Liu
-
Adversarial Hospital-Invariant Feature Learning for WSI Patch Classification
Mengliang Zhang, Jacob M. Luber
-
Improving Fairness in Graph Neural Networks via Counterfactual Debiasing
Zengyi Wo, Chang Liu, Yumeng Wang, Minglai Shao, Wenjun Wang
-
Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu
-
Distributional Adversarial Attacks and Training in Deep Hedging
Guangyi He, Tobias Sutter, Lukas Gonon
-
Xuezheng Qin, Ruwei Huang, Xiaolong Tang, Feng Li
-
A Lightweight Incentive-Based Privacy-Preserving Smart Metering Protocol for Value-Added Services
Farid Zaredar, Morteza Amini
-
Farid Zaredar, Morteza Amini
-
TAIGen: Training-Free Adversarial Image Generation via Diffusion Models
Susim Roy, Anubhooti Jain, Mayank Vatsa, Richa Singh
-
A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives
Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong
-
MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs
Ruyi Ding, Tianhong Xu, Xinyi Shen, Aidong Adam Ding, Yunsi Fei
-
Paired-Sampling Contrastive Framework for Joint Physical-Digital Face Attack Detection
Andrei Balykin, Anvar Ganiev, Denis Kondranin, Kirill Polevoda, Nikolai Liudkevich, Artem Petrov
-
Side Effects of Erasing Concepts from Diffusion Models
Shaswati Saha, Sourajit Saha, Manas Gaur, Tejas Gokhale
-
Aura-CAPTCHA: A Reinforcement Learning and GAN-Enhanced Multi-Modal CAPTCHA System
Joydeep Chandra, Prabal Manhas, Ramanjot Kaur, Rashi Sahay
-
Robust Estimation Under Heterogeneous Corruption Rates
Syomantak Chaudhuri, Jerry Li, Thomas A. Courtade
-
Potential and challenges of generative adversarial networks for super-resolution in 4D Flow MRI
Oliver Welin Odeback, Arivazhagan Geetha Balasubramanian, Jonas Schollenberger, Edward Ferdian, Alistair A. Young, C. Alberto Figueroa, Susanne Schnell, Outi Tammisola, Ricardo Vinuesa, Tobias Granberg, Alexander Fyrdahl, David Marlevi
-
Self-Disguise Attack: Induce the LLM to disguise itself for AIGT detection evasion
Yinghan Zhou, Juan Wen, Wanli Peng, Zhengxian Wu, Ziwei Zhang, Yiming Xue
-
Linkage Attacks Expose Identity Risks in Public ECG Data Sharing
Ziyu Wang, Elahe Khatibi, Farshad Firouzi, Sanaz Rahimi Mousavi, Krishnendu Chakrabarty, Amir M. Rahmani
-
Ashwath Vaithinathan Aravindan, Abha Jha, Matthew Salaway, Atharva Sandeep Bhide, Duygu Nur Yaldiz
-
The AI Risk Spectrum: From Dangerous Capabilities to Existential Threats
Markov Grey, Charbel-Raphaël Segerie
-
Daniel M. Jimenez-Gutierrez, Yelizaveta Falkouskaya, Jose L. Hernandez-Ramos, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti
-
Evaluating Identity Leakage in Speaker De-Identification Systems
Seungmin Seo, Oleg Aulov, Afzal Godil, Kevin Mangold
-
Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation
Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee
-
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
Tomer Ashuach, Dana Arad, Aaron Mueller, Martin Tutek, Yonatan Belinkov
-
Kaiwei Zhang, Qi Jia, Zijian Chen, Wei Sun, Xiangyang Zhu, Chunyi Li, Dandan Zhu, Guangtao Zhai
-
Enhancing Robustness of Implicit Neural Representations Against Weight Perturbations
Wenyong Zhou, Yuxin Cheng, Zhengwu Liu, Taiqiang Wu, Chen Zhang, Ngai Wong
-
Yiming Cao, Yanjie Li, Kaisheng Liang, Yuni Lai, Bin Xiao
-
Timestep-Compressed Attack on Spiking Neural Networks through Timestep-Level Backpropagation
Donghwa Kang, Doohyun Kim, Sang-Ki Ko, Jinkyu Lee, Hyeongboo Baek, Brent ByungHoon Kang
-
Backdooring Self-Supervised Contrastive Learning by Noisy Alignment
Tuo Chen, Jie Gui, Minjing Dong, Ju Jia, Lanting Fang, Jian Liu
-
Xiaopeng Peng, Heath Gemar, Erin Fleet, Kyle Novak, Abbie Watnik, Grover Swartzlander
-
Text2Weight: Bridging Natural Language and Neural Network Weight Spaces
Bowen Tian, Wenshuo Chen, Zexi Li, Songning Lai, Jiemin Wu, Yutao Yue
-
Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond
Canzhe Zhao, Shinji Ito, Shuai Li
-
FedUP: Efficient Pruning-based Federated Unlearning for Model Poisoning Attacks
Nicolò Romandini, Cristian Borcea, Rebecca Montanari, Luca Foschini
-
Mohamed Elmahallawy, Tie Luo
-
Beneath the Mask: Can Contribution Data Unveil Malicious Personas in Open-Source Projects?
Ruby Nealon
-
Red Teaming Methodology for Design Obfuscation
Yuntao Liu, Abir Akib, Zelin Lu, Qian Xu, Ankur Srivastava, Gang Qu, David Kehlet, Nij Dorairaj
-
CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection
Jiaming Hu, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis
-
Xin Wu, Fei Teng, Ji Zhang, Xingwang Li, Yuxuan Liang
-
Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS
Can Jin, Yang Zhou, Qixin Zhang, Hongwu Peng, Di Zhang, Marco Pavone, Ligong Han, Zhang-Wei Hong, Tong Che, Dimitris N. Metaxas
-
MMReview: A Multidisciplinary and Multimodal Benchmark for LLM-Based Peer Review Automation
Xian Gao, Jiacheng Ruan, Zongyun Zhang, Jingsheng Gao, Ting Liu, Yuzhuo Fu
-
Two Birds with One Stone: Multi-Task Detection and Attribution of LLM-Generated Text
Zixin Rao, Youssef Mohamed, Shang Liu, Zeyan Liu
-
Noise Robust One-Class Intrusion Detection on Dynamic Graphs
Aleksei Liuliakov, Alexander Schulz, Luca Hermes, Barbara Hammer
-
MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers
Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guanquan Shi, Haohua Du, Xiangyang Li
-
CIA+TA Risk Assessment for AI Reasoning Vulnerabilities
Yuksel Aydin
-
Mechanistic Exploration of Backdoored Large Language Model Attention Patterns
Mohammed Abu Baker, Lakshmi Babu-Saheer
-
Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution
Robert Dilworth
-
Systematic Analysis of MCP Security
Yongjian Guo, Puzhuo Liu, Wanlun Ma, Zehang Deng, Xiaogang Zhu, Peng Di, Xi Xiao, Sheng Wen
-
Robust Federated Learning under Adversarial Attacks via Loss-Based Client Clustering
Emmanouil Kritharakis, Dusan Jakovetic, Antonios Makris, Konstantinos Tserpes
-
RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns
Xin Chen, Junchao Wu, Shu Yang, Runzhe Zhan, Zeyu Wu, Ziyang Luo, Di Wang, Min Yang, Lidia S. Chao, Derek F. Wong
-
Drifting Away from Truth: GenAI-Driven News Diversity Challenges LVLM-Based Misinformation Detection
Fanxiao Li, Jiaying Wu, Tingchao Fu, Yunyun Dong, Bingbing Song, Wei Zhou
-
Jaeung Lee, Suhyeon Yu, Yurim Jang, Simon S. Woo, Jaemin Jo
-
Jinyu Lu, Xinrong Sun, Yunting Tao, Tong Ji, Fanyu Kong, Guoqiang Yang
-
The Hidden Cost of Correlation: Rethinking Privacy Leakage in Local Differential Privacy
Sandaru Jayawardana, Sennur Ulukus, Ming Ding, Kanchana Thilakarathna
-
MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies
Weiwei Qi, Shuo Shao, Wei Gu, Tianhang Zheng, Puning Zhao, Zhan Qin, Kui Ren
-
Yangyang Guo, Yangyan Li, Mohan Kankanhalli
-
DAASH: A Meta-Attack Framework for Synthesizing Effective and Stealthy Adversarial Examples
Abdullah Al Nomaan Nafi, Habibur Rahaman, Zafaryab Haider, Tanzim Mahfuz, Fnu Suya, Swarup Bhunia, Prabuddha Chakraborty
-
Efficient Constraint-Aware Flow Matching via Randomized Exploration
Zhengyan Huan, Jacob Boerma, Li-Ping Liu, Shuchin Aeron
-
DAIQ: Auditing Demographic Attribute Inference from Question in LLMs
Srikant Panda, Hitesh Laxmichand Patel, Shahad Al-Khalifa, Amit Agarwal, Hend Al-Khalifa, Sharefah Al-Ghamdi
-
Yue Xia, Tayyebeh Jahani-Nezhad, Rawad Bitar
-
Jianhao Chen, Mayi Xu, Haoyang Chen, Xiaohu Li, Xiangyu Zhang, Jianjie Huang, Zheng Wang, Xiaochun Cao, Tieyun Qian
-
Distribution Matching via Generalized Consistency Models
Sagar Shrestha, Rajesh Shrestha, Tri Nguyen, Subash Timilsina
-
CRoC: Context Refactoring Contrast for Graph Anomaly Detection with Limited Supervision
Siyue Xie, Da Sun Handason Tam, Wing Cheong Lau
-
Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position
Zhixin Xie, Xurui Song, Jun Luo
-
Yahsin Yeh, Yilun Wu, Bokai Ruan, Honghan Shuai
-
EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization
Chinmay Maheshwari, Chinmay Pimpalkhare, Debasish Chatterjee
-
Rethinking Safety in LLM Fine-tuning: An Optimization Perspective
Minseon Kim, Jin Myung Kwak, Lama Alssum, Bernard Ghanem, Philip Torr, David Krueger, Fazl Barez, Adel Bibi
-
Hanwen Cao, Haobo Lu, Xiaosen Wang, Kun He
-
CryptPEFT: Efficient and Private Neural Network Inference via Parameter-Efficient Fine-Tuning
Saisai Xia, Wenhao Wang, Zihao Wang, Yuhui Zhang, Yier Jin, Dan Meng, Rui Hou
-
Adjustable AprilTags For Identity Secured Tasks
Hao Li
-
MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
Yixuan Yang, Daoyuan Wu, Yufan Chen
-
Passive Hack-Back Strategies for Cyber Attribution: Covert Vectors in Denied Environment
Abraham Itzhak Weinberg
-
Rigorous Feature Importance Scores based on Shapley Value and Banzhaf Index
Xuanxiang Huang, Olivier Létoffé, Joao Marques-Silva
-
Xiaojin Zhang, Mingcong Xu, Yiming Li, Wei Chen, Qiang Yang
-
CAMF: Collaborative Adversarial Multi-agent Framework for Machine Generated Text Detection
Yue Wang, Liesheng Wei, Yuxiang Wang
-
Mitigating Jailbreaks with Intent-Aware LLMs
Wei Jie Yeo, Ranjan Satapathy, Erik Cambria
-
Matthew Hull, Haoyang Yang, Pratham Mehta, Mansi Phute, Aeree Cho, Haorang Wang, Matthew Lau, Wenke Lee, Wilian Lunardi, Martin Andreoni, Polo Chau
-
Amira Guesmi, Bassem Ouni, Muhammad Shafique
-
An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction
Tim van Erven, Jack Mayo, Julia Olkhovskaya, Chen-Yu Wei
-
Adversarial Robustness in Distributed Quantum Machine Learning
Pouya Kananian, Hans-Arno Jacobsen
-
Ben Nassi, Stav Cohen, Or Yair
-
Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions
Xuyang Guo, Zekai Huang, Zhao Song, Jiahao Zhang
-
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs
Mikhail Seleznyov, Mikhail Chaichuk, Gleb Ershov, Alexander Panchenko, Elena Tutubalina, Oleg Somov
-
Noise Matters: Optimizing Matching Noise for Diffusion Classifiers
Yanghao Wang, Long Chen
-
Semantically Guided Adversarial Testing of Vision Models Using Language Models
Katarzyna Filus, Jorge M. Cruz-Duarte
-
Remove360: Benchmarking Residuals After Object Removal in 3D Gaussian Splatting
Simona Kocour, Assia Benbihi, Torsten Sattler
-
Boosting the Robustness-Accuracy Trade-off of SNNs by Robust Temporal Self-Ensemble
Jihang Wang, Dongcheng Zhao, Ruolin Chen, Qian Zhang, Yi Zeng
-
Robust Convolution Neural ODEs via Contractivity-promoting regularization
Muhammad Zakwan, Liang Xu, Giancarlo Ferrari-Trecate
-
Ruijia Zhang, Xinyan Zhao, Ruixiang Wang, Sigen Chen, Guibin Zhang, An Zhang, Kun Wang, Qingsong Wen
-
Limitation Learning: Catching Adverse Dialog with GAIL
Noah Kasmanoff, Rahul Zalkikar
-
Assessing User Privacy Leakage in Synthetic Packet Traces: An Attack-Grounded Approach
Minhao Jin, Hongyu He, Maria Apostolaki
-
Keke Gai, Dongjue Wang, Jing Yu, Liehuang Zhu, Qi Wu
-
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu, Xuying Li, Qirui Wang, Yuji Kosuga, Mengqiu Tian, Zhuo Li
-
Contrastive ECOC: Learning Output Codes for Adversarial Defense
Che-Yu Chou, Hung-Hsuan Chen
-
Hybrid Generative Fusion for Efficient and Privacy-Preserving Face Recognition Dataset Generation
Feiran Li, Qianqian Xu, Shilong Bao, Boyu Han, Zhiyong Yang, Qingming Huang
-
Enhancing Fairness in Autoencoders for Node-Level Graph Anomaly Detection
Shouju Wang, Yuchen Song, Sheng'en Li, Dongmian Zou
-
Searching for Privacy Risks in LLM Agents via Simulation
Yanzhe Zhang, Diyi Yang
-
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
Chiyu Zhang, Lu Zhou, Xiaogang Xu, Jiafei Wu, Liming Fang, Zhe Liu
-
Towards Powerful and Practical Patch Attacks for 2D Object Detection in Autonomous Driving
Yuxin Cao, Yedi Zhang, Wentao He, Yifan Liao, Yan Xiao, Chang Li, Zhiyong Huang, Jin Song Dong
-
Pruning and Malicious Injection: A Retraining-Free Backdoor Attack on Transformer Models
Taibiao Zhao, Mingxuan Sun, Hao Wang, Xiaobing Chen, Xiangwei Zhou
-
Oops!... They Stole it Again: Attacks on Split Learning
Tanveer Khan, Antonis Michalas
-
BERTector: Intrusion Detection Based on Joint-Dataset Learning
Haoyang Hu, Xun Huang, Chenyu Wu, Shiwen Liu, Zhichao Lian, Shuangquan Zhang
-
Anyuan Sang, Lu Zhou, Li Yang, Junbo Jia, Huipeng Yang, Pengbin Feng, Jianfeng Ma
-
Bistochastically private release of longitudinal data
Nicolas Ruiz
-
Wenpeng Xing, Zhonghao Qi, Yupeng Qin, Yilin Li, Caini Chang, Jiahui Yu, Changting Lin, Zhenzhen Xie, Meng Han
-
SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth
Wenpeng Xing, Lanyi Wei, Haixiao Hu, Rongchang Li, Mohan Li, Changting Lin, Meng Han
-
Failures to Surface Harmful Contents in Video Large Language Models
Yuxin Cao, Wei Song, Derui Wang, Jingling Xue, Jin Song Dong
-
SHLIME: Foiling adversarial attacks fooling SHAP and LIME
Sam Chauhan, Estelle Duguet, Karthik Ramakrishnan, Hugh Van Deventer, Jack Kruger, Ranjan Subbaraman
-
Javier Muñoz-Haro, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez
-
Contrast Sensitivity in Multimodal Large Language Models: A Psychophysics-Inspired Evaluation
Pablo Hernández-Cámara, Alexandra Gomez-Villa, Jose Manuel Jaén-Lorites, Jorge Vila-Tomás, Valero Laparra, Jesus Malo
-
Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference
Zhifan Luo, Shuo Shao, Su Zhang, Lijing Zhou, Yuke Hu, Chenxu Zhao, Zhihao Liu, Zhan Qin
-
NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs
Birong Pan, Mayi Xu, Qiankun Pi, Jianhao Chen, Yuanyuan Zhu, Ming Zhong, Tieyun Qian
-
Generation of Indian Sign Language Letters, Numbers, and Words
Ajeet Kumar Yadav, Nishant Kumar, Rathna G N
-
Demystifying the Role of Rule-based Detection in AI Systems for Windows Malware Detection
Andrea Ponte, Luca Demetrio, Luca Oneto, Ivan Tesfai Ogbu, Battista Biggio, Fabio Roli
-
The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
Skyler Hallinan, Jaehun Jung, Melanie Sclar, Ximing Lu, Abhilasha Ravichander, Sahana Ramnath, Yejin Choi, Sai Praneeth Karimireddy, Niloofar Mireshghallah, Xiang Ren
-
Slow Tuning and Low-Entropy Masking for Safe Chain-of-Thought Distillation
Ziyang Ma, Qingyue Yuan, Linhai Zhang, Deyu Zhou
-
The Perils of Chart Deception: How Misleading Visualizations Affect Vision-Language Models
Ridwan Mahbub, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Mizanur Rahman, Mir Tafseer Nayeem, Enamul Hoque
-
IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding
Junxian Li, Beining Xu, Di Zhang
-
CLIP-Flow: A Universal Discriminator for AI-Generated Images Inspired by Anomaly Detection
Zhipeng Yuan, Kai Wang, Weize Quan, Dong-Ming Yan, Tieru Wu
-
Carlos Franzreb, Arnab Das, Tim Polzehl, Sebastian Möller
-
Security Analysis of ChatGPT: Threats and Privacy Risks
Yushan Xiang, Zhongwen Li, Xiaoqi Li
-
Klaudia Krawiecka, Christian Schroeder de Witt
-
Amazon Nova AI Challenge -- Trusted AI: Advancing secure, AI-assisted software development
Sattvik Sahai, Prasoon Goyal, Michael Johnston, Anna Gottardi, Yao Lu, Lucy Hu, Luke Dai, Shaohua Liu, Samyuth Sagi, Hangjie Shi, Desheng Zhang, Lavina Vaz, Leslie Ball, Maureen Murray, Rahul Gupta, Shankar Ananthakrishna
-
Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model
Sushrut Patwardhan, Raghavendra Ramachandra, Sushma Venkatesh
-
Md Sazedur Rahman, Mohamed Elmahallawy, Sanjay Madria, Samuel Frimpong
-
IPG: Incremental Patch Generation for Generalized Adversarial Patch Training
Wonho Lee, Hyunsik Na, Jisu Lee, Daeseon Choi
-
Do Language Models Agree with Human Perceptions of Suspense in Stories?
Glenn Matlin, Devin Zhang, Rodrigo Barroso Loza, Diana M. Popescu, Joni Isbell, Chandreyi Chakraborty, Mark Riedl
-
Wei Cai, Jian Zhao, Yuchu Jiang, Tianle Zhang, Xuelong Li
-
SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling
Shixuan Sun, Siyuan Liang, Ruoyu Chen, Jianjie Huang, Jingzhi Li, Xiaochun Cao
-
AI Security Map: Holistic Organization of AI Security Technologies and Impacts on Stakeholders
Hiroya Kato, Kentaro Kita, Kento Hasegawa, Seira Hidano
-
Aydin Zaboli, Junho Hong
-
Securing Educational LLMs: A Generalised Taxonomy of Attacks on LLMs and DREAD Risk Assessment
Farzana Zahid, Anjalika Sewwandi, Lee Brandon, Vimal Kumar, Roopak Sinha
-
SafeFix: Targeted Model Repair via Controlled Image Generation
Ouyang Xu, Baoming Zhang, Ruiyu Mao, Yunhui Guo
-
EditMF: Drawing an Invisible Fingerprint for Your Large Language Models
Jiaxuan Wu, Yinghan Zhou, Wanli Peng, Yiming Xue, Juan Wen, Ping Zhong
-
Oblivionis: A Lightweight Learning and Unlearning Framework for Federated Large Language Models
Fuyao Zhang, Xinyu Yan, Tiantong Wu, Wenjie Li, Tianxiang Chen, Yang Cao, Ran Yan, Longtao Huang, Wei Yang Bryan Lim, Qiang Yang
-
Attacks and Defenses Against LLM Fingerprinting
Kevin Kurian, Ethan Holland, Sean Oesch
-
Zhiqiang Yang, Renshuai Tao, Xiaolong Zheng, Guodong Yang, Chunjie Zhang
-
Privacy-protected Retrieval-Augmented Generation for Knowledge Graph Question Answering
Yunfeng Ning, Mayi Xu, Jintao Wen, Qiankun Pi, Yuanyuan Zhu, Ming Zhong, Jiawei Jiang, Tieyun Qian
-
MADPromptS: Unlocking Zero-Shot Morphing Attack Detection with Multiple Prompt Aggregation
Eduarda Caldeira, Fadi Boutros, Naser Damer
-
Deep Learning Models for Robust Facial Liveness Detection
Oleksandr Kuznetsov, Emanuele Frontoni, Luca Romeo, Riccardo Rosati, Andrea Maranesi, Alessandro Muscatello
-
Exploring Cross-Stage Adversarial Transferability in Class-Incremental Continual Learning
Jungwoo Kim, Jong-Seok Lee
-
Fre-CW: Targeted Attack on Time Series Forecasting using Frequency Domain Loss
Naifu Feng, Lixing Chen, Junhua Tang, Hua Ding, Jianhua Li, Yang Bai
-
Multi-Target Backdoor Attacks Against Speaker Recognition
Alexandrine Fortier, Sonal Joshi, Thomas Thebaud, Jesus Villalba Lopez, Najim Dehak, Patrick Cardinal
-
Image selective encryption analysis using mutual information in CNN based embedding space
Ikram Messadi, Giulia Cervia, Vincent Itier
-
Evasive Ransomware Attacks Using Low-level Behavioral Adversarial Examples
Manabu Hirano, Ryotaro Kobayashi
-
Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance
Yuchu Jiang, Jian Zhao, Yuchen Yuan, Tianle Zhang, Yao Huang, Yanghao Zhang, Yan Wang, Yanshu Li, Xizhong Guo, Yusheng Zhao, Jun Zhang, Zhi Zhang, Xiaojian Lin, Yixiu Zou, Haoxuan Ma, Yuhu Shang, Yuzhi Hu, Keshu Cai, Ruochen Zhang, Boyuan Chen, Yilan Gao, Ziheng Jiao, Yi Qin, Shuangjun Du, Xiao Tong, Zhekun Liu, Yu Chen, Xuankun Rong, Rui Wang, Yejie Zheng, Zhaoxin Fan, Hongyuan Zhang, Pan Zhou, Lei Jin, Hao Zhao, Xu Yang, Jiaojiao Zhao, Jianshu Li, Joey Tianyi Zhou, Zhi-Qi Cheng, Longtao Huang, Zhiyi Liu, Zheng Zhu, Jianan Li, Gang Wang, Qi Li, Xu-Yao Zhang, Yaodong Yang, Mang Ye, Wenqi Ren, Zhaofeng He, Hang Su, Rongrong Ni, Liping Jing, Xingxing Wei, Junliang Xing, Massimo Alioto, Shengmei Shen, Petia Radeva, Dacheng Tao, Ya-Qin Zhang, Shuicheng Yan, Chi Zhang, Zhongjiang He, Xuelong Li
-
Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems
Yutong Wu, Jie Zhang, Yiming Li, Chao Zhang, Qing Guo, Nils Lukas, Tianwei Zhang
-
Can AI Keep a Secret? Contextual Integrity Verification: A Provable Security Architecture for LLMs
Aayush Gupta
-
Exact Verification of Graph Neural Networks with Incremental Constraint Solving
Minghao Liu, Chia-Hsuan Lu, Marta Kwiatkowska
-
Collective dynamics of strategic classification
Marta C. Couto, Flavia Barsotti, Fernando P. Santos
-
Jeffri Murrugarra-LLerena, Haoran Niu, K. Suzanne Barber, Hal Daumé III, Yang Trista Cao, Paola Cascante-Bonilla
-
Constrained Black-Box Attacks Against Multi-Agent Reinforcement Learning
Amine Andam, Jamal Bentahar, Mustapha Hedabou
-
Securing Agentic AI: Threat Modeling and Risk Analysis for Network Monitoring Agentic AI System
Pallavi Zambare, Venkata Nikhil Thanikella, Ying Liu
-
Search-Time Data Contamination
Ziwen Han, Meher Mankikar, Julian Michael, Zifan Wang
-
Special-Character Adversarial Attacks on Open-Source Language Model
Ephraiem Sarabamoun
-
Maxime Heuillet, Rishika Bhagwatkar, Jonas Ngnawé, Yann Pequignot, Alexandre Larouche, Christian Gagné, Irina Rish, Ola Ahmad, Audrey Durand
-
Privacy Preserving Inference of Personalized Content for Out of Matrix Users
Michael Sun, Tai Vu, Andrew Wang
-
Wenjing Zhang, Ye Hu, Tao Luo, Zhilong Zhang, Mingzhe Chen
-
1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning
Wenkai Li, Liwen Sun, Zhenxiang Guan, Xuhui Zhou, Maarten Sap
-
Best-Effort Policies for Robust Markov Decision Processes
Alessandro Abate, Thom Badings, Giuseppe De Giacomo, Francesco Fabiano
-
BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks
Rui Miao, Yixin Liu, Yili Wang, Xu Shen, Yue Tan, Yiwei Dai, Shirui Pan, Xin Wang
-
Stephan Rabanser
-
BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models
Maozhen Zhang, Mengnan Zhao, Bo Wang
-
Can You Trick the Grader? Adversarial Persuasion of LLM Judges
Yerin Hwang, Dongryeol Lee, Taegwan Kang, Yongil Kim, Kyomin Jung
-
Jinx: Unlimited LLMs for Probing Alignment Failures
Jiahao Zhao, Liwei Dong
-
Runze Wang, Zeli Chen, Zhiyun Song, Wei Fang, Jiajin Zhang, Danyang Tu, Yuxing Tang, Minfeng Xu, Xianghua Ye, Le Lu, Dakai Jin
-
Hongrui Zheng, Yuezun Li, Liejun Wang, Yunfeng Diao, Zhiqing Guo
-
MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization
Animesh Jain, Alexandros Stergiou
-
VOIDFace: A Privacy-Preserving Multi-Network Face Recognition With Enhanced Security
Ajnas Muhammed, Iurii Medvedev, Nuno Gonçalves
-
Mitigating Biases in Surgical Operating Rooms with Geometry
Tony Danjun Wang, Tobias Czempiel, Nassir Navab, Lennart Bastian
-
Nicholas Klein, Hemlata Tak, James Fullwood, Krishna Regmi, Leonidas Spinoulas, Ganesh Sivaraman, Tianxiang Chen, Elie Khoury
-
Yan Wang, Da-Wei Zhou, Han-Jia Ye
-
IPBA: Imperceptible Perturbation Backdoor Attack in Federated Self-Supervised Learning
Jiayao Wang, Yang Song, Zhendong Zhao, Jiale Zhang, Qilin Wu, Junwu Zhu, Dongfang Zhao
-
FairDRL-ST: Disentangled Representation Learning for Fair Spatio-Temporal Mobility Prediction
Sichen Zhao, Wei Shao, Jeffrey Chan, Ziqi Xu, Flora Salim
-
Multi-Turn Jailbreaks Are Simpler Than They Seem
Xiaoxue Yang, Jaeha Lee, Anna-Katharina Dick, Jasper Timm, Fei Xie, Diogo Cruz
-
Multi-Hop Privacy Propagation for Differentially Private Federated Learning in Social Networks
Chenchen Lin, Xuehe Wang
-
EFU: Enforcing Federated Unlearning via Functional Encryption
Samaneh Mohammadi, Vasileios Tsouvalas, Iraklis Symeonidis, Ali Balador, Tanir Ozcelebi, Francesco Flammini, Nirvana Meratnia
-
Robust Anomaly Detection in O-RAN: Leveraging LLMs against Data Manipulation Attacks
Thusitha Dayaratne, Ngoc Duy Pham, Viet Vo, Shangqi Lai, Sharif Abuadbba, Hajime Suzuki, Xingliang Yuan, Carsten Rudolph
-
False Reality: Uncovering Sensor-induced Human-VR Interaction Vulnerability
Yancheng Jiang, Yan Jiang, Ruochen Zhou, Yi-Chao Chen, Xiaoyu Ji, Wenyuan Xu
-
Fully-Fluctuating Participation in Sleepy Consensus
Yuval Efron, Joachim Neu, Toniann Pitassi
-
Vibeke Binz Vallevik, Anne Kjersti C. Befring, Severin Elvatun, Jan Franz Nygaard
-
VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
Mansi Phute, Ravikumar Balakrishnan
-
Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference
Kexin Chu, Zecheng Lin, Dawei Xiang, Zixu Shen, Jianchang Su, Cheng Chu, Yiwei Yang, Wenhui Zhang, Wenfei Wu, Wei Zhang
-
Designing with Deception: ML- and Covert Gate-Enhanced Camouflaging to Thwart IC Reverse Engineering
Junling Fan, David Koblah, Domenic Forte
-
Towards Effective MLLM Jailbreaking Through Balanced On-Topicness and OOD-Intensity
Zuoou Li, Weitong Zhang, Jingyuan Wang, Shuyuan Zhang, Wenjia Bai, Bernhard Kainz, Mengyun Qiao
-
FIDELIS: Blockchain-Enabled Protection Against Poisoning Attacks in Federated Learning
Jane Carney, Kushal Upreti, Gaby G. Dagher, Tim Andersen
-
William Guo, Adaku Uchendu, Ana Smith
-
Hallucination as a Computational Boundary: A Hierarchy of Inevitability and the Oracle Escape
Quan Shi, Wang Xi, Zenghui Ding, Jianqing Gao, Xianjun Yang
-
A Real-Time, Self-Tuning Moderator Framework for Adversarial Prompt Detection
Ivan Zhang
-
Representation Understanding via Activation Maximization
Hongbo Zhu, Angelo Cangelosi
-
ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual Question Answering
Shubhra Ghosh, Abhilekh Borah, Aditya Kumar Guru, Kripabandhu Ghosh
-
A Spin Glass Characterization of Neural Networks
Jun Li
-
Gradient Surgery for Safe LLM Fine-Tuning
Biao Yi, Jiahao Li, Baolei Zhang, Lihai Nie, Tong Li, Tiansheng Huang, Zheli Liu
-
HaDM-ST: Histology-Assisted Differential Modeling for Spatial Transcriptomics Generation
Xuepeng Liu, Zheng Jiang, Pinan Zhu, Hanyu Liu, Chao Li
-
Rongxuan Peng, Shunquan Tan, Chenqi Kong, Anwei Luo, Alex C. Kot, Jiwu Huang
-
Fading the Digital Ink: A Universal Black-Box Attack Framework for 3DGS Watermarking Systems
Qingyuan Zeng, Shu Jiang, Jiajing Lin, Zhenzhong Wang, Kay Chen Tan, Min Jiang
-
Towards Unveiling Predictive Uncertainty Vulnerabilities in the Context of the Right to Be Forgotten
Wei Qian, Chenxu Zhao, Yangyi Li, Wenqian Ye, Mengdi Huai
-
Enhancing Privacy in Decentralized Min-Max Optimization: A Differentially Private Approach
Yueyang Quan, Chang Wang, Shengjie Zhai, Minghong Fang, Zhuqing Liu
-
Certifiably robust malware detectors by design
Pierre-Francois Gimenez, Sarath Sivaprasad, Mario Fritz
-
Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries
Wenqiang Wang, Yan Xiao, Hao Lin, Yangshijie Zhang, Xiaochun Cao
-
Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models
Badrinath Ramakrishnan, Akshaya Balaji
-
Xianjun Yang, Liqiang Xiao, Shiyang Li, Faisal Ladhak, Hyokun Yun, Linda Ruth Petzold, Yi Xu, William Yang Wang
-
PROPS: Progressively Private Self-alignment of Large Language Models
Noel Teku, Fengwei Tian, Payel Bhattacharjee, Souradip Chakraborty, Amrit Singh Bedi, Ravi Tandon
-
Who's the Evil Twin? Differential Auditing for Undesired Behavior
Ishwar Balappanawar, Venkata Hasith Vattikuti, Greta Kintzley, Ronan Azimi-Mancel, Satvik Golechha
-
Balancing Privacy and Efficiency: Music Information Retrieval via Additive Homomorphic Encryption
William Zerong Wang, Dongfang Zhao
-
Membership and Memorization in LLM Knowledge Distillation
Ziqi Zhang, Ali Shahin Shamsabadi, Hanxiao Lu, Yifeng Cai, Hamed Haddadi
-
Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection
Siyuan Li, Xi Lin, Guangyan Li, Zehao Liu, Aodu Wulianghai, Li Ding, Jun Wu, Jianhua Li
-
Adversarial Video Promotion Against Text-to-Video Retrieval
Qiwei Tian, Chenhao Lin, Zhengyu Zhao, Qian Li, Shuai Liu, Chao Shen
-
Membership Inference Attacks with False Discovery Rate Control
Chenxu Zhao, Wei Qian, Aobo Chen, Mengdi Huai
-
Sensory robustness through top-down feedback and neural stochasticity in recurrent vision models
Antonino Greco, Marco D'Alessandro, Karl J. Friston, Giovanni Pezzulo, Markus Siegel
-
SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization
Beilong Tang, Xiaoxiao Miao, Xin Wang, Ming Li
-
Label Inference Attacks against Federated Unlearning
Wei Wang, Xiangyun Tang, Yajie Wang, Yijing Lin, Tao Zhang, Meng Shen, Dusit Niyato, Liehuang Zhu
-
Towards Effective Prompt Stealing Attack against Text-to-Image Diffusion Models
Shiqian Zhao, Chong Wang, Yiming Li, Yihao Huang, Wenjie Qu, Siew-Kei Lam, Yi Xie, Kangjie Chen, Jie Zhang, Tianwei Zhang
-
Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs
Jinhwa Kim, Ian G. Harris
-
The Cost of Thinking: Increased Jailbreak Risk in Large Language Models
Fan Yang
-
LLM Robustness Leaderboard v1 -- Technical report
Pierre Peigné-Lefebvre, Quentin Feuillade-Montixi, Tom David, Nicolas Miailhe
-
ETA: Energy-based Test-time Adaptation for Depth Completion
Younjoon Chung, Hyoungseob Park, Patrick Rim, Xiaoran Zhang, Jihe He, Ziyao Zeng, Safa Cicek, Byung-Woo Hong, James S. Duncan, Alex Wong
-
Differentially Private Federated Clustering with Random Rebalancing
Xiyuan Yang, Shengyuan Hu, Soyeon Kim, Tian Li
-
Membership Inference Attack with Partial Features
Xurun Wang, Guangrui Liu, Xinjie Li, Haoyu He, Lin Yao, Weizhe Zhang
-
In-Training Defenses against Emergent Misalignment in Language Models
David Kaczér, Magnus Jørgenvåg, Clemens Vetter, Lucie Flek, Florian Mai
-
FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields
Junhyeog Yun, Minui Hong, Gunhee Kim
-
ScamAgents: How AI Agents Can Simulate Human-Level Scam Calls
Sanket Badhe
-
Sofiane Bouaziz, Adel Hafiane, Raphael Canals, Rachid Nedjai
-
Adversarial Topic-aware Prompt-tuning for Cross-topic Automated Essay Scoring
Chunyun Zhang, Hongyan Zhao, Chaoran Cui, Qilong Song, Zhiqing Lu, Shuai Gong, Kailin Liu
-
Beyond Uniform Criteria: Scenario-Adaptive Multi-Dimensional Jailbreak Evaluation
Lai Jiang, Yuekang Li, Xiaohan Zhang, Youtao Ding, Li Pan
-
Quantifying Conversation Drift in MCP via Latent Polytope
Haoran Shi, Hongwei Yao, Shuo Shao, Shaopeng Jiao, Ziqi Peng, Zhan Qin, Cong Wang
-
Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System
Haorui He, Yupeng Li, Bin Benjamin Zhu, Dacheng Wen, Reynold Cheng, Francis C. M. Lau
-
SAM Encoder Breach by Adversarial Simplicial Complex Triggers Downstream Model Failures
Yi Qin, Rui Wang, Tao Huang, Tong Xiao, Liping Jing
-
SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models
Hanqing Wang, Yuan Tian, Mingyu Liu, Zhenhao Zhang, Xiangyang Zhu
-
FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation
Wenbin Teng, Gonglin Chen, Haiwei Chen, Yajie Zhao
-
Adaptive Backtracking for Privacy Protection in Large Language Models
Zhihao Yao, Yuxuan Gu, Xiachong Feng, Weitao Ma, Bo Li, Xiaocheng Feng
-
ProvX: Generating Counterfactual-Driven Attack Explanations for Provenance-Based Detection
Weiheng Wu, Wei Qiao, Teng Li, Yebo Feng, Zhuo Ma, Jianfeng Ma, Yang Liu
-
Zhengxian Wu, Juan Wen, Wanli Peng, Haowei Chang, Yinghan Zhou, Yiming Xue
-
When AIOps Become "AI Oops": Subverting LLM-driven IT Operations via Telemetry Manipulation
Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese, Omer Akgul, Athanasios Theocharis, Petros Efstathopoulos
-
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
Kyle O'Brien, Stephen Casper, Quentin Anthony, Tomek Korbak, Robert Kirk, Xander Davies, Ishan Mishra, Geoffrey Irving, Yarin Gal, Stella Biderman
-
Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models
Tomohiro Sawada, Kartik Goyal
-
Sihan Ma, Qiming Wu, Ruotong Jiang, Frank Burns
-
Learning to Forget with Information Divergence Reweighted Objectives for Noisy Labels
Jeremiah Birrell, Reza Ebrahimi
-
Privacy-Preserving Tabular Synthetic Data Generation Using TabularARGN
Andrey Sidorenko, Paul Tiwald
-
Fine-Grained Safety Neurons with Training-Free Continual Projection to Reduce LLM Fine Tuning Risks
Bing Han, Feifei Zhao, Dongcheng Zhao, Guobin Shen, Ping Wu, Yu Shi, Yi Zeng
-
Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs
Wenpeng Xing, Mohan Li, Chunqiang Hu, Haitao Xu, Ningyu Zhang, Bo Lin, Meng Han
-
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Zhaomin Wu, Mingzhe Du, See-Kiong Ng, Bingsheng He
-
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Shuang Liang, Zhihao Xu, Jialing Tao, Hui Xue, Xiting Wang
-
MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models
Dexuan Xu, Jieyi Wang, Zhongyan Chai, Yongzhi Cao, Hanpin Wang, Huamin Zhang, Yu Huang
-
Automatic Image Colorization with Convolutional Neural Networks and Generative Adversarial Networks
Ruiyu Li, Changyuan Qiu, Hangrui Cao, Qihan Ren, Yuqing Qiu
-
Renmiao Chen, Shiyao Cui, Xuancheng Huang, Chengwei Pan, Victor Shea-Jay Huang, QingLin Zhang, Xuan Ouyang, Zhexin Zhang, Hongning Wang, Minlie Huang
-
Zane Xu, Jason Sun
-
Building Effective Safety Guardrails in AI Education Tools
Hannah-Beth Clark, Laura Benton, Emma Searle, Margaux Dowland, Matthew Gregory, Will Gayne, John Roberts
-
Qi Guo, Xiaojun Jia, Shanmin Pang, Simeng Qin, Lin Wang, Ju Jia, Yang Liu, Qing Guo
-
Farah Wahida, M.A.P. Chamikara, Yashothara Shanmugarasa, Mohan Baruwal Chhetri, Thilina Ranbaduge, Ibrahim Khalil
-
Physical Adversarial Camouflage through Gradient Calibration and Regularization
Jiawei Liang, Siyuan Liang, Jianjie Huang, Chenxi Si, Ming Zhang, Xiaochun Cao
-
Keep It Real: Challenges in Attacking Compression-Based Adversarial Purification
Samuel Räber, Till Aczel, Andreas Plesner, Roger Wattenhofer
-
FS-IQA: Certified Feature Smoothing for Robust Image Quality Assessment
Ekaterina Shumitskaya, Dmitriy Vatolin, Anastasia Antsiferova
-
Don't Reach for the Stars: Rethinking Topology for Resilient Federated Learning
Mirko Konstantin, Anirban Mukhopadhyay
-
NT-ML: Backdoor Defense via Non-target Label Training and Mutual Learning
Wenjie Huo, Katinka Wolter
-
Let's Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes
Zachary Robertson, Sanmi Koyejo
-
Thorsten Peinemann, Paula Arnold, Sebastian Berndt, Thomas Eisenbarth, Esfandiar Mohammadi
-
Anti-Jamming Sensing with Distributed Reconfigurable Intelligent Metasurface Antennas
Zhaowei Wang, Yunsong Huang, Weicheng Liu, Hui-Ming Wang
-
Necessity of Block Designs for Optimal Locally Private Distribution Estimation
Abigail Gentle
-
Safety of Embodied Navigation: A Survey
Zixia Wang, Jia Hu, Ronghui Mu
-
Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation
Chi Zhang, Changjia Zhu, Junjie Xiong, Xiaoran Xu, Lingyao Li, Yao Liu, Zhuo Lu
-
Sasa Maric, Rasil Baidar, Robert Abbas, Sam Reisenfeld
-
A Neurosymbolic Framework for Interpretable Cognitive Attack Detection in Augmented Reality
Rongqian Chen, Allison Andreyev, Yanming Xiu, Mahdi Imani, Bin Li, Maria Gorlatova, Gang Tan, Tian Lan
-
RL-MoE: An Image-Based Privacy Preserving Approach In Intelligent Transportation System
Abdolazim Rezaei, Mehdi Sookhak, Mahboobeh Haghparast
-
Robust Market Making: To Quote, or not To Quote
Ziyi Wang, Carmine Ventre, Maria Polukarov
-
Adversarial Attacks and Defenses on Graph-aware Large Language Models (LLMs)
Iyiola E. Olatunji, Franziska Boenisch, Jing Xu, Adam Dziedzic
-
IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards
Xu Guo, Tianyi Liang, Tong Jian, Xiaogui Yang, Ling-I Wu, Chenhui Li, Zhihui Lu, Qipeng Guo, Kai Chen
-
ANPrompt: Anti-noise Prompt Tuning for Vision-Language Models
Yansheng Gao, Yufei Zheng, Jinghan Qu, Zixi Zhu, Yukuan Zhang, Shengsheng Wang
-
Boosting Adversarial Transferability via Residual Perturbation Attack
Jinjia Peng, Zeze Tao, Huibing Wang, Meng Wang, Yang Wang
-
AuthPrint: Fingerprinting Generative Models Against Malicious Model Providers
Kai Yao, Marc Juarez
-
Communication-Learning Co-Design for Differentially Private Over-the-Air Federated Distillation
Zihao Hu, Jia Yan, Ying-Jun Angela Zhang
-
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, Jiaqi Wang
-
Jiayi Wen, Tianxin Chen, Zhirun Zheng, Cheng Huang
-
An Audit and Analysis of LLM-Assisted Health Misinformation Jailbreaks Against LLMs
Ayana Hussain, Patrick Zhao, Nicholas Vincent
-
Assessing Representation Stability for Transformer Models
Bryan E. Tuck, Rakesh M. Verma
-
Guangli Li, Canbiao Wu, Zhen Liang
-
Per-element Secure Aggregation against Data Reconstruction Attacks in Federated Learning
Takumi Suimon, Yuki Koizumi, Junji Takemasa, Toru Hasegawa
-
Prompt Injection Vulnerability of Consensus Generating Applications in Digital Democracy
Jairo Gudiño-Rosero, Clément Contet, Umberto Grandi, César A. Hidalgo
-
PrivDFS: Private Inference via Distributed Feature Sharing against Data Reconstruction Attacks
Zihan Liu, Jiayi Wen, Junru Wu, Xuyang Zou, Shouhong Tan, Zhirun Zheng, Cheng Huang
-
Dynamic User-controllable Privacy-preserving Few-shot Sensing Framework
Ajesh Koyatan Chathoth, Shuhao Yu, Stephen Lee
-
BadTime: An Effective Backdoor Attack on Multivariate Long-Term Time Series Forecasting
Kunlan Xiang, Haomiao Yang, Meng Hao, Wenbo Jiang, Haoxin Wang, Shiyue Huang, Shaofeng Li, Yijing Liu, Ji Guo, Dusit Niyato
-
Do Vision-Language Models Leak What They Learn? Adaptive Token-Weighted Model Inversion Attacks
Ngoc-Bao Nguyen, Sy-Tuyen Ho, Koh Jun Hao, Ngai-Man Cheung
-
Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang
-
T2UE: Generating Unlearnable Examples from Text Descriptions
Xingjun Ma, Hanxun Huang, Tianwei Song, Ye Sun, Yifeng Gao, Yu-Gang Jiang
-
Rui Zou, Mengqi Wei, Yutao Zhu, Jirong Wen, Xin Zhao, Jing Chen
-
VCNet: Recreating High-Level Visual Cortex Principles for Robust Artificial Vision
Brennen A. Hill, Zhang Xinyu, Timothy Putra Prasetio
-
Untraceable DeepFakes via Traceable Fingerprint Elimination
Jiewei Lai, Lan Zhang, Chen Tang, Pengcheng Sun, Xinming Wang, Yunhao Wang
-
VFLAIR-LLM: A Comprehensive Framework and Benchmark for Split Learning of LLMs
Zixuan Gu, Qiufeng Fan, Long Sun, Yang Liu, Xiaojun Ye
-
Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS
Bingyu Yan, Ziyi Zhou, Xiaoming Zhang, Chaozhuo Li, Ruilin Zeng, Yirui Qi, Tianbo Wang, Litian Zhang
-
Xinwei Liu, Xiaojun Jia, Yuan Xun, Simeng Qin, Xiaochun Cao
-
Wang Yu-Hang, Shiwei Li, Jianxiang Liao, Li Bohan, Jian Liu, Wenfei Yin
-
When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs
Bodam Kim, Hiskias Dingeto, Taeyoun Kwon, Dasol Choi, DongGeon Lee, Haon Park, JaeHoon Lee, Jongho Shin
-
VideoGuard: Protecting Video Content from Unauthorized Editing
Junjie Cao, Kaizhou Li, Xinchun Yu, Hongxiang Li, Xiaoping Zhang
-
Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis
-
Haoran Wang, Xiongxiao Xu, Baixiang Huang, Kai Shu
-
Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation
Zizhong Li, Haopeng Zhang, Jiawei Zhang
-
Adversarial Attention Perturbations for Large Object Detection Transformers
Zachary Yahn, Selim Furkan Tekin, Fatih Ilhan, Sihao Hu, Tiansheng Huang, Yichang Xu, Margaret Loper, Ling Liu
-
Seeing It Before It Happens: In-Generation NSFW Detection for Diffusion-Based Text-to-Image Models
Fan Yang, Yihao Huang, Jiayi Zhu, Ling Shi, Geguang Pu, Jin Song Dong, Kailong Wang
-
evTransFER: A Transfer Learning Framework for Event-based Facial Expression Recognition
Rodrigo Verschae, Ignacio Bugueno-Cordova
-
BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models
Yu Pan, Jiahao Chen, Lin Wang, Bingrong Dai, Yi Du
-
Heterogeneity-Oblivious Robust Federated Learning
Weiyao Zhang, Jinyang Li, Qi Song, Miao Wang, Chungang Lin, Haitong Luo, Xuying Meng, Yujun Zhang
-
What If, But Privately: Private Counterfactual Retrieval
Shreya Meel, Mohamed Nomeir, Pasan Dissanayake, Sanghamitra Dutta, Sennur Ulukus
-
BDFirewall: Towards Effective and Expeditiously Black-Box Backdoor Defense in MLaaS
Ye Li, Chengcheng Zhu, Yanchao Zhao, Jiale Zhang
-
Probing and Enhancing the Robustness of GNN-based QEC Decoders with Reinforcement Learning
Ryota Ikeda
-
Peizhuo Liu
-
Selection-Based Vulnerabilities: Clean-Label Backdoor Attacks in Active Learning
Yuhan Zhi, Longtian Wang, Xiaofei Xie, Chao Shen, Qiang Hu, Xiaohong Guan
-
Anti-Tamper Protection for Unauthorized Individual Image Generation
Zelin Li, Ruohan Zong, Yifan Liu, Ruichen Yao, Yaokun Liu, Yang Zhang, Dong Wang
-
EvaDrive: Evolutionary Adversarial Policy Optimization for End-to-End Autonomous Driving
Siwen Jiao, Kangan Qian, Hao Ye, Yang Zhong, Ziang Luo, Sicong Jiang, Zilin Huang, Yangyi Fang, Jinyu Miao, Zheng Fu, Yunlong Wang, Kun Jiang, Diange Yang, Rui Fan, Baoyun Peng
-
Defend LLMs Through Self-Consciousness
Boshi Huang, Fabio Nonato de Paula
-
Secure mmWave Beamforming with Proactive-ISAC Defense Against Beam-Stealing Attacks
Seyed Bagher Hashemi Natanzi, Hossein Mohammadi, Bo Tang, Vuk Marojevic
-
Highlight & Summarize: RAG without the jailbreaks
Giovanni Cherubin, Andrew Paverd
-
Liang Lin, Miao Yu, Kaiwen Luo, Yibo Zhang, Lilan Peng, Dexian Wang, Xuehai Tang, Yuanhe Zhang, Xikang Yang, Zhenhong Zhou, Kun Wang, Yang Liu
-
Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation
Kennedy Edemacu, Vinay M. Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, Jong Wook Kim
-
Online Robust Multi-Agent Reinforcement Learning under Model Uncertainties
Zain Ulabedeen Farhat, Debamita Ghosh, George K. Atia, Yue Wang
-
Ko-Wei Chuang, Hen-Hsen Huang, Tsai-Yen Li
-
MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving
Aishan Liu, Jiakai Wang, Tianyuan Zhang, Hainan Li, Jiangfan Liu, Siyuan Liang, Yilong Ren, Xianglong Liu, Dacheng Tao
-
Yifan Liao, Yuxin Cao, Yedi Zhang, Wentao He, Yan Xiao, Xianglong Du, Zhiyong Huang, Jin Song Dong
-
Is Uncertainty Quantification a Viable Alternative to Learned Deferral?
Anna M. Wundram, Christian F. Baumgartner
-
Mitigating Attention Hacking in Preference-Based Reward Modeling via Interaction Distillation
Jianxiang Zang, Meiling Ning, Shihan Dou, Jiazheng Zhang, Tao Gui, Qi Zhang, Xuanjing Huang
-
Collision-based Watermark for Detecting Backdoor Manipulation in Federated Learning
Wenjie Li, Siying Gu, Yiming Li, Kangjie Chen, Zhili Chen, Tianwei Zhang, Shu-Tao Xia, Dacheng Tao
-
What Makes "Good" Distractors for Object Hallucination Evaluation in Large Vision-Language Models?
Ming-Kun Xie, Jia-Hao Xiao, Gang Niu, Lei Feng, Zhiqiang Kou, Min-Ling Zhang, Masashi Sugiyama
-
IMU: Influence-guided Machine Unlearning
Xindi Fan, Jing Wu, Mingyi Zhou, Pengwei Liang, Dinh Phung
-
Mingyu Wang, Haojie Liu, Zhiyong Li, Wei Jiang
-
Joint Lossless Compression and Steganography for Medical Images via Large Language Models
Pengcheng Zheng, Xiaorong Pu, Kecheng Chen, Jiaxin Huang, Meng Yang, Bai Feng, Yazhou Ren, Jianan Jiang, Chaoning Zhang, Yang Yang, Heng Tao Shen
-
Simulated Ensemble Attack: Transferring Jailbreaks Across Fine-tuned Vision-Language Models
Ruofan Wang, Xin Wang, Yang Yao, Xuan Tong, Xingjun Ma
-
BeDKD: Backdoor Defense based on Dynamic Knowledge Distillation and Directional Mapping Modulator
Zhengxian Wu, Juan Wen, Wanli Peng, Yinghan Zhou, Changtong Dou, Yiming Xue
-
BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability
Zhenhua Zou, Zhuotao Liu, Lepeng Zhao, Qiuyang Zhan
-
ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models
Zihan Wang, Rui Zhang, Hongwei Li, Wenshu Fan, Wenbo Jiang, Qingchuan Zhao, Guowen Xu
-
PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation
Zonglei Jing, Xiao Yang, Xiaoqian Li, Siyuan Liang, Aishan Liu, Mingchuan Zhang, Xianglong Liu
-
AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection
Peiran Wang, Yang Liu, Yunfei Lu, Yifeng Cai, Hongbo Chen, Qingyou Yang, Jie Zhang, Jue Hong, Ye Wu
-
R1-ACT: Efficient Reasoning Model Safety Alignment by Activating Safety Knowledge
Yeonjun In, Wonjoong Kim, Sangwu Park, Chanyoung Park
-
Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking
Haoyu Wang, Chris M. Poskitt, Jun Sun, Jiali Wei
-
CyGATE: Game-Theoretic Cyber Attack-Defense Engine for Patch Strategy Optimization
Yuning Jiang, Nay Oo, Qiaoran Meng, Lu Lin, Dusit Niyato, Zehui Xiong, Hoon Wei Lim, Biplab Sikdar
-
Activation-Guided Local Editing for Jailbreaking Attacks
Jiecong Wang, Haoran Li, Hao Peng, Ziqian Zeng, Zihao Wang, Haohua Du, Zhengtao Yu
-
Wukong Framework for Not Safe For Work Detection in Text-to-Image systems
Mingrui Liu, Sixiao Zhang, Cheng Long
-
LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks
Francesco Panebianco, Stefano Bonfanti, Francesco Trovò, Michele Carminati
-
Backdoor Attacks on Deep Learning Face Detection
Quentin Le Roux, Yannick Teglia, Teddy Furon, Philippe Loubet-Moundi
-
Laura Pedrouzo-Rodriguez, Pedro Delgado-DeRobles, Luis F. Gomez, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez
-
Qiyao Xue, Yuchen Dou, Ryan Shi, Xiang Lorraine Li, Wei Gao
-
DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models
Shantanu Thorat, Andrew Caines
-
Privacy-Preserving Driver Drowsiness Detection with Spatial Self-Attention and Federated Learning
Tran Viet Khoa, Do Hai Son, Mohammad Abu Alsheikh, Yibeltal F Alem, Dinh Thai Hoang
-
IN2OUT: Fine-Tuning Video Inpainting Model for Video Outpainting Using Hierarchical Discriminator
Sangwoo Youn, Minji Lee, Nokap Tony Park, Yeonggyoo Jeon, Taeyoung Na
-
DBLP: Noise Bridge Consistency Distillation For Efficient And Reliable Adversarial Purification
Chihan Huang, Belal Alsinglawi, Islam Al-qudah
-
Junhao Zheng, Jiahao Sun, Chenhao Lin, Zhengyu Zhao, Chen Ma, Chong Zhang, Cong Wang, Qian Wang, Chao Shen
-
STF: Shallow-Level Temporal Feedback to Enhance Spiking Transformers
Zeqi Zheng, Zizheng Zhu, Yingchao Yu, Yanchen Huang, Changze Lv, Junfeng Tang, Zhaofei Yu, Yaochu Jin
-
Young-ho Cho, Hao Zhu, Duehee Lee, Ross Baldick
-
FedGuard: A Diverse-Byzantine-Robust Mechanism for Federated Learning with Major Malicious Clients
Haocheng Jiang, Hua Shen, Jixin Zhang, Willy Susilo, Mingwu Zhang
-
LeakyCLIP: Extracting Training Data from CLIP
Yunhao Chen, Shujie Wang, Xin Wang, Xingjun Ma
-
Random Walk Learning and the Pac-Man Attack
Xingran Chen, Parimal Parag, Rohit Bhagat, Zonghong Liu, Salim El Rouayheb
-
Privacy Enhancement for Gaze Data Using a Noise-Infused Autoencoder
Samantha Aziz, Oleg Komogortsev
-
Hyperproperty-Constrained Secure Reinforcement Learning
Ernest Bonnah, Luan Viet Nguyen, Khaza Anuarul Hoque
-
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
Ziqian Zhong, Aditi Raghunathan
-
On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI
David Restrepo, Ira Ktena, Maria Vakalopoulou, Stergios Christodoulidis, Enzo Ferrante
-
Improved Robustness and Functional Localization in Topographic CNNs Through Weight Similarity
Nhut Truong, Uri Hasson
-
Data-driven global ocean model resolving ocean-atmosphere coupling dynamics
Jeong-Hwan Kim, Daehyun Kang, Young-Min Yang, Jae-Heung Park, Yoo-Geun Ham
-
Gaussian Splatting Feature Fields for Privacy-Preserving Visual Localization
Maxime Pietrantoni, Gabriela Csurka, Torsten Sattler
-
Foundations and Models in Modern Computer Vision: Key Building Blocks in Landmark Architectures
Radu-Andrei Bourceanu, Neil De La Fuente, Jan Grimm, Andrei Jardan, Andriy Manucharyan, Cornelius Weiss, Daniel Cremers, Roman Pflugfelder
-
FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning
Jiajun Cao, Qizhe Zhang, Peidong Jia, Xuhui Zhao, Bo Lan, Xiaoan Zhang, Zhuo Li, Xiaobao Wei, Sixiang Chen, Liyun Li, Xianming Liu, Ming Lu, Yang Wang, Shanghang Zhang
-
Measuring Harmfulness of Computer-Using Agents
Aaron Xuxiang Tian, Ruofan Zhang, Janet Tang, Ji Wang, Tianyu Shi, Jiaxin Wen
-
Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level
Saleh Vatan Khah, Savelii Chezhegov, Shahrokh Farahmand, Samuel Horváth, Eduard Gorbunov
-
LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring
Chloe Li, Mary Phuong, Noah Y. Siegel
-
Yufei Chen, Yao Wang, Haibin Zhang, Tao Gu
-
Counterfactual Evaluation for Blind Attack Detection in LLM-based Evaluation Systems
Lijia Liu, Takumi Kondo, Kyohei Atarashi, Koh Takeuchi, Jiyi Li, Shigeru Saito, Hisashi Kashima
-
Yunrui Yu, Hang Su, Cheng-zhong Xu, Zhizhong Su, Jun Zhu
-
RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function
Yunrui Yu, Kafeng Wang, Hang Su, Jun Zhu
-
LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning
Xiang Li, Qianli Shen, Haonan Wang, Kenji Kawaguchi
-
Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs
Xikang Yang, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu
-
Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning
Afshin Khadangi, Amir Sartipi, Igor Tchappi, Ramin Bahmani, Gilbert Fridgen
-
Metamorphic Testing of Deep Code Models: A Systematic Literature Review
Ali Asgari, Milan de Koning, Pouria Derakhshanfar, Annibale Panichella
-
Of Good Demons and Bad Angels: Guaranteeing Safe Control under Finite Precision
Samuel Teuber, Debasmita Lohar, Bernhard Beckert
-
CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models
Kedong Xiu, Saiqian Zhang
-
On the Reliability of Vision-Language Models Under Adversarial Frequency-Domain Perturbations
Jordan Vice, Naveed Akhtar, Yansong Gao, Richard Hartley, Ajmal Mian
-
Shenghao Zhu, Yifei Chen, Weihong Chen, Yuanhan Wang, Chang Liu, Shuo Jiang, Feiwei Qin, Changmiao Wang
-
DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion
Hossein Mirzaei, Zeinab Taghavi, Sepehr Rezaee, Masoud Hadi, Moein Madadi, Mackenzie W. Mathis
-
LCS: An AI-based Low-Complexity Scaler for Power-Efficient Super-Resolution of Game Content
Simon Pochinda, Momen K. Tageldeen, Mark Thompson, Tony Rinaldi, Troy Giorshev, Keith Lee, Jie Zhou, Frederick Walls
-
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
Yiting Qu, Ziqing Yang, Yihan Ma, Michael Backes, Savvas Zannettou, Yang Zhang
-
Invisible Injections: Exploiting Vision-Language Models Through Steganographic Prompt Embedding
Chetan Pathade
-
Benchmarking Fraud Detectors on Private Graph Data
Alexander Goldberg, Giulia Fanti, Nihar Shah, Zhiwei Steven Wu
-
Low-Communication Resilient Distributed Estimation Algorithm Based on Memory Mechanism
Wei Li, Limei Hu, Feng Chen, Ye Yao
-
Song Yan, Hui Wei, Jinlong Fei, Guoliang Yang, Zhengyu Zhao, Zheng Wang
-
Whisper Smarter, not Harder: Adversarial Attack on Partial Suppression
Zheng Jie Wong, Bingquan Shen
-
When Truthful Representations Flip Under Deceptive Instructions?
Xianxuan Long, Yao Fu, Runchao Li, Mu Sheng, Haotian Yu, Xiaotian Han, Pan Li
-
Strategic Deflection: Defending LLMs from Logit Manipulation
Yassine Rachidy, Jihad Rbaiti, Youssef Hmamouche, Faissal Sehbaoui, Amal El Fallah Seghrouchni
-
Quantum-Inspired Audio Unlearning: Towards Privacy-Preserving Voice Biometrics
Shreyansh Pathak, Sonu Shreshtha, Richa Singh, Mayank Vatsa
-
Prompt Optimization and Evaluation for LLM Automated Red Teaming
Michael Freenor, Lauren Alvarez, Milton Leal, Lily Smith, Joel Garrett, Yelyzaveta Husieva, Madeline Woodruff, Ryan Miller, Erich Kummerfeld, Rafael Medeiros, Sander Schulhoff
-
Towards Privacy-preserving Photorealistic Self-avatars in Mixed Reality
Ethan Wilson, Vincent Bindschaedler, Sophie Jörg, Sean Sheikholeslam, Kevin Butler, Eakta Jain
-
Cascading and Proxy Membership Inference Attacks
Yuntao Du, Jiacheng Li, Yuetian Chen, Kaiyuan Zhang, Zhizhen Yuan, Hanshen Xiao, Bruno Ribeiro, Ninghui Li
-
Yang Wang, Chenghao Xiao, Yizhi Li, Stuart E. Middleton, Noura Al Moubayed, Chenghua Lin
-
UnsafeChain: Enhancing Reasoning Model Safety via Hard Cases
Raj Vardhan Tomar, Preslav Nakov, Yuxia Wang
-
Yida Tao, Yen-Chia Hsu
-
Enhancing Jailbreak Attacks on LLMs via Persona Prompts
Zheng Zhang, Peilin Zhao, Deheng Ye, Hao Wang
-
Harnessing Diffusion-Yielded Score Priors for Image Restoration
Xinqi Lin, Fanghua Yu, Jinfan Hu, Zhiyuan You, Wu Shi, Jimmy S. Ren, Jinjin Gu, Chao Dong
-
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition
Andy Zou, Maxwell Lin, Eliot Jones, Micha Nowak, Mateusz Dziemian, Nick Winter, Alexander Grattan, Valent Nathanael, Ayla Croft, Xander Davies, Jai Patel, Robert Kirk, Nate Burnikell, Yarin Gal, Dan Hendrycks, J. Zico Kolter, Matt Fredrikson
-
Core Safety Values for Provably Corrigible Agents
Aran Nayebi
-
Text2VLM: Adapting Text-Only Datasets to Evaluate Alignment Training in Visual Language Models
Gabriel Downer, Sean Craven, Damian Ruck, Jake Thomas
-
Rongyao Cai, Ming Jin, Qingsong Wen, Kexin Zhang
-
Security Tensors as a Cross-Modal Bridge: Extending Text-Aligned Safety to Vision in LVLM
Shen Li, Liuyi Yao, Wujia Niu, Lan Zhang, Yaliang Li
-
Memorization in Fine-Tuned Large Language Models
Danil Savine, Muni Sreenivas Pydi, Jamal Atif, Olivier Cappé
-
Verification Cost Asymmetry in Cognitive Warfare: A Complexity-Theoretic Framework
Joshua Luberisse
-
Guard-GBDT: Efficient Privacy-Preserving Approximated GBDT Training on Vertical Dataset
Anxiao Song, Shujie Cui, Jianli Bai, Ke Cheng, Yulong Shen, Giovanni Russello
-
The Blessing and Curse of Dimensionality in Safety Alignment
Rachel S.Y. Teo, Laziz U. Abdullaev, Tan M. Nguyen
-
Do Not Mimic My Voice: Speaker Identity Unlearning for Zero-Shot Text-to-Speech
Taesoo Kim, Jinju Kim, Dongchan Kim, Jong Hwan Ko, Gyeong-Moon Park
-
WBHT: A Generative Attention Architecture for Detecting Black Hole Anomalies in Backbone Networks
Kiymet Kaya, Elif Ak, Sule Gunduz Oguducu
-
VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets
Biswarup Mukherjee, Li Zhou, S. Gokul Krishnan, Milad Kabirifar, Subhash Lakshminarayana, Charalambos Konstantinou
-
Trivial Trojans: How Minimal MCP Servers Enable Cross-Tool Exfiltration of Sensitive Data
Nicola Croce, Tobin South
-
Toward Revealing Nuanced Biases in Medical LLMs
Farzana Islam Adiba, Rahmatollah Beheshti
-
Graph Structure Learning with Privacy Guarantees for Open Graph Data
Muhao Guo, Jiaqi Wu, Yang Weng, Yizheng Liao, Shengzhe Chen
-
Tarek Gasmi, Ramzi Guesmi, Mootez Aloui, Jihene Bennaceur
-
Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?
Chaymaa Abbas, Mariette Awad, Razane Tajeddine
-
Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security
Gabriel Chua
-
Yuanhe Zhang, Fangzhou Xie, Zhenhong Zhou, Zherui Li, Hao Chen, Kun Wang, Yufei Guo
-
Transferable and Undefendable Point Cloud Attacks via Medial Axis Transform
Keke Tang, Yuze Gao, Weilong Peng, Xiaofei Wang, Meie Fang, Peican Zhu
-
Secure Best Arm Identification in the Presence of a Copycat
Asaf Cohen, Onur Günlü
-
Clustering-Oriented Generative Attribute Graph Imputation
Mulin Chen, Bocheng Wang, Jiaxin Zhong, Zongcheng Miao, Xuelong Li
-
Game-Theoretic Gradient Control for Robust Neural Network Training
Maria Zaitseva, Ivan Tomilov, Natalia Gusarova
-
MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?
Muntasir Wahed, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Nirav Diwan, Gang Wang, Dilek Hakkani-Tür, Ismini Lourentzou
-
ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks
Ahmad ALBarqawi, Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, NhatHai Phan
-
San Kim, Jonghwi Kim, Yejin Jeon, Gary Geunbae Lee
-
LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models
Delong Ran, Xinlei He, Tianshuo Cong, Anyu Wang, Qi Li, Xiaoyun Wang
-
Luo Cheng, Hanwei Zhang, Lijun Zhang, Holger Hermanns
-
Xiao Yang, Lingxuan Wu, Lizhong Wang, Chengyang Ying, Hang Su, Jun Zhu
-
Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs
Tevin Atwal, Chan Nam Tieu, Yefeng Yuan, Zhan Shi, Yuhong Liu, Liang Cheng
-
Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation
Kyubeen Han, Junseo Jang, Hongjin Kim, Geunyeong Jeong, Harksoo Kim
-
BadReasoner: Planting Tunable Overthinking Backdoors into Large Reasoning Models for Fun or Profit
Biao Yi, Zekun Fei, Jianing Geng, Tong Li, Lihai Nie, Zheli Liu, Yiming Li
-
RECALLED: An Unbounded Resource Consumption Attack on Large Vision-Language Models
Haoran Gao, Yuanhe Zhang, Zhenhong Zhou, Lei Jiang, Fanyu Meng, Yujia Xiao, Kun Wang, Yang Liu, Junlan Feng
-
Facial Demorphing from a Single Morph Using a Latent Conditional GAN
Nitish Shukla, Arun Ross
-
Yanzuo Lu, Yuxi Ren, Xin Xia, Shanchuan Lin, Xing Wang, Xuefeng Xiao, Andy J. Ma, Xiaohua Xie, Jian-Huang Lai
-
NWaaS: Nonintrusive Watermarking as a Service for X-to-Image DNN
Haonan An, Guang Hua, Yu Guo, Hangcheng Cao, Susanto Rahardja, Yuguang Fang
-
Ryusei Fujimoto, Yugo Nakamura, Yutaka Arakawa
-
Junyong Jiang, Buwei Tian, Chenxing Xu, Songze Li, Lu Dong
-
On Reconstructing Training Data From Bayesian Posteriors and Trained Models
George Wynne
-
Removing Box-Free Watermarks for Image-to-Image Models via Query-Based Reverse Engineering
Haonan An, Guang Hua, Hangcheng Cao, Zhengru Fang, Guowen Xu, Susanto Rahardja, Yuguang Fang
-
Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
Hao Li, Lijun Li, Zhenghao Lu, Xianyi Wei, Rui Li, Jing Shao, Lei Sha
-
RecPS: Privacy Risk Scoring for Recommender Systems
Jiajie He, Yuechun Gu, Keke Chen
-
The Right to be Forgotten in Pruning: Unveil Machine Unlearning on Sparse Models
Yang Xiao, Gen Li, Jie Ji, Ruimeng Ye, Xiaolong Ma, Bo Hui
-
Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content
Ran Tong, Songtao Wei, Jiaqi Liu, Lanruo Wang
-
Joobin Jin, Seokjun Hong, Gyeongseon Baek, Yeeun Kim, Byeongjoon Noh
-
P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices
Wei Fan, JinYi Yoon, Xiaochang Li, Huajie Shao, Bo Ji
-
Investigating Training Data Detection in AI Coders
Tianlin Li, Yunxiang Wei, Zhiming Li, Aishan Liu, Qing Guo, Xianglong Liu, Dongning Sun, Yang Liu
-
On the Interaction of Compressibility and Adversarial Robustness
Melih Barsbey, Antônio H. Ribeiro, Umut Şimşekli, Tolga Birdal
-
Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks
Linbo Cao, Jinman Zhao
-
Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
Eyal German, Sagiv Antebi, Daniel Samira, Asaf Shabtai, Yuval Elovici
-
An h-space Based Adversarial Attack for Protection Against Few-shot Personalization
Xide Xu, Sandesh Kamath, Muhammad Atif Butt, Bogdan Raducanu
-
Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors
Chen Ma, Xinjie Xu, Shuyu Cheng, Qi Xuan
-
BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems
Malsha Ashani Mahawatta Dona, Beatriz Cabrero-Daniel, Yinan Yu, Christian Berger
-
Kagan Ozturk, Louisa Conwill, Jacob Gutierrez, Kevin Bowyer, Walter J. Scheirer
-
Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees
Guanqin Zhang, Kota Fukuda, Zhenya Zhang, H.M.N. Dilum Bandara, Shiping Chen, Jianjun Zhao, Yulei Sui
-
A Privacy-Preserving Data Collection Method for Diversified Statistical Analysis
Hao Jiang, Quan Zhou, Dongdong Zhao, Shangshang Yang, Wenjian Luo, Xingyi Zhang
-
Ruoyang Rykie Guo
-
From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models
Jessica Quaye, Charvi Rastogi, Alicia Parrish, Oana Inel, Minsuk Kahng, Lora Aroyo, Vijay Janapa Reddi
-
Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation
Jaechul Roh, Zachary Novack, Yuefeng Peng, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Amir Houmansadr
-
Minimax Data Sanitization with Distortion Constraint and Adversarial Inference
Amirarsalan Moatazedian, Yauhen Yakimenka, Rémi A. Chou, Jörg Kliewer
-
Hulayyil Alshammari, Praveen Rao
-
Kenta Shiraishi, Yuka Muto, Atsushi Okazaki, Shunji Kotsuki
-
Lower Bounds for Public-Private Learning under Distribution Shift
Amrith Setlur, Pratiksha Thaker, Jonathan Ullman
-
CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage
Na Li, Yansong Gao, Hongsheng Hu, Boyu Kuang, Anmin Fu
-
Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed
Antoni Kowalczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch
-
Towards Trustworthy AI: Secure Deepfake Detection using CNNs and Zero-Knowledge Proofs
H M Mohaimanul Islam, Huynh Q. N. Vo, Aditya Rane
-
Advancing Robustness in Deep Reinforcement Learning with an Ensemble Defense Approach
Adithya Mohan, Dominik Rößle, Daniel Cremers, Torsten Schön
-
Jessup Byun, Xiaofeng Lin, Joshua Ward, Guang Cheng
-
GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AI
Joshua Kalyanapu, Farshad Dizani, Darsh Asher, Azam Ghanbari, Rosario Cammarota, Aydin Aysu, Samira Mirbagher Ajorpaz
-
LENS-DF: Deepfake Detection and Temporal Localization for Long-Form Noisy Speech
Xuechen Liu, Wanying Ge, Xin Wang, Junichi Yamagishi
-
Muhammad Zaeem Shahzad, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique
-
Alaa Alhamzeh, Mays Al Rebdawi
-
The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation
Sara Ahmadian, Edith Cohen, Uri Stemmer
-
Robustifying Learning-Augmented Caching Efficiently without Compromising 1-Consistency
Peng Chen, Hailiang Zhao, Jiaji Zhang, Xueyan Tang, Yixuan Wang, Shuiguang Deng
-
Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning
Boheng Li, Renjie Gu, Junjie Wang, Leyi Qi, Yiming Li, Run Wang, Zhan Qin, Tianwei Zhang
-
DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling
Boheng Li, Junjie Wang, Yiming Li, Zhiyang Hu, Leyi Qi, Jianshuo Dong, Run Wang, Han Qiu, Zhan Qin, Tianwei Zhang
-
Challenges of Trustworthy Federated Learning: What's Done, Current Trends and Remaining Work
Nuria Rodríguez-Barroso, Mario García-Márquez, M. Victoria Luzón, Francisco Herrera
-
PromptArmor: Simple yet Effective Prompt Injection Defenses
Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, Weida Liang, Haonan Wang, Hend Alzahrani, Joshua Lu, Kenji Kawaguchi, Basel Alomair, Xuandong Zhao, William Yang Wang, Neil Gong, Wenbo Guo, Dawn Song
-
Scaling Decentralized Learning with FLock
Zehua Cheng, Rui Sun, Jiahao Sun, Yike Guo
-
Red-Team Multi-Agent Reinforcement Learning for Emergency Braking Scenario
Yinsong Chen, Kaifeng Wang, Xiaoqiang Meng, Xueyuan Li, Zirui Li, Xin Gao
-
Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems
Andrii Balashov, Olena Ponomarova, Xiaohua Zhai
-
Missing value imputation with adversarial random forests -- MissARF
Pegah Golchian, Jan Kapar, David S. Watson, Marvin N. Wright
-
Beyond Easy Wins: A Text Hardness-Aware Benchmark for LLM-generated Text Detection
Navid Ayoobi, Sadat Shahriar, Arjun Mukherjee
-
Exploiting Context-dependent Duration Features for Voice Anonymization Attack Systems
Natalia Tomashenko, Emmanuel Vincent, Marc Tommasi
-
Lazaro Janier Gonzalez-Soler, Maciej Salwowski, Christoph Busch
-
Rethinking Occlusion in FER: A Semantic-Aware Perspective and Go Beyond
Huiyu Zhai, Xingxing Yang, Yalan Ye, Chenyang Li, Bin Fan, Changze Li
-
Optimizing Canaries for Privacy Auditing with Metagradient Descent
Matteo Boglioni, Terrance Liu, Andrew Ilyas, Zhiwei Steven Wu
-
Robust and Differentially Private PCA for non-Gaussian data
Minwoo Kim, Sungkyu Jung
-
Weak Links in LinkedIn: Enhancing Fake Profile Detection in the Age of LLMs
Apoorva Gulati, Rajesh Kumar, Vinti Agarwal, Aditya Sharma
-
Security study based on the ChatGPT plugin system: Identifying Security Vulnerabilities
Ruomai Ren
-
Jerry Wang, Fang Yu
-
Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree
Sam Johnson, Viet Pham, Thai Le
-
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
Alex Cloud, Minh Le, James Chua, Jan Betley, Anna Sztyber-Betley, Jacob Hilton, Samuel Marks, Owain Evans
-
Byzantine-Robust Decentralized Coordination of LLM Agents
Yongrae Jo, Chanik Park
-
Robust Control with Gradient Uncertainty
Qian Qi
-
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
Yuanhan Zhang, Yunice Chew, Yuhao Dong, Aria Leo, Bo Hu, Ziwei Liu
-
Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data
Yunfeng Li, Junhong Liu, Zhaohui Yang, Guofu Liao, Chuyun Zhang
-
ROBAD: Robust Adversary-aware Local-Global Attended Bad Actor Detection Sequential Model
Bing He, Mustaque Ahamad, Srijan Kumar
-
Distributional Unlearning: Forgetting Distributions, Not Just Samples
Youssef Allouah, Rachid Guerraoui, Sanmi Koyejo
-
Differentially Private Synthetic Graphs Preserving Triangle-Motif Cuts
Pan Peng, Hangyu Xu
-
Privacy Risks of LLM-Empowered Recommender Systems: An Inversion Attack Perspective
Yubo Wang, Min Tang, Nuo Shen, Shujie Cui, Weiqing Wang
-
Juan Manuel Contreras
-
Juntao Tan, Anran Li, Quanchao Liu, Peng Ran, Lan Zhang
-
VMask: Tunable Label Privacy Protection for Vertical Federated Learning via Layer Masking
Juntao Tan, Lan Zhang, Zhonghao Hu, Kai Yang, Peng Ran, Bo Li
-
GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks
Zixin Xu, Zhijie Wang, Zhiyuan Pan
-
Analyzing Internal Activity and Robustness of SNNs Across Neuron Parameter Space
Szymon Mazurek, Jakub Caputa, Maciej Wielgosz
-
MultiRetNet: A Multimodal Vision Model and Deferral System for Staging Diabetic Retinopathy
Jeannie She, Katie Spivakovsky
-
Glitches in Decision Tree Ensemble Models
Satyankar Chandra, Ashutosh Gupta, Kaushik Mallik, Krishna Shankaranarayanan, Namrita Varshney
-
FORTA: Byzantine-Resilient FL Aggregation via DFT-Guided Krum
Usayd Shahul, J. Harshan
-
Towards Urban Planning AI Agent in the Age of Agentic AI
Rui Liu, Tao Zhe, Zhong-Ren Peng, Necati Catbas, Xinyue Ye, Dongjie Wang, Yanjie Fu
-
Amro Abdalla, Ismail Shaheen, Dan DeGenaro, Rupayan Mallick, Bogdan Raita, Sarah Adel Bargal
-
Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques
Niveen O. Jaffal, Mohammed Alkhanafseh, David Mohaisen
-
Innocence in the Crossfire: Roles of Skip Connections in Jailbreaking Visual Language Models
Palash Nandi, Maithili Joshi, Tanmoy Chakraborty
-
Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model
Chengxu Liu, Lu Qi, Jinshan Pan, Xueming Qian, Ming-Hsuan Yang
-
Adversarial Training Improves Generalization Under Distribution Shifts in Bioacoustics
René Heinrich, Lukas Rauch, Bernhard Sick, Christoph Scholz
-
Byzantine-resilient federated online learning for Gaussian process regression
Xu Zhang, Zhenyuan Yuan, Minghui Zhu
-
FuSeFL: Fully Secure and Scalable Cross-Silo Federated Learning
Sahar Ghoflsaz Ghinani, Elaheh Sadredini
-
An Adversarial-Driven Experimental Study on Deep Learning for RF Fingerprinting
Xinyu Cao, Bimal Adhikari, Shangqing Zhao, Jingxian Wu, Yanjun Pan
-
TopicAttack: An Indirect Prompt Injection Attack via Topic Transition
Yulin Chen, Haoran Li, Yuexin Li, Yue Liu, Yangqiu Song, Bryan Hooi
-
Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Hyoungshick Kim, Tamer Abuhmed
-
Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution
Weiming Ren, Raghav Goyal, Zhiming Hu, Tristan Ty Aumentado-Armstrong, Iqbal Mohomed, Alex Levinshtein
-
FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning
Md Rafid Haque, Abu Raihan Mostofa Kamal, Md. Azam Hossain
-
Manipulation Attacks by Misaligned AI: Risk Analysis and Safety Case Framework
Rishane Dassanayake, Mario Demetroudi, James Walpole, Lindley Lentati, Jason R. Brown, Edward James Young
-
Youssef Tawfilis, Hossam Amer, Minar El-Aasser, Tallal Elshabrawy
-
Prompt Injection 2.0: Hybrid AI Threats
Jeremy McHugh, Kristina Šekrst, Jon Cefalu
-
Kutub Uddin, Awais Khan, Muhammad Umar Farooq, Khalid Malik
-
Automating Steering for Safe Multimodal Large Language Models
Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng
-
DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation
Ekta Balkrishna Gavas, Chinmay Hegde, Nasir Memon, Sudipta Banerjee
-
Taming Diffusion Transformer for Real-Time Mobile Video Generation
Yushu Wu, Yanyu Li, Anil Kag, Ivan Skorokhodov, Willi Menapace, Ke Ma, Arpit Sahni, Ju Hu, Aliaksandr Siarohin, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov
-
Training Transformers with Enforced Lipschitz Constants
Laker Newhouse, R. Preston Hess, Franz Cesista, Andrii Zahorodnii, Jeremy Bernstein, Phillip Isola
-
Architectural Backdoors in Deep Learning: A Survey of Vulnerabilities, Detection, and Defense
Victoria Childress, Josh Collyer, Jodie Knapp
-
MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems
Yu Cui, Hongyang Du
-
IConMark: Robust Interpretable Concept-Based Watermark For AI Images
Vinu Sankar Sadasivan, Mehrdad Saberi, Soheil Feizi
-
Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers
Liang Lin, Zhihao Xu, Xuehai Tang, Shi Liu, Biyu Zhou, Fuqing Zhu, Jizhong Han, Songlin Hu
-
Revisiting LLM Value Probing Strategies: Are They Robust and Expressive?
Siqi Shen, Mehar Singh, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Rada Mihalcea
-
Fake or Real: The Impostor Hunt in Texts for Space Operations
Agata Kaczmarek, Dawid Płudowski, Piotr Wilczyński, Przemysław Biecek, Krzysztof Kotowski, Ramez Shendy, Jakub Nalepa, Artur Janicki, Evridiki Ntagiou
-
Spatial Frequency Modulation for Semantic Segmentation
Linwei Chen, Ying Fu, Lin Gu, Dezhi Zheng, Jifeng Dai
-
Robust Planning for Autonomous Vehicles with Diffusion-Based Failure Samplers
Juanran Wang, Marc R. Schlichting, Mykel J. Kochenderfer
-
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing
Kun-Hsiang Lin, Yu-Wen Tseng, Kang-Yang Huang, Jhih-Ciang Wu, Wen-Huang Cheng
-
Non-Adaptive Adversarial Face Generation
Sunpill Kim, Seunghun Paik, Chanwoo Hwang, Minsu Kim, Jae Hong Seo
-
Thought Purity: Defense Paradigm For Chain-of-Thought Attack
Zihao Xue, Zhen Bi, Long Ma, Zhenlin Hu, Yan Wang, Zhenfang Liu, Qing Sheng, Jie Xiao, Jungang Lou
-
[LLMs Encode Harmfulness and Refusal Separately](https://arxiv.org/abs/2507.1