Understanding Adversarial Transferability in Vision-Language Models for Autonomous Driving: A Cross-Architecture Analysis

David Fernandez; Pedram MohajerAnsari; Amir Salarpour; Mert D. Pese

doi:10.4271/2026-01-0170


Apr 7, 2026


Abstract

Vision-language models (VLMs) are increasingly used in autonomous driving because they combine visual perception with language-based reasoning, which supports more interpretable decision-making. Their robustness to physical adversarial attacks is not well understood, however, and in particular it is unclear whether such attacks transfer across different VLM architectures. This gap poses a practical risk, since transferable attacks would remain effective even when attackers do not know which model a vehicle uses. We address it with a systematic cross-architecture study of adversarial transferability in VLM-based driving, evaluating three representative architectures (Dolphins, OmniDrive, and LeapVAD) using physically realizable patches placed on roadside infrastructure in both crosswalk and highway scenarios. Even when patches are not optimized for the target model, our transfer-matrix evaluation shows high cross-architecture effectiveness: transfer rates range from 73% to 91% (mean TR = 0.815 for crosswalk and 0.833 for highway), and frame-level manipulation is sustained over 64.7-79.4% of the critical decision window.
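To make the transfer-matrix idea concrete, the following is a minimal sketch of how such a matrix and a mean transfer rate could be computed. It is not the authors' evaluation code. The abstract does not define TR, so the sketch assumes that the transfer rate for a (source, target) pair is the fraction of trials in which a patch optimized on the source model also succeeds against the target model, and that the mean TR averages the off-diagonal pairs only. The function names and the trial outcomes are hypothetical and purely illustrative; they are not results from the paper.

```python
from statistics import mean

# Model names come from the abstract; everything else is an illustrative assumption.
MODELS = ["Dolphins", "OmniDrive", "LeapVAD"]


def transfer_matrix(outcomes):
    """Map each (source, target) pair to its attack success fraction.

    `outcomes[(source, target)]` is a list of booleans, one per trial,
    where True means the patch optimized on `source` fooled `target`.
    """
    return {pair: sum(trials) / len(trials) for pair, trials in outcomes.items()}


def mean_transfer_rate(matrix):
    """Average over cross-architecture pairs only (source != target)."""
    return mean(rate for (src, tgt), rate in matrix.items() if src != tgt)


def manipulated_fraction(frame_flags):
    """Fraction of frames in a decision window flagged as manipulated."""
    return sum(frame_flags) / len(frame_flags)


# Synthetic toy outcomes (NOT data from the paper): every cross-model pair
# succeeds in 3 of 4 trials, every same-model pair in 4 of 4 trials.
outcomes = {
    (src, tgt): [True] * 4 if src == tgt else [True, True, True, False]
    for src in MODELS
    for tgt in MODELS
}

matrix = transfer_matrix(outcomes)
mean_tr = mean_transfer_rate(matrix)
window_fraction = manipulated_fraction([True, True, True, False])

print(f"mean cross-architecture TR: {mean_tr:.3f}")
print(f"manipulated share of window: {window_fraction:.3f}")
```

Excluding the diagonal keeps same-model (white-box) success from inflating the cross-architecture average. Whether the paper averages this way, or normalizes by white-box success, is not stated in the abstract.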

Type

Conference paper

Publication

SAE Technical Paper 2026-01-0170, WCX SAE World Congress Experience

Last updated on Apr 7, 2026

Adversarial Attacks, Vision Language Models, Autonomous Vehicles, AI Security

Authors

David Fernandez

PhD Candidate in Computer Science

David Fernandez is a PhD candidate in Computer Science at Clemson University, working on safe, efficient, and explainable AI for safety-critical systems. His research spans perception, adversarial robustness, and on-device deployment of large foundation models — including LLMs and VLMs — with five first-authored publications on component-level explainability, zero-shot reasoning, and adversarial scenario analysis, alongside collaborative work on edge AI for industrial agentic systems. Much of this research is grounded in autonomous driving, where trustworthiness, latency, and robustness constraints are unforgiving, but the underlying methods transfer broadly to other high-stakes domains.

As a member of Clemson’s VIPR-GS Research Program, he develops hierarchical LLM reasoning frameworks and VLM evaluation systems for the U.S. Army’s Next Generation Combat Vehicle (NGCV) program. At BMW Group, he builds AI production security frameworks and edge deployment systems. His work focuses on robust, interpretable AI that bridges rigorous research and real-world deployment.

