AudioHijack Attack Hides Inaudible Commands in Audio to Manipulate AI Voice Models
Researchers at Zhejiang University have developed an attack called AudioHijack that embeds imperceptible commands in audio clips to manipulate large audio-language models (LALMs). The attack achieves a 79–96% success rate across 13 open-source models and also works on commercial systems from Microsoft and Mistral. Unlike traditional prompt injection, AudioHijack alters the audio waveform itself, so it bypasses defenses designed for text-based attacks. Manipulated audio can cause a model to refuse requests, spread false information, insert harmful links, change its personality, or perform unauthorized actions such as web searches, file downloads, and email sending. The attack can be delivered through online videos, music clips, voice notes, or Zoom call audio. The most effective defense, monitoring the model's internal attention mechanisms, can be partially evaded by reducing the strength of the manipulation. The researchers are now investigating whether the technique extends to closed models from OpenAI and Anthropic through shared open-source audio components.
Key facts
- AudioHijack embeds inaudible commands in audio to manipulate AI voice models with up to 96% success.
- Attack works on 13 open-source LALMs and commercial systems from Microsoft and Mistral.
- Standard defenses block only a small fraction of attacks; monitoring attention mechanisms is the most effective defense but can be partially evaded.
- Delivery methods include online videos, music clips, voice notes, and Zoom call audio.
- Researchers are exploring whether the attack extends to closed models from OpenAI and Anthropic through shared open-source audio components.