Zhejiang University researchers warn of AudioHijack threat to voice AI and crypto wallets

Researchers at Zhejiang University have found a new and alarming way to hijack AI voice systems: imperceptible, machine-readable audio signals that change a model's behavior while remaining inaudible to humans. Presenting the technique, dubbed AudioHijack, at the 47th IEEE Symposium on Security and Privacy in San Francisco, the team reported that large audio-language models (LALMs) can be manipulated with a success rate of up to 96%.

What the attack does
- AudioHijack embeds hidden commands directly into a digital audio waveform by tweaking numerical values in ways humans can’t hear but that LALMs interpret as instructions.
- The adversarial signal is context-agnostic: after about half an hour of training, the same signal can be replayed alongside any legitimate speech and still steer the model’s behavior, lead author Meng Chen said.
- Because it manipulates the audio itself rather than the text transcription, it sidesteps many defenses designed to detect malicious text prompts.

What researchers demonstrated
- The team tested AudioHijack on 13 open-source AI voice models and on commercial voice systems from Microsoft and Mistral that use similar architectures.
- Manipulated audio could make models refuse requests, spread false information, inject harmful links, alter personality, or carry out actions the user never asked for, such as web searches, file downloads, and sending emails that leak personal data.
- The researchers note the attack can be delivered through common channels such as online videos, music files, voice notes, or audio captured from Zoom calls and uploaded to AI transcription services. Unpublished follow-up work reportedly shows similar attacks in live AI voice chats.

Why this is different and harder to stop
- Traditional “prompt injection” attacks change what a user says or inject malicious text. AudioHijack instead changes the audio signal itself, so the manipulation is invisible to text-based filters and many existing safeguards.
- Monitoring a model’s internal attention mechanisms was the most effective defense the team tested, but adaptive attackers can weaken their manipulations to evade that countermeasure while retaining much of the attack’s potency.

“These single-point defenses struggle to resist our attack because we found it’s very hard for these models to distinguish the normal user intent and our adversary attack,” Chen said.

Why crypto platforms should care
- As crypto services increasingly experiment with voice-driven features (voice-based wallet access, trading assistants, customer support workflows, or voice authentication), AudioHijack highlights a new attack surface that could be abused for phishing, social engineering, or triggering unwanted actions in connected systems.
- Although the study did not demonstrate crypto-specific theft, any service that accepts spoken commands or ingests audio could be at risk if voice interfaces are trusted for sensitive operations. Delivery vectors such as videos, music, or call recordings are all channels commonly used in scams.
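
As a rough illustration of the sample-level tweaking described under "What the attack does", the sketch below adds a small, bounded change to every sample of a waveform and measures how much the signal moved. It is not the researchers' method: AudioHijack optimizes its perturbation against a model, whereas this example uses random noise. The names `perturb` and `snr_db`, the 440 Hz test tone, and the bound `epsilon=0.002` are all assumptions chosen for illustration.

```python
import math
import random


def perturb(samples, epsilon, seed=0):
    """Add a change of at most +/- epsilon to each sample (floats in [-1, 1]).

    Illustrative only: random noise stands in for an optimized perturbation.
    """
    rng = random.Random(seed)
    out = []
    for s in samples:
        delta = rng.uniform(-epsilon, epsilon)
        # Clamp so the result stays a valid normalized audio sample.
        out.append(max(-1.0, min(1.0, s + delta)))
    return out


def snr_db(clean, perturbed):
    """Signal-to-noise ratio in decibels between two equal-length waveforms."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((p - s) ** 2 for s, p in zip(clean, perturbed))
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


# One second of a 440 Hz tone at 16 kHz, standing in for recorded speech.
sample_rate = 16_000
clean = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
         for n in range(sample_rate)]
perturbed = perturb(clean, epsilon=0.002)

max_change = max(abs(p - s) for s, p in zip(clean, perturbed))
print(f"largest per-sample change: {max_change:.4f}")
print(f"SNR: {snr_db(clean, perturbed):.1f} dB")
```

With these illustrative values the SNR comes out near 50 dB, meaning the added signal carries very little energy compared with the original audio. A transcript-based filter never sees these numbers at all, which is why the article's point about text-only defenses holds regardless of how the perturbation is produced.
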
Practical takeaways
- Vendors and operators using AI voice models should not rely on text-only filters to catch abuse; defenses that inspect model internals and multi-factor checks for sensitive actions are advisable.
- For crypto firms and users, avoid relying solely on voice as an authentication or authorization method; require additional verification for transfers and account-critical actions, and be cautious about audio from untrusted sources.
- The research underscores the need for broader threat modeling and collaboration between AI, security, and crypto teams as voice-driven features roll out.

The full attack and experiments were presented by Zhejiang University researchers at the IEEE symposium; the work raises urgent questions about how to secure audio-driven AI systems before they become a vector for large-scale abuse.
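
The advice to require additional verification for account-critical actions can be sketched as a simple policy check. This is a minimal example under stated assumptions, not a description of any real wallet or exchange: the action names, the `VoiceRequest` type, and the `confirmed_via` field are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action names; a real service would define its own list.
SENSITIVE_ACTIONS = frozenset({
    "transfer_funds",
    "export_private_key",
    "change_withdrawal_address",
})


@dataclass(frozen=True)
class VoiceRequest:
    """An action a voice assistant wants to perform on the user's behalf."""
    action: str
    # Channel used for a second confirmation, e.g. "hardware_key" or
    # "app_prompt"; None if the request was never confirmed.
    confirmed_via: Optional[str] = None


def authorize(request: VoiceRequest) -> bool:
    """Allow low-risk actions; require non-voice confirmation for the rest."""
    if request.action not in SENSITIVE_ACTIONS:
        return True
    # A second spoken "yes" does not count: the audio channel is the one
    # that may have been manipulated.
    return request.confirmed_via is not None and request.confirmed_via != "voice"


print(authorize(VoiceRequest("check_balance")))                   # True
print(authorize(VoiceRequest("transfer_funds")))                  # False
print(authorize(VoiceRequest("transfer_funds", "voice")))         # False
print(authorize(VoiceRequest("transfer_funds", "hardware_key")))  # True
```

The design choice here is that the check runs outside the voice model, so a manipulated model can still request a transfer but cannot approve one on its own.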