ElevenLabs Open-Sources Speech Engine Skill for Real-Time Voice Integration

ME AI News, according to monitoring by Beating, voice AI unicorn ElevenLabs has officially open-sourced its real-time voice conversation component, Speech Engine Skill. Speech Engine Skill adheres to the Agent Skills open specification, designed to enable AI agents and large language model applications to rapidly integrate high-fidelity, low-latency voice interaction capabilities. Developers only need to run the command `npx skills add elevenlabs/skills` to add the voice engine to their project runtime, without needing to integrate multiple APIs or build complex state machines. Speech Engine Skill is built on high-performance WebSocket connections, with each connection representing a call session. When a user speaks, the browser captures audio and streams it to ElevenLabs, which then performs real-time speech-to-text and pushes the text to the developer’s server. The server generates a streaming text response via a large language model and sends it back using the SDK’s `sendResponse()` or `send_response()` function (supporting strings or async iterators). ElevenLabs subsequently converts this into low-latency synthesized speech played back in the browser. The SDK manages network routing, request signature validation, heartbeat detection, and session lifecycle in the background, with native support for interruption and turn-taking. To simplify frontend development, ElevenLabs has also launched the client libraries `@elevenlabs/react` and `@elevenlabs/client`. With just minimal frontend code and a secure session token issued by the server, developers can quickly deploy a digital voice assistant with noise resistance and interruption tolerance. In practical deployments, ElevenLabs recommends treating transcribed speech as untrusted input and implementing deterministic security guardrails or intent whitelisting on the server side to prevent raw speech-to-text output from directly triggering privileged model actions or sensitive tool calls. (Source: BlockBeats)