ME News reports that on April 21 (UTC+8), according to monitoring by BlockBeats, customer service AI company Sierra has open-sourced μ-Bench, a multilingual automatic speech recognition (ASR) benchmark dataset. The data consists of 250 real customer service phone recordings and 4,270 manually annotated audio clips, sampled at 8 kHz in mono.

Previously available ASR benchmarks either focused solely on English or used studio-recorded read speech, making them nearly irrelevant for teams deploying voice agents in multilingual customer service environments. μ-Bench fills this gap with real-world call data. The release is a subset of Sierra's full internal benchmark suite, which covers 42 languages, 79 regional variants, and more than 13 vendors. The open-sourced portion includes five languages/regions (English, Spanish, Turkish, Vietnamese, and Mandarin) along with performance results from five vendors: Deepgram Nova-3, Google Chirp-3, Microsoft Azure Speech, ElevenLabs Scribe v2, and OpenAI GPT-4o Mini Transcribe. The code, the dataset (hosted on Hugging Face), and an open leaderboard are now publicly available, and other vendors are invited to submit results.

The most novel aspect of the evaluation lies in its metrics. Sierra introduces UER (Utterance Error Rate), a metric that distinguishes errors that alter meaning from those that are inconsequential. Traditional WER (Word Error Rate) treats missing a filler word like "uh" the same as mishearing a phone number, but for a voice agent executing actions based on the transcript, only the latter causes operational failures. Sierra notes that two vendors with similar WER scores can have vastly different UER scores because they make fundamentally different kinds of errors.

In terms of results, Google Chirp-3 leads in accuracy but has slower inference speed, while Deepgram Nova-3 achieves roughly 8x faster p50 latency but ranks lowest in multilingual accuracy. Mandarin recognition error rates can reach five times those of English, and Vietnamese performance varies significantly across vendors; such differences are invisible when evaluating on English benchmarks alone. (Source: BlockBeats)
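The article does not publish Sierra's exact UER formula, but the WER-versus-UER distinction it describes can be sketched in a few lines. In the toy example below, `content_wer` is a hypothetical, simplified stand-in for a meaning-aware metric: it ignores filler-word mistakes but still penalizes errors on content words such as digits. All function names here are illustrative assumptions, not Sierra's implementation.

```python
def levenshtein(ref, hyp):
    """Word-level edit distance (substitutions + insertions + deletions)."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[m][n]

def wer(ref, hyp):
    """Classic word error rate: every token mismatch counts equally."""
    r, h = ref.lower().split(), hyp.lower().split()
    return levenshtein(r, h) / max(len(r), 1)

FILLERS = {"uh", "um", "er", "hmm"}

def content_wer(ref, hyp):
    """Toy meaning-aware rate (NOT Sierra's UER): strip filler words
    before scoring, so dropping 'uh' is free but a wrong digit counts."""
    strip = lambda s: [w for w in s.lower().split() if w not in FILLERS]
    r, h = strip(ref), strip(hyp)
    return levenshtein(r, h) / max(len(r), 1)

ref = "uh my number is five five five one two three four"
hyp_a = "my number is five five five one two three four"    # harmless: lost filler
hyp_b = "uh my number is five five five one two three nine"  # harmful: wrong digit

print(wer(ref, hyp_a), wer(ref, hyp_b))                  # identical WER
print(content_wer(ref, hyp_a), content_wer(ref, hyp_b))  # only hyp_b penalized
```

Both hypotheses make exactly one word error, so plain WER scores them the same, yet only the misheard digit would derail an agent reading back a phone number. This is the gap the article says UER is designed to expose.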
Sierra open-sources μ-Bench for multilingual ASR evaluation
Sierra, a customer service AI company, has open-sourced μ-Bench, a multilingual ASR benchmark featuring 250 real call recordings and 4,270 annotated samples. The dataset uses 8 kHz audio and introduces UER, a metric for tracking meaning-altering errors. Results show Mandarin error rates up to five times higher than English.
Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information.
Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.