Cohere Transcribe: Open-Source Speech Recognition That Finally Beats Whisper




Cohere released a speech recognition model that quietly took the top spot on the HuggingFace Open ASR Leaderboard and has stayed there.

Transcribe is a 2-billion-parameter Conformer encoder-decoder model trained from scratch on 14 languages. It achieves an average word error rate (WER; lower is better) of 5.42%, ahead of Qwen3-ASR-1.7B (5.76%), ElevenLabs Scribe v2 (5.83%), and OpenAI Whisper Large v3 (7.44%).
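
For readers new to the metric, here is a minimal sketch of how word error rate is usually computed: the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The function name and sample sentences are made up for illustration, and the normalization (lowercasing plus whitespace splitting) is a simplifying assumption; leaderboard scores typically involve more careful text normalization and averaging across datasets, so this will not reproduce the published numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper only: normalization here is just lowercasing and
    whitespace splitting, which is simpler than what leaderboards use.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: two substituted words out of seven reference words.
reference = "the quarterly results were better than expected"
hypothesis = "the quarterly result were better then expected"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # WER: 28.57%
```

Because the denominator is the reference length, a hypothesis with many inserted words can push WER above 100%.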

It is open-source under Apache 2.0. The weights are on HuggingFace. You can download them and run inference locally or in edge environments.

What Makes It Different

Most ASR models are tuned for benchmark performance in the hope that it carries over to real-world use. Cohere tested theirs both ways: on standard benchmarks and with human evaluators rating transcripts for accuracy, coherence, and usability. The human evaluations matched the benchmark results, which is rarer than it should be.

The model handles multi-speaker environments, boardroom acoustics, and diverse accents. On the AMI dataset (meeting transcription), it scores 8.13% WER against Whisper’s 15.95%, roughly half the error rate.
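
To put those figures in relative terms, the short sketch below computes the relative WER reduction from the numbers quoted in this post. The helper name is hypothetical, and the Whisper figures are taken as stated here rather than re-measured.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors eliminated by the new model."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# WER figures (in percent) as quoted in this post: (Whisper, Cohere Transcribe).
quoted = {
    "AMI (meetings)": (15.95, 8.13),
    "Leaderboard average": (7.44, 5.42),
}

for name, (whisper_wer, transcribe_wer) in quoted.items():
    reduction = relative_wer_reduction(whisper_wer, transcribe_wer)
    print(f"{name}: {reduction:.1%} fewer errors than Whisper")
# AMI (meetings): 49.0% fewer errors than Whisper
# Leaderboard average: 27.2% fewer errors than Whisper
```

The relative view shows why the meeting-transcription result stands out: the gap on AMI is much larger than the gap on the overall average.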

It supports 14 languages: English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Chinese (Mandarin), Japanese, Korean, Vietnamese, and Arabic.

How to Use It

You can download the model from HuggingFace and run it locally. It is designed for practical GPU and edge use, not just research.

You can also try Cohere Transcribe through Cohere's free API, which needs little setup but is subject to rate limits. See the documentation for usage details and integration guidance.

For production deployment without rate limits, Cohere offers Model Vault: dedicated, private cloud inference priced per hour-instance.

Why This Matters

The ASR space has been dominated by Whisper for a long time. Not because it was the best, but because it was open-source and good enough. Cohere’s entry changes the calculus. A model that beats Whisper on accuracy, runs efficiently, and is Apache 2.0 licensed is a meaningful upgrade for anyone building transcription into their workflow.

Cohere plans to integrate Transcribe deeper into North, their enterprise AI orchestration platform. But even as a standalone model, it is worth evaluating if you process audio at scale.

Key contributors: Julian Mack, Ekagra Ranjan, Cassie Cao, Bharat Venkitesh, Pierre Harvey Richemond.

Crepi il lupo! 🐺