Zoom AI Services: Scribe API

The World’s Most Accurate Speech-to-Text API

Quality transcription you can trust, powered by Zoom’s ASR Model Pro.

 

Ranked as a top-performing model on the Hugging Face Open ASR Leaderboard, Scribe API offers both Fast Sync and Batch Transcription services, built on the same ASR that powers millions of Zoom meetings every day.

Capabilities

Everything You Need for Speech-to-Text

From near real-time to batch processing, Zoom Scribe API delivers transcription services with high accuracy and speed.

Fast Sync Transcription

Access synchronous, low-latency transcription for individual audio files. Process one file at a time with immediate response after completion.

Batch Transcription

Process pre-recorded audio and video files at scale. Support for MP3, WAV, MP4, FLAC, OGG, and more with automatic format detection.

Word-Level Timestamps

Get precise start and end times for every word, enabling accurately timed subtitles, audio search, and content indexing.
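
As an illustration of how word-level timestamps can feed subtitle generation, here is a minimal Python sketch that groups timed words into SRT cues. The input shape (a list of dicts with `word`, `start`, and `end` in seconds) and the function names are assumptions made for this example, not the documented Scribe API response format; adapt the field names to the actual response.

```python
from typing import Dict, List


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words words.

    Each word is assumed (hypothetically) to look like
    {"word": "hello", "start": 0.0, "end": 0.4}.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        # A cue spans from the first word's start to the last word's end.
        start = srt_time(chunk[0]["start"])
        end = srt_time(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Splitting on a fixed word count keeps the sketch short; a production subtitle pipeline would more likely break cues on pauses or punctuation.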

Use Your Own Storage

Store your transcripts securely in your own AWS S3 bucket for enhanced data control and compliance.

Speaker Diarization

Automatically label different speakers in multi-party conversations with high accuracy. (Coming soon)

Multiple Language Support

Transcribe multiple languages and dialects with accent-robust models trained on diverse datasets. (Coming soon)

Performance

Zoom Speech ranks among the top models on the Open ASR Leaderboard

We're proud to share that Zoom's Speech Recognition technology ranks among the top models on the Open ASR Leaderboard, a global benchmark for automatic speech recognition (ASR) performance. This milestone reflects our relentless pursuit of excellence in speech technology and the strength of Zoom AI Services' Scribe API.
Key model strengths include:

  • Best-in-class Accuracy: Our scalable innovation framework continuously enhances model quality, providing best-in-class transcription results where accuracy and readability matter most.
  • Mastery of Enterprise Terminologies: Optimized for business and technical contexts, the model accurately handles company names, product terms, and domain-specific jargon — a critical advantage for meetings, support calls, and professional documentation.
  • Reduced Hallucinations: Zoom’s advanced modeling strategies minimize transcription “hallucinations” so what’s recognized truly reflects the speaker’s intent, not artificial or extraneous words.
Developer First

Start Transcribing in Minutes

  • Simple, well-documented APIs
  • REST API and OpenAPI spec
  • Comprehensive error handling
  • Webhook callbacks for async jobs
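
To give a sense of what handling a webhook callback for an async job might involve, here is a minimal Python sketch of the receiving side. The payload fields `job_id` and `status`, the status values `completed` and `failed`, and the function name are all hypothetical placeholders chosen for illustration; consult the Scribe API documentation for the real callback schema and any signature verification it requires.

```python
import json
from typing import Callable, Dict


def handle_callback(
    body: bytes,
    on_done: Callable[[Dict], None],
    on_failed: Callable[[Dict], None],
) -> int:
    """Dispatch a webhook payload and return the HTTP status to reply with.

    The payload shape used here ("job_id", "status") is an assumption
    for this sketch, not the documented Scribe API schema.
    """
    try:
        event = json.loads(body)
    except ValueError:
        # Covers malformed JSON and invalid UTF-8 in the request body.
        return 400
    if not isinstance(event, dict) or "job_id" not in event:
        return 400

    status = event.get("status")
    if status == "completed":
        on_done(event)
    elif status == "failed":
        on_failed(event)
    # Unknown statuses are acknowledged but ignored.
    return 200
```

Keeping the dispatch logic in a plain function like this makes it easy to call from whichever web framework serves the callback endpoint.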
Pricing

Simple, Transparent Pricing

Apply prepaid credits to your use of Scribe API with transparent rates.

Scribe API Fast

Developer Resources

Everything you need to integrate, build, and ship with Zoom Scribe API.

Ready to Build with a Leading Speech-to-Text API?