Whisper-Base

Automatic Speech Recognition (ASR) model that transcribes spoken audio to text, with multilingual support and speech translation into English.

Model Properties

Audio is resampled to 16 kHz and split into 10-second windows, each converted to a log-Mel spectrogram. A Transformer encoder processes the spectrogram, and a Transformer decoder autoregressively predicts text tokens.
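The windowing step can be sketched as follows. This is a minimal illustration that assumes the input is already a mono 16 kHz sample sequence and that a final partial window is zero-padded. The helper names are hypothetical, and the 160-sample (10 ms) hop is the value used by the reference Whisper front end, which this deployment may or may not share.

```python
SAMPLE_RATE_HZ = 16_000  # input rate expected by the model
WINDOW_SECONDS = 10      # window length stated in this model card
HOP_LENGTH = 160         # assumed spectrogram hop (10 ms at 16 kHz), as in reference Whisper


def split_into_windows(samples, sample_rate=SAMPLE_RATE_HZ, window_seconds=WINDOW_SECONDS):
    """Split mono audio into fixed-length windows, zero-padding the last one.

    Empty input yields a single window of silence.
    """
    window_len = sample_rate * window_seconds
    windows = []
    for start in range(0, max(len(samples), 1), window_len):
        chunk = list(samples[start:start + window_len])
        chunk.extend([0.0] * (window_len - len(chunk)))  # pad the tail with silence
        windows.append(chunk)
    return windows


def frames_per_window(sample_rate=SAMPLE_RATE_HZ, window_seconds=WINDOW_SECONDS, hop=HOP_LENGTH):
    """Number of spectrogram frames per window under the assumed hop length."""
    return sample_rate * window_seconds // hop


if __name__ == "__main__":
    audio = [0.0] * (25 * SAMPLE_RATE_HZ)  # 25 s of silence
    windows = split_into_windows(audio)
    print(len(windows), len(windows[0]), frames_per_window())  # 3 160000 1000
```

Zero-padding keeps every window the same shape, which lets a compiled encoder use a single fixed input size.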

License: Apache License 2.0
Number of parameters: 74M
Model size: 155 MB

Technical Details

Audio sample rate: 16000 Hz
Tokenizer: Whisper tokenizer (multilingual)
Numerical scheme: A8W8 (8-bit activations, 8-bit weights), symmetric, channel-wise
Inference API: C++
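Symmetric, channel-wise 8-bit weight quantization can be sketched as below. Each output channel (row) gets its own scale, chosen so that the channel's largest magnitude maps to 127, and zero maps exactly to the integer 0. This is an illustrative sketch only. The rounding mode, the clamp range [-127, 127], and how activations are scaled in this deployment are assumptions, and the function names are hypothetical.

```python
def quantize_per_channel(weights):
    """Symmetric int8 quantization with one scale per output channel (row)."""
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Largest magnitude maps to 127; all-zero rows get a dummy scale of 1.0
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_per_channel(q_rows, scales):
    """Map int8 values back to floats using each row's scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
    q, s = quantize_per_channel(w)
    print(q, s)
```

Per-channel scales keep a channel with small weights from losing precision to a channel with large ones, which a single per-tensor scale would not.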

Performance Metrics

Load time: 1.33 s
Time to first token: 0.07 s
Throughput: 15.00 tokens/s (TPS)
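These figures can be combined into a rough latency estimate. The sketch below assumes that TPS is the steady-state decoding rate after the first token and that load time is paid once per session. Both are assumptions about how the metrics were measured, and `estimate_latency_s` is a hypothetical helper.

```python
LOAD_TIME_S = 1.33  # one-time model load
TTFT_S = 0.07       # time to first token
TPS = 15.0          # decoded tokens per second (assumed steady-state rate)


def estimate_latency_s(num_tokens, include_load=False):
    """Rough wall-clock time to emit num_tokens tokens for one window."""
    load = LOAD_TIME_S if include_load else 0.0
    if num_tokens <= 0:
        return load
    # First token arrives after TTFT; the remaining tokens arrive at the TPS rate
    return load + TTFT_S + (num_tokens - 1) / TPS


if __name__ == "__main__":
    print(f"{estimate_latency_s(50):.2f} s")                     # 3.34 s
    print(f"{estimate_latency_s(50, include_load=True):.2f} s")  # 4.67 s
```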
Accuracy
Test set: LibriSpeech
Evaluation metric: word error rate (WER)
Full-precision accuracy: TBD
Post-quantization accuracy: TBD
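WER, the evaluation metric listed for LibriSpeech, is the word-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference words: WER = (S + D + I) / N, with S substitutions, D deletions, and I insertions. The sketch below computes it with a standard dynamic-programming edit distance. It assumes simple lowercase whitespace tokenization, whereas published LibriSpeech results usually apply a more elaborate text normalizer first, and `word_error_rate` is a hypothetical name.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 edits / 6 reference words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.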
Sub-Model Performance Metrics
Encoder: Transformer encoder over spectrogram frames
Decoder: Autoregressive Transformer decoder with beam search or temperature sampling
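The token-selection step inside the decoder loop can be sketched as temperature sampling over one step's output logits, with a temperature of 0 falling back to greedy (argmax) decoding. This is a minimal sketch that assumes the logits are already available as a list of floats. Beam search, which keeps several partial hypotheses per step instead of one, is not shown, and `select_token` is a hypothetical name.

```python
import math
import random


def select_token(logits, temperature=0.0, rng=random):
    """Pick the next token id from one decoding step's logits."""
    if temperature <= 0.0:
        # Greedy decoding: take the highest-scoring token
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature; subtract the max for numerical stability
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    # random.choices normalizes the weights internally
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [1.0, 3.5, 0.2, 2.9]
    print(select_token(logits))  # 1
    print(select_token(logits, temperature=0.7, rng=random.Random(0)))
```

Lower temperatures concentrate probability on the top token; higher ones flatten the distribution and increase variety.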
