
Whisper-Base

Automatic Speech Recognition (ASR) model that transcribes spoken audio to text, with multilingual support and speech translation to English.

Model Properties

Audio is sampled at 16 kHz and fitted to a 10 s window, from which a spectrogram is computed. A Transformer encoder processes the spectrogram, and a Transformer decoder autoregressively predicts text tokens.
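
As a rough illustration of the fixed-window input, the sketch below cuts 16 kHz mono samples into 10 s windows and zero-pads the last one. The function name `split_into_windows` is hypothetical, and splitting longer audio (rather than truncating it) is an assumption. The card does not say how the runtime handles clips longer than one window, and spectrogram computation is left out here.

```python
SAMPLE_RATE_HZ = 16_000   # from the model card
WINDOW_SECONDS = 10       # from the model card
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 160,000 samples per window


def split_into_windows(samples):
    """Split mono 16 kHz samples into fixed-length, zero-padded windows.

    Assumption: long audio is split into consecutive, non-overlapping
    windows. The actual runtime may truncate or overlap instead.
    """
    samples = list(samples)
    windows = []
    # max(..., 1) makes empty input yield a single window of silence.
    for start in range(0, max(len(samples), 1), WINDOW_SAMPLES):
        chunk = samples[start:start + WINDOW_SAMPLES]
        chunk += [0.0] * (WINDOW_SAMPLES - len(chunk))  # zero-pad the tail
        windows.append(chunk)
    return windows


# 25 s of audio needs three windows; the last one is half padding.
windows = split_into_windows([1.0] * (25 * SAMPLE_RATE_HZ))
print(len(windows), len(windows[-1]))
```

Each window would then be turned into a spectrogram before it reaches the encoder.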

License name: Apache License 2.0
Number of parameters: 74M
Model Size: 155 MB
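
The parameter count and model size above can be cross-checked with a little arithmetic. The sketch below computes the average storage per parameter; the helper name `bytes_per_parameter` is hypothetical, and because the card does not say whether "MB" means 10^6 or 2^20 bytes, both readings are shown.

```python
NUM_PARAMETERS = 74_000_000  # "74M" from the model card
MODEL_SIZE_MB = 155          # from the model card


def bytes_per_parameter(size_mb, num_params, bytes_per_mb=1_000_000):
    """Average number of bytes stored per parameter."""
    return size_mb * bytes_per_mb / num_params


# Assumption: "MB" may be decimal (10^6) or binary (2^20) megabytes.
decimal_estimate = bytes_per_parameter(MODEL_SIZE_MB, NUM_PARAMETERS)
binary_estimate = bytes_per_parameter(MODEL_SIZE_MB, NUM_PARAMETERS, 1024 ** 2)
print(f"{decimal_estimate:.2f} to {binary_estimate:.2f} bytes per parameter")
```

Either way the result is a little over 2 bytes per parameter, which would be consistent with weights stored mostly in a 16-bit format plus some file overhead. That reading is an inference from the numbers, not something the card states.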

Technical Details

Audio Sample Rate (Hz): 16,000
Tokenizer: Whisper tokenizer (multilingual)
Numerical Scheme: Mixed precision
Inference API: C++, Python

Performance Metrics

Load Time (sec): 3.89
TPS: 23.36
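
To give a feel for what these two numbers imply, here is a back-of-the-envelope latency estimate. It assumes that TPS means decoded text tokens per second, that the rate stays constant, and that encoder time is negligible; none of this is confirmed by the card, and the function name `estimate_seconds` is hypothetical.

```python
LOAD_TIME_SEC = 3.89       # from the model card
TOKENS_PER_SECOND = 23.36  # "TPS" from the card, assumed to mean tokens/second


def estimate_seconds(num_tokens, include_load=True):
    """Rough wall-clock time to emit num_tokens text tokens.

    Assumption: constant decode throughput, encoder time ignored.
    """
    decode_time = num_tokens / TOKENS_PER_SECOND
    return decode_time + (LOAD_TIME_SEC if include_load else 0.0)


# A 100-token transcript: first call (cold) versus later calls (warm).
print(f"cold: {estimate_seconds(100):.2f} s")
print(f"warm: {estimate_seconds(100, include_load=False):.2f} s")
```

Under these assumptions the one-off load time is of the same order as decoding a 100-token transcript, so keeping the model loaded between requests matters for short clips.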
Sub-Model Performance Metrics
Encoder: Transformer encoder over spectrogram frames
Decoder: Autoregressive Transformer decoder with beam search or temperature sampling
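
The decoder's autoregressive loop with temperature sampling can be sketched as follows. This is a minimal illustration and not the compiled model's API: `sample_token`, `decode`, and `toy_next_logits` are hypothetical names, the toy function stands in for a real decoder forward pass, and beam search is omitted for brevity.

```python
import math
import random


def sample_token(logits, temperature=1.0, rng=random):
    """Pick a token id from logits; temperature <= 0 means greedy argmax."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def decode(next_logits, eos_id, max_tokens=32, temperature=0.0, rng=random):
    """Autoregressive loop: each step conditions on the tokens emitted so far."""
    tokens = []
    for _ in range(max_tokens):
        token = sample_token(next_logits(tokens), temperature, rng)
        if token == eos_id:
            break
        tokens.append(token)
    return tokens


def toy_next_logits(tokens):
    """Hypothetical stand-in for the decoder: favors id 1 for three steps, then id 0 (end of sequence)."""
    return [0.0, 5.0, 1.0] if len(tokens) < 3 else [5.0, 0.0, 1.0]


print(decode(toy_next_logits, eos_id=0))  # greedy decoding
print(decode(toy_next_logits, eos_id=0, temperature=1.0, rng=random.Random(0)))
```

With temperature 0 the loop is deterministic; higher temperatures flatten the distribution and make less likely tokens more probable.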

Explore Related Models

GenAI Models
Qwen2.5-Coder-1.5B-Instruct
Generate text responses to prompts, enabling natural language understanding, multilingual support, and code generation
GenAI Models
Qwen2.5-1.5B-Instruct
Generate text responses to prompts, enabling natural language understanding, multilingual support, and content creation