Whisper-Base

Name: Whisper-Base
Brand: Hailo AI

Automatic Speech Recognition (ASR) model that transcribes spoken audio to text, with multilingual support and speech translation to English.

Model Properties

Audio is sampled at 16 kHz and framed into a 10 s window, which is converted to a spectrogram. A transformer encoder processes the spectrogram, and a transformer decoder autoregressively predicts text tokens.
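The windowing step described above can be sketched as follows. This is a minimal illustration, not Hailo's actual preprocessing: the function name split_into_windows is hypothetical, and the choice of non-overlapping windows with zero-padding of the final window is an assumption. Only the 16 kHz sample rate and the 10 s window length come from this page.

```python
SAMPLE_RATE_HZ = 16_000   # from the model card
WINDOW_SECONDS = 10       # from the model card
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 160,000 samples per window


def split_into_windows(samples):
    """Split mono 16 kHz samples into fixed 10 s windows.

    Assumption: windows do not overlap, and the last window is
    zero-padded up to the full window length.
    """
    windows = []
    for start in range(0, len(samples), WINDOW_SAMPLES):
        chunk = list(samples[start:start + WINDOW_SAMPLES])
        chunk.extend([0.0] * (WINDOW_SAMPLES - len(chunk)))
        windows.append(chunk)
    return windows


# 25 s of silence becomes three windows; the third is half padding.
windows = split_into_windows([0.0] * (25 * SAMPLE_RATE_HZ))
```

Each resulting window would then be converted to a spectrogram and passed to the encoder; that conversion is left out here because its parameters are not given on this page.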

License name: Apache License 2.0
Number of parameters: 74M
Model Size: 155 MB

Device: Hailo-10H

Technical Details

Audio Sample Rate: 16,000 Hz

Tokenizer: Whisper tokenizer (multilingual)

Numerical Scheme: Mixed precision

Inference API: C++, Python

Performance Metrics

Load Time: 3.89 s

TPS (tokens per second): 23.36
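As a rough worked example of how these two figures combine, the sketch below estimates wall-clock time for a transcription of a given token count. It assumes TPS means decoder tokens per second, that the rate stays constant, and that encoder time is negligible or already included; none of that is stated on this page, and estimate_latency_s is a hypothetical helper.

```python
LOAD_TIME_S = 3.89          # from the model card
TOKENS_PER_SECOND = 23.36   # from the model card


def estimate_latency_s(num_tokens, include_load=True):
    """Estimate seconds needed to emit num_tokens text tokens.

    Assumption: decoding runs at a constant TOKENS_PER_SECOND and the
    one-time model load is paid only when include_load is True.
    """
    decode_s = num_tokens / TOKENS_PER_SECOND
    return decode_s + (LOAD_TIME_S if include_load else 0.0)


# A 50-token transcript: about 2.1 s of decoding, about 6.0 s with a cold load.
cold = estimate_latency_s(50)
warm = estimate_latency_s(50, include_load=False)
```

The load time is a one-off cost, so for a long-running service the warm figure is the more relevant one.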

Sub-Model Performance Metrics

Encoder: Transformer encoder over spectrogram frames

Decoder: Autoregressive Transformer decoder with beam search or temperature sampling
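To illustrate what autoregressive decoding with a temperature setting involves, here is a minimal sketch in plain Python. It is not the Hailo inference API: sample_token, decode, and the next_logits callback are hypothetical names, and a real decoder would also condition on the encoder output. Beam search is not shown; this covers only greedy and temperature sampling.

```python
import math
import random


def sample_token(logits, temperature=1.0, rng=random):
    """Pick a token index from raw logits.

    A temperature of 0 (or below) falls back to greedy argmax;
    higher temperatures flatten the distribution before sampling.
    """
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def decode(next_logits, eos_id, max_tokens=32, temperature=0.0):
    """Generate tokens one at a time until eos_id or max_tokens.

    next_logits is a hypothetical callback that maps the tokens
    generated so far to a list of logits for the next token.
    """
    tokens = []
    for _ in range(max_tokens):
        token = sample_token(next_logits(tokens), temperature)
        if token == eos_id:
            break
        tokens.append(token)
    return tokens
```

With temperature set to 0 the loop is deterministic, which is usually the simpler starting point for transcription; a non-zero temperature trades repeatability for diversity.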
