Automatic Speech Recognition (ASR): transcribes spoken audio to text, with multilingual support and speech translation to English.
Audio is resampled to 16 kHz and split into 10 s windows, each converted to a spectrogram. A Transformer encoder processes the spectrogram, and a Transformer decoder autoregressively predicts text tokens.
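
The audio preparation described above can be sketched in plain Python. This is a minimal illustration, not the model's actual preprocessing code: the function names `resample_linear` and `split_into_windows` are hypothetical, linear interpolation stands in for a proper band-limited resampler, zero-padding the final window is an assumption, and the spectrogram step is omitted.

```python
SAMPLE_RATE = 16_000   # Hz, as stated in the description
WINDOW_SECONDS = 10    # window length, as stated in the description


def resample_linear(samples, src_rate, dst_rate=SAMPLE_RATE):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def split_into_windows(samples, sample_rate=SAMPLE_RATE, seconds=WINDOW_SECONDS):
    """Cut a signal into fixed-length windows.

    Assumption: the last window is zero-padded to full length.
    """
    size = sample_rate * seconds
    windows = []
    for start in range(0, len(samples), size):
        chunk = list(samples[start:start + size])
        chunk.extend([0.0] * (size - len(chunk)))
        windows.append(chunk)
    return windows


if __name__ == "__main__":
    audio = [0.0] * (25 * SAMPLE_RATE)     # 25 s of silence at 16 kHz
    windows = split_into_windows(audio)
    print(len(windows), len(windows[0]))   # 3 160000
```

Under these assumptions, a 25 s clip yields three windows of 160,000 samples each, with the last window half padding.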
License name: Apache License 2.0

| Test | Evaluation Metric | Full Precision Accuracy | Post Quantization Accuracy |
|---|---|---|---|
| LibriSpeech | WER | TBD | TBD |
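
For reference, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below shows the standard definition. The function name `word_error_rate` is hypothetical, and it assumes whitespace tokenization with no text normalization (casing, punctuation), which may differ from how the figures in the table are computed.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(round(word_error_rate(ref, hyp), 3))  # 0.333
```

In the example, one substitution ("sat" to "sit") and one deletion ("the") against six reference words give a WER of 2/6.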