Qwen2-1.5B-Instruct

Generate text responses to prompts, enabling natural language conversations
and content creation.

Model Properties

The pipeline consists of two models: prefill and TBT.

License name: Apache License 2.0
Number of parameters: 1.5B
Model Size: 1.56 GB

Technical Details

Operations: 29.4 GOPs per input token
Context Length: 2048
Numerical Scheme: A8W4, symmetric, channel-wise
Inference API: C++, Hailo-Ollama
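
The numerical scheme above can be illustrated with a minimal sketch, reading A8W4 as 8-bit activations and 4-bit weights (the usual meaning of that shorthand). The code below shows only the weight side: symmetric quantization (zero-point fixed at 0) with one scale per output channel. It is a generic illustration, not the quantizer actually used to produce this model; the function names and the clipping range of [-7, 7] are assumptions made for the example.

```python
def quantize_channelwise_symmetric(weights, bits=4):
    """Quantize each row (output channel) with its own scale and no zero-point.

    Hypothetical helper for illustration only.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit values
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Symmetric: the scale maps [-max_abs, max_abs] onto [-qmax, qmax].
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q_row = [max(-qmax, min(qmax, round(w / scale))) for w in row]
        quantized.append(q_row)
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Recover approximate float weights from integers and per-channel scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


# Two output channels with different dynamic ranges.
weights = [[1.75, -0.5, 0.25],
           [-0.875, 0.125, 0.5]]
q, s = quantize_channelwise_symmetric(weights)
print(q)  # [[7, -2, 1], [-7, 1, 4]]
print(s)  # [0.25, 0.125]
```

Because each channel gets its own scale, the second row keeps its resolution even though its values are half the magnitude of the first row's. With a single per-tensor scale, the smaller channel would use only part of the 4-bit range.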

Performance Metrics

First Load Time: 8.34639 s
Time to First Token: 0.322963 s
TPS (tokens per second): 8.12567
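
The timing figures above can be combined into a rough estimate of how long a response takes to generate. The sketch below assumes that the time to first token covers the first output token and that every later token arrives at the steady TPS rate. The source does not state the prompt length used for the measurement, so treat the result as an order-of-magnitude guide; the function name is hypothetical.

```python
TTFT_S = 0.322963  # time to first token, from the metrics above
TPS = 8.12567      # tokens per second, from the metrics above


def estimate_generation_seconds(num_output_tokens, ttft_s=TTFT_S, tps=TPS):
    """Estimate wall-clock seconds to produce num_output_tokens tokens.

    Assumes the first token costs ttft_s and each later token costs 1 / tps.
    The one-time model load (First Load Time) is not included.
    """
    if num_output_tokens <= 0:
        return 0.0
    return ttft_s + (num_output_tokens - 1) / tps


print(f"{estimate_generation_seconds(100):.1f} s")  # 12.5 s
```

Under these assumptions, a 100-token reply takes roughly 12.5 seconds once the model is already loaded.
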
Accuracy
Test: MMLU
Evaluation Metric: accuracy
Full Precision Accuracy: 55
Post Quantization Accuracy: 51