
Qwen2-VL-2B-Instruct

Name: Qwen2-VL-2B-Instruct
Brand: Hailo AI

Generate multimodal responses by interpreting both text and images, enabling vision-language understanding and content creation

Model Properties

The pipeline processes image and text inputs using a vision encoder and a language model to generate contextualized outputs.
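The data flow described above can be sketched as a toy pipeline. This is only an illustration: every function below is a hypothetical stand-in, not part of the Hailo or Qwen APIs, and the ordering of vision and text tokens in the combined sequence is an assumption. The count of 144 vision tokens per frame is taken from this card.

```python
# Toy illustration of a vision-language pipeline.
# All names here are hypothetical stand-ins, not real Hailo or Qwen calls.

def toy_vision_encoder(image, tokens_per_frame=144):
    """Stand-in for the vision encoder: one frame -> fixed number of vision tokens."""
    return [("vision", i) for i in range(tokens_per_frame)]


def toy_tokenizer(text):
    """Stand-in for the text tokenizer: splits on whitespace."""
    return [("text", word) for word in text.split()]


def toy_language_model(sequence):
    """Stand-in for the language model: only reports what it received."""
    n_vision = sum(1 for kind, _ in sequence if kind == "vision")
    n_text = len(sequence) - n_vision
    return f"[{n_vision} vision tokens, {n_text} text tokens]"


def run_pipeline(image, prompt):
    # Assumed ordering: vision tokens first, then the text prompt.
    sequence = toy_vision_encoder(image) + toy_tokenizer(prompt)
    return toy_language_model(sequence)


print(run_pipeline(image=None, prompt="Describe this image"))
```

A real deployment would replace each stand-in with the compiled vision encoder, the model's tokenizer, and the autoregressive language model.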

License name: Apache License 2.0
Number of parameters: 2B
Model Size: 2.18 GB


Device: Hailo-10H

Image Input Size: [336, 336, 3]

Numerical Scheme: A8W4, symmetric, channel-wise

Inference API: C++, Python

Vision Tokens Per Frame: 144

Context Length: 2048
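The image size, vision-token count, and context length listed above are related, and a short calculation makes the relationship concrete. The sketch below rests on two assumptions not stated on this card: that the patch size of 14 and the 2 x 2 patch merge of the upstream Qwen2-VL design apply to this compiled model, and that vision tokens count against the 2048-token context.

```python
# Values copied from the model card above.
IMAGE_SIZE = 336               # input frames are 336 x 336 x 3
VISION_TOKENS_PER_FRAME = 144
CONTEXT_LENGTH = 2048

# Assumed from the upstream Qwen2-VL design, not stated on this card.
PATCH_SIZE = 14                # side of one vision patch, in pixels
MERGE_SIZE = 2                 # 2 x 2 neighbouring patches merge into one token


def vision_tokens(image_size=IMAGE_SIZE, patch=PATCH_SIZE, merge=MERGE_SIZE):
    """Vision tokens for one square frame under the assumed patch/merge scheme."""
    patches_per_side = image_size // patch       # 336 // 14 = 24
    tokens_per_side = patches_per_side // merge  # 24 // 2 = 12
    return tokens_per_side ** 2


def remaining_context(num_frames,
                      context_length=CONTEXT_LENGTH,
                      per_frame=VISION_TOKENS_PER_FRAME):
    """Tokens left for the text prompt and the generated reply (assumed shared budget)."""
    used = num_frames * per_frame
    if used > context_length:
        raise ValueError("frames alone exceed the context length")
    return context_length - used


print(vision_tokens())                            # 144, matching the card
print(remaining_context(1))                       # 1904
print(CONTEXT_LENGTH // VISION_TOKENS_PER_FRAME)  # 14 frames at most
```

Under these assumptions, a single frame leaves 1904 tokens for the prompt and the reply, and no more than 14 frames fit in the context at once.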


First Load Time: 6.226 s

Text Time To First Token: 0.32 s

Image Time To First Token: 0.93 s

TPS: 6.73

Time To First Token: 0.97 s
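The time-to-first-token and TPS figures can be combined into a rough estimate of end-to-end response time. The sketch below assumes that TPS means tokens per second during steady-state decoding and that it applies to every token after the first; the card does not define either point, so treat the result as an approximation. The function name is illustrative.

```python
# Figures copied from the model card above.
TEXT_TTFT_S = 0.32    # Text Time To First Token
IMAGE_TTFT_S = 0.93   # Image Time To First Token
TPS = 6.73            # assumed: decode rate in tokens per second


def estimate_response_time(num_tokens, ttft_s, tps=TPS):
    """Approximate seconds to produce num_tokens output tokens.

    Assumes the first token arrives after ttft_s and each later
    token takes 1 / tps seconds.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    return ttft_s + (num_tokens - 1) / tps


# Example: a 100-token answer to an image prompt versus a text-only prompt.
print(round(estimate_response_time(100, IMAGE_TTFT_S), 2))
print(round(estimate_response_time(100, TEXT_TTFT_S), 2))
```

For replies of more than a few tokens the decode rate dominates, so the difference between text and image prompts stays close to the gap between their time-to-first-token values.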
