
Qwen3-VL-2B-Instruct

Generate multimodal responses by interpreting both text and video, enabling vision-language understanding and content creation

Model Properties

The pipeline processes image and text inputs using a vision encoder and a language model to generate contextualized outputs.

License name: Apache License 2.0
Number of parameters: 2B
Model Size: 2.18 GB

Device: Hailo-10H

Image Input Size: [288, 512, 3]

Numerical Scheme: A16W4, symmetric, group-wise
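The numerical scheme is read here as 16-bit activations with 4-bit weights, where each group of weights shares one symmetric scale (no zero point). The sketch below illustrates that idea in plain Python. The group size of 4, the rounding rule, and the function names are assumptions chosen for illustration; the page does not state the group size or the quantizer actually used for this model.

```python
def quantize_groupwise_symmetric(weights, group_size=4, bits=4):
    """Quantize a flat list of floats using one symmetric scale per group.

    group_size=4 is an illustrative assumption, not a value from the page.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit codes
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Symmetric: the scale maps the largest magnitude in the group to qmax.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            codes.append(max(-qmax, min(qmax, q)))  # clamp to [-qmax, qmax]
    return codes, scales


def dequantize_groupwise(codes, scales, group_size=4):
    """Map integer codes back to floats using each group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


weights = [0.7, -0.3, 0.1, 0.0, 0.02, -0.014, 0.006, 0.0]
codes, scales = quantize_groupwise_symmetric(weights)
restored = dequantize_groupwise(codes, scales)
```

Because each group has its own scale, the small values in the second group keep their relative precision even though the first group contains much larger weights.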

Inference API: C++, Python

Vision Tokens Per Frame: 144
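The figure of 144 vision tokens per frame is consistent with the listed 288 x 512 input if the vision encoder uses 16 x 16 pixel patches followed by a 2 x 2 patch merge. Both values are assumptions about the encoder, not stated on this page; the sketch below only checks that the arithmetic lines up.

```python
def vision_tokens_per_frame(height, width, patch_size=16, merge_size=2):
    """Count the tokens one frame contributes after patching and merging.

    patch_size and merge_size are assumed values, not taken from the page.
    """
    cell = patch_size * merge_size  # pixels covered by one token per side
    if height % cell or width % cell:
        raise ValueError("frame dimensions must be multiples of the cell size")
    return (height // cell) * (width // cell)


tokens = vision_tokens_per_frame(288, 512)  # 9 rows * 16 columns
```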

Context Length: 2048
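A rough way to budget the context window is sketched below. It assumes that vision tokens, text prompt tokens, and generated tokens all share the 2048-token context, and it ignores any chat-template or special-token overhead; the helper name and its parameters are hypothetical.

```python
CONTEXT_LENGTH = 2048          # from the model properties
VISION_TOKENS_PER_FRAME = 144  # from the model properties


def max_frames(prompt_tokens, reserved_output_tokens,
               context_length=CONTEXT_LENGTH,
               tokens_per_frame=VISION_TOKENS_PER_FRAME):
    """Estimate how many frames fit alongside the prompt and the reply."""
    remaining = context_length - prompt_tokens - reserved_output_tokens
    return max(0, remaining // tokens_per_frame)


frames = max_frames(prompt_tokens=64, reserved_output_tokens=256)
```

With a 64-token prompt and 256 tokens reserved for the answer, this estimate leaves room for 12 frames.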

Compiled Model:

First Load Time (s): 7.02

Time To First Token (s): 1.47

Tokens Per Second (TPS): 4.74
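These figures can be combined into a back-of-the-envelope latency estimate. The sketch below assumes the first token arrives at the time-to-first-token mark and every later token is produced at the listed rate; real latency also depends on prompt length and the number of frames, which the page does not break down. The function name is hypothetical.

```python
FIRST_LOAD_S = 7.02        # one-time model load, from the page
TIME_TO_FIRST_TOKEN_S = 1.47
TOKENS_PER_SECOND = 4.74


def estimate_response_seconds(new_tokens, include_first_load=False):
    """Estimate wall-clock seconds to generate new_tokens tokens."""
    if new_tokens <= 0:
        return 0.0
    # The first token is covered by TTFT; the rest decode at the listed rate.
    total = TIME_TO_FIRST_TOKEN_S + (new_tokens - 1) / TOKENS_PER_SECOND
    if include_first_load:
        total += FIRST_LOAD_S
    return total


warm = estimate_response_seconds(100)
cold = estimate_response_seconds(100, include_first_load=True)
```

Under these assumptions a 100-token reply takes roughly 22.4 seconds on a loaded model, plus about 7 seconds more on the first call after startup.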
