Qwen2-VL-2B-Instruct

Generate multimodal responses by interpreting both text and images, enabling vision-language understanding and content creation

Model Properties

The pipeline processes image and text inputs with a vision encoder and a language model to generate contextualized outputs.

License name: Apache License 2.0
Number of parameters: 2B
Model Size: 2.18 GB

Technical Details

Image Input Size: [336, 336, 3]
Numerical Scheme: A8W4, symmetric, channel-wise
Inference API: C++, Python
Vision Tokens Per Frame: 144
Context Length: 2048
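
The image size, vision-token count, and context length listed above are related, and the relationship can be sketched in a few lines of Python. The sketch below is illustrative only: the 14-pixel patch size and 2x2 token merging are assumptions taken from the published Qwen2-VL architecture, not from this page, and the function names are hypothetical. It also ignores special tokens and chat-template overhead, so the real text budget will be somewhat smaller.

```python
def vision_tokens(height: int, width: int, patch: int = 14, merge: int = 2) -> int:
    """Vision tokens for one frame.

    Assumes (not stated on this page) that the encoder cuts the image into
    patch x patch pixel tiles and merges each merge x merge group into one token.
    """
    cell = patch * merge  # pixels covered by one merged token along each axis
    if height % cell or width % cell:
        raise ValueError("image sides must be multiples of patch * merge")
    return (height // cell) * (width // cell)


def text_budget(context_length: int, frames: int, tokens_per_frame: int) -> int:
    """Tokens left for prompt text and generated output after the image frames."""
    used = frames * tokens_per_frame
    if used > context_length:
        raise ValueError("frames alone exceed the context length")
    return context_length - used


# Values from the Technical Details list: 336 x 336 input, 2048-token context.
tokens_per_frame = vision_tokens(336, 336)               # 12 * 12 = 144
remaining = text_budget(2048, frames=1, tokens_per_frame=tokens_per_frame)
print(tokens_per_frame, remaining)                        # 144 1904
```

Under these assumptions a single 336 x 336 frame accounts for the listed 144 vision tokens and leaves roughly 1,900 tokens of the 2,048-token context for text.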

Performance Metrics

First Load Time (s): 6.226
Text Time To First Token (s): 0.32
Image Time To First Token (s): 0.93
TPS: 6.73
Time To First Token (s): 0.97
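
As a rough guide to what these figures mean for a full response, the sketch below combines time to first token with throughput. It assumes that TPS measures steady decode throughput after the first token and that the two figures add linearly. The page does not state either, and the function name is hypothetical.

```python
def estimate_response_time(new_tokens: int, ttft_s: float, tps: float) -> float:
    """Approximate wall-clock seconds to generate new_tokens tokens.

    Assumes the first token arrives after ttft_s seconds and every later
    token is produced at a constant rate of tps tokens per second.
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    if tps <= 0:
        raise ValueError("tps must be positive")
    return ttft_s + (new_tokens - 1) / tps


# Figures from the Performance Metrics list: image TTFT 0.93 s, 6.73 TPS.
seconds = estimate_response_time(100, ttft_s=0.93, tps=6.73)
print(f"{seconds:.1f} s for a 100-token reply to an image prompt")
```

With the listed numbers, a 100-token reply to an image prompt comes out at about 15.6 seconds, excluding the one-time 6.226-second first load.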