Qwen2-VL 2B


Generate multimodal responses by interpreting both text and images, enabling vision-language understanding and content creation

Model Properties

The pipeline processes image and text inputs using a vision encoder and a language model to generate contextualized outputs.
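The vision encoder turns each image into a fixed number of tokens that the language model reads alongside the text prompt. The sketch below shows how a 224 x 224 input can yield the 64 vision tokens listed among the model properties. It assumes Qwen2-VL's published design of 14 x 14 pixel patches with a 2 x 2 spatial merge. This page does not state those parameters, so treat the constants as assumptions.

```python
# Sketch: estimate how many visual tokens a Qwen2-VL-style vision encoder
# emits for one image.
# ASSUMPTIONS (not stated on this page): 14x14 pixel patches and a 2x2
# spatial merge of neighbouring patches, as in the published Qwen2-VL design.

PATCH_SIZE = 14  # assumed ViT patch edge, in pixels
MERGE_SIZE = 2   # assumed number of patches merged per side


def vision_token_count(height: int, width: int,
                       patch: int = PATCH_SIZE,
                       merge: int = MERGE_SIZE) -> int:
    """Return the number of language-model tokens for a height x width image."""
    block = patch * merge
    if height % block or width % block:
        raise ValueError("image sides must be multiples of patch * merge")
    patches_h = height // patch  # patches along the height
    patches_w = width // patch   # patches along the width
    # Each merge x merge group of patches is projected into a single token.
    return (patches_h // merge) * (patches_w // merge)


if __name__ == "__main__":
    # 224 / 14 = 16 patches per side; 16 / 2 = 8 merged tokens per side.
    print(vision_token_count(224, 224))
```

With these assumptions, 8 x 8 = 64 tokens per image. A larger input would consume proportionally more of the context window.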

License name: Apache License 2.0
Number of parameters: 2B

Device: Hailo-10H

Context Length: 2048

Image Input Size: [224, 224, 3]

Max Output Tokens: 64 vision tokens

FPS: 2.45

Text Prefill (tokens/s): 400

Text Time To First Token (s): 0.24

Image Time To First Token (s): 0.4

TPS (tokens/s): 11
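The figures above can be combined into rough estimates of response time and of how much of the context window remains for output. The sketch below is back-of-the-envelope arithmetic under several assumptions. Time to first token is taken in seconds, and decoding is assumed to run at a constant TPS after the first token. The 64 vision tokens are assumed to count against the 2048-token context. The function names are illustrative only.

```python
# Back-of-the-envelope estimates from the Hailo-10H figures listed above.
# ASSUMPTIONS: TTFT is in seconds, decoding runs at a constant TPS after the
# first token, and the 64 vision tokens count against the 2048-token context.

CONTEXT_LENGTH = 2048
VISION_TOKENS = 64
IMAGE_TTFT_S = 0.4
TEXT_TTFT_S = 0.24
DECODE_TPS = 11.0


def response_time_s(output_tokens: int, with_image: bool = True) -> float:
    """Approximate seconds until the last of `output_tokens` is generated."""
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    ttft = IMAGE_TTFT_S if with_image else TEXT_TTFT_S
    # The first token arrives at TTFT; the remaining ones arrive at DECODE_TPS.
    return ttft + (output_tokens - 1) / DECODE_TPS


def max_new_tokens(prompt_tokens: int, with_image: bool = True) -> int:
    """Tokens left for generation after the prompt (and image) fill the context."""
    used = prompt_tokens + (VISION_TOKENS if with_image else 0)
    return max(CONTEXT_LENGTH - used, 0)


if __name__ == "__main__":
    print(f"{response_time_s(100):.1f} s")  # 0.4 + 99 / 11
    print(max_new_tokens(200))              # 2048 - 200 - 64
```

For a 100-token answer about an image, this gives roughly 9.4 s. Decoding, not prefill, dominates the wait. Measured latency will vary with prompt length and runtime overheads.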
