Florence-2-Base

Generate descriptive captions for input images, enabling visual recognition and image-to-text translation tasks

Model Properties

Processes images to generate descriptive text, enabling automated image annotation and visual understanding

License Name: MIT
Number of Parameters: 220M

Technical Details

Image Input Size: [224, 224, 3]
Inference API: CPP
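
The input size listed above implies that images are brought to 224 x 224 pixels with 3 channels before inference. The following is a minimal sketch of that step in plain Python, assuming nearest-neighbor resizing on nested lists. The helper names (`resize_nearest`, `shape_of`) are hypothetical, and the interpolation method, channel order, and normalization actually used by the model pipeline are not stated on this page.

```python
from typing import List

Pixel = List[int]          # one pixel as [R, G, B] (channel order is an assumption)
Image = List[List[Pixel]]  # rows of pixels: height x width x 3

# Taken from "Image Input Size: [224, 224, 3]" above.
TARGET_H, TARGET_W, CHANNELS = 224, 224, 3


def resize_nearest(image: Image, out_h: int = TARGET_H, out_w: int = TARGET_W) -> Image:
    """Resize an image with nearest-neighbor sampling (illustrative only)."""
    in_h = len(image)
    in_w = len(image[0])
    resized: Image = []
    for y in range(out_h):
        # Map each output row back to a source row.
        src_y = min(in_h - 1, y * in_h // out_h)
        row: List[Pixel] = []
        for x in range(out_w):
            # Map each output column back to a source column.
            src_x = min(in_w - 1, x * in_w // out_w)
            row.append(list(image[src_y][src_x]))
        resized.append(row)
    return resized


def shape_of(image: Image) -> List[int]:
    """Return [height, width, channels] of a nested-list image."""
    return [len(image), len(image[0]), len(image[0][0])]
```

Calling `shape_of(resize_nearest(img))` on any non-empty RGB image returns `[224, 224, 3]`, matching the listed input size. A real deployment would more likely use an image library with bilinear or bicubic interpolation.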

Performance Metrics

FPS (frames per second): 2.307
Text Time to First Token (s): 0.142
TPS (tokens per second): 110
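
As a rough illustration of how these figures relate, the sketch below estimates the time to generate a caption of a given length. It assumes the time-to-first-token value is in seconds and that every token after the first arrives at the listed TPS rate. Neither assumption is confirmed by this page, and the function names are hypothetical.

```python
# Values copied from the metrics above.
TTFT_S = 0.142   # time to first token, assumed to be in seconds
TPS = 110.0      # tokens per second
FPS = 2.307      # frames (images) per second


def estimated_caption_latency(num_tokens: int, ttft_s: float = TTFT_S, tps: float = TPS) -> float:
    """Estimate seconds to produce num_tokens tokens: first token, then steady decoding."""
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    return ttft_s + (num_tokens - 1) / tps


def seconds_per_image(fps: float = FPS) -> float:
    """Convert a frames-per-second throughput into seconds per image."""
    return 1.0 / fps
```

Under these assumptions, a 23-token caption takes about 0.142 + 22/110 = 0.342 seconds, and the listed FPS corresponds to roughly 0.43 seconds per image.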

Explore More Models

GenAI Models
Qwen2-VL 2B
Generate multimodal responses by interpreting both text and images, enabling vision-language understanding and content creation