Hailo AI Software Suite

Overview

Hailo devices are accompanied by a comprehensive AI Software Suite that enables the compilation of deep learning models and the implementation of AI applications in production environments. The model build environment integrates seamlessly with common ML frameworks, fitting easily into existing development ecosystems. The runtime environment supports Hailo's vision processors and, when Hailo's AI accelerators are used, enables integration and deployment on host processors such as x86- and ARM-based products.

Key Components of the AI Software Suite:

Model Build Environment

Model Build Computer

Model Zoo Vision, a variety of common and state-of-the-art pre-trained models and tasks in TensorFlow and ONNX.

Model Zoo GenAI, a curated collection of pre-trained generative models, available through a CLI and REST API.

Dataflow Compiler, for offline compilation and optimization of user models for Hailo devices.

Runtime Environment

Example Applications, a set of full application examples implementing pipeline elements and pre-trained AI tasks for Vision, GenAI, and Camera applications. Additional example applications can be found in our community and GitHub repositories.

HailoRT, a production-grade, lightweight runtime software package running on the host processor for real-time inference of the deep learning models compiled by the Dataflow Compiler. It supports both Vision and Generative AI models.

AI Vision Processor

Example Applications, a set of full application examples implementing pipeline elements and pre-trained AI tasks for Vision, GenAI, and Camera applications. Additional example applications can be found in our community and GitHub repositories.

HailoRT, a production-grade, lightweight runtime software package running on Hailo-15 for real-time inference of the deep learning models compiled by the Dataflow Compiler.

Breathe life into your edge products with Hailo’s AI Accelerators and Vision Processors

Dataflow Compiler

A Complete & Scalable Software Toolchain

Hailo devices are accompanied by a comprehensive dataflow compiler that integrates with common deep learning development frameworks, allowing smooth and easy integration into existing development ecosystems.

Full deployment flow toolchain capabilities:

Model Translation from industry-standard frameworks to Hailo executable format

Model Optimization of the internal representation, using a state-of-the-art quantization scheme

Automated resource allocation to meet user requirements for FPS, latency, and power consumption

Compilation of models into a Hailo binary (HEF) by the dedicated deep learning compiler

Loading binary and running inference on the Hailo target device

Support for both standalone inference, allowing direct access to the device, and TensorFlow-integrated inference for easy integration with existing environments
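
The flow above can be condensed into a short sketch using the Dataflow Compiler's Python API. This is a minimal illustration rather than a definitive recipe: exact signatures and calibration-data handling vary across SDK versions, and the model name, file paths, and input shape here are placeholders.

```python
import numpy as np
from hailo_sdk_client import ClientRunner  # Dataflow Compiler Python API

# Translate an ONNX model into the Hailo internal representation.
runner = ClientRunner(hw_arch="hailo8")
runner.translate_onnx_model("yolov5m.onnx", "yolov5m")  # placeholder model file

# Optimize (quantize) the internal representation with calibration data.
calib_data = np.random.rand(64, 640, 640, 3).astype(np.float32)  # stand-in images
runner.optimize(calib_data)

# Compile to a Hailo binary (HEF) that HailoRT can load on the target device.
hef = runner.compile()
with open("yolov5m.hef", "wb") as f:
    f.write(hef)
```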

Analysis and debug tools:

Emulator providing bit-exact emulation of chip behavior

Profiler providing chip performance estimations (e.g., FPS, power, and latency)

HailoRT

Runtime Software Suite

HailoRT is a production-grade, light, scalable runtime software package providing a robust library with intuitive APIs for optimized performance. Our AI SDK enables developers to quickly and easily build pipelines for AI applications in production, and is also suitable for evaluation and prototyping. It runs on the Hailo AI Vision Processor, or, when utilizing Hailo AI accelerators, on the host processor, enabling high-throughput inference with one or more Hailo devices. HailoRT is available as open-source software via the Hailo GitHub.

HailoRT Key capabilities:

Multi-host architecture support – runs on both x86 and ARM architectures

Multiple OS support, including Linux, Windows, and Android

Flexible interfaces for AI applications – C/C++ and Python APIs

Easy integration with devices and pipelines – support for standard frameworks: GStreamer, ONNX Runtime, Ollama, and others

Supports multiple streams – process multiple video streams simultaneously

Supports high throughput inferencing with up to 16 Hailo AI Accelerator devices

Seamless interface for two-way control and data communication with the Hailo NN core

Key components:

Runtime Frameworks Integration

  • pyHailoRT – Python and REST APIs for loading models to the Hailo NN core and sending and receiving data (see the sketch below this list)
  • GStreamer plugin – provides the “hailonet” element, which infers GStreamer frames according to the configured network. This element can be used multiple times in a GStreamer pipeline to infer multiple networks in parallel
  • ONNX Runtime integration to support inference capability
  • REST API compatible with Ollama and OpenAI to load and run GenAI models
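
As a rough illustration of the pyHailoRT flow referenced above, the sketch below loads a compiled HEF and runs one inference over the PCIe interface. Class and method names follow the hailo_platform Python package but may differ between HailoRT versions; the HEF path and input shape are placeholders.

```python
import numpy as np
from hailo_platform import (HEF, VDevice, ConfigureParams, HailoStreamInterface,
                            InferVStreams, InputVStreamParams, OutputVStreamParams)

hef = HEF("model.hef")  # placeholder path to a compiled model
with VDevice() as device:
    # Configure the device with the network described by the HEF.
    params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
    network_group = device.configure(hef, params)[0]

    # Build virtual-stream parameters and prepare one batch of input data.
    in_params = InputVStreamParams.make(network_group)
    out_params = OutputVStreamParams.make(network_group)
    input_name = hef.get_input_vstream_infos()[0].name
    batch = {input_name: np.zeros((1, 640, 640, 3), dtype=np.float32)}  # placeholder

    with InferVStreams(network_group, in_params, out_params) as pipeline:
        with network_group.activate(network_group.create_params()):
            results = pipeline.infer(batch)  # dict: output name -> numpy array
```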

Integration Tool

For verifying hardware integration with the Hailo AI Accelerator

HailoRT CLI

Command-line application for controlling the Hailo device(s), running inference on the device(s), and collecting inference statistics and device events

HailoRT Library

Robust user-space runtime library with a C/C++ API for control and data transfer to/from Hailo devices

Yocto Layer

  • Enables integration of HailoRT into a Yocto build for both Hailo AI Vision Processor and Hailo AI Accelerators
  • Includes recipes for the HailoRT library, pyHailoRT, PCIe driver and NN Core driver

Get Hailo’s Software Downloads and Documentation

Model Zoo

Hailo offers two dedicated Model Zoo repositories to accelerate development across a wide range of AI workloads:

Hailo Model Zoo Vision

The Hailo Model Zoo Vision provides pre-trained deep learning models for various computer vision tasks, enabling rapid prototyping on Hailo devices. The accompanying GitHub repository allows users to easily reproduce Hailo’s published performance benchmarks using commonly supported models and architectures.

Hailo Model Zoo GenAI

The Hailo Model Zoo GenAI features a curated collection of precompiled generative models, including LLMs, VLMs, ASR, Stable Diffusion, and more. These models are accessible via both CLI and REST API, fully compatible with OpenAI and Ollama, making them ideal for fast integration into GenAI workflows.
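
As an illustration of that OpenAI-compatible interface, the sketch below sends a chat request over plain HTTP. The server address, endpoint path, and model name are assumptions that depend on how the Hailo GenAI server is deployed in your environment.

```python
import requests

BASE_URL = "http://localhost:8000"  # placeholder address of the GenAI server

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",  # OpenAI-compatible chat endpoint
    json={
        "model": "example-llm",  # placeholder model name from the Model Zoo GenAI
        "messages": [
            {"role": "user", "content": "Describe edge AI in one sentence."},
        ],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```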

Explore Hailo models in the Model Zoo and choose the best neural network models for your AI applications

Main features include

A variety of common and state-of-the-art pre-trained models and tasks in TensorFlow and ONNX

Model details, including full precision accuracy vs. quantized model accuracy measured on Hailo devices

A compiled binary HEF file is released for each model, to be loaded by HailoRT and the example applications.

Hailo Model Explorers

To support a wide range of AI applications, Hailo provides two tailored Model Explorer tools—one for Computer Vision models and one for GenAI models—each designed to help developers make informed decisions and accelerate deployment.

Model Explorer Vision

The Model Explorer – Vision enables users to browse, filter, and evaluate deep learning models from the Hailo Model Zoo. It features an interactive interface with filters such as Hailo device type, task, model name, FPS, and accuracy—helping you select the optimal model for your real-time vision workload.

Each model is pre-trained and includes a ready-to-use binary (HEF file), compatible with the Hailo toolchain and application suite. Models are available in TensorFlow and ONNX formats, and can be retrained, compiled, and deployed for rapid prototyping on Hailo devices.

Model selection takes into account multiple factors like inference speed, accuracy, model size, and hardware compatibility. Because inference speed can’t be reliably predicted from static attributes (like FLOPs or parameter count), the Vision Model Explorer offers real hardware benchmarks to guide performance-based decisions.

Model Explorer GenAI

The Model Explorer GenAI allows users to browse precompiled GenAI models by product and task type. All supported models are listed in the Hailo Model Zoo GenAI, which includes everything needed to run them—such as CLI tools, REST APIs, and integration support compatible with OpenAI and Ollama.

Example Applications

Streamline the Development of Edge AI Applications

Hailo provides a set of reference application examples designed to simplify the development and deployment of edge AI applications. These examples demonstrate how to build real-time pipelines using GStreamer and pre-trained AI tasks, offering a practical starting point for developers working with Hailo’s AI accelerators. Built to highlight the high throughput and power efficiency of Hailo devices, these applications showcase best practices for integrating with the Hailo runtime and system architecture. They serve as ready-to-use templates that can be customized and extended to match specific use cases, helping reduce development time and accelerate time-to-market. 

Object detection

Detection on one video file source by running a single-stream object detection pipeline

Detecting and classifying objects within an image is a crucial task in computer vision, known as object detection. Deep learning models trained on the COCO dataset, a popular dataset for object detection, offer varying tradeoffs between performance and accuracy. For instance, running inference on Hailo-8, the YOLOv5m model achieves 218 FPS at 42.46 mAP, while the SSD-MobileNet-v1 model attains 1055 FPS at 23.17 mAP. The COCO dataset includes 80 unique classes of objects for general usage scenarios, including both indoor and outdoor scenes.

License Plate Recognition (LPR)

Automatic license plate recognition application based on a complex pipeline utilizing model scheduling

The License Plate Recognition (LPR) pipeline, also referred to as Automatic Number Plate Recognition (ANPR), is commonly used in the Intelligent Transportation Systems (ITS) market. This example application demonstrates automatic switching between three different networks in a complex pipeline, running in parallel a YOLOv5m model for vehicle detection, a YOLOv4-tiny model for license plate detection, and an LPRNet model for text extraction.

Read more in our License Plate Recognition AI Blog Post

Multi-stream Object Detection

Detection apps with several available neural networks, delivering unique functionalities and supporting multiple streams

Multi-stream object detection is utilized in diverse applications across different industries, including complex ones like Smart City traffic management and Intelligent Transportation Systems (ITS). You can either use your own object detection network or rely on pre-built models like YOLOv5m, which are all trained on the COCO dataset. Notably, these models offer unique capabilities such as Tiling, which utilizes Hailo’s high throughput to handle high-resolution images (FHD, 4K) by dividing them into smaller tiles. Processing high-resolution images proves particularly useful in crowded locations and public safety applications where small objects abound, for instance in crowd analytics for Retail and Smart Cities, among other use cases.

Multi-Camera Multi-Person Tracking (RE-ID)

Tracking specific objects or people across multiple cameras utilizing model scheduling

Multi-person re-identification across different streams is essential for security and retail applications. This includes identifying a specific person multiple times, either in a specific location over time or along a trail between multiple locations. This example application demonstrates NN model switching between the YOLOv5s and repvgg_a0_person_reid deep learning models, trained on Hailo’s dataset, in a complex pipeline with inference-based decision-making. This is achieved using the model scheduler, an automatic tool for model switching that enables processing multiple models simultaneously at runtime, as sketched below.
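
A minimal sketch of enabling the model scheduler through pyHailoRT: two compiled models share one virtual device, and the scheduler interleaves their execution as inference requests arrive. The class names follow recent hailo_platform releases and may vary by version; the HEF paths are placeholders.

```python
from hailo_platform import VDevice, HailoSchedulingAlgorithm

# Create a virtual device with the automatic model scheduler enabled.
params = VDevice.create_params()
params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN

with VDevice(params) as device:
    # Both models are configured on the same device; the scheduler switches
    # between them at runtime without explicit reconfiguration.
    detector = device.create_infer_model("yolov5s.hef")            # placeholder
    reid = device.create_infer_model("repvgg_a0_person_reid.hef")  # placeholder
    with detector.configure() as det_model, reid.configure() as reid_model:
        pass  # submit inference jobs to det_model / reid_model here
```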

Semantic Segmentation

Application used for partitioning an image into multiple image segments

Semantic segmentation aims to assign a specific class to each pixel within the input image, recognizing collections of pixels that form distinct categories. This technique is commonly used in ADAS applications to enable the vehicle to decide where the road, sidewalk, other vehicles, and pedestrians are. It also enhances the detection of defects in quality-control optical inspection applications in industrial automation, and improves the precision of detail detection in medical imaging cameras, retail cameras, and more. In this specific setup, the pipeline relies on the Cityscapes dataset, which contains images captured from the perspective of a vehicle’s front-facing camera, encompassing 19 distinct classes. The pre-configured TAPPAS semantic segmentation pipeline showcases the robust computational capacity necessary for handling an FHD (1080p) input video stream while employing the FCN8-ResNet-v1-18 network.
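
To make the per-pixel assignment concrete, here is a minimal post-processing sketch: given per-class scores from the network, each pixel takes the class with the highest score. The output shape and the class index used are illustrative assumptions for a 19-class Cityscapes setup.

```python
import numpy as np

# Stand-in network output: per-pixel scores for 19 Cityscapes classes.
logits = np.random.rand(1080, 1920, 19).astype(np.float32)

# Semantic segmentation assigns each pixel the class with the highest score.
class_map = logits.argmax(axis=-1)    # shape (1080, 1920), values in 0..18
road_pixels = (class_map == 0).sum()  # class 0 is "road" in Cityscapes train IDs
print(class_map.shape, road_pixels)
```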

Depth Estimation

Estimates the depth or distance information from a given 2D image or video, providing a perception of the three-dimensional structure

Depth estimation from a single image infers depth or distance information from a 2D image and turns it into a 3D mapping. It enables automotive cameras to better understand the distance to objects, helps industrial inspection cameras in tasks like defect detection and quality control, and can improve the accuracy of person detection for security cameras by providing more detailed spatial information.

In this example, we are using the fast_depth deep learning model, trained on the NYUv2 dataset, which predicts a distance matrix (a different depth for each pixel) with the same shape as the input frame.
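
The sketch below illustrates what that output looks like in practice: a depth map with the input's spatial shape, normalized into an 8-bit image for visualization. The shape is a placeholder rather than the exact fast_depth resolution.

```python
import numpy as np

# Stand-in fast_depth output: one depth value per input pixel.
depth = np.random.rand(224, 224).astype(np.float32)  # same spatial shape as input

# Normalize to 0..255 for display as a grayscale "distance" image.
normalized = (depth - depth.min()) / (depth.max() - depth.min() + 1e-6)
depth_image = (normalized * 255).astype(np.uint8)
print(depth_image.shape, depth_image.dtype)
```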

Instance Segmentation

Application identifies, outlines and colors different objects and persons for precise object localization and separation

Instance segmentation merges the capabilities of object detection (identifying and categorizing objects) and semantic segmentation (allocating specific classes to individual pixels) to produce a distinct mask for each object within a given scene. This task becomes especially crucial when bounding boxes lack the precision needed for localization, and when the application requires pixel-level differentiation between objects. This application utilizes either the yolov5seg or YOLACT architecture, with both models trained on the COCO dataset.

Pose Estimation

Understanding and analyzing human activities or detecting and tracking suspicious or abnormal human poses or movements

Pose estimation is a computer vision technique that detects and tracks human body poses in images or videos, from recognizing emergency situations at home or on the factory floor to analyzing customer behavior for better business outcomes. It involves localizing the different parts of the human body, such as the head, shoulders, arms, legs, and torso, and estimating their positions and orientations in 3D space. This pipeline includes a combination of centerpose models pre-trained on the COCO dataset.

Facial Detection and Recognition

Application utilized for surveillance and security, authentication and access control, and human-computer interaction.

Facial detection is a common task that applies an object detection network to a single object class: faces. The face detection network was trained on the WIDER dataset, and its output is the predicted boxes of all the faces in the frame. This application demonstrates how to crop the Region of Interest (ROI) produced by the detector and feed it to a second network that predicts facial landmarks for each detected face, as sketched below. Facial landmarks are important features for analyzing face orientation, structure, and so on.
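
A minimal sketch of the detector-to-landmarks handoff described above: each predicted face box is cropped out of the frame and resized to the landmark network's input size. The box format, frame size, and input size are illustrative assumptions.

```python
import numpy as np
import cv2  # used only to resize the cropped face

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # placeholder video frame
faces = [(400, 200, 560, 380)]                     # placeholder (x1, y1, x2, y2) boxes

for (x1, y1, x2, y2) in faces:
    roi = frame[y1:y2, x1:x2]                      # crop the detected face region
    landmarks_input = cv2.resize(roi, (160, 160))  # placeholder landmark input size
    # landmarks_input is then fed to the second network to predict facial landmarks
```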

Tiling

Allows examining specific sections of an image in greater detail without compromising its resolution

To enhance the processing power of Hailo devices for handling large input resolutions, we can divide an input frame into multiple tiles and run an object detector on each tile individually. For instance, consider an object that occupies 10×10 pixels in a 4K input frame. After resizing, this object retains only about a pixel of information for a 640×640 detector such as YOLOv5m, making it nearly impossible to detect. To address this challenge, we use tiles to divide the input frame into smaller patches and detect the object in each tile without sacrificing information due to resizing. The tiles are identified by blue rectangles, and we utilize a pre-trained SSD-MobileNet-v1 model trained on the VisDrone dataset.
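
The arithmetic behind tiling can be sketched as follows: split a 4K frame into overlapping 640×640 patches so that each patch reaches the detector at, or near, native resolution. The overlap value is an illustrative assumption; real pipelines tune it to avoid cutting objects at tile borders.

```python
# Compute top-left coordinates of 640x640 tiles covering a 4K frame.
FRAME_W, FRAME_H = 3840, 2160  # 4K input resolution
TILE, OVERLAP = 640, 64        # tile matches the detector input; overlap is assumed

def tile_origins(frame_size: int, tile: int, overlap: int) -> list[int]:
    stride = tile - overlap
    origins = list(range(0, frame_size - tile + 1, stride))
    if origins[-1] + tile < frame_size:  # add a final tile flush with the edge
        origins.append(frame_size - tile)
    return origins

tiles = [(x, y) for y in tile_origins(FRAME_H, TILE, OVERLAP)
         for x in tile_origins(FRAME_W, TILE, OVERLAP)]
print(len(tiles), "tiles")  # each tile is fed to the detector individually
```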

Breathe life into your edge applications with the Hailo AI processors