Hailo AI Software Suite

Overview

Hailo devices come with a comprehensive AI software suite that enables compiling deep learning models and deploying AI applications in production. The model build environment integrates seamlessly with common ML frameworks, allowing smooth and easy integration into existing development ecosystems. The runtime environment integrates with host processors, such as x86- and ARM-based products when using the Hailo-8™, as well as with the Hailo-15™ vision processor, enabling high-performance, smarter AI products.

Hailo's software suite includes the following key components.

Model Build Environment

Model Build Computer
Machine Learning Frameworks
User Models
Hailo Model Zoo
Dataflow Compiler

Model Zoo – a collection of many commonly known, state-of-the-art pre-trained deep learning models and tasks, available in TensorFlow and ONNX

Dataflow Compiler – integrates seamlessly with existing models and compiles them offline for Hailo devices

Runtime Environment

TAPPAS – task templates with practical, high-quality inference pipelines and high-performance pre-trained AI applications

HailoRT – a runtime software package that runs on the host processor and performs real-time inference of deep learning models compiled by the Dataflow Compiler

AI Vision Processor

Hailo-15

TAPPAS – a set of full application examples implementing pipeline elements and pre-trained AI tasks

HailoRT – a production-grade, lightweight runtime software package running on the Hailo-15™ for real-time inference of deep learning models compiled by the Dataflow Compiler

The Hailo-8 AI processor and the Hailo-15 AI vision processor enable breakthrough high-performance AI at the edge.

Dataflow Compiler

A complete and highly scalable software toolchain

Hailo devices are provided with a highly sophisticated Dataflow Compiler. The compiler integrates seamlessly with existing deep learning development frameworks, allowing smooth and easy integration into existing development ecosystems.

Full deployment-flow toolchain capabilities:

Model translation from industry-standard frameworks into Hailo's executable format

Model optimization using state-of-the-art quantization techniques

Automatic resource allocation that meets user requirements such as FPS, latency, and power consumption

Model compilation into a Hailo binary by a highly tuned, dedicated compiler

An inference execution environment prepared by loading the binary onto the Hailo device

Support for both TensorFlow-integrated inference, for easy integration with existing environments, and standalone inference running directly on the device
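
As a rough illustration of the deployment flow above, the sketch below uses the Dataflow Compiler's Python API. The module, class, and method names (hailo_sdk_client, ClientRunner, translate_onnx_model, optimize, compile) are assumptions based on recent SDK releases and may differ in your version; the model name, file paths, and random calibration data are purely illustrative.

    # Minimal deployment-flow sketch, assuming the Dataflow Compiler's
    # Python API (hailo_sdk_client); names may differ between versions.
    import numpy as np
    from hailo_sdk_client import ClientRunner  # assumed module and class

    runner = ClientRunner(hw_arch="hailo8")

    # 1. Translate an ONNX model into Hailo's internal representation.
    runner.translate_onnx_model("yolov5m.onnx", "yolov5m")  # hypothetical file

    # 2. Optimize (quantize) the model using a small calibration set;
    #    real calibration data should come from the training distribution.
    calib_data = np.random.randint(0, 255, (64, 640, 640, 3), dtype=np.uint8)
    runner.optimize(calib_data)

    # 3. Compile into a Hailo binary (HEF), ready to load onto the device.
    with open("yolov5m.hef", "wb") as f:
        f.write(runner.compile())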

Analysis and debugging tools:

An emulator that reproduces the chip's behavior with bit-exact accuracy

A profiler that estimates performance (FPS, power consumption, latency, etc.) before running inference
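
For instance, recent Dataflow Compiler releases install a hailo command-line tool; assuming your version provides a profiler subcommand, a compiled model can be profiled before deployment roughly as follows (the file name is hypothetical):

    # Assumed CLI shipped with the Dataflow Compiler; subcommand names
    # and report formats may vary between versions.
    hailo profiler yolov5m.har    # estimates FPS, latency, and power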

HailoRT

Runtime Software Suite

HailoRT is a production-quality, lightweight, and scalable runtime software package that runs on the host processor. It makes building AI application pipelines quick and easy, and it can be used across a wide range of stages, from evaluation and prototyping to mass production. With HailoRT, one or more Hailo devices installed in the host processor environment execute inference with outstanding performance.
HailoRT is also available as open-source software on the Hailo GitHub.

Key HailoRT features:

Multiple host architectures – supports both x86 and ARM architectures

Flexible interfaces for AI applications – C/C++ and Python interfaces

Easy device and pipeline integration – plugin implementation for the GStreamer framework

Multiple streams – simultaneous processing of multiple video streams

High-throughput inference with up to 16 connected Hailo-8™ devices

Seamless Hailo device control – an interface supporting bidirectional control and data communication across multiple Hailo-8 devices

HailoRT's key components:

Runtime Frameworks Integration

  • pyHailoRT – a Python API for loading models onto Hailo-8™ devices and sending and receiving inference data (see the sketches below)
  • GStreamer plugin – provides the "hailonet" element, which runs inference on a configured network within the GStreamer framework. To run several networks in parallel, this element can be used multiple times within a single GStreamer pipeline.
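
As a rough sketch of the pyHailoRT workflow, assuming the hailo_platform Python package and the class names of HailoRT 4.x (these may differ between releases; the HEF file and dummy input are hypothetical):

    # Minimal pyHailoRT inference sketch; names are assumptions based on
    # HailoRT 4.x and may differ in your version.
    import numpy as np
    from hailo_platform import (HEF, VDevice, ConfigureParams,
                                HailoStreamInterface, InferVStreams,
                                InputVStreamParams, OutputVStreamParams)

    hef = HEF("yolov5m.hef")  # hypothetical compiled model
    with VDevice() as device:
        params = ConfigureParams.create_from_hef(
            hef, interface=HailoStreamInterface.PCIe)
        network_group = device.configure(hef, params)[0]

        in_info = hef.get_input_vstream_infos()[0]
        frame = np.zeros((1, *in_info.shape), dtype=np.uint8)  # dummy input

        with network_group.activate():
            with InferVStreams(network_group,
                               InputVStreamParams.make(network_group),
                               OutputVStreamParams.make(network_group)) as infer:
                results = infer.infer({in_info.name: frame})

And a correspondingly hedged sketch of the hailonet element, driven from Python through GStreamer (the hef-path property follows TAPPAS examples; the video file is hypothetical):

    # Sketch of a GStreamer pipeline around the hailonet element.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(
        "filesrc location=video.mp4 ! decodebin ! videoconvert ! "
        "hailonet hef-path=yolov5m.hef ! videoconvert ! fakesink")
    pipeline.set_state(Gst.State.PLAYING)
    pipeline.get_bus().timed_pop_filtered(
        Gst.CLOCK_TIME_NONE, Gst.MessageType.ERROR | Gst.MessageType.EOS)
    pipeline.set_state(Gst.State.NULL)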

Integration Tool

Integration tool – a hardware integration verification tool for the Hailo-8™ M.2 and Hailo-8™ mPCIe modules

HailoRT CLI

HailoRT CLI – a command-line application for controlling Hailo devices, running inference on the device, collecting inference statistics, and monitoring device events
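
A few representative invocations, assuming the hailortcli binary installed with HailoRT (subcommand names follow HailoRT 4.x and may vary; the HEF file is hypothetical):

    hailortcli scan                   # list detected Hailo devices
    hailortcli run yolov5m.hef        # run inference with a compiled model
    hailortcli benchmark yolov5m.hef  # measure FPS, latency, and power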

HailoRT Library

HailoRT Library – a user-space library for controlling Hailo devices and sending and receiving data, with a robust, easy-to-use C/C++ API
PCIe driver – an external kernel module, available for Linux and Windows and installed using the DKMS framework

Yocto Layer

Enables Hailo's software in Yocto environments (Zeus, Dunfell, Hardknott, Gatesgarth)

Includes recipes for the HailoRT library, pyHailoRT, and the PCIe driver

Model Zoo

The Hailo Model Zoo provides deep learning models for various computer vision tasks. The pre-trained models can be used to create fast prototypes on Hailo devices.

Additionally, the Hailo Model Zoo GitHub repository lets users quickly and easily reproduce Hailo-8's published performance on the common models and architectures included in our Model Zoo.
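
For example, assuming the hailomz command-line tool installed from that repository (subcommands follow recent Model Zoo releases and may differ in your version):

    hailomz parse yolov5m     # translate the pre-trained model
    hailomz optimize yolov5m  # quantize it with the default calibration set
    hailomz compile yolov5m   # produce the HEF binary
    hailomz eval yolov5m      # reproduce the published accuracy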

Main features include:

A variety of common and state-of-the-art pre-trained models and tasks in TensorFlow and ONNX

Model details, including full precision accuracy vs. quantized model accuracy measured on Hailo-8

Each model also includes a binary HEF file that is fully supported in the Hailo toolchain and Application suite (for registered users only)

Model Explorer

The Hailo Model Explorer is a dynamic tool designed to help users explore the models on the Model Zoo and select the best NN models for their AI applications.

The Model Zoo lets users quickly and easily reproduce Hailo's published performance on the common models and architectures it includes, and retrain these models. The collection encompasses both common and state-of-the-art models available in TensorFlow and ONNX formats.
The pre-trained models can be used for rapid prototyping on Hailo devices, and each model is accompanied by a binary HEF file, fully supported within the Hailo toolchain and Application suite (accessible to registered users only).

Selecting an appropriate model for your application can be challenging due to various factors like inference speed, model availability, desired accuracy, licensing, and more. Inference speed is unique in that it cannot be easily estimated without access to the underlying hardware.
Unfortunately, no single intrinsic model attribute (e.g., FLOPS, parameter count, size of activation maps, etc.) is a reliable predictor of inference speed and, to complicate things further, different hardware architectures have different optimal workloads. While it is possible to measure the inference time of each model directly, doing so can be tedious and time-consuming.
The Model Explorer was designed to help users make better-informed decisions, ensuring maximum efficiency on the Hailo platform. It offers an interactive interface with filters for Hailo device, task, model, FPS, and accuracy, allowing users to explore the many NN models in Hailo's library.

Read more about how the Hailo Model Zoo works and what it can do on the Hailo Blog.

Find the best model to run on your Hailo device with the new Model Explorer

TAPPAS

Template APPlications And Solutions

TAPPAS (Template APPlications And Solutions) is a set of task templates featuring practical, high-quality inference pipelines and high-performance pre-trained AI applications, built on the industry-leading Hailo-8™ AI processor. It makes development and deployment into your system easier, delivering best-in-class neural network throughput at very low power consumption. Its easy-to-use, Python-based templates let users progress smoothly toward deployment on the Hailo-8. By building on the pre-trained applications and development environment, customers can cut development time and effort and shorten time to market. A variety of operational models are provided, making deployment from existing models easy.

Object detection

Detection on one video file source by running a single-stream object detection pipeline

Detecting and classifying objects within an image is a crucial computer vision task known as object detection. Deep learning models trained on the COCO dataset, a popular dataset for object detection, offer varying tradeoffs between performance and accuracy. For instance, running inference on Hailo-8, the YOLOv5m model achieves 218 FPS at 42.46 mAP, while the SSD-MobileNet-v1 model attains 1055 FPS at 23.17 mAP. The COCO dataset includes 80 unique object classes covering general usage scenarios, including both indoor and outdoor scenes.
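
To try a pipeline like this one, TAPPAS ships runnable application scripts; the path and flag below are assumptions based on the layout of recent TAPPAS releases and may differ in your version:

    # From the TAPPAS repository root (path and flags are assumptions).
    ./apps/h8/gstreamer/general/detection/detection.sh --show-fps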

License Plate Recognition (LPR)

Automatic license plate recognition application based on a complex pipeline utilizing model scheduling

A License Plate Recognition (LPR) pipeline, also referred to as Automatic Number Plate Recognition (ANPR), is commonly used in the Intelligent Transportation Systems (ITS) market. This example application demonstrates automatic model switching between three different networks in a complex pipeline, running in parallel: a YOLOv5m model for vehicle detection, a YOLOv4-tiny model for license plate detection, and an lprnet model for text extraction.
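
Model switching of this kind is exposed through HailoRT's model scheduler. Below is a minimal pyHailoRT sketch; the factory method, enum, and file names are assumptions based on HailoRT 4.x and may differ in your version:

    # Sketch: share one device between several networks via the scheduler.
    from hailo_platform import HEF, VDevice, HailoSchedulingAlgorithm

    # Hypothetical compiled networks from the LPR pipeline.
    hefs = [HEF(path) for path in ("yolov5m_vehicles.hef",
                                   "tiny_yolov4_license_plates.hef",
                                   "lprnet.hef")]

    params = VDevice.create_params()  # assumed factory method
    params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN
    with VDevice(params) as device:
        # Configuring several HEFs on one virtual device lets the scheduler
        # switch between models automatically at runtime.
        network_groups = [device.configure(hef)[0] for hef in hefs]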

Read more in our blog post

Multi-stream Object Detection

Detection apps with several available neural networks, delivering unique functionalities and supporting multiple streams

Multi-stream object detection is utilized in diverse applications across different industries, including complex ones like Smart City traffic management and Intelligent Transportation Systems (ITS). You can either use your own object detection network or rely on pre-built models like YOLOv5m, which are all trained on the COCO dataset. Notably, these models offer unique capabilities such as Tiling, which utilizes Hailo-8's high throughput to handle high-resolution images (FHD, 4K) by dividing them into smaller tiles. Processing high-resolution images proves particularly useful in crowded locations and public safety applications where small objects abound, for instance, in crowd analytics for Retail and Smart Cities, among other use cases.

Multi-Camera Multi-Person Tracking (RE-ID)

Tracking specific objects or people across multiple cameras utilizing model scheduling

Multi-person re-identification across different streams is essential for security and retail applications. This includes identifying a specific person multiple times, either in a specific location over time or along a trail between multiple locations. This example application demonstrates NN model switching between the YOLOv5s and repvgg_a0_person_reid deep learning models, trained on Hailo's dataset, in a complex pipeline with inference-based decision-making. This is achieved using the model scheduler, an automatic tool for model switching that enables processing multiple models concurrently at runtime.

Semantic Segmentation

Application used for partitioning an image into multiple image segments

Semantic segmentation aims to assign a specific class to each pixel within the input image, and recognize a collection of pixels that form distinct categories. This technique is commonly used for ADAS applications, to enable the vehicle to decide where the road, sidewalk, other vehicles and pedestrians are. It also enhances the detection of defects in quality control through optical inspection applications in industrial automation and enhances the precision of detail detection in medical imaging cameras, retail cameras and more. In this specific setup, the pipeline relies on the Cityscapes dataset, which contains images captured from the perspective of a vehicle’s front-facing camera, encompassing 19 distinct classes. The pre-configured TAPPAS semantic segmentation pipeline showcases the robust computational capacity necessary for handling an FHD input video stream (1080p) while employing the FCN8-ResNet-v1-18 network.

Depth Estimation

Estimates the depth or distance information from a given 2D image or video, providing a perception of the three-dimensional structure

Depth estimation from a single image is achieved by the ability to estimate depth or distance information from 2D images and turn it into a 3D mapping. It enables automotive cameras to better understand the distance to objects, helps industrial inspection cameras in tasks like defect detection and quality control, and can improve the accuracy of person detection for security cameras by providing more detailed spatial information.

In this example, we use the fast_depth deep learning model, trained on the NYUv2 dataset, which predicts a distance matrix (a different depth for each pixel) with the same shape as the input frame.

Instance Segmentation

Application that identifies, outlines, and colors different objects and persons for precise object localization and separation

Instance segmentation is the task of merging the capabilities of object detection (identifying and categorizing objects) and semantic segmentation (allocating specific classes to individual pixels) to produce a distinct mask for each object within a given scene. This task becomes especially crucial when bounding boxes lack the precision needed for localization, and when the application requires pixel-level differentiation between objects. This application utilizes either the yolov5seg or YOLACT architecture, with models trained on the COCO dataset.

Pose Estimation

Understanding and analyzing human activities or detecting and tracking suspicious or abnormal human poses or movements

Pose estimation is a computer vision technology that detects and tracks human body poses in images or videos, with uses ranging from recognizing emergency situations at home or on the factory floor to analyzing customer behavior for better business outcomes. It involves localizing the different parts of the human body, such as the head, shoulders, arms, legs, and torso, and estimating their positions and orientations in 3D space. This pipeline includes a combination of centerpose models pre-trained on the COCO dataset.

Facial Detection and Recognition

Application used for surveillance and security, authentication and access control, and human-computer interaction

Face detection is a common task that applies an object detection network to one specific object class: faces. The face detection network was trained on the WIDER dataset, and its output is the box predictions for all faces in the frame. This application demonstrates how to crop the Region of Interest (ROI) produced by the detector and feed it to a second network that predicts facial landmarks for each detected face. Facial landmarks are important features for analyzing face orientation, structure, and so on.

Tiling

Allows examining specific sections of an image in greater detail without compromising its resolution

To enhance the processing power of Hailo devices for handling large input resolutions, we can divide an input frame into multiple tiles and run an object detector on each tile individually. For instance, consider an object that occupies 10×10 pixels in a 4K input frame. This object will only contain 1 pixel of information for a 640×640 detector, such as YOLOv5m, making it nearly impossible to detect. To address this challenge, we use tiles to divide the input frame into smaller patches and detect the object in each tile without sacrificing information due to resizing. The tiles are identified by blue rectangles, and we utilize a pre-trained SSD-MobileNet-v1 model trained on the VisDrone dataset.
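
The tiling idea itself is independent of any Hailo API; a minimal NumPy sketch (the tile size and overlap are illustrative):

    import numpy as np

    def make_tiles(frame: np.ndarray, tile: int = 640, overlap: int = 64):
        """Split an H x W x C frame into overlapping tile-sized patches."""
        h, w = frame.shape[:2]
        step = tile - overlap
        tiles = []
        for y in range(0, max(h - overlap, 1), step):
            for x in range(0, max(w - overlap, 1), step):
                y0, x0 = min(y, h - tile), min(x, w - tile)
                # Keep each tile's origin so detections can be mapped back
                # to full-frame coordinates.
                tiles.append(((x0, y0), frame[y0:y0 + tile, x0:x0 + tile]))
        return tiles

    frame_4k = np.zeros((2160, 3840, 3), dtype=np.uint8)
    print(len(make_tiles(frame_4k)))  # 640x640 patches per 4K frame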

Breathe new life into your edge applications with Hailo's AI processors.