The Hailo Architecture

Hailo Drives Toward a Deep Learning Solution

For 70 years, computer processors have been based on the classic, rule-based Von Neumann architecture that is rapidly approaching its limits. CPUs, GPUs, DSPs, and even accelerators sitting on a CPU bus are all variants of the Von Neumann architecture. Neural networks have unique characteristics that require a new type of processing.

To truly achieve an effective solution, we must reinvent the processing architecture from the ground up, based on new concepts, new building blocks, and a different interaction between hardware and software. Hailo's novel architecture takes a clean-slate approach that does not rely on the traditional CPU/GPU building blocks. Hailo has designed a specialized technology stack and created a domain-specific processor that vastly outperforms Von Neumann architectures.

Hailo’s Structure-Defined Dataflow Architecture

Hailo delivers a structure-defined dataflow architecture based on multiple innovations, all targeting the fundamental properties of NNs:
  • Distributed memory fabric combined with purpose-made pipeline elements, allowing very low power memory access.
  • Novel control scheme based on a combination of hardware and software, reaching very low joules/operation with a high degree of flexibility.
  • Extremely efficient computational elements which can be variably applied according to need.
  • Dataflow-oriented interconnect that adapts according to the structure of the NN and allows high resource utilization.
  • A full-stack software toolchain co-designed with the hardware architecture that enables efficient deployment of NNs developed on industry-standard frameworks.
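The core idea behind a structure-defined dataflow design is that resources are allocated to each layer at compile time, in proportion to that layer's needs, rather than time-shared on a central core at run time. A minimal toy sketch of that allocation step, with entirely hypothetical unit counts and layer names (this is not Hailo's toolchain):

```python
# Toy compile-time allocation sketch: split a fixed pool of compute units
# across a network's layers in proportion to each layer's MAC count.
# All numbers and layer names here are hypothetical, for illustration only.

TOTAL_COMPUTE_UNITS = 64

def allocate(layers):
    """Assign compute units per layer, proportional to MAC count,
    guaranteeing every layer at least one unit."""
    total_macs = sum(macs for _, macs in layers)
    plan = {}
    for name, macs in layers:
        plan[name] = max(1, round(TOTAL_COMPUTE_UNITS * macs / total_macs))
    return plan

# Hypothetical per-layer MAC counts for a small network.
plan = allocate([("conv1", 90_000_000), ("conv2", 45_000_000), ("fc", 4_000_000)])
print(plan)  # {'conv1': 41, 'conv2': 21, 'fc': 2}
```

Because the allocation is fixed when the network is compiled, each layer's pipeline stage can run with no run-time scheduling overhead, which is what enables the high resource utilization the list above describes.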

Neural Networks Basic Observations

In designing a brand-new processing architecture for NNs, our thought process started with a close look at neural networks, crystallizing their fundamental characteristics from a processing perspective.
This exercise highlights why NN processing is fundamentally different from classic rule-based code processing, and it points toward an architecture design that is significantly more efficient for the NN domain.
Resource Balance
The balance between memory, control, and compute changes dramatically across a neural network's layers, in fact by orders of magnitude. The "one size fits all" approach of standard processors is terribly inefficient in such a dynamic environment.

Interconnect
Optimizing each of the neural network's functions separately is relatively easy; interconnecting everything is much more complex.

Compute
NNs rely heavily on the multiply-and-accumulate operation as well as non-linear activation functions, so building dedicated, optimized compute engines is a natural step.

Memory
In a neural network, the weights and the partial sums of the calculations are highly localized. In most cases, memory is shared between adjacent layers and must be accessed very frequently. This trait of high-bandwidth, localized memory does not sit well with the Von Neumann concept of a narrow, deep central memory, and that creates a bottleneck. Significant improvements can be made by moving to a distributed memory architecture.

Control
While classic software is built with many branches and requires the processor to provide full control flexibility at run time, neural networks are fully deterministic in terms of control. The power-hungry requirement for run-time flexible control can therefore be removed.
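The Resource Balance observation can be made concrete with a back-of-the-envelope calculation. The sketch below (not Hailo code; the layer shapes are hypothetical, loosely VGG-like) compares compute demand to weight memory per layer, showing the compute-to-memory ratio varying by orders of magnitude:

```python
# Back-of-the-envelope sketch: compute (MACs) vs. weight memory per layer
# for a few hypothetical, VGG-like layer shapes. The compute-to-memory
# ratio swings by orders of magnitude across the network.

def conv_layer_stats(h, w, c_in, c_out, k):
    """MACs and weight count for a k x k convolution, stride 1, 'same' padding."""
    weights = k * k * c_in * c_out      # parameters that must live in memory
    macs = h * w * weights              # multiply-accumulates per inference
    return macs, weights

def dense_layer_stats(n_in, n_out):
    """MACs and weight count for a fully connected layer: one MAC per weight."""
    weights = n_in * n_out
    return weights, weights

layers = [
    ("early conv", conv_layer_stats(224, 224, 3, 64, 3)),
    ("late conv",  conv_layer_stats(14, 14, 512, 512, 3)),
    ("classifier", dense_layer_stats(4096, 1000)),
]

for name, (macs, weights) in layers:
    print(f"{name}: {macs // weights} MACs per weight")
# early conv: 50176 MACs per weight
# late conv: 196 MACs per weight
# classifier: 1 MACs per weight
```

An early convolution reuses each weight tens of thousands of times, while a fully connected classifier touches each weight exactly once, which is why a single fixed compute-to-memory ratio cannot serve both well.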
