Video Management System (VMS)
What is a Video Management System?
A Video Management System, also known as VMS, collects inputs from a number of cameras and other sensors and addresses all related aspects of video handling, such as storage, retrieval, analysis and display.
VMS is typically used for security and surveillance, enhancing personal safety while maintaining individual privacy.
Traditionally, the analysis of live video streams was manual, relying on human perception to visually identify events in each feed. Today, deep learning automates the analytics task, which makes systems easier to scale and improves overall performance.
To enable real-time, accurate video analytics, a VMS requires high-performance computing. AI is transformational to this market as it allows faster and more accurate event identification at a lower cost.
Hailo enables advanced analytics on the edge for improved performance at a lower Total Cost of Ownership:

Low latency and higher frame rates enable timely video processing for detection and search on video streams

More compute per unit price, which translates into more complexity per application, more applications per stream and more streams per platform, as well as significant savings on streaming bandwidth and storage space thanks to event-based recording

Improved performance by lowering the chance of false alerts and missed detections

Video analytics on the edge means that personally identifiable information (PII) does not need to be transmitted and stored in the cloud

A comprehensive software suite which supports a wide range of platforms, operating systems, and neural network models

Products designed to withstand harsh conditions and conform to industrial operating conditions
Video Analytics Layers
The first layer of intelligent video analytics is scene understanding and metadata extraction, for the purpose of both real-time response and long-term storage for future search and analysis. Multiple neural networks can be deployed for this phase, leveraging advanced algorithms to perform deep-learning tasks. These tasks serve as the foundation on which more complex tasks identify pre-defined events and trigger a specific response.
Object Recognition
Used to identify a specific object or a class of objects within a frame, and to distinguish between classes of objects
Counting
Used to count the number of objects in a specific area, especially in places where occupancy is limited and has safety implications
Density Estimation
Used to identify an evolving situation. Unlike counting, this is relevant when the overall density matters rather than the exact number of objects
Object Attributes
Assist in uniquely identifying a person or an object in a specific scene, or in re-identifying the same entity in the same scene over time or across multiple locations
Gesture Estimation
Analysis of a sequence of gestures, which enables interpretation of specific behaviors for the purpose of behavioral analysis
Distance Measurement
Used to calculate the distance between two or more people or objects, based on the accurate 3D location of each entity in a defined space
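As a rough illustration of how the counting, density estimation and distance measurement tasks above can be derived from detector output, the sketch below assumes each detection has already been mapped to a ground-plane position in metres. The `Detection` type, the zone polygon and all function names are hypothetical and are not part of any Hailo API.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: class label plus ground-plane position in metres."""
    label: str
    x: float
    y: float


def in_zone(det, zone):
    """Ray-casting point-in-polygon test; zone is a list of (x, y) vertices."""
    inside = False
    n = len(zone)
    for i in range(n):
        x1, y1 = zone[i]
        x2, y2 = zone[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed by a horizontal ray.
        if (y1 > det.y) != (y2 > det.y):
            x_cross = x1 + (det.y - y1) * (x2 - x1) / (y2 - y1)
            if det.x < x_cross:
                inside = not inside
    return inside


def polygon_area(zone):
    """Area of the zone in square metres (shoelace formula)."""
    n = len(zone)
    total = 0.0
    for i in range(n):
        x1, y1 = zone[i]
        x2, y2 = zone[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def count_in_zone(dets, zone, label="person"):
    """Counting: number of detections of one class inside the zone."""
    return sum(1 for d in dets if d.label == label and in_zone(d, zone))


def density(dets, zone, label="person"):
    """Density estimation: objects per square metre of the zone."""
    return count_in_zone(dets, zone, label) / polygon_area(zone)


def pairwise_distances(dets):
    """Distance measurement: Euclidean distance for every pair, keyed by index pair."""
    return {
        (i, j): hypot(a.x - b.x, a.y - b.y)
        for (i, a), (j, b) in combinations(enumerate(dets), 2)
    }
```

In practice the density figure would more likely come from a dedicated crowd-density network than from exact counts; the division above only shows how the two quantities relate.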
The second layer of video analytics is real-time event triggering based on insights from the detection phase. In this phase, pre-defined events trigger a specific response, such as setting off an alarm or a call to action for security personnel and/or first responders.
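A minimal sketch of such a trigger, assuming the first layer delivers a per-frame object count. The `OccupancyRule` class, its `limit` and `hold_frames` parameters and the `run_rule` helper are illustrative names only. Requiring the condition to hold for several consecutive frames is one common way to avoid firing on a single noisy frame.

```python
from dataclasses import dataclass


@dataclass
class OccupancyRule:
    """Hypothetical rule: fire once when the count stays above `limit`
    for `hold_frames` consecutive frames."""
    limit: int
    hold_frames: int
    _streak: int = 0
    _active: bool = False

    def update(self, count):
        """Feed one per-frame count; return True only on the frame the event starts."""
        if count > self.limit:
            self._streak += 1
        else:
            # Condition cleared: reset so that a later violation fires again.
            self._streak = 0
            self._active = False
        if self._streak >= self.hold_frames and not self._active:
            self._active = True
            return True
        return False


def run_rule(rule, counts):
    """Return the frame indices at which the rule fired."""
    return [i for i, c in enumerate(counts) if rule.update(c)]
```

The response itself (an alarm, a notification to security personnel) would be attached wherever `update` returns True.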
Common applications:
Crowd Management
Based on people and vehicle counting and density estimation, the accumulation of a large number of entities in a specific location creates an event, which may trigger a response or call to action such as load balancing and dynamic traffic management
Perimeter Protection
Face and person attributes, as well as gesture estimation and distance measurement, can all serve to secure and protect an area with restricted access or an unsafe zone
Social Distancing
Used to detect whether people maintain social distancing in public places, supporting health authorities in their effort to contain a pandemic
Behavioral Analysis
Used to recognize distress signals and trigger an alert that calls first responders to the scene as soon as possible, much faster than a call to 911 would
Lost Person / Unattended Object
Helpful for security personnel searching for lost people or luggage, or tracking down suspects, based on property indexing and gesture estimation
License Plate Recognition (LPR)
Used for access control and billing in parking lots or garages
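To make the License Plate Recognition use case more concrete, the sketch below shows one way the recognized plate text could feed access control and billing. It assumes the plate string has already been read by a recognition model; the function names, the allow-list and the per-started-hour tariff are all hypothetical.

```python
import math
import re
from datetime import datetime


def normalize_plate(raw):
    """Upper-case the recognized text and drop anything that is not a letter or digit."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def is_allowed(raw, allow_list):
    """Access control: allow_list is a set of already-normalized plates."""
    return normalize_plate(raw) in allow_list


def parking_fee(entry, exit_, rate_per_hour):
    """Billing: charge for every started hour (an assumed tariff, for illustration)."""
    seconds = (exit_ - entry).total_seconds()
    if seconds < 0:
        raise ValueError("exit time is before entry time")
    return math.ceil(seconds / 3600) * rate_per_hour
```

Normalizing before the lookup means that spacing or separator differences in the recognized text do not cause a registered vehicle to be rejected.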
The third layer in video analytics is indexing, storage and retrieval of metadata.
Artificial intelligence is leveraged to cost-efficiently index and store relevant information from the video streams.
Deep insights can be extracted by exploring patterns over long-term observations and joint analysis of multiple points of view. This can be employed for the following purposes:
Indexing & Recording
To efficiently manage the data and reduce communication and storage costs, artificial intelligence is used to differentiate between meaningful events and background footage. This enables the system to record and store only significant events as defined by the user
Summarization
Used to create an edited and concise version of the camera input, which only includes the significant events and insights, while cropping out “dead footage”
Data Extraction
Data extraction is the act of identifying patterns in stored data, based on the metadata extracted from the video stream. The better and more elaborate this metadata is, the better the results and insights will be. The quality of the metadata relies on the quality of the analytics, and this is where the advanced algorithms of deep learning come into play
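The event-based recording and summarization ideas above can be sketched as follows. The example assumes the analytics layer reports event timestamps in seconds from the start of the stream; the `pre_roll` and `post_roll` padding values and both function names are assumptions made for illustration.

```python
def recording_segments(event_times, pre_roll=5.0, post_roll=10.0):
    """Turn event timestamps (seconds) into merged (start, end) segments worth keeping."""
    segments = []
    for t in sorted(event_times):
        start, end = max(0.0, t - pre_roll), t + post_roll
        if segments and start <= segments[-1][1]:
            # Overlaps the previous segment, so extend it instead of starting a new one.
            segments[-1][1] = max(segments[-1][1], end)
        else:
            segments.append([start, end])
    return [tuple(s) for s in segments]


def stored_fraction(segments, total_duration):
    """Fraction of the full footage that the kept segments represent."""
    kept = sum(
        min(end, total_duration) - start
        for start, end in segments
        if start < total_duration
    )
    return kept / total_duration
```

Concatenating the returned segments gives a simple summary of the stream, and `stored_fraction` gives a rough idea of the storage saved compared with continuous recording.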