Smart Retail on the Edge: Why Powerful Intelligent Vision is the Future of Brick & Mortar Stores

Hailo Team

July 1st, 2021

Brick-and-mortar retail has been facing tough competition from eCommerce, and COVID-19 has acted as a tailwind for the ongoing shift from physical stores to online shopping. Smart Retail solutions and technologies are helping physical stores compete against online ones by leveling the playing field. Digitalizing stores and deploying AI for automation and advanced analytics creates a physical shopping experience that is on par with, if not better than, the entirely digital one, with benefits for both the customer and the retailer.

The customer enjoys an improved in-store experience – a walk through a streamlined, easy-to-navigate store, a no-queue “Just Walk Out” automated checkout, and personalized offers and functions. It is very similar to shopping online, but with the added benefit of being able to see and touch the products. The experience may also be a multi-channel one, in which customers can choose between shopping in-store and ordering a delivery or in-store pickup, and can connect to online resources while in-store for navigation, additional information, customization, digital try-on and more.

For the retailer, intelligent automation and in-store analytics translate into operational optimization and cost savings. AI is used to increase customer traffic, improve loss prevention measures, and optimize store layout, shelves and displays, inventory and personnel work. It provides the data and insight that are used, among other things, to increase footfall, better target promotions and nurture long-term relationships with customers.

In Smart Retail, Intelligent Vision Is King

While there are many types of sensors used in Smart Retail and IoT solutions (e.g., RF, motion, pressure and temperature), vision is the leading basis for robust, reliable and accurate data acquisition in complex systems and environments. For most analytics and automation applications in physical and multi-channel retail, you will need intelligent vision.

RF tags and ESLs (electronic shelf labels) require much more granular handling, since each product or shelf position needs its own tag or label, while a single camera can cover dozens of products without any intervention. They are also not standalone solutions for an automated checkout. An RF tag on a product may be enough for the system to identify the product and register its removal from the shelf and its exit through the door (in a customer’s hands). However, a system based entirely on RF tags will most likely be limited in its ability to prevent shoplifting (RF tags can be taken off a product) and to manage the sale of restricted or controlled products (like tobacco, alcohol and pharmaceuticals). Most importantly, it will provide only a fraction of the data a vision sensor would. Customer demographics, the shopper’s journey to the purchased product, their behavior and reactions, shelf and display status and effectiveness – all of these will require input from elsewhere.

Beyond sales and analytics, automation of physical tasks introduces intelligent machines that rely on high-quality, low-latency vision and AI processing to navigate and function. As in manufacturing and automotive, so it is in retail. Any robotic solution in the warehouse, or any automated arm or conveyor belt at a supermarket checkout, needs to see what it is moving and where it is moving it. Autonomous machines include plenty of sensors, but they rely on cameras.

This is not to say that vision and vision alone is always enough. Other sensors are useful to enrich, augment and cross-reference the data received from video, especially in environments where vision has significant limitations.

Intelligent Vision at the Edge of Retail

The clear business benefits of AI-based video analytics have made retailers early adopters of AI in general. Developments in edge AI technology have been making vision steadily smarter. We went from running AI-based video processing on large servers in the cloud, with high latency and at significant expense, to deploying cost-effective devices that analyze video and extract rich insights on the spot and in real time. A cloud deployment of analytics requires storage, analysis and bandwidth, which could add high costs if one were to deliver raw video content to the cloud. Also important is a retailer’s ability to protect customer data. Keeping things local and under the retailer’s control and privacy policy (as opposed to transmitting, storing and analyzing the data with a third party on a public cloud) has its advantages for managing data privacy regulations and concerns.

As a result, the retail field is experiencing a renaissance of edge-based intelligent devices – a convergence of the decrease in price and increase in availability of edge AI processing and software with retailers’ constant pursuit of growth and operational optimization.
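To put the bandwidth and storage point above in rough numbers, here is a minimal back-of-the-envelope sketch in Python. The camera count, opening hours and the 4 Mbps compressed bitrate are illustrative assumptions, not figures from a real deployment; only the raw-video arithmetic (pixels × frame rate × bits per pixel) is fixed.

```python
def stream_mbps(width, height, fps, bits_per_pixel):
    """Bitrate of an uncompressed video stream in megabits per second."""
    return width * height * fps * bits_per_pixel / 1e6


def monthly_terabytes(mbps, cameras, hours_per_day, days=30):
    """Data volume in terabytes produced by `cameras` streams of `mbps` each."""
    seconds = hours_per_day * 3600 * days
    bytes_per_second = mbps * 1e6 / 8
    return bytes_per_second * seconds * cameras / 1e12


if __name__ == "__main__":
    # Uncompressed 1080p at 30 fps, 12 bits per pixel (YUV 4:2:0 layout).
    raw = stream_mbps(1920, 1080, 30, 12)
    print(f"raw 1080p30 stream: {raw:.1f} Mbps")

    # Assumed values: 20 cameras, 12 opening hours, 4 Mbps per compressed stream.
    compressed = monthly_terabytes(4.0, cameras=20, hours_per_day=12)
    print(f"compressed video shipped to the cloud: {compressed:.2f} TB per month")
```

Even with compression, shipping every stream off-site adds up quickly under these assumptions, whereas an edge device that analyzes video locally only needs to send compact results such as counts, events or alerts.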

Largely vision-based Smart Store solutions have been increasingly appearing in various market verticals in recent years, with grocery leading the way. The most robust and exciting ones are seamless checkout solutions:

Just-walk-out stores and kiosks – also referred to as pick-and-pay or pick-and-go. This is usually a relatively small convenience store that allows the customer to use her mobile phone to enter the store, pick out the products she wants and, as the name suggests, just walk out with them, to be billed automatically. These systems are camera-based, sometimes in combination with other sensors. Cameras monitor shoppers and products, and the systems appear to be shoplifting-proof, as many who have tried to trick them on various occasions can attest. Amazon Go is the best known, but many other grocery and convenience chains have been partnering with AI companies to create their own.
Smart shopping carts – The cart monitors the products inserted into it (or taken out) and allows on-cart payment (whether through the mobile app or an onboard POS device). The central sensor is an intelligent camera that identifies the products as they are placed in the cart. Some carts add pressure sensors at the bottom for cross-referencing weight data with the visual input. The customer interacts with the shopping cart through a mobile app or an on-cart touchscreen.
Intelligent checkout – a counter, kiosk or gate at the store exit that visually scans the items in a shopper’s cart or basket, or items placed on a conveyor belt. The smart camera detects and recognizes the products, and a POS device or mobile app bills the purchase.
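The cross-referencing of weight data with visual input mentioned for smart carts above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the product labels, the catalog weights, the confidence threshold and the 10% weight tolerance are assumptions chosen for illustration, not values taken from any actual cart system.

```python
from dataclasses import dataclass


@dataclass
class CartEvent:
    label: str             # product label reported by the vision model
    confidence: float      # detection confidence in the range 0..1
    weight_delta_g: float  # weight change measured by the cart's pressure sensor


# Hypothetical catalog of expected product weights in grams.
CATALOG_WEIGHT_G = {
    "cereal_500g": 560.0,
    "soda_can_330ml": 350.0,
}


def verify(event, min_confidence=0.8, tolerance=0.10):
    """Accept a detection only if the measured weight agrees with the catalog."""
    expected = CATALOG_WEIGHT_G.get(event.label)
    if expected is None:
        return "unknown_product"
    if event.confidence < min_confidence:
        return "low_confidence"
    if abs(event.weight_delta_g - expected) > tolerance * expected:
        return "weight_mismatch"
    return "accepted"


if __name__ == "__main__":
    print(verify(CartEvent("soda_can_330ml", 0.95, 345.0)))
    print(verify(CartEvent("soda_can_330ml", 0.95, 560.0)))
```

In a sketch like this, a mismatch would not necessarily mean theft; it could simply prompt the cart to ask the shopper to rescan or confirm the item.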

Besides these Smart Store and Smart Checkout solutions, vision is central in smart in-store advertising (for instance in signage and displays that react to the demographic data of the shopper standing in front of them) and AR/VR (Augmented or Virtual Reality) applications such as Smart Mirrors and virtual try-on.

The High-Performance, High-Efficiency Edge of Retail

AI in retail requires a lot of compute because of the amount of video that needs to be processed and the high complexity of the analytics. In this regard, the pivotal advance in edge AI technology is the appearance of highly power-efficient AI processors. These provide high throughput in a small form factor and at low power consumption, so they can fit into an edge device (read more about why power efficiency is important at the edge in this white paper).

In practical terms, high AI processing throughput and high power efficiency in an edge device for Smart Retail and intelligent analytics applications make it possible to:

Connect multiple cameras to a single computing device (like an NVR (Network Video Recorder) or Edge AI Box) and perform AI processing for automation and analytics tasks on all video streams simultaneously and in real time. The number of cameras that can be supported depends on the camera resolution and frame rate and on the neural network models used. All else being equal, the more powerful the AI processor inside a dedicated aggregation point is, the more cameras it can take on. The fact that one device can process multiple cameras matters both for system-level costs and for the quality of coverage on the trade floor.
Extract more data from a single camera or video stream:
- High throughput enables processing high-resolution video in real time, which enables detection and recognition of the smallest objects in a large, busy frame. This capability is essential to the accurate detection of a small product on the shelf or in a customer’s hands.
- It also supports several tasks or types of analytics that run simultaneously. This is another important capability for something like a cashierless store. At minimum, you would want one camera, whether overhead or on-shelf, that both tracks the customer’s movements and detects, identifies and tracks the product. If this is an overhead camera, you might also want to count the customers in its FOV and alert about spills in the aisle.
- Run complicated tasks that require multiple neural networks – to take one example, “tracking customer behavior” is a complex task that involves several neural network models. Alongside detecting a person, the system needs to detect and recognize direction and speed of movement, body posture and position, which way the customer is facing, and possibly their facial expression or gaze (for context). We might also require context to classify the behavior or draw valuable insights, for instance where the person is in the store and which items or other people are in their vicinity.
Fuse data from multiple sensors – as we have established, many Smart Retail solutions combine different kinds of sensors. In such systems, the video processing is part of a larger data pipeline, so the output from every processed frame of video might need to be cross-referenced or combined with RF, pressure or location data, for example. As with multiple video streams, a powerful and efficient AI processor can process multiple data streams simultaneously and in real time.
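The first point above, that the number of cameras a single device can serve depends on frame rate and on the neural network models used, can be sketched as a simple capacity estimate in Python. This is a rough planning model under stated assumptions, not a description of how any particular processor schedules work: it assumes the processor's time is shared linearly between streams and models, that each model's throughput (the numbers below are hypothetical) was measured at the camera's resolution, and that 20% headroom is kept in reserve.

```python
import math


def max_cameras(camera_fps, model_throughput_fps, utilization=0.8):
    """Estimate how many cameras one AI processor can serve.

    camera_fps: frames per second delivered by each camera.
    model_throughput_fps: model name -> frames per second the processor
        sustains when running that model alone (hypothetical measurements).
    utilization: fraction of the processor we allow ourselves to use.
    """
    # Fraction of the processor one camera consumes when every frame
    # passes through every model.
    per_camera_load = sum(camera_fps / fps for fps in model_throughput_fps.values())
    return math.floor(utilization / per_camera_load)


if __name__ == "__main__":
    # Hypothetical throughput figures for two models on one device.
    models = {"person_detector": 600.0, "product_detector": 300.0}
    print(max_cameras(camera_fps=15, model_throughput_fps=models))
    print(max_cameras(camera_fps=30, model_throughput_fps=models))
```

Under this model, doubling the frame rate or adding another network per frame reduces the number of cameras the device can take on, which is the trade-off the text describes.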

For retailers, these powerful capabilities translate into better business outcomes and faster ROI. They no longer have to compromise on their in-store video analytics requirements – they can both perform better loss prevention and enjoy robust customer analytics. The latter facilitates more accurate targeting, better customer experience and a higher level of operational optimization. Retailers are able to spend less on the same processing power. They can automate their checkout or warehouse without installing large, power-hungry servers at every location, while also saving on video storage and bandwidth costs (video is processed locally and in real time) and mitigating data security and privacy concerns (no video recording, transmission, or use of public cloud services).