Quantization To The Rescue: An Edge AI Story
Over the last decade, deep neural networks have driven a resurgence in artificial intelligence, with machines outperforming humans on some of the most popular image recognition benchmarks. But all that jazz comes at a cost: high computational complexity and large memory requirements. These requirements translate to higher power consumption, resulting in steep electricity bills and a sizeable carbon footprint. Optimizing model size and complexity thus becomes a necessity for a sustainable future for AI.
Memory and compute optimizations also bring the promise of remarkable possibilities with edge AI: self-driving cars, predictive maintenance, smart speakers, and body monitoring are only the beginning. The smartphone market, with its reach of nearly 4 billion people, is only a fraction of the potential edge devices waiting to become truly 'smart'. Think smart hospitals, industrial automation in mining and oil and gas, and so much more.
In this session we will talk about:
- Challenges in deep neural network (DNN) deployment on embedded systems with resource constraints
- Quantization, a technique long used in mathematics and digital signal processing to map values from a large, often continuous set to a smaller, countable set, now reimagined as a solution for compressing DNNs and accelerating inference.
It is gaining popularity not only in machine learning frameworks like MATLAB, TensorFlow, and PyTorch but also in hardware toolchains like NVIDIA® TensorRT and Xilinx® DNNDK. The core idea behind quantization is the resilience of neural networks to noise. Deep neural networks in particular are trained to pick up key patterns and ignore noise, so they can tolerate the small perturbations introduced by quantization error; research indicates that quantization has minimal impact on overall network accuracy. This, coupled with a significant reduction in memory footprint and power consumption and gains in computational speed, makes quantization an efficient approach for deploying neural networks to embedded hardware.
- Example of a quantization solution for an object detection problem
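To make the mapping concrete, here is a minimal sketch of affine (scale/zero-point) int8 quantization in Python. The function names and the min/max calibration choice are illustrative assumptions for this sketch, not the API of any framework or toolchain named above.

```python
import numpy as np

def quantize_int8(x):
    """Affine-quantize a float32 array to int8; return (q, scale, zero_point)."""
    x_min, x_max = float(x.min()), float(x.max())
    # Map the observed range [x_min, x_max] onto the 256 int8 levels [-128, 127].
    scale = (x_max - x_min) / 255.0
    zero_point = int(round(-128 - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

# Stand-in for a layer's weights (illustrative data, not a real model).
weights = np.random.randn(1000).astype(np.float32)
q, scale, zp = quantize_int8(weights)

# 4x memory reduction: float32 (4 bytes/value) -> int8 (1 byte/value).
print(weights.nbytes, "->", q.nbytes)

# The round-trip error stays on the order of one quantization step (scale).
err = np.abs(weights - dequantize(q, scale, zp)).max()
print("max abs error:", err)
```

This small, bounded error is exactly the "noise" the networks shrug off, while the 4x smaller representation also enables faster integer arithmetic on embedded hardware.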
Outline/Structure of the Talk
- Ready or not, here I come! Edge AI and its growing popularity across varied applications (3 mins)
- But wait – it's not that easy... Challenges in DNN deployment on embedded systems (5 mins)
- Quantization to the rescue! What is quantization and why does it work? (7 mins)
- What's the catch? Trade-offs of memory reduction and inference speedup vs. accuracy drop (2 mins)
- Let the magic begin! Example of a quantization workflow for an object detector with MATLAB's Model Quantization Library (3 mins)
Learning Outcome
Attendees will gain insight into using quantization to make their deep learning models more edge-friendly. Those new to the topic will learn about the challenges of deploying deep neural networks on embedded hardware and possible solutions.
Target Audience
Machine Learning Engineers, Deep Learning Engineers, Embedded Systems Engineers, ADAS Engineers
Prerequisites for Attendees
None