Over the last decade, deep neural networks have ushered in a resurgence in artificial intelligence, with machines outperforming humans on some of the most popular image recognition benchmarks. But all that jazz comes at a cost: high computational complexity and large memory requirements. These requirements translate to higher power consumption, resulting in steep electricity bills and a sizeable carbon footprint. Optimizing model size and complexity is thus a necessity for a sustainable future for AI.
Memory and compute optimizations also promise remarkable possibilities for edge AI: self-driving cars, predictive maintenance, smart speakers, and body monitoring are only the beginning. The smartphone market, reaching nearly 4 billion people, is only a fraction of the potential edge devices waiting to become truly ‘smart’. Think smart hospitals, industrial automation in mining and oil and gas, and much more.

In this session we will talk about:

  • Challenges in deep neural network (DNN) deployment on embedded systems with resource constraints
  • Quantization, which has long been used in mathematics and digital signal processing to map values from a large, often continuous set to values in a smaller countable set, now reimagined as a solution for compressing DNNs and accelerating inference.
    It is gaining popularity not only in machine learning frameworks like MATLAB, TensorFlow, and PyTorch but also in hardware toolchains like NVIDIA® TensorRT and Xilinx® DNNDK. The core idea behind quantization is the resiliency of neural networks to noise. Deep neural networks are trained to pick up key patterns and ignore noise, so they can cope with the small perturbations introduced by quantization error; research indicates that quantization has minimal impact on the overall accuracy of the network. This, coupled with a significant reduction in memory footprint and power consumption and gains in computational speed, makes quantization an efficient approach for deploying neural networks to embedded hardware (see the sketch after this list).
  • Example of a quantization solution for an object detection problem
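
To make the mapping concrete, here is a minimal sketch of uniform affine quantization to int8 in MATLAB. It is illustrative only: the simple min/max range calibration and the variable names are assumptions for this example, not MATLAB’s Model Quantization Library API.

    % Minimal sketch: uniform affine quantization of a weight vector to int8.
    % Range calibration via plain min/max is an illustrative assumption.
    w = randn(1, 8);                            % example floating-point weights
    qmin = -128; qmax = 127;                    % representable int8 range
    scale = (max(w) - min(w)) / (qmax - qmin);  % step size of the integer grid
    zeroPoint = round(qmin - min(w) / scale);   % integer that real 0 maps to
    q = int8(round(w / scale) + zeroPoint);     % quantize (int8 saturates)
    wHat = (double(q) - zeroPoint) * scale;     % dequantize for comparison
    quantError = max(abs(w - wHat))             % at most about scale/2 per value

The reconstruction error is bounded by roughly half a quantization step per value, which is exactly the small, noise-like perturbation that trained networks tolerate.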
 
 

Outline/Structure of the Talk

  • Ready or not, here I come! (3 mins): Edge AI and its growing popularity across varied applications
  • But wait – It's not that easy... (5 mins): Challenges in DNN deployment on embedded systems
  • Quantization to the rescue! (7 mins): What quantization is and why it works
  • What’s the catch? (2 mins): Trade-offs between memory reduction and inference speedup versus accuracy drop
  • Let the magic begin (3 mins): Example of a quantization workflow for an object detector with MATLAB’s Model Quantization Library (sketched below)
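
As a preview of that final segment, here is a hedged sketch of such a workflow using dlquantizer from the Deep Learning Toolbox Model Quantization Library support package. The detector file and datastore names are hypothetical placeholders.

    % Sketch of an int8 quantization workflow with dlquantizer; assumes the
    % Deep Learning Toolbox Model Quantization Library support package.
    % 'yoloDetector.mat', calibrationData, and validationData are placeholders.
    load('yoloDetector.mat', 'net');                   % pretrained detector
    quantObj = dlquantizer(net, 'ExecutionEnvironment', 'GPU');
    calResults = calibrate(quantObj, calibrationData); % record dynamic ranges
    valResults = validate(quantObj, validationData);   % measure accuracy impact

Calibration runs representative data through the network to record the dynamic ranges of weights and activations; validation then compares the quantized network against the original on held-out data.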

Learning Outcome

Attendees will gain insight into using quantization to make their deep learning models more edge friendly. Those new to the topic will learn about the challenges of deploying deep neural networks on embedded systems and about possible solutions.

Target Audience

Machine Learning Engineers, Deep Learning Engineers, Embedded Systems Engineers, ADAS Engineers

Prerequisites for Attendees

NA
