Efficient and Optimal Deep Learning Inference for Computer Vision Applications

Author(s): Venkatesh Wadawadagi

Abstract

A cognitive solution becomes meaningful when it is put to use or can actually solve business problems in real time through inference. Deep learning model inference is as important as model training, and it becomes even more critical when cognitive solutions are deployed on the edge, because it determines both the speed and the accuracy of the implemented solution. For a given computer vision application, once the deep learning model is trained, the next step is to make it production ready, which requires both the application and the model to be efficient and reliable. It is essential to maintain a healthy balance between model accuracy and inference time. Inference time determines the running cost of cloud-hosted solutions, while cost-optimal edge solutions come with processing speed and memory constraints, so deep learning models need to be memory efficient and fast enough to run in real time. With the rising use of augmented reality, facial recognition, facial authentication, and voice assistants, all of which require real-time processing, developers are looking for newer and more effective ways to reduce the size, memory footprint, and amount of compute that neural networks require.
