A cognitive solution becomes meaningful only when it is put to use, that is, when it can actually solve business problems in real time through inference. Deep learning model inference is as important as model training. When cognitive solutions are deployed on the edge, inference becomes even more critical, because it also governs the performance and accuracy of the deployed solution. For a given computer vision application, once the deep learning model is trained, the next step is to make it deployment-ready (production-ready), which requires both the application and the model to be efficient and reliable. It is essential to maintain a healthy balance between model accuracy and inference time. For "on the cloud" solutions, inference time determines the running cost. Cost-effective "on the edge" solutions come with limited processing speed and memory. Either way, deep learning models need to be memory-efficient and fast enough for real-time processing. With the rising use of augmented reality, facial recognition, facial authentication and voice assistants, all of which require real-time processing, developers are looking for newer and more effective ways to reduce the memory footprint and the amount of compute that neural networks require.