Predicting missing product taxonomy in retail: An embedded approach using N-gram Mixture Models and Newton’s Method

Author(s): Neeraj Mishra, Sanjay Shukla, Anthony Kilili

Abstract

In retail, a taxonomy is a hierarchical, logical arrangement of products that lets customers easily navigate and find what they need in a store or on a website. Taxonomists, information scientists, and linguistics experts collaborate to build an effective taxonomy, which requires considerable time and effort. It is not always feasible for companies to commit these resources to every product, especially newly launched ones. In this research, we develop a novel machine learning algorithm that predicts a product’s taxonomy using an N-gram Mixture Model, a cross-entropy loss function, and Newton’s optimization method. A modified Naïve Bayes model and n-gram models of up to 4-grams are combined with general heuristics inspired by Jaccard similarity. A one-vs-all classifier is trained to learn the weights that combine the scores of the different n-gram models and heuristics, using the cross-entropy loss function and Newton’s optimization method. The model was developed and tested on online retail data, where it predicts the correct product taxonomy in 84% of cases.
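The combination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the array layout `scores[i, k, :]` (one score vector per product and candidate class), the logistic link, the small L2 penalty `ridge`, and the step-halving safeguard are all assumptions made for illustration. The modified Naïve Bayes and n-gram models themselves are not reproduced; their per-class scores are assumed to be given as inputs, alongside a Jaccard-style overlap score.

```python
import numpy as np

def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cross_entropy(w, X, y, ridge):
    """Mean binary cross-entropy of a logistic model plus an L2 penalty (assumed)."""
    z = X @ w
    # log(1 + exp(z)) - y*z equals -[y log p + (1 - y) log(1 - p)] with p = sigmoid(z).
    return np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * ridge * (w @ w)

def newton_logistic(X, y, n_iter=25, ridge=1e-3):
    """Fit binary logistic-regression weights with Newton's method."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (p - y) / len(y) + ridge * w
        hess = (X.T * (p * (1.0 - p))) @ X / len(y) + ridge * np.eye(len(w))
        step = np.linalg.solve(hess, grad)  # Newton direction: H^{-1} g
        # Step halving (an added safeguard, not from the paper): shrink the
        # step while it would increase the loss.
        t, loss = 1.0, cross_entropy(w, X, y, ridge)
        while cross_entropy(w - t * step, X, y, ridge) > loss and t > 1e-8:
            t *= 0.5
        w = w - t * step
    return w

def add_bias(scores):
    """Prepend a constant 1 to every (product, class) score vector."""
    n, n_classes, _ = scores.shape
    return np.concatenate([np.ones((n, n_classes, 1)), scores], axis=2)

def fit_one_vs_all(scores, labels):
    """Train one weight vector per class: class k versus all other classes."""
    X = add_bias(scores)
    return np.stack([
        newton_logistic(X[:, k, :], (labels == k).astype(float))
        for k in range(scores.shape[1])
    ])

def predict(scores, W):
    """Pick, for each product, the class whose weighted score is highest."""
    logits = np.einsum("nkf,kf->nk", add_bias(scores), W)
    return logits.argmax(axis=1)
```

As a usage illustration with made-up text, `jaccard(ngrams("organic whole milk 1 gallon".split(), 1), ngrams("whole milk".split(), 1))` compares the unigram sets of a product title and a category name; such a value could fill one slot of `scores[i, k, :]`, with the remaining slots holding the n-gram model scores for the same product and class. Training one weight vector per class keeps each Newton solve small, since the Hessian has only as many rows as there are combined scores plus a bias.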
