Predicting missing product taxonomy in retail: An embedded approach using N-gram Mixture Models and Newton’s Method

Author(s): Neeraj Mishra, Sanjay Shukla, Anthony Kilili


In retail, a taxonomy is a hierarchical, logical arrangement of products that lets customers easily navigate and find what they need in a store or on a website. Taxonomists, information scientists, and linguists collaborate to build an effective taxonomy, which requires considerable time and effort. It is not always feasible for companies to commit these resources to every product, especially newly launched ones. In this research, we develop a novel machine learning algorithm that predicts a product’s taxonomy using an N-gram Mixture Model, the cross-entropy loss function, and Newton’s optimization method. A modified Naïve Bayes model and n-gram models of up to 4-grams are combined with general heuristics inspired by Jaccard similarity. A one-vs-all classifier is trained to learn the weights that combine the scores of the different n-gram models and heuristics, using the cross-entropy loss function and Newton’s optimization method. The model was developed and tested on online retail data, where it predicts the correct product taxonomy in 84% of cases.
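
To make the weight-fitting step concrete, the following is a minimal sketch of how one binary (one-vs-all) classifier could learn combination weights over per-model scores by minimizing cross-entropy with Newton's method. It is not the authors' implementation. The feature layout (a bias term followed by unigram, bigram, and Jaccard-style scores), the toy data, the small ridge term, and the backtracking step are all assumptions added for illustration and numerical stability.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def solve(a, b):
    # Solve a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def cross_entropy(w, X, y, ridge):
    # Mean binary cross-entropy plus an (assumed) ridge penalty.
    loss = 0.0
    for xi, yi in zip(X, y):
        p = min(max(sigmoid(dot(w, xi)), 1e-12), 1.0 - 1e-12)
        loss -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return loss / len(y) + 0.5 * ridge * dot(w, w)

def fit_newton(X, y, ridge=1e-2, steps=25):
    n, d = len(y), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # Gradient and Hessian of the penalized cross-entropy at w.
        grad = [ridge * wj for wj in w]
        hess = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        for xi, yi in zip(X, y):
            p = sigmoid(dot(w, xi))
            for i in range(d):
                grad[i] += (p - yi) * xi[i] / n
                for j in range(d):
                    hess[i][j] += p * (1.0 - p) * xi[i] * xi[j] / n
        direction = solve(hess, grad)  # Newton direction: H^{-1} g
        base, t = cross_entropy(w, X, y, ridge), 1.0
        while t > 1e-8:  # backtrack until the loss does not increase
            trial = [wj - t * dj for wj, dj in zip(w, direction)]
            if cross_entropy(trial, X, y, ridge) <= base:
                w = trial
                break
            t *= 0.5
        else:
            break  # no improving step found; treat as converged
    return w

# Hypothetical toy data: [bias, unigram score, bigram score, Jaccard score].
X = [[1, .9, .8, .7], [1, .8, .6, .9], [1, .7, .9, .5], [1, .5, .5, .5],
     [1, .2, .1, .3], [1, .3, .2, .1], [1, .1, .3, .2], [1, .5, .5, .5]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = product belongs to the target category

w = fit_newton(X, y)
initial_loss = cross_entropy([0.0] * 4, X, y, 1e-2)
final_loss = cross_entropy(w, X, y, 1e-2)
print("weights:", w)
print("loss before and after:", initial_loss, final_loss)
```

In a one-vs-all setting, one such weight vector would be fitted per taxonomy category, and a product would be assigned to the category whose classifier gives the highest probability.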