
Predicting missing product taxonomy in retail: An embedded approach using N-gram Mixture Models and Newton’s Method

Author(s): Neeraj Mishra, Sanjay Shukla, Anthony Kilili


In retail, a taxonomy is a hierarchical, logical arrangement of products that lets customers easily navigate and find what they need in a store or on a website. Taxonomists, information scientists, and linguistics experts collaborate to build an effective taxonomy, which requires considerable time and effort. It is not always feasible for companies to commit these resources to every product, especially newly launched ones. In this research, we develop a novel machine learning algorithm that predicts a product’s taxonomy by leveraging an N-gram Mixture Model, the cross-entropy loss function, and Newton’s optimization method. A modified Naïve Bayes model and n-gram models of up to 4-grams are combined with general heuristics inspired by Jaccard Similarity. A one-vs-all classifier is trained to learn the weights that combine the different n-gram model scores and heuristic scores, using the cross-entropy loss function and Newton’s optimization method. The model was developed and tested on online retail data, where it predicts the correct product taxonomy in 84% of cases.
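The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea of learning combination weights with cross-entropy loss and Newton's method, for a single one-vs-all category. Everything specific here is an assumption made for illustration: the two input scores per product (a hypothetical n-gram score and a hypothetical Jaccard-style score), the toy data, the function names, and the small ridge term added to keep the Hessian invertible. It is not the authors' implementation.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination.

    No pivoting: the matrix passed in below is symmetric positive definite.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def fit_newton(rows, labels, steps=20, ridge=1e-2):
    """Fit [bias, w_1, ..., w_k] by Newton's method on cross-entropy loss.

    rows: per-product score tuples (hypothetical features).
    labels: 1 if the product belongs to the target category, else 0.
    ridge: assumed L2 penalty (applied to all weights, bias included).
    """
    feats = [[1.0] + list(r) for r in rows]  # prepend a bias feature
    d = len(feats[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [ridge * wi for wi in w]
        hess = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        for x, y in zip(feats, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(d):
                grad[i] += (p - y) * x[i]
                for j in range(d):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        # Newton update: w <- w - H^{-1} g
        step = solve(hess, grad)
        w = [wi - si for wi, si in zip(w, step)]
    return w

def predict(w, row):
    # Probability that a product with these scores belongs to the category.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))

# Invented toy data: (n-gram score, Jaccard-style score) per product.
rows = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9), (0.3, 0.4),
        (0.2, 0.1), (0.4, 0.2), (0.1, 0.3), (0.6, 0.5)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights = fit_newton(rows, labels)
```

In a one-vs-all setup, one such weight vector would be fitted per taxonomy category, and a product would be assigned to the category whose classifier returns the highest probability.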


Association of Data Scientists
