
Weighted clustering on fast sentence embeddings to determine themes from large unstructured data

Author(s): Paritosh Sinha, Mohan Krishna Askani


Most engineering product improvements are driven by feedback from users and engineers. B2C products, such as those used to target customers, send personalised communications, or manage order requests, track event-level actions and failures to improve product performance. However, the volume of failure logs (often on the order of a billion) and their unstructured nature (machine logs that are not written for human readers) often hinder the detection of underlying themes in event failures. This paper discusses an efficient approach to tuning and leveraging a language model for embedding generation. The embeddings are then grouped with a weighted clustering technique into automatically detectable failure themes. The paper also proposes methods for managing embeddings that improve the algorithm’s performance while retaining its focus on efficiency and computation time. Our experiments show that the proposed technique performs comparably to the latest language models while taking less than one-tenth of the overall computation time.
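The abstract does not spell out the weighting scheme, so the following is only a minimal sketch of one plausible reading of weighted clustering on embeddings: identical failure messages are collapsed into a single point, and each point is weighted by how often it occurred, so that a large log reduces to far fewer points to cluster. The log messages, the 2-D `toy_embeddings` (standing in for real sentence embeddings), the function name `weighted_kmeans`, and the choice of k-means itself are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
import math


def weighted_kmeans(points, weights, centroids, iterations=10):
    """Lloyd-style k-means in which each point contributes to its
    centroid in proportion to its weight (here: occurrence count)."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: weighted mean of the points assigned to each centroid.
        for k in range(len(centroids)):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == k]
            total = sum(w for _, w in members)
            if total == 0:
                continue  # leave the centroid of an empty cluster unchanged
            dim = len(centroids[k])
            centroids[k] = [
                sum(p[d] * w for p, w in members) / total for d in range(dim)
            ]
    return labels, centroids


# Hypothetical raw failure logs; exact duplicates are common in event logging.
raw_logs = (
    ["timeout contacting gateway"] * 5
    + ["gateway timed out"] * 1
    + ["invalid email address"] * 3
    + ["email address malformed"] * 1
)

# Hypothetical 2-D vectors standing in for sentence embeddings (assumption).
toy_embeddings = {
    "timeout contacting gateway": (0.0, 0.0),
    "gateway timed out": (1.0, 0.0),
    "invalid email address": (10.0, 10.0),
    "email address malformed": (10.0, 12.0),
}

# Collapse duplicates: one point per unique message, weighted by its count.
counts = Counter(raw_logs)
messages = list(counts)
points = [toy_embeddings[m] for m in messages]
weights = [counts[m] for m in messages]

# Seed one centroid per expected theme (a simplification for this toy data).
labels, centroids = weighted_kmeans(points, weights, centroids=[points[0], points[2]])

for message, label in zip(messages, labels):
    print(label, counts[message], message)
```

In this toy setup the ten raw log lines reduce to four weighted points, and the frequent messages pull each theme's centroid toward themselves. A real pipeline would replace `toy_embeddings` with vectors from a sentence-embedding model and would likely need a more careful choice of the number of clusters and their initialisation.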


Association of Data Scientists
