Time Expression Extraction and Normalization in Industrial Setting

Author(s): Piyush Arora, Bharath Venkatesh, Salil Rajeev Joshi, Rahul Ghosh

Abstract

We present TEEN, an industry-grade solution to the problem of time expression extraction and normalization (Timex). Extracting and normalizing temporal units is challenging for several reasons, e.g., (i) the same time unit may be expressed in different ways, (ii) the inherent ambiguity of natural language leads to multiple interpretations, and (iii) natural language is context-sensitive. While various academic and industrial approaches have proposed solutions to Timex, building an industry-strength solution involves additional challenges in the form of user expectations, the need to deliver high precision, and the lack of training corpora. We elaborate on how TEEN mitigates these challenges. We demonstrate how the proposed approach compares with various state-of-the-art baselines on textual data from the finance industry, and we categorize the inadequacies of these baselines in an industrial setting. Finally, we share the observations we made and the lessons we learned while designing TEEN to work in an industrial setting.
