Abstract
We present TEEN, an industry-grade solution to the problem of time expression extraction and normalization (Timex). Extracting and normalizing temporal units is challenging for several reasons, including: (i) the same time unit may be expressed in different ways, (ii) the inherent ambiguity of natural language leads to multiple interpretations, and (iii) natural language is context-sensitive. While various academic and industrial approaches have proposed solutions to Timex, building an industry-strength solution involves additional challenges in the form of user expectations, the need for high precision, and a lack of training corpora. We describe how TEEN mitigates these challenges. We compare the proposed approach with various state-of-the-art baselines on textual data from the finance industry, and we categorize the inadequacies of these baselines in an industrial setting. Finally, we share the observations we made and the lessons we learned while designing TEEN to work in an industrial setting.