PII Detection in Emails through QLoRA Fine-tuned LLMs: A comparative analysis with BERT and GPT3.5

Author(s): Chinmay Prakash, Rishit Tyagi, Prakash Selvakumar


Personally Identifiable Information (PII) detection is critical due to the increasing exploitation of individual data, particularly in the text analytics domain. With the growing use of large language models (LLMs) in Natural Language Processing (NLP) solutions, data security concerns call for effective on-premises solutions and privacy-centric methods.

This paper explores the use of LLMs fine-tuned on limited domain-specific datasets for detecting and masking PII, and benchmarks this solution against existing NLP methods such as BERT and GPT-3.5. Our approach fine-tunes the Vicuna-7B LLM using Quantized Low-Rank Adaptation (QLoRA), which enables cost-effective fine-tuning and deployment on consumer GPUs. The proposed approach offers several advantages: improved performance and reliability compared to GPT-3.5; enhanced data security, since data stays within the company's cloud; domain adaptability through model fine-tuning; and the benefits of on-premises usage, such as reduced dependence on proprietary models and their quota limitations, and flexible scaling of the model hosting infrastructure.
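QLoRA keeps the pretrained weights frozen in 4-bit precision and trains only small low-rank adapter matrices: for a frozen weight W of shape d×k it learns B (d×r) and A (r×k) with r much smaller than d and k, so each adapted matrix adds d·r + r·k trainable parameters instead of d·k. The arithmetic below is a rough sketch of why this fits on a consumer GPU. The figures are assumptions for illustration, not the paper's reported configuration: a nominal 7e9 base parameters, hidden size 4096, 32 layers, LoRA rank 16, and adapters on the four attention projections. It counts weight storage only and ignores quantization constants, activations, and optimizer state.

```python
# Rough arithmetic for why QLoRA can fit a 7B model on a consumer GPU.
# All sizes below are illustrative assumptions, not the paper's settings.

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters of one LoRA adapter: B is d x r, A is r x k."""
    return d * r + r * k


N_BASE = 7e9            # nominal base-model size (assumed)
HIDDEN = 4096           # hidden size of each square projection (assumed)
LAYERS = 32             # number of transformer layers (assumed)
RANK = 16               # LoRA rank r (assumed)
MODULES_PER_LAYER = 4   # adapters on q, k, v, o projections (assumed)

fp16_gb = weight_memory_gb(N_BASE, 16)  # frozen weights kept in 16-bit
nf4_gb = weight_memory_gb(N_BASE, 4)    # frozen weights quantized to 4-bit
trainable = LAYERS * MODULES_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)

print(f"16-bit base weights: {fp16_gb:.1f} GB")
print(f"4-bit base weights:  {nf4_gb:.1f} GB")
print(f"trainable LoRA params: {trainable:,} ({trainable / N_BASE:.2%} of base)")
```

Under these assumptions the frozen weights shrink from about 14 GB to about 3.5 GB, and the adapters add roughly 16.8 million trainable parameters, about 0.24% of the base model.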

Overall, this paper presents an efficient and secure solution for domain-specific PII detection tasks using LLMs.
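Once a detector has identified PII in an email, the masking step can be as simple as substituting each detected value with a typed placeholder. The sketch below assumes a hypothetical output format of (entity_type, value) pairs; the abstract does not specify how the fine-tuned model formats its detections, and the function name and placeholder style are illustrative only.

```python
def mask_pii(text: str, detections: list[tuple[str, str]]) -> str:
    """Replace each detected PII value with a [TYPE] placeholder.

    `detections` is a hypothetical format: (entity_type, value) pairs as a
    detector might return them. Longer values are masked first so that a
    shorter value contained in a longer one cannot split the longer match.
    """
    for entity_type, value in sorted(detections, key=lambda d: len(d[1]), reverse=True):
        if value:  # skip empty strings, which str.replace would insert everywhere
            text = text.replace(value, f"[{entity_type}]")
    return text


email = "Hi, this is Jane Doe. Reach me at jane.doe@example.com or 555-0142."
detections = [
    ("NAME", "Jane Doe"),
    ("EMAIL", "jane.doe@example.com"),
    ("PHONE", "555-0142"),
]
print(mask_pii(email, detections))
# Hi, this is [NAME]. Reach me at [EMAIL] or [PHONE].
```

Exact string replacement keeps the masking step deterministic and auditable; the judgment about what counts as PII stays with the detector.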
