ML based high-cardinality reduction methods to create geo-score to improve auto insurance Tweedie pricing model

Author(s): Suguna Jayaraj, Harmandeep Kaur

Abstract

A typical automobile insurance rating plan contains a plethora of risk factors, ranging from driver and vehicle to policy characteristics. Including geographical risk characteristics in pricing has been challenging owing to their high cardinality. The traditional approach groups postal codes based on historical loss experience, which suffers from two major drawbacks: (a) for geographies with low exposure, the loss cost is almost always zero, and (b) confidence is low because information on the latent variables is lost. In this paper, we present a case study of a Greek automobile insurance product offered by a major US-based P&C provider, in which a geo-score was developed at the postal code level to improve risk segmentation in own-damage cover pricing. The base loss cost (loss/exposure) model was built using Tweedie compound Poisson regression, and geospatial attributes were added to the model without changing the existing rating structure. External attributes such as socio-demographic variables and highway/network data were sourced to create geographical clusters using partitioning around medoids (PAM). Further, various high-cardinality feature reduction techniques were used to predict the residual loss cost. This paper illustrates a hybrid approach that combines target-based encoding methods with XGBoost to create the geo-score.
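
To make the target-based encoding idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the residual for each record is the observed loss cost minus the base Tweedie model's prediction, and it encodes each postal code as an exposure-weighted mean residual shrunk toward the portfolio mean. The function name `smoothed_target_encoding`, the `prior_weight` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict


def smoothed_target_encoding(codes, residuals, exposures, prior_weight=50.0):
    """Encode each postal code as a shrunken, exposure-weighted mean residual.

    Assumption (not from the paper): residual = observed loss cost minus the
    base model prediction, and prior_weight is a hypothetical tuning knob
    expressed in exposure units.
    """
    total_exposure = sum(exposures)
    global_mean = sum(r * e for r, e in zip(residuals, exposures)) / total_exposure

    weighted_sum = defaultdict(float)
    exposure_sum = defaultdict(float)
    for code, r, e in zip(codes, residuals, exposures):
        weighted_sum[code] += r * e
        exposure_sum[code] += e

    encoding = {}
    for code in exposure_sum:
        # Low-exposure postal codes are pulled toward the global mean,
        # so a single large loss does not dominate their encoded value.
        encoding[code] = (weighted_sum[code] + prior_weight * global_mean) / (
            exposure_sum[code] + prior_weight
        )
    return encoding, global_mean


# Toy data: postal code "A" has ample exposure, "B" has almost none.
codes = ["A", "A", "A", "B"]
residuals = [10.0, 20.0, 30.0, 100.0]
exposures = [100.0, 100.0, 100.0, 1.0]

encoding, global_mean = smoothed_target_encoding(codes, residuals, exposures)
print(encoding, global_mean)
```

In this sketch the sparse postal code "B" ends up close to the portfolio mean rather than at its raw residual of 100, which is the behaviour that motivates smoothing for low-exposure geographies. An encoded value of this kind could then serve as one input feature to a gradient-boosted model such as XGBoost; how the paper combines the two is not specified in the abstract.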
