Baseline Sales Prediction Using Linear Regression on Retail Big Data

Author(s): Udit Shrivastava, Ray Uwe Hidaka


In the retail world, promotional offers are important because they increase consumer demand and sales. Since promotional strategies engage directly with customers, it is important to pick the right product and a discount that resonates with customers' needs. One way to do so is to evaluate past promotions and understand their impact on sales and baskets. In this paper, we discuss an approach that evaluates the sales uplift due to a promotion and helps understand its wider impact on the business. Sales uplift is a key metric defined as the difference between promotional sales (i.e., sales during the promotion period) and baseline sales (the sales that would have occurred had there been no promotion in that period). The challenge lies in calculating the baseline, because the actual sales value in the absence of the promotion cannot be observed. We therefore propose a linear regression (LR)-based approach that forecasts the baseline sales of a promoted product by taking into account factors such as seasonality and trend. Our LR-based baseline has an edge over other time-series approaches (e.g., Holt-Winters exponential smoothing), especially for seasonal products, because LR predictions are not driven solely by the most recent sales values. The baseline is used to calculate uplift, which measures promotion performance and allows retailers to adjust their promotional strategies accordingly. The methodology was tested on transaction datasets from two different retailers. Based on the results, the retailers were able to identify and avoid sales-dilutive promotions that benefited neither the retailer nor its customers.
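The abstract does not spell out the model specification, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes weekly sales, uses an intercept, a linear trend, and seasonal dummy variables as the hypothetical "factors", fits ordinary least squares on non-promotional weeks only, and treats the fitted values in promotional weeks as the baseline. Uplift is then the sum over promotional weeks of (actual sales − baseline). The function names (`design_row`, `fit_ols`, `baseline_and_uplift`) and the seasonal period are illustrative choices.

```python
from typing import Dict, List, Sequence, Tuple


def design_row(t: int, period: int) -> List[float]:
    # Hypothetical feature set: intercept, linear trend, and one-hot seasonal
    # dummies (season 0 is the reference level, so it has no dummy column).
    season = t % period
    return [1.0, float(t)] + [1.0 if season == s else 0.0 for s in range(1, period)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for the square system a x = b.
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows: List[List[float]], y: List[float]) -> List[float]:
    # Ordinary least squares via the normal equations (X^T X) beta = X^T y.
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def baseline_and_uplift(
    sales: Sequence[float], on_promo: Sequence[bool], period: int
) -> Tuple[Dict[int, float], float]:
    # Fit only on weeks without a promotion, so the model learns "normal" demand.
    train = [t for t in range(len(sales)) if not on_promo[t]]
    beta = fit_ols([design_row(t, period) for t in train], [sales[t] for t in train])
    # Baseline = model prediction for each promotional week.
    baseline = {
        t: sum(b * x for b, x in zip(beta, design_row(t, period)))
        for t in range(len(sales))
        if on_promo[t]
    }
    # Uplift = actual promotional sales minus baseline, summed over the promotion.
    uplift = sum(sales[t] - baseline[t] for t in baseline)
    return baseline, uplift


if __name__ == "__main__":
    # Synthetic example: trend + 4-week seasonality, promotion adds 30 units in weeks 12-13.
    period = 4
    effect = [0.0, 10.0, -5.0, 5.0]
    sales = [100 + 2 * t + effect[t % period] for t in range(16)]
    on_promo = [t in (12, 13) for t in range(16)]
    for t in (12, 13):
        sales[t] += 30
    baseline, uplift = baseline_and_uplift(sales, on_promo, period)
    print(round(uplift, 6))  # prints 60.0
```

Because the baseline comes from trend and seasonal terms rather than from the last few observations, a seasonal peak that coincides with a promotion is not mistaken for uplift, which is the property the abstract contrasts with Holt-Winters smoothing. A production model would likely add further regressors and use a numerical library rather than hand-rolled normal equations.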
