Abstract
Lookalike audience generation is an effective way to expand the audience base in online advertising. Segregating the lookalike audience into multiple priority levels gives advertisers greater flexibility in selecting their user reach. In this paper, we present a novel approach to generating lookalike audiences at multiple priority levels on large-scale data with millions of users and thousands of audience segments. We develop an automated system, built on Spark (Scala) and the Hadoop ecosystem, that combines custom models to generate similar audience segments and group the lookalike audience into priority levels. Experimental results comparing different approaches show that the proposed model outperforms the alternatives in reach, scalability, and speed.