The Machine Learning Developers Summit (MLDS) 2024 in Bengaluru marked a significant milestone in India’s artificial intelligence landscape. This premier generative AI conference brought together leading minds in the field, and among them was Vaibhav Pawar, Senior Director and Head of Data Science at Poshmark India. With 15 years of experience, Vaibhav is a luminary in the data science and machine learning domain. His talk, “Practical Lessons from Developing Large-Scale Systems for Search and Recommendations,” unveiled a wealth of knowledge crucial for navigating the intricate world of large-scale ML systems.
Scaling Search and Recommendations
Drawing on his extensive experience, Vaibhav Pawar shared practical insights gained from building large-scale ML systems. The talk emphasized the paramount importance of meticulously defining metrics, distinguishing between goal, guardrail, and debug metrics. Pawar highlighted the need to contextualize these metrics to ensure a comprehensive understanding of system performance. Notably, he stressed the significance of the “overall evaluation criterion”: a single metric that determines whether the system is judged successful after launch.
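To make the distinction concrete, here is a minimal sketch of how goal metrics, guardrail metrics, and an overall evaluation criterion might be encoded for a launch decision. The metric names, weights, and thresholds are purely hypothetical and are not taken from the talk.

```python
# Illustrative sketch: goal metrics feed a single OEC, guardrails gate the launch.
# All names, weights, and thresholds below are hypothetical.

def overall_evaluation_criterion(metrics: dict) -> float:
    """Combine goal metrics into one launch-decision number (hypothetical weights)."""
    return 0.7 * metrics["purchase_rate_lift"] + 0.3 * metrics["click_through_lift"]

def passes_guardrails(metrics: dict) -> bool:
    """Guardrail metrics must not regress beyond agreed thresholds."""
    return (
        metrics["p95_latency_ms"] <= 250      # latency budget (hypothetical)
        and metrics["error_rate"] <= 0.001    # reliability floor (hypothetical)
    )

experiment = {
    "purchase_rate_lift": 0.012,
    "click_through_lift": 0.03,
    "p95_latency_ms": 240,
    "error_rate": 0.0004,
}

if passes_guardrails(experiment):
    print("OEC:", overall_evaluation_criterion(experiment))
else:
    print("Guardrail violated; OEC not considered for launch.")
```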
Navigating Data Biases
Vaibhav delved into the critical issue of data biases, particularly pronounced in search and recommendation systems. Addressing biases like selection and position bias, he emphasized the challenges in creating unbiased models. Pawar noted the scarcity of battle-tested open-source implementations, making it imperative for data scientists to develop algorithms tailored to their specific needs. From exploration components in data collection to strategies in handling biases during training, Vaibhav provided a comprehensive guide to navigating the intricate landscape of data biases in ML systems.
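One common way to tackle position bias, in the spirit of the strategies Pawar described, is inverse-propensity weighting of logged clicks. The sketch below is illustrative only: the propensity values are made up, and in practice they would be estimated, for example from randomized interleaving or a click model.

```python
# Minimal sketch of inverse-propensity weighting for position bias.
# Propensities and log rows are hypothetical, not production values.

# Estimated probability that a user examines each rank position (hypothetical).
examination_propensity = {1: 0.95, 2: 0.80, 3: 0.62, 4: 0.45, 5: 0.30}

# Logged impressions: (query_id, item_id, position, clicked)
logs = [
    ("q1", "item_a", 1, 1),
    ("q1", "item_b", 3, 0),
    ("q1", "item_c", 5, 1),
]

training_examples = []
for query_id, item_id, position, clicked in logs:
    # Up-weight clicks at poorly examined positions so the model is not simply
    # taught to reproduce the existing ranking.
    weight = 1.0 / examination_propensity[position] if clicked else 1.0
    training_examples.append(
        {"query": query_id, "item": item_id, "label": clicked, "weight": weight}
    )

print(training_examples)
```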
Training Data, Features, and Model-Level Metrics
In the realm of search and recommendation systems, defining appropriate training data and features is far from straightforward. Vaibhav Pawar shed light on the value of training mature systems on already-collected logs. He advocated for improving data quality and emphasized the need for judicious sampling to balance training time and cost. The talk explored the nuances of handling positives and negatives in training data, stressing how hard it is to define negative samples in search and recommendation systems. Pawar also shared insights into feature selection, cautioning against the pitfalls of cold-start features.
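As an illustration of the negative-sampling problem, the sketch below treats impressed-but-not-clicked items as candidate negatives and down-samples them. The field names and the sampling ratio are assumptions for the example, not the policy used at Poshmark.

```python
# Hypothetical negative-sampling policy for implicit-feedback logs:
# impressed-but-not-clicked items become candidate negatives, then are down-sampled
# to keep training time and cost bounded.
import random

random.seed(42)

# Logged impressions: (query, item, clicked)
impressions = [
    ("summer dress", "item_1", 1),
    ("summer dress", "item_2", 0),
    ("summer dress", "item_3", 0),
    ("summer dress", "item_4", 0),
    ("summer dress", "item_5", 1),
]

positives = [row for row in impressions if row[2] == 1]
candidates = [row for row in impressions if row[2] == 0]

# Keep at most 2 sampled negatives per positive (hypothetical ratio).
negatives = random.sample(candidates, k=min(len(candidates), 2 * len(positives)))

training_set = positives + negatives
print(training_set)
```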
Offline Evaluation: A Crucial Pre-Deployment Step
The significance of offline evaluation in understanding model performance before deployment took center stage in Vaibhav’s discourse. Pawar cautioned against pitfalls, emphasizing the need to reconsider established metrics like NDCG in the context of specific use cases. His insights extended to data splitting methodologies, urging practitioners to be wary of potential issues that may arise. A key takeaway was the importance of not only evaluating systems through code but also exposing them to human scrutiny before launch, ensuring alignment with business and user expectations.
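For reference, a self-contained implementation of NDCG@k, the metric Pawar suggested re-examining for each use case, looks roughly like this; the relevance grades in the example are hypothetical.

```python
# Standard NDCG@k computation; example relevance grades are hypothetical.
import math

def dcg(relevances):
    """Discounted cumulative gain over a ranked list of relevance grades."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: DCG of the model's ranking divided by the ideal DCG."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance of the items the model actually ranked at positions 1..5 (hypothetical).
model_ranking = [3, 2, 0, 1, 0]
print(round(ndcg_at_k(model_ranking, k=5), 3))
```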
Navigating the Online-Offline Gap
Addressing the Holy Grail of search and recommendation systems, Vaibhav discussed the gap between offline and online performance. Acknowledging that models that perform well in offline evaluations may not translate into success in production, he urged practitioners to avoid the trap of optimizing solely for offline metrics. Pawar introduced off-policy evaluation as a strategy to debias evaluations, but cautioned that battle-tested open-source implementations are scarce, highlighting the need for bespoke algorithm development.
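A standard form of off-policy evaluation is the inverse-propensity-scoring (IPS) estimator sketched below. It is meant only to illustrate the idea; the logged probabilities and rewards are hypothetical, and production use requires care around propensity estimation and variance.

```python
# Minimal inverse-propensity-scoring (IPS) estimator for off-policy evaluation.
# Logged data is hypothetical; real systems need careful propensity logging.

# Logged interactions: (reward, logging_policy_prob, new_policy_prob)
logs = [
    (1.0, 0.50, 0.70),   # clicked item the new policy would show more often
    (0.0, 0.30, 0.10),
    (1.0, 0.20, 0.20),
    (0.0, 0.80, 0.60),
]

def ips_estimate(logged, clip=10.0):
    """Estimate the new policy's expected reward from logged interactions."""
    total = 0.0
    for reward, p_log, p_new in logged:
        weight = min(p_new / p_log, clip)   # clipping keeps variance in check
        total += weight * reward
    return total / len(logged)

print(round(ips_estimate(logs), 3))
```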
Retrieval: A Critical Stage in System Performance
Vaibhav delved into the intricacies of the retrieval stage, emphasizing its critical role in system performance. He highlighted the challenges posed by large inventories, where determining truly relevant items for a given query becomes a formidable task. Pawar introduced alternative metrics like retrievability and creative approaches, such as using a ranking model to assess retrieval performance. The talk underscored the irreplaceable nature of the retrieval step and its profound impact on the overall effectiveness of search and recommendation systems.
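A simple starting point for measuring the retrieval stage is recall@k, sketched below. The example assumes a labelled relevant set, which, as Pawar noted, is rarely available for large inventories; that is exactly why proxies such as retrievability or re-scoring retrieved items with a ranking model become attractive.

```python
# recall@k for the retrieval stage; the labelled relevant set here is hypothetical.

def recall_at_k(retrieved, relevant, k):
    """Fraction of known-relevant items that appear in the top-k retrieved set."""
    return len(set(retrieved[:k]) & relevant) / len(relevant) if relevant else 0.0

# Hypothetical data: items the retriever returned vs. items judged relevant.
retrieved = ["item_3", "item_9", "item_1", "item_7", "item_2"]
relevant = {"item_1", "item_2", "item_8"}

print(recall_at_k(retrieved, relevant, k=5))   # 2 of 3 relevant items retrieved
```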
The Hybrid Search Paradigm
In a thought-provoking segment, Vaibhav Pawar challenged the perception of vector-based search as a panacea. Drawing parallels with lexical search, he emphasized the potency of a hybrid approach that combines both, shedding light on the current state of the art in search technology. Pawar encouraged practitioners to consider full-fledged search engines for comprehensive infrastructure, cautioning against the limitations of relying solely on vector DBs.
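One lightweight way to realize a hybrid of lexical and vector retrieval is reciprocal rank fusion, sketched below. This illustrates the general idea rather than the specific approach discussed in the talk; the result lists are hypothetical.

```python
# Reciprocal rank fusion (RRF): merge lexical and vector result lists by rank.
# Result lists below are hypothetical.

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists, rewarding items ranked high in any of them."""
    scores = {}
    for results in result_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical_results = ["item_a", "item_b", "item_c", "item_d"]   # e.g. BM25
vector_results = ["item_c", "item_e", "item_a", "item_f"]    # e.g. embedding ANN

print(reciprocal_rank_fusion([lexical_results, vector_results]))
```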
Experimentation and Continuous Improvement
Vaibhav underscored the role of experimentation not only in gauging the impact of features but also in guiding the development of models and systems. He urged practitioners to weigh the trade-off between accuracy and latency and to navigate the network effects inherent in a marketplace setting. Pawar stressed the need for transparent communication during significant changes, outlining the importance of managing user expectations and refining policies to adapt to evolving scenarios.
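As a concrete example of the experimentation step, the sketch below runs a two-proportion z-test on hypothetical control and treatment conversion counts to check whether an observed lift is statistically significant.

```python
# Two-proportion z-test for an A/B experiment; the conversion counts are hypothetical.
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z statistic for the difference between two conversion rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

z = two_proportion_z(successes_a=4_800, n_a=100_000,   # control
                     successes_b=5_050, n_b=100_000)   # treatment
print(round(z, 2), "significant at 5%" if abs(z) > 1.96 else "not significant")
```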
Guardrails for Success
In a final insightful segment, Vaibhav Pawar explored often-overlooked dimensions: diversity and fairness. He highlighted their correlation with long-term success and the importance of balancing business metrics with these critical factors. The talk emphasized embedding considerations of diversity and fairness from the project’s inception, cautioning against overlooking their impact on the system’s overall efficacy.
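A minimal way to operationalize a diversity guardrail is to track category coverage in the top of each recommendation slate, as sketched below; the categories and the threshold are hypothetical.

```python
# Category coverage as a simple diversity guardrail; data and threshold are hypothetical.

def category_coverage(slate, item_categories, k=10):
    """Share of distinct categories among the top-k slots of a slate."""
    top = slate[:k]
    return len({item_categories[item] for item in top}) / len(top)

item_categories = {
    "item_1": "dresses", "item_2": "dresses", "item_3": "shoes",
    "item_4": "bags", "item_5": "dresses",
}
slate = ["item_1", "item_2", "item_5", "item_3", "item_4"]

coverage = category_coverage(slate, item_categories, k=5)
print(coverage)   # 3 categories across 5 slots -> 0.6
print("diverse enough" if coverage >= 0.5 else "too concentrated")
```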
Infrastructure, QA, and Team Dynamics
Vaibhav concluded his talk by delving into the essential peripheral components of system development. From creating robust monitoring systems and incident response management to emphasizing the need for comprehensive logging, he outlined crucial elements in ensuring the stability and performance of large-scale ML systems. Pawar stressed the importance of running systems in shadow mode and implementing effective QA processes, underscoring the need for dedicated, cross-functional teams with steadfast executive sponsorship.
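To illustrate the shadow-mode idea, the sketch below scores live requests with both the production ranker and a candidate ranker, logs how much they agree, and serves only the production results. The two ranking functions are placeholders, not real models.

```python
# Shadow-mode sketch: the candidate ranker runs on live traffic and is logged,
# but only the production ranking is served. Rankers here are stand-ins.
import logging

logging.basicConfig(level=logging.INFO)

def production_ranker(query, items):
    return sorted(items)                      # stand-in for the live model

def candidate_ranker(query, items):
    return sorted(items, reverse=True)        # stand-in for the new model

def handle_request(query, items):
    served = production_ranker(query, items)
    shadow = candidate_ranker(query, items)   # computed but never shown to users
    overlap = len(set(served[:3]) & set(shadow[:3])) / 3
    logging.info("query=%s top3_overlap=%.2f", query, overlap)
    return served

print(handle_request("summer dress", ["item_b", "item_a", "item_c", "item_d"]))
```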
Conclusion
Vaibhav Pawar’s talk at MLDS 2024 unfolded as a tapestry of insights, providing a holistic view of the multifaceted challenges and strategies in developing large-scale search and recommendation systems. As we reflect on the key takeaways, it becomes evident that his practical lessons are invaluable signposts for any data scientist, machine learning engineer, or AI enthusiast navigating the intricate landscape of scalable ML systems. The Machine Learning Developers Summit 2024 not only showcased cutting-edge advancements but also provided a platform for luminaries like Vaibhav Pawar to share knowledge that propels the industry forward.