
Newsletter

Hey there, you just received the monthly depends-on-the-definition newsletter for June 2020.

Post of the month: Data validation for NLP applications with topic models

Once a machine learning model has been deployed, its behavior must be monitored. Predictive performance is expected to degrade over time as the environment changes. This effect, known as concept drift, occurs when the distribution of the input features shifts away from the distribution on which the model was originally trained.
This time we implement a sophisticated approach to detect and reduce the impact of concept drift. We will leverage a topic model based on Latent Dirichlet Allocation and estimate the likelihood of each new document under this model. This gives us a way to filter out documents that no longer match the training distribution.
Read More
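To give a flavor of the idea, here is a minimal sketch using scikit-learn's LatentDirichletAllocation (the post itself may use a different library). The tiny corpus, the new documents, and the threshold are all made-up placeholders; with real data, documents that drift away from the training distribution receive markedly lower likelihoods.

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    # Hypothetical in-distribution training corpus.
    train_docs = [
        "the machine learning model is deployed and monitored in production",
        "monitoring the predictive performance of the deployed model over time",
        "topic models estimate the likelihood of a document under the model",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    X_train = vectorizer.fit_transform(train_docs)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(X_train)

    def per_word_log_likelihood(doc):
        """Approximate log-likelihood of a document, normalized by its length."""
        X = vectorizer.transform([doc])
        n_words = max(X.sum(), 1)  # guard against documents with no known words
        return lda.score(X) / n_words

    # Documents scoring far below the likelihoods seen on the training data
    # get filtered out; this threshold is only an illustration.
    threshold = min(per_word_log_likelihood(d) for d in train_docs) - 1.0
    for doc in ["monitoring the deployed model", "an entirely different subject"]:
        score = per_word_log_likelihood(doc)
        print(doc, score, "filtered" if score < threshold else "kept")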

Paper pick

"Beyond Accuracy: Behavioral Testing of NLP Models with CheckList" introduces CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. In a user study, a team responsible for a commercial sentiment analysis  model  found new and actionable bugs in an extensively tested model.

Tips & Tricks

Population Shift Monitoring with popmon
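popmon monitors dataset stability by building per-time-slice histograms of each feature and comparing them against a reference. A minimal sketch of typical usage, assuming a pandas DataFrame with a timestamp column named "date" (the file name and column are hypothetical, and the API may have evolved since this issue):

    import pandas as pd
    import popmon  # registers the pm_stability_report accessor on DataFrames

    # Hypothetical dataset with a timestamp column named "date".
    df = pd.read_csv("my_dataset.csv")

    # Compare feature distributions across time slices and write the
    # resulting stability report to an HTML file.
    report = df.pm_stability_report(time_axis="date")
    report.to_file("stability_report.html")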

Recommended reading

Luigi Patruno wrote a comprehensive blog series on how to deploy ML models to production, and many of the posts include tutorial-style code. The series aims to educate data scientists, ML engineers, and ML product managers about the pitfalls of model deployment, and I found it a very useful overview:
https://mlinproduction.com/deploying-machine-learning-models/

Copyright © 2020 depends-on-the-definition, All rights reserved.