Deep neural networks are powerful black-box predictors that have achieved impressive performance on a wide spectrum of tasks. Quantifying predictive uncertainty in NNs, however, remains a challenging and unsolved problem. In the paper "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles", the authors propose an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high-quality predictive uncertainty estimates.
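A minimal NumPy sketch of the core idea: train M networks independently from different random initializations, average their predictions, and use their disagreement as an uncertainty signal. This simplifies the paper's full recipe (which also uses a proper scoring rule with a predicted variance head and optional adversarial training); all names and hyperparameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data: y = sin(x) + noise
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

def init_net(rng, hidden=32):
    # One-hidden-layer MLP with tanh activation; each ensemble member
    # gets its own random initialization (the only source of diversity here)
    return {
        "W1": rng.normal(0, 1.0, size=(1, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 1.0 / np.sqrt(hidden), size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]

def train(p, X, y, lr=0.05, steps=2000):
    # Full-batch gradient descent on mean squared error
    for _ in range(steps):
        h, pred = forward(p, X)
        err = pred - y
        g_W2 = h.T @ err / len(X)
        g_b2 = err.mean(axis=0)
        dh = (err @ p["W2"].T) * (1 - h ** 2)
        g_W1 = X.T @ dh / len(X)
        g_b1 = dh.mean(axis=0)
        p["W2"] -= lr * g_W2; p["b2"] -= lr * g_b2
        p["W1"] -= lr * g_W1; p["b1"] -= lr * g_b1
    return p

# Deep ensemble: M independently initialized nets trained on the same data.
# Training is embarrassingly parallel -- each member is independent.
M = 5
ensemble = [train(init_net(rng), X, y) for _ in range(M)]

# Point estimate = ensemble mean; member disagreement (std) = uncertainty.
# x = 5.0 lies far outside the training range, where members should disagree.
X_test = np.array([[0.0], [5.0]])
preds = np.stack([forward(p, X_test)[1] for p in ensemble])  # (M, 2, 1)
mean, std = preds.mean(axis=0), preds.std(axis=0)
```

In the paper each member additionally predicts a variance and is trained with negative log-likelihood, so the final predictive distribution mixes both per-member variance and cross-member disagreement; the sketch above keeps only the latter.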
Tips & Tricks
PyTorch-Transformers is a library of state-of-the-art pre-trained models for Natural Language Processing.
The paper behind the "No Free Lunch Theorem", actually titled "The Lack of A Priori Distinctions Between Learning Algorithms", is one of those papers that is often cited and rarely read. People in the ML community refer to it to support claims like "one model can't be the best at everything" or "one model won't always be better than another model". The point of this post is to convince you that this is not what the paper or the theorem says, and that you should not cite the theorem in this context: https://peekaboo-vision.blogspot.com/2019/07/dont-cite-no-free-lunch-theorem.html