arg min blog

Does AI Suck at Art?

Oct 26, 2022. I’ve been researching machine learning for a little over 20 years. For the past five years or so, with the latest wave of AI overpromising, I think I’ve been mostly known as an AI skeptic.... Continue

There’s more to data than distributions.

Mar 31, 2022. This is the first guest post by Deb. More to come! In “The clinician and dataset shift in artificial intelligence,” published in the New England Journal of Medicine, a set of physician-scientists describe how a popular... Continue

Machine Learning has a validity problem.

Mar 15, 2022. One of the central tenets of machine learning warns that the more times you run experiments with the same test set, the more you overfit to that test set. This conventional wisdom is mostly wrong and... Continue

Let us never speak of these values again.

Feb 23, 2022. A recent Twitter quiz asked “what is a powerful concept from your field that, if more people understood it, their lives would be better?” Unambiguously, the answer from my field is statistical significance. Significance testing... Continue

What were the effects of the Bangladesh mask intervention?

Dec 1, 2021. There’s been a bit of a social-media back-and-forth between us and Jason Abaluck about the design and statistical significance of the Bangladesh Mask RCT. To focus and hone the discussion on some crucial details, we... Continue

The cult of statistical significance and the Bangladesh Mask RCT.

Nov 29, 2021. In the last post, I argued that the effect size in the Bangladesh Mask RCT was too small to inform policy making. I deliberately avoided diving into statistical significance as arguments about p-values quickly devolve... Continue

Revisiting the Bangladesh Mask RCT.

Nov 23, 2021. In an earlier post, I raised a few issues with a large-scale RCT run in Bangladesh aimed at estimating the effectiveness of masks on reducing the spread of the coronavirus. In particular, I was a... Continue

The Perceptron as a prototype for machine learning theory.

Nov 4, 2021. Just as many of the algorithms and community practices of machine learning were invented in the late 1950s and early 1960s, the foundations of machine learning theory were also established during this time. Many of... Continue

The Saga of Highleyman's Data.

Oct 20, 2021. The first machine learning benchmark dates back to the late 1950s. Few used it and even fewer still remembered it by the time benchmarks became widely used in machine learning in the late 1980s. In... Continue

Machine learning is not nonparametric statistics.

Oct 13, 2021. Many times in my career, I’ve been told by respected statisticians that machine learning is nothing more than nonparametric statistics. The longer I work in this field, the more I think this view is both... Continue