18 Comments
Jun 4 · Liked by Sayash Kapoor, Arvind Narayanan

“The end of the beginning” really was an apt description by Ben Thompson

Jun 4 · Liked by Sayash Kapoor, Arvind Narayanan

Great read! The feedback loop in which overoptimism fuels flawed research, which in turn further misleads or obfuscates fact-finding, is extremely worrisome.

It's a feedback loop that is maintained, and partly manufactured (if you want to get really cynical), by closed-source AI companies that benefit from over-attributing qualities to their newfound models and are happy to cheat benchmarks to beat the competition (on paper), as for them, it can be the difference between getting or not getting that next capital injection.

Jun 6 · Liked by Sayash Kapoor, Arvind Narayanan

Thank you very much for another insightful and important article!


Good analysis. I do think there's an elephant in the room behind the hype: money. You get more grants, book deals, and speaking engagements if your results are "groundbreaking" than if they're "interesting." Your point about the suspension of common sense is also very important.

Jun 4 · Liked by Sayash Kapoor, Arvind Narayanan

Great piece. An example of the "overestimate in the short run" half of Amara's Law unfolding before our very eyes. I appreciate that rather than just pointing at the problem, you analyze causes and propose constructive solutions.

Jun 3 · Liked by Sayash Kapoor, Arvind Narayanan

I really appreciate how measured, even-handed, and objective you both are in all of your essays. I’m just so conditioned to having a subject like AI overrun by hot takes from the people who think Elon Musk will build a robot that will be elected President by November or the people who think that Terminator was a documentary. It’s just refreshing to read about a new and uncertain, though clearly potentially momentous, technology in a way that neither sensationalizes it nor makes it a boogeyman for all the world’s woes.

And as I write this and realize how pervasive that sort of hysterical commentary is on the internet, and then remember that most LLMs were trained on internet data, I just felt a twinge of nausea about the future…

Jun 10 · Liked by Sayash Kapoor

Stats (spurring pervasive hype) on the 2 Nature papers cited in your article’s 1st paragraph:

“In the top 5% of all research outputs scored by Altmetric”

“... it's in the top 5% of all research outputs ever tracked by Altmetric”

Jun 3 · edited Jun 3 · Liked by Sayash Kapoor

Brilliant description of the hype-pressure-myth-reinforcement cycle. Makes me wonder where else in AI applications that model applies... something I'll have to look further into. Thank you for the food for thought.


Thanks for this. Data leakage is a big problem in corporate use cases as well, and I rarely see it discussed. I have seen highly experienced data scientists make mistakes regarding leakage. Formal peer review processes are critical to catching these issues and fostering staff development. I suspect that many cases where models underperform in production can be traced back to leakage mistakes (and models that needed more work).


Thanks for this very insightful post. Social and clinical scientists have used meta-analyses and funnel plots to put overoptimistic results in broader context. Developing similar tools for ML-driven science can serve as a counterweight to the hype flywheel that you describe.

To that end, we recently published on a new approach for generating realistic estimates of ML model performance in a given field from a collection of published overoptimistic results:

https://arxiv.org/abs/2405.14422

author

Nice paper! I skimmed it a couple of weeks ago. One quick comment is that not all types of leakage vary with sample size. I understand how adaptive data analysis / feature selection leads to overoptimism that scales with n^{-0.5}, but in other cases, such as using features "from the future", overoptimism will be stable with increasing sample size.
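
A minimal sketch of that distinction, as a toy simulation rather than anything taken from the paper. It assumes binary labels, pure-noise binary features, and a true achievable accuracy of 50%; the function names `selection_gap` and `future_gap` and all parameter values are hypothetical. The first function picks the best of `k` noise features on the same data used to score it, so its apparent edge over chance should shrink roughly like n^{-0.5}. The second scores a feature that is a noisy copy of the label (a stand-in for a feature "from the future"), so its apparent edge should stay near 0.4 at any n.

```python
import random


def selection_gap(n, k=50, trials=20, seed=0):
    """Mean apparent accuracy above chance after picking the best of k
    pure-noise features on the same n samples used for evaluation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        labels = rng.getrandbits(n)  # n random binary labels packed into one int
        best = 0
        for _ in range(k):
            feature = rng.getrandbits(n)  # noise feature, independent of labels
            # Count positions where the feature agrees with the label.
            agree = n - bin(labels ^ feature).count("1")
            best = max(best, agree)
        total += best / n - 0.5  # true accuracy of any noise feature is 0.5
    return total / trials


def future_gap(n, flip=0.1, trials=20, seed=0):
    """Mean apparent accuracy above chance for a leaked feature that equals
    the label except for a fraction `flip` of samples (assumed toy setup)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each sample agrees with the label unless it was flipped.
        agree = sum(rng.random() >= flip for _ in range(n))
        total += agree / n - 0.5
    return total / trials


for n in (100, 400, 1600):
    print(n, round(selection_gap(n), 3), round(future_gap(n), 3))
```

In this toy setup, quadrupling n should roughly halve the first column of gaps while leaving the second essentially unchanged, which is why a negative association between sample size and reported accuracy is informative about the first kind of leakage but not the second.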

Jun 13 · Liked by Arvind Narayanan

Thanks, this is great feedback. It's unlikely that the applications we considered have that type of leakage, as these models make predictions about the current (not future) state of the patients - the observed negative association between sample size and accuracy serves as evidence that the leakage/bias is related to sample size. Nevertheless, you bring up an important clarification about which types of data leakage we're considering - will update the arXiv version.


The problem is that Big Tech literally controls the PR that is mistaken for fact on social media.

When the media is this degraded, this is what happens. I'm reading even figures like Casey Newton parroting venture capital interests.

So there is this insidious lobbying aspect of the AI hype cycle, where OpenAI marketing or the idea that Nvidia can reach 10 trillion is pushed as fact.


https://m.youtube.com/watch?v=m23tGqmmiA8

SpaceX big tech gets ragged several times a year


https://m.youtube.com/watch?v=jOrXSraSLuo

That YouTube channel's popularity proves that Big Tech doesn't completely control the narrative.


Agree completely with your admonitions to scientists about use of AI. FWIW, I've been playing around with using AI for influencing approaches to AI safety governance: https://www.linkedin.com/pulse/washington-address-scott-lewis-pc9vc


The Duede et al. preprint, whose graph of AI engagement in science you show, would have profited from following your REFORMS checklist too. Trying to reproduce their results, I already failed to find the list of keywords they used to classify abstracts as AI-engaged.


Important to keep track of this and how it develops. Overall, due to the rapid, actually accelerating pace at which the AI hype is running, we must not be surprised that a majority of its uses will be misguided. Some of that will become more obvious with time, some of it is intentional from the outset.

I think we just can't help it, other than by being aware. It is an incentives issue. Some of the incentives have to do with the technology, and others preexist in the fields, systems, institutions and communities where it gets applied. Realistically, a balanced or wise approach can’t be expected in the short term.
