AI Snake Oil

Evaluating LLMs is a minefield

www.aisnakeoil.com

Annotated slides from a recent talk

Arvind Narayanan
and
Sayash Kapoor
Oct 4, 2023

We have released annotated slides for a talk titled Evaluating LLMs is a minefield. We show that current ways of evaluating chatbots and large language models don't work well, especially for questions about their societal impact. There are no quick fixes, and research is needed to improve evaluation methods.

The challenges we highlight are somewhat distinct from those faced by builders of LLMs or by developers comparing LLMs for adoption. Those challenges are better understood and tackled by evaluation frameworks such as HELM.

You can view the annotated slides here.

The slides were originally presented at a launch event for Princeton Language and Intelligence, a new initiative to strengthen LLM access and expertise in academia.

You’re reading AI Snake Oil, a blog about our upcoming book.

The talk is based on the following previous posts from our newsletter:

  • Is GPT-4 getting worse over time?

  • Does ChatGPT have a liberal bias?

  • Generative AI companies must publish transparency reports

  • GPT-4 and professional benchmarks: the wrong answer to the wrong question

  • ML is useful for many things, but not for predicting scientific replicability

  • OpenAI’s policies hinder reproducible research on language models

  • Licensing is neither feasible nor effective for addressing AI risks

  • Is the future of AI open or closed?

6 Comments
Dan
Oct 4, 2023, liked by Arvind Narayanan

This is a great deck on multiple levels. The information is extremely helpful but almost more impressive is that it is presented in such a clear, easy-to-follow style. Thank you for sharing it.

Ravi J
Decentralize Your Life
Oct 7, 2023

Good topics

© 2024 Sayash Kapoor and Arvind Narayanan