This is a great deck on multiple levels. The information is extremely helpful but almost more impressive is that it is presented in such a clear, easy-to-follow style. Thank you for sharing it.
Thanks for this. It is a crucial topic, fundamental to wide-scale adoption in corporate settings. In the world of time series models, no measure is perfect, but we have many good ones that allow baselining and monitoring (including whether users added value by changing the outputs). Finding more rigorous approaches to LLMs will require some very thoughtful work.
Very interesting slides! A bit off-topic, but it looks like you're using reveal.js -- how do you render your notes alongside as annotations? That's a very helpful way of sharing them!
Evaluating LLMs is a minefield
Good topics
thanks for this!!