17 - Evaluation | Philip Tannor (DeepChecks)

LangTalks - A podcast by Lee Twito, Gal Peretz

Evaluating LLMs and AI pipelines in dev and production environments. How to work with datasets.