Recording: AI Evals Workshop

Join Aishwarya Srinivasan and Arvind Narayanamurthy for a live workshop on AI Evals – How To Evaluate Your LLM Applications.


This session is designed to move you past anecdotal testing and "gut checks" toward a rigorous, data-driven approach for building reliable AI applications. You’ll learn the end-to-end process of measuring performance, selecting the right metrics, and maintaining quality in production.


The workshop suits both beginners and experienced practitioners building LLM-powered applications who want a reliable way to measure quality.


Prerequisite: Basic familiarity with LLM APIs and Python is recommended.


You’ll learn:


→ What are Evals: Gain a clear understanding of what LLM evaluations really are (and are not), when public benchmarks help, when they mislead, and the most common misconceptions that cause teams to trust the wrong signals.


→ The End-to-End Eval Process: A practical walkthrough of the full evaluation lifecycle—from defining the problem to interpreting results—so you can reason about quality in a structured, repeatable way.


→ Scoring & Judgement Strategies: Explore the trade-offs between human evaluators and "LLM-as-a-Judge" to create a scalable and objective scoring system for your model outputs.
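For a flavor of the "LLM-as-a-Judge" pattern discussed here: a grader model is given a rubric-based prompt and its reply is parsed into a numeric score. This is a minimal sketch, not the workshop's materials; the rubric wording, score format, and function names are illustrative assumptions.

```python
import re

def build_judge_prompt(question: str, answer: str) -> str:
    """Assemble a rubric-based grading prompt for a judge LLM.

    The 1-5 rubric and reply format here are illustrative assumptions.
    """
    return (
        "You are grading an answer on a 1-5 scale for factual accuracy "
        "and relevance.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with a line of the form 'Score: <1-5>'."
    )

def parse_judge_score(judge_reply: str) -> int:
    """Extract the numeric score from the judge model's free-text reply."""
    match = re.search(r"Score:\s*([1-5])", judge_reply)
    if match is None:
        raise ValueError(f"Unparseable judge reply: {judge_reply!r}")
    return int(match.group(1))
```

Constraining the judge to a fixed reply format (and failing loudly on anything else) is what makes this kind of scoring automatable at scale.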


→ Lab: Building an Eval Pipeline from Scratch: A nuts-and-bolts walkthrough where we build an evaluation pipeline for a real LLM application, helping you develop intuition for metric selection, scoring strategies, and result interpretation.
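The skeleton of such a pipeline is small: a dataset of inputs and expected outputs, a model call, a scorer, and an aggregate metric. A minimal sketch under stated assumptions (the `fake_model` stand-in and exact-match scorer are illustrative, not the lab's actual code):

```python
from typing import Callable

def exact_match(output: str, expected: str) -> float:
    """Simplest possible scorer: 1.0 on exact string match, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def run_eval(dataset: list, model: Callable[[str], str], scorer) -> dict:
    """Run each example through the model, score it, and aggregate."""
    scores = [scorer(model(ex["input"]), ex["expected"]) for ex in dataset]
    return {
        "n": len(scores),
        "pass_rate": sum(scores) / len(scores) if scores else 0.0,
    }

# Usage with a stand-in 'model' (a real application would call an LLM here):
dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
fake_model = lambda prompt: {"2+2": "4", "capital of France": "paris"}.get(prompt, "")
report = run_eval(dataset, fake_model, exact_match)
# report == {"n": 2, "pass_rate": 0.5}: the casing mismatch fails exact match
```

Swapping `exact_match` for a fuzzier scorer (or an LLM judge) is exactly the kind of metric-selection decision the lab builds intuition for.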


→ Evals in Production & LLMOps: How evaluations fit into real systems, including offline vs online evals, monitoring quality over time, and an overview of the current eval and LLMOps tooling ecosystem.
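One common building block for online evals is deciding which fraction of live traffic to score. A sketch of deterministic sampling, assuming a per-request ID (hashing the ID rather than rolling a random number keeps the in/out decision stable across retries and services):

```python
import hashlib

def in_online_eval_sample(request_id: str, sample_rate: float = 0.1) -> bool:
    """Deterministically decide whether a production request gets scored.

    Maps the request ID's hash onto [0, 1] and compares it to the
    sample rate, so the same request is always in (or out of) the sample.
    """
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return bucket < sample_rate
```

Offline evals, by contrast, run the full fixed dataset on every candidate change, so no sampling is needed there.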

We’ll wrap up with an Open Q&A and Insights session to help you troubleshoot your specific challenges and apply these frameworks to your current projects.


By the end of this workshop, you will have a clear, actionable roadmap for measuring LLM quality and the confidence to deploy AI systems that are backed by data, not just intuition.

$24