Scalable oral exams with an ElevenLabs voice AI agent
www.behind-the-enemy-lines.com, Jan 3, 2026 (via Tildes)

Summary

In our new "AI/ML Product Management" class, the "pre-case" submissions (short assignments meant to prepare students for class discussion) were looking suspiciously good. Not "strong student" good. More like "this reads like a McKinsey memo that went through three rounds of editing" good.

So we started cold-calling students at random during class.

The result was... illuminating. Many students who had submitted thoughtful, well-structured work could not explain basic choices in their own submission after two follow-up questions. Some could not participate at all. This gap was too consistent to blame on nerves or bad luck. If you cannot defend your own work live, then the written artifact is not measuring what you think it is measuring.

Brian Jabarian has been doing interesting work on this problem, and his results both inspired us and gave us the confidence to try something that would have sounded absurd two years ago: running the final exam with a Voice AI agent.

[...]

Total cost for 36 students: 15 USD.
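
For scale, that total works out to roughly 0.42 USD per student. A quick check of the arithmetic, using only the two figures quoted above:

```python
# Figures taken from the post: 15 USD total for 36 students.
total_cost_usd = 15.0
num_students = 36

cost_per_student = total_cost_usd / num_students
print(f"{cost_per_student:.2f} USD per student")  # prints "0.42 USD per student"
```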

[...]

The grading was stricter than my own default. That's not a bug. Students will be evaluated outside the university, and the world is not known for grade inflation.

The feedback was better than any human would produce. The system generated structured "strengths / weaknesses / actions" summaries with verbatim quotes from the transcript. Sample feedback from the highest scorer:

[...]

And here is an underrated benefit of this whole setup: the exam is powered by guidelines, not by secret questions. We can publish exactly how the exam works: the structure, the skills being tested, the types of questions. No surprises. The LLM will pick the specific questions live, and the student will have to handle them.
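
To make "guidelines, not secret questions" concrete, here is a minimal sketch of how published guidelines could be turned into instructions for the question-picking model. Everything in it is an assumption: the `build_examiner_prompt` function, the prompt wording, and the placeholder guideline entries are hypothetical, since the post does not list the actual guidelines or prompt.

```python
def build_examiner_prompt(guidelines: dict[str, list[str]]) -> str:
    """Turn public exam guidelines into instructions for a question-picking LLM."""
    lines = ["You are conducting an oral exam. Follow these published guidelines."]
    for section, entries in guidelines.items():
        lines.append(f"{section.capitalize()}:")
        lines.extend(f"- {entry}" for entry in entries)
    # The model, not a fixed question bank, decides what to ask.
    lines.append("Choose the specific questions yourself during the conversation.")
    lines.append("Do not reuse a fixed question list.")
    return "\n".join(lines)


# Placeholder content: the post does not list the actual guidelines.
guidelines = {
    "structure": ["<exam phase 1>", "<exam phase 2>"],
    "skills": ["<skill being tested>"],
    "question types": ["<type of question>"],
}
prompt = build_examiner_prompt(guidelines)
print(prompt)
```

Nothing in the resulting prompt is sensitive, so the same text could be handed to students as the exam description.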

[...]

And here is the delicious part: you can give the whole setup to the students and let them prepare for the exam by practicing it multiple times. Unlike traditional exams, where leaked questions are a disaster, here the questions are generated fresh each time. The more you practice, the better you get. That is... actually how learning is supposed to work.