AIED 2026 · Long Paper · Seoul

Can We Trust AI’s Self-Assessment?

Evaluating and improving LLM confidence calibration in educational dialogue coding.

Hongming (Chip) Li¹ · Dr. Huan Kuang² · Dr. Anthony F. Botelho¹

1 University of Florida · VIABLE Lab

2 Florida State University

[Figure: Confidence distribution density under three anchoring conditions]
When a model says “confidence: 0.9,” can we believe it?
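One common way to make this question concrete is calibration: a model is well calibrated if, among answers it labels "confidence: 0.9", about 90% are actually correct. This section does not name the metric the study uses. The sketch below illustrates expected calibration error (ECE), a standard measure that bins predictions by stated confidence and averages the gap between confidence and accuracy, weighted by bin size. The function name `expected_calibration_error` and the 10 equal-width bins are illustrative assumptions, not necessarily the paper's method.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Hypothetical helper: ECE with equal-width confidence bins over [0, 1]."""
    conf = np.asarray(confidences, dtype=float)
    acc = np.asarray(correct, dtype=float)  # 1 = model's code was correct, 0 = wrong
    # Interior bin edges; right=True assigns x to bin i when edge[i-1] < x <= edge[i].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(conf, edges[1:-1], right=True), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # Gap between observed accuracy and mean stated confidence in this bin,
            # weighted by the fraction of predictions that fall in the bin.
            gap = abs(acc[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap
    return float(ece)

# A model that always says 0.9 but is right on only 6 of 10 items is overconfident.
print(expected_calibration_error([0.9] * 10, [1] * 6 + [0] * 4))
```

In this toy example the stated confidence (0.9) exceeds observed accuracy (0.6), giving an ECE of about 0.3; a perfectly calibrated model would score 0.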