Colloquium credits

Presentation Master's thesis - Conrad Bauer - Psychological Methods

Last modified on 19-08-2025 11:23
Automated Master’s Thesis Grading Using Large Language Models and Rubrics
Start: 22-08-2025 11:00
End: 22-08-2025 12:00
Location: Roeterseiland Campus, Building G, Nieuwe Achtergracht 129-B, Room GS.01. Because room capacity is limited, attendance is on a first-come, first-served basis. Teachers must adhere to this.

Automated grading with large language models (LLMs) shows promise for short essays, but its use on thesis-length work remains untested. This study examines how closely LLM grades align with human grades for master’s theses and how prompt design affects that alignment. English-language theses from the University of Amsterdam archive are graded on four analytic “Scientific Reasoning” rubrics from the Student Manual Master’s Thesis 2024-2025 (Research Question; Design/Procedure/Methods; Data Analysis; Conclusion/Discussion). Four raters evaluate every thesis: two humans (one expert, one novice) and two LLMs (ChatGPT-4o and GPT-5). Each LLM is run with both a task-only prompt and a structured-reasoning prompt. Agreement is quantified with ICC(A,1), quadratic-weighted κ, and Pearson’s r, and a Bayesian hierarchical model evaluates effects on absolute residuals. Human resource use is compared with API costs and time use. The study thereby addresses the gap in automated assessment of thesis-length work.
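
The three agreement statistics named in the abstract can be illustrated with a minimal sketch. This is not the study's code: the abstract does not say which software was used, the function names are hypothetical, the 1-10 grade scale is an assumption, and the grades are made up for illustration. ICC(A,1) is computed here as the two-way, absolute-agreement, single-rater intraclass correlation.

```python
from statistics import mean


def quadratic_weighted_kappa(a, b, categories):
    """Cohen's kappa for two raters with quadratic disagreement weights."""
    index = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(a)
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2      # quadratic weight
            num += w * observed[i][j]            # observed disagreement
            den += w * row[i] * col[j] / n       # chance-expected disagreement
    return 1.0 - num / den


def pearson_r(a, b):
    """Pearson product-moment correlation between two score lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def icc_a1(ratings):
    """ICC(A,1) from a list of rows, one row of rater scores per thesis."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for r in ratings for v in r)
    row_means = [mean(r) for r in ratings]
    col_means = [mean(r[j] for r in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in ratings for v in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                      # between-thesis mean square
    msc = ss_cols / (k - 1)                      # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))           # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Made-up grades on an assumed 1-10 scale; the "LLM" is one point more lenient.
human = [6, 7, 8, 7, 9, 6]
llm = [7, 8, 9, 8, 10, 7]
scale = list(range(1, 11))

kappa = quadratic_weighted_kappa(human, llm, scale)
r = pearson_r(human, llm)
icc = icc_a1([list(pair) for pair in zip(human, llm)])
print(f"kappa={kappa:.3f}  r={r:.3f}  ICC(A,1)={icc:.3f}")
```

In this toy example the constant one-point offset leaves Pearson's r at its maximum while ICC(A,1) and the weighted κ drop below 1, which shows why an absolute-agreement measure is reported alongside a correlation.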