University of Twente Student Theses

AI for Automatic Feedback on Assessment Portfolios in Secondary Education

Nieuwenhuis, Kevin (2025) AI for Automatic Feedback on Assessment Portfolios in Secondary Education.

Abstract:	The purpose of this paper is to support a program designed to assist Dutch secondary education teachers in assessing student assignments. The teachers' work is graded by instructors against a set of criteria, with a pass/fail rating and an optional comment. A locally run LLM is used to generate this feedback in place of the instructors. Running LLMs locally requires smaller models, which perform worse in Dutch. To assess the quality of the feedback these smaller LLMs generate, we first analyzed the Dutch language performance of Gemma-3-27B-it on various benchmarks. We then explored several approaches to generating feedback with the model, including incorporating summaries of textbook chapters and an answer model. Feedback quality was assessed manually by an instructor and automatically, both via pass/fail agreement with the reference feedback and with the LLM-as-judge framework G-Eval, using Qwen3-32B. Our analysis of Gemma-3-27B-it shows that the model has strong comprehension but still struggles with the specific semantics of a sentence. Moreover, according to one of the involved instructors, supplying extra context from the textbook and the corresponding answer model in the prompt improves the quality of the feedback. The G-Eval scores obtained with Qwen3-32B support the same conclusion. For future work, the system should be given an appropriate interface through which the generated feedback can be further evaluated in practice. The contributions of this paper are showing (1) how the Dutch language performance of LLMs can be analyzed, (2) how extra context can be used to improve generated feedback, and (3) that smaller LLMs can be used to evaluate content using LLM-as-judge.
Item Type:	Essay (Bachelor)
Faculty:	EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:	54 computer science
Programme:	Computer Science BSc (56964)
Link to this item:	https://purl.utwente.nl/essays/107360
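
The pass/fail agreement check mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how agreement was computed, so this assumes plain accuracy: the fraction of criteria on which the generated verdict matches the reference verdict. The function name `pass_fail_agreement` and the sample verdicts are hypothetical and not taken from the thesis.

```python
from typing import Sequence


def pass_fail_agreement(generated: Sequence[bool], reference: Sequence[bool]) -> float:
    """Return the fraction of criteria where both verdicts match.

    Assumption: agreement is simple accuracy over criteria; the thesis
    may define or weight it differently.
    """
    if len(generated) != len(reference):
        raise ValueError("verdict lists must have the same length")
    if not generated:
        raise ValueError("need at least one verdict")
    # Count positions where the generated verdict equals the reference verdict.
    matches = sum(g == r for g, r in zip(generated, reference))
    return matches / len(generated)


# Hypothetical verdicts for five criteria (True = pass, False = fail).
generated = [True, True, False, True, False]
reference = [True, False, False, True, True]

agreement = pass_fail_agreement(generated, reference)
print(f"agreement: {agreement:.2f}")  # prints "agreement: 0.60"
```

A score like this only captures whether the pass/fail verdicts line up; it says nothing about the quality of the optional comment, which is presumably why the abstract pairs it with manual assessment and G-Eval.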