University of Twente Student Theses

Comparing a Multistage and a Linear Summative Test on Ability Estimate Precision and Classification Accuracy

Lamoré, M. (2017) Comparing a Multistage and a Linear Summative Test on Ability Estimate Precision and Classification Accuracy.

Abstract:	At the end of primary education in the Netherlands, a decision must be made about the level of secondary school each pupil will attend. The school gives the initial advice on the level of secondary education that is most suitable for a pupil. In addition, all pupils take a test that offers independent advice on the most suitable level of secondary education. One of those tests is the Centrale Eindtoets, which is developed by Stichting Cito under the direction of the College voor Toetsen en Examens. This test provides two measures. First, it classifies pupils into categories that are linked to the levels of secondary education in the Netherlands, based on their performance on the test. Second, it offers an estimate of a pupil’s ability in the form of a standardized score. Accurate classification is important, because misclassification can lead to pupils attending a level of secondary education that is too high or too low for their ability. The test is currently administered in a linear format, which means that all pupils respond to the same items, regardless of their ability. It is therefore likely that pupils have to respond to items that are too easy or too hard relative to their ability. This has two drawbacks. First, responding to items that are too easy results in a lack of challenge, while responding to items that are too hard results in frustration; both can negatively affect a pupil’s performance on the test. Second, items that are too easy or too hard relative to a pupil’s ability provide less than optimal information about that ability, because each test item is most informative over only a small range of the ability scale.
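The relation between item difficulty and information can be illustrated with a small sketch. It assumes a Rasch (one-parameter logistic) model, which the abstract does not specify, and the function names are hypothetical. Under that assumption, the information an item provides at a given ability is P(1 - P), which peaks where ability equals item difficulty.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under an assumed Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta, difficulty):
    """Fisher information of one Rasch item at ability theta: P * (1 - P)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


if __name__ == "__main__":
    # Information of an item with difficulty 0 for pupils of varying ability.
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(theta, round(item_information(theta, 0.0), 3))
```

In this sketch the item is most informative for a pupil whose ability matches its difficulty, and the information drops off symmetrically as the mismatch grows.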
When measurement precision is low, it is more likely that two or more adjacent school advices fall within the confidence interval of the pupil’s ability estimate, and the probability of misclassification is therefore higher. There are two common approaches to increasing the classification accuracy of a test: increasing the number of items that measure optimally around the cut-off point between two classification categories, or increasing the total number of items on the test. Both options are impractical in this case: the test items have already been chosen carefully to optimize the test information available around the cut-off points between the classification categories, and the test already takes three mornings. Another option is adaptive testing, in which pupils receive test items based on their performance on the test. Currently, an adaptive, multistage version of the Centrale Eindtoets is under development. This version consists of three stages. In the first stage, all pupils are presented with an initial block of items, or module, to gather an initial set of responses. Based on their responses in the first stage, pupils are routed to one of three modules with different difficulty levels. After the second stage, the pupil is again routed to one of three modules, based on performance in the first and second stages. Because the items are adapted to the pupil’s estimated ability, it becomes possible to administer items that provide more information in the range of classification categories to which the pupil will likely belong. Measurement precision can therefore be increased by opting for adaptive testing instead of linear testing. Although the advantages of a multistage Centrale Eindtoets over a linear variant are evident from the literature, it is unknown to what extent the choice of test design influences the measurement precision and the classification accuracy of the test.
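The three-stage routing described above can be sketched as follows. This is an illustration only: it assumes routing on the proportion of correct responses with hypothetical cut-offs (0.4 and 0.7) and hypothetical module labels, whereas the actual routing rules of the multistage Centrale Eindtoets are not given in this abstract.

```python
def route(num_correct, num_items, low_cut=0.4, high_cut=0.7):
    """Pick one of three modules from the proportion correct so far.

    The cut-offs are hypothetical placeholders, not the test's real rules.
    """
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    proportion = num_correct / num_items
    if proportion < low_cut:
        return "easy"
    if proportion < high_cut:
        return "medium"
    return "hard"


def module_path(stage1_responses, stage2_responses):
    """Return the modules a pupil takes across the three stages.

    Responses are lists of 0/1 scores. stage2_responses are the scores on
    the module selected after stage 1. Stage 3 routing uses the performance
    on stages 1 and 2 combined.
    """
    stage2_module = route(sum(stage1_responses), len(stage1_responses))
    combined = list(stage1_responses) + list(stage2_responses)
    stage3_module = route(sum(combined), len(combined))
    return ["initial", stage2_module, stage3_module]


if __name__ == "__main__":
    print(module_path([1, 1, 1, 1, 0], [1, 1, 1, 0, 1]))
    print(module_path([0, 0, 1, 0, 0], [1, 1, 1, 1, 0]))
```

Routing on an interim ability estimate instead of the proportion correct would follow the same structure, with the cut-offs placed on the ability scale.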
In that light, a simulation study was performed with two configurations of the multistage version of the Centrale Eindtoets and one configuration of the linear version. The two multistage variants differ in the placement of the test items across the three stages of the test. Using the results of this simulation study, the linear and multistage versions of the Centrale Eindtoets are compared with respect to the precision of the ability estimates and classification accuracy. Furthermore, the influence of different classification methods on classification accuracy is investigated. Lastly, the influence of different module designs on the precision of the ability estimates and on classification accuracy is examined. The results show that a multistage version of the Centrale Eindtoets outperforms the linear version on both measurement precision and classification accuracy. Furthermore, the classification method “the sum of the estimated probability on all items” consistently provides the highest classification accuracy, regardless of the test variant. Finally, the second variant of the multistage Centrale Eindtoets outperforms the first variant, both in terms of measurement precision and classification accuracy. Based on the results of this study, one can conclude that the multistage Centrale Eindtoets will indeed be an improvement over a linear Centrale Eindtoets. Keeping in mind the limitations of the study, and the fact that the test design in the present study does not conform to all requirements of the 2018 version of the multistage Centrale Eindtoets, it can be stated that adaptive testing will be an improvement over the current linear way of testing.
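As a rough illustration of how classification accuracy could be computed in such a simulation, the sketch below interprets “the sum of the estimated probability on all items” as the expected total score over all items given the estimated ability. That interpretation, the Rasch model, the cut scores, and all function names are assumptions made here, not details taken from the thesis.

```python
import math
from bisect import bisect_right


def expected_total_score(theta_hat, difficulties):
    """Sum of estimated success probabilities over all items (assumed Rasch)."""
    return sum(1.0 / (1.0 + math.exp(-(theta_hat - b))) for b in difficulties)


def classify(score, cut_scores):
    """Return the category index for a score, given ascending cut scores.

    A score equal to a cut score is placed in the higher category.
    """
    return bisect_right(cut_scores, score)


def classification_accuracy(true_categories, assigned_categories):
    """Proportion of pupils whose assigned category equals the true one."""
    if len(true_categories) != len(assigned_categories):
        raise ValueError("category lists must have equal length")
    matches = sum(t == a for t, a in zip(true_categories, assigned_categories))
    return matches / len(true_categories)


if __name__ == "__main__":
    difficulties = [-1.0, 0.0, 1.0]   # hypothetical item difficulties
    cut_scores = [1.0, 2.0]           # hypothetical cuts on the sum-score scale
    theta_hats = [-2.0, 0.0, 2.0]     # hypothetical ability estimates
    assigned = [classify(expected_total_score(t, difficulties), cut_scores)
                for t in theta_hats]
    print(assigned, classification_accuracy([0, 1, 2], assigned))
```

In a full simulation, the true categories would come from the generating abilities and the assigned categories from the ability estimates produced by each test design.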
Item Type:	Essay (Master)
Clients:	Cito
Faculty:	BMS: Behavioural, Management and Social Sciences
Subject:	81 education, teaching
Programme:	Educational Science and Technology MSc (60023)
Link to this item:	https://purl.utwente.nl/essays/72379