Prompt and Rater Effects in Second Language Writing Performance Assessment.

[Abstract] Performance assessments have become the norm for evaluating language learners’ writing abilities in international examinations of English proficiency. Two aspects of these assessments are usually systematically varied: test takers respond to different prompts, and their responses are read by different raters. This raises the possibility of undue prompt and rater effects on test-takers’ scores, which can affect the validity, reliability, and fairness of these tests. This study uses data from the Michigan English Language Assessment Battery (MELAB), including all official ratings given over a period of more than four years (n=29,831), to examine these issues related to scoring validity. It uses the multi-facet extension of Rasch methodology to model these data, producing measures on a common interval scale. First, the study investigates the comparability of prompts that differ in topic domain, rhetorical task, prompt length, task constraint, expected grammatical person of response, and number of tasks. It also considers whether prompts are differentially difficult for test takers of different genders, language backgrounds, and proficiency levels. Second, the study investigates the quality of raters’ ratings and whether these are affected by time and by raters’ experience and language background. It also considers whether raters alter their rating behavior depending on their perceptions of prompt difficulty and of test-takers’ prompt selection behavior. The results show that test-takers’ scores reflect actual ability in the construct being measured, as operationalized in the rating scale, and are generally not affected by a range of prompt dimensions, rater variables, or test-taker characteristics. It can be concluded that scores on this test, and on others with similar particulars, have score validity and, assuming that other inferences in the validity argument are similarly warranted, can be used as a basis for making appropriate decisions. Further studies to develop a framework of task difficulty and a model of rater development are proposed.

[Publication date] [Publisher] University of Michigan

[Validity level] Writing Assessment [Subject classification]

[Keywords] Language Testing;Writing Assessment;Performance Assessment;Educational Measurement;Education;Social Sciences;Education [Timeliness]

