Title Page
ABSTRACT
Contents
CHAPTER 1. INTRODUCTION 15
1.1. Background and Purpose of the Study 15
1.2. Research Questions 18
1.3. Organization of the Thesis 19
CHAPTER 2. LITERATURE REVIEW 20
2.1. Defining and Assessing L2 Writing Abilities 20
2.1.1. Traditions of Language Performance Tests 20
2.1.2. Theories to Define Constructs of Language Ability 25
2.1.3. Empirical Research Exploring the Nature of Writing Ability 32
2.2. Factors that Affect L2 Writing Assessments 36
2.2.1. Rater Effects on L2 Writing Assessments 37
2.2.2. What are Rating Scales? 38
2.2.3. Interactions between Raters and Rating Criteria 41
2.3. Rater Cognition 48
2.3.1. Research on Scoring Behavior Using Many-facet Rasch Measurement (MFRM) Analysis 52
2.4. Summary 55
CHAPTER 3. METHODOLOGY 56
3.1. Participants 56
3.2. Instruments 57
3.2.1. Questionnaire 57
3.2.2. Essays to be Rated 58
3.2.3. Rating Rubric 59
3.3. Procedures 61
3.3.1. Data Collection 62
3.3.2. Data Analysis 64
3.4. Summary 72
CHAPTER 4. RESULTS AND DISCUSSION 73
4.1. Descriptive Statistics 73
4.2. Two-way facet Rasch Analysis 76
4.3. Cognitive Rater Types (CRTs) 84
4.4. Many-facet Rasch Analysis on Essay Ratings 91
4.4.1. Inter-Rater Agreement 91
4.4.2. Rater Measurement Results 92
4.4.3. Intra-Rater Reliability 97
4.5. Operational Rater Types (ORTs) 99
4.6. Relation between Criteria Perception and Scoring Behavior 107
4.6.1. Group-Based Investigation 107
4.6.2. Individual Rater-Based Investigation 112
CHAPTER 5. CONCLUSION 119
5.1. Findings and Implications 119
5.2. Limitations and Suggestions for Further Research 124
REFERENCES 127
APPENDICES 137
APPENDIX 1. Questionnaire 138
APPENDIX 2. Rating Scale 140
Abstract in Korean 145
LIST OF TABLES
Table 4.1. Means of the Criterion Importance Ratings per Rating Criterion (N=30) 74
Table 4.2. Descriptive Statistics for Essay Ratings across the Rating Criteria (N=30) 75
Table 4.3. Summary Statistics for the Many-facet Rasch Analysis of Raters' Criterion Importance Ratings (N=30) 79
Table 4.4. Frequencies of Rater Fit Statistics (N=30) 82
Table 4.5. Functioning of the Criterion Importance Rating Scale 83
Table 4.6. Measures and Fit Statistics of the Rating Criteria 84
Table 4.7. Means of the Criterion Importance Ratings among each CRT (N=30) 87
Table 4.8. Summary Statistics for the Many-facet Rasch Analysis of Essay Ratings in Two Rasch Models (N=30) 97
Table 4.9. Mean Bias Measures among each ORT (N=27) 103
Table 4.10. Rater Composition of ORTs in Relation to CRTs (N=27) 108
Table 4.11. Four Criterion-Related Bias Cases and the Means of the Criterion Importance Ratings (N=27) 109
Table 4.12. Criterion-Related Bias Measures under the Rating Scale Model and the Criterion Importance Ratings (N=27) 113
Table 4.13. Criterion-Related Bias Measures under the Partial Credit Model and the Criterion Importance Ratings (N=27) 114
LIST OF FIGURES
Figure 2.1. The Characteristics of Performance Assessment 41
Figure 4.1. Variable Map of Raters' Criterion Importance Ratings 78
Figure 4.2. Hierarchical Clustering Solution for CRTs 86
Figure 4.3. Criterion Importance Profiles for CRTs 87
Figure 4.4. Variable Map from the Many-facet Rasch Analysis of 30 Essay Ratings under the Rating Scale Model 94
Figure 4.5. Variable Map from the Many-facet Rasch Analysis of 30 Essay Ratings under the Partial Credit Model 95
Figure 4.6. Hierarchical Clustering Solution for ORTs 101
Figure 4.7. Bias Diagram for ORT 1 through ORT 6 102