Table of Contents

Title Page

ABSTRACT

Contents

CHAPTER 1. INTRODUCTION

1.1. Background and Purpose of the Study

1.2. Research Questions

1.3. Organization of the Thesis

CHAPTER 2. LITERATURE REVIEW

2.1. Defining and Assessing L2 Writing Abilities

2.1.1. Traditions of Language Performance Tests

2.1.2. Theories to Define Constructs of Language Ability

2.1.3. Empirical Research Exploring the Nature of Writing Ability

2.2. Factors that Affect L2 Writing Assessments

2.2.1. Rater Effects on L2 Writing Assessments

2.2.2. What are Rating Scales?

2.2.3. Interactions between Raters and Rating Criteria

2.3. Rater Cognition

2.3.1. Research on Scoring Behavior Using Many-facet Rasch Measurement (MFRM) Analysis

2.4. Summary

CHAPTER 3. METHODOLOGY

3.1. Participants

3.2. Instruments

3.2.1. Questionnaire

3.2.2. Essays to be Rated

3.2.3. Rating Rubric

3.3. Procedures

3.3.1. Data Collection

3.3.2. Data Analysis

3.4. Summary

CHAPTER 4. RESULTS AND DISCUSSION

4.1. Descriptive Statistics

4.2. Two-way facet Rasch Analysis

4.3. Cognitive Rater Types (CRTs)

4.4. Many-facet Rasch Analysis on Essay Ratings

4.4.1. Inter-Rater Agreement

4.4.2. Rater Measurement Results

4.4.3. Intra-Rater Reliability

4.5. Operational Rater Types (ORTs)

4.6. Relation between Criteria Perception and Scoring Behavior

4.6.1. Group-Based Investigation

4.6.2. Individual Rater-Based Investigation

CHAPTER 5. CONCLUSION

5.1. Findings and Implications

5.2. Limitations and Suggestions for Further Research

REFERENCES

APPENDICES

APPENDIX 1. Questionnaire

APPENDIX 2. Rating Scale

ABSTRACT IN KOREAN

Table 4.1. Means of the Criterion Importance Ratings per Rating Criterion (N = 30)

Table 4.2. Descriptive Statistics for Essay Ratings across the Rating Criteria (N = 30)

Table 4.3. Summary Statistics for the Many-facet Rasch Analysis of Raters' Criterion Importance Ratings (N = 30)

Table 4.4. Frequencies of Rater Fit Statistics (N = 30)

Table 4.5. Functioning of the Criterion Importance Rating Scale

Table 4.6. Measures and Fit Statistics of the Rating Criteria

Table 4.7. Means of the Criterion Importance Ratings among each CRT (N = 30)

Table 4.8. Summary Statistics for the Many-facet Rasch Analysis of Essay Ratings in Two Rasch Models (N = 30)

Table 4.9. Mean Bias Measures among each ORT (N = 27)

Table 4.10. Rater Composition of ORTs in Relation to CRTs (N = 27)

Table 4.11. Four Criterion-Related Bias Cases and the Means of the Criterion Importance Ratings (N = 27)

Table 4.12. Criterion-Related Bias Measures under the Rating Scale Model and the Criterion Importance Ratings (N = 27)

Table 4.13. Criterion-Related Bias Measures under the Partial Credit Model and the Criterion Importance Ratings (N = 27)

Figure 2.1. The Characteristics of Performance Assessment

Figure 4.1. Variable Map of Raters' Criterion Importance Ratings

Figure 4.2. Hierarchical Clustering Solution for CRTs

Figure 4.3. Criterion Importance Profiles for CRTs

Figure 4.4. Variable Map from the Many-facet Rasch Analysis of 30 Essay Ratings under the Rating Scale Model

Figure 4.5. Variable Map from the Many-facet Rasch Analysis of 30 Essay Ratings under the Partial Credit Model

Figure 4.6. Hierarchical Clustering Solution for ORTs

Figure 4.7. Bias Diagram for ORT 1 through ORT 6