Abstract

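In the early stages of developing a large-scale mixed-format test, it is desirable to determine the number of items and the score points allotted to each item format by considering the content and behavioral domains of the test, the time limit, the developmental characteristics of examinees, and the desired level of measurement precision. In practice, however, the number of items and the score points for multiple-choice and constructed-response items are sometimes set by policy decisions or a priori judgment, without empirical evidence on how adjusting the score points changes the test's psychometric properties, particularly its reliability. This study applied a method for finding the combination of score points by item format that maximizes reliability, through an illustrative analysis of mixed-format test data used in middle and high schools, and explored the method's applicability in the field. To this end, using midterm examination data in Korean language and science from one middle school, we examined how test reliability changes as the weights applied to multiple-choice and constructed-response items in computing the composite score are varied. Multivariate generalizability analysis was used to estimate error variances and generalizability coefficients. Although the optimal relative weights obtained here are specific to the tests used in this study, the methodological framework can be applied to determine optimal item-format weights on the basis of a test's psychometric properties. The analysis was limited to designs consisting of a person facet and an item facet; extending it to designs that include a rater facet as a random facet is proposed for future research.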

In developing a large-scale mixed-format test, it is typical practice to determine the numbers of multiple-choice (MC) and constructed-response (CR) items and the relative weight of each format from the test specifications, including content and performance domains, time constraints, the developmental stage of examinees, and the desired accuracy of measurement. In some cases, however, these decisions are made arbitrarily on the basis of government policy or a priori judgment rather than psychometric properties established through empirical research. Using multivariate generalizability theory, this study suggests a method for investigating how varying the relative weights of MC and CR items affects reliability indices such as the generalizability coefficient and the index of dependability, and for finding the combination of relative weights that maximizes reliability. The method is illustrated with an example from teacher-made assessments. The optimum combination of relative weights obtained in the illustrative example does not generalize to other testing programs, but the methodology can be used to find an optimum composition of MC and CR items in other mixed-format tests and provides a basis for informed decisions in test construction.
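
The weight search described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis carried out in the study: it assumes a multivariate design in which persons are crossed with items and items are nested within format, so that relative errors are uncorrelated across formats, and it assumes the weights are applied to each format's mean score and sum to one. The function name `composite_g_coefficient` and every numeric value are hypothetical placeholders, not estimates from the study's data.

```python
import numpy as np


def composite_g_coefficient(w, person_cov, rel_error_var):
    """Generalizability coefficient of a weighted composite.

    w             : weights for the formats (here MC, CR)
    person_cov    : universe-score (person) variance-covariance matrix
    rel_error_var : relative error variance of each format's mean score
    Assumes relative errors are uncorrelated across formats.
    """
    w = np.asarray(w, dtype=float)
    universe_var = w @ person_cov @ w            # composite universe-score variance
    error_var = np.sum(w ** 2 * rel_error_var)   # composite relative error variance
    return universe_var / (universe_var + error_var)


# Hypothetical variance components, for illustration only.
person_cov = np.array([[0.030, 0.020],
                       [0.020, 0.050]])          # rows/columns: MC, CR
residual_var = np.array([0.18, 0.12])            # person-by-item components
n_items = np.array([20, 5])                      # items per format
rel_error_var = residual_var / n_items

# Grid search over the MC weight; the CR weight is the complement.
mc_weights = np.linspace(0.0, 1.0, 101)
coefs = np.array([
    composite_g_coefficient([w, 1.0 - w], person_cov, rel_error_var)
    for w in mc_weights
])
best_w_mc = mc_weights[np.argmax(coefs)]
print(f"MC weight maximizing the coefficient: {best_w_mc:.2f}")
```

An index of dependability could be searched the same way by replacing the relative error variances with absolute error variances, which additionally include the item variance components.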