A Psychometric Evaluation of Script Concordance Tests for Measuring Clinical Reasoning

Date
2013-06
Language
American English
Degree
Ph.D.
Degree Year
2013
Department
Department of Anatomy & Cell Biology
Grantor
Indiana University
Abstract

Purpose: Script concordance tests (SCTs) are assessments purported to measure clinical data interpretation. The aims of this research were to (1) test the psychometric properties of SCT items, (2) directly examine the construct validity of SCTs, and (3) explore the concurrent validity of six SCT scoring methods while also considering validity at the levels of item difficulty and item type.

Methods: Scores from a problem-solving SCT (SCT-PS; n=522) and an emergency medicine SCT (SCT-EM; n=1040) were used to investigate these aims. An item analysis was conducted to optimize the SCT datasets, to categorize items by difficulty and type, and to test for gender bias. A confirmatory factor analysis tested whether SCT scores conformed to the theorized unidimensional factor structure. Exploratory factor analyses examined the effects of six SCT scoring methods on construct validity. The concurrent validity of each scoring method was tested via a one-way multivariate analysis of variance (MANOVA) and Pearson's product-moment correlations. Repeated measures analysis of variance (ANOVA) and one-way ANOVA tested the discriminatory power of the SCTs by item difficulty and item type.

Results: The item analysis identified no gender bias. Moderate model-fit indices combined with poor factor loadings from the confirmatory factor analysis suggested that the SCTs under investigation did not conform to a unidimensional factor structure. Exploratory factor analyses of the six scoring methods repeatedly revealed weak factor loadings, and the extracted factors consistently explained only a small portion of the total variance. In the concurrent validity study, all six scoring methods discriminated between medical training levels despite lower reliability coefficients for the 3-point scoring methods. In addition, examinees significantly (p<0.001) outperformed their own MS2 scores when retested as MS4s, across all difficulty categories. Cross-sectional analysis of the SCT-EM data revealed significant differences (p<0.001) between experienced EM physicians, EM residents, and MS4s at each level of difficulty. By item type, diagnostic and therapeutic items differentiated among all three training levels, whereas investigational items could not readily distinguish MS4s from EM residents.

Conclusions: These results contest the assertion that SCTs measure a single common construct. The findings raise questions about the latent constructs measured by SCTs and challenge the overall utility of SCT scores. The outcomes of the concurrent validity study nonetheless provide evidence that multiple scoring methods reasonably differentiate between medical training levels; concurrent validity was also observed at the levels of item difficulty and item type.
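
For orientation, the following is a minimal Python sketch of the kind of unidimensional confirmatory factor analysis described in the Methods, using the open-source semopy package. It illustrates the general technique, not the dissertation's actual analysis; the item columns and the factor label "interpretation" are hypothetical.

import pandas as pd
import semopy

def fit_unidimensional_cfa(scores: pd.DataFrame) -> pd.DataFrame:
    # One latent factor loading on every SCT item column in `scores`.
    desc = "interpretation =~ " + " + ".join(scores.columns)
    model = semopy.Model(desc)
    model.fit(scores)
    # Standardized estimates; weak loadings argue against a single construct.
    print(model.inspect(std_est=True))
    # Global fit indices (CFI, TLI, RMSEA, ...).
    return semopy.calc_stats(model)

# Hypothetical usage: one row per examinee, one column per item score.
# scores = pd.read_csv("sct_scores.csv")
# print(fit_unidimensional_cfa(scores)[["CFI", "TLI", "RMSEA"]])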
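The concurrent validity checks (group differences across training levels, agreement between scoring methods) can be sketched in the same spirit with SciPy. A one-way ANOVA stands in here for the study's MANOVA for brevity, and the group labels and score columns are assumptions, not the study's variables.

import pandas as pd
from scipy import stats

def concurrent_validity_checks(df: pd.DataFrame) -> None:
    # Assumed columns: 'level' (e.g., MS4 / resident / attending) plus two
    # totals, 'score_5pt' and 'score_3pt', from different scoring methods.
    groups = [g["score_5pt"].to_numpy() for _, g in df.groupby("level")]
    f_stat, p = stats.f_oneway(*groups)  # do training levels differ?
    print(f"One-way ANOVA: F = {f_stat:.2f}, p = {p:.4g}")
    r, p_r = stats.pearsonr(df["score_5pt"], df["score_3pt"])
    print(f"5-point vs. 3-point scoring: r = {r:.2f}, p = {p_r:.4g}")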

Description
Indiana University-Purdue University Indianapolis (IUPUI)
Type
Thesis