Despite a lack of intent to discriminate, physicians educated in U.S. medical schools and residency programs often take actions that systematically disadvantage minority patients. The approach to assessment of learner performance in medical education can similarly disadvantage minority learners. The adoption of holistic admissions strategies to increase the diversity of medical training programs has not been accompanied by increases in diversity in honor societies, selective residency programs, medical specialties, and medical school faculty. These observations prompt justified concerns about structural and interpersonal bias in assessment. This manuscript characterizes equity in assessment as a “wicked problem” with inherent conflicts, uncertainty, dynamic tensions, and susceptibility to contextual influences. The authors review the underlying individual and structural causes of inequity in assessment. Using an organizational model, they propose strategies to achieve equity in assessment and drive institutional and systemic improvement based on clearly articulated principles. This model addresses the culture, systems, and assessment tools necessary to achieve equitable results that reflect stated principles. Three components of equity in assessment that can be measured and evaluated to confirm success include intrinsic equity (selection and design of assessment tools), contextual equity (the learning environment in which assessment occurs), and instrumental equity (uses of assessment data for learner advancement and selection and program evaluation). A research agenda to address these challenges and controversies and demonstrate reduction in bias and discrimination in medical education is presented.
One of the most important considerations in psychological and educational assessment is the extent to which a test is free of bias and fair for groups with diverse backgrounds. Establishing measurement invariance (MI) of a test or items is a prerequisite for meaningful comparisons across groups, as it ensures that test items do not function differently across groups. Demonstration of MI is particularly important in assessment settings where test scores are used in decision making. In this review, we begin with an overview of test bias and fairness, followed by a discussion of issues involving group classification, focusing on categorizations of race/ethnicity and sex/gender. We then describe procedures used to establish MI, detailing steps in the implementation of multigroup confirmatory factor analysis, and discussing recent developments in alternative procedures for establishing MI, such as the alignment method and moderated nonlinear factor analysis, which accommodate reconceptualization of group categorizations. Lastly, we discuss a variety of important statistical and conceptual issues to be considered in conducting multigroup confirmatory factor analysis and related methods and conclude with some recommendations for applications of these procedures.
Public Significance Statement
This article highlights some important conceptual and statistical issues that researchers should consider in research involving MI to maximize the meaningfulness of their results. Additionally, it offers recommendations for conducting MI research with multigroup confirmatory factor analysis and related procedures.
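The intuition behind one step of MI testing, checking whether items load on the latent factor equivalently across groups (metric invariance), can be sketched in miniature. The following is an illustrative simulation only, not a substitute for a full multigroup confirmatory factor analysis with nested-model fit comparisons in dedicated SEM software; group names and loading values are invented for the example.

```python
# Crude analogue of the metric-invariance step in multigroup CFA:
# fit a one-factor model separately in two groups and compare loadings.
# Uses scikit-learn's FactorAnalysis; real MI testing constrains loadings
# equal across groups and compares model fit.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

def simulate_group(n, loadings, noise_sd=1.0):
    """Generate item responses driven by a single latent factor."""
    factor = rng.normal(size=(n, 1))
    return factor @ loadings[None, :] + rng.normal(scale=noise_sd, size=(n, len(loadings)))

true_loadings = np.array([0.8, 0.7, 0.6, 0.9])
group_a = simulate_group(1000, true_loadings)
group_b = simulate_group(1000, true_loadings)  # same loadings -> items "invariant"

def fitted_loadings(X):
    fa = FactorAnalysis(n_components=1).fit(X)
    lam = fa.components_[0]
    return lam if lam.sum() > 0 else -lam      # align factor sign for comparison

lam_a = fitted_loadings(group_a)
lam_b = fitted_loadings(group_b)
print(np.round(lam_a, 2), np.round(lam_b, 2))
# When items function equivalently across groups, the per-group loading
# estimates should be close; large discrepancies would flag non-invariance.
```

In a real analysis, the configural, metric, and scalar invariance models are fit as a nested sequence, and invariance is judged by changes in fit indices rather than by eyeballing loadings.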
Ongoing transformations in health professions education underscore the need for valid and reliable assessment. The current standard for assessment validation requires evidence from five sources: content, response process, internal structure, relations with other variables, and consequences. However, researchers remain uncertain regarding the types of data that contribute to each evidence source. We sought to enumerate the validity evidence sources and supporting data elements for assessments using technology-enhanced simulation. We conducted a systematic literature search including MEDLINE, ERIC, and Scopus through May 2011. We included original research that evaluated the validity of simulation-based assessment scores using two or more evidence sources. Working in duplicate, we abstracted information on the prevalence of each evidence source and the underlying data elements. Among 217 eligible studies, only six (3%) referenced the five-source framework, and 51 (24%) made no reference to any validity framework. The most common evidence sources and data elements were: relations with other variables (94% of studies; reported most often as variation in simulator scores across training levels), internal structure (76%; supported by reliability data or item analysis), and content (63%; reported as expert panels or modification of existing instruments). Evidence of response process and consequences were each present in <10% of studies. We conclude that relations with training level appear to be overrepresented in this field, while evidence of consequences and response process are infrequently reported. Validation science will be improved as educators use established frameworks to collect and interpret evidence from the full spectrum of possible sources and elements.
This book is designed to help students review pharmacology and to prepare for both regular course examinations and board examinations. The fourteenth edition has been revised to make such preparation as active and efficient as possible.
The meaningful assessment of competence is critical for the implementation of effective competency-based medical education (CBME). Timely ongoing assessments are needed along with comprehensive periodic reviews to ensure that trainees continue to progress. New approaches are needed to optimize the use of multiple assessors and assessments; to synthesize the data collected from multiple assessors and multiple types of assessments; to develop faculty competence in assessment; and to ensure that relationships between the givers and receivers of feedback are appropriate. This paper describes the core principles of assessment for learning and assessment of learning. It addresses several ways to ensure the effectiveness of assessment programs, including using the right combination of assessment methods and conducting careful assessor selection and training. It provides a reconceptualization of the role of psychometrics and articulates the importance of a group process in determining trainees' progress. In addition, it notes that, to reach its potential as a driver in trainee development, quality care, and patient safety, CBME requires effective information management and documentation as well as ongoing consideration of ways to improve the assessment system.
Across all sciences, the quality of measurements is important. Survey measurements are only appropriate for use when researchers have validity evidence within their particular context. Yet, this step is frequently skipped or is not reported in educational research. This article briefly reviews the aspects of validity that researchers should consider when using surveys. It then focuses on factor analysis, a statistical method that can be used to collect an important type of validity evidence. Factor analysis helps researchers explore or confirm the relationships between survey items and identify the total number of dimensions represented on the survey. The essential steps to conduct and interpret a factor analysis are described. This use of factor analysis is illustrated throughout by a validation of Diekman and colleagues' goal endorsement instrument for use with first-year undergraduate science, technology, engineering, and mathematics students. We provide example data, annotated code, and output for analyses in R, an open-source programming language and software environment for statistical computing. For education researchers using surveys, understanding the theoretical and statistical underpinnings of survey validity is fundamental for implementing rigorous education research.
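The article's worked example uses R and real survey data; the core workflow it describes (decide how many factors to retain, fit the model, inspect which items load on which dimension) can be sketched in Python on simulated responses. All item structure below is invented for illustration and is not the Diekman instrument.

```python
# Minimal exploratory-factor-analysis workflow on simulated survey data:
# two underlying dimensions, three items each.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(42)
n_respondents = 500

factors = rng.normal(size=(n_respondents, 2))
loadings = np.array([[0.8, 0.0],
                     [0.7, 0.0],
                     [0.9, 0.1],
                     [0.0, 0.8],
                     [0.1, 0.7],
                     [0.0, 0.9]])
items = factors @ loadings.T + rng.normal(scale=0.5, size=(n_respondents, 6))

# Step 1: choose the number of factors. Here, eigenvalues of the item
# correlation matrix with the Kaiser (>1) criterion; scree plots and
# parallel analysis are common alternatives.
eigvals = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]
n_factors = int((eigvals > 1).sum())

# Step 2: fit the factor model with a varimax rotation and inspect the
# items-by-factors loading matrix to see which items group together.
fa = FactorAnalysis(n_components=n_factors, rotation="varimax").fit(items)
print("factors retained:", n_factors)
print(np.round(fa.components_.T, 2))
```

With this structure the Kaiser criterion recovers two factors, and the rotated loadings separate the first three items from the last three, mirroring the "identify the dimensions" step the abstract describes.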
PURPOSE: The medical student performance evaluation (MSPE) summarizes a residency applicant’s academic performance. Despite attempts to improve standardized clerkship grading, concerns regarding grade inflation and variability at United States medical schools persist. This study’s aim was to describe current patterns of clerkship grading and applicant performance data provided in the MSPE.
METHOD: The authors evaluated Electronic Residency Application Service data submitted to a single institution for the 2016–2017 Match cycle. Clerkship grading characteristics regarding grading tiers, school rank, location, and size were obtained. Data regarding methods for summative comparisons such as key word utilization were also extracted. Descriptive statistics were generated, and generalized linear modeling was performed.
RESULTS: Data were available for 137/140 (98%) MD-granting U.S. medical schools. Pass/fail grading was most commonly used during the preclinical years (47.4%). A 4-tier system was most common for clerkship grading (31%); however, 19 different grading schemes were identified. A median of 34% of students received the highest clerkship grade (range, 5%–97%). Students attending a top 20 medical school were more likely to receive the highest grade compared with those attending lower-rated schools (40% vs 32%, P < .001). Seventy-three percent of schools ranked students, most commonly using descriptive adjectives. Thirty-two different adjectives were used.
CONCLUSIONS: There is significant institutional variation in clinical grading practices and MSPE data. For core clerkships where most students received the highest grade, the ability to distinguish between applicants diminishes. A standardized approach to reporting clinical performance may allow for better comparison of residency applicants.
The relations among various spatial and mathematics skills were assessed in a cross-sectional study of 854 children from kindergarten, third, and sixth grades (i.e., 5 to 13 years of age). Children completed a battery of spatial and mathematics tests and their scores were submitted to exploratory factor analyses both within and across domains. In the within-domain analyses, all of the measures formed single factors at each age, suggesting consistent, unitary structures across this age range. Yet, as in previous work, the 2 domains were highly correlated, both in terms of overall composite score and pairwise comparisons of individual tasks. When both spatial and mathematics scores were submitted to the same factor analysis, the 2 domain-specific factors again emerged, but there also were significant cross-domain factor loadings that varied with age. Multivariate regressions replicated the factor analysis and further revealed that mental rotation was the best predictor of mathematical performance in kindergarten, and visual-spatial working memory was the best predictor of mathematical performance in sixth grade. The mathematical tasks that predicted the most variance in spatial skill were place value (K, 3rd, 6th), word problems (3rd, 6th), calculation (K), fraction concepts (3rd), and algebra (6th). Thus, although spatial skill and mathematics each have strong internal structures, they also share significant overlap, and have particularly strong cross-domain relations for certain tasks.
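The cross-domain regression idea in this abstract, predict a mathematics outcome from several spatial measures and ask which predictor carries the most weight, can be sketched on simulated data. The predictor names and effect sizes below are illustrative stand-ins, not the study's data or results.

```python
# Sketch: which spatial measure best predicts a mathematics score?
# Simulated z-scored data; in the simulation, mental rotation is built in
# as the strongest contributor, so the regression should recover it.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 800

mental_rotation = rng.normal(size=n)
vs_working_memory = rng.normal(size=n)
figure_copying = rng.normal(size=n)

# Math outcome with the largest true weight on mental rotation.
math = (0.5 * mental_rotation
        + 0.2 * vs_working_memory
        + 0.1 * figure_copying
        + rng.normal(scale=0.8, size=n))

X = np.column_stack([mental_rotation, vs_working_memory, figure_copying])
model = LinearRegression().fit(X, math)

names = ["mental_rotation", "vs_working_memory", "figure_copying"]
best = names[int(np.argmax(np.abs(model.coef_)))]
print(dict(zip(names, np.round(model.coef_, 2))), "| strongest predictor:", best)
```

Because the predictors are z-scored, the coefficients are directly comparable, which is the logic behind asking which spatial skill is the "best predictor" at each grade level.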