General mental ability (GMA) tests have long been at the heart of the validity-diversity trade-off, with the conventional wisdom being that reducing their weight in personnel selection can reduce adverse impact, but that this results in steep costs to criterion-related validity. However, Sackett et al. (2022) revealed that the criterion-related validity of GMA tests has been considerably overestimated due to inappropriate range restriction corrections. Thus, we revisit the role of GMA tests in the validity-diversity trade-off using an updated meta-analytic correlation matrix of the relationships of six selection methods (biodata, GMA tests, conscientiousness tests, structured interviews, integrity tests, and situational judgment tests) with job performance, along with their Black-White mean differences. Our results lead to the conclusion that excluding GMA tests generally has little to no effect on validity, but substantially decreases adverse impact. Contrary to popular belief, GMA tests are not a driving factor in the validity-diversity trade-off. This does not fully resolve the validity-diversity trade-off, though: Our results show that some validity reduction is still required to reach an adverse impact ratio of .80, although the reduction is less than previously thought. Instead, it shows that the validity-diversity trade-off conversation should shift from the role of GMA tests to that of other selection methods. The present study also addresses which selection methods now emerge as most valid and whether composites of selection methods can result in validities similar to those expected prior to Sackett et al. (2022). (PsycInfo Database Record (c) 2024 APA, all rights reserved).
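The adverse impact ratio referenced above is conventionally evaluated against the four-fifths (80%) rule: the selection rate of the protected group divided by that of the majority group. A minimal illustrative computation (the function name and the applicant/selection counts are invented for this sketch, not taken from the study):

```python
def adverse_impact_ratio(minority_selected, minority_applicants,
                         majority_selected, majority_applicants):
    """Four-fifths (80%) rule: ratio of minority to majority selection rates.

    A ratio below .80 is conventionally treated as evidence of adverse impact.
    """
    minority_rate = minority_selected / minority_applicants
    majority_rate = majority_selected / majority_applicants
    return minority_rate / majority_rate

# Hypothetical example: 30 of 100 minority vs. 50 of 100 majority applicants hired.
print(round(adverse_impact_ratio(30, 100, 50, 100), 2))  # prints 0.6, below the .80 threshold
```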
A norm-referenced score expresses the position of an individual test taker in the reference population, thereby enabling a proper interpretation of the test score. Such normed scores are derived from test scores obtained from a sample of the reference population. Typically, multiple reference populations exist for a test, namely when the norm-referenced scores depend on individual characteristic(s), such as age (and sex). To derive normed scores, regression-based norming has gained great popularity. The advantages of this method over traditional norming are its flexible nature, yielding potentially more realistic norms, and its efficiency, requiring potentially smaller sample sizes to achieve the same precision. In this tutorial, we introduce the reader to regression-based norming, using the generalized additive models for location, scale, and shape (GAMLSS). This approach has been useful in norm estimation of various psychological tests. We discuss the rationale of regression-based norming, theoretical properties of GAMLSS, and their relationships to other regression-based norming models. Based on six steps, we describe how to: (a) design a normative study to gather proper normative sample data; (b) select a proper GAMLSS model for an empirical scale; (c) derive the desired normed scores for the scale from the fitted model, including those for a composite scale; and (d) visualize the results to achieve insight into the properties of the scale. Following these steps yields regression-based norms with GAMLSS for a psychological test, as we illustrate with normative data of the intelligence test IDS-2. The complete R code and data set are provided as online supplemental material.
Translational Abstract
Standardized psychological tests are widely used. Examples include intelligence, developmental, and neuropsychological tests. They are used for purposes such as monitoring, selection, and diagnosing individuals. High-quality standardized tests have normed scores, like the well-known IQ scores for intelligence tests. Normed scores allow for properly interpreting an individual's test score. They are derived in the test construction phase, based on scores in a large normative sample. Normed scores express the position of an individual test taker in the reference population. The reference population for a test typically depends on individual characteristic(s), like age and possibly sex. This tutorial introduces the reader to a method to compute normed scores that depend on individual characteristic(s), making optimal use of all background knowledge and the scores in the whole normative sample. The method therefore potentially yields more realistic and more precise norms than traditional methods using the same amount of data. This is an important asset, because gathering sufficient data is difficult and costly. In this tutorial, we explain the technical background of the method, called regression-based norming with the generalized additive models for location, scale, and shape (GAMLSS), and explain how to apply it based on six steps. Following these steps yields regression-based norms with GAMLSS for a psychological test, as we illustrate with normative data of the intelligence test IDS-2. The complete R code and data set are provided as online supplemental material, so that test developers can apply the method to derive high-quality norms for their own test.
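The core idea of regression-based norming can be sketched in a few lines, assuming a normal conditional distribution whose location and scale depend on age. A real GAMLSS fit estimates these functions (and possibly skewness and kurtosis) from the normative data and converts raw scores via the fitted distribution's CDF; the linear mu/sigma functions below are toy stand-ins, not values from the IDS-2 example:

```python
# Toy stand-ins for fitted location/scale functions of age.
# In a real application these come from a GAMLSS model fitted to normative data.
def mu(age):     # predicted mean raw score at a given age
    return 20 + 1.5 * age

def sigma(age):  # predicted SD of raw scores at a given age
    return 4 + 0.1 * age

def normed_score(raw, age, norm_mean=100, norm_sd=15):
    """Convert a raw score to an IQ-style normed score, conditional on age.

    Assumes a normal conditional distribution; GAMLSS more generally maps the
    raw score through the fitted distribution's CDF before standardizing.
    """
    z = (raw - mu(age)) / sigma(age)
    return norm_mean + norm_sd * z

# A 10-year-old with raw score 40: mu = 35, sigma = 5, so z = 1.0.
print(normed_score(raw=40, age=10))  # prints 115.0
```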
The Flynn Effect: A Meta-Analysis. Trahan, Lisa H; Stuebing, Karla K; Fletcher, Jack M ...
Psychological Bulletin, 09/2014, Volume 140, Issue 5
Journal Article
Peer reviewed
Open access
The Flynn effect refers to the observed rise in IQ scores over time, which results in norms obsolescence. Although the Flynn effect is widely accepted, most efforts to estimate it have relied upon "scorecard" approaches that make estimates of its magnitude and error of measurement controversial and prevent determination of factors that moderate the Flynn effect across different IQ tests. We conducted a meta-analysis to determine the magnitude of the Flynn effect with a higher degree of precision, to determine the error of measurement, and to assess the impact of several moderator variables on the mean effect size. Across 285 studies (N = 14,031) since 1951 with administrations of 2 intelligence tests with different normative bases, the meta-analytic mean was 2.31 standard score points per decade (95% CI [1.99, 2.64]). The mean effect size for 53 comparisons (N = 3,951, excluding 3 atypical studies that inflate the estimates) involving modern (since 1972) Stanford-Binet and Wechsler IQ tests (2.93 IQ points per decade, 95% CI [2.3, 3.5]) was comparable to previous estimates of about 3 points per decade but was not consistent with the hypothesis that the Flynn effect is diminishing. For modern tests, study sample (larger increases for validation research samples vs. test standardization samples) and order of administration explained unique variance in the Flynn effect, but age and ability level were not significant moderators. These results supported previous estimates of the Flynn effect and its robustness across different age groups, measures, samples, and levels of performance.
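To illustrate the norms-obsolescence problem, a simple linear correction can back out the meta-analytic gain of 2.31 points per decade from an observed score. This is a hypothetical sketch, not a procedure from the article; it assumes the Flynn effect accrues linearly over the interval since the test was normed:

```python
def flynn_adjusted_iq(observed_iq, test_norm_year, testing_year, rate=2.31):
    """Subtract the Flynn-effect gain (default: the meta-analytic estimate of
    2.31 standard score points per decade) accumulated since the test's norms
    were collected. A simplified linear correction for illustration only.
    """
    decades = (testing_year - test_norm_year) / 10
    return observed_iq - rate * decades

# A score of 110 on a test normed in 2000, administered in 2020 (2 decades later):
print(round(flynn_adjusted_iq(110, 2000, 2020), 1))  # prints 105.4
```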
In his review of the literature on models and measures of emotional intelligence (EI), Ackley (2016) did not include enhancements to the Emotional Competency Inventory (ECI) since 2006, when it was substantially revised, improved, and renamed the Emotional and Social Competency Inventory (ESCI). The new name reflects that the instrument measures not just the intrapersonal recognition and management of one's own emotions but also how they influence interpersonal interactions with other people: the recognition and management of others' emotions. Both the Other version (completed by informants) and the self-assessment version demonstrated appropriate factor loadings for each item on each scale in exploratory factor analyses, model fit to rigorous standards for each scale in confirmatory factor analyses, and convergent and discriminant validity against appropriate criteria for each scale within each version. The ESCI, as completed by others, predicted leadership effectiveness in a number of studies using various dependent measures. These studies are important because several of them showed that the behavioral ESCI is a more powerful predictor of real-world outcomes than g and personality. We call this a Stream 4, or behavioral-level, measure of EI; it does not rely on self-assessment. A more comprehensive view of EI would include multiple levels of EI and distinguish behavioral measures. The ESCI is used in training programs, coaching, and undergraduate and graduate courses in many disciplines worldwide to help develop EI, with dramatic effectiveness demonstrated in many published longitudinal studies.
Due to physical distancing guidelines, the closure of nonessential businesses, and the closure of public schools, the role of telehealth for the delivery of psychological services for children has never been more debated. However, the transition to teleassessment is more complicated for some types of assessment than others. For instance, the remote administration of achievement and intelligence tests is a relatively recent adaptation of telehealth, and despite recommendations for rapid adoption by some policymakers and publishing companies, caution and careful consideration of individual and contextual variables and the existing research literature, as well as measurement, cultural and linguistic, and legal and ethical issues, are warranted. The decision to use remotely administered achievement and intelligence tests is best made on a case-by-case basis after consideration of these factors. We discuss each of these issues and their implications for practice and policy, and offer provisional guidance for publishing companies interested in these endeavors moving forward.
Public Significance Statement
The current review describes a number of factors that may reduce the accuracy of standardized tests, like intelligence tests, when they are given remotely. Additionally, it highlights the importance of considering the purpose of assessment, the client's cultural and linguistic background, and ethical and legal decision making when using and interpreting standardized test results.
James R. Flynn (1934-2020). Ceci, Stephen; Farley, Frank
The American Psychologist, 01/2022, Volume 77, Issue 1
Journal Article
Peer reviewed
Memorializes James (Jim) R. Flynn (1934-2020). Jim was one of psychology's most influential thinkers even though he was not a psychologist; he received a PhD in 1958 in politics and moral philosophy at the University of Chicago. At the time of his death, Jim was Emeritus Professor of Politics at the University of Otago in New Zealand. Over the course of his career, Jim amassed several high-impact papers and books, including his two articles "Massive IQ Gains in 14 Nations: What IQ Tests Really Measure" and "The Mean IQ of Americans: Massive Gains 1932-1978." In 2019, he mounted a brilliant defense of free speech.
Early research that relied on standardized assessments of intelligence reported negative effects of bilingualism for children, but a study by Peal and Lambert (1962) reported better performance by bilingual than monolingual children on verbal and nonverbal intelligence tests. This outcome led to the view that bilingualism was a positive experience. However, subsequent research abandoned intelligence tests as the assessment tool and evaluated performance on cognitive tasks, making the research after Peal and Lambert qualitatively different from that before their landmark study and creating a disconnect between the new and earlier research. These newer cognitive studies showed both positive effects of bilingualism and no differences between language groups. But why were Peal and Lambert's results so different from previous studies that were also based on intelligence tests? The present study analyzed data from verbal and nonverbal intelligence tests that were collected from 6,077 participants across 79 studies in which intelligence tests were administered as background measures to various cognitive tasks. By including adults, the study extends the results across the life span. On standardized verbal tests, monolinguals outperformed bilinguals, but on nonverbal measures of intelligence, there were no differences between language groups. These results, which are different from those reported by Peal and Lambert, are used to reinterpret their findings in terms of the sociolinguistic, political, and cultural context in which the Peal and Lambert study was conducted and the relevance of those factors for all developmental research.
Reports an error in "Revisiting Carroll's Survey of Factor-Analytic Studies: Implications for the Clinical Assessment of Intelligence," by Nicholas F. Benson, A. Alexander Beaujean, Ryan J. McGill, and Stefan C. Dombrowski (Advance online publication, May 24, 2018, http://dx.doi.org/10.1037/pas0000556). The majority of values in the ωH and ωHS columns of Table 4 were incorrect and have been amended. These revisions required text in the fourth paragraph of the Results section to be changed from "Moreover, the ωHS value for [factor] is relatively high and very close to the ω and ωH values for g" to "Moreover, the ωHS values for [factors] are relatively high, exceeding the ω and ωH values for g." All versions of this article have been corrected. (The following abstract of the original article appeared in record 2018-23627-001.) John Carroll's three-stratum theory (and the decades of research behind its development) is foundational to the contemporary practice of intellectual assessment. The present study addresses some limitations of Carroll's work: specification, reproducibility with more modern methods, and interpretive relevance. We reanalyzed select data sets from Carroll's survey of factor-analytic studies using confirmatory factor analysis as well as modern indices of interpretive relevance. For the majority of data sets, we found that Carroll likely extracted too many factors representing Stratum II abilities. Moreover, almost all factors representing Stratum II abilities had little-to-no interpretive relevance above and beyond that of general intelligence. We conclude by discussing the implications of this research with respect to the interpretive relevance and clinical utility of scores reflecting cognitive abilities at all strata of the three-stratum theory and offer some directions for future research.
The Web-Based Assessment of Mental Speed. Gnambs, Timo
European Journal of Psychological Assessment: Official Organ of the European Association of Psychological Assessment, 09/2023, Volume 39, Issue 5
Journal Article
Peer reviewed
Open access
Although web-based cognitive assessments have gained increasing attention in recent decades, it is still debated whether unstandardized test settings allow for measurements comparable to proctored testing, particularly for speeded cognitive tests. Therefore, two within-subject experiments (N = 73 and N = 72) compared differences in means, criterion correlations with measures of intelligence, and subjective test quality perceptions of a trail-making test between a proctored paper-based, a proctored computerized, and an unproctored web-based administration mode. The results in both samples showed equivalent means between the two computerized modes, equivalent criterion correlations between the three modes, and no differential item functioning. However, the web-based tests were rated as having inferior measurement quality compared to the proctored assessments. Thus, web-based testing allows for measurements of mental speed comparable to traditional computerized tests, although it is still regarded as a lower-quality medium by test takers.
Due to their high item difficulties and excellent psychometric properties, construction-based figural matrices tasks are of particular interest when it comes to high-stakes testing. An important prerequisite is that test preparation, which is likely to occur in this context, does not impair test fairness or item properties. The goal of this study was to provide initial evidence concerning the influence of test preparation. We administered test items to a sample of N = 882 participants divided into two groups, but only one group was given information about the rules employed in the test items. The probability of solving the items was significantly higher in the test preparation group than in the control group (M = 0.61, SD = 0.19 vs. M = 0.41, SD = 0.25; t(54) = 3.42, p = .001; d = .92). Nevertheless, a multigroup confirmatory factor analysis, as well as a differential item functioning analysis, indicated no differences between the item properties in the two groups. The results suggest that construction-based figural matrices are suitable in the context of high-stakes testing when all participants are provided with test preparation material so that test fairness is ensured.
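The reported effect size can be roughly reproduced from the group means and SDs in the abstract using Cohen's d with a pooled standard deviation. Assuming equal group sizes gives about .90, close to the reported .92; the small difference presumably reflects the study's exact pooling (e.g., unequal group ns):

```python
from math import sqrt

def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d with a pooled SD, assuming equal group sizes.

    Illustrative only: the study's reported d = .92 may use a different
    pooling formula than this equal-n simplification.
    """
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

# Means and SDs reported in the abstract above:
print(round(cohens_d(0.61, 0.19, 0.41, 0.25), 2))  # prints 0.9
```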