Abstract
Statistical practice in psychological science is undergoing reform, which is reflected in part by strong recommendations for reporting and interpreting effect sizes and their confidence intervals. We present principles and recommendations for research reporting and emphasize the variety of ways effect sizes can be reported. Additionally, we emphasize interpreting and reporting unstandardized effect sizes because of common misconceptions regarding standardized effect sizes, which we elucidate. Effect sizes should directly answer their motivating research questions, be comprehensible to the average reader, and be based on meaningful metrics of their constituent variables. We illustrate our recommendations with empirical examples involving a one-way ANOVA, a categorical variable analysis, an interaction effect in linear regression, and a simple mediation model, emphasizing the interpretation of effect sizes.
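The distinction the abstract draws between unstandardized and standardized effect sizes can be made concrete for a two-group comparison (notation here is generic, not taken from the paper): the unstandardized effect is the raw mean difference in the variable's own units, while the standardized effect rescales it by the pooled standard deviation:

```latex
\text{unstandardized: } \hat{\Delta} = \bar{y}_1 - \bar{y}_2
\qquad
\text{standardized: } d = \frac{\bar{y}_1 - \bar{y}_2}{s_p},
\quad
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
```

Because \(\hat{\Delta}\) remains in the measured metric (e.g., points on a symptom scale), it is directly interpretable when that metric is meaningful, which is the sense in which the abstract favors unstandardized effect sizes.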
Translational Abstract
We present general principles of good research reporting, elucidate common misconceptions about standardized effect sizes, and provide recommendations for good research reporting. Effect sizes should directly answer their motivating research questions, be comprehensible to the average reader, and be based on meaningful metrics of their constituent variables. We illustrate our recommendations with four different empirical examples involving popular statistical methods such as ANOVA, categorical variable analysis, multiple linear regression, and simple mediation; these examples serve as a tutorial to enhance practice in the research reporting of effect sizes.
Measurement quality has recently been highlighted as an important concern for advancing a cumulative psychological science. An implication is that researchers should move beyond mechanistically reporting coefficient alpha toward more carefully assessing the internal structure and reliability of multi-item scales. Yet a researcher may be discouraged upon discovering that a prominent alternative to alpha, namely, coefficient omega, can be calculated in a variety of ways. In this Tutorial, I alleviate this potential confusion by describing alternative forms of omega and providing guidelines for choosing an appropriate omega estimate pertaining to the measurement of a target construct represented with a confirmatory factor analysis model. Several applied examples demonstrate how to compute different forms of omega in R.
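Although the paper's applied examples use R, the arithmetic behind the simplest form of omega (for a single-factor CFA model) is compact enough to sketch in a few lines; the function name and inputs below are illustrative, not from the paper:

```python
def omega_total(loadings, residual_variances):
    """Coefficient omega for a unidimensional CFA model:
    the squared sum of factor loadings (common variance) divided by
    that quantity plus the summed residual variances."""
    common = sum(loadings) ** 2
    return common / (common + sum(residual_variances))

# Four standardized loadings of .7 each imply residual variances of 1 - .49 = .51
print(round(omega_total([0.7] * 4, [0.51] * 4), 3))  # 0.794
```

The variety of omega estimates the Tutorial describes arises from how this basic ratio is adapted for multidimensional models (e.g., hierarchical or bifactor structures), not from the core formula itself.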
Critics have long pointed to the importance of effect sizes for remedying the problems that a mechanistic focus on null hypothesis significance testing (NHST) has caused for psychological science. Following a brief discussion of the meaning of the term effect size, we describe how the same issues stemming from an overreliance on NHST (i.e., publication bias, p-hacking, and researcher degrees of freedom) that led to a "replication crisis" have also impacted effect size accuracy and interpretation. Next, we describe the central role that effect sizes play in efforts to overcome the replication crisis and revitalize psychology as a cumulative science. Specifically, we emphasize the importance of effect size interpretation and the place of effect sizes in sample size planning, replication, and meta-analysis. We conclude that focusing on effect sizes can serve a cumulative psychological science to the extent that they serve statistical thinking, which values contextualization based on researcher expertise over mechanical statistical rituals.
Public Significance Statement
The reliability of psychological research has been called into question because a range of prominent findings have not been replicated by follow-up studies, leading to a so-called replication crisis. Certain practices regarding data analysis and interpretation of results, centered around null-hypothesis significance testing, are responsible for this situation. The current article discusses how an increased focus on effect size interpretation can help psychology progress as a cumulative science to the extent that such interpretation values contextualization based on researcher expertise over mechanical statistical routine.
There are many high-quality resources available which describe best practices in the implementation of both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Yet, partly owing to the complexity of these procedures, confusion persists among psychologists with respect to the implementation of EFA and CFA. Chief among these misunderstandings is the mathematical distinction between EFA and CFA. The current paper uses a brief example to illustrate the difference between the statistical models underlying EFA and CFA, both of which are particular instantiations of the more general common factor model. Next, important considerations for the implementation of EFA and CFA discussed in this paper include the need to account for the categorical nature of item-level observed variables in factor analyses, the use of factor analysis in studies of the psychometric properties of both new and previously developed tests or questionnaires, decisions about whether to use EFA or CFA in these contexts, and the importance of replication of factor analytic models in the ongoing pursuit of validation.
The concept of contaminated mindware provides one conceptualization for measuring beliefs and attitudes about three domains that have evaluation-disabling properties in the context of reasoning: paranormal beliefs, conspiracy beliefs, and anti-science attitudes. We tested the underlying structure of individual differences in these three domains of contaminated mindware and their predictors in a sample of 321 Canadian undergraduate students. The predictors included cognitive ability, cognitive reflection, the dispositional tendency of actively open-minded thinking, and ontological confusions. A hierarchical model with three correlated general factors of paranormal, conspiracy, and anti-science beliefs and attitudes and four specific paranormal factors (i.e., psi, superstition, spiritualism, and precognition) was optimal. While all predictors were significantly correlated with the contaminated mindware domains, structural equation modeling results supported the unique effects of ontological confusions and actively open-minded thinking. The current results support the multidimensional nature of contaminated mindware domains and highlight some of their correlates and unique predictors. Providing a structure and theoretical framework for unwarranted beliefs and attitudes will be useful for measuring their potential impact on the processes of human reasoning.
Confirmatory factor analysis (CFA) is widely used for examining hypothesized relations among ordinal variables (e.g., Likert-type items). A theoretically appropriate method fits the CFA model to polychoric correlations using either weighted least squares (WLS) or robust WLS. Importantly, this approach assumes that a continuous, normal latent process determines each observed variable. The extent to which violations of this assumption undermine CFA estimation is not well-known. In this article, the authors investigate this issue empirically with a computer simulation study. The results suggest that estimation of polychoric correlations is robust to modest violations of underlying normality. Further, WLS performed adequately only at the largest sample size but led to substantial estimation difficulties with smaller samples. Finally, robust WLS performed well across all conditions.
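The "continuous, normal latent process" assumption can be stated compactly (standard latent response notation, not quoted from the article): each ordinal response \(x\) arises from discretizing a latent normal variate \(x^*\) at a set of thresholds \(\tau\):

```latex
x = c \iff \tau_{c-1} \le x^* < \tau_c,
\qquad
x^* \sim \mathcal{N}(0, 1),
\qquad
\tau_0 = -\infty < \tau_1 < \dots < \tau_{C-1} < \tau_C = \infty
```

The polychoric correlation between two ordinal items is then the correlation between their latent \(x^*\) variates, which is what the simulation perturbs when it studies violations of underlying normality.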
The Trauma Symptom Checklist–40 (TSC-40) is commonly used in clinical research to index history of childhood maltreatment and assess complex trauma symptomatology in adults. Yet the dimensional structure of this measure has not been examined. We examined the factor structure of the TSC-40 in a sample of 706 undergraduate students, measurement invariance of the TSC-40 across groups with or without a history of abuse-related and multiple trauma, and the association between the TSC-40 and other trauma indices. A higher order model of complex trauma symptomatology was optimal. The higher order model also demonstrated strong measurement invariance across participants with or without abuse-related and multiple trauma histories. The current findings support the dimensional structure of the TSC-40 and extend and revise its subscale composition. This study provided support for using the TSC-40 to measure trauma symptoms across groups exposed to different and multiple types of trauma and provided further evidence for the construct of complex trauma symptomatology.
It Might Not Make a Big DIF
Chalmers, R. Philip; Counsell, Alyssa; Flora, David B.
Educational and Psychological Measurement, 02/2016, Volume 76, Issue 1. Journal article, peer-reviewed, open access.
Differential test functioning, or DTF, occurs when one or more items in a test demonstrate differential item functioning (DIF) and the aggregate of these effects is observed at the test level. In many applications, DTF can be more important than DIF when the overall effects of DIF at the test level can be quantified. However, optimal statistical methodology for detecting and understanding DTF has not been developed. This article proposes improved DTF statistics that properly account for sampling variability in item parameter estimates while avoiding the necessity of predicting provisional latent trait estimates to create two-step approximations. The properties of the DTF statistics were examined with two Monte Carlo simulation studies using dichotomous and polytomous IRT models. The simulation results revealed that the improved DTF statistics obtained optimal and consistent statistical properties, such as consistent Type I error rates. Next, an empirical analysis demonstrated the application of the proposed methodology. Applied settings where the DTF statistics can be beneficial are suggested, and future DTF research areas are proposed.
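One common way to express DTF at the test level (notation assumed here, not quoted from the article) compares the expected test score functions of the reference (R) and focal (F) groups across the latent trait \(\theta\):

```latex
\mathrm{sDTF} = \int \left[ T_R(\theta) - T_F(\theta) \right] f(\theta)\, d\theta,
\qquad
T_g(\theta) = \sum_{j=1}^{J} E\!\left[x_j \mid \theta, g\right]
```

A signed aggregate near zero alongside a nonzero unsigned counterpart would indicate item-level DIF effects that largely cancel at the test level, which is why quantifying DTF separately from DIF matters in applied settings.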
Measurement invariance (MI) is often concluded from a nonsignificant chi-square difference test. Researchers have also proposed using change in goodness-of-fit indices (GOFs) instead. Both of these commonly used methods for testing MI have important limitations. To combat these issues, an equivalence test (EQ) was proposed to replace the chi-square difference test commonly used to test MI. Because of concerns about the EQ's power, an adjusted version (EQ-A) was created, but little evaluation of either procedure has been provided. The current study evaluated the Type I error and power of both the EQ and EQ-A, and compared their performance to that of the traditional chi-square difference test and GOFs. The EQ was the only procedure that maintained empirical error rates below the nominal alpha level. Results also highlight that the EQ requires larger sample sizes than traditional difference-based approaches, or equivalence bounds based on larger-than-conventional RMSEA values (e.g., > .05), to ensure adequate power rates. We do not recommend the proposed adjustment (EQ-A) over the EQ.
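The logic of an equivalence test reverses the usual null hypothesis. Rather than testing a null of exact invariance (and taking nonsignificance as support), one tests, schematically (symbols chosen here for illustration):

```latex
H_0\colon\ \varepsilon \ge \varepsilon_0
\qquad \text{vs.} \qquad
H_1\colon\ \varepsilon < \varepsilon_0
```

where \(\varepsilon\) is a population measure of misfit between the constrained and unconstrained models (e.g., expressed on the RMSEA metric) and \(\varepsilon_0\) is a prespecified equivalence bound. Rejecting \(H_0\) supports invariance within the bound, so failing to reject no longer counts as evidence for MI by default.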
Piecewise latent trajectory models for longitudinal data are useful in a wide variety of situations, such as when a simple model is needed to describe nonlinear change, or when the purpose of the analysis is to evaluate hypotheses about change occurring during a particular period of time within a model for a longer overall time frame, such as change that occurs following onset of a treatment or some other event. However, the specification of various forms of piecewise models has not been fully explicated for the structural equation modeling (SEM) framework. This article describes piecewise models as a straightforward extension of the basic SEM model for linear growth, which makes them relatively easy both to specify and to interpret. After presenting models for 2 linear slopes (or pieces) in detail, the article discusses extensions that include additional linear slopes (i.e., a 3-piece model) or a quadratic factor (i.e., a hybrid linear-quadratic model).
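A generic 2-piece linear specification can be written compactly (symbols chosen here rather than taken from the article); with a knot at time \(k\):

```latex
y_{ti} = \eta_{0i} + \eta_{1i}\,\min(t, k) + \eta_{2i}\,\max(t - k, 0) + \varepsilon_{ti}
```

so that \(\eta_{1i}\) is individual \(i\)'s slope before the knot and \(\eta_{2i}\) the slope after it. In the SEM parameterization, the \(\min\) and \(\max\) terms become fixed factor loadings on the two slope factors, which is what makes piecewise models a direct extension of the basic linear growth model.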