Measurement is fundamental to all research in psychology and should be accorded greater scrutiny than typically occurs. Among other claims, McNeish and Wolf ("Thinking twice about sum scores," Behavior Research Methods, 52, 2287-2305) argued that use of sum scores (a) implies that a highly constrained latent variable model underlies the items comprising a scale, and (b) may misrepresent or bias relations with other criteria. The central claim by McNeish and Wolf that use of sum scores requires the assumption that a parallel test model underlies item responses is incorrect and without psychometric merit. Instead, if a set of items is unidimensional, estimators of reliability are available even if the factor model underlying the set of items does not have a highly constrained form. Thus, the dimensionality of a set of items is the key issue, and whether strict constraints on parameter estimates hold dictates the appropriate way to estimate reliability. McNeish and Wolf also claimed that more precise forms of scoring, such as estimated factor scores, would be preferable to sum scores. We provide analytic bases for reliability estimation and then provide several demonstrations of reliability estimation and of the relative advantages of sum scores and factor scores. We contend that several claims by McNeish and Wolf are questionable and that, as a result, multiple recommendations they made and conclusions they drew are incorrect. The upshot is that, once the dimensional structure of a set of items is verified, sum scores often have a solid psychometric basis and therefore are frequently quite adequate for psychological research.
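To make the reliability argument concrete, here is a minimal sketch that computes coefficient alpha and coefficient omega for a unidimensional (one-factor) item set. It assumes standardized items, uncorrelated unique factors, and hypothetical loadings; the helper names are illustrative and are not taken from the article. Under equal loadings (essential tau-equivalence) alpha and omega coincide; with unequal (congeneric) loadings alpha falls below omega, while omega remains an appropriate reliability estimate for the sum score.

```python
"""Alpha vs. omega for a unidimensional (one-factor) item set.

Illustrative assumptions (not from the article): standardized items,
uncorrelated unique factors, hypothetical loadings.
"""


def implied_covariance(loadings, uniquenesses):
    """Model-implied covariance matrix: Sigma = lambda lambda' + diag(theta)."""
    p = len(loadings)
    return [
        [loadings[i] * loadings[j] + (uniquenesses[i] if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]


def coefficient_alpha(cov):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the sum score)."""
    k = len(cov)
    sum_score_var = sum(sum(row) for row in cov)
    item_var = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1.0 - item_var / sum_score_var)


def coefficient_omega(loadings, uniquenesses):
    """McDonald's omega for the sum score: (sum lambda)^2 / ((sum lambda)^2 + sum theta)."""
    common = sum(loadings) ** 2
    return common / (common + sum(uniquenesses))


def alpha_and_omega(loadings):
    """Both coefficients for standardized items, where theta_i = 1 - lambda_i^2."""
    uniq = [1.0 - lam ** 2 for lam in loadings]
    return coefficient_alpha(implied_covariance(loadings, uniq)), coefficient_omega(loadings, uniq)


if __name__ == "__main__":
    tau_equivalent = [0.6, 0.6, 0.6, 0.6]
    congeneric = [0.3, 0.5, 0.7, 0.9]
    for label, lams in [("tau-equivalent", tau_equivalent), ("congeneric", congeneric)]:
        a, w = alpha_and_omega(lams)
        print(f"{label:15s} alpha={a:.3f} omega={w:.3f}")
```

With these illustrative loadings, both coefficients are about 0.692 for the tau-equivalent set, whereas the congeneric set gives alpha of about 0.677 against omega of about 0.709.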
Indices of cumulative risk (CR) have long been used in developmental research to encode the number of risk factors a child or adolescent experiences that may impede optimal developmental outcomes. Initial contributions concentrated on indices of cumulative environmental risk; more recently, indices of cumulative genetic risk have been employed. In this article, regression analytic methods are proposed for rigorously interrogating the validity of risk indices by testing the optimality of compositing weights, enabling more informative modeling of the effects of CR indices. Reanalyses of data from two studies are reported. One study involved 10 environmental risk factors predicting Verbal IQ in 215 four-year-old children. The second study included an index of genetic CR in a G×E interaction investigation of 281 target participants assessed at age 15 years and again at age 31 years for observed hostility during videotaped interactions with close family relations. Principles to guide the evaluation of statistical modeling results are presented, and implications of the results for research and theory are discussed. The ultimate goals of this paper are to develop stronger tests of conjectures involving CR indices and to promote methods for improving the replicability of results across studies.
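One common way to probe whether a unit-weighted CR count is an optimal composite is a nested-model comparison. The outcome is regressed on the count, which forces equal weights, and separately on the individual risk factors, which leaves the weights free; an F test then evaluates the loss of fit from the equal-weight constraint. The sketch below is a generic illustration of that idea under our own assumptions (simulated binary risk factors, a normally distributed outcome, pure-Python OLS). It is not claimed to reproduce the article's exact procedure, and `simulate` and `equal_weights_f_test` are hypothetical names.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares from OLS, solving the normal equations by Gauss-Jordan elimination.

    X is a list of rows that already includes a leading intercept column of 1.0.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def equal_weights_f_test(risks, y):
    """F test of H0: all risk factors share one weight (the unit-weighted CR count suffices)."""
    n, k = len(y), len(risks[0])
    rss_free = ols_rss([[1.0] + [float(r) for r in row] for row in risks], y)
    rss_count = ols_rss([[1.0, float(sum(row))] for row in risks], y)
    df1, df2 = k - 1, n - k - 1  # k free slopes vs. one common slope
    f_stat = ((rss_count - rss_free) / df1) / (rss_free / df2)
    return f_stat, df1, df2, rss_count, rss_free


def simulate(weights, n=2000, p_risk=0.3, seed=1):
    """Hypothetical data: binary risk factors that lower a continuous outcome by the given weights."""
    rng = random.Random(seed)
    risks = [[1 if rng.random() < p_risk else 0 for _ in weights] for _ in range(n)]
    y = [-sum(w * r for w, r in zip(weights, row)) + rng.gauss(0.0, 1.0) for row in risks]
    return risks, y


if __name__ == "__main__":
    for label, w in [("equal weights", [0.4, 0.4, 0.4]), ("unequal weights", [0.8, 0.1, 0.1])]:
        f_stat, df1, df2, _, _ = equal_weights_f_test(*simulate(w))
        print(f"{label:16s} F({df1}, {df2}) = {f_stat:.2f}")
```

A large F suggests that the equal weights implied by a simple count are not optimal for the outcome at hand. A p-value can be obtained from the F(df1, df2) distribution, for example with `scipy.stats.f.sf` if SciPy is available.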
We examined the life-span development of self-esteem and tested whether self-esteem influences the development of important life outcomes, including relationship satisfaction, job satisfaction, occupational status, salary, positive and negative affect, depression, and physical health. Data came from the Longitudinal Study of Generations. Analyses were based on 5 assessments across a 12-year period of a sample of 1,824 individuals aged 16 to 97 years. First, growth curve analyses indicated that self-esteem increases from adolescence to middle adulthood, reaches a peak at about age 50 years, and then decreases in old age. Second, cross-lagged regression analyses indicated that self-esteem is best modeled as a cause rather than a consequence of life outcomes. Third, growth curve analyses, with self-esteem as a time-varying covariate, suggested that self-esteem has medium-sized effects on life-span trajectories of affect and depression, small to medium-sized effects on trajectories of relationship and job satisfaction, a very small effect on the trajectory of health, and no effect on the trajectory of occupational status. These findings replicated across 4 generations of participants: children, parents, grandparents, and their great-grandparents. Together, the results suggest that self-esteem has a significant prospective impact on real-world life experiences and that high and low self-esteem are not mere epiphenomena of success and failure in important life domains.
Common factor analysis (FA) and principal component analysis (PCA) are widely used to obtain lower-dimensional representations of matrices of correlations among manifest variables. Whereas some experts argue that differences between results from FA and PCA are small and relatively unimportant in empirical studies, the fundamental rationales for the two methods are very different. Here, FA and PCA are contrasted on four key issues: the range of possible dimensional loadings, the range of potential correlations among dimensions, the structure of residual covariances and correlations, and the relation between population parameters and the correlational structures with which they are associated. For decades, experts have emphasized indeterminacies of the FA model, particularly the indeterminacy of common factor scores. Although PCA is determinate in most respects, a heretofore unacknowledged, pernicious indeterminacy of PCA is demonstrated here: the indeterminacy between PCA structural representations and the correlational structures from which they are derived. Researchers are often advised to use either FA or PCA in exploratory rounds of data analysis to understand and refine the dimensional structure of a domain before moving to structural equation modeling in later theory-testing, confirmatory, and replication studies. Results from the current study suggest that PCA is an unreliable method for such purposes and may lead to serious misrepresentation of the structure of a domain. Hence, PCA should never be used if the goal is to understand and represent the latent structure of a domain; only FA techniques should be used for this purpose, as only FA provides reliable structural representations as the basis for confirmatory tests in future studies.
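One of the contrasts listed above, the range of possible loadings, can be seen analytically. If p standardized variables all load λ on a single common factor, every off-diagonal population correlation equals λ², the largest eigenvalue is 1 + (p - 1)λ², and the first principal component loading of every variable is sqrt((1 + (p - 1)λ²) / p). That value exceeds λ whenever λ < 1 and changes with p, whereas the FA loading stays at λ. The sketch below checks this numerically with power iteration; the example values are hypothetical, and it does not reproduce the article's indeterminacy demonstration.

```python
import math


def one_factor_corr(p, lam):
    """Population correlation matrix for p variables with equal common-factor loading lam."""
    return [[1.0 if i == j else lam * lam for j in range(p)] for i in range(p)]


def first_pc_loadings(R, iters=1000):
    """First principal component of R by power iteration; loadings = sqrt(eigenvalue) * eigenvector."""
    p = len(R)
    v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-degenerate starting vector
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Rv = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, Rv))  # Rayleigh quotient of the unit vector v
    return eigenvalue, [math.sqrt(eigenvalue) * x for x in v]


if __name__ == "__main__":
    lam = 0.5
    for p in (4, 20):
        eig, loadings = first_pc_loadings(one_factor_corr(p, lam))
        closed_form = math.sqrt((1 + (p - 1) * lam ** 2) / p)
        print(f"p={p:2d}: FA loading={lam:.3f}, PCA loading={loadings[0]:.3f} "
              f"(closed form {closed_form:.3f})")
```

With λ = 0.5, the PCA loadings are about 0.661 for p = 4 and 0.536 for p = 20, while the FA loading is 0.500 in both cases. The component loadings therefore depend on how many variables happen to be analyzed.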
The import or force of the result of a statistical test has long been portrayed as consistent with deductive reasoning. The simplest form of deductive argument has a first premise of conditional form, such as p→q, which means that “if p is true, then q must be true.” Given the first premise, one can either affirm or deny the antecedent clause (p) or affirm or deny the consequent clause (q). This leads to four forms of deductive argument, two of which are valid forms of reasoning and two of which are invalid. The typical conclusion is that only a single form of argument, denying the consequent (also known as modus tollens), is a reasonable analog of decisions based on statistical hypothesis testing. Statistical evidence, however, is never certain but is associated with a probability (i.e., a p-level). Some have argued that modus tollens, when probabilified, loses its force and leads to ridiculous, nonsensical conclusions. Their argument is based on a specious problem setup. This note is intended to correct this error and restore the position of modus tollens as a valid form of deductive inference in statistical matters, even when it is probabilified.
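The classical part of this argument, that two of the four forms are valid and two are invalid, can be checked mechanically by enumerating truth assignments: a form is valid when its conclusion is true in every row where both premises are true. The sketch below is our own illustration; it covers only the certain (non-probabilistic) case and does not model the probabilified argument discussed in the note.

```python
from itertools import product


def implies(p, q):
    """Material conditional p -> q."""
    return (not p) or q


# Each form: (second premise, conclusion), both as functions of the truth values (p, q).
# The first premise is always the conditional p -> q.
FORMS = {
    "affirming the antecedent (modus ponens)": (lambda p, q: p, lambda p, q: q),
    "denying the consequent (modus tollens)": (lambda p, q: not q, lambda p, q: not p),
    "affirming the consequent": (lambda p, q: q, lambda p, q: p),
    "denying the antecedent": (lambda p, q: not p, lambda p, q: not q),
}


def is_valid(second_premise, conclusion):
    """Valid iff the conclusion holds in every truth assignment where both premises hold."""
    return all(
        conclusion(p, q)
        for p, q in product([True, False], repeat=2)
        if implies(p, q) and second_premise(p, q)
    )


if __name__ == "__main__":
    for name, (premise, conclusion) in FORMS.items():
        print(f"{name:42s} {'valid' if is_valid(premise, conclusion) else 'invalid'}")
```

Running it labels modus ponens and modus tollens valid and the other two forms invalid.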
The relative advantages and disadvantages of sum scores and estimated factor scores are issues of concern for substantive research in psychology. Recently, while championing estimated factor scores over sum scores, McNeish offered a trenchant rejoinder to an article by Widaman and Revelle, which had critiqued an earlier paper by McNeish and Wolf. In that contribution, McNeish misrepresented a number of claims by Widaman and Revelle, rendering his criticisms of Widaman and Revelle moot. Notably, McNeish chose not to confront a key strength of sum scores stressed by Widaman and Revelle: the greater comparability of results across studies when sum scores are used. Instead, McNeish pivoted to present a host of simulation studies to identify relative strengths of estimated factor scores. Here, we review our prior claims and, in the process, deflect purported criticisms by McNeish. We briefly discuss issues related to simulated and empirical data that provide evidence of the strengths of each type of score. In doing so, we identify a second strength of sum scores: superior cross-validation of results across independent samples of empirical data, at least for samples of moderate size. We close with consideration of four general issues concerning sum scores and estimated factor scores that highlight the contrasts between the positions offered by McNeish and by us, issues of importance when pursuing applied research in our field.
Millions of children worldwide experience acute medical events. Children’s responses to these events range from transient distress to significant posttraumatic stress disorder symptoms (PTSS). While many models suggest explanations for the development and maintenance of PTSS in adults, very few have focused on children. Current models of child PTSS are primarily restricted to the post-trauma period, thus neglecting the critical peri-trauma period, when screening and preventive interventions may be most easily implemented. Research on PTSS in response to pediatric medical trauma typically examines predictors in isolation, often overlooking potentially important interactions. This paper proposes a new model that uses the bio-psycho-social framework and focuses on peri-trauma processes in acute medical events. Understanding the relationships among bio-psycho-social factors during the peri-trauma period can inform early identification of at-risk children, preventive interventions, and clinical care. Recommendations for future research, including the need to examine PTSS in the context of multiple influences, are discussed.
Background
Most gene‐environment interaction (GXE) research, though based on clear, vulnerability‐oriented hypotheses, is carried out using exploratory rather than hypothesis‐informed statistical tests, limiting power and making formal evaluation of competing GXE propositions difficult.
Method
We present and illustrate a new regression technique which affords direct testing of theory‐derived predictions, as well as competitive evaluation of alternative diathesis‐stress and differential‐susceptibility propositions, using data on the moderating effect of DRD4 with regard to the effect of childcare quality on children's social functioning.
Results
Results show that (a) the new approach detects interactions that the traditional one does not; (b) the discerned GXE fit the differential‐susceptibility model better than the diathesis‐stress one; and (c) a strong rather than weak version of differential susceptibility is empirically supported.
Conclusion
The new method better fits the theoretical ‘glove’ to the empirical ‘hand,’ raising the prospect that some failures to replicate GXE results may derive from standard statistical approaches being less than ideal.
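A simple way to see how the competing hypotheses differ is to locate where the two genotype groups' regression lines cross. For a standard interaction model Y = b0 + b1·X + b2·G + b3·X·G with G coded 0/1, the lines cross at X = -b2/b3. A crossover inside the observed range of the environment X indicates a disordinal interaction of the kind differential susceptibility predicts, whereas a crossover outside that range indicates an ordinal pattern more in line with diathesis-stress. The sketch below uses hypothetical coefficients and point estimates only; it is not the article's technique and provides no standard error or formal test for the crossover point.

```python
def crossover_point(b2, b3):
    """X at which the G=0 and G=1 lines cross in Y = b0 + b1*X + b2*G + b3*X*G.

    G=0 line: b0 + b1*X;  G=1 line: (b0 + b2) + (b1 + b3)*X.  They meet where b2 + b3*X = 0.
    """
    if b3 == 0:
        raise ValueError("b3 = 0: no interaction, so the lines are parallel")
    return -b2 / b3


def classify_interaction(b1, b2, b3, x_min, x_max):
    """Label the interaction pattern relative to the observed range [x_min, x_max] of X."""
    c = crossover_point(b2, b3)
    if x_min <= c <= x_max:
        pattern = "disordinal (crossover within observed range)"
    else:
        pattern = "ordinal (crossover outside observed range)"
    # Simple slopes of the outcome on the environment within each genotype group
    return c, pattern, {"slope_G0": b1, "slope_G1": b1 + b3}


if __name__ == "__main__":
    # Hypothetical coefficients, environment observed on [0, 4]
    print(classify_interaction(0.0, -1.0, 0.5, 0.0, 4.0))
    print(classify_interaction(0.1, -1.0, 0.2, 0.0, 4.0))
```

Reporting each group's simple slope alongside the crossover shows, for instance, whether the less susceptible group is essentially unaffected by the environment.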
In this study, the authors consider several indices to indicate whether multidimensional data are “unidimensional enough” to fit with a unidimensional measurement model, especially when the goal is to avoid excessive bias in structural parameter estimates. They examine two factor strength indices (explained common variance and omega hierarchical) and several model fit indices (root mean square error of approximation, comparative fit index, and standardized root mean square residual). These statistics are compared in population correlation matrices determined by known bifactor structures that vary in the (a) relative strength of general and group factor loadings, (b) number of group factors, and (c) number of items or indicators. When fit with a unidimensional measurement model, the degree of structural coefficient bias depends strongly and inversely on explained common variance, but this effect is moderated by the percentage of correlations uncontaminated by multidimensionality, a statistic that rises combinatorially with the number of group factors. When the percentage of uncontaminated correlations is high, structural coefficients are relatively unbiased even when general factor strength is low relative to group factor strength. On the other hand, popular structural equation modeling fit indices such as the comparative fit index or the standardized root mean square residual routinely reject unidimensional measurement models even in contexts in which structural coefficient bias is low. In general, such statistics cannot be used to predict the magnitude of structural coefficient bias.
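The factor strength quantities used here can be computed directly from a bifactor loading pattern in which each item loads on the general factor and on exactly one group factor. Assuming standardized items and orthogonal factors, ECV = Σg² / (Σg² + Σs²); omega hierarchical = (Σg)² / Var(sum score), where Var(sum score) = (Σg)² + Σ_k (Σ_{i in k} s_i)² + Σθ; and the percentage of uncontaminated correlations (PUC) is the share of item pairs that do not share a group factor. The sketch below uses hypothetical loadings; `bifactor_indices` and `puc` are illustrative helpers, not a published API.

```python
from collections import Counter


def puc(groups):
    """Percentage of uncontaminated correlations: share of item pairs on different group factors."""
    n = len(groups)
    total_pairs = n * (n - 1) // 2
    within_pairs = sum(c * (c - 1) // 2 for c in Counter(groups).values())
    return (total_pairs - within_pairs) / total_pairs


def bifactor_indices(general, specific, groups):
    """ECV, omega hierarchical, and PUC for a bifactor model (standardized items, orthogonal factors).

    general[i]:  loading of item i on the general factor
    specific[i]: loading of item i on its group factor, whose label is groups[i]
    """
    g2 = sum(g * g for g in general)
    s2 = sum(s * s for s in specific)
    ecv = g2 / (g2 + s2)

    group_sums = Counter()
    for s, k in zip(specific, groups):
        group_sums[k] += s  # sum of group-factor loadings within each group
    uniqueness = sum(1.0 - g * g - s * s for g, s in zip(general, specific))
    general_var = sum(general) ** 2
    sum_score_var = general_var + sum(v * v for v in group_sums.values()) + uniqueness
    omega_h = general_var / sum_score_var
    return ecv, omega_h, puc(groups)


if __name__ == "__main__":
    groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
    ecv, omega_h, p = bifactor_indices([0.6] * 9, [0.4] * 9, groups)
    print(f"ECV={ecv:.3f}, omegaH={omega_h:.3f}, PUC={p:.3f}")
    # The same 12 items split among more group factors: PUC rises
    for k in (2, 3, 4, 6):
        print(f"{k} group factors: PUC={puc([i % k for i in range(12)]):.3f}")
```

In this hypothetical 12-item example, PUC climbs from about 0.545 with two group factors to about 0.909 with six, consistent with the point that PUC rises with the number of group factors.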