Abstract
Empirical studies in psychology commonly report Cronbach's alpha as a measure of internal consistency reliability despite the fact that many methodological studies have shown that Cronbach's alpha is riddled with problems stemming from unrealistic assumptions. In many circumstances, violating these assumptions yields estimates of reliability that are too small, making measures look less reliable than they actually are. Although methodological critiques of Cronbach's alpha are being cited with increasing frequency in empirical studies, in this tutorial we discuss how the trend is not necessarily improving the methodology used in the literature. That is, many studies continue to use Cronbach's alpha without regard for its assumptions or merely cite methodological articles advising against its use to rationalize unfavorable Cronbach's alpha estimates. This tutorial first provides evidence that recommendations against Cronbach's alpha have not appreciably changed how empirical studies report reliability. Then, we summarize the drawbacks of Cronbach's alpha conceptually, without relying on mathematical or simulation-based arguments, so that these arguments are accessible to a broad audience. We continue by discussing several alternative measures that make less rigid assumptions and therefore provide justifiably higher estimates of reliability compared with Cronbach's alpha. We conclude with empirical examples illustrating the advantages of alternative measures of reliability, including omega total, Revelle's omega total, the greatest lower bound, and coefficient H. A detailed software appendix is also provided to help researchers implement alternative methods.
Translational Abstract
Scales are commonly used in psychological research to measure directly unobservable constructs like motivation or depression. These scales are composed of multiple items, each aiming to provide information about various aspects of the construct of interest. Whenever a scale is used in a psychological study, it is important to report on its reliability. Since the 1950s, the primary method for capturing reliability has been Cronbach's alpha, a method whose status is perhaps best exemplified by its place as one of the most cited scientific articles of all time, in any field. Despite its overwhelming popularity, the underlying assumptions of Cronbach's alpha have recently been questioned in the statistical literature because these assumptions were commonplace 65 years ago but have largely disappeared from more modern statistical methods for constructing scales. Though the ideas in these statistical articles have the potential to significantly alter how psychological research is conducted and reported, recommendations from the statistical literature have yet to permeate the psychological literature. In this article, the goal is to demonstrate why Cronbach's alpha is no longer the optimal method for reporting on reliability. To differentiate this article from articles appearing in the statistical literature, we approach the issues with Cronbach's alpha with very little focus on mathematical or computational detail: the deficiencies of Cronbach's alpha are illustrated in words and examples rather than proofs and simulations, so that these ideas can reach a larger group of researchers, namely the researchers who most often report Cronbach's alpha.
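For readers who want to see the computations behind this comparison, the following minimal Python sketch shows how coefficient alpha is computed from raw item scores and how omega total is computed from standardized loadings under a one-factor model. The simulated data and the loading values are hypothetical illustrations constructed for this listing, not the article's software appendix.

import numpy as np

def cronbach_alpha(scores):
    # scores: respondents x items matrix of item scores
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def omega_total_one_factor(loadings):
    # omega total for a single-factor model with standardized loadings
    lam = np.asarray(loadings, dtype=float)
    uniqueness = 1.0 - lam ** 2                  # residual (unique) variances
    return lam.sum() ** 2 / (lam.sum() ** 2 + uniqueness.sum())

# hypothetical congeneric items: one trait, deliberately unequal error variances
rng = np.random.default_rng(1)
trait = rng.normal(size=500)
items = np.column_stack([trait + rng.normal(scale=s, size=500)
                         for s in (0.6, 0.8, 1.0, 1.2)])
print(round(cronbach_alpha(items), 3))
print(round(omega_total_one_factor([0.86, 0.78, 0.71, 0.64]), 3))   # hypothetical loadings

In real analyses the loadings would be estimated with a factor-analysis or SEM package rather than supplied by hand; the sketch only makes the two formulas concrete.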
2.
Single Item Measures in Psychological Science Allen, Mark S.; Iliescu, Dragos; Greiff, Samuel
European journal of psychological assessment : official organ of the European Association of Psychological Assessment, 2022, Volume: 38, Issue: 1
Journal Article
Peer reviewed
Single-item measures have a bad reputation. For a long time, adopting single-item measures was considered one of the surest methods of receiving a letter of rejection from journal editors (Wanous et al., 1997). As one research team noted, “it is virtually impossible to get a journal article accepted ... unless it includes multiple-item measures of the main constructs” (Bergkvist & Rossiter, 2007, p. 175). However, a series of articles published in the late 1990s and 2000s began to challenge the conventional view that single-item measures are an unsound approach to measuring cognitive and affective outcomes (Bergkvist & Rossiter, 2007; Fuchs & Diamantopoulos, 2009; Jordan & Turner, 2008; Loo, 2002; Nagy, 2002; Wanous et al., 1997). These articles did much to alleviate the stigma surrounding single-item measures, but even today, many researchers remain unconvinced that single-item measures can provide valid and reliable assessments of important psychological phenomena. Of course, there are many instances in which single-item measures would be a poor choice – for example, in research aiming to capture the breadth of human personality or emotion. However, when a construct is unambiguous or narrow in scope, the use of single items can be appropriate and should not necessarily be considered unsound (Wanous et al., 1997). The last few decades have seen a marked increase in the use of large national-level panel data in psychological research. Given the considerable volume of data and the diversity of constructs included in these panel surveys, it is often necessary to measure psychological constructs using just a few or even only one item. For example, the Household, Income and Labour Dynamics in Australia Survey (HILDA; Watson & Wooden, 2021) assesses body weight satisfaction using the single item “How satisfied are you with your current weight?” with response categories of 1 (= very satisfied), 2 (= satisfied), 3 (= neither satisfied nor dissatisfied), 4 (= dissatisfied), and 5 (= very dissatisfied). Although there are multi-item measures of body satisfaction available, at face value, there is no reason to think that this single item does not adequately capture a person’s general satisfaction with their body weight. The increasing use of large panel surveys in psychological research means that now more than ever, it is essential to ensure that single-item measures are valid and reliable. (PsycInfo Database Record (c) 2022 APA, all rights reserved)
Cronbach’s alpha is a statistic commonly quoted by authors to demonstrate that tests and scales that have been constructed or adopted for research projects are fit for purpose. Cronbach’s alpha is regularly adopted in studies in science education: it was referred to in 69 different papers published in 4 leading science education journals in a single year (2015)—usually as a measure of reliability. This article explores how this statistic is used in reporting science education research and what it represents. Authors often cite alpha values with little commentary to explain why they feel this statistic is relevant and seldom interpret the result for readers beyond citing an arbitrary threshold for an “acceptable” value. Those authors who do offer readers qualitative descriptors interpreting alpha values adopt a diverse and seemingly arbitrary terminology. More seriously, illustrative examples from the science education literature demonstrate that alpha may be “acceptable” even when there are recognised problems with the scales concerned. Alpha is also sometimes inappropriately used to claim an instrument is unidimensional. It is argued that a high value of alpha offers limited evidence of the “reliability” of a research instrument, and that indeed a very high value may actually be undesirable when developing a test of scientific knowledge or understanding. Guidance is offered to authors reporting, and readers evaluating, studies that present Cronbach’s alpha statistic as evidence of instrument quality.
The reliable change index (RCI) is a widely used statistical tool to account for measurement error when evaluating difference scores. However, there is considerable debate regarding its use. Several researchers have demonstrated ways that the RCI is insufficient or invalid, and others have defended its use for various applications. The aims of this article are to describe the formulation, rationale, and operationalization of the RCI, and critically evaluate whether it is appropriate when using self-report data, especially in clinical psychology. This evaluation finds that the RCI is rarely the best available method; is easily miscalculated, misinterpreted, and misunderstood; and produces incorrect inferences more often than alternatives, largely because it is highly insensitive to real changes. It is argued that the RCI effectively discourages the collection of appropriate data for longitudinal analysis which would benefit from more than two observations, and many applications of the RCI are inaccurate because they use inappropriate estimates of reliability. Better approaches to determining the reliability of changes are required to meet clinical needs and operationalize research questions. Several alternative methods to conceptualize and operationalize reliability of change and treatment outcomes are presented. While the RCI is easy to use, it is also easy to misuse and it fails to address the central issue: two observations of a noisy measure are insufficient data to estimate change and error. (PsycInfo Database Record (c) 2024 APA, all rights reserved) (Source: journal abstract)
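For reference, the standard Jacobson–Truax formulation of the RCI that the article critiques can be written in a few lines; the scale values below are hypothetical and only make the formula concrete.

import math

def reliable_change_index(pre, post, sd_pre, reliability):
    # SEM = SD * sqrt(1 - r); S_diff = sqrt(2 * SEM^2); RCI = (post - pre) / S_diff
    sem = sd_pre * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2 * sem ** 2)
    return (post - pre) / s_diff

# hypothetical example: a 6-point drop on a scale with SD = 10 and reliability .80
rci = reliable_change_index(pre=24, post=18, sd_pre=10.0, reliability=0.80)
print(round(rci, 2))   # about -0.95; |RCI| < 1.96, so the drop is not flagged as reliable change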
5.
Reliability From α to ω: A Tutorial Revelle, William; Condon, David M.
Psychological assessment, 12/2019, Volume: 31, Issue: 12
Journal Article
Peer reviewed
Open access
Reliability is a fundamental problem for measurement in all of science. Although defined in multiple ways, and estimated in even more ways, the basic concepts seem straightforward and need to be understood by practitioners as well as methodologists. Reliability theory is not just for the psychometrician estimating latent variables, it is for everyone who wants to make inferences from measures of individuals or of groups. For the case of a single test administration, we consider multiple measures of reliability, ranging from the worst (β) to average (α, λ3) to best (λ4) split-half reliabilities, and consider why model-based estimates (ωh, ωt) should be reported. We also address the utility of test-retest and alternate-form reliabilities. The advantages of immediate versus delayed retests to decompose observed score variance into specific, state, and trait scores are discussed. But reliability is not just for test scores; it is also important when evaluating the use of ratings. Estimates that may be applied to continuous data include a set of intraclass correlations, while discrete categorical data need to take advantage of the family of κ statistics. Examples of these various reliability estimates are given using state and trait measures of anxiety given with different delays and under different conditions. Online supplemental materials are provided with more detail and elaboration; they are also used to demonstrate applications of open-source software to examples of real data, with comparisons made between the many types of reliability.
Public Significance Statement
A tutorial on the estimation of the reliability of test scores considers classical and model-based approaches. Examples using open-source software applied to several real-world data sets are provided.
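As a concrete illustration of the split-half ordering described in the abstract (worst, average, and best splits), the following Python sketch enumerates every equal split of a small simulated item set and reports the Flanagan–Rulon coefficient for each: the worst split corresponds roughly to β, the mean is close to α/λ3, and the best approximates λ4. The tutorial's own examples use the open-source psych package in R; this sketch is only a schematic stand-in on hypothetical data.

from itertools import combinations
import numpy as np

def split_half_coefficients(scores):
    # Flanagan-Rulon split-half coefficient for every equal split of the items
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    total_var = scores.sum(axis=1).var(ddof=1)
    coefs = []
    for half in combinations(range(k), k // 2):
        other = [i for i in range(k) if i not in half]
        va = scores[:, list(half)].sum(axis=1).var(ddof=1)
        vb = scores[:, other].sum(axis=1).var(ddof=1)
        coefs.append(2 * (1 - (va + vb) / total_var))
    return np.array(coefs)

# hypothetical unidimensional data for illustration
rng = np.random.default_rng(7)
trait = rng.normal(size=400)
items = np.column_stack([trait + rng.normal(size=400) for _ in range(6)])
sh = split_half_coefficients(items)
print(round(sh.min(), 2), round(sh.mean(), 2), round(sh.max(), 2))   # worst, average, best split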
Coefficient α, although ubiquitous in the research literature, is frequently criticized for being a poor estimate of test reliability. In this note, we consider the range of α and prove that it has no lower bound (i.e., α ∈ (−∞, 1]). While outlining our proofs, we present algorithms for generating data sets that will yield any fixed value of α in its range. We also prove that for some data sets, even those with appreciable item correlations, α is undefined. Although α is a putative estimate of the correlation between parallel forms, it is not a correlation, as α can assume any value below −1 (and α values below 0 are nonsensical reliability estimates). In the online supplemental materials, we provide R code for replicating our empirical findings and for generating data sets with user-defined α values. We hope that researchers will use this code to better understand the limitations of α as an index of scale reliability. (PsycInfo Database Record (c) 2023 APA, all rights reserved)
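The two claims in this abstract (no lower bound; undefined for some data sets) are easy to verify numerically with a two-item covariance matrix. The sketch below is an illustration constructed for this listing, not the authors' supplemental R code.

import numpy as np

def alpha_from_cov(cov):
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    return (k / (k - 1)) * (1 - np.trace(cov) / cov.sum())

# two unit-variance items whose covariance c approaches -1
for c in (-0.5, -0.9, -0.99, -0.999):
    print(c, round(alpha_from_cov([[1.0, c], [c, 1.0]]), 1))
# prints -2.0, -18.0, -198.0, -1998.0: alpha falls without bound
# at c = -1 the total-score variance (cov.sum()) is 0, so alpha is undefined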
Cognitive tasks that produce reliable and robust effects at the group level often fail to yield reliable and valid individual differences. An ongoing debate among attention researchers is whether conflict resolution mechanisms are task-specific or domain-general, and the lack of correlation between most attention measures seems to favor the view that attention control is not a unitary concept. We have argued that the use of difference scores, particularly in reaction time (RT), is the primary cause of null and conflicting results at the individual differences level, and that methodological issues with existing tasks preclude making strong theoretical conclusions. The present article is an empirical test of this view in which we used a toolbox approach to develop and validate new tasks hypothesized to reflect attention processes. Here, we administered existing, modified, and new attention tasks to over 400 participants (final N = 396). Compared with the traditional Stroop and flanker tasks, performance on the accuracy-based measures was more reliable, had stronger intercorrelations, formed a more coherent latent factor, and had stronger associations to measures of working memory capacity and fluid intelligence. Further, attention control fully accounted for the relationship between working memory capacity and fluid intelligence. These results show that accuracy-based measures can be better suited to individual differences investigations than traditional RT tasks, particularly when the goal is to maximize prediction. We conclude that attention control is a unitary concept.
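The argument about difference scores rests on a standard classical-test-theory result: the reliability of a difference shrinks as its two components become more highly correlated. The sketch below illustrates that result with hypothetical values; it is not an analysis from the article.

def difference_score_reliability(r_xx, r_xy):
    # reliability of X - Y when X and Y have equal variances and equal reliability r_xx,
    # and correlate r_xy with each other: (r_xx - r_xy) / (1 - r_xy)
    return (r_xx - r_xy) / (1 - r_xy)

# hypothetical values: highly reliable components, increasingly correlated conditions
for r_xy in (0.3, 0.6, 0.8):
    print(r_xy, round(difference_score_reliability(r_xx=0.85, r_xy=r_xy), 2))
# reliable components can still yield a poorly reliable difference score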
The psychometric soundness of measures has been a central concern of articles published in the Journal of Applied Psychology (JAP) since the inception of the journal. At the same time, it isn’t clear that investigators and reviewers prioritize psychometric soundness to a degree that would allow one to have sufficient confidence in conclusions regarding constructs. The purposes of the present article are to (a) examine current scale development and evaluation practices in JAP; (b) compare these practices to recommended practices, previous practices, and practices in other journals; and (c) use these comparisons to make recommendations for reviewers, editors, and investigators regarding the creation and evaluation of measures including Excel-based calculators for various indices. Finally, given that model complexity appears to have increased the need for short scales, we offer a user-friendly R Shiny app (https://orgscience.uncc.edu/about-us/resources) that identifies the subset of items that maximize a variety of psychometric criteria rather than merely maximizing alpha. (PsycInfo Database Record (c) 2022 APA, all rights reserved) (Source: journal abstract)
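The item-subset idea behind the authors' app can be sketched generically: enumerate candidate short forms and keep the one that maximizes a chosen criterion. The Python sketch below is only a schematic illustration with coefficient alpha standing in for the criterion; it is not the authors' R Shiny app, which optimizes several psychometric criteria rather than alpha alone, and the data are hypothetical.

from itertools import combinations
import numpy as np

def cronbach_alpha(scores):
    k = scores.shape[1]
    return (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                            / scores.sum(axis=1).var(ddof=1))

def best_subset(scores, subset_size, criterion=cronbach_alpha):
    # exhaustive search: return the item indices whose short form maximizes the criterion
    best_items, best_value = None, -np.inf
    for items in combinations(range(scores.shape[1]), subset_size):
        value = criterion(scores[:, list(items)])
        if value > best_value:
            best_items, best_value = items, value
    return best_items, best_value

# hypothetical six-item scale with items of decreasing quality
rng = np.random.default_rng(3)
trait = rng.normal(size=300)
full_scale = np.column_stack([trait + rng.normal(scale=s, size=300)
                              for s in (0.5, 0.7, 0.9, 1.1, 1.3, 1.5)])
print(best_subset(full_scale, subset_size=3))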
We introduce and investigate the philosophical concept of “speciesism” (the assignment of different moral worth based on species membership) as a psychological construct. In five studies, using both general population samples online and student samples, we show that speciesism is a measurable, stable construct with high interpersonal differences, that goes along with a cluster of other forms of prejudice, and is able to predict real-world decision-making and behavior. In Study 1 we present the development and empirical validation of a theoretically driven Speciesism Scale, which captures individual differences in speciesist attitudes. In Study 2, we show high test-retest reliability of the scale over a period of four weeks, suggesting that speciesism is stable over time. In Study 3, we present positive correlations between speciesism and prejudicial attitudes such as racism, sexism, and homophobia, along with ideological constructs associated with prejudice such as social dominance orientation, system justification, and right-wing authoritarianism. These results suggest that similar mechanisms might underlie both speciesism and other well-researched forms of prejudice. Finally, in Studies 4 and 5, we demonstrate that speciesism is able to predict prosociality towards animals (both in the context of charitable donations and time investment) and behavioral food choices above and beyond existing related constructs. Importantly, our studies show that people morally value individuals of certain species less than others even when beliefs about intelligence and sentience are accounted for. We conclude by discussing the implications of a psychological study of speciesism for the psychology of human-animal relationships.
Propriedades psicométricas da Escala de Conexão Social [Psychometric Properties of the Social Connectedness Scale] Silva Soares, Ana Karla; Ferreira Goedert, Maria Celina; Campanhã, Camila ...
Revista interamericana de psicología, 10/2023, Volume: 57, Issue: 2
Journal Article
Peer reviewed
Perceived social connectedness (SC) is a unidimensional construct conceptualized as the cognitive appraisal that one has close relationships with others with whom contact can be made. This study aimed to adapt the Social Connectedness Scale (Escala de Conexão Social, ECS) to the Brazilian context, gathering evidence of its psychometric adequacy. Two studies were conducted: the first (N = 285; mean age 24 years; SD = 4.92; 62% male) to obtain the exploratory factor structure, and the second (N = 300; mean age 23 years; SD = 5.43; 51% male) directed at confirmatory analysis of unidimensionality and of item response theory parameters. Internal consistency was satisfactory, as was the evidence of convergent validity obtained from negative correlations with the depression, anxiety, and stress scale. The results show that the Portuguese-language version of the ECS gathered adequate psychometric evidence supporting its use in the measurement of SC.