A norm-referenced score expresses the position of an individual test taker in the reference population, thereby enabling a proper interpretation of the test score. Such normed scores are derived from test scores obtained from a sample of the reference population. Typically, multiple reference populations exist for a test, namely when the norm-referenced scores depend on individual characteristic(s), such as age (and sex). To derive normed scores, regression-based norming has gained great popularity. The advantages of this method over traditional norming are its flexible nature, yielding potentially more realistic norms, and its efficiency, requiring potentially smaller sample sizes to achieve the same precision. In this tutorial, we introduce the reader to regression-based norming, using the generalized additive models for location, scale, and shape (GAMLSS). This approach has been useful in norm estimation of various psychological tests. We discuss the rationale of regression-based norming, theoretical properties of GAMLSS and their relationships to other regression-based norming models. Based on six steps, we describe how to: (a) design a normative study to gather proper normative sample data; (b) select a proper GAMLSS model for an empirical scale; (c) derive the desired normed scores for the scale from the fitted model, including those for a composite scale; and (d) visualize the results to achieve insight into the properties of the scale. Following these steps yields regression-based norms with GAMLSS for a psychological test, as we illustrate with normative data of the intelligence test IDS-2. The complete R code and data set are provided as online supplemental material.
Translational Abstract
Standardized psychological tests are widely used. Examples include intelligence, developmental, and neuropsychological tests. They are used for purposes such as monitoring, selection, and diagnosing individuals. High-quality standardized tests have normed scores, like the well-known IQ scores for intelligence tests. Normed scores allow for properly interpreting an individual's test score. They are derived in the test construction phase, based on scores in a large normative sample. Normed scores express the position of an individual test taker in the reference population. The reference population for a test typically depends on individual characteristic(s), like age and possibly sex. This tutorial introduces the reader to a method to compute normed scores that depend on individual characteristic(s), making optimal use of all background knowledge and the scores in the whole normative sample. Therefore, the method yields potentially more realistic and more precise norms than traditional methods, using the same amount of data. This is an important asset, because gathering sufficient data is difficult and costly. In this tutorial, we explain the technical background of the method, called regression-based norming with the generalized additive models for location, scale, and shape (GAMLSS), and explain how to apply it based on six steps. Following these steps yields regression-based norms with GAMLSS for a psychological test, as we illustrate with normative data of the intelligence test IDS-2. The complete R code and data set are provided as online supplemental material, so that test developers can apply the method to derive high-quality norms for their own test.
We investigated whether the accuracy of normed test scores derived from non-demographically representative samples can be improved by combining continuous norming methods with compensatory weighting of test results. To this end, we introduce Raking, a method from social sciences, to psychometrics. In a simulated reference population, we modeled a latent cognitive ability with a typical developmental gradient, along with three demographic variables that were correlated to varying degrees with the latent ability. We simulated five additional populations representing patterns of non-representativeness that might be encountered in the real world. We subsequently drew smaller normative samples from each population and used a one-parameter logistic Item Response Theory (IRT) model to generate simulated test results for each individual. Using these simulated data, we applied norming techniques, both with and without compensatory weighting. Weighting reduced the bias of the norm scores when the degree of non-representativeness was moderate, with only a small risk of generating new biases.
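To make the weighting idea concrete, the following is a minimal sketch of raking (iterative proportional fitting) in Python. It is not the authors' implementation: the function name `rake`, the two demographic variables, and the target proportions are all hypothetical, and the sketch assumes every category occurs at least once in the sample.

```python
import numpy as np

def rake(categories, targets, n_iter=100, tol=1e-10):
    """Rake sample weights so weighted margins match population margins.

    categories: dict mapping variable name -> integer category code per person
    targets:    dict mapping variable name -> population proportion per category
    Assumes each category is observed at least once (hypothetical helper).
    """
    n = len(next(iter(categories.values())))
    w = np.ones(n)
    for _ in range(n_iter):
        max_change = 0.0
        for name, codes in categories.items():
            target = np.asarray(targets[name], dtype=float)
            # Current weighted share of each category of this variable
            current = np.bincount(codes, weights=w, minlength=len(target)) / w.sum()
            factor = target / current
            # Rescale every person's weight by the factor of their category
            w = w * factor[codes]
            max_change = max(max_change, np.abs(factor - 1.0).max())
        if max_change < tol:
            break
    # Normalize so the weights sum to the sample size
    return w * n / w.sum()

# Hypothetical sample of 8 people: sex is over-represented for category 0
sex = np.array([0, 0, 0, 0, 0, 0, 1, 1])
edu = np.array([0, 0, 0, 1, 1, 1, 0, 1])
weights = rake({"sex": sex, "edu": edu},
               {"sex": [0.5, 0.5], "edu": [0.4, 0.6]})

# Weighted margins after raking
sex_share = np.bincount(sex, weights=weights) / weights.sum()
edu_share = np.bincount(edu, weights=weights) / weights.sum()
```

The resulting weights could then be passed to a norming procedure that accepts case weights; how the weights enter the continuous norming model is not specified in the abstract.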
Continuous norming is an increasingly popular approach to establish norms when the performance on a test is dependent on age. However, current continuous norming methods rely on a number of assumptions that are quite restrictive and may introduce bias. In this study, quantile regression was introduced as a more flexible alternative. Bias and precision of quantile regression-based norming were investigated with (age-)group as covariate, varying sample sizes and score distributions, and compared with bias and precision of two other norming methods: traditional norming and mean regression-based norming. Simulations showed the norms obtained using quantile regression to be most precise in almost all conditions. Norms were nevertheless biased when the score distributions reflected a ceiling effect. Quantile regression-based norming can thus be considered a promising alternative to traditional norming and mean regression-based norming, but only if the shape of the score distribution can be expected to be close to normal.
The interpretation of psychometric test results is usually based on norm scores. We compared semiparametric continuous norming (SPCN) with conventional norming methods by simulating results for test scales with different item numbers and difficulties via an item response theory approach. Subsequently, we modeled the norm scores based on random samples with varying sizes either with a conventional ranking procedure or SPCN. The norms were then cross-validated by using an entirely representative sample of N = 840,000 for which different measures of norming error were computed. This process was repeated 90,000 times. Both approaches benefitted from an increase in sample size, with SPCN reaching optimal results with much smaller samples. Conventional norming performed worse on data fit, age-related errors, and number of missings in the norm tables. The data fit in conventional norming of fixed subsample sizes varied with the granularity of the age brackets, calling into question general recommendations for sample sizes in test norming. We recommend that test norms should be based on statistical models of the raw score distributions instead of simply compiling norm tables via conventional ranking procedures.
Test norms enable determining the position of an individual test taker in the group. The most frequently used approach to obtain test norms is traditional norming. Regression-based norming may be more efficient than traditional norming and is rapidly growing in popularity, but little is known about its technical properties. A simulation study was conducted to compare the sample size requirements for traditional and regression-based norming by examining the 95% interpercentile ranges for percentile estimates as a function of sample size, norming method, size of covariate effects on the test score, test length, and number of answer categories in an item. Provided the assumptions of the linear regression model hold in the data, for a subdivision of the total group into eight equal-size subgroups, we found that regression-based norming requires samples 2.5 to 5.5 times smaller than traditional norming. Sample size requirements are presented for each norming method, test length, and number of answer categories. We emphasize that additional research is needed to establish sample size requirements when the assumptions of the linear regression model are violated.
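As an illustration of what linear regression-based norming involves, here is a minimal Python sketch under the standard assumptions of the linear model (linearity, homoscedasticity, normal residuals). The simulated data, the single covariate `age`, and the function names are hypothetical and not taken from the study.

```python
import math
import numpy as np

def fit_linear_norms(age, score):
    """Ordinary least squares of score on age; returns coefficients and residual SD."""
    X = np.column_stack([np.ones_like(age, dtype=float), age])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    resid = score - X @ beta
    # Residual standard deviation with n - 2 degrees of freedom
    sigma = math.sqrt(float(resid @ resid) / (len(score) - X.shape[1]))
    return beta, sigma

def normed_percentile(raw, age, beta, sigma):
    """Percentile of a raw score relative to the predicted distribution at this age."""
    z = (raw - (beta[0] + beta[1] * age)) / sigma
    # Standard normal CDF, expressed through the error function
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Simulated normative sample (hypothetical): scores rise linearly with age
rng = np.random.default_rng(0)
age = rng.uniform(6, 16, size=500)
score = 20 + 3 * age + rng.normal(0, 5, size=500)

beta, sigma = fit_linear_norms(age, score)
expected_at_10 = beta[0] + beta[1] * 10
p_mean = normed_percentile(expected_at_10, 10, beta, sigma)
p_high = normed_percentile(expected_at_10 + sigma, 10, beta, sigma)
```

Because one model is fitted to the whole sample rather than a separate distribution per age subgroup, every observation contributes to the norm at every age, which is the intuition behind the smaller sample size requirements.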
Investigation of affective and semantic dimensions of words is essential for studying word processing. In this study, we expanded Tse et al.'s (Behav Res Methods 49:1503-1519, 2017; Behav Res Methods 55:4382-4402, 2023) Chinese Lexicon Project by norming five word dimensions (valence, arousal, familiarity, concreteness, and imageability) for over 25,000 two-character Chinese words presented in traditional script. Through regression models that controlled for other variables, we examined the relationships among these dimensions. We included ambiguity, quantified by the standard deviation of the ratings of a given lexical variable across different raters, as separate variables (e.g., valence ambiguity) to explore their connections with other variables. The intensity-ambiguity relationships (i.e., between normed variables and their ambiguities, like valence with valence ambiguity) were also examined. In these analyses with a large pool of words and controlling for other lexical variables, we replicated the asymmetric U-shaped valence-arousal relationship, which was moderated by valence and arousal ambiguities. We also observed a curvilinear relationship between valence and familiarity and between valence and concreteness. Replicating Brainerd et al.'s (J Exp Psychol Gen 150:1476-1499, 2021; J Mem Lang 121:104286, 2021) quadratic intensity-ambiguity relationships, we found that the ambiguity of valence, arousal, concreteness, and imageability decreases when the value of these variables is extremely low or extremely high, although this did not generalize to familiarity. While concreteness and imageability were strongly correlated, they displayed different relationships with arousal, valence, familiarity, and valence ambiguity, suggesting their distinct conceptual nature. These findings further our understanding of the affective and semantic dimensions of two-character Chinese words. The normed values of all these variables can be accessed via https://osf.io/hwkv7 .
We study an inverse spectral problem for Jacobi matrices from a principal submatrix together with a subset of pairs of the eigenvalues and first components of normalized eigenvectors. Necessary and sufficient conditions are presented under which this problem is solvable. The corresponding algorithm and two examples are given.
To interpret a person’s change score, one typically transforms the change score into, for example, a percentile, so that one knows a person’s location in a distribution of change scores. Transformed scores are referred to as norms and the construction of norms is referred to as norming. Two often-used norming methods for change scores are the regression-based change approach and the T Scores for Change method. In this article, we discuss the similarities and differences between these norming methods, and use a simulation study to systematically examine the precision of the two methods and to establish the minimum sample size requirements for satisfactory precision.
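For orientation, the regression-based change approach is commonly written in the following form. The abstract does not give the formula, so the notation here is an assumption: $Y_1$ and $Y_2$ denote the pretest and posttest scores, $b_0$ and $b_1$ the coefficients estimated in a normative sample of size $N$, and $s_e$ the standard error of estimate.

```latex
\hat{Y}_{2} = b_0 + b_1 Y_{1}, \qquad
z = \frac{Y_{2} - \hat{Y}_{2}}{s_e}, \qquad
s_e = \sqrt{\frac{\sum_{i=1}^{N} \bigl(Y_{2i} - \hat{Y}_{2i}\bigr)^{2}}{N - 2}}
```

Under normally distributed residuals, $z$ can be converted into a percentile through the standard normal distribution function.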
An element $(x_1, \ldots, x_n)\in E^n$ is called a {\em norming point} of $T\in {\mathcal L}_s(^n E)$ if $\|x_1\|=\cdots=\|x_n\|=1$ and $|T(x_1, \ldots, x_n)|=\|T\|,$ where ${\mathcal L}_s(^n E)$ denotes the space of all symmetric continuous $n$-linear forms on $E.$ For $T\in {\mathcal L}_s(^n E),$ we define $$\mathop{\rm Norm}(T)=\{(x_1, \ldots, x_n)\in E^n: (x_1, \ldots, x_n)~\mbox{is a norming point of}~T\}.$$ $\mathop{\rm Norm}(T)$ is called the {\em norming set} of $T$. We classify $\mathop{\rm Norm}(T)$ for every $T\in {\mathcal L}_s(^2l_{\infty}^2)$.
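A small worked example may help fix the definition. The form $T$ below is chosen here purely for illustration and is not taken from the paper; it is one particular element of ${\mathcal L}_s(^2l_{\infty}^2)$.

```latex
% Illustrative choice: T(x, y) = x_1 y_1 + x_2 y_2 on l_\infty^2, with x = (x_1, x_2), y = (y_1, y_2).
\begin{align*}
|T(x, y)| &\le |x_1||y_1| + |x_2||y_2| \le 2
  \quad \text{whenever } \|x\|_\infty = \|y\|_\infty = 1, \\
\|T\| &= 2 \quad \text{(attained at } x = y = (1, 1)\text{)}, \\
|T(x, y)| = 2 &\iff |x_1| = |x_2| = |y_1| = |y_2| = 1
  \ \text{and}\ x_1 y_1 = x_2 y_2, \\
\mathop{\rm Norm}(T) &= \{(x, x),\ (x, -x) : x \in \{\pm 1\}^2\}.
\end{align*}
```

For this $T$ the norming set therefore consists of eight points.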
On spaces of finite signed Borel measures on a metric space, the Fortet-Mourier and Dudley norms have been introduced by embedding the measures into the dual space of the Banach space of bounded Lipschitz functions, equipped with different – but equivalent – norms: the FM-norm and the BL-norm, respectively. The norm of such a measure is then obtained by maximising the value of the measure when applied by integration to extremal functions of the unit ball. We introduce Lipschitz extension operators, essentially based on those defined by McShane, and investigate their properties. A remarkable one is that non-trivial extreme points are mapped to non-trivial extreme points of FM- and BL-norm unit balls. Using these extension operators, we define suitable ‘small’ subsets of extremal functions that are weak-star dense in the full set of extreme points of the unit ball, for any underlying metric space. For connected metric spaces, we additionally find a larger set of extremal functions for the BL-norm, similar to such a set that was defined previously by J. Johnson for the FM-norm. This set is then also weak-star dense in the extremal functions. These results may open an avenue to obtaining computational approaches for the Dudley norm on signed Borel measures.