The purpose of this instructional piece was to provide a nontechnical synthesis of common internal consistency reliability estimates used in professional counseling and in related fields. The article begins with an overview of coefficients alpha, omega, omega hierarchical, and H, with guidelines for their selection. Next, I provide recommendations for interpretive cutoff scores for higher and lower stakes testing, followed by commentary on the limitations of relying too heavily on such guidelines. I discuss the importance of reporting confidence intervals (CIs) for reliability estimates to enhance reliability generalizations. When evaluating internal consistency reliability estimates of scores based on sample data, counselors are advised to (a) determine the intended use of test scores in terms of higher or lower stakes testing, (b) take a multifaceted approach and consider the requirements of each reliability index, (c) include CIs, and (d) refer to interpretive cutoff scores as tentative general guidelines, not absolute standards.
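As a brief illustration of the simplest of the coefficients named above, the following sketch computes coefficient alpha from a respondents-by-items score table using the standard variance-ratio formula. The function name and the toy data are assumptions made for illustration; they do not come from the article, and the sketch does not cover omega, omega hierarchical, H, or confidence intervals.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration only:
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Transpose so that each tuple holds one item's scores across respondents.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Toy data (made up): 4 respondents answering 3 items.
example_scores = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 4, 4]]
alpha = cronbach_alpha(example_scores)
```

An estimate produced this way is a sample statistic, which is why the article recommends reporting it together with a confidence interval and the intended stakes of the test.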
This study described the development of the Social Skills Improvement System Social Emotional Learning Edition Rating Forms (SSIS SEL RF) for teachers, parents, and students. This new multirater assessment is a reconfiguration of the SSIS Rating Scales items inspired by the CASEL Social Emotional Competency framework. The internal structure and score reliability estimates were examined across three raters for a common sample of more than 200 individual children ages 3 to 18 years. Confirmatory factor analyses tested against the CASEL five-dimensional SEL theoretical model demonstrated adequate fit for the SSIS SEL Parent and Student RFs and mediocre fit for the Teacher RF. Internal consistency, test-retest, and interrater reliability estimates for scores on each of the SSIS SEL RFs all met or exceeded acceptable criteria. Thus, researchers and practitioners interested in measuring the social–emotional behavior of children ages 3 to 18 can expect reliable scores and structurally meaningful behavior content within the Collaborative on Academic Social Emotional Learning (CASEL) SEL competency framework. Limitations to the present findings and suggestions for future research conclude the report.
Short measures are commonly used when conducting research involving emotions. However, obtaining appropriate estimates of reliability for short measures is traditionally problematic and is a recurring concern in emotion research. To address this issue, we compare the within-session test-retest and factor analysis methods for estimating the reliability of items in the Positive and Negative Affect Schedule-Expanded Form. Results indicate that within-session test-retest (rXX(d)) estimates outperform the factor analysis method by demonstrating stronger relationships with item properties relevant to reliability and validity-related criteria. In addition, rXX(d) estimates appropriately generalize across samples with various instruction stems and prevent corrections for attenuation greater than 1.00. Therefore, we encourage researchers to use the corresponding average item-level rXX(d) estimates reported here to correct for attenuation when examining single items from the Positive and Negative Affect Schedule-Expanded Form if a test-retest design is not feasible.
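The correction for attenuation mentioned above can be sketched with the classical formula, in which an observed correlation is divided by the square root of the product of the two reliabilities. The function name and the numeric values are hypothetical and are not taken from the study; the second call shows how an underestimated reliability can push the corrected value above 1.00.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Classical disattenuation: r_xy / sqrt(r_xx * r_yy).

    Hypothetical helper for illustration; r_xx and r_yy are the reliability
    estimates of the two measures and must lie in (0, 1].
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must be in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Made-up values: a plausible correction.
corrected = correct_for_attenuation(0.30, 0.50, 0.72)

# Made-up values: reliabilities that are too low give an impossible result (> 1.00).
overcorrected = correct_for_attenuation(0.50, 0.40, 0.40)
```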
We present a method to improve computed regression prediction values for unseen data. It aims at obtaining more accurate results by adjusting the calculated predictions instead of by constructing a different regression model. As a result, it can be helpful to improve the prediction of a specific observation provided by an existing benchmark regression model or predictor system. The proposed methodology uses individual point reliability estimates that indicate if a single regression prediction is likely to produce an error considered critical by the user of the regression. We tested the method in two sets of experiments, one using synthetically produced data, and the other using data from the public UCI Machine Learning data repository. The experiments with synthetic data were performed to verify the efficiency of the method under controlled situations. In this case, the method produced superior results, improving predictions for cleaner data, with performance worsening progressively as the noise level increased. Experiments with ten databases from the UCI data repository were executed to investigate the applicability of the methodology using real-world data. The method was able to correctly adjust regression prediction values in experiments with all ten databases, achieving statistically significant improvement in eight of them.
Items that capture group members’ outcomes from small group processes (e.g., satisfaction, cohesion) are often nonindependent. A primary assumption of most measurement models is that the data are independent; applying such models to group-outcome data measured at the individual level of analysis is thus likely to produce inaccurate estimates. A solution to the measurement of nonindependent data involves the use of multilevel modeling to estimate variances at item, individual, and group levels of analysis. Examples from several different statistics programs are provided, and Monte Carlo simulations are used to evaluate the effects of group size and number of items on reliability estimates.
It has long been well known that actual system reliability typically falls well short of early estimates. Failure rates are often ten or more times higher than anticipated. Many reasons have been given for this, but over-optimism is the fundamental cause of too-favorable reliability predictions. Most forecasts of reliability are essentially best-case scenarios, as are predictions of budget and schedule. Confident engineers assemble estimates bottom-up, including the known factors and ignoring problems that they hope won't happen. Traditional reliability estimation is based on simply summing up the component failure rates. This ignores most actual failure causes. The way to reduce over-optimism is to use the historical system level failure rate from similar projects. Adjustments should not be made based purely on engineering judgment, but only if there is some logical quantitative justification. The traditional component-based reliability estimate is useful as a lower bound on the system failure rate. The difference between this lower bound component-based reliability and the historical system level reliability indicates how much of the total failure rate is due to system level problems rather than component failures.
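The comparison described above can be sketched in a few lines: sum the component failure rates to get the lower bound, then compare it with a historical system level failure rate. The function name and all rates below are made-up assumptions for illustration, not figures from the paper.

```python
def system_level_share(component_failure_rates, historical_system_rate):
    """Compare a component-based lower bound with a historical system failure rate.

    Hypothetical helper for illustration. Returns (lower_bound, gap, share), where
    share is the fraction of the historical rate not explained by component failures.
    """
    if historical_system_rate <= 0:
        raise ValueError("historical system failure rate must be positive")
    # Traditional estimate: simply sum the component failure rates.
    lower_bound = sum(component_failure_rates)
    # The gap is attributed to system level problems rather than component failures.
    gap = historical_system_rate - lower_bound
    return lower_bound, gap, gap / historical_system_rate


# Made-up rates in failures per hour: three components and one historical system rate.
lower_bound, gap, share = system_level_share([1e-5, 2e-5, 2e-5], 5e-4)
```

With these made-up numbers the historical rate is ten times the component-based sum, so about nine tenths of the total failure rate would be attributed to system level problems.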
A novel method is proposed to estimate the primary tonal frequency of speech. The method is based on singular spectrum analysis. A singular model of a vocalized segment of a speech signal is presented that considers the direct and inverse problems. A study is conducted of the process of singular estimation of the primary tonal frequency of speech. Experimental research is carried out using a model that yields adequacy and reliability estimates. The concept of singular estimation of the primary tonal frequency of speech is introduced.
Researchers sometimes need a statistical test of the hypothesis that two values of Cronbach's alpha reliability coefficient are equal. The situation may involve scores from two different measures administered to independent random samples or from the same measure administered to random samples from two different populations. Feldt derived a test that functions well with large or moderate numbers of subjects. However, he validated this test only when the number of parts (k) of the measurement was fairly large, as it would be if the parts were individual test items. He did not consider instances in which the parts were raters, and hence k would be as small as 2 or 3. In this article, the Feldt test is investigated for such situations. It is found to function quite well in its control of Type I error.
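For orientation, the Feldt test for two independent samples is usually stated as a ratio of the complements of the two sample alphas, referred to an F distribution. The notation below is an assumption for illustration, since the abstract gives no formula: the hatted alphas are the two sample coefficients and N_1, N_2 are the two sample sizes.

```latex
% Feldt test statistic for H_0: \alpha_1 = \alpha_2 (independent samples),
% as commonly stated; notation assumed here, not taken from the abstract.
W = \frac{1 - \hat{\alpha}_2}{1 - \hat{\alpha}_1},
\qquad
W \;\overset{\text{approx.}}{\sim}\; F_{\,N_1 - 1,\; N_2 - 1}
\quad \text{under } H_0 .
```

The F reference distribution is an approximation that was originally checked for fairly large k, which is why the small-k case with raters as parts is the focus of this article.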