Student's t test (t test), analysis of variance (ANOVA), and analysis of covariance (ANCOVA) are statistical methods used in hypothesis testing for comparing means between groups. The Student's t test compares the means of two groups, whereas ANOVA compares the means of three or more groups. ANOVA first yields an overall P value; a significant P value indicates that at least one pair of groups has a statistically significant mean difference. To identify the significant pair(s), multiple-comparison (post hoc) tests are used. ANOVA with one categorical independent variable is called one-way ANOVA; with two categorical independent variables, it is called two-way ANOVA. When at least one covariate is used to adjust the dependent variable, ANOVA becomes ANCOVA. Because the mean is strongly affected by outliers in small samples, a sufficient sample size is necessary when using these methods.
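The overall F statistic at the heart of one-way ANOVA can be sketched in a few lines. The following Python snippet (with made-up measurements for three illustrative groups, not taken from any study cited here) computes the between-group and within-group mean squares by hand:

```python
from statistics import mean

def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean squares."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total sample size
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Three illustrative treatment groups (made-up measurements)
groups = [[5.1, 4.9, 5.3, 5.0], [5.8, 6.1, 5.9, 6.0], [5.2, 5.4, 5.1, 5.3]]
print(round(one_way_anova_f(groups), 2))  # → 41.16
```

A large F (here compared against an F distribution with k-1 and n-k degrees of freedom) yields the single overall P value described above; post hoc comparisons then locate the differing pair(s).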
The data presented herein represent simulated datasets from a recently conducted larger study which investigated the behaviour of Bayesian indices of significance and effect size as alternatives to traditional p-values. The study considered the setting of Student's and Welch's two-sample t-test, often used in medical research, and investigated the influence of sample size, noise, the selected prior hyperparameters, and the sensitivity to type I errors. The posterior indices used included the Bayes factor, the region of practical equivalence (ROPE), the probability of direction, the MAP-based p-value, and the e-value of the Full Bayesian Significance Test. The simulation study was conducted in the statistical programming language R.
The R script files used to simulate the datasets in the study are presented in this article. These script files can both simulate the raw datasets and run the analyses. As researchers may face different effect sizes, noise levels, or priors in their domain than the ones studied in the original paper, the scripts extend the original results by allowing all analyses of interest to be recreated in different contexts. They should therefore be relevant to other researchers.
Statistical inference in psychology has traditionally relied heavily on p-value significance testing. This approach to drawing conclusions from data, however, has been widely criticized, and two types of remedies have been advocated. The first proposal is to supplement p values with complementary measures of evidence, such as effect sizes. The second is to replace p values with Bayesian measures of evidence, such as the Bayes factor. The authors provide a practical comparison of p values, effect sizes, and default Bayes factors as measures of statistical evidence, using 855 recently published t tests in psychology. The comparison yields two main results. First, although p values and default Bayes factors almost always agree about which hypothesis is better supported by the data, the measures often disagree about the strength of this support; for 70% of the data sets for which the p value falls between .01 and .05, the default Bayes factor indicates that the evidence is only anecdotal. Second, effect sizes can provide additional evidence to p values and default Bayes factors. The authors conclude that the Bayesian approach is comparatively prudent, preventing researchers from overestimating the evidence in favor of an effect.
The replication crisis hit the medical sciences about a decade ago, but most of the flaws inherent in null hypothesis significance testing (NHST) have still not been solved. While the drawbacks of p-values have been detailed in countless venues, only a few attractive alternatives to p-values and NHST have been proposed for clinical research. Bayesian methods are one of them, and they are gaining increasing attention in medical research: their advantages include the description of model parameters in terms of probability and, in contrast to the frequentist framework, the incorporation of prior information. While Bayesian methods are not the only remedy to the situation, there is increasing agreement that they are an essential way to avoid common misconceptions and false interpretations of study results. The requirements for applying Bayesian statistics have transitioned from detailed programming knowledge to simple point-and-click programs like JASP. Still, the multitude of Bayesian significance and effect measures that contrast with the gold standard of significance in medical research, the p-value, causes a lack of agreement on which measure to report.
Therefore, in this paper, we conduct an extensive simulation study to compare common Bayesian significance and effect measures which can be obtained from a posterior distribution. In it, we analyse the behaviour of these measures for one of the most important statistical procedures in medical research and in particular clinical trials, the two-sample Student's (and Welch's) t-test.
The results show that some measures cannot state evidence for both the null and the alternative hypothesis. While the different indices behave similarly with respect to increasing sample size and noise, the prior modelling influences the obtained results, and extreme priors allow for cherry-picking similar to p-hacking in the frequentist paradigm. The indices differ considerably in their ability to control the type I error rate and to detect an existing effect.
Based on the results, two of the commonly used indices can be recommended for more widespread use in clinical and biomedical research, as they improve the type I error control compared to the classic two-sample t-test and enjoy multiple other desirable properties.
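Two of the posterior indices compared in the study, the probability of direction and the ROPE percentage, are straightforward to compute from posterior samples. The sketch below is a generic illustration (not the study's original R code): it draws from a normal pseudo-posterior for the mean difference with illustrative parameters and a ROPE of ±0.1.

```python
import random

def probability_of_direction(samples):
    """Proportion of posterior mass on the dominant side of zero (0.5 to 1)."""
    p_positive = sum(s > 0 for s in samples) / len(samples)
    return max(p_positive, 1 - p_positive)

def rope_percentage(samples, low=-0.1, high=0.1):
    """Share of posterior samples inside the region of practical equivalence."""
    return sum(low <= s <= high for s in samples) / len(samples)

random.seed(1)
# Normal pseudo-posterior for the mean difference (illustrative parameters)
posterior = [random.gauss(0.5, 0.2) for _ in range(10_000)]
print(probability_of_direction(posterior))  # close to 0.99
print(rope_percentage(posterior))           # close to 0.02
```

A probability of direction near 1 mirrors a small two-sided p-value, while a small ROPE percentage argues against practical equivalence; the Bayes factor, MAP-based p-value, and e-value require additional modelling beyond posterior samples and are omitted here.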
Inference in Experiments With Matched Pairs
Bai, Yuehao; Romano, Joseph P.; Shaikh, Azeem M.
Journal of the American Statistical Association, 10/2022, Volume 117, Issue 540
Journal Article · Peer reviewed · Open access
This article studies inference for the average treatment effect in randomized controlled trials where treatment status is determined according to a "matched pairs" design. By a "matched pairs" design, we mean that units are sampled i.i.d. from the population of interest, paired according to observed baseline covariates and, finally, within each pair, one unit is selected at random for treatment. This type of design is used routinely throughout the sciences, but fundamental questions about its implications for inference about the average treatment effect remain. The main requirement underlying our analysis is that pairs are formed so that units within pairs are suitably "close" in terms of the baseline covariates, and we develop novel results to ensure that pairs are formed in a way that satisfies this condition. Under this assumption, we show that, for the problem of testing the null hypothesis that the average treatment effect equals a prespecified value in such settings, the commonly used two-sample t-test and "matched pairs" t-test are conservative in the sense that these tests have limiting rejection probability under the null hypothesis no greater than, and typically strictly less than, the nominal level. We show, however, that a simple adjustment to the standard errors of these tests leads to a test that is asymptotically exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. We also study the behavior of randomization tests that arise naturally in these types of settings. When implemented appropriately, we show that this approach also leads to a test that is asymptotically exact in the sense described previously, but additionally has finite-sample rejection probability no greater than the nominal level for certain distributions satisfying the null hypothesis. A simulation study and empirical application confirm the practical relevance of our theoretical results.
With the broad deployment of Wi-Fi networks, Received Signal Strength (RSS) based Wi-Fi indoor localization has attracted much interest from both academia and industry. At present, most available Wi-Fi indoor localization techniques focus on increasing localization accuracy. However, few of them take into account the diversity of Wi-Fi signal distributions and the measurement error associated with RSS values owing to the complicated indoor environment, which results in the low robustness of indoor localization systems. Thus, motivated to tackle this problem, we design a new hybrid hypothesis test based on the idea of Asymptotic Relative Efficiency (ARE), which exploits signal distributions by considering the different Access Point (AP) contributions to Wi-Fi indoor localization accuracy. Concretely, first the Jarque-Bera (JB) test is used to perform a normality test on the Wi-Fi signal distribution at each Reference Point (RP), and the Chi-squared Automatic Interaction Detection (CHAID) approach is applied to obtain each AP's contribution degree. Second, based on the JB test's evaluation of the Wi-Fi signal distribution, a hybrid Mann-Whitney U and t-test is applied to find the set of matching RPs corresponding to each newly collected RSS sample. Finally, the target location estimate is acquired using K-Nearest Neighbors (KNN), where the contribution degree of each AP is assigned as the weight when finding matching RPs. Extensive experimental results show that the proposed approach improves system performance, achieving higher localization accuracy and enhanced robustness compared with state-of-the-art Wi-Fi indoor localization techniques.
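The routing step of such a hybrid test can be sketched as follows. The snippet computes the Jarque-Bera statistic from sample moments and selects the parametric or nonparametric branch by comparing it with the chi-squared critical value with 2 degrees of freedom at the 5% level; the RSS readings are illustrative, and the CHAID-based AP weighting is omitted.

```python
def jarque_bera(xs):
    """Jarque-Bera normality statistic: n/6 * (S**2 + (K - 3)**2 / 4)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # central moments
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)

def choose_test(xs, critical=5.99):
    """Route to the t-test when JB does not reject normality at the 5% level
    (5.99 is the chi-squared(2) critical value), else to the Mann-Whitney U test."""
    return "t-test" if jarque_bera(xs) < critical else "mann-whitney"

# Illustrative RSS readings (dBm) at one reference point
rss = [-62, -60, -61, -63, -59, -60, -61, -62]
print(choose_test(rss))  # → t-test
```

The JB statistic is asymptotically chi-squared with 2 degrees of freedom under normality, which is why a single critical value suffices for the routing decision.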
Ceiling and floor effects are often observed in social and behavioral science. The current study examines ceiling/floor effects in the context of the t-test and ANOVA, two frequently used statistical methods in experimental studies. Our literature review indicated that most researchers treated ceiling or floor data as if these data were true values, and that some researchers used statistical methods such as discarding ceiling or floor data when conducting the t-test and ANOVA. The current study evaluates the performance of these conventional methods for the t-test and ANOVA with ceiling or floor data. Our evaluation also includes censored regression with regard to its capacity for handling ceiling/floor data. Furthermore, we propose an easy-to-use method that handles ceiling or floor data in t-tests and ANOVA by using properties of truncated normal distributions. Simulation studies were conducted to compare the performance of the methods in handling ceiling or floor data for the t-test and ANOVA. Overall, the proposed method showed greater accuracy in effect size estimation and better-controlled Type I error rates than the other evaluated methods. We developed an easy-to-use software package and web applications to help researchers implement the proposed method. Recommendations and future directions are discussed.
Using the paired T-Test to compare suppliers
De Brito, Caroline Soares; Silva, Dayana Elizabeth Werderits; Aguiar, Luiz Guilherme de Andrade ...
GeSec: Revista de Gestão e Secretariado, 10/2023, Volume 14, Issue 10
Journal Article · Peer reviewed · Open access
The manufacture of industrial products requires rigorous quality control, which is why products must comply with their specifications. It is therefore essential that two different suppliers deliver products within the same specifications. The aim of this article is to present a case study carried out in a company in the south of the state of Rio de Janeiro, which used the paired t-test to compare two types of foam for hospital mattresses. The results showed that supplier 1 produces foam below the specified thickness of 11 cm, while supplier 2 produces foam within the specified value; supplier 1 would therefore be rejected and supplier 2's service would be used.
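A paired t statistic of the kind used in such a case study is the mean of the pairwise differences divided by its standard error. The foam thickness measurements below are hypothetical, not the company's data:

```python
from statistics import mean, stdev

def paired_t_statistic(sample_a, sample_b):
    """Paired t statistic: mean pairwise difference over its standard error."""
    diffs = [b - a for a, b in zip(sample_a, sample_b)]
    return mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)

# Hypothetical foam thickness measurements (cm) under matched conditions
supplier_1 = [10.6, 10.7, 10.5, 10.8, 10.6, 10.7]
supplier_2 = [11.0, 11.1, 10.9, 11.0, 11.1, 11.0]
print(round(paired_t_statistic(supplier_1, supplier_2), 2))  # → 8.7
```

A |t| this large, referred to a t distribution with n-1 = 5 degrees of freedom, would lead to rejecting equality of the two foams' mean thickness.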
Comparing areas under the ROC curve (AUCs) is a popular approach to comparing prognostic biomarkers. The aim of this paper is to present an efficient method to control the family-wise error rate when multiple comparisons are performed. We suggest combining the max-t test and closed testing procedures. We build on previous work on asymptotic results for ROC curves and on general multiple testing methods to efficiently take into account both the correlations between the test statistics and the logical constraints between the null hypotheses. The proposed method results in a uniformly more powerful procedure than both the single-step max-t test procedure and popular stepwise extensions of the Bonferroni procedure, such as Bonferroni-Holm. As demonstrated in this paper, the method can be applied in most usual contexts, including the time-dependent context with right-censored data. We show how the method works in practice through a motivating example in which we compare several psychometric scores to predict the t-year risk of Alzheimer's disease. The example illustrates several multiple testing settings and demonstrates the advantage of using the proposed methods over common alternatives. R code has been made available to facilitate the use of the methods by others.
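As a baseline for comparison, the Bonferroni-Holm step-down adjustment mentioned above is easy to implement; the p-values in the example are illustrative:

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values (the stepwise Bonferroni baseline)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ranks, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # multiply the k-th smallest p-value by (m - k + 1), enforce monotonicity
        running_max = max(running_max, (m - rank) * pvals[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

print([round(p, 4) for p in holm_adjust([0.01, 0.04, 0.03])])  # → [0.03, 0.06, 0.06]
```

Unlike the proposed max-t closed testing procedure, Holm ignores the correlations between test statistics and the logical constraints among hypotheses, which is why it is less powerful in the settings the paper considers.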