Multicollinearity is a high degree of linear intercorrelation between explanatory variables in a multiple regression model and leads to incorrect results of regression analyses. Diagnostic tools for multicollinearity include the variance inflation factor (VIF), condition index and condition number, and variance decomposition proportion (VDP). Multicollinearity can be expressed by the coefficient of determination (Rh²) of a multiple regression model that uses one explanatory variable (Xh) as its response variable and the others (Xi, i ≠ h) as its explanatory variables. The variance (σh²) of each regression coefficient in the final regression model is proportional to the VIF; hence, an increase in Rh² (strong multicollinearity) increases σh². A larger σh² produces unreliable probability values and confidence intervals for the regression coefficients. The square root of the ratio of the maximum eigenvalue to each eigenvalue of the correlation matrix of the standardized explanatory variables is referred to as the condition index, and the condition number is the maximum condition index. Multicollinearity is present when the VIF exceeds 5 to 10 or the condition indices exceed 10 to 30. However, these measures cannot indicate which explanatory variables are multicollinear. VDPs obtained from the eigenvectors can identify the multicollinear variables by showing the extent to which σh² is inflated at each condition index. When two or more VDPs that correspond to a common condition index higher than 10 to 30 exceed 0.8 to 0.9, their associated explanatory variables are multicollinear. Excluding multicollinear explanatory variables leads to statistically stable multiple regression models.
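As a rough illustration of these diagnostics, the Python sketch below computes VIFs and condition indices for a small simulated design matrix; the simulated variables and the use of statsmodels and NumPy are assumptions made for the example, not part of the article above.

```python
# Minimal sketch: VIF and condition indices for a (simulated) design matrix.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# VIF for each explanatory variable: VIF_h = 1 / (1 - Rh^2)
X_c = sm.add_constant(X)
vif = pd.Series(
    [variance_inflation_factor(X_c.values, i) for i in range(1, X_c.shape[1])],
    index=X.columns,
)
print(vif)  # values above roughly 5-10 flag multicollinearity

# Condition indices: sqrt(max eigenvalue / each eigenvalue) of the
# correlation matrix of the standardized explanatory variables
eigvals = np.linalg.eigvalsh(np.corrcoef(X.values, rowvar=False))
cond_idx = np.sqrt(eigvals.max() / eigvals)
print(cond_idx)  # the largest value is the condition number
```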
•Confidence intervals (CI) measure the uncertainty around effect estimates.
•Frequentist 95% CI: we can be 95% confident that the true estimate would lie within the interval.
•Bayesian 95% CI: there is a 95% probability that the true estimate would lie within the interval.
•Decisions should not be based only on the dichotomized interpretation of CIs.
•Training and education may enhance knowledge related to understanding and interpreting CIs.
Reporting confidence intervals in scientific articles is important and relevant for evidence-based practice. Clinicians should understand confidence intervals in order to determine whether they can realistically expect results similar to those presented in research studies when they implement the scientific evidence in clinical practice. The aims of this masterclass are: (1) to discuss confidence intervals around effect estimates; (2) to understand confidence interval estimation (frequentist and Bayesian approaches); and (3) to interpret such uncertainty measures.
Confidence intervals are measures of uncertainty around effect estimates. Interpretation of the frequentist 95% confidence interval: we can be 95% confident that the true (unknown) estimate would lie within the lower and upper limits of the interval, based on hypothesized repeats of the experiment. Many researchers and health professionals oversimplify the interpretation of the frequentist 95% confidence interval by dichotomizing it into statistically significant or not statistically significant, hampering a proper discussion of the values, the width (precision) and the practical implications of the interval. Interpretation of the Bayesian 95% confidence interval (known as the credible interval): there is a 95% probability that the true (unknown) estimate would lie within the interval, given the evidence provided by the observed data.
The use and reporting of confidence intervals should be encouraged in all scientific articles. Clinicians should consider the interpretation, relevance and applicability of confidence intervals in real-world decision-making. Training and education may enhance knowledge and skills related to estimating, understanding and interpreting uncertainty measures, reducing the barriers to their use under either frequentist or Bayesian approaches.
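As a toy illustration of the two interpretations discussed above, the following Python sketch computes a frequentist 95% confidence interval and a Bayesian 95% credible interval for a hypothetical proportion; the data, the normal approximation, and the flat Beta prior are assumptions for the example, not from the article.

```python
# Minimal sketch: frequentist 95% CI vs. Bayesian 95% credible interval
# for a hypothetical proportion (e.g. treatment successes).
import numpy as np
from scipy import stats

successes, n = 42, 60  # hypothetical observed data

# Frequentist 95% CI (normal approximation), interpreted via
# hypothesized repeats of the experiment
p_hat = successes / n
se = np.sqrt(p_hat * (1 - p_hat) / n)
ci_freq = (p_hat - 1.96 * se, p_hat + 1.96 * se)
print(ci_freq)

# Bayesian 95% credible interval: a flat Beta(1, 1) prior gives a Beta
# posterior; 95% posterior probability that the true proportion lies here
posterior = stats.beta(1 + successes, 1 + n - successes)
print(posterior.ppf([0.025, 0.975]))
```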
TRAINING IN STATISTICAL DATA PROCESSING AND ANALYSIS FOR STISA PAMEKASAN STUDENTS WITH SPSS. This training was motivated by the difficulties that final-semester students of STISA Sumber Duko Pakong Pamekasan face in processing and analyzing data in quantitative research. Beyond students' weak basic calculation skills, the research methods course material they receive is still general and oriented more towards qualitative research, and most of the available thesis examples are qualitative in nature. Through training in processing and analyzing statistical data with SPSS, it is hoped that they will better understand how to process statistical data and interpret the output, and improve their skills in processing statistical data using SPSS statistical software. The training proceeded through the following stages: problem identification, selection of statistical data processing software, preparation of training materials, training, mentoring, and evaluation. The results of this training are: (1) the training participants gained new knowledge about statistical data processing and analysis using the SPSS software; (2) the training participants responded 100% positively to the training; (3) the training participants were very enthusiastic about the implementation of the training and understood the training material very well. More than 50% of the training participants were able to complete statistical data processing with SPSS for different data and were able to analyze the SPSS output.
Common tasks encountered in epidemiology, including disease incidence estimation and causal inference, rely on predictive modelling. Constructing a predictive model can be thought of as learning a prediction function (a function that takes as input covariate data and outputs a predicted value). Many strategies for learning prediction functions from data (learners) are available, from parametric regressions to machine learning algorithms. It can be challenging to choose a learner, as it is impossible to know in advance which one is the most suitable for a particular dataset and prediction task. The super learner (SL) is an algorithm that alleviates concerns over selecting the one ‘right’ learner by providing the freedom to consider many, such as those recommended by collaborators, used in related research or specified by subject-matter experts. Also known as stacking, SL is an entirely prespecified and flexible approach for predictive modelling. To ensure the SL is well specified for learning the desired prediction function, the analyst does need to make a few important choices. In this educational article, we provide step-by-step guidelines for making these decisions, walking the reader through each of them and providing intuition along the way. In doing so, we aim to empower the analyst to tailor the SL specification to their prediction task, thereby ensuring their SL performs as well as possible. A flowchart provides a concise, easy-to-follow summary of key suggestions and heuristics, based on our accumulated experience and guided by SL optimality theory.
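As a rough analogue of the stacking idea described above, the following scikit-learn sketch combines a small library of candidate learners with a linear meta-learner via cross-validation; it is a generic stacking implementation with simulated data, not the authors' super learner software or their recommended specification.

```python
# Minimal stacking sketch with scikit-learn (a generic analogue of the
# super learner idea, using simulated data).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

# Candidate learners ("library"): a parametric regression and two ML algorithms
learners = [
    ("ols", LinearRegression()),
    ("lasso", LassoCV()),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
]

# The meta-learner combines cross-validated predictions from each candidate
sl = StackingRegressor(estimators=learners, final_estimator=LinearRegression(), cv=5)

print(cross_val_score(sl, X, y, cv=5, scoring="neg_mean_squared_error").mean())
```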
Conventional statistical tests are usually called parametric tests. Parametric tests are used more frequently than nonparametric tests in many medical articles, because most medical researchers are familiar with them and statistical software packages strongly support parametric tests. Parametric tests require an important assumption: the assumption of normality, which means that the distribution of sample means is normally distributed. However, parametric tests can be misleading when this assumption is not satisfied. In this circumstance, nonparametric tests are the alternative methods available, because they do not require the normality assumption. Nonparametric tests are statistical methods based on signs and ranks. In this article, we discuss the basic concepts and practical use of nonparametric tests as a guide to their proper use.
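For illustration, the Python sketch below contrasts a parametric t-test with a rank-based nonparametric alternative (the Mann-Whitney U test) on simulated skewed data; the data and the use of SciPy are assumptions for the example, not part of the article.

```python
# Minimal sketch: parametric vs. rank-based nonparametric test on skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.exponential(scale=1.0, size=30)  # skewed, non-normal
group_b = rng.exponential(scale=1.5, size=30)

# Check the normality assumption first (e.g. Shapiro-Wilk)
print(stats.shapiro(group_a).pvalue, stats.shapiro(group_b).pvalue)

# Parametric: independent-samples t-test (relies on the normality assumption)
print(stats.ttest_ind(group_a, group_b))

# Nonparametric alternative: Mann-Whitney U test, based on ranks
print(stats.mannwhitneyu(group_a, group_b, alternative="two-sided"))
```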
Electric taxis can help reduce air pollution in crowded urban areas, but their city-wide operation requires a distributed rapid charging infrastructure. Building such rapid charging networks is currently capital intensive and therefore requires careful planning. Here, we propose a novel data-driven framework for deploying suitable rapid charging infrastructure in large urban areas for electric taxis. This framework combines an iterative clustering technique with a modified numerical optimisation method to determine the smallest feasible infrastructure for a level of charging availability that ensures uninterrupted electric taxi service. We provide a case study for Istanbul using real-time global positioning data from fossil-fuel taxis currently operating in the city. This case study tests the performance of the proposed infrastructure, determined by the framework, by simulating a taxi fleet of the same size as the one currently operating in Istanbul. Our results show that a charging infrastructure sufficient to serve a fully electric taxi fleet of 17,395 vehicles in a large city like Istanbul should consist of around 1,363–1,834 charging stations, depending on the roll-out strategy. In the most suitable case, each charging station provides on average 449.61 kWh of energy per day and serves about 20 electric taxis per day. Furthermore, we observe that infrastructures with fewer than 1,300 charging stations would result in significant shortages of charging availability and adversely impact reliable electric taxi service operation. While the exact number of required charging stations will vary depending on the city characteristics and fleet size, the roll-out strategies, together with the underlying feasibility analysis presented here, can support transport authorities and other decision makers in shaping an appropriate urban transition strategy that accommodates electric taxi services.
•A novel density-based approach to deploy rapid charging infrastructure for electric taxis.
•A new solution to continuous domain optimisation that abates computational costs.
•Taxi driver behaviour is characterised using real-time GPS information.
•The trade-off between charging locations and average travel distance for charging.
•Possible roll-out strategies to reduce the total number of chargers at each location.
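The sketch below is a highly simplified illustration of the clustering step in the abstract above: k-means over hypothetical GPS stop points to suggest candidate station sites and gauge the detour-distance trade-off. It is not the authors' framework, and the coordinates, fleet size, and station count are invented for the example.

```python
# Simplified sketch (not the authors' framework): clustering hypothetical
# taxi GPS stop points to suggest candidate charging-station locations.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Hypothetical stop coordinates (longitude, latitude) for a taxi fleet
stops = rng.normal(loc=[28.97, 41.01], scale=[0.15, 0.08], size=(5000, 2))

n_stations = 50  # candidate number of charging stations to evaluate
km = KMeans(n_clusters=n_stations, n_init=10, random_state=0).fit(stops)

# Cluster centres are candidate station sites; the mean distance to the
# nearest centre approximates the average detour taxis would make to charge.
dists = np.linalg.norm(stops - km.cluster_centers_[km.labels_], axis=1)
print(km.cluster_centers_[:3])
print(dists.mean())  # trade-off: more stations -> shorter average detour
```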
•GC–MS was used to investigate the polar metabolite pool of mozzarella.
•Microbial features of buffalo and cow mozzarella were characterised.
•The low-molecular-weight metabolite profile reflects the microbial complexity of mozzarella.
•A metabolomics approach can be used to protect the uniqueness of Italian buffalo mozzarella.
Italian buffalo mozzarella (BM) cheese metabolite profiles and microbial communities were characterised and compared to cow mozzarella (CM). Polar metabolite profiles were studied by gas-chromatography mass-spectrometry (GC–MS) and the results were elaborated by multivariate analysis (MVA). BM produced using natural whey starter cultures (NWS) exhibited a higher microbial diversity with fewer psychrotrophic bacteria. BM samples were higher in threonine, serine, valine, and lower in orotic acid and urea. CM produced with commercial starters (CMS) had the highest count of Streptococcus thermophilus and higher levels of galactose and phenylalanine. CM obtained by direct acidification (CMA) had lower microbial counts and higher levels of urea and sugars. Orotic acid was the only metabolite linked to the animal origin of the milk. Results indicated that this metabolite pool reflects well the different production protocols and microbial complexity of these dairy products. This approach can help to protect the designation of origin of Italian buffalo mozzarella.
Statistical hypothesis testing compares the significance probability value with the significance level to determine whether or not to reject the null hypothesis, leading to a conclusion of "significant" or "not significant." However, since this process is one of statistical hypothesis testing, the conclusion "statistically significant" or "not statistically significant" is more appropriate than "significant" or "not significant." In many studies, the significance level is set to 0.05 for comparison with the significance probability value, the p-value: if the p-value is less than 0.05, the result is judged "significant," and if the p-value is greater than 0.05, it is judged "not significant." However, since the significance level is a value set by the researcher according to the circumstances of each study, it does not necessarily have to be 0.05. In a statistical hypothesis test, the conclusion depends on the chosen significance level, so the researcher must set it carefully. In this study, the stages of statistical hypothesis testing were examined in detail, along with the conclusions that follow from each stage and the points that should be considered carefully when interpreting them, with emphasis on statistical hypothesis testing and the significance level. In 11 original articles published in the [journal] in 2022, the interpretation of hypothesis testing and the reported conclusions were reviewed from the perspective of statistical hypothesis testing and the significance level, and content that should be supplemented was noted.
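As a small illustration of the decision rule discussed above, the following Python sketch compares a p-value from a one-sample t-test with two researcher-chosen significance levels; the data are simulated and all values are hypothetical.

```python
# Minimal sketch: comparing a p-value with a researcher-chosen significance
# level, and how the conclusion changes with that choice.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(loc=0.4, scale=1.0, size=25)

# One-sample t-test of H0: mean = 0
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)

for alpha in (0.05, 0.01):  # the significance level is set by the researcher
    decision = "statistically significant" if p_value < alpha else "not statistically significant"
    print(f"alpha={alpha}: p={p_value:.4f} -> {decision}")
```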
The evidence-based medicine paradigm demands scientific reliability, but modern research sometimes seems to overlook it. Power analysis represents a way to show the meaningfulness of findings, regardless of the emphasized aspect of statistical significance. Within this statistical framework, the estimation of the effect size represents a means to show the relevance of the evidence produced through research. In this regard, this paper presents and discusses the main procedures for estimating the size of an effect with respect to the specific statistical test used for hypothesis testing. Thus, this work can be seen as an introduction and a guide for readers interested in using effect size estimation in their scientific endeavours.
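As one concrete example of the kind of procedure the paper surveys, the sketch below computes Cohen's d (a standardized mean difference) alongside the t-test p-value for simulated two-group data; the data and the magnitude guidelines in the comment are illustrative, not from the paper.

```python
# Minimal sketch: estimating an effect size (Cohen's d) for a two-group
# comparison, alongside the t-test p-value (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
treatment = rng.normal(loc=1.5, scale=2.0, size=40)
control = rng.normal(loc=0.8, scale=2.0, size=40)

t_stat, p_value = stats.ttest_ind(treatment, control)

# Cohen's d: standardized mean difference using the pooled standard deviation
n1, n2 = len(treatment), len(control)
pooled_sd = np.sqrt(((n1 - 1) * treatment.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (treatment.mean() - control.mean()) / pooled_sd

print(f"p = {p_value:.4f}, Cohen's d = {cohens_d:.2f}")  # ~0.2 small, ~0.5 medium, ~0.8 large
```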