This study posits that all innovations meet consumer resistance and that this resistance must be overcome before a product is adopted. The factors driving resistance to service innovations remain unclear. To better understand this behavior, the present study examines how five theory-driven adoption barriers (usage, value, risk, tradition, and image) and three consumer demographics (gender, age, and income) influence consumer adoption versus rejection decisions in Internet and mobile banking. Data from two large nationwide surveys conducted in Finland (n = 1736 consumers) are used to test the hypotheses with binary logit models comparing mobile banking adopters versus non-adopters, mobile banking postponers versus rejecters, and Internet banking postponers versus rejecters. The results show that the value barrier is the strongest inhibitor of Internet and mobile banking adoption. In addition, the image barrier slows mobile banking adoption, and the tradition barrier explains the rejection of Internet banking. Gender and age significantly predict adoption and rejection decisions. The results demonstrate notable differences between these seemingly similar service innovations.
The Random Forest (RF) algorithm for regression and classification has gained considerable popularity since its introduction in 2001. Meanwhile, it has grown into a standard classification approach competing with logistic regression (LR) in many innovation-friendly scientific fields.
In this context, we present a large-scale benchmarking experiment based on 243 real datasets comparing the prediction performance of the original version of RF with default parameters and LR as binary classification tools. Most importantly, the design of our benchmark experiment is inspired by clinical trial methodology, thus avoiding common pitfalls and major sources of bias.
RF performed better than LR according to the considered accuracy measure in approximately 69% of the datasets. The mean difference between RF and LR was 0.029 (95% CI = [0.022, 0.038]) for the accuracy, 0.041 (95% CI = [0.031, 0.053]) for the Area Under the Curve, and -0.027 (95% CI = [-0.034, -0.021]) for the Brier score, all measures thus suggesting a significantly better performance of RF. As a side result of our benchmarking experiment, we observed that the results were noticeably dependent on the inclusion criteria used to select the example datasets, which emphasizes the importance of clear statements regarding this dataset selection process. We also stress that neutral studies similar to ours, carefully designed and based on a high number of datasets, will be necessary in the future to evaluate further variants, implementations, or parameters of random forests that may yield improved accuracy compared to the original version with default values.
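The three performance measures named above (accuracy, Area Under the Curve, and Brier score) can be computed from predicted probabilities as in the following minimal sketch. The function names and the toy labels and probabilities are illustrative assumptions, not data or code from the benchmark. The AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], p_hat: Sequence[float], threshold: float = 0.5) -> float:
    """Share of observations whose thresholded prediction matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, p_hat))
    return hits / len(y_true)


def brier_score(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and label (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def auc(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities from two classifiers.
y = [0, 0, 1, 1, 1, 0]
p_model_a = [0.1, 0.4, 0.8, 0.7, 0.6, 0.3]
p_model_b = [0.2, 0.6, 0.7, 0.4, 0.9, 0.3]

for name, metric in [("accuracy", accuracy), ("AUC", auc), ("Brier", brier_score)]:
    # A positive difference favours model A for accuracy and AUC;
    # a negative difference favours model A for the Brier score.
    print(name, metric(y, p_model_a) - metric(y, p_model_b))
```

Because the Brier score is an error measure, a negative difference indicates better performance, which is why the negative mean difference reported above also favours RF.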
ℓ1 regularization has been used in logistic regression to reduce overfitting and to use the estimated sparse coefficients for feature selection. However, the challenge is that the ℓ1 penalty is not differentiable, so standard smooth convex optimization algorithms are not directly applicable to this problem. This paper presents a simple projection neural network for ℓ1-regularized logistic regression. In contrast to many available solvers in the literature, the proposed neural network requires neither extra auxiliary variables nor a smooth approximation, and, thanks to the projection operator, its complexity is almost identical to that of gradient descent for logistic regression without ℓ1 regularization. We also investigate the convergence of the proposed neural network using Lyapunov theory and show that it converges to a solution of the problem from any initial value. The proposed neural solution significantly outperforms state-of-the-art methods in execution time and is competitive in terms of accuracy and AUROC.
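To illustrate why the non-differentiable ℓ1 term needs special handling, the sketch below fits an ℓ1-regularized logistic regression with proximal gradient descent (ISTA), where a soft-thresholding step deals with the penalty. This is a different, well-known technique swapped in for illustration; it is not the projection neural network proposed in the paper. The function names, the toy data, and the values of `lam`, `step`, and `iters` are all assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_logistic_ista(X, y, lam=0.1, step=0.5, iters=500):
    """Minimise mean logistic loss + lam * ||w||_1 by proximal gradient descent.

    No intercept is fitted, to keep the sketch short.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the smooth part (mean logistic loss).
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        # Gradient step, then soft-thresholding handles the non-smooth l1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w


# Hypothetical data: the first feature separates the classes, the second is near-noise.
X = [[1.0, 0.1], [2.0, -0.1], [1.5, 0.1], [-1.0, -0.1], [-2.0, 0.1], [-1.5, -0.1]]
y = [1, 1, 1, 0, 0, 0]
weights = l1_logistic_ista(X, y)
print(weights)
```

On this toy data the weight of the weak second feature stays exactly zero while the first weight is positive, which is the sparsity effect that makes ℓ1 regularization usable for feature selection.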
Development on Rinca Island by the Indonesian Government has drawn strong reactions from the community. Many people expressed their opinions on the matter through social media, especially Twitter. This research analyzes public sentiment about the development, divided into three categories: pro, contra, and neutral. Two Doc2Vec models are used, the distributed memory model and the distributed bag of words model, with support vector machines and logistic regression as classifiers. Each combination of model and classifier achieves an accuracy above 75%, and the results show that almost all of the opinions expressed are against the development of Rinca Island.
Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.
Different study designs and population sizes may require different sample sizes for logistic regression. This study aims to propose sample size guidelines for logistic regression based on observational studies with large populations.
We estimated the minimum sample size required using real clinical data, assessing how closely the statistics derived from samples match the actual population parameters. The Nagelkerke r-squared and the coefficients derived from the samples were compared with their respective parameters.
With a minimum sample size of 500, results showed that the differences between the sample estimates and the population parameters were sufficiently small. Based on an audit from a medium-sized population, the differences were within ± 0.5 for coefficients and ± 0.02 for Nagelkerke r-squared. Meanwhile, for a large population, the differences were within ± 1.0 for coefficients and ± 0.02 for Nagelkerke r-squared.
For observational studies with a large population size that involve logistic regression in the analysis, a minimum sample size of 500 is necessary to derive statistics that represent the parameters. The other recommended rules of thumb are an EPV of 50 and the formula $n = 100 + 50i$, where $i$ refers to the number of independent variables in the final model.
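The rules of thumb above can be turned into a small calculation. The sketch below reads the formula as n = 100 + 50i, with i the number of independent variables in the final model, and reads EPV as events per variable, so that an EPV of 50 implies 50 outcome events for each independent variable. The function names are illustrative assumptions, and comparing the formula against the 500 floor is only one way of using the guidelines together.

```python
def formula_sample_size(num_predictors: int) -> int:
    """Sample size from the rule of thumb n = 100 + 50 * i."""
    if num_predictors < 1:
        raise ValueError("need at least one independent variable")
    return 100 + 50 * num_predictors


def events_needed(num_predictors: int, epv: int = 50) -> int:
    """Outcome events required to reach a given events-per-variable (EPV) ratio."""
    if num_predictors < 1:
        raise ValueError("need at least one independent variable")
    return epv * num_predictors


def meets_minimum(sample_size: int, minimum: int = 500) -> bool:
    """Check a sample size against the stated minimum of 500."""
    return sample_size >= minimum


for i in (4, 8, 12):
    n = formula_sample_size(i)
    print(i, n, events_needed(i), meets_minimum(n))
```

Under this reading, the formula reaches the stated minimum of 500 once the final model has eight independent variables.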
Exposure of cells to xenobiotic human-made products can lead to genotoxicity and cause DNA damage. There is an urgent need to quickly identify the chemicals that cause DNA damage and to predict their toxicity. In this study, recursive partitioning (RP), binary logistic regression, and one machine learning approach, namely the random forest (RF) classifier, were used to predict the active and inactive compounds among a total of 5036 data points, based on an assay conducted with a β-lactamase reporter gene under control of the p53 response element (p53RE) from the Tox21 library. Results show that the binary logistic regression model with a threshold of 0.5 distinguishes active from inactive compounds with high accuracy (83%). The RF classifier also gives satisfactory results, with an accuracy (84.38%) slightly higher than that of binary logistic regression. The models established can identify compounds that induce DNA damage and activate p53, and provide a scientific basis for the risk assessment of organic chemicals in the environment.
• Simple algorithms are applied to identify chemicals causing genotoxicity.
• Binary logistic regression and RF classifier models have satisfactory results.
• The parameters logS and acid pKa play dominant roles in the prediction.
• Models can be used to predict genotoxicity across understudied chemicals.
Students in statistics or data science usually learn early on that when the sample size n is large relative to the number of variables p, fitting a logistic model by the method of maximum likelihood produces estimates that are consistent, and that there are well-known formulas that quantify the variability of these estimates, which are used for the purpose of statistical inference. We are often told that these calculations are approximately valid if we have 5 to 10 observations per unknown parameter. This paper shows that this is far from the case, and consequently, inferences produced by common software packages are often unreliable. Consider a logistic model with independent features in which n and p become increasingly large in a fixed ratio. We prove that (i) the maximum-likelihood estimate (MLE) is biased, (ii) the variability of the MLE is far greater than classically estimated, and (iii) the likelihood-ratio test (LRT) is not distributed as a χ². The bias of the MLE yields wrong predictions for the probability of a case based on observed values of the covariates. We present a theory that provides explicit expressions for the asymptotic bias and variance of the MLE and the asymptotic distribution of the LRT. We empirically demonstrate that these results are accurate in finite samples. Our results depend only on a single measure of signal strength, which leads to concrete proposals for obtaining accurate inference in finite samples through the estimate of this measure.
In recent years, machine learning techniques have been widely deployed in various fields. However, machine learning faces problems such as high computation overhead, low training accuracy, and poor security due to data silos, privacy issues, and communication limitations, especially in cloud computing environments. Logistic regression (LR) is a popular machine learning method used for prediction, but current LR algorithms suffer from high computation cost and communication burden due to interactions between users and cloud servers. In this paper, we propose a Privacy-Preserving Multi-party Logistic Regression (PPMLR) algorithm, which achieves privacy-preserving and non-interactive gradient descent training. PPMLR uses the Distributed two Trapdoors Public-Key Cryptosystem (DT-PKC), an additively homomorphic encryption scheme, as its main building block. Specifically, users go offline after encrypting their local private data, and the service provider (SP) then trains the global logistic regression model by interacting with the cloud server (CS), so that the confidentiality and privacy of users' private data are guaranteed during the training process. We provide a detailed security proof that PPMLR guarantees data and model privacy. Finally, we conduct experiments on two popular medical datasets from the UCI machine learning repository. The experimental results show that PPMLR can conduct privacy-preserving training efficiently. Comparison with the state-of-the-art Privacy-Preserving Logistic Regression Algorithm (PPLRA) shows that the model training time is reduced by a factor of about 4.
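Additive homomorphic encryption, the property the abstract attributes to DT-PKC, lets a party combine ciphertexts so that the result decrypts to the sum of the plaintexts. The sketch below demonstrates that property with a textbook Paillier cryptosystem used as a stand-in; it is not DT-PKC and not the PPMLR protocol. The tiny hard-coded primes are insecure and chosen only for readability, and all names are illustrative assumptions. It requires Python 3.8 or later for the modular inverse via `pow(x, -1, n)`.

```python
import math
import random

# Toy parameters: far too small to be secure.
P, Q = 61, 53
N = P * Q
N_SQ = N * N
G = N + 1                                          # common simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1), private


def _l_function(x: int) -> int:
    """Paillier's L function: L(x) = (x - 1) / N, exact for valid inputs."""
    return (x - 1) // N


MU = pow(_l_function(pow(G, LAM, N_SQ)), -1, N)    # private decryption constant


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext from a ciphertext."""
    return (_l_function(pow(c, LAM, N_SQ)) * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ


def scale_encrypted(c: int, k: int) -> int:
    """Raising a ciphertext to the power k multiplies the plaintext by k (mod N)."""
    return pow(c, k, N_SQ)


total = decrypt(add_encrypted(encrypt(12), encrypt(30)))
scaled = decrypt(scale_encrypted(encrypt(7), 5))
print(total, scaled)
```

Sums and integer multiples are the operations a gradient descent update over encrypted values would rely on. How PPMLR evaluates the nonlinear part of logistic regression is not described in the abstract and is not modelled here.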
• We propose a PPMLR scheme that supports authentication.
• Our scheme is fault tolerant and does not need the data owner to interact with the cloud server.
• Our scheme can create highly accurate models because it does not require the use of polynomial approximations for nonlinear activation functions.
• Conducted ecological momentary assessment with residents of four European cities.
• Positive mood improved within 10 min of natural environment exposure.
• Findings varied by city, gender, age and residential green space exposure.
• Negative mood declined within 10 min of natural environment exposure.
• Weaker associations were found within 30 min of natural environment exposure.
Exposure to natural outdoor environments (NOE) has been shown in population-level studies to reduce anxiety and psychological distress. This study investigated how exposure to one’s everyday natural outdoor environments over one week influenced mood among residents of four European cities including Barcelona (Spain), Stoke-on-Trent (United Kingdom), Doetinchem (The Netherlands) and Kaunas (Lithuania). Participants (n = 368) wore a smartphone equipped with software applications to track location and mood (using mobile ecological momentary assessment (EMA) software), for seven consecutive days. We estimated random-effects ordered logistic regression models to examine the association between mood (positive and negative affect), and exposure to green space, represented by two binary variables indicating exposure versus no exposure to NOE using GPS tracking and satellite and aerial imagery, 10 and 30 min prior to participants’ completing the EMA. Models were adjusted for home city, day of the week, hour of the day, EMA survey type, residential NOE exposure, and sex, age, education level, mental health status and neighbourhood socioeconomic status. In addition, we tested for heterogeneity of effect by city, sex, age, residential NOE exposure and mental health status. Within 10 min of NOE exposure, compared to non-exposure, we found that overall there was a positive relationship with positive affect (OR: 1.39, 95% CI: 1.06, 1.81) of EMA surveys, and non-significant negative association with negative affect (OR: 0.80, 95% CI: 0.58, 1.10). When stratifying, associations were consistently found for Stoke-on-Trent inhabitants and men, while findings by age group were inconsistent. Weaker and less consistent associations were found for exposure 30 min prior to EMA. Our findings support increasing evidence of psychological and mental health benefits of exposure to natural outdoor environments, especially among urban populations such as those included in our study.