The main objective of the present study was to compare the performance of a Logistic Regression classifier and a Naïve Bayes classifier in landslide susceptibility assessments. The study evaluates the influence of model complexity and training-data size, and identifies the more accurate and reliable classifier.
The comparison of the two classifiers was based on a database of 116 sites in the mountains of Epirus, Greece, where serious landslide events have occurred. The sites are classified into two categories, non-landslide and landslide areas. These areas were identified by analysing airborne imagery, by extensive field investigation and by examining previous research studies. The geo-environmental conditions at those locations were analyzed with regard to their susceptibility to sliding. In particular, seven variables were analyzed: engineering geological units, slope angle, slope aspect, mean annual rainfall, distance from river network, distance from tectonic features and distance from road network.
Multicollinearity analysis and feature selection were implemented in order to estimate the conditional independence among the variables and to rank the variables according to their significance in estimating landslide susceptibility. These processes yielded nine different datasets. Each was further partitioned into training and validation subsets drawn from the original 116 sites. Each dataset was characterized by the number of variables used and the size of the training subset.
The outcomes of each model were compared and validated using statistical evaluation measures, the receiver operating characteristic and the area under the success and predictive rate curves. The results indicated that model complexity and the size of the training dataset influence the accuracy and the predictive power of the models concerning landslide susceptibility. In particular, the most accurate model with high predictive power was the eighth model (five variables and 92 training data), with the Naïve Bayes classifier having a slightly higher overall performance and accuracy than the Logistic Regression classifier, 87.50% and 82.61% on the validation datasets, respectively. The highest area under the curve (AUC) was achieved by the Naïve Bayes classifier for both the training and validation datasets (0.875 and 0.806, respectively), while the Logistic Regression classifier achieved lower AUC values for the training and validation datasets (0.844 and 0.711, respectively). When limited data are available, it seems that more accurate and reliable results could be obtained by generative classifiers such as Naïve Bayes. Overall, landslide susceptibility assessments could serve as a useful tool for local and national authorities in evaluating strategies to prevent and mitigate the adverse impacts of landslide events.
•Logistic regression and Naïve Bayes were used in landslide susceptibility zoning.
•Model complexity and the size of training data influence the prediction accuracy.
•The reduction in model complexity improved the generalization performance.
•The Naïve Bayes model outperforms the Logistic regression.
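The accuracy and AUC figures quoted above can be made concrete with a minimal sketch. It assumes each classifier outputs a landslide probability per validation site. The labels and scores below are invented for illustration and are not the study's data. AUC is computed through the rank-sum identity: the fraction of (landslide, non-landslide) pairs in which the landslide site receives the higher score, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of sites classified correctly at the given probability cut-off."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical validation set: 1 = landslide site, 0 = non-landslide site.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical predicted probabilities from two classifiers.
scores_a = [0.90, 0.80, 0.60, 0.40, 0.50, 0.30, 0.20, 0.10]
scores_b = [0.90, 0.60, 0.55, 0.30, 0.70, 0.65, 0.20, 0.10]

for name, scores in (("model A", scores_a), ("model B", scores_b)):
    print(name, "accuracy:", accuracy(labels, scores), "AUC:", roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two measures are reported side by side.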
In most low- and middle-income countries, milk is produced by smallholders, thereby contributing to the livelihood of their households. With the increasing importance of milk production in these countries, it is essential that milk quality is of a high level to ensure a safe product for consumers. It is, however, unclear whether smallholder dairy farmers are aware of the quality of their milk. The aim of this cross-sectional study was to gain insight into Indonesian smallholder dairy farmers' awareness of milk quality parameters and to identify factors associated with awareness of the total plate count (TPC) and somatic cell count (SCC). A stratified sampling method was used to select smallholder farms in 4 districts in West Java, Indonesia, whose farmers were interviewed between August and September 2017. Factors putatively associated with awareness of TPC were investigated with multinomial regression models, whereas a Firth-type logistic regression was applied to identify factors associated with SCC awareness. Of the total 600 farmers surveyed, 264 (44%), 109 (18%), 170 (28%), 111 (19%), and 23 (4%) farmers were aware of TPC, total solid, fat content, milk density, and SCC, respectively, but did not know its value. Those that were conceptually aware of these quality parameters were generally unaware of their value. Furthermore, this study revealed that the following variables were significantly associated with dairy farmers' awareness of TPC: cooperative to which the farmer belonged, distance to neighboring dairy farmer, technology adoption index, TPC as the most important quality factor for the buyer, milk production information from cooperatives, and cow health information from veterinarians. Similarly, cooperative, dairy business experience, and milk quality test adoption were significantly associated with dairy farmers' awareness of SCC. Cooperative was the only variable that was significant in both final statistical models. This indicates that cooperatives play an important role in increasing farmer awareness of milk quality parameters in these smallholder dairies. This may also hold in other regions of the world where milk production is dominated by smallholder dairy farmers.
The Ising model has received significant attention in network psychometrics during the past decade. A popular estimation procedure is IsingFit, which uses nodewise $\ell_1$-regularized logistic regression along with the extended Bayesian information criterion to establish the edge weights for the network. In this paper, we report the results of a simulation study comparing IsingFit to two alternative approaches: (1) a nonregularized nodewise stepwise logistic regression method, and (2) a recently proposed global $\ell_1$-regularized logistic regression method that estimates all edge weights in a single stage, thus circumventing the need for nodewise estimation. MATLAB scripts for the methods are provided as supplemental material. The global $\ell_1$-regularized logistic regression method generally provided greater accuracy and sensitivity than IsingFit, at the expense of lower specificity and much greater computation time. The stepwise approach showed considerable promise. Relative to the $\ell_1$-regularized approaches, the stepwise method provided better average specificity for all experimental conditions, as well as comparable accuracy and sensitivity at the largest sample size.
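The nodewise idea can be illustrated with a minimal sketch: each binary node is regressed on all other nodes with an L1 penalty, and predictors with nonzero coefficients are taken as that node's neighbors. This is not IsingFit and not the paper's MATLAB code. As assumptions for illustration, it uses a plain proximal-gradient (ISTA) solver, a fixed penalty `lam` instead of selection by the extended Bayesian information criterion, and no rule for combining the two estimates of each edge. All function names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.5, iters=500):
    """L1-penalized logistic regression by proximal gradient descent.

    The intercept b is not penalized. The step size assumes a small number
    of 0/1 predictors; it is not tuned for general data.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * xi[j] / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def nodewise_neighbors(data, lam):
    """Regress every node on all others; keep predictors with nonzero weight."""
    p = len(data[0])
    neighbors = {}
    for j in range(p):
        others = [k for k in range(p) if k != j]
        y = [row[j] for row in data]
        X = [[row[k] for k in others] for row in data]
        w, _ = fit_l1_logistic(X, y, lam)
        neighbors[j] = {k: wk for k, wk in zip(others, w) if wk != 0.0}
    return neighbors


# Toy binary data: nodes 0 and 1 always agree, node 2 is unrelated to both.
data = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]] * 5

print(nodewise_neighbors(data, lam=0.05))
```

A larger `lam` shrinks more coefficients to exactly zero, which trades sensitivity for specificity in the recovered network. Choosing `lam` by an information criterion, as IsingFit does, is a separate step not shown here.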
Climate change and habitat loss are both key threatening processes driving the global loss in biodiversity. Yet little is known about their synergistic effects on biological populations due to the complexity underlying both processes. If the combined effects of habitat loss and climate change are greater than the effects of each threat individually, current conservation management strategies may be inefficient and at worst ineffective. Therefore, there is a pressing need to identify whether interacting effects between climate change and habitat loss exist and, if so, quantify the magnitude of their impact. In this article, we present a meta‐analysis of studies that quantify the effect of habitat loss on biological populations and examine whether the magnitude of these effects depends on current climatic conditions and historical rates of climate change. We examined 1319 papers on habitat loss and fragmentation, identified from the past 20 years, representing a range of taxa, landscapes, land‐uses, geographic locations and climatic conditions. We find that current climate and climate change are important factors determining the negative effects of habitat loss on species density and/or diversity. The most important determinant of habitat loss and fragmentation effects, averaged across species and geographic regions, was current maximum temperature, with mean precipitation change over the last 100 years of secondary importance. Habitat loss and fragmentation effects were greatest in areas with high maximum temperatures. Conversely, they were lowest in areas where average rainfall has increased over time. To our knowledge, this is the first study to conduct a global terrestrial analysis of existing data to quantify and test for interacting effects between current climate, climatic change and habitat loss on biological populations. Understanding the synergistic effects between climate change and other threatening processes has critical implications for our ability to support and incorporate climate change adaptation measures into policy development and management response.
•A deep investigation of previous works and a qualitative comparison among them.
•Using a new feature selection method based on logistic regression to reduce computational cost.
•Using a novel deep neural network, which can be tuned to increase accuracy or decrease computational cost.
•Using three different datasets to evaluate the proposed methods.
•Using a comprehensive set of evaluation metrics, including false positive rate.
•Reaching a high level of accuracy for all three datasets.
Breast cancer is the most common cancer among women, so a precise and reliable system for distinguishing benign from malignant tumors is critical. Nowadays, using the results of Fine Needle Aspiration (FNA) cytology together with machine learning techniques, this cancer can be detected and diagnosed early with greater accuracy. In this paper, we propose a method consisting of two steps: in the first step, logistic regression is used to eliminate the less important features. In the second step, the Group Method of Data Handling (GMDH) neural network is used to classify samples as benign or malignant. To evaluate the performance of the proposed method, three datasets (WBCD, WDBC and WPBC) are investigated with the following metrics: precision, the Area Under the ROC curve (AUC), true positive rate, false positive rate, accuracy and F-criteria. Simulation results show that the proposed method reaches a precision of 99.4% for the WBCD, 99.6% for the WDBC and 96.9% for the WPBC dataset.
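Most of the evaluation metrics listed above follow directly from a binary confusion matrix. The sketch below is illustrative only: the counts are invented, malignant is treated as the positive class, and the "F-criteria" is assumed here to mean the F1 score (the harmonic mean of precision and true positive rate), which the abstract does not confirm.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common metrics from confusion-matrix counts.

    tp/fn: malignant samples predicted malignant/benign;
    tn/fp: benign samples predicted benign/malignant.
    """
    precision = tp / (tp + fp)            # predicted positives that are correct
    tpr = tp / (tp + fn)                  # true positive rate (recall, sensitivity)
    fpr = fp / (fp + tn)                  # false positive rate (1 - specificity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * tpr / (precision + tpr)  # assumed meaning of "F-criteria"
    return {
        "precision": precision,
        "tpr": tpr,
        "fpr": fpr,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, not taken from WBCD, WDBC or WPBC.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting the false positive rate alongside precision matters in diagnosis because a model can reach high precision on an imbalanced dataset while still flagging many benign samples.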
Fully homomorphic encryption (FHE) has been gaining popularity as an emerging means of enabling an unlimited number of operations on an encrypted message without decryption. A major drawback of FHE is its high computational cost. Specifically, a bootstrapping step that refreshes the noise accumulated through consequent FHE operations on the ciphertext can take minutes. This significantly limits the practical use of FHE in numerous real applications. By exploiting the massive parallelism available in FHE, we demonstrate the first GPU implementation of bootstrapping for CKKS, one of the most promising FHE schemes supporting the arithmetic of approximate numbers. Through analyzing CKKS operations, we discover that the major performance bottleneck is their high main-memory bandwidth requirement, which is exacerbated by leveraging existing optimizations targeted to reduce the required computation. These observations motivate us to utilize memory-centric optimizations such as kernel fusion and reordering primary functions extensively. Our GPU implementation shows a 7.02× speedup for a single CKKS multiplication compared to the state-of-the-art GPU implementation and an amortized bootstrapping time of 0.423 µs per bit, which corresponds to a speedup of 257× over a single-threaded CPU implementation. By applying this to logistic regression model training, we achieved a 40.0× speedup compared to the previous 8-thread CPU implementation with the same data.
This study assessed potential risk factors associated with introduction of Mycobacterium avium ssp. paratuberculosis (MAP) into dairy cattle herds in the Galicia region, northwestern Spain. The study was carried out with data collected from 93 dairies enrolled in a voluntary MAP control program. Information on potential risk factors was obtained through personal interviews with the farmers and veterinarians in charge of the control program of each farm. In addition, blood samples were taken annually over 2 years from cows on the farms in the program, and analyzed with a commercial ELISA to detect antibodies to MAP. Fecal samples of all ELISA-positive cows were analyzed using PCR. Based on the χ² test and Fisher's exact test, purchase practices, shared manure truck, shared materials, and visitors per month who contacted animals were found to be significantly associated with farm MAP infection status. Multiple logistic regression indicated that purchase practices and herd size (included as a potential confounder) are the variables that best predict MAP status.
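The univariable screening step described above, testing each candidate risk factor against farm infection status in a 2×2 table, can be sketched as follows. The counts are invented for illustration and are not the study's data. The two-sided Fisher p-value uses the common convention of summing the probabilities of all tables that are no more likely than the observed one, and the χ² statistic is the uncorrected Pearson form.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    low, high = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table at most as likely as the observed one (small tolerance
    # guards against floating-point ties).
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def chi_square_statistic(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical table: rows = shared manure truck (yes / no),
# columns = farm MAP status (positive / negative).
table = (12, 5, 20, 56)
print("Fisher p-value:", fisher_exact_two_sided(*table))
print("chi-square:", chi_square_statistic(*table))
```

Fisher's exact test is usually preferred over the χ² approximation when some expected cell counts are small, which is a plausible reason for a study of this size to report both.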
The aim of the study is to select adequate econometric instruments for building a scoring model on a specific array of initial data that consists predominantly of dummy (fictitious) variables. Despite a significant number of works devoted to the construction of scoring models, no universal method that yields a highly efficient classifier for any data has been identified. Therefore, selecting the best method for building a scoring model, depending on the characteristics of the available data, remains a relevant task. The most successful approach to selecting a model for a binary classification problem is to fit several types of econometric models and choose the best of them according to the classification results. In the presented study, the following types of models were applied: discriminant model, logit and probit regressions, and polynomial logistic regression. Training samples with different structures were used. Comparison of all obtained models allows us to conclude that polynomial logistic regression is preferable in this case. This model demonstrates high classification rates for all introduced object classes and has an important advantage over models that make a binary selection: in each case it allows selecting a convenient scale for dividing borrowers into more than two classes and determining the probability level of borrower reliability, acceptable under the lender's own conditions, at which a borrower should be assigned to one of the selected classes. Prospects for further research in this direction include machine learning methods that can use ensembles of the best of the considered models. In addition, the proposed models can be used in solving similar problems in other spheres of economic activity.
Along with the increasing availability of electronic medical record (EMR) data, phenome-wide association studies (PheWAS) and phenome-disease association studies (PheDAS) have become a prominent, first-line method of analysis for uncovering the secrets of EMR. Despite this recent growth, there is a lack of approachable software tools for conducting these analyses on large-scale EMR cohorts. In this article, we introduce pyPheWAS, an open-source Python package for conducting PheDAS and related analyses. This toolkit includes 1) data preparation, such as cohort censoring and age-matching; 2) traditional PheDAS analysis of ICD-9 and ICD-10 billing codes; 3) PheDAS analysis applied to a novel EMR phenotype mapping: current procedural terminology (CPT) codes; and 4) novelty analysis of significant disease-phenotype associations found through PheDAS. The pyPheWAS toolkit is approachable and comprehensive, encapsulating data prep through result visualization all within a simple command-line interface. The toolkit is designed for the ever-growing scale of available EMR data, with the ability to analyze cohorts of 100,000+ patients in less than 2 h. Through a case study of Down Syndrome and other intellectual developmental disabilities, we demonstrate the ability of pyPheWAS to discover both known and potentially novel disease-phenotype associations across different experiment designs and disease groups. The software and user documentation are available in open source at https://github.com/MASILab/pyPheWAS.