It is well known that active learning can simultaneously improve the quality of the classification model and decrease the complexity of training instances. However, several previous studies have indicated that the performance of active learning is easily disrupted by an imbalanced data distribution. Some existing imbalanced active learning approaches also suffer from either low performance or high time consumption. To address these problems, this paper describes an efficient solution based on the extreme learning machine (ELM) classification model, called active online-weighted ELM (AOW-ELM). The main contributions of this paper are as follows: 1) the reasons why active learning can be disrupted by an imbalanced instance distribution, and the factors that influence this disruption, are discussed in detail; 2) hierarchical clustering is adopted to select the initially labeled instances in order to avoid the missed cluster effect and the cold start phenomenon as much as possible; 3) the weighted ELM (WELM) is selected as the base classifier to guarantee the impartiality of instance selection during active learning, and an efficient online update mode of WELM is derived theoretically; and 4) an early stopping criterion that is similar to but more flexible than the margin exhaustion criterion is presented. The experimental results on 32 binary-class data sets with different imbalance ratios demonstrate that the proposed AOW-ELM algorithm is more effective and efficient than several state-of-the-art active learning algorithms that are specifically designed for the class imbalance scenario.
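The weighted ELM used as the base classifier can be illustrated with a minimal batch sketch. It assumes the commonly used WELM closed form, beta = (I/C + H^T W H)^(-1) H^T W y, with each instance weighted by the inverse of its class size. The function names, the sigmoid activation, the hidden-layer size, and the value of C are illustrative assumptions, and the online update derived in the paper is not reproduced here.

```python
import numpy as np

def train_welm(X, y, n_hidden=30, C=100.0, seed=0):
    """Fit a batch weighted ELM for labels y in {-1, +1}.

    Returns (W_in, b, beta). Hypothetical helper, not the paper's code.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                   # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))       # hidden-layer output matrix

    # Assumed weighting scheme: each instance gets 1 / (size of its class),
    # so both classes contribute equally to the loss.
    counts = {label: np.sum(y == label) for label in np.unique(y)}
    w = np.array([1.0 / counts[label] for label in y])

    # Closed-form solution: beta = (I/C + H^T W H)^{-1} H^T W y
    HtW = H.T * w                                   # scales column i of H^T by w_i
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ y)
    return W_in, b, beta

def predict_welm(model, X):
    """Predict labels in {-1, +1} from the sign of the network output."""
    W_in, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))
    return np.where(H @ beta >= 0, 1, -1)
```

Because the class weights cancel the difference in class sizes, a 20:1 imbalanced training set is treated roughly as if both classes were equally frequent.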
The radius used to construct the neighborhood relation has a great effect on the results of neighborhood rough sets and the corresponding measures. A very small radius frequently yields nothing useful, because any two different samples are separated from each other even when they have the same label. As the radius grows, there is a serious risk that samples with different labels fall into the same neighborhood. The radius-based neighborhood relation does not take the labels of samples into account, which leads to unsatisfactory discrimination. To fill this gap, a pseudo-label strategy is systematically studied in rough set theory. Firstly, a pseudo-label neighborhood relation is proposed. This relation differentiates samples not only by distance but also by their pseudo labels. Therefore, both the neighborhood rough set and some corresponding measures can be re-defined. Secondly, attribute reductions are explored based on the re-defined measures, and a heuristic algorithm is designed to compute reducts. Finally, the experimental results over UCI data sets show that the pseudo-label strategy is superior to the traditional neighborhood approach, mainly because it significantly reduces the uncertainties and improves the classification accuracies. The Wilcoxon signed rank test results also show that the neighborhood approach and the pseudo-label neighborhood approach differ significantly in terms of the measures and attribute reductions in rough set theory.
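The difference between the two relations can be sketched as follows. This is an illustration under stated assumptions rather than the authors' definition: two samples are treated as related when their Euclidean distance is within the radius and, in the pseudo-label variant, they also share a pseudo label. How the pseudo labels are produced (for example, by a clustering step) is left outside the sketch, and the function names are hypothetical.

```python
import numpy as np

def neighborhood_relation(X, radius, pseudo_labels=None):
    """Boolean matrix rel[i, j]: sample j lies in the neighborhood of sample i.

    With pseudo_labels given, neighbors must also share the same pseudo label.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))         # pairwise Euclidean distances
    rel = dist <= radius
    if pseudo_labels is not None:
        pseudo_labels = np.asarray(pseudo_labels)
        rel &= pseudo_labels[:, None] == pseudo_labels[None, :]
    return rel

def lower_approximation(rel, y, target):
    """Indices of samples whose whole neighborhood carries the label `target`."""
    in_target = (np.asarray(y) == target)
    return np.flatnonzero([in_target[row].all() for row in rel])

# Toy data: sample 1 sits close to sample 2, which has a different label.
X = np.array([[0.0], [0.1], [0.2], [1.0]])
y = np.array([0, 0, 1, 1])
pseudo = np.array([0, 0, 1, 1])                     # assumed to come from a clustering step

plain = neighborhood_relation(X, radius=0.15)
with_pseudo = neighborhood_relation(X, radius=0.15, pseudo_labels=pseudo)
```

On this toy data the plain relation puts samples 1 and 2 in the same neighborhood, so sample 1 drops out of the lower approximation of class 0. The pseudo-label variant separates them and keeps both samples 0 and 1 in it.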
In DNA microarray data, the class imbalance problem occurs frequently and causes poor prediction performance for minority classes. Moreover, other characteristics of such data, such as high dimensionality, small sample size, and high noise, intensify this damage. In this study, we propose ACOSampling, a novel undersampling method based on ant colony optimization (ACO), to address this problem. The algorithm starts with feature selection to eliminate noisy genes from the data. Then the original training set is randomly and repeatedly divided into two groups: a training set and a validation set. In each division, a modified ACO algorithm, a variant of our previous work, is run to filter out less informative majority samples and search for the corresponding optimal training sample subset. Finally, the statistics from all locally optimal training sample subsets are summarized as a frequency list, where each frequency indicates the importance of the corresponding majority sample. Only the high-frequency majority samples are extracted and combined with all minority samples to construct the final balanced training set. We evaluated the method on four benchmark skewed DNA microarray datasets with a support vector machine (SVM) classifier; the results show that the proposed method outperforms many other sampling approaches.
► ACO algorithm is modified for undersampling skewed DNA microarray data. ► The significance of each majority sample is estimated by ranking a frequency list. ► ACOSampling increases classification performance but takes more time. ► Selecting a few feature genes helps to improve classification performance. ► Some classification tasks are harmful and the others are unharmful.
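The final step of the method, building a balanced training set from the frequency list, can be sketched as below. The ACO search itself is not reproduced. The sketch assumes the frequency list is already available and that "balanced" means keeping as many majority samples as there are minority samples. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def balanced_training_indices(frequency, majority_idx, minority_idx):
    """Keep the most frequently selected majority samples plus all minority samples.

    frequency[k] is how often majority sample majority_idx[k] appeared in the
    locally optimal subsets. Assumption: keep as many majority samples as
    there are minority samples.
    """
    frequency = np.asarray(frequency)
    majority_idx = np.asarray(majority_idx)
    minority_idx = np.asarray(minority_idx)
    n_keep = min(len(minority_idx), len(majority_idx))
    order = np.argsort(-frequency, kind="stable")   # highest frequency first
    kept_majority = majority_idx[order[:n_keep]]
    return np.concatenate([kept_majority, minority_idx])

# Toy example: five majority samples (rows 10-14), two minority samples (rows 0-1).
selected = balanced_training_indices(
    frequency=[3, 9, 1, 7, 5],
    majority_idx=[10, 11, 12, 13, 14],
    minority_idx=[0, 1],
)
```

Here the two majority samples with the highest frequencies (rows 11 and 13) are kept alongside both minority samples.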
Training classifiers on skewed data is technically challenging, and the task becomes even more difficult when the data are also high-dimensional. Skewed data of this type often appear in biomedicine. In this study, we address the problem by combining the asymmetric bagging ensemble classifier (asBagging) presented in previous work with an improved random subspace (RS) generation strategy called feature subspace (FSS). Specifically, FSS is a novel method for improving the balance between the accuracy and the diversity of the base classifiers in asBagging. In view of the strong generalization capability of the support vector machine (SVM), we adopt it as the base classifier. Extensive experiments on four benchmark biomedical data sets indicate that the proposed ensemble learning method outperforms many baseline approaches in terms of the Accuracy, F-measure, G-mean, and AUC evaluation criteria. It can therefore be regarded as an effective and efficient tool for dealing with high-dimensional and imbalanced biomedical data.
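The way asymmetric bagging assembles its training bags can be sketched as follows. Each bag keeps every minority sample and adds a bootstrap sample of the majority class of the same size. The feature subset here is drawn uniformly at random, as in plain random subspace. This is a stand-in and not the FSS strategy, whose details the abstract does not give. The function and parameter names are hypothetical, and fitting an SVM on each bag is left out.

```python
import numpy as np

def asymmetric_bags(y, minority_label, n_bags, n_features, subspace_size, seed=0):
    """Return a list of (row_indices, feature_indices) pairs, one per base classifier."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    bags = []
    for _ in range(n_bags):
        # Bootstrap as many majority samples as there are minority samples.
        sampled = rng.choice(majority, size=len(minority), replace=True)
        rows = np.concatenate([minority, sampled])
        # Plain random subspace: a uniform random feature subset (not FSS).
        cols = np.sort(rng.choice(n_features, size=subspace_size, replace=False))
        bags.append((rows, cols))
    return bags

# Toy example: 3 minority and 20 majority samples, 50 features.
labels = np.array([1] * 3 + [0] * 20)
bags = asymmetric_bags(labels, minority_label=1, n_bags=5,
                       n_features=50, subspace_size=10)
```

A base classifier would then be trained on `X[rows][:, cols]` for each bag, and the ensemble would combine the predictions, for example by majority vote.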
This study aims to investigate the effect of miR-21-5p on the progression of colon adenocarcinoma (COAD) cells and its connection with neural cell adhesion molecule L1 (CHL1).
Differential expression of mRNAs and miRNAs was assessed by microarray analysis. qRT-PCR and western blot were performed to quantify miR-21-5p and CHL1 expression. Flow cytometry, MTT assay, colony formation assay, transwell assay, and ELISA were performed to evaluate the propagation and invasiveness of COAD cells. A dual luciferase reporter assay was employed to examine the relationship between miR-21-5p and CHL1. In vivo experiments were performed to assess the impact of miR-21-5p and CHL1 on COAD tumor growth.
The expression level of miR-21-5p was increased in both COAD tissues and cells. MTT and cell cycle assays showed that overexpression of miR-21-5p accelerated the proliferation of COAD cells. The transwell assay indicated that miR-21-5p promoted cell invasion. The dual luciferase reporter assay indicated that miR-21-5p directly targeted CHL1 and inhibited its expression. The in vivo experiments showed that down-regulation of miR-21-5p decreased tumor volume and weight, while knockdown of CHL1 stimulated tumor growth.
The overexpression of miR-21-5p can promote propagation and invasiveness of COAD cells through inhibiting the expression of CHL1.
This work reports the development of an electrochemical sensor using graphene-peptide conjugates for detecting the colorectal cancer (CRC) biomarker leucine-rich alpha-2 glycoprotein-1 (LRG1). To enable LRG1 quantification, we rationally designed peptides with dual graphene anchoring motifs for optimal orientation and binding activity when immobilized on a reduced graphene oxide (rGO) electrochemical transducer surface. The graphene nanomaterial provides several advantages, such as high conductivity, large surface area, and excellent stability, that can enhance the sensor's analytical performance. Furthermore, the synthetic peptides offer benefits such as smaller size, specificity, ease of modification, and cost-effective production compared to traditional antibody receptors. Under optimized conditions, the peptide sensor exhibited a high sensitivity of 22.3 μA/(ng/mL·cm²), a low limit of detection (75 pg/mL LRG1 in serum), an accuracy of 101.1 % spiked recovery, and precision within 6 % RSD. Testing with colonoscopy-classified patient serum specimens discriminated normal, precancerous adenomatous polyp, and malignant carcinoma stages based on LRG1 overexpression. LRG1 levels were 24 % higher in adenomas and 103 % higher in CRC. Validation with spiked plasma samples indicated 97–104 % recovery and <7 % RSD, supporting accurate detection capability. Comparison to antibody-based sensors showed superior linear range, sensitivity, and reproducibility, as well as a faster assay time. This demonstrates the promise of computational peptide design combined with advanced nanomaterials for electrochemical detection of CRC progression through serum protein biomarkers.
Curcumin-loaded polypropylene/rice husk/SiO2 composites were fabricated with tunable nano-SiO2 filler loadings (0–10 wt%) via melt-blending and injection molding. XRD analysis revealed effective intercalation of nanoparticles and interaction with the polymer chains up to 6 wt% loading. Incorporation of SiO2 fillers significantly enhanced mechanical strength, with the 6 % nano-SiO2 formulation demonstrating 27 % higher tensile strength, 52 % higher flexural modulus, and 51 % greater impact resistance. The silica nanoparticles also reduced the heat release rate through protective charring and decreased moisture absorption by over 8 % due to hydrophilic surface groups. In vitro studies showed that curcumin nano-composites with 6 % nano-SiO2 exhibited maximal cytotoxicity against HT-29 and HCT-116 colon cancer cells, with CC50 values of 12.4±1.1 μg/ml and 10.3±0.9 μg/ml, respectively, attributed to an optimal drug release profile. The formulation arrested 22.1% of cells in the sub-G1 phase, inducing apoptosis. Oral administration in tumor-bearing mice led to a 3-fold increase in plasma exposure (Cmax 186±14 ng/ml) and relative bioavailability versus free curcumin, indicating a capacity to evade presystemic clearance. The composites also enabled 3 times higher tumor accumulation, highlighting their potential for targeted colorectal cancer therapy via improved pharmacokinetics and site-specific action. Overall, precise tuning of nanostructured fillers augments mechanical and transport properties while potentiating anticancer performance.
Social media has emerged as an important source of information for improving emergency situation awareness during flooding disasters. However, the online microblog text stream is unstructured and clearly unbalanced. Given the large, real-time, and noisy flow of flood disaster microblog text, a new regional emergency situation awareness model that automatically assesses flood disaster risk is proposed. First, according to the established online disaster event-meta frame, a multi-label classification algorithm for flood microblog posts is constructed based on the historical dataset. This algorithm assigns the relevant event-meta tags to each situation microblog post. Second, a new machine learning method for dynamic assessment of flood risk from online microblog posts is developed. The flood event-metas are treated as feature vectors, and the four levels of flood risk are treated as four classes, so that the flood risk assessment task is transformed into a multi-class classification task. Using an ordered logistic regression multi-class classification algorithm, the dynamic quantitative evaluation of event-meta, user, and regional risks is realized. Finally, the proposed model is applied to the case of the Yuyao Flood. The case study shows that the online quantitative risk assessment results for the Yuyao Flood are consistent with real accumulated precipitation data, which illustrates that the proposed machine learning model can realize bottom-up automatic disaster information collection by effectively processing content generated by affected users. Social media is shown to compensate for the deficiencies of traditional disaster statistics and to provide real-time, scientific information support for flood emergency response.
Gas explosion has always been an important factor restricting coal mine production safety. The application of machine learning techniques to coal mine gas concentration prediction and early warning can effectively prevent gas explosion accidents. Nearly all traditional prediction models use a regression technique to predict gas concentration. Because there are very few instances of high gas concentration, the instance distribution of gas concentration is extremely imbalanced. Therefore, such regression models generally perform poorly in predicting high gas concentration instances. In this study, we treat early warning of gas concentration as a binary-class problem and divide gas concentration data into a warning class and a non-warning class according to a concentration threshold. We propose the probability density machine (PDM) algorithm, which adapts well to imbalanced data distributions. We use the original gas concentration data collected from several monitoring points in a coal mine in Datong city, Shanxi Province, China, to train the PDM model and to compare it with several class imbalance learning algorithms. The results show that the PDM algorithm is superior to the traditional and state-of-the-art class imbalance learning algorithms and can produce more accurate early warning results for gas explosion.
Credible data on the burden of early-onset colorectal cancer (EOCRC) in China compared with other countries in the Group of Twenty (G20) remain unavailable. We aimed to assess the burden and trends of EOCRC and its attributable risk factors in China. We also compared the burden and attributable risk factors between China and the other G20 countries.
Data on the incidence, prevalence, mortality, disability-adjusted life years (DALYs), and attributable risk factors of EOCRC in China were obtained from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2019 and compared with those of other G20 countries. Temporal trends of age-standardized rates for incidence, prevalence, mortality, and DALYs were evaluated by the estimated annual percentage change (EAPC). The autoregressive integrated moving average (ARIMA) model was used to forecast the incidence, mortality, and DALY rates of EOCRC in China from 2020 to 2029.
From 1990 to 2019, the age-standardized incidence rate (ASIR) and age-standardized prevalence rate (ASPR) of EOCRC in China increased, with EAPCs of 4.61 (95% confidence interval [CI]: 4.45-4.77) and 5.82 (95% CI: 5.60-6.05), respectively. Among G20 countries, China ranked 13th in ASIR in 1990 and rose to 2nd in 2019, second only to Japan. The ASPRs increased in all G20 countries, being highest in Saudi Arabia, followed by China and Mexico. Moreover, China had the highest age-standardized mortality rate (ASMR) and the highest age-standardized DALY rate in 2019. In China, the five leading risk factors for both sexes in 2019 were diet low in milk 18.54% (95% UI: 12.71-24.07), diet low in calcium 15.06% (95% UI: 10.70-20.03), alcohol use 12.16% (95% UI: 8.87-15.64), smoking 9.08% (95% UI: 3.39-14.11), and diet high in red meat 9.08% (95% UI: 3.39-14.11). Over the next 10 years, the ASIR, ASMR, and age-standardized DALY rate of EOCRC are projected to increase continuously in both males and females.
The burden of EOCRC in China and other G20 countries is worrisome, indicating that coordinated efforts are needed to conduct high-quality research, allocate medical resources, adjust screening guidelines, and develop effective treatment and prevention strategies in the G20 countries.
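The EAPC used above to summarize temporal trends is conventionally obtained from a log-linear regression of the age-standardized rate on calendar year, ln(rate) = α + β·year, with EAPC = 100 × (exp(β) − 1). A minimal sketch follows. It assumes this conventional definition, uses synthetic rates rather than GBD data, and does not compute the confidence interval. The function name is hypothetical.

```python
import numpy as np

def eapc(years, rates):
    """Estimated annual percentage change from a log-linear fit of rate on year."""
    years = np.asarray(years, dtype=float)
    x = years - years.mean()                  # centering leaves the slope unchanged
    slope, _intercept = np.polyfit(x, np.log(rates), 1)
    return 100.0 * (np.exp(slope) - 1.0)

# Synthetic series growing by exactly 5% per year from 1990 to 2019.
years = np.arange(1990, 2020)
rates = 10.0 * 1.05 ** (years - 1990)
trend = eapc(years, rates)
```

A positive EAPC whose confidence interval excludes zero is usually read as an increasing trend, which is how the ASIR and ASPR results above are interpreted.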