The detection of abnormal electricity consumption behavior has been of great importance in recent years. However, existing research often focuses on algorithm improvement and ignores the process of obtaining features. The optimal feature set, which reflects customers' electricity consumption behavior, has a significant influence on the final detection results. Moreover, it is not straightforward to obtain datasets with label information. In this paper, a method based on feature engineering for unsupervised detection of abnormal electricity consumption behavior is proposed. First, the original feature set is constructed by brainstorming in the feature engineering step. Then, the optimal feature set, which reflects the customers' electricity consumption behavior, is obtained by selecting features based on their variance and the similarity between them. After that, in the anomaly detection step, a density-based clustering algorithm is used to detect abnormal electricity consumption behaviors; its best clustering parameters are selected through iteration and evaluation with unsupervised clustering evaluation indexes. Finally, using the load dataset of an industrial park, several typical feature strategies are applied for comparison with the feature engineering proposed in this paper. To perform the evaluation, the label information of abnormal behaviors is obtained by combining the original electricity consumption behavior detection results with abnormal data injections. The proposed anomaly detection method gave good results and outperformed typical feature strategies in both effectiveness and generalizability.
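The feature selection step described above (filtering by variance, then by similarity between features) could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `select_features`, both thresholds, the use of Pearson correlation as the similarity measure, and the toy data are all assumptions.

```python
import numpy as np

def select_features(X, var_threshold=1e-3, corr_threshold=0.95):
    """Return indices of columns kept after variance and similarity filtering.

    Hypothetical helper: thresholds and the correlation-based similarity
    measure are illustrative assumptions, not values from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: drop near-constant features, which carry little information.
    candidates = [j for j in range(X.shape[1]) if X[:, j].var() > var_threshold]
    # Step 2: greedily drop any feature that is highly correlated
    # with a feature that has already been kept.
    selected = []
    for j in candidates:
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > corr_threshold
            for k in selected
        )
        if not redundant:
            selected.append(j)
    return selected

# Toy feature matrix (made-up data): 20 customers, 4 candidate features.
t = np.linspace(0.0, 1.0, 20)
X = np.column_stack([
    t,                 # informative feature
    2.0 * t + 1.0,     # linear copy of column 0, so it is redundant
    np.full(20, 3.0),  # constant feature with zero variance
    (t - 0.5) ** 2,    # different pattern, uncorrelated with column 0
])
kept = select_features(X)
print(kept)
```

The reduced feature matrix `X[:, kept]` would then be passed to the density-based clustering step.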
Accurate and reliable fault diagnosis is critical for battery systems to ensure their safe and stable operation. Battery faults cause a severe decline in pack performance and can even lead to catastrophic thermal runaway events. This paper presents a vehicle-cloud collaborative method for multi-type fault diagnosis of lithium-ion batteries based on the cell difference model and machine learning. First, experiments on different types of battery module faults are carried out to establish a simulation model of the battery system. The charging-discharging conditions of normal and faulty battery modules are simulated to obtain massive cycle data for algorithm training on the cloud. Then, the cell difference model is used to extract feature differences on the vehicle end. Combined with feature engineering and parameter optimization, the decision tree classifier is trained, and the judgment thresholds in the cloud algorithm are used for real-time tracking of vehicle signals to achieve vehicle-cloud collaboration. Finally, the classifier is verified by multiple sets of experiments that can be carried out on the vehicle end. The results show that the proposed method can identify an internal short circuit fault before its end stage, and accurately distinguish conventional faults, including internal short circuit fault, resistance fault, and capacity fault.
• The method improves the efficiency of algorithm development.
• The vehicle end estimates cell features in real time.
• Massive data is created for cloud algorithm training using battery system simulation model.
• The cloud applies fault classifier with high accuracy and low computational complexity.
• The method is verified by multiple sets of experiments.
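As a rough illustration of tracking per-cell differences against a threshold on the vehicle end, the sketch below computes each cell's deviation from the pack median voltage and flags cells whose average deviation is large. The abstract does not specify the cell difference model, so everything here is an assumption: the deviation-from-median feature, the function names, the threshold, and the voltage values are hypothetical.

```python
import numpy as np

def cell_deviation_features(voltages):
    """Mean absolute deviation of each cell from the pack median voltage.

    voltages: array of shape (time steps, cells). This deviation feature is
    an illustrative stand-in, not the paper's cell difference model.
    """
    v = np.asarray(voltages, dtype=float)
    pack_median = np.median(v, axis=1, keepdims=True)  # reference per time step
    return np.abs(v - pack_median).mean(axis=0)

def flag_cells(voltages, threshold):
    """Return indices of cells whose deviation feature exceeds the threshold."""
    features = cell_deviation_features(voltages)
    return np.flatnonzero(features > threshold).tolist()

# Made-up voltages (V) for 4 cells over 3 time steps; the last cell sags.
voltages = np.array([
    [3.70, 3.70, 3.70, 3.50],
    [3.68, 3.69, 3.68, 3.45],
    [3.66, 3.67, 3.66, 3.40],
])
suspect_cells = flag_cells(voltages, threshold=0.1)
print(suspect_cells)
```

In a vehicle-cloud setup like the one described, the threshold would come from the classifier trained on the cloud rather than being chosen by hand.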
Good feature engineering is a prerequisite for accurate classification, especially in challenging scenarios such as detecting the breathing of living persons trapped under building rubble using bioradar. Unlike monitoring patients' breathing through the air, the measuring conditions of a rescue bioradar are very complex. The ultimate goal of search and rescue is to determine the presence of a living person, which requires extracting representative features that can distinguish measurements with a person present from those without. To address this challenge, we conducted a bioradar test scenario under laboratory conditions and decomposed the radar signal into different range intervals to derive multiple virtual scenes from the real one. We then extracted physical and statistical quantitative features that represent a measurement, aiming to find those features that are robust to the complexity of rescue-radar measuring conditions, including different rubble sites, breathing rates, signal strengths, and short-duration disturbances. To this end, we used two methods, Analysis of Variance (ANOVA) and Minimum Redundancy Maximum Relevance (MRMR), to analyze the significance of the extracted features. We then trained the classification model using a linear kernel support vector machine (SVM). As the main result of this work, we identified an optimal feature set of four features based on the feature ranking and the improvement in the classification accuracy of the SVM model. These four features are related to four different physical quantities and are independent of the rubble site.
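To make the ANOVA-based feature ranking concrete, the following sketch computes the one-way ANOVA F-statistic for each feature and ranks features by it. It covers only the ANOVA part (not MRMR or the SVM), and the function name `anova_f_scores` and the toy measurements are assumptions made for illustration.

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F-statistic for each feature column of X given labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_samples, n_classes = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        group = X[y == c]
        group_mean = group.mean(axis=0)
        # Spread of the class means around the overall mean.
        ss_between += group.shape[0] * (group_mean - grand_mean) ** 2
        # Spread of the samples around their own class mean.
        ss_within += ((group - group_mean) ** 2).sum(axis=0)
    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_samples - n_classes)
    return ms_between / ms_within

# Made-up data: feature 0 separates the classes, feature 1 does not.
X = np.array([
    [0.0, 1.0], [0.2, 2.0], [0.1, 3.0],   # label 0: no person present
    [5.0, 1.0], [5.2, 2.0], [5.1, 3.0],   # label 1: person present
])
y = np.array([0, 0, 0, 1, 1, 1])
scores = anova_f_scores(X, y)
ranking = np.argsort(scores)[::-1].tolist()  # best feature first
print(ranking)
```

A higher F-score means the class means differ more relative to the within-class scatter, which is why such a feature would rank higher.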
Parkinson's Disease (PD) is currently the fastest growing neurodegenerative disease. It decreases the quality of life for patients, especially when it is not diagnosed correctly and in a timely manner. Accurate diagnosis of PD is complicated by the fact that several neurodegenerative diseases, e.g. essential tremor, have similar motor symptoms. In this work, we report on a second opinion system based on video analysis and classification of subjects using machine learning methods, including feature extraction, dimensionality reduction and classification. Our approach is intended to help avoid the typical confusion of PD with essential tremor. To this end, we designed 15 common tasks and recorded videos of the subjects' movements. Video data was collected from 89 subjects at a medical center and labeled by doctors. We first demonstrate classification between healthy subjects and subjects with suspected PD, followed by classification between subjects with true PD and subjects with essential tremor. We achieved an F1 score of 0.90 for the first classification and an F1 score of 0.84 for the second. The proposed unobtrusive approach demonstrated its feasibility in a pilot study. It opens up the prospect of differentiating PD patients from other patients, rather than only from a cohort of healthy subjects.
The classification of ships based on their trajectory descriptors is a common practice that is helpful in various contexts, such as maritime security and traffic management. For the most part, the descriptors are either geometric, which capture the shape of a ship’s trajectory, or kinematic, which capture the motion properties of a ship’s movement. Understanding the implications of the type of descriptor that is used in classification is important for feature engineering and model interpretation. However, this matter has not yet been deeply studied. This article contributes to feature engineering within this field by introducing proper similarity measures between the descriptors and defining sound benchmark classifiers, based on which we compared the predictive performance of geometric and kinematic descriptors. The performance profiles of geometric and kinematic descriptors, along with several standard tools in interpretable machine learning, helped us provide an account of how different ships differ in movement. Our results indicated that the predictive performance of geometric and kinematic descriptors varied greatly, depending on the classification problem at hand. We also showed that the movement of certain ship classes solely differed geometrically while some other classes differed kinematically and that this difference could be formulated in simple terms. On the other hand, the movement characteristics of some other ship classes could not be delineated along these lines and were more complicated to express. Finally, this study verified the conjecture that the geometric–kinematic taxonomy could be further developed as a tool for more accessible feature selection.
Dynamical systems play a fundamental role in understanding phenomena inherent to several fields of science. Technological advances over the previous several decades have generated a large amount of data that might be used in the inference of dynamical systems. Regardless of the sensor types adopted to perform the data acquisition procedure, it is useful to verify whether the data are corrupted by noise. In general, system identification is directly affected by noisy scenarios, which can lead to the discovery of spurious models. In this work, we demonstrate how the hybridization of several machine learning techniques improves the robustness to noise of the system identification task, advancing a pioneering methodology known as Sparse Identification of Nonlinear Dynamics (SINDy). Specifically, we show the success of the proposed strategy on numerical examples, such as a logistic equation with forcing, Duffing oscillator, FitzHugh–Nagumo model, Lorenz attractor and a Susceptible–Infectious–Recovered modeling of SARS-CoV-2.
• A noise-robust version of the Sparse Identification.
• A feature engineering approach to improve the identification task.
• A hybridization of machine learning techniques.
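For readers unfamiliar with SINDy, the sketch below shows its basic sparse regression step, sequentially thresholded least squares, on a toy problem. It illustrates plain SINDy only, not the hybrid noise-robust method of the abstract. The function name `stlsq`, the threshold, the candidate library, and the test system (an unforced logistic-type equation dx/dt = x - x^2 with lightly perturbed derivatives supplied directly) are assumptions.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares for sparse regression.

    theta: candidate library, shape (samples, terms); dxdt: derivatives.
    """
    coef = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0  # prune terms with negligible weight
        big = ~small
        if not big.any():
            break
        # Refit using only the terms that survived pruning.
        coef[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return coef

# Toy data: states sampled on a grid, derivatives from dx/dt = x - x^2
# plus small synthetic noise (seeded so the run is repeatable).
x = np.linspace(0.05, 0.95, 50)
rng = np.random.default_rng(0)
dxdt = x - x**2 + 1e-3 * rng.standard_normal(x.size)

# Candidate library of terms: 1, x, x^2, x^3.
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
coef = stlsq(theta, dxdt)
print(coef)
```

With this low noise level the constant and cubic terms should be pruned, leaving coefficients close to 1 and -1 on x and x^2. At higher noise levels the thresholding can keep or discard the wrong terms, which is the failure mode the abstract is concerned with.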
Pathological complete response (pCR) is associated with favorable prognosis in patients with triple-negative breast cancer (TNBC). However, only 30-40% of TNBC patients treated with neoadjuvant chemotherapy (NAC) show pCR, while the remaining 60-70% show residual disease (RD). The role of the tumor microenvironment in NAC response in patients with TNBC remains unclear. In this study, we developed a machine learning-based two-step pipeline to distinguish between various histological components in hematoxylin and eosin (H&E)-stained whole slide images (WSIs) of TNBC tissue biopsies and to identify histological features that can predict NAC response.
H&E-stained WSIs of treatment-naïve biopsies from 85 patients (51 with pCR and 34 with RD) of the model development cohort and 79 patients (41 with pCR and 38 with RD) of the validation cohort were separated through a stratified eightfold cross-validation strategy for the first step and leave-one-out cross-validation strategy for the second step. A tile-level histology label prediction pipeline and four machine-learning classifiers were used to analyze 468,043 tiles of WSIs. The best-trained classifier used 55 texture features from each tile to produce a probability profile during testing. The predicted histology classes were used to generate a histology classification map of the spatial distributions of different tissue regions. A patient-level NAC response prediction pipeline was trained with features derived from paired histology classification maps. The top graph-based features capturing the relevant spatial information across the different histological classes were provided to the radial basis function kernel support vector machine (rbfSVM) classifier for NAC treatment response prediction.
The tile-level prediction pipeline achieved 86.72% accuracy for histology class classification, while the patient-level pipeline achieved 83.53% NAC response (pCR vs. RD) prediction accuracy in the model development cohort. The model was validated with an independent cohort, with a tile histology validation accuracy of 83.59% and a NAC prediction accuracy of 81.01%. The histological class pairs with the strongest NAC response predictive ability were tumor and tumor-infiltrating lymphocytes for pCR and microvessel density and polyploid giant cancer cells for RD.
Our machine learning pipeline can robustly identify clinically relevant histological classes that predict NAC response in TNBC patients and may help guide patient selection for NAC treatment.