In 2023, Turkiye was struck by a series of devastating earthquakes that affected millions of people through damage to buildings. These earthquakes demonstrated the urgent need for advanced automated damage detection models to support disaster response. This study introduces a novel solution to this challenge, the AttentionPoolMobileNeXt model, derived from a modified MobileNetV2 architecture. To evaluate the model rigorously, we curated a dataset of construction damage instances classified into five distinct classes. Applying the AttentionPoolMobileNeXt model to this dataset, we obtained a test accuracy of 97%. Additionally, the study introduces an AttentionPoolMobileNeXt-based Deep Feature Engineering (DFE) model, which further enhances the classification performance and interpretability of the system. The presented DFE increased the test classification accuracy from 90.17% to 97%, an improvement of 6.83 percentage points over the baseline model. AttentionPoolMobileNeXt and its DFE counterpart together advance the state of the art in automated damage detection, offering valuable insights for disaster response and recovery efforts.
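The abstract does not describe the pooling layer itself. One plausible reading of the model name, and it is only an assumption, is that the global average pooling at the end of MobileNetV2 is replaced by attention pooling, in which each spatial position of the final feature map receives a learned weight. The sketch below shows generic scaled dot-product attention pooling in plain Python; `query` stands in for a learned parameter and is hypothetical, and nothing here reproduces the authors' layer.

```python
import math


def softmax(scores):
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Pool N feature vectors (e.g. the H*W positions of a CNN feature map,
    each with C channels) into one C-dim vector.

    Each position is scored by a scaled dot product with `query` (a
    hypothetical learned vector); the scores are softmax-normalized and used
    as weights in an average over positions.
    """
    c = len(query)
    scores = [sum(f_k * q_k for f_k, q_k in zip(f, query)) / math.sqrt(c)
              for f in features]
    weights = softmax(scores)
    pooled = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(c)]
    return pooled, weights


# Usage: two spatial positions with 2 channels each.
feats = [[1.0, 0.0], [0.0, 1.0]]
print(attention_pool(feats, query=[0.0, 0.0]))    # uniform weights
print(attention_pool(feats, query=[10.0, 0.0]))   # first position dominates
```

With a zero query every score is equal, so the weights are uniform and the layer reduces to ordinary global average pooling; a learned query lets the network emphasize the positions that show damage.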
A physics-informed neural network is developed to solve the conductive heat transfer partial differential equation (PDE), with convective heat transfer equations as boundary conditions (BCs), for manufacturing and engineering applications in which parts are heated in ovens. Because convective coefficients are typically unknown, current analysis approaches based on trial-and-error finite element (FE) simulations are slow. The loss function is defined from the errors in satisfying the PDE, the BCs and the initial condition. An adaptive normalizing scheme is developed so that all loss terms are reduced simultaneously. In addition, heat transfer theory is used for feature engineering. The predictions for 1D and 2D cases are validated against FE results. Comparison with theory-agnostic ML methods shows that heat transfer beyond the training zone can be accurately predicted only when physics-informed activation functions are used. The trained models were successfully used for real-time evaluation of the thermal responses of parts subjected to a wide range of convective BCs.
•A physics-informed neural network is developed to solve the heat transfer PDE.•Boundary conditions in terms of convective coefficients are used as NN features.•Feature engineering is used to train NN models that remain accurate beyond the training zone.•An adaptive normalizing scheme is developed to scale loss terms during training.•The trained model was successfully validated against 1D and 2D FE simulations.
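The abstract names the three loss terms (PDE, convective BCs, initial condition) and an adaptive normalizing scheme but gives neither the geometry nor the scheme's formula. As a minimal sketch, assuming a 1D rod heated by convection at x = L, the code below evaluates the three residuals for a candidate temperature field T(x, t), using central finite differences in place of the automatic differentiation a real PINN would use, and combines them with one possible normalization: each term is divided by a running average of its own magnitude. All constants and the `AdaptiveNormalizer` class are hypothetical illustrations, not the authors' scheme.

```python
# Hypothetical 1D problem: a rod of length L heated in an oven from x = L.
ALPHA = 1e-4     # thermal diffusivity (m^2/s), illustrative value
K = 50.0         # thermal conductivity (W/(m K)), illustrative value
H = 20.0         # convective coefficient (W/(m^2 K)), unknown in practice
T_OVEN = 200.0   # oven air temperature (deg C)
T0 = 20.0        # uniform initial part temperature (deg C)
L = 0.1          # rod length (m)


def residuals(T, xs, ts, eps=1e-4):
    """Residuals of a candidate field T(x, t) for the heat equation, the
    convective BC at x = L (the only boundary checked here) and the initial
    condition, using central finite differences."""
    pde = []
    for x in xs:
        for t in ts:
            T_t = (T(x, t + eps) - T(x, t - eps)) / (2 * eps)
            T_xx = (T(x + eps, t) - 2 * T(x, t) + T(x - eps, t)) / eps**2
            pde.append(T_t - ALPHA * T_xx)            # dT/dt = alpha d2T/dx2
    bc = []
    for t in ts:
        T_x = (T(L + eps, t) - T(L - eps, t)) / (2 * eps)
        bc.append(-K * T_x - H * (T(L, t) - T_OVEN))  # -k dT/dx = h (T - T_oven)
    ic = [T(x, 0.0) - T0 for x in xs]                 # T(x, 0) = T0
    return pde, bc, ic


def mse(r):
    return sum(v * v for v in r) / len(r)


class AdaptiveNormalizer:
    """Divides each loss term by an exponential moving average of its own
    magnitude so terms of very different scale are reduced together. In a
    real training loop the scales would be treated as constants (no gradient)."""

    def __init__(self, n_terms, beta=0.9):
        self.beta = beta
        self.scale = [None] * n_terms

    def __call__(self, losses):
        total = 0.0
        for i, loss in enumerate(losses):
            if self.scale[i] is None:
                self.scale[i] = loss
            else:
                self.scale[i] = self.beta * self.scale[i] + (1 - self.beta) * loss
            total += loss / (self.scale[i] + 1e-12)
        return total


# Usage: a field stuck at the initial temperature satisfies the PDE and the
# initial condition but violates the convective BC.
pde, bc, ic = residuals(lambda x, t: T0, xs=[0.0, 0.05], ts=[0.0, 1.0])
norm = AdaptiveNormalizer(n_terms=3)
print(mse(pde), mse(bc), mse(ic), norm([mse(pde), mse(bc), mse(ic)]))
```

Without normalization the BC term here is millions of times larger than the others and would dominate the gradient; after normalization each violated term contributes on the order of one, which is the balancing effect the abstract describes.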
•Compared the performance of complex deep learning and simple statistical learning models with different levels of feature engineering in modeling the IoT testbed system.•Demonstrated that feature engineering plays a key role in developing successful data-driven machine learning models for predicting key process information of the IoT testbed.•Demonstrated that IoT sensors have great potential for advancing manufacturing process modeling and monitoring.
As IoT-enabled manufacturing is still in its infancy, several key research gaps need to be addressed. These include understanding the characteristics of the big data generated by industrial IoT sensors, the challenges this data presents to process data analytics, and the specific opportunities that IoT big data could bring for advancing manufacturing. In this paper, we use an in-house-developed IoT-enabled manufacturing testbed to study the characteristics of the big data it generates. Because data quality usually has the greatest impact on process modeling, data veracity is often the most challenging characteristic of big data. To address this, we explore the role of feature engineering in developing effective machine learning models for predicting key process variables. We compare complex deep learning approaches with a simple statistical learning approach, at different levels of feature engineering, to explore their pros and cons for potential industrial IoT-enabled manufacturing applications.
Data engineering for fraud detection
Baesens, Bart; Höppner, Sebastiaan; Verdonck, Tim
Decision Support Systems,
November 2021, Volume 150
Journal Article
Peer reviewed
Open access
Financial institutions increasingly rely upon data-driven methods for developing fraud detection systems, which are able to automatically detect and block fraudulent transactions. From a machine learning perspective, detecting suspicious transactions is a binary classification problem, so many techniques can be applied. However, interpretability is of utmost importance, both for management to have confidence in the model and for designing fraud prevention strategies. Moreover, models that enable fraud experts to understand why a case is flagged as suspicious will greatly facilitate their job of investigating suspicious transactions. Therefore, we propose several data engineering techniques that improve the performance of an analytical model while retaining its interpretability. Our data engineering process is decomposed into several feature and instance engineering steps. We illustrate the performance improvements these data engineering steps bring to popular analytical models on a real payment transactions data set.
•Companies increasingly rely upon data-driven methods for detecting fraud.•Data engineering is of utmost importance to improve the performance of most machine learning models.•Our data engineering process is decomposed into several feature and instance engineering steps.•The benefits of data engineering are illustrated on a payment transactions data set from a large European bank.
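The abstract does not list the engineered features. As a generic illustration of interpretable feature engineering on transaction data (an assumption, not the authors' feature set), the sketch below adds, for each transaction, the number and total amount of the same card's transactions in a preceding time window, plus the ratio of the current amount to that recent mean. The `Txn` record, its fields and the 24-hour default window are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    card: str        # card or customer identifier (hypothetical field)
    time_h: float    # timestamp in hours
    amount: float


def aggregate_features(txns, window_h=24.0):
    """For each transaction, aggregate the same card's earlier transactions
    within the preceding `window_h` hours (frequency and monetary features)."""
    txns = sorted(txns, key=lambda t: t.time_h)
    rows = []
    for i, t in enumerate(txns):
        prior = [p for p in txns[:i]
                 if p.card == t.card and t.time_h - p.time_h <= window_h]
        n = len(prior)
        total = sum(p.amount for p in prior)
        mean = total / n if n else 0.0
        rows.append({
            "card": t.card,
            "amount": t.amount,
            "n_prev_window": n,          # how often the card was used recently
            "sum_prev_window": total,    # how much was spent recently
            # current amount relative to the recent mean; None without history
            "amount_vs_recent_mean": t.amount / mean if mean > 0 else None,
        })
    return rows


# Usage with made-up transactions.
history = [Txn("A", 0.0, 10.0), Txn("A", 5.0, 30.0),
           Txn("B", 6.0, 100.0), Txn("A", 28.0, 90.0)]
for row in aggregate_features(history):
    print(row)
```

Each feature has a direct reading ("three times this card's recent average spend"), which is the kind of interpretability the abstract asks for; the quadratic scan is fine for a sketch but would be replaced by a per-card sliding window on real volumes.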
High-throughput data generation methods and machine learning (ML) algorithms have given rise to a new era of computational materials science by learning the relations between composition, structure, and properties and by exploiting such relations for design. However, to build these connections, materials data must be translated into a numerical form, called a representation, that can be processed by an ML model. Data sets in materials science vary in format (ranging from images to spectra), size, and fidelity. Predictive models vary in scope and properties of interest. Here, we review context-dependent strategies for constructing representations that enable the use of materials as inputs or outputs for ML models. Furthermore, we discuss how modern ML techniques can learn representations from data and transfer chemical and physical information between tasks. Finally, we outline high-impact questions that have not been fully resolved and thus require further investigation.
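As a concrete instance of turning materials data into a representation, one of the simplest options maps a chemical formula to a fixed-length vector of atomic fractions over an element vocabulary. This is an illustration chosen for brevity, not a method taken from the review; the parser handles only flat formulas such as `Fe2O3`, and the element list is a hypothetical choice.

```python
import re
from collections import Counter


def composition_vector(formula, elements):
    """Map a simple formula such as 'Fe2O3' to atomic fractions over a fixed
    element vocabulary. Parentheses, hydrates and charges are not handled."""
    counts = Counter()
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] += float(amount) if amount else 1.0
    unknown = set(counts) - set(elements)
    if unknown:
        raise ValueError(f"elements outside the vocabulary: {sorted(unknown)}")
    total = sum(counts.values())
    return [counts[e] / total for e in elements]


# Usage: the same vocabulary gives every material a vector of equal length.
vocab = ["Fe", "O", "Si"]
print(composition_vector("Fe2O3", vocab))
print(composition_vector("SiO2", vocab))
```

A fixed vocabulary is what makes the vectors comparable across materials, at the cost of discarding structure entirely; richer representations discussed in the review add structural or learned information on top of composition.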
The purpose of the present review paper is to introduce the reader to key directions in Machine Learning techniques for the diagnosis and prediction of knee osteoarthritis.
This survey was based on research articles published between 2006 and 2019. The articles were divided into four categories, namely (i) predictions/regression, (ii) classification, (iii) optimum post-treatment planning techniques and (iv) segmentation. The grouping was based on the application domain of each study.
The survey findings are reported by outlining the main characteristics of the proposed learning algorithms, the application domains, the data sources investigated and the quality of the results.
As is commonly acknowledged in the literature, knee osteoarthritis is a big data problem in terms of data complexity, heterogeneity and size. Machine Learning has attracted significant interest from the scientific community as a way to cope with these challenges, leading to new automated pre- or post-treatment solutions that use data from the greatest possible variety of sources.
Due to digitization, a huge volume of data is being generated across sectors such as healthcare, production, sales, IoT devices, the Web, and organizations. Machine learning algorithms are used to uncover patterns among the attributes of this data, so they can make predictions that medical practitioners and managers can use to make executive decisions. Not all attributes in the generated datasets are important for training machine learning algorithms: some might be irrelevant, and some might not affect the outcome of the prediction. Ignoring or removing these irrelevant or less important attributes reduces the burden on machine learning algorithms. In this work, two prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), are investigated with four popular Machine Learning (ML) algorithms (Decision Tree Induction, Support Vector Machine (SVM), Naive Bayes Classifier and Random Forest Classifier) using the publicly available Cardiotocography (CTG) dataset from the University of California, Irvine Machine Learning Repository. The experimental results show that PCA outperforms LDA on all measures. Also, the performance of the Decision Tree and Random Forest classifiers is not much affected by applying PCA or LDA. To further analyze the performance of PCA and LDA, experiments are also carried out on the Diabetic Retinopathy (DR) and Intrusion Detection System (IDS) datasets. The results show that ML algorithms with PCA produce better results when the dimensionality of the dataset is high. When dimensionality is low, the ML algorithms without dimensionality reduction yield better results.
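To make the PCA side of this comparison concrete, the sketch below finds the first principal component of a small data matrix by centering the columns, forming the sample covariance matrix and running power iteration on it, then projecting the data onto that direction. It is a dependency-free illustration, not the authors' experimental pipeline; real work would use an SVD routine from a numerical library, and power iteration assumes a distinct largest eigenvalue. LDA differs in that it is supervised: it uses class labels to find directions that separate the classes, whereas PCA ignores labels and keeps the directions of maximal variance.

```python
import math
import random


def pca_first_component(X, iters=200, seed=0):
    """Return (direction, variance, scores) for the first principal component
    of the rows of X, via power iteration on the sample covariance matrix.
    Assumes the largest eigenvalue is distinct from the second."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]    # center columns
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
         for i in range(d)]                                       # d x d covariance
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]                    # arbitrary start
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data has no variance along the current direction")
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
    scores = [sum(r[j] * v[j] for j in range(d)) for r in Xc]     # 1-D projection
    return v, variance, scores


# Usage: perfectly correlated 2-D points collapse onto one component.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
direction, var, proj = pca_first_component(X)
print(direction, var, proj)
```

Because these points lie on a line, a single component captures all of their variance, which is the situation in which reducing dimensionality loses nothing; on low-dimensional data without such redundancy, the abstract reports that skipping reduction works better.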
For smart grid energy theft identification, this letter introduces a gradient boosting theft detector (GBTD) based on the three latest gradient boosting classifiers (GBCs): 1) extreme gradient boosting; 2) categorical boosting; and 3) light gradient boosting machine. While most existing machine learning (ML) algorithms focus only on fine-tuning the hyperparameters of the classifiers, our ML algorithm, GBTD, focuses on feature engineering-based preprocessing to improve both detection performance and time complexity. GBTD improves both the detection rate and the false positive rate (FPR) of those GBCs by generating stochastic features such as the standard deviation, mean, minimum, and maximum of daily electricity usage. GBTD also reduces classifier complexity with weighted feature-importance-based extraction techniques. Emphasis is placed on the practical application of the proposed ML algorithm for theft detection by minimizing FPR, reducing data storage space, and improving the time complexity of the GBTD classifiers. Additionally, this letter proposes an updated version of the six existing theft cases to mimic real-world theft patterns and applies them to the dataset for numerical evaluation of the proposed algorithm.
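The letter derives its extra features from statistics of daily electricity usage but does not state the aggregation window. The sketch below assumes consecutive, non-overlapping windows of `window` days (a hypothetical parameter) and computes the mean, standard deviation, minimum and maximum of each window; the resulting rows would be appended to a consumer's raw readings before training a gradient boosting classifier.

```python
import statistics


def window_stat_features(daily_usage, window):
    """Summarize a consumer's daily electricity usage (e.g. kWh per day) into
    mean, standard deviation, minimum and maximum over consecutive windows of
    `window` days. A trailing partial window is dropped."""
    if window <= 0:
        raise ValueError("window must be a positive number of days")
    features = []
    for start in range(0, len(daily_usage) - window + 1, window):
        chunk = daily_usage[start:start + window]
        features.append({
            "mean": statistics.fmean(chunk),
            "std": statistics.pstdev(chunk),   # population standard deviation
            "min": min(chunk),
            "max": max(chunk),
        })
    return features


# Usage: nine days of hypothetical readings with a 4-day window.
usage = [1.0, 2.0, 3.0, 4.0, 2.0, 2.0, 2.0, 2.0, 5.0]
for row in window_stat_features(usage, window=4):
    print(row)
```

A flat window with zero standard deviation, like the second one here, is the kind of pattern these summary features make easy for a tree-based classifier to split on; whether population or sample standard deviation is used is a design choice the letter does not specify.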
•Using scripting and the 3D finite element method, a large number of stress intensity factors (SIFs) for different cracked lugs were calculated, establishing a comprehensive database.•Various complex influencing factors found in practice are also considered, such as the interference fit between the pin and the lug, as well as different loading directions and crack positions in tapered lugs.•Incorporating physical features significantly enhances the ability of neural networks to predict SIFs.•For tapered lugs, to which the analytical formulas do not apply, physical features derived from those formulas can still reduce prediction errors owing to structural similarities.•For the nonlinearity induced by interference-fit contact, adding a certain number of polynomial features can greatly reduce model errors.
The lug-type joint serves as a critical connecting structure in aerospace vehicles, allowing convenient assembly and disassembly and transferring loads during service. However, cracks often initiate at the stress concentration near the lug hole, posing a significant threat to the safety of the aircraft structure. Evaluating the stress intensity factors of complex lug structures is therefore of great significance. In this study, a physics knowledge-based neural network method for calculating the stress intensity factors of attachment lugs is proposed. Three types of cracked lug structures are analyzed: a through-thickness crack in a straight lug, a quarter-elliptical corner crack in a straight lug, and a quarter-elliptical corner crack in a tapered lug. Various complex influencing factors found in practice are also considered, such as the interference fit between the pin and the lug, as well as different loading directions and crack positions in tapered lugs. The stress intensity factor dataset is generated by the 3D finite element method, and a neural network, combining its powerful learning ability with prior physical knowledge, is then used to fit the data accurately. The results demonstrate that incorporating physical-knowledge features significantly improves the performance of the neural network. This method can be extended to analyze cracks in structures with arbitrary geometry and complex loading conditions.
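The abstract does not give the lug-specific analytical formulas used as physical features. The sketch below shows the general pattern under stated assumptions: raw inputs are augmented with a physics feature and with polynomial terms of the interference fit. As a stand-in physics feature it uses the textbook SIF of a through crack in an infinite plate, K = σ√(πa), which is not a lug formula; the input names `sigma`, `a` and `delta` are hypothetical.

```python
import math


def build_features(sigma, a, delta, poly_degree=3):
    """Feature vector for one cracked-lug sample (hypothetical inputs):
    sigma - remote stress (MPa)
    a     - crack length (m)
    delta - interference-fit magnitude (any consistent unit)
    """
    # Stand-in physics feature: SIF of a through crack in an infinite plate,
    # K = sigma * sqrt(pi * a), in MPa*sqrt(m). The paper's lug formulas are
    # not given in the abstract.
    k_ref = sigma * math.sqrt(math.pi * a)
    # Polynomial terms of the interference fit (delta^2 ... delta^poly_degree),
    # meant to help the network capture the contact nonlinearity.
    poly = [delta ** p for p in range(2, poly_degree + 1)]
    return [sigma, a, delta, k_ref] + poly


# Usage with made-up values.
print(build_features(sigma=100.0, a=0.01, delta=0.5))
```

Supplying k_ref lets the network learn only a correction (a geometry factor) to a quantity that already has the right scaling with stress and crack length, which is one common motivation for physics features and is consistent with the abstract's finding that they help even where the formula does not strictly apply.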
Using data-mining and cross-assembly strategies, three promising MOFs for Xe/Kr separation were designed.
•Unlike previous works, which either screened MOF materials at high throughput or assembled them, in this work we assembled novel MOFs for Xe/Kr separation based on the results of SAPR-based high-throughput screening of MOFs;•The key “genes” (metal nodes and organic linkers) governing Xe uptake were identified among the 1,499 frameworks screened by SAPR using feature-engineering-based data mining, and three novel promising MOFs were then cross-assembled;•The assembled Al2O6-fum_B-hmof8_No1 has both a large Xe uptake (4.2857 mmol/g) and a high Xe/Kr selectivity (19.70), overcoming the key “trade-off” problem, because its large 1D pore channel can accommodate a double-atom chain and it has a large electrostatic potential gradient (EPG).
The adsorptive separation of xenon (Xe) and krypton (Kr) is becoming increasingly important for the treatment of used nuclear fuel (UNF), so novel high-performing metal–organic frameworks (MOFs) for Xe/Kr adsorptive separation are urgently needed. In this work, the 200 MOFs formerly used for ethane/ethylene separation were adapted to construct structure-adsorption property relationships (SAPR) for a Xe/Kr mixture (20/80 v/v) at 298 K and 1 bar, in order to screen for MOFs with large Xe/Kr selectivities and Xe uptakes in the CoRE MOF, G-MOFs and hMOFs databases, which together contain more than 320,000 structures. Then, from the 1,499 screened MOFs, the important metal nodes and organic linkers (“genes”) governing Xe uptake were identified by feature-engineering-based data mining and cross-assembled into three novel promising MOFs following a material genomics strategy. After accounting for regenerability, the Xe uptake (4.2857 mmol/g) and Xe/Kr selectivity (19.70) of the assembled Al2O6-fum_B-hmof8_No1 are found to exceed those of most experimentally synthesized frameworks, including Al-Fum-Me, overcoming the “trade-off” between adsorption selectivity and capacity. Multiscale calculations at the GCMC and DFT levels show that the large 1D pore, which can accommodate a double-atom chain, and the large electrostatic potential gradient (EPG) are responsible for the high Xe uptake and the high Xe/Kr selectivity of Al2O6-fum_B-hmof8_No1, respectively. Notably, the present work is the first to report a double-atom chain of a rare gas adsorbed in MOF materials. The present data-mining and cross-assembly strategies are expected to assist the discovery of novel high-performing MOF adsorbents for Xe/Kr separation, and even for light hydrocarbon separation, in the future.
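The abstract reports selectivity values without restating the definition. Assuming the usual definition for a binary mixture, S = (x_Xe/x_Kr)/(y_Xe/y_Kr), with x the adsorbed-phase and y the gas-phase mole fractions, a 20/80 feed gives a gas-phase ratio of 0.25, so S is four times the ratio of the adsorbed amounts. The helper below computes this from mixture loadings such as GCMC would produce; the loadings in the usage line are hypothetical, not values from the paper.

```python
def adsorption_selectivity(q_xe, q_kr, y_xe=0.2, y_kr=0.8):
    """Xe/Kr adsorption selectivity S = (q_Xe / q_Kr) / (y_Xe / y_Kr).

    q_xe, q_kr: adsorbed amounts (e.g. mmol/g) from a mixture simulation.
    y_xe, y_kr: gas-phase mole fractions; defaults match a 20/80 v/v feed.
    """
    if q_kr <= 0 or y_xe <= 0 or y_kr <= 0:
        raise ValueError("q_kr, y_xe and y_kr must be positive")
    return (q_xe / q_kr) / (y_xe / y_kr)


# Usage with hypothetical loadings (not values from the paper).
print(adsorption_selectivity(q_xe=2.0, q_kr=0.5))
```

Because the feed is Kr-rich, a framework must hold several times more Xe than Kr just to reach S = 4 relative to an equal loading ratio, which is why high selectivity and high Xe uptake together are hard to achieve.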