Vision transformers (ViTs) achieve remarkable performance on large datasets, but tend to perform worse than convolutional neural networks (CNNs) when trained from scratch on smaller datasets, possibly due to a lack of local inductive bias in the architecture. Recent studies have therefore added locality to the architecture and demonstrated that it can help ViTs achieve performance comparable to CNNs in the small-dataset regime. Existing methods, however, are architecture-specific or have higher computational and memory costs. Thus, we propose a module called Local InFormation Enhancer (LIFE) that extracts patch-level local information and incorporates it into the embeddings used in the self-attention block of ViTs. Our proposed module is memory- and computation-efficient, as well as flexible enough to process auxiliary tokens such as the classification and distillation tokens. Empirical results show that adding the LIFE module improves the performance of ViTs on small image classification datasets. We further demonstrate how the effect can be extended to downstream tasks, such as object detection and semantic segmentation. In addition, we introduce a new visualization method, Dense Attention Roll-Out, specifically designed for dense prediction tasks, which generates class-specific attention maps from the attention maps of all tokens. The code for this project is available on GitHub (https://github.com/NeurAI-Lab/LIFE).
•Introducing LIFE module to complement global info with local context in ViT.
•Versatile & Efficient: LIFE integrates seamlessly into ViT, with minimal costs.
•Boost Small-Dataset Results: LIFE enhances ViTs on small datasets & dense tasks.
•Introducing Dense Attention Roll-Out to visualize attention in dense tasks.
Deep neural networks (DNNs) exhibit state-of-the-art performance in many fields, including microstructure recognition, where big datasets are used in training. However, DNNs trained by conventional methods with small datasets commonly show worse performance than traditional machine learning methods, e.g. shallow neural networks and support vector machines. This inherent limitation has prevented the wide adoption of DNNs in materials research because collecting and assembling big datasets in materials science is a challenge. In this study, we attempted to predict solidification defects by DNN regression with a small dataset that contains 487 data points. We find that a pre-trained and fine-tuned DNN shows better generalization performance than a shallow neural network, a support vector machine, and a DNN trained by conventional methods. The trained DNN transforms scattered experimental data points into a high-accuracy map in the high-dimensional space of chemistry and processing parameters. Though a DNN with big datasets is the optimal solution, a DNN with small datasets and pre-training can be a reasonable choice when big datasets are unavailable in materials research.
•A deep neural network model for predicting the solidification cracking susceptibility of stainless steels is developed.
•A stacked auto-encoder is used to pre-train the deep neural network with a small dataset to optimize the initial weights.
•The deep neural network model shows better generalization performance than a shallow neural network and a support vector machine.
Due to the proliferation of biomedical imaging modalities, such as Photoacoustic Tomography, Computed Tomography (CT), and Optical Microscopy and Tomography, massive amounts of data are generated on a daily basis. While massive biomedical datasets yield more information about pathologies, they also present new challenges in fully exploring the data. Data fusion methods are a step towards a better understanding of data, bringing multiple data observations together to increase the consistency of the information. However, data generation is merely the first step, and many other factors are involved in the fusion process, such as noise, missing data, data scarcity, and high dimensionality. In this paper, an overview of the advances in data preprocessing for biomedical data fusion is provided, along with insights stemming from new developments in the field.
•VHCF life prediction was explored with SVM, ANN and Z-parameter based PINN models.
•Datasets were extended by the Z-parameter model with sound reliability.
•An evaluation method was constructed to analyze and compare model performance.
•A larger dataset was found to have more desirable implications for model prediction.
•The Z-parameter based PINN model outperformed the others in terms of predictive accuracy and reliability.
Research on life prediction for mechanical structures in the very high cycle fatigue regime is pivotal to improving structural service, but collecting fatigue data can be costly and time-consuming. In response, the data-driven approach of machine learning has emerged as a solution to data insufficiency. In this work, after extracting a small dataset of GCr15 bearing steel subjected to very high cycle fatigue tests from the open literature, the Z-parameter model was applied to obtain extended datasets, on which models driven by a support vector machine, an artificial neural network, and a Z-parameter based physics-informed neural network were established. With the extended datasets used for training and the original data as the test set, fatigue life prediction for GCr15 steel was carried out and compared across these models. Results showed that the physics-informed neural network calibrated by the Z-parameter model and trained on a larger dataset gave more accurate and reliable predictions than the other models, which demonstrates the effectiveness of the Z-parameter in data extension and in model construction as a priori physical knowledge for a data-driven approach. Looking to the future, the Z-parameter model deserves more attention for life prediction of further engineering materials and structures serving in the very high cycle fatigue regime.
Fault diagnosis techniques (FDT) face the challenge of implementing model learning in the presence of limited, imbalanced, or non-ideal data, which is a fundamental and crucial problem that hinders their application in real industrial scenarios. In this paper, a novel deep neural network (DNN), the densely-connected semi-Bayesian network (DSBNet), is proposed to implement feature learning of vibration signals for machinery fault diagnosis in non-ideal data scenarios. Firstly, deep Bayesian learning is embedded into the multi-scale semi-Bayesian block (MSBB) of DSBNet as a local feature extraction and enhancement module. The re-parameterization operations of the Bayesian convolutional layers in MSBB perform uncertainty inference on the features by learning the mean and variance of the Gaussian convolution kernel, achieving local expansion and enhancement of network features. Furthermore, convolutional features are integrated into MSBB to generate multi-scale semi-Bayesian features. An adaptive selector based on a multi-class and multi-scale attention mechanism is proposed to enhance effective semi-Bayesian features and suppress redundant ones. The proposed methodology facilitates the adaptive end-to-end training of DSBNet, which enables it to match the scale of the current dataset and achieve optimal performance. The effectiveness of DSBNet is verified on two testbeds and multiple in-service computing devices in real industrial settings. The test results show that DSBNet outperforms other state-of-the-art DNNs, especially in non-ideal training data scenarios.
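The re-parameterization step described above (a Gaussian convolution kernel with a learned mean and variance) can be illustrated with a minimal sketch. This is not the DSBNet implementation: the softplus parameterization of the standard deviation, the names `mu`, `rho`, `sample_kernel` and `conv1d_valid`, and the toy signal are all assumptions made for illustration.

```python
import math
import random


def softplus(x: float) -> float:
    """Map an unconstrained parameter to a strictly positive standard deviation."""
    return math.log1p(math.exp(x))


def sample_kernel(mu, rho, rng):
    """Re-parameterization: w = mu + sigma * eps, with eps ~ N(0, 1).

    The randomness lives in eps only, so mu and rho stay ordinary
    (learnable) parameters. Using softplus(rho) for sigma is an assumption.
    """
    return [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]


def conv1d_valid(signal, kernel):
    """Plain 1-D 'valid' cross-correlation of a signal with a kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


rng = random.Random(0)
mu = [0.25, 0.5, 0.25]        # hypothetical kernel means
rho = [-3.0, -3.0, -3.0]      # hypothetical pre-softplus spread parameters
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]  # toy stand-in for a vibration signal

# Each forward pass draws a fresh kernel, so the features carry uncertainty.
outputs = [conv1d_valid(signal, sample_kernel(mu, rho, rng)) for _ in range(200)]
mean_out = [sum(col) / len(col) for col in zip(*outputs)]
std_out = [
    math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
    for col, m in zip(zip(*outputs), mean_out)
]
```

Averaging many sampled forward passes recovers the output of the mean kernel, while `std_out` gives a rough per-position measure of feature uncertainty.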
Ultrasound imaging has been widely used for tumor detection and diagnosis. In ultrasound-based computer-aided diagnosis, feature representation is a crucial step. In recent years, deep learning (DL) has achieved great success in feature representation learning. However, it generally suffers from the small-sample-size problem. Since medical datasets usually have few training samples, texture features are still very commonly used for small ultrasound image datasets. Compared with the commonly used DL algorithms, the newly proposed deep polynomial network (DPN) algorithm not only shows superior performance on large-scale data, but also has the potential to learn effective feature representations from a relatively small dataset. In this work, a stacked DPN (S-DPN) algorithm is proposed to further improve the representation performance of the original DPN, and S-DPN is then applied to the task of texture feature learning for ultrasound-based tumor classification with small datasets. The tumor classification task is performed on two image datasets, namely a breast B-mode ultrasound dataset and a prostate ultrasound elastography dataset. In both cases, experimental results show that S-DPN achieves the best performance, with classification accuracies of 92.40±1.1% and 90.28±2.78% on the breast and prostate ultrasound datasets, respectively. This level of accuracy is significantly superior to that of all other algorithms compared in this work, including the stacked auto-encoder and the deep belief network. It suggests that S-DPN can be a strong candidate for texture feature representation learning on small ultrasound datasets.
•We employ DPN to learn texture feature representation for small ultrasound dataset.
•We propose the stacked DPN (S-DPN) algorithm for representation learning.
•We apply S-DPN to the ultrasound-based tumor classification task.
•S-DPN can significantly improve representation performance for small ultrasound dataset.
In the past couple of years, machine learning (ML) has been widely leveraged in discovering functional materials. However, several difficulties seriously impede the application of ML in the field of thermoset shape memory polymers (TSMPs), e.g., intractable feature identification or fingerprinting, inadequate experimental data on recovery stress, programming stress, and strain, and a lack of multi-length-scale structural information. Hence there is currently a lack of studies on ML-assisted discovery of TSMPs. In this study, we propose a series of methodologies to cope with these difficulties, i.e., adopting the recently proposed linear notation BigSMILES for fingerprinting, supplementing the existing dataset by reasonable approximation, and leveraging a mixed-dimension (1D and 2D) input model and a dual-convolutional-model framework. In this way, a new ML framework for predicting the recovery stresses of TSMPs is developed, which is validated by synthesizing and testing two new epoxy networks predicted by the ML model. By forging a new TSMP space with 4,459 samples, the ML model identified and screened 14 mostly unknown TSMPs with greater recovery stress than the known TSMPs. One of the 14 predicted polymers was validated by molecular dynamics (MD) simulation. This study demonstrates the capability of our methodologies for discovering new TSMPs with a desired recovery stress from a small training dataset, and they may be adopted for discovering new TSMPs with other desired properties.
•14 new thermoset shape memory polymers (TSMPs) discovered by machine learning.
•BigSMILES used to fingerprint the TSMP molecular structures.
•Dual convolutional neural networks (CNNs) established for machine learning.
•Features represented by mixed-dimension (1D and 2D) data.
•Predictions validated by both experiments and molecular dynamics (MD) simulations.
•Results of wet corrosion studies are predicted using AI methodologies.
•Artificial neural network ML algorithm was used to model the real systems.
•Virtual sample generation was conducted with the help of CTGAN algorithm.
•MSE, RMS and correlation determination were used to validate the method.
The present work deals with the development of a QSAR model based on the Artificial Neural Network (ANN) algorithm for predicting the inhibition efficiencies of 2-alkyl benzimidazole scaffold-based corrosion inhibitors for mild steel corrosion in 1 M HCl. The small-dataset problem is addressed with a Virtual Sample Generation (VSG) methodology using the CTGAN algorithm, and the credibility and generalizability of the proposed ANN+VSG model have been verified through experimental validation. Two new 2-alkyl benzimidazole scaffold-based corrosion inhibitor compounds, namely EBIMOT and PBIMOT, were synthesised, their inhibition efficiencies were obtained experimentally, and the measured values closely matched those predicted by the model. Embedding synthetic data in the training samples enhances the model's capacity to recognize the feature-target relationship, thereby stabilizing and improving the correlation of the quantum chemical descriptors with the inhibition efficiency. The proposed method strengthens the prospects of ML for materials design, especially in the case of small datasets, in a more cost-efficient, user-friendly, and accurate manner, and can open doors to new and unexplored avenues at the intersection of materials science and computational intelligence.
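The highlights for this work name MSE, RMS and "correlation determination" as validation metrics. The sketch below shows how such metrics are commonly computed, assuming "RMS" means the root-mean-square error and "correlation determination" means the coefficient of determination R². The `measured` and `predicted` values are made-up illustrations, not data from the study.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical inhibition efficiencies in percent (illustrative only).
measured = [80.0, 85.0, 90.0, 95.0]
predicted = [82.0, 84.0, 91.0, 93.0]

print(f"MSE  = {mse(measured, predicted):.3f}")
print(f"RMSE = {rmse(measured, predicted):.3f}")
print(f"R^2  = {r_squared(measured, predicted):.3f}")
```

With a small dataset these metrics are best reported on held-out experimental points rather than on the synthetic samples used to augment training.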
Predicting the damage modes of reinforced concrete (RC) panels subjected to impact loading is a difficult task and often involves considerable effort in experiments or simulations. The development of missiles in terms of strength and destructive power requires an accurate estimation of future damage levels. In addition, the datasets collected from experiments are often small. Therefore, this study aims to build an artificial neural network (ANN) model to classify the damage modes of RC panels under impact loading and to enhance its performance by optimizing the model's hyperparameters when learning from a small dataset (254 observations for four classes in this study). To address this, a novel optimization strategy is proposed, and two metaheuristic optimization algorithms, i.e., the genetic algorithm (GA) and particle swarm optimization (PSO), are presented for the automatic selection of hyperparameters to increase the accuracy of the ANN model. The proposed optimization strategy incorporates a stepwise grid search (SG) method into a nested cross-validation (NCV) process to find the optimal parameters for the ANN model, named SG-NCV-ANN. The efficiency of the proposed SG-NCV-ANN model and of two hybrid models, ANN-GA and ANN-PSO, is evaluated against each other and against other machine-learning-based classification methods, including an ANN using a randomized cross-validation search (RCV-ANN), oblique random forest (oRF), support vector machine (SVM) and k-nearest neighbors (k-NN). Accuracy, micro F1 score, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC) were employed to thoroughly assess the results obtained from the models. The experimental results indicated that the ANN-GA model achieves the highest AUC and F1 score compared to other state-of-the-art methods, followed by the ANN-PSO model, while the proposed SG-NCV-ANN model obtained the best generalization performance on the present small dataset.
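The nested cross-validation idea behind SG-NCV-ANN can be sketched as follows. This is a simplified illustration, not the authors' procedure: a k-nearest-neighbors classifier stands in for the ANN, a plain grid over `k` stands in for the stepwise grid search, and the four-class data are synthetic. All function names are hypothetical.

```python
import random
from collections import Counter


def k_folds(indices, n_folds):
    """Split a list of indices into n_folds roughly equal parts."""
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train, query, k):
    """Majority vote among the k nearest training points (1-D feature)."""
    nearest = sorted(train, key=lambda xy: abs(xy[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label is correct."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def split(data, held_out_idx):
    """Return (training part, held-out part) of data for one fold."""
    held = set(held_out_idx)
    train = [d for i, d in enumerate(data) if i not in held]
    return train, [data[i] for i in held_out_idx]


def cv_score(data, k, n_folds):
    """Mean validation accuracy of k-NN over n_folds folds of data."""
    folds = k_folds(list(range(len(data))), n_folds)
    scores = [accuracy(*split(data, fold), k) for fold in folds]
    return sum(scores) / len(scores)


def nested_cv(data, grid, outer_folds=5, inner_folds=3):
    """Outer loop estimates generalization; inner loop picks the hyperparameter."""
    results = []
    for test_idx in k_folds(list(range(len(data))), outer_folds):
        train, test = split(data, test_idx)
        # Hyperparameter selection sees the outer training part only,
        # so the outer test fold never influences the choice.
        best_k = max(grid, key=lambda k: cv_score(train, k, inner_folds))
        results.append((best_k, accuracy(train, test, best_k)))
    return results


rng = random.Random(0)
# Synthetic data: four classes with overlapping 1-D Gaussian features.
data = [(rng.gauss(c * 2.0, 1.0), c) for c in range(4) for _ in range(30)]
rng.shuffle(data)

grid = [1, 3, 5, 7, 9]
results = nested_cv(data, grid)
mean_outer_acc = sum(acc for _, acc in results) / len(results)
```

Because each outer test fold is untouched during hyperparameter selection, `mean_outer_acc` is a less optimistic estimate than the best inner cross-validation score, which matters most when the dataset is small.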
•A novel optimization strategy for neural networks was proposed for a limited dataset.
•Genetic algorithm and particle swarm optimization were executed for the comparison.
•The impact test dataset was well learned by the artificial neural network model.
•Four impact damage levels were well classified using the proposed models.
•The proposed models outperformed other state-of-the-art models in generalization.