Large volumes of data from material characterizations call for rapid and automatic data analysis to accelerate materials discovery. Herein, we report a convolutional neural network (CNN) that was trained on theoretical data and very limited experimental data for fast identification of experimental X-ray diffraction (XRD) patterns of metal–organic frameworks (MOFs). To augment the training data, noise was extracted from experimental data and shuffled; it was then merged with the main peaks extracted from theoretical spectra to synthesize new spectra. For the first time, one-to-one material identification was achieved. Theoretical MOF patterns (1012) were augmented to a full data set of 72 864 samples, which was then randomly shuffled and split into training (58 292 samples) and validation (14 572 samples) data sets at a ratio of 4:1. For the discrimination task, the optimized model showed the highest identification accuracy of 96.7% for the top 5 ranking on a test data set of 30 hold-out samples. Neighborhood component analysis (NCA) on the experimental XRD samples shows that samples from the same material cluster together in the NCA map. Analysis of the class activation maps of the last CNN layer further discloses the mechanism by which the CNN model successfully identifies individual MOFs from the XRD patterns. This CNN model trained by the data augmentation technique would not only open numerous potential applications for identifying XRD patterns for different materials, but also pave avenues to autonomously analyze data by other characterization tools such as FTIR, Raman, and NMR spectroscopies.
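The augmentation and split described above can be illustrated with a minimal sketch. It is not the authors' implementation: the relative-intensity threshold used to separate "main peaks" from "noise", the function names, and the choice to resample the noise with replacement (so that pattern lengths need not match) are all assumptions made for illustration. The split helper rounds the training share up, which happens to reproduce the reported 58 292 / 14 572 counts; the rounding actually used is not stated in the abstract.

```python
import numpy as np

def extract_main_peaks(pattern, rel_threshold=0.1):
    """Keep intensities at or above rel_threshold * max; zero everything else."""
    pattern = np.asarray(pattern, dtype=float)
    return np.where(pattern >= rel_threshold * pattern.max(), pattern, 0.0)

def extract_noise(pattern, rel_threshold=0.1):
    """Return the low-intensity values (below rel_threshold * max) as a noise pool."""
    pattern = np.asarray(pattern, dtype=float)
    return pattern[pattern < rel_threshold * pattern.max()]

def synthesize(theoretical, experimental, rng, rel_threshold=0.1):
    """Merge theoretical main peaks with randomly reordered experimental noise."""
    peaks = extract_main_peaks(theoretical, rel_threshold)
    noise = extract_noise(experimental, rel_threshold)
    # Assumption: resample with replacement so the noise pool size need not
    # equal the pattern length; a strict shuffle would use rng.permutation.
    shuffled = rng.choice(noise, size=peaks.shape[0], replace=True)
    return peaks + shuffled

def split_counts(n_samples, train_parts=4, total_parts=5):
    """Sizes of a 4:1 train/validation split, rounding the training share up."""
    n_train = -(-n_samples * train_parts // total_parts)  # integer ceiling
    return n_train, n_samples - n_train

rng = np.random.default_rng(0)
theoretical = np.array([0.0, 0.0, 10.0, 0.0, 0.0, 5.0, 0.0, 0.0])
experimental = np.array([0.2, 0.1, 9.0, 0.3, 0.2, 4.0, 0.1, 0.2])
synthetic = synthesize(theoretical, experimental, rng)
```

Calling `synthesize` repeatedly with different random draws yields many noisy variants per theoretical pattern; 72 864 / 1012 corresponds to 72 variants per pattern.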
In this paper, we present a novel technique of constructing phonetic decision trees (PDTs) for acoustic modeling in conversational speech recognition. We use random forests (RFs) to train a set of PDTs for each phone state unit and obtain multiple acoustic models accordingly. We investigate several methods of combining acoustic scores from the multiple models, including maximum-likelihood estimation of the weights of different acoustic models from training data, as well as using confidence score of -value or relative entropy to obtain the weights dynamically from online data. Since computing acoustic scores from the multiple models slows down decoding search, we propose clustering methods to compact the RF-generated acoustic models. The conventional concept of PDT-based state tying is extended to RF-based state tying. On each RF tied state, we cluster the Gaussian density functions (GDFs) from multiple acoustic models into classes and compute a prototype for each class to represent the original GDFs. In this way, the number of GDFs in each RF tied state is decreased greatly, which significantly reduces the time for computing acoustic scores. Experimental results on a telemedicine automatic captioning task demonstrate that the proposed RF-PDT technique leads to significant improvements in word recognition accuracy.
We propose a novel approach of using Cross Validation (CV) and Speaker Clustering (SC) based data samplings to construct an ensemble of acoustic models for speech recognition. We also investigate the effects of the existing techniques of Cross Validation Expectation Maximization (CVEM), Discriminative Training (DT), and Multiple Layer Perceptron (MLP) features on the quality of the proposed ensemble acoustic models (EAMs). We have evaluated the proposed methods on the TIMIT phoneme recognition task as well as on a telemedicine automatic captioning task. The proposed methods have led to significant improvements in recognition accuracy over conventional Hidden Markov Model (HMM) baseline systems. The integration of EAMs with CVEM, DT, and MLP has also significantly improved the accuracy of the single-model systems based on CVEM, DT, and MLP, where the increased inter-model diversity is shown to have played an important role in the performance gain.
We investigate a structured sparse spectral transform method for voice conversion (VC) to perform frequency warping and spectral shaping simultaneously on high-dimensional (D) STRAIGHT spectra. Learning a large transform matrix for high-D data often results in an overfit matrix with low sparsity, which leads to muffled speech in VC. We address this problem by using the frequency-warping characteristic of a source-target speaker pair to define a region of support (ROS) in a transform matrix, and further optimize it by nonnegative matrix factorization (NMF) to obtain a structured sparse transform. We also investigate structural measures of spectral and temporal covariance and variance at different scales for assessing VC speech quality. Our experiments on the ARCTIC dataset of 12 speaker pairs show that embedding the ROS in spectral transforms offers flexibility in tradeoffs between spectral distortion and structure preservation, and the structural measures provide quantitatively reasonable results on converted speech. Our subjective listening tests show that the proposed VC method achieves a mean opinion score of "very good" relative to natural speech, and in comparison with three other VC methods, it is the most preferred one in naturalness and in voice similarity to target speakers.
We propose to exploit the potential of multiple word clusterings in class-based recurrent neural network (RNN) language models for ensemble RNN language modeling. By varying the clustering criteria and the space of word embedding, different word clusterings are obtained to define different word/class factorizations. For each such word/class factorization, several base RNNLMs are learned, and the word prediction probabilities of the base RNNLMs are then combined to form an ensemble prediction. We use a greedy backward model selection procedure to select a subset of models and combine these models for word prediction. The proposed ensemble language modeling method has been evaluated on the Penn Treebank test set as well as the Wall Street Journal (WSJ) Eval 92 and 93 test sets, where it improved test set perplexity and word error rate over the state-of-the-art single RNNLMs as well as multiple RNNLMs produced by varying RNN learning conditions.
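A greedy backward selection over an ensemble of language models can be sketched as follows. This is a generic illustration rather than the procedure from the paper: it assumes each model's per-word probabilities on a held-out set are available as rows of an array, that the ensemble prediction is a uniform average of those probabilities, and that held-out perplexity is the selection criterion. All function names are hypothetical.

```python
import numpy as np

def perplexity(probs):
    """Perplexity of a sequence given the probability assigned to each word."""
    return float(np.exp(-np.mean(np.log(probs))))

def ensemble_probs(model_probs, selected):
    """Uniformly average word probabilities over the selected models (assumption)."""
    return np.mean([model_probs[i] for i in selected], axis=0)

def greedy_backward_selection(model_probs):
    """Start from all models; repeatedly drop the model whose removal lowers
    held-out perplexity the most, and stop when no removal helps."""
    selected = list(range(len(model_probs)))
    best = perplexity(ensemble_probs(model_probs, selected))
    while len(selected) > 1:
        candidates = []
        for i in selected:
            rest = [j for j in selected if j != i]
            candidates.append((perplexity(ensemble_probs(model_probs, rest)), i))
        ppl, to_drop = min(candidates)
        if ppl >= best:
            break  # no single removal improves the ensemble
        best = ppl
        selected.remove(to_drop)
    return selected, best

# Toy held-out probabilities: models 0 and 1 are complementary, model 2 is poor.
held_out = np.array([
    [0.80, 0.20],
    [0.20, 0.80],
    [0.01, 0.01],
])
selected, best_ppl = greedy_backward_selection(held_out)
```

In this toy example the poor model is dropped first, and the two complementary models are kept because removing either one would raise perplexity.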
Novel, general methods for detecting landmine signatures in ground penetrating radar (GPR) using hidden Markov models (HMMs) are proposed and evaluated. The methods are evaluated on real data collected by a GPR mounted on a moving vehicle at three different geographical locations. A large library of digital GPR signatures of both landmines and clutter/background was constructed and used for training. Simple, but effective, observation vector representations are constructed to naturally model the time-varying signatures produced by the interaction of the GPR and the landmines as the vehicle moves. The number and definition of the states of the HMMs are based on qualitative signature models. The model parameters are optimized using the Baum-Welch algorithm. The models were trained on landmine and background/clutter signatures from one geographical location and successfully tested at two different locations. The data used in the test were acquired from over 6000 m² of simulated dirt and gravel roads, and also off-road conditions. These data contained approximately 300 landmine signatures, over half of which were plastic-cased or completely nonmetal.
Amyotrophic lateral sclerosis (ALS) results in progressive paralysis of voluntary muscles throughout the body. As speech deteriorates, individuals rely on pre-programmed messages available on commercial speech generating devices to communicate using one of the generic electronic voices on the device. To replace these generic voices and restore vocal identity, our aim is to develop personalized voices for people with ALS via the approach of voice conversion. The task is challenging because very few people have large quantities of their premorbid healthy speech recorded. Therefore, we have to rely on small quantities of dysarthric speech concomitant with an individual's disease stage. Further, progressive fatigue prohibits acquisition of large speech datasets and individuals display a range of dysarthria severities resulting from breathing, voice, articulation, resonance, and prosody disturbances. As the first step to address these problems, we use healthy source speakers and propose the approach of combining a structured sparse spectral transform with multiple linear regression-based frequency warping prediction for spectral conversion, and interpolating the transformed spectral frames for speech rate modification. Our experimental data included four healthy source speakers from the ARCTIC dataset, and four target ALS speakers with mild to severe dysarthria, forming 16 speaker pairs. Subjective listening evaluations showed that on average, (i) the proposed approach improved speech intelligibility by about 80% over the target speakers' speech, (ii) the converted voice was 3 times more similar to the target speakers' speech than to the source speakers' speech, and (iii) the converted speech quality was close to the MOS scale "good" relative to the source speakers' speech being "excellent."
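The speech rate modification step, interpolating spectral frames along the time axis, can be illustrated with a short sketch. It assumes plain linear interpolation between neighboring frames and a single global rate factor; the abstract does not specify the interpolation scheme, so both are illustrative assumptions, and `change_rate` is a hypothetical name.

```python
import numpy as np

def change_rate(frames, rate):
    """Resample a (n_frames, n_bins) spectral sequence along time.

    rate < 1 slows speech down (more output frames); rate > 1 speeds it up.
    Linear interpolation between adjacent frames is an assumption.
    """
    frames = np.asarray(frames, dtype=float)
    n_in = frames.shape[0]
    n_out = max(2, int(round(n_in / rate)))
    # Fractional positions of the output frames on the input time axis.
    src = np.linspace(0.0, n_in - 1, n_out)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, n_in - 1)
    w = (src - lo)[:, None]
    return (1.0 - w) * frames[lo] + w * frames[hi]

frames = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]])
slowed = change_rate(frames, rate=0.5)  # twice as many frames
```

The first and last frames are preserved, and intermediate frames are weighted blends of their two nearest input frames.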
This study investigated the association between attachment avoidance and internalizing symptoms and the moderating role of parasympathetic nervous activity (indexed by respiratory sinus arrhythmia withdrawal) in the association.
A sample of 109 (Mage = 18.94 years old, SD = 0.92; 69 male) Chinese college students participated in this study. Participants reported attachment avoidance and internalizing symptoms, and their physiological data were collected.
Results showed a positive link between attachment avoidance and internalizing symptoms. Further, respiratory sinus arrhythmia (RSA) withdrawal and attachment avoidance interactively predicted internalizing symptoms. Specifically, the positive relation between attachment avoidance and internalizing symptoms was only found among people of low, but not high, levels of RSA withdrawal.
Our study highlighted the importance of considering psychophysiological interactions in predicting internalizing symptoms in college students, and contributed to our understanding of the complicated factors underlying college students' internalizing problems.
• Attachment avoidance (AA) was positively associated with internalizing symptoms (IS).
• The positive relation between AA and IS depends on individual level of respiratory sinus arrhythmia (RSA) withdrawal.
• AA was positively related to IS only when RSA withdrawal was low and average, rather than high.
► We perform real and imaginary spectral subtractions in modulation frequency domain.
► Our method enhances both magnitude and phase speech spectra from noise.
► Our method showed superior outcomes in segmental SNR, PESQ, and averaged ISD.
► Our method showed superior outcome in mean preference score of listening evaluation.
► We analyze the factors contributing to our method’s winning performance.
In this paper, we propose a novel spectral subtraction method for noisy speech enhancement. Instead of taking the conventional approach of carrying out subtraction on the magnitude spectrum in the acoustic frequency domain, we propose to perform subtraction on the real and imaginary spectra separately in the modulation frequency domain, where the method is referred to as MRISS. By doing so, we are able to enhance magnitude as well as phase through spectral subtraction. We conducted objective and subjective evaluation experiments to compare the performance of the proposed MRISS method with three existing methods, including modulation frequency domain magnitude spectral subtraction (MSS), nonlinear spectral subtraction (NSS), and minimum mean square error estimation (MMSE). The objective evaluation used the criteria of segmental signal-to-noise ratio (Segmental SNR), PESQ, and average Itakura–Saito spectral distance (ISD). The subjective evaluation used a mean preference score with 14 participants. Both objective and subjective evaluation results have demonstrated that the proposed method outperformed the three existing speech enhancement methods. A further analysis has shown that the winning performance of the proposed MRISS method comes from improvements in the recovery of both acoustic magnitude and phase spectrum.
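The idea of subtracting on real and imaginary spectra separately can be sketched in simplified form. This is not the MRISS implementation: it assumes the acoustic STFT and a same-shaped noise-only STFT are already available, it uses one full-length FFT per acoustic frequency bin as the modulation transform instead of framed modulation analysis, and the spectral floor value is arbitrary. All names are hypothetical.

```python
import numpy as np

def subtract_modulation(traj, noise_traj, floor=0.01):
    """Magnitude subtraction on one real-valued time trajectory in the
    modulation frequency domain, keeping the modulation phase."""
    spec = np.fft.rfft(traj)
    noise_mag = np.abs(np.fft.rfft(noise_traj, n=len(traj)))
    mag = np.abs(spec)
    # Subtract the noise estimate, but never go below a small spectral floor.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)), n=len(traj))

def real_imag_subtraction(stft, noise_stft, floor=0.01):
    """stft, noise_stft: complex arrays of shape (n_freq, n_frames).

    The real and imaginary trajectories of each acoustic frequency bin are
    processed separately, so both magnitude and phase of the result can change.
    """
    stft = np.asarray(stft, dtype=complex)
    noise_stft = np.asarray(noise_stft, dtype=complex)
    out = np.empty_like(stft)
    for k in range(stft.shape[0]):
        re = subtract_modulation(stft[k].real, noise_stft[k].real, floor)
        im = subtract_modulation(stft[k].imag, noise_stft[k].imag, floor)
        out[k] = re + 1j * im
    return out

rng = np.random.default_rng(0)
noisy = rng.standard_normal((4, 16)) + 1j * rng.standard_normal((4, 16))
unchanged = real_imag_subtraction(noisy, np.zeros_like(noisy))
floored = real_imag_subtraction(noisy, noisy)
```

With a zero noise estimate the input passes through unchanged; when the noise estimate equals the input, every modulation component is reduced to the floor.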
The aim of this study was to test the relation of baseline respiratory sinus arrhythmia (BRSA) to problematic internet use (PIU) and the mediating role of impulsiveness (IM) and difficulties in emotion regulation (DER) underlying this association using data from a group of Chinese young adults. A total of 109 (Mage = 18.94 years old, SD = 0.92; 69 men) Chinese undergraduate students participated in this study and completed questionnaires on IM, DER, and PIU. BRSA was collected during a resting condition in a laboratory setting. Results showed that BRSA was negatively related to PIU, with IM and DER both independently and completely mediating this association. These findings enrich our understanding of the psychophysiological correlates of young adults' problematic internet use.
• Baseline respiratory sinus arrhythmia (BRSA) was negatively related to problematic internet use (PIU).
• The association between BRSA and PIU was mediated by impulsiveness.
• The association between BRSA and PIU was mediated by difficulties in emotion regulation.