Towards Scaling Up Classification-Based Speech Separation
Wang, Yuxuan; Wang, DeLiang
IEEE transactions on audio, speech, and language processing, 07/2013, Volume 21, Issue 7
Journal Article
Peer reviewed
Open access
Formulating speech separation as a binary classification problem has been shown to be effective. While good separation performance is achieved in matched test conditions using kernel support vector machines (SVMs), separation in unmatched conditions involving new speakers and environments remains a big challenge. A simple yet effective method to cope with the mismatch is to include many different acoustic conditions into the training set. However, large-scale training is almost intractable for kernel machines due to computational complexity. To enable training on relatively large datasets, we propose to learn more linearly separable and discriminative features from raw acoustic features and train linear SVMs, which are much easier and faster to train than kernel SVMs. For feature learning, we employ standard pre-trained deep neural networks (DNNs). The proposed DNN-SVM system is trained on a variety of acoustic conditions within a reasonable amount of time. Experiments on various test mixtures demonstrate good generalization to unseen speakers and background noises.
The formation of novel and complex structures with specific morphologies from nanocrystals via a direct assembly of atoms or ions remains challenging. In recent years, researchers have focused their attention on nanocrystals of noble metals and their controlled synthesis, characterization, and potential applications. Although the synthesis of various noble metal nanocrystals with different morphologies has been reported, most studies are limited to low-index facet-terminated nanocrystals. High-index facets, denoted by a set of Miller indices {hkl} with at least one index greater than unity, possess a high density of low-coordinated atoms, steps, edges, and kinks within these structures and serve as more active catalytic sites. With the potential for enhanced catalytic performance, researchers have used the insights from shape-controlled nanocrystal synthesis to construct noble metal nanocrystals bounded with high-index facets. Since the report of Pt tetrahexahedral nanocrystals, researchers have achieved significant progress and have prepared nanocrystals with various high-index facets. Because of the general order of surface energy for noble metals, high-index facets typically vanish faster in a crystal growth stage and are difficult to preserve on the surface of the final nanocrystals. Therefore researchers have had limited opportunities to examine high-indexed noble metal nanocrystals with a controlled morphology and investigate their resultant behaviors in depth. In this Account, we thoroughly discuss the basic concepts and state-of-the-art morphology control of some noble metal nanocrystals enclosed with high-index facets. We briefly introduce high-index facets from both crystallographic and geometrical points of view, both of which serve as methods to classify these high-index facets.
Then, we summarize various typical noble metal nanocrystals terminated by different types of high-index facets, including {hk0} (h > k > 0), {hhl} (h > l > 0), {hkk} (h > k > 0), and {hkl} (h > k > l > 0). In each type, we describe several distinct morphologies including convex, concave, and other irregular shapes in detail. Based on these remarks, we discuss key factors that may induce the variations of Miller indices in each class, such as organic capping ligands and metallic cationic species. In a look at applications, we review several typical high-indexed noble metal nanocrystals showing enhanced electrocatalytic or chemical catalytic activities.
On Training Targets for Supervised Speech Separation
Wang, Yuxuan; Narayanan, Arun; Wang, DeLiang
IEEE/ACM transactions on audio, speech, and language processing, 12/2014, Volume 22, Issue 12
Journal Article
Peer reviewed
Open access
Formulation of speech separation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised learning algorithm, typically a deep neural network, is trained to learn a mapping from noisy features to a time-frequency representation of the target of interest. Traditionally, the ideal binary mask (IBM) is used as the target because of its simplicity and large speech intelligibility gains. The supervised learning framework, however, is not restricted to the use of binary targets. In this study, we evaluate and compare separation results by using different training targets, including the IBM, the target binary mask, the ideal ratio mask (IRM), the short-time Fourier transform spectral magnitude and its corresponding mask (FFT-MASK), and the Gammatone frequency power spectrum. Our results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics. In addition, we find that masking based targets, in general, are significantly better than spectral envelope based targets. We also present comparisons with recent methods in non-negative matrix factorization and speech enhancement, which show clear performance advantages of supervised speech separation.
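As a rough illustration of two of the training targets compared in the abstract above, the following sketch computes an IBM and an IRM for a small grid of time-frequency units, given known speech and noise energies. The function names, the 0 dB local criterion, and the exponent of 0.5 are assumptions chosen for illustration; the abstract itself does not state these details.

```python
import math

def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Return 1 where the local SNR exceeds lc_db (in dB), else 0.

    speech_energy and noise_energy are lists of rows, one value per
    time-frequency unit. lc_db is an assumed local criterion.
    """
    threshold = 10.0 ** (lc_db / 10.0)  # dB -> linear energy ratio
    return [
        [1 if s > n * threshold else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_energy, noise_energy)
    ]

def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5):
    """Return (S / (S + N)) ** beta per unit; beta=0.5 is an assumption."""
    return [
        [(s / (s + n)) ** beta if (s + n) > 0 else 0.0
         for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_energy, noise_energy)
    ]

# Toy energies for a 2 x 2 grid of time-frequency units (made-up values).
speech = [[4.0, 1.0], [0.0, 9.0]]
noise = [[1.0, 4.0], [1.0, 1.0]]

ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
print(ibm)
print([[round(v, 3) for v in row] for row in irm])
```

The binary mask makes a hard keep-or-discard decision per unit, while the ratio mask assigns a soft weight between 0 and 1, which is the distinction the abstract draws between binary and ratio targets.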
Achieving long-term stable zinc anodes at high currents/capacities remains a great challenge for practical rechargeable zinc-ion batteries. Herein, we report an imprinted gradient zinc electrode that integrates gradient conductivity and hydrophilicity for long-term dendrite-free zinc-ion batteries. The gradient design not only effectively prohibits side reactions between the electrolyte and the zinc anode, but also synergistically optimizes electric field distribution, zinc ion flux and local current density, which induces preferentially deposited zinc in the bottom of the microchannels and suppresses dendrite growth even under high current densities/capacities. As a result, the imprinted gradient zinc anode can be stably cycled for 200 h at a high current density/capacity of 10 mA cm⁻²/10 mAh cm⁻², with a high cumulative capacity of 1000 mAh cm⁻², which outperforms the non-gradient counterparts and bare zinc. The imprinted gradient design can be easily scaled up, and a high-performance large-area pouch cell (4 × 5 cm²) is also demonstrated.
Audio pattern recognition is an important research topic in the machine learning area, and includes several tasks such as audio tagging, acoustic scene classification, music classification, speech emotion classification and sound event detection. Recently, neural networks have been applied to tackle audio pattern recognition problems. However, previous systems are built on specific datasets with limited durations. Recently, in computer vision and natural language processing, systems pretrained on large-scale datasets have generalized well to several tasks. However, there is limited research on pretraining systems on large-scale datasets for audio pattern recognition. In this paper, we propose pretrained audio neural networks (PANNs) trained on the large-scale AudioSet dataset. These PANNs are transferred to other audio related tasks. We investigate the performance and computational complexity of PANNs modeled by a variety of convolutional neural networks. We propose an architecture called Wavegram-Logmel-CNN using both log-mel spectrogram and waveform as input feature. Our best PANN system achieves a state-of-the-art mean average precision (mAP) of 0.439 on AudioSet tagging, outperforming the best previous system of 0.392. We transfer PANNs to six audio pattern recognition tasks, and demonstrate state-of-the-art performance in several of those tasks. We have released the source code and pretrained models of PANNs: https://github.com/qiuqiangkong/audioset_tagging_cnn .
Circular RNAs (circRNAs) play important regulatory roles in multiple human malignancies, including non-small cell lung cancer (NSCLC). Here, we explored the role of circRNA vacuole membrane protein 1 (circVMP1) in NSCLC progression and cisplatin (DDP) resistance.
The DDP resistance, proliferation, sphere formation ability, migration, invasion, and apoptosis of NSCLC cells were analyzed by Cell Counting Kit-8 (CCK8) assay, 5-ethynyl-2′-deoxyuridine (EdU) assay, sphere formation assay, wound healing assay, Transwell assay, and flow cytometry. Methylated RIP-qPCR (MeRIP-qPCR) was conducted to analyze the m⁶A modification level of SRY-box transcription factor 2 (SOX2). Dual-luciferase reporter assay, RNA immunoprecipitation (RIP) assay, and RNA-pull down assay were performed to confirm the intermolecular interaction. Exosomes were identified by transmission electron microscopy (TEM) and characterized by nanoparticle tracking analysis (NTA).
CircVMP1 expression was markedly elevated in DDP-resistant NSCLC cell lines compared with their parental cell lines. CircVMP1 absence restrained the proliferation, sphere formation, migration, invasion, and DDP resistance and promoted the apoptosis of DDP-resistant NSCLC cells. CircVMP1 acted as microRNA-524-5p (miR-524-5p) sponge to up-regulate the expression of methyltransferase 3, N6-adenosine-methyltransferase complex catalytic subunit (METTL3) and SOX2. CircVMP1 silencing restrained the malignant behaviors and DDP resistance of A549/DDP and H1299/DDP cells by targeting miR-524-5p. Exosomal circVMP1 disseminated the malignant properties and DDP resistance to DDP-sensitive cells. Exosomal circVMP1 elevated the DDP resistance of xenograft tumors in vivo. Exosomal circVMP1 was up-regulated in the serum samples of DDP-resistant NSCLC patients compared with DDP-sensitive patients.
Exosome-mediated transmission of circVMP1 promoted NSCLC progression and DDP resistance by targeting miR-524-5p-METTL3/SOX2 axis.
Highlights
CircVMP1 level is up-regulated in DDP-resistant NSCLC cell lines compared with DDP-sensitive cell lines.
CircVMP1 absence restrains the malignant behaviors and DDP resistance of A549/DDP and H1299/DDP cells.
CircVMP1-miR-524-5p/METTL3/SOX2 axis is identified for the first time.
CircVMP1 plays an oncogenic role by targeting miR-524-5p-METTL3/SOX2 axis in A549/DDP and H1299/DDP cells.
Exosomal circVMP1 transmits the malignant properties and DDP resistance to DDP-sensitive cells.
Complex Ratio Masking for Monaural Speech Separation
Williamson, Donald S.; Wang, Yuxuan; Wang, DeLiang
IEEE/ACM transactions on audio, speech, and language processing, 03/2016, Volume 24, Issue 3
Journal Article
Peer reviewed
Open access
Speech separation systems usually operate on the short-time Fourier transform (STFT) of noisy speech, and enhance only the magnitude spectrum while leaving the phase spectrum unchanged. This is done because there was a belief that the phase spectrum is unimportant for speech enhancement. Recent studies, however, suggest that phase is important for perceptual quality, leading some researchers to consider magnitude and phase spectrum enhancements. We present a supervised monaural speech separation approach that simultaneously enhances the magnitude and phase spectra by operating in the complex domain. Our approach uses a deep neural network to estimate the real and imaginary components of the ideal ratio mask defined in the complex domain. We report separation results for the proposed method and compare them to related systems. The proposed approach improves over other methods when evaluated with several objective metrics, including the perceptual evaluation of speech quality (PESQ), and a listening test where subjects prefer the proposed approach with at least a 69% rate.
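To make the idea of a ratio mask defined in the complex domain concrete, the sketch below computes, for a single time-frequency unit, the complex mask that maps a noisy STFT coefficient onto the clean one. It assumes the mask is the complex quotient of clean over noisy speech; the function name and the sample values are hypothetical, and any further processing of the mask before network training is not shown.

```python
def complex_ideal_ratio_mask(noisy, clean):
    """Return the complex mask M such that M * noisy == clean.

    noisy and clean are complex STFT coefficients of one time-frequency
    unit. The real and imaginary parts are written out explicitly because
    those two components are what a network would be trained to estimate.
    """
    denom = noisy.real ** 2 + noisy.imag ** 2
    if denom == 0.0:
        return 0j  # no noisy energy in this unit; mask is undefined, use 0
    m_real = (noisy.real * clean.real + noisy.imag * clean.imag) / denom
    m_imag = (noisy.real * clean.imag - noisy.imag * clean.real) / denom
    return complex(m_real, m_imag)

# Made-up coefficients for one time-frequency unit.
noisy_unit = 3 + 4j
clean_unit = 1 + 2j

mask = complex_ideal_ratio_mask(noisy_unit, clean_unit)
recovered = mask * noisy_unit  # complex multiplication changes magnitude and phase
print(mask, recovered)
```

Because the mask is complex, multiplying by it adjusts both the magnitude and the phase of the noisy coefficient, in contrast to a real-valued mask that rescales the magnitude only.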
Monaural speech segregation has been a very challenging problem for decades. By casting speech segregation as a binary classification problem, recent advances have been made in computational auditory scene analysis on segregation of both voiced and unvoiced speech. So far, pitch and amplitude modulation spectrogram have been used as two main kinds of time-frequency (T-F) unit level features in classification. In this paper, we expand T-F unit features to include gammatone frequency cepstral coefficients (GFCC), mel-frequency cepstral coefficients, relative spectral transform (RASTA) and perceptual linear prediction (PLP). Comprehensive comparisons are performed in order to identify effective features for classification-based speech segregation. Our experiments in matched and unmatched test conditions show that these newly included features significantly improve speech segregation performance. Specifically, GFCC and RASTA-PLP are the best single features in matched-noise and unmatched-noise test conditions, respectively. We also find that pitch-based features are crucial for good generalization to unseen environments. To further explore complementarity in terms of discriminative power, we propose to use a group Lasso approach to select complementary features in a principled way. The final combined feature set yields promising results in both matched and unmatched test conditions.
Estimating exposures to PM2.5 within urban areas requires surface PM2.5 concentrations at high temporal and spatial resolutions. We developed a mixed effects model to derive daily estimations of surface PM2.5 levels in Beijing, using the 3 km resolution satellite aerosol optical depth (AOD) calibrated daily by the newly available high-density surface measurements. The mixed effects model accounts for daily variations of AOD-PM2.5 relationships and shows good performance in model predictions (R² of 0.81–0.83) and cross-validations (R² of 0.75–0.79). Satellite derived population-weighted mean PM2.5 for Beijing was 51.2 μg/m³ over the study period (Mar 2013 to Apr 2014), 46% higher than China’s annual-mean PM2.5 standard of 35 μg/m³. We estimated that more than 19.2 million people (98% of Beijing’s population) are exposed to harmful levels of long-term PM2.5 pollution. During 25% of the days with model data, the population-weighted mean PM2.5 exceeded China’s daily PM2.5 standard of 75 μg/m³. Predicted high-resolution daily PM2.5 maps are useful to identify pollution “hot spots” and estimate short- and long-term exposure. We further demonstrated that a good calibration of the satellite data requires a relatively large number of ground-level PM2.5 monitoring sites and more are still needed in Beijing.
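The exceedance figure quoted in the abstract above can be checked with a short calculation. The helper name below is hypothetical; the two concentrations are the ones stated in the abstract.

```python
def percent_above(value, reference):
    """Return how far value lies above reference, as a percentage of reference."""
    return 100.0 * (value - reference) / reference

# Population-weighted mean PM2.5 and the annual-mean standard, both in ug/m3.
exceedance = percent_above(51.2, 35.0)
print(round(exceedance))
```

Rounded to the nearest whole percent, this reproduces the 46% stated in the abstract.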
We show that a random interacting model exhibits solvable non-Fermi-liquid behavior and exotic pairing behavior. This model, dubbed the Yukawa-SYK model, describes the random Yukawa coupling between M quantum dots each hosting N flavors of fermions and N^{2} bosons that self-tune to criticality at low energies. The diagrammatic expansion is controlled by 1/MN, and the results become exact in a large-M, large-N limit. We find that pairing only develops within a region of the (M,N) plane: even though the pairing interaction is strongly attractive, the incoherence of the fermions can spoil the forming of Cooper pairs, rendering the system a non-Fermi liquid down to zero temperature. By solving the Eliashberg equation and the renormalization group equation, we show that the transition into the pairing phase exhibits Kosterlitz-Thouless quantum-critical behavior.