In this paper, we propose the utterance-level permutation invariant training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker-independent multitalker speech separation. Specifically, uPIT extends the recently proposed permutation invariant training (PIT) technique with an utterance-level cost function, hence eliminating the need for solving an additional permutation problem during inference, which is otherwise required by frame-level PIT. We achieve this using recurrent neural networks (RNNs) that, during training, minimize the utterance-level separation error, hence forcing separated frames belonging to the same speaker to be aligned to the same output stream. In practice, this allows RNNs, trained with uPIT, to separate multitalker mixed speech without any prior knowledge of signal duration, number of speakers, speaker identity, or gender. We evaluated uPIT on the WSJ0 and Danish two- and three-talker mixed-speech separation tasks and found that uPIT outperforms techniques based on nonnegative matrix factorization and computational auditory scene analysis, and compares favorably with deep clustering and the deep attractor network. Furthermore, we found that models trained with uPIT generalize well to unseen speakers and languages. Finally, we found that a single model, trained with uPIT, can handle both two-speaker and three-speaker speech mixtures.
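The utterance-level cost described above can be illustrated with a minimal sketch. It assumes a plain mean squared error over arbitrary feature frames; the names `utterance_mse` and `upit_loss` are hypothetical, and the paper's actual cost function (for example, any mask-based or phase-sensitive variant) may differ. The key point is that one output-to-target permutation is chosen for the whole utterance rather than per frame.

```python
from itertools import permutations


def utterance_mse(est, ref):
    """Mean squared error between two streams over a whole utterance.

    Each stream is a list of frames; each frame is a list of floats.
    """
    total, count = 0.0, 0
    for est_frame, ref_frame in zip(est, ref):
        for e, r in zip(est_frame, ref_frame):
            total += (e - r) ** 2
            count += 1
    return total / count


def upit_loss(estimates, targets):
    """Return (loss, permutation) minimizing the utterance-level error.

    permutation[i] is the index of the target assigned to output stream i.
    A single permutation is selected for the entire utterance.
    """
    n = len(targets)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        loss = sum(
            utterance_mse(estimates[i], targets[perm[i]]) for i in range(n)
        ) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Two speakers, two frames each; the network outputs are in swapped order.
targets = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
estimates = [[[0.1, 0.9], [0.0, 1.0]], [[0.9, 0.1], [1.0, 0.0]]]
loss, perm = upit_loss(estimates, targets)
```

Because the assignment is fixed per utterance, each output stream stays tied to one speaker across all frames. The exhaustive search over permutations grows factorially with the number of speakers, which is acceptable for the two- and three-talker cases considered here.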
Reducing carbon emissions and managing the aging crisis represent two major challenges in China that involve various requirements for continued economic growth. This paper investigated the relationships between population factors and carbon emissions and further explored the impact of population aging on carbon emissions at the national and regional levels based on the STIRPAT model and provincial panel data from China. Our results show that at the national level, population aging and population quality are positively correlated with China's carbon emissions. The impact of the population living standard on carbon emissions exhibits an urban-rural difference. At the regional level, the impact of population aging on carbon emissions exhibits regional differences.
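For reference, the STIRPAT model named above is commonly written in the generic form below. This is the standard textbook formulation, not necessarily the extended specification estimated in this study, and the mapping of population aging, population quality, and living standard onto additional regressors is not given in the abstract.

```latex
% Generic STIRPAT form: environmental impact I as a function of
% population P, affluence A, and technology T, with error term e.
I = a \, P^{b} A^{c} T^{d} e

% Log-linear form that is typically estimated on panel data:
\ln I = \ln a + b \ln P + c \ln A + d \ln T + \ln e
```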
In this paper, we propose novel strategies for neutral vector variable decorrelation. Two fundamental invertible transformations, namely, serial nonlinear transformation and parallel nonlinear transformation, are proposed to carry out the decorrelation. For a neutral vector variable, which is not multivariate-Gaussian distributed, the conventional principal component analysis cannot yield mutually independent scalar variables. With the two proposed transformations, a highly negatively correlated neutral vector can be transformed to a set of mutually independent scalar variables with the same degrees of freedom. We also evaluate the decorrelation performance for vectors generated from a single Dirichlet distribution and from a mixture of Dirichlet distributions. The mutual independence is verified with the distance correlation measurement. The advantages of the proposed decorrelation strategies are intensively studied and demonstrated with synthesized data and practical application evaluations.
Metaplasticity, a higher order of synaptic plasticity, as well as a key issue in neuroscience, is realized with artificial synapses based on a WO3 thin film, and the activity‐dependent metaplastic responses of the artificial synapses, such as spike‐timing‐dependent plasticity, are systematically investigated. This work has significant implications in neuromorphic computation.
Many deep learning-based speech enhancement algorithms are designed to minimize the mean-square error (MSE) in some transform domain between a predicted and a target speech signal. However, optimizing for MSE does not necessarily guarantee high speech quality or intelligibility, which is the ultimate goal of many speech enhancement algorithms. Additionally, little is known about the impact of the loss function on the emerging class of time-domain deep learning-based speech enhancement systems. We study how popular loss functions influence the performance of time-domain deep learning-based speech enhancement systems. First, we demonstrate that perceptually inspired loss functions might be advantageous over classical loss functions like MSE. Furthermore, we show that the learning rate is a crucial design parameter even for adaptive gradient-based optimizers, which has been generally overlooked in the literature. Also, we found that waveform matching performance metrics must be used with caution, as they can fail completely in certain situations. Finally, we show that a loss function based on scale-invariant signal-to-distortion ratio (SI-SDR) achieves good general performance across a range of popular speech enhancement evaluation metrics, which suggests that SI-SDR is a good candidate as a general-purpose loss function for speech enhancement systems.
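As an illustration of the SI-SDR criterion mentioned above, the following minimal sketch computes SI-SDR from its usual definition: the estimate is projected onto the target, and the energy of that projection is compared with the energy of the residual. The function names and the `eps` stabilizer are assumptions for illustration, and the signals are assumed to be zero-mean; the paper's exact training setup is not reproduced here.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB.

    Both inputs are equal-length sequences of time-domain samples,
    assumed zero-mean. eps guards against division by zero.
    """
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    # Optimal scaling of the target that best matches the estimate.
    alpha = dot / (target_energy + eps)
    projection = [alpha * t for t in target]
    residual = [e - p for e, p in zip(estimate, projection)]
    signal_energy = sum(p * p for p in projection)
    residual_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (residual_energy + eps))


def si_sdr_loss(estimate, target):
    """Negated SI-SDR, so that minimizing the loss maximizes SI-SDR."""
    return -si_sdr(estimate, target)


target = [1.0, 2.0, 3.0]
estimate = [1.0, 2.0, 3.5]
rescaled = [2.0 * x for x in estimate]
```

Rescaling the estimate leaves the value essentially unchanged, which is the property that distinguishes SI-SDR from plain waveform-matching losses such as MSE.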
With the development of speech synthesis technology, automatic speaker verification (ASV) systems have encountered the serious challenge of spoofing attacks. In order to improve the security of ASV systems, many antispoofing countermeasures have been developed. In the front-end domain, much research has been conducted on finding effective features which can distinguish spoofed speech from genuine speech, and the published results show that dynamic acoustic features work more effectively than static ones. In the back-end domain, Gaussian mixture models (GMMs) and deep neural networks (DNNs) are the two most popular types of classifiers used for spoofing detection. The log-likelihood ratios (LLRs), computed as the difference between the human and spoofing log-likelihoods, are used as spoofing detection scores. In this paper, we train a five-layer DNN spoofing detection classifier using dynamic acoustic features and propose a novel, simple scoring method only using human log-likelihoods (HLLs) for spoofing detection. We mathematically prove that the new HLL scoring method is more suitable for the spoofing detection task than the classical LLR scoring method, especially when the spoofing speech is very similar to the human speech. We extensively investigate the performance of five different dynamic filter bank-based cepstral features and constant Q cepstral coefficients (CQCC) in conjunction with the DNN-HLL method. The experimental results show that, compared to the GMM-LLR method, the DNN-HLL method is able to significantly improve the spoofing detection accuracy. Compared with the CQCC-based GMM-LLR baseline, the proposed DNN-HLL model reduces the average equal error rate of all attack types to 0.045%, thus exceeding the performance of previously published approaches for the ASVspoof 2015 Challenge task. Fusing the CQCC-based DNN-HLL spoofing detection system with ASV systems, the false acceptance rate on spoofing attacks can be reduced significantly.
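The difference between the two scoring methods described above can be sketched as follows. This is an illustration only: it assumes per-frame log-likelihoods are already available from a classifier and that an utterance score is the average over frames, neither of which is specified in the abstract. The function names are hypothetical.

```python
def llr_score(human_ll, spoof_ll):
    """Classical LLR score: mean per-frame difference between the
    human and spoofing log-likelihoods (frame averaging is an assumption)."""
    if len(human_ll) != len(spoof_ll) or not human_ll:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(h - s for h, s in zip(human_ll, spoof_ll)) / len(human_ll)


def hll_score(human_ll):
    """HLL score: mean per-frame human log-likelihood only."""
    if not human_ll:
        raise ValueError("need a non-empty sequence")
    return sum(human_ll) / len(human_ll)


# Hypothetical per-frame log-likelihoods for one utterance.
human_ll = [-1.0, -2.0]
spoof_ll = [-3.0, -2.5]
llr = llr_score(human_ll, spoof_ll)  # 1.25
hll = hll_score(human_ll)            # -1.5
```

In both cases a higher score indicates that the utterance is more likely to be genuine; the HLL score simply does not depend on how well the spoofing model fits the input.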
Artificial neurons with functions such as leaky integrate‐and‐fire (LIF) and spike output are essential for brain‐inspired computation with high efficiency. However, previously implemented artificial neurons, e.g., Hodgkin–Huxley (HH) neurons, integrate‐and‐fire (IF) neurons, and LIF neurons, only achieve partial functionality of a biological neuron. In this work, quasi‐HH neurons with leaky integrate‐and‐fire functions are physically demonstrated with a volatile memristive device, W/WO3/poly(3,4‐ethylenedioxythiophene):polystyrene sulfonate/Pt. The resistive switching behavior of the device can be attributed to the migration of protons, unlike the migration of oxygen ions normally involved in oxide‐based memristors. With multifunctions similar to their biological counterparts, quasi‐HH neurons are advantageous over the reported HH and LIF neurons, demonstrating their potential for neuromorphic computing applications.
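For readers unfamiliar with the leaky integrate-and-fire behavior referred to above, the following is a minimal software sketch of a generic textbook LIF neuron, integrated with a forward Euler step. It is not a model of the memristive device in this work; the function name and all parameter values are assumptions chosen for illustration.

```python
def simulate_lif(currents, dt=1.0, tau=10.0, resistance=1.0,
                 v_rest=0.0, v_threshold=1.0, v_reset=0.0):
    """Simulate a generic leaky integrate-and-fire neuron.

    currents: input current per time step.
    Returns (voltages, spikes), where voltages holds the membrane
    potential after each step (post-reset if the neuron fired) and
    spikes holds one boolean per step.
    """
    v = v_rest
    voltages, spikes = [], []
    for i_in in currents:
        # Leak toward v_rest while integrating the input current.
        v += dt * (-(v - v_rest) + resistance * i_in) / tau
        if v >= v_threshold:
            spikes.append(True)   # fire
            v = v_reset           # and reset
        else:
            spikes.append(False)
        voltages.append(v)
    return voltages, spikes


# A strong constant input drives periodic firing; a weak one never
# reaches threshold because the leak balances the input first.
_, strong_spikes = simulate_lif([2.0] * 100)
_, weak_spikes = simulate_lif([0.5] * 100)
```

With these illustrative parameters the weak input settles at a potential below threshold, while the strong input charges the neuron to threshold repeatedly, producing a regular spike train.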
Quasi‐Hodgkin–Huxley (HH) neurons with leaky integrate‐and‐fire functions are physically demonstrated by W/WO3/poly(3,4‐ethylenedioxythiophene):polystyrene sulfonate/Pt memristive devices with a battery effect; in the device, proton migration plays a key role. With the help of a neuromorphic circuit, the neuron successfully emulates multiple functions of a biological neuron, which gives it an advantage over previously reported HH and leaky integrate‐and‐fire neurons.
We propose a novel deep learning training criterion, named permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Different from the multi-class regression technique and the deep clustering (DPCL) technique, our novel approach minimizes the separation error directly. This strategy effectively solves the long-lasting label permutation problem that has prevented progress on deep-learning-based techniques for speech separation. We evaluated PIT on the WSJ0 and Danish mixed-speech separation tasks and found that it compares favorably to non-negative matrix factorization (NMF), computational auditory scene analysis (CASA), and DPCL and generalizes well over unseen speakers and languages. Since PIT is simple to implement and can be easily integrated and combined with other advanced techniques, we believe improvements built upon PIT can eventually solve the cocktail-party problem.
This paper presents a rule-based adaptive protection scheme using machine-learning methodology for microgrids in extensive distribution automation (DA). The uncertain elements in a microgrid are first analysed quantitatively by Pearson correlation coefficients from data mining. Then, a so-called hybrid artificial neural network and support vector machine (ANN-SVM) model is proposed for state recognition in microgrids, which utilises the growing massive data streams in smart grids. Based on the state recognition in the algorithm, adaptive reconfigurations can be implemented with enhanced decision-making to modify the protective settings and the network topology to ensure the reliability of the intelligent operation. The effectiveness of the proposed methods is demonstrated on a microgrid model in Aalborg, Denmark and an IEEE 9 bus model, respectively.
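The Pearson correlation coefficient used in the first analysis step can be computed as in the minimal sketch below. The function name and the sample measurements are hypothetical; which microgrid quantities are actually correlated in the paper is not stated in the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measurements of two quantities over four time steps.
load = [1.0, 2.0, 3.0, 4.0]
current = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(load, current)
```

Values near 1 or -1 indicate a strong linear relationship between the two quantities, and values near 0 indicate little linear dependence.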