Noise-Robust Scream Detection using Wave-U-Net HAYASAKA, Noboru; KASAI, Riku; FUTAGAMI, Takuya
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences,
2023
Journal Article
Peer reviewed
Open access
In this paper, we propose a noise-robust scream detection method with the aim of expanding the scream detection system, a sound-based security system. The proposed method uses screams enhanced by Wave-U-Net, which was effective as a noise reduction method for noisy screams. However, the enhanced screams showed different frequency components from clean screams, and scream-like frequency components in the noise were erroneously emphasized. Therefore, Wave-U-Net was also applied in the process of training the Gaussian mixture models, which serve as the discriminators. We conducted detection experiments using the proposed method in various noise environments and determined that the false acceptance rate was reduced by an average of 2.1% or more compared with the conventional method.
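The false acceptance rate reported above can be illustrated with a minimal sketch. It assumes each clip has already been reduced to a single detection score (for example, a log-likelihood ratio between a scream model and a non-scream model); the scores, labels, threshold, and the function name detection_error_rates are all hypothetical and are not taken from the paper.

```python
def detection_error_rates(scores, labels, threshold=0.0):
    """Return (false_acceptance_rate, false_rejection_rate).

    scores: per-clip detection scores, higher means more scream-like
    labels: True for scream clips, False for non-scream clips
    A clip is accepted as a scream when its score exceeds the threshold.
    """
    non_scream = [s for s, is_scream in zip(scores, labels) if not is_scream]
    scream = [s for s, is_scream in zip(scores, labels) if is_scream]
    if not non_scream or not scream:
        raise ValueError("need at least one clip of each class")
    # False acceptance: a non-scream clip accepted as a scream.
    false_accepts = sum(1 for s in non_scream if s > threshold)
    # False rejection: a scream clip that was not accepted.
    false_rejects = sum(1 for s in scream if s <= threshold)
    return false_accepts / len(non_scream), false_rejects / len(scream)


# Hypothetical scores, for illustration only.
scores = [2.1, 0.4, -1.3, 0.7, -0.2, -2.5, 1.5, -0.8]
labels = [True, True, True, False, False, False, True, False]
far, frr = detection_error_rates(scores, labels)
print(f"FAR = {far:.2%}, FRR = {frr:.2%}")
```

Raising the threshold lowers the false acceptance rate at the cost of a higher false rejection rate, so the two rates are usually reported together.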
Cardiovascular disease (CVD) is one of the most serious diseases threatening human health. Arterial blood pressure (ABP) waveforms, which contain rich cardiovascular information, are of great significance for the diagnosis and prevention of CVD. This paper proposes a deep learning model, named ABP-Net, to transform photoplethysmogram (PPG) signals into ABP waveforms that contain vital physiological information related to the cardiovascular system. To guarantee the quality of the predicted ABP waveforms, the network structure, the input signals, and the loss functions are carefully designed. Specifically, a Wave-U-Net, a kind of fully convolutional neural network (CNN), is taken as the core architecture of the ABP-Net. In addition to the original PPG signals, their first and second derivatives are used as inputs to the ABP-Net. Additionally, the maximal absolute loss, together with the mean squared error loss, is employed to ensure that the predicted ABP waveform matches the reference one. The performance of the proposed ABP-Net is tested on the public MIMIC II database in both subject-dependent and subject-independent manners. Both sets of results verify the superior performance of the proposed model over existing methods. The mean absolute error (MAE) and the root-mean-square error (RMSE) between the waveforms predicted by the ABP-Net and the reference ones are 3.20 mmHg and 4.38 mmHg in the subject-dependent experiments, and 5.57 mmHg and 7.15 mmHg in the subject-independent experiments. Benefiting from the high-quality predicted ABP waveforms, more ABP-related physiological parameters can be obtained, which effectively expands the application scope of PPG devices.
• Propose an end-to-end ABP-Net to predict ABP waveforms from PPG signals for deriving more physiological parameters.
• Employ the original PPG signals and their first and second derivatives as the inputs of the ABP-Net to predict the ABP waveforms.
• Conduct the calibration-based subject-independent experiment via meta-learning to reduce the amount of ABP reference data.
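The three-channel input, the combined loss, and the reported error metrics for the ABP-Net can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the forward-difference derivative, the equal weighting of the two loss terms (max_weight=1.0), and all function names are hypothetical, since the abstract does not specify them.

```python
import math


def first_difference(signal, sample_rate_hz):
    """Approximate the derivative with a forward difference (assumed scheme)."""
    if len(signal) < 2:
        raise ValueError("signal needs at least two samples")
    diffs = [(b - a) * sample_rate_hz for a, b in zip(signal, signal[1:])]
    return diffs + diffs[-1:]  # repeat the last value to keep the length


def build_input_channels(ppg, sample_rate_hz):
    """Stack the PPG signal with its first and second derivatives."""
    d1 = first_difference(ppg, sample_rate_hz)
    d2 = first_difference(d1, sample_rate_hz)
    return [list(ppg), d1, d2]


def mean_squared_error(pred, ref):
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def max_absolute_error(pred, ref):
    return max(abs(p - r) for p, r in zip(pred, ref))


def combined_loss(pred, ref, max_weight=1.0):
    """MSE plus a weighted maximal absolute error (the weight is an assumption)."""
    return mean_squared_error(pred, ref) + max_weight * max_absolute_error(pred, ref)


def mae(pred, ref):
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def rmse(pred, ref):
    return math.sqrt(mean_squared_error(pred, ref))


# Toy values, for illustration only (pressures in mmHg).
channels = build_input_channels([0.0, 1.0, 4.0, 9.0], sample_rate_hz=1.0)
predicted = [120.0, 80.0, 100.0]
reference = [118.0, 83.0, 100.0]
print(channels)
print(mae(predicted, reference), rmse(predicted, reference))
print(combined_loss(predicted, reference))
```

The maximal absolute term penalizes the single worst sample in a window, which the mean squared error alone tends to average away.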
• We introduce a new convolutional network for speech enhancement: SEWUNet.
• This network presents 4 simple techniques that enhance performance and efficiency.
• Noise and word error reductions are achieved, with performance invariance to lengths.
• An implementation is made available at https://github.com/Hguimaraes/SEWUNet.
In this paper, we present Speech Enhancement through Wave-U-Net (SEWUNet), an end-to-end approach to reduce noise from speech signals. Background noise is detrimental to several downstream systems, including automatic speech recognition (ASR) and word spotting, which in turn can negatively impact end-user applications. We show that our proposal improves signal-to-noise ratio (SNR) and word error rate (WER) compared with existing mechanisms in the literature. In the experiments, the network input is an audio waveform sampled at 16 kHz and corrupted by additive noise. Our method is based on the Wave-U-Net architecture with some adaptations to our problem. Four simple enhancements are proposed and tested with ablation studies to prove their validity. In particular, we highlight the weight initialization through an autoencoder before training for the main denoising task, which leads to a more efficient use of training time and higher performance. Through quantitative metrics, we show that our method is preferred over classical Wiener filtering and performs better than other state-of-the-art proposals.
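The SNR improvement mentioned above can be made concrete with a short sketch. It assumes the common time-domain definition of SNR (clean-signal energy over residual-error energy, in dB); the abstract does not state which SNR variant the authors used, and the waveforms and the function name snr_db below are hypothetical.

```python
import math


def snr_db(clean, estimate):
    """SNR in dB, treating (estimate - clean) as the residual noise."""
    if len(clean) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_power = sum(c * c for c in clean)
    noise_power = sum((e - c) ** 2 for c, e in zip(clean, estimate))
    if signal_power == 0:
        raise ValueError("clean signal has zero energy")
    if noise_power == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)


# Toy waveforms, for illustration only.
clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.5, -0.5, 1.5, -0.5]      # clean plus additive noise
enhanced = [1.1, -0.9, 1.1, -0.9]   # hypothetical network output

noisy_snr = snr_db(clean, noisy)
enhanced_snr = snr_db(clean, enhanced)
improvement = enhanced_snr - noisy_snr
print(f"{noisy_snr:.2f} dB -> {enhanced_snr:.2f} dB (+{improvement:.2f} dB)")
```

Reporting the difference between the enhanced and noisy SNR, rather than the enhanced SNR alone, separates the gain due to the model from the difficulty of the input condition.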
Noise-Robust Scream Detection Using Wave-U-Net HAYASAKA, Noboru; KASAI, Riku; FUTAGAMI, Takuya
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences,
04/2024, Volume E107.A, Issue 4
Journal Article
Peer reviewed
Open access
In this paper, we propose a noise-robust scream detection method with the aim of expanding the scream detection system, a sound-based security system. The proposed method uses screams enhanced by Wave-U-Net, which was effective as a noise reduction method for noisy screams. However, the enhanced screams showed different frequency components from clean screams, and scream-like frequency components in the noise were erroneously emphasized. Therefore, Wave-U-Net was also applied in the process of training the Gaussian mixture models, which serve as the discriminators. We conducted detection experiments using the proposed method in various noise environments and determined that the false acceptance rate was reduced by an average of 2.1% or more compared with the conventional method.
With the development and widespread application of voice interaction technology, it has become crucial to improve the accuracy of blind source separation technology. To further enhance the separation of vocals and accompaniment, this paper proposes an improved Wave-U-Net model. Building on the skip connections of the Wave-U-Net model, we propose a segmented attention module (SAM), consisting of a spatial attention module (SPAM) and a channel attention module (CAM), which replaces the skip connections to address the semantic gap caused by feature concatenation. Furthermore, we replace the 1D convolution layer in the bottleneck of the model with an atrous spatial pyramid pooling (ASPP) module. The purpose is to enlarge the receptive field and obtain multi-scale features at the same time, thereby improving the speech separation performance of the model. We conduct experiments on the MUSDB18 dataset and analyze the performance of the model using the SDR, SIR, and SAR evaluation metrics. The results show that, compared with the Wave-U-Net network that uses only feature concatenation, the SDR values of the restored vocals and restored accompaniment are increased by 4.229 dB and 4.626 dB, respectively, and the separation performance is better than that of some existing baseline models.
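How an ASPP-style module enlarges the receptive field and yields multi-scale features can be sketched in one dimension. This is a simplified illustration, not the paper's module: the kernel, the dilation rates (1, 2, 4), the zero padding, and the function names are assumptions, and a real ASPP block would use learned weights and typically fuse the branches with a further convolution.

```python
def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated 1D convolution."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation)."""
    if len(kernel) % 2 == 0:
        raise ValueError("this sketch expects an odd kernel size")
    half = (len(kernel) // 2) * dilation
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(w * padded[t + j * dilation] for j, w in enumerate(kernel))
        for t in range(len(signal))
    ]


def aspp_1d(signal, kernel, dilations=(1, 2, 4)):
    """Run parallel dilated branches and return them as stacked channels."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


# A unit impulse shows which input positions each branch reaches.
impulse = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
kernel = [1.0, 1.0, 1.0]  # hypothetical fixed weights
branches = aspp_1d(impulse, kernel, dilations=(1, 3))
for dilation, branch in zip((1, 3), branches):
    print(dilation, receptive_field(len(kernel), dilation), branch)
```

With the same three-tap kernel, the branch with dilation 3 spans 7 input samples instead of 3 without adding parameters, which is the multi-scale effect the abstract attributes to the ASPP module.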
The purpose of this paper is to present a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work first reviews selected methods for automatic audio mixing. Then, a novel deep model based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. The model is trained on a custom-prepared database. Mixes created using the proposed system are compared with amateur mixes, mixes produced by state-of-the-art software, and professional mixes prepared by audio engineers. The results obtained show that mixes created automatically by Wave-U-Net can objectively be rated as highly as mixes prepared professionally. This is also confirmed by the statistical analysis of the results of the conducted listening tests. Moreover, the results show a strong correlation between the mixing experience of the listeners and the likelihood of rating the Wave-U-Net-based and professional mixes higher than the amateur mixes or the mix prepared using state-of-the-art software. These results are also confirmed by the outcome of the similarity matrix-based analysis.