Generative adversarial networks (GANs) have recently been shown to be efficient for speech enhancement. However, most, if not all, existing speech enhancement GANs (SEGANs) use a single generator to perform a one-stage enhancement mapping. In this work, we propose to use multiple chained generators that perform a multi-stage enhancement mapping, gradually refining the noisy input signals in a stage-wise fashion. Furthermore, we study two scenarios: (1) the generators share their parameters and (2) the generators' parameters are independent. The former constrains the generators to learn a common mapping that is iteratively applied at all enhancement stages and results in a small model footprint. In contrast, the latter allows the generators to flexibly learn different enhancement mappings at different stages of the network, at the cost of an increased model size. We demonstrate that the proposed multi-stage enhancement approach outperforms the one-stage SEGAN baseline, and that the independent generators lead to more favorable results than the tied generators. The source code is available at http://github.com/pquochuy/idsegan.
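The difference between tied and independent generators in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `make_smoother`, `enhance_multistage`, and the `alpha` values are hypothetical names and settings, and the toy smoothing filter stands in for a trained generator network, which the abstract does not specify.

```python
from typing import Callable, List

Signal = List[float]
Generator = Callable[[Signal], Signal]


def make_smoother(alpha: float) -> Generator:
    """Toy stand-in for a trained generator (assumption, not the paper's model).

    Each sample is blended with the mean of its two neighbours; edge samples
    reuse their own value for the missing neighbour.
    """
    def generator(x: Signal) -> Signal:
        n = len(x)
        out = []
        for i in range(n):
            left = x[i - 1] if i > 0 else x[i]
            right = x[i + 1] if i < n - 1 else x[i]
            out.append((1 - alpha) * x[i] + alpha * 0.5 * (left + right))
        return out
    return generator


def enhance_multistage(noisy: Signal, generators: List[Generator]) -> List[Signal]:
    """Chain the generators: stage k takes the output of stage k-1 as input.

    Returns the output of every stage so each one can be inspected.
    """
    outputs = []
    current = list(noisy)
    for g in generators:
        current = g(current)
        outputs.append(current)
    return outputs


num_stages = 2

# Scenario (1): tied generators, one set of parameters applied at every stage.
shared = make_smoother(0.5)
tied_chain = [shared] * num_stages

# Scenario (2): independent generators, one set of parameters per stage.
independent_chain = [make_smoother(0.6), make_smoother(0.3)]

noisy = [0.0, 1.0, 0.0, 1.0, 0.0]
tied_out = enhance_multistage(noisy, tied_chain)
indep_out = enhance_multistage(noisy, independent_chain)
```

In the tied chain the same object is reused, so the parameter count does not grow with the number of stages; in the independent chain it grows linearly with the number of stages, which mirrors the footprint trade-off the abstract describes.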
Because of a sensor's limited dynamic characteristics, a high-frequency signal is distorted by dynamic error as it passes through the sensor, which degrades measurement accuracy. Reducing the dynamic error requires a high-precision dynamic compensation model. This paper presents a compensation model based on deep learning. First, the problem of limited sensor dynamic data is addressed by data augmentation with a Deep Convolutional Generative Adversarial Network. The sensor compensation model is then obtained with a Speech Enhancement Generative Adversarial Network and applied to step signals and shockwave signals. The compensation method can be applied to a variety of sensors used in dynamic measurement. It is verified in this paper using a pressure sensor as an example: the results are better than those of traditional methods, the overshoot is reduced from 119.2% to 2.5%, and the rise time is 5.5 μs. The innovation of this paper is a way to use deep learning methods to compensate sensor dynamic error based on a small dataset. The method is also shown to have strong versatility, which traditional sensor compensation methods lack, and it provides a feasible scheme for applying deep learning to the calculation of dynamic compensation models.
Dysarthric speech is speech that is noisy or distorted at the source. Reasonable speech enhancement is required to obtain higher communication quality under non-stationary noise. Owing to the complexities in the speech rate of dysarthric persons, understanding their speech is a more critical and complex task, and generic recognition systems do not perform well on it. Hence, this paper proposes a Fractional Competitive Crow Search Algorithm-based Speech Enhancement Generative Adversarial Network (FCCSA-SEGAN) for enhancing the speech signal. Initially, at the pre-processing stage, noise is removed from the speech signal using the spectral subtraction method. Then, the pre-processed signal is fed to the speech enhancement stage, where signal quality is enhanced by the Speech Enhancement Generative Adversarial Network (SEGAN), which is trained by the developed FCCSA. The proposed FCCSA is obtained by incorporating Fractional Calculus (FC) into the Competitive Crow Search Algorithm (CCSA), where CCSA is a hybridization of the Crow Search Algorithm (CSA) and the Competitive Swarm Optimizer (CSO). After that, features such as the Multiple Kernel Weighted Mel Frequency Cepstral Coefficient (MKMFCC), Linear Prediction Cepstral Coefficient (LPCC), spectral flux, spectral crest, spectral centroid, and pitch chroma are extracted. Moreover, to increase the dimensionality of the signal samples, noise is added to the original signal in a data augmentation phase. Finally, speech recognition is performed using a Competitive Crow Search Algorithm-based Hierarchical Attention Network (CCSA-based HAN). The performance of the proposed method is evaluated on the UA speech database, where it obtains accuracy, sensitivity, and specificity of 0.930, 0.933, and 0.934, respectively. The proposed speech enhancement approach attains a higher Perceptual Evaluation of Speech Quality (PESQ) of 3.14 and a lower Root Mean Square Error (RMSE) of 0.022.
Existing generative adversarial networks (GANs) for speech enhancement rely solely on the convolution operation, which may obscure temporal dependencies across the sequence input. To remedy this issue, we propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN (SEGAN) using raw signal input. Further, we empirically study the effect of placing the self-attention layer at the (de)convolutional layers with varying layer indices as well as at all of them when memory allows. Our experiments show that introducing self-attention to SEGAN leads to consistent improvement across the objective evaluation metrics of enhancement performance. Furthermore, applying it at different (de)convolutional layers does not significantly alter performance, suggesting that it can be conveniently applied at the highest-level (de)convolutional layer with the smallest memory overhead.
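The core computation of a non-local self-attention layer over a feature sequence can be sketched as follows. This is a minimal illustration, not the layer from the paper: it assumes identity query, key, and value projections (a real layer would learn them), and the function name `self_attention` and the residual weight `gamma` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features: Matrix, gamma: float = 1.0) -> Tuple[Matrix, Matrix]:
    """Self-attention over T time steps, each with C channels.

    Assumption: identity query/key/value projections. Every time step attends
    to every other one, so dependencies are not limited by the size of a
    convolution kernel. The attended features are added back to the input,
    scaled by gamma (residual connection).
    """
    num_steps = len(features)
    num_channels = len(features[0])
    # Pairwise dot-product similarity between all time steps (T x T).
    scores = [[sum(q * k for q, k in zip(fq, fk)) for fk in features]
              for fq in features]
    weights = [softmax(row) for row in scores]
    output = []
    for t in range(num_steps):
        attended = [sum(weights[t][s] * features[s][c] for s in range(num_steps))
                    for c in range(num_channels)]
        output.append([x + gamma * a for x, a in zip(features[t], attended)])
    return output, weights


# Three time steps with two channels each (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enhanced, weights = self_attention(features, gamma=0.5)
```

The T x T weight matrix is what makes memory use grow with sequence length, which is consistent with the abstract's remark that the highest-level layer, where the sequence is shortest, has the smallest memory overhead.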
Segmentation and classification of brain tumors are time-consuming and challenging tasks in clinical image processing. Magnetic Resonance Imaging (MRI) offers more information about human soft tissues, which assists in diagnosing brain tumors. Precise segmentation of MRI images is vital for diagnosing brain tumors with computer-aided medical tools. After suitable segmentation of MRI brain tumor images, tumor classification is performed, which is a hard task owing to its complications. Therefore, a Gannet Aquila Optimization Algorithm_deep maxout network (GAOA_DMN) and a GAOA_K-Net+speech enhancement generative adversarial network (GAOA_K-Net+Segan) are presented for classification and segmentation of brain tumors using MRI images. Here, the pre-processing phase removes noise from the input image using a Laplacian filter and extracts the region of interest (ROI). Then, segmentation of the brain tumor is conducted by K-Net+Segan, whose outputs are combined using Motyka similarity. K-Net+Segan is trained for segmentation by GAOA, which is an amalgamation of the Gannet Optimization Algorithm (GOA) and the Aquila Optimizer (AO). Features are extracted from the segmented image for the classification phase. Finally, brain tumor classification is conducted by the DMN, which is tuned by GAOA, to obtain the output. GAOA_K-Net+Segan obtained better outcomes in terms of segmentation accuracy, whereas the devised GAOA_DMN achieved a maximum accuracy, true negative rate (TNR), and true positive rate (TPR) of 92.7%, 94.5%, and 91.5%, respectively.
In industrial production, defect detection is a critical task. Traditional methods often require manual visual inspection, which is inaccurate and time-consuming. As a key technology in the development of deep learning, semantic segmentation can serve as an effective defect detection method, locating the defect position and providing pixel-level semantic segmentation results that describe the shape and size of the defect, significantly improving production efficiency and product quality. Based on the features of the DAGM 2007 industrial optical detection datasets, this paper uses the SegAN network architecture, which combines a GAN with a semantic segmentation model, and makes adjustments and improvements to it. Because low-level semantic information, such as edges and textures, is important for industrial defect detection, more skip connections are introduced into the original architecture to improve the sensitivity of the network to low-level semantic information in the datasets. Experimental results demonstrate that our improvements effectively improve the Dice and MIoU indices of semantic segmentation, achieving a significant performance improvement over the original architecture.
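For reference, the Dice and MIoU indices named above can be computed for binary defect masks as in the sketch below. It uses the standard set-overlap definitions; the function names, the flattened 0/1 mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made here for illustration, not details taken from the paper.

```python
from typing import List

Mask = List[int]  # flattened binary mask: 1 = defect pixel, 0 = background


def dice(pred: Mask, target: Mask) -> float:
    """Dice = 2 * |A and B| / (|A| + |B|) for the defect class."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def iou(pred: Mask, target: Mask) -> float:
    """IoU = |A and B| / |A or B| for the class marked with 1."""
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - intersection
    return 1.0 if union == 0 else intersection / union


def mean_iou(pred: Mask, target: Mask) -> float:
    """MIoU for two classes: average of defect IoU and background IoU."""
    pred_bg = [1 - p for p in pred]
    target_bg = [1 - t for t in target]
    return 0.5 * (iou(pred, target) + iou(pred_bg, target_bg))


# Made-up 4-pixel example: one true defect pixel, two predicted.
pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
dice_score = dice(pred, target)
miou_score = mean_iou(pred, target)
```

Here the single false-positive pixel gives a Dice of 2/3 and an MIoU of 7/12. Dice scores only the defect class, while MIoU also credits correctly labelled background.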
In speech enhancement tasks, noise in speech signals may come not only from the external environment but also from speech signals that are similar to the original speech, such as regional accents or emotional speech. However, most existing speech enhancement GANs (SEGANs) use a single discriminator for the discrimination task. To improve upon existing single-discriminator GAN models, we propose a novel method that introduces two discriminators, with a new training data item for the second discriminator. This second discriminator is specifically designed to distinguish between clean speech signals and emotional speech signals, thereby reducing speech distortion and improving the quality of the enhanced speech. The experimental results demonstrate that our proposed method outperforms the existing GAN model in speech enhancement, which validates the effectiveness and feasibility of our approach.
Current generative adversarial networks (GANs) rely only on convolution operations when dealing with speech tasks. They ignore the dependencies across the time series and have limited learning ability, so obvious residual noise remains in the enhanced speech. To solve this problem, an end-to-end speech enhancement method that improves the GAN with attention mechanisms is proposed. A combined attention mechanism that fuses channel and spatial attention is applied between the convolutional layers of SEGAN to obtain more contextual information about the speech in both the channel and spatial dimensions and to extract more accurate feature information. Experimental results demonstrate that the method outperforms the baseline model in both speech quality and intelligibility. Under different signal-to-noise ratios, the perceptual evaluation of speech quality (PESQ) score is improved by an average of 25.72%, and the short-time objective intelligibility (STOI) score is improved by an average of 1.68%.
For speech enhancement, SEGAN, a deep generative model, has attracted attention due to its high performance. In this paper, we propose a method that sparsifies the latent vectors to further enhance the noise suppression effect of SEGAN.
Single-channel speech de-reverberation and de-noising is a challenging problem, since directional information is not available in a single channel, unlike in multi-channel approaches. Several deep neural network (DNN) based solutions have been proposed in the recent past to solve this problem. These solutions are sequential and de-reverberate the signal after de-noising. Additionally, these solutions have not used the maximum a posteriori (MAP) method, which requires knowledge of the prior. In this work, a MAP method is proposed to solve the speech de-reverberation and de-noising problem jointly. A half quadratic splitting (HQS) method is used to solve the joint MAP problem in a DNN framework by splitting it into two minimization problems. The deep prior is modeled using a latent variable and obtained using an iterative method. The performance of the proposed method is illustrated using subjective and objective measures. Experiments on continuous speech recognition are also used to demonstrate the significance of this method.
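As a reminder of how half quadratic splitting turns one MAP problem into two minimization problems, the generic form is sketched below. The abstract does not give the signal model, so everything here is an assumption for illustration: a linear degradation operator $A$ (for example, convolution with a room impulse response), Gaussian noise, a prior term $\Phi$, and the weights $\lambda$ and $\mu$.

```latex
% Assumed MAP objective: data fidelity plus a prior term on the clean speech x
\hat{x} = \arg\min_{x} \; \tfrac{1}{2}\,\lVert y - A x \rVert_2^2 + \lambda\,\Phi(x)

% HQS introduces an auxiliary variable z and a quadratic coupling penalty
\min_{x,\,z} \; \tfrac{1}{2}\,\lVert y - A x \rVert_2^2 + \lambda\,\Phi(z)
              + \tfrac{\mu}{2}\,\lVert x - z \rVert_2^2

% Alternating minimization yields the two sub-problems
x_{k+1} = \arg\min_{x} \; \tfrac{1}{2}\,\lVert y - A x \rVert_2^2
          + \tfrac{\mu}{2}\,\lVert x - z_k \rVert_2^2
        = \left(A^{\top} A + \mu I\right)^{-1} \left(A^{\top} y + \mu z_k\right)

z_{k+1} = \arg\min_{z} \; \tfrac{\mu}{2}\,\lVert x_{k+1} - z \rVert_2^2 + \lambda\,\Phi(z)
```

Under these assumptions the $x$-update has a closed form that depends only on the degradation model, while the $z$-update depends only on the prior, which is the step where a learned (deep) prior would enter.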