The market for and applications of micro-electromechanical systems (MEMS)-based microphones have grown continuously over the last decades. This article presents a promising acoustic-sensing technology that combines consolidated MEMS technology with an innovative optical transduction technique for acoustic signals. The proposed method significantly reduces the intrinsic noise of the system and increases its signal-to-noise ratio (SNR). The designed digital optical microphone reaches an SNR of 71.6 dBA in a 5 × 5 × 2 mm³ package with an output sensitivity of −21 dBFS/Pa. This article describes each section of the system-in-package (SiP) microphone, starting from the physics behind the transduction mechanism and covering the application-specific integrated circuit (ASIC) and package design as well as the optical stack structure. A final analysis of the experimental results is provided and compared with the state of the art reported in the literature.
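The reported SNR and sensitivity imply the microphone's equivalent input noise via the standard 94 dB SPL (1 Pa) reference used for MEMS microphone specifications; a quick worked check:

```python
# Equivalent input noise (EIN) implied by the reported figures.
# MEMS microphone convention: SNR is referenced to a 94 dB SPL
# (1 Pa RMS) tone, typically at 1 kHz.
REF_SPL = 94.0       # dB SPL corresponding to 1 Pa
snr_dba = 71.6       # reported SNR, dB(A)
sensitivity = -21.0  # reported output sensitivity, dBFS/Pa

ein_dba = REF_SPL - snr_dba               # A-weighted equivalent input noise
noise_floor_dbfs = sensitivity - snr_dba  # output-referred noise floor

print(f"EIN: {ein_dba:.1f} dB(A) SPL")              # 22.4 dB(A) SPL
print(f"Noise floor: {noise_floor_dbfs:.1f} dBFS")  # -92.6 dBFS
```

The 22.4 dB(A) SPL equivalent input noise follows directly from the abstract's numbers; no additional assumptions are needed beyond the 94 dB SPL reference convention.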
In this paper, an omnidirectional source-tracking method based on the sensitivity difference of microphone pairs is proposed. In a previous conventional method, we developed a direction-estimation method based on a histogram integrating the estimation results of multiple two-microphone pairs. However, the estimation accuracy often degrades regardless of the number of microphones. When a sound source lies at a wide angle to a microphone pair, the estimated direction is highly sensitive to the measurement error of the time difference of arrival (TDOA) between the microphones, and the estimation accuracy is greatly degraded. It follows that the same reliability should not be assigned to all estimation results for the same sound source, because the estimation accuracy differs for each microphone pair even for the same source. In the proposed method, this difficulty is resolved by selecting the microphone pair used for the estimation, while the other pair is used only to judge whether the source lies in front or behind. As a result, the estimation accuracy can be improved with half the number of microphones of the conventional method. The effectiveness of the method is shown through several experimental results.
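The wide-angle sensitivity the abstract describes follows from the far-field TDOA model: since θ = arcsin(cτ/d), the error amplification dθ/dτ = c/(d·cos θ) diverges toward endfire. A minimal sketch (the spacing is an illustrative value, not from the paper):

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s
D = 0.10   # hypothetical microphone spacing, m (illustrative only)

def doa_from_tdoa(tau):
    """Far-field direction of arrival (rad) from the TDOA of one pair."""
    return np.arcsin(np.clip(C * tau / D, -1.0, 1.0))

def doa_sensitivity(theta):
    """d(theta)/d(tau) = C / (D * cos(theta)): how strongly a TDOA
    measurement error is amplified into a direction error."""
    return C / (D * np.cos(theta))

# Error amplification grows sharply at wide angles, which is why such
# pairs are demoted to a front/back decision only in the proposed method.
broadside = doa_sensitivity(np.deg2rad(0.0))   # C/D = 3.43e3
wide = doa_sensitivity(np.deg2rad(80.0))       # ~5.8x larger than broadside
```

The 1/cos θ factor quantifies why the same reliability cannot be given to every pair's estimate of the same source.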
A unidirectional microelectromechanical system (MEMS) microphone module for handsfree and voice-recognition systems in automobiles is presented. Because in-cabin noise and temperature variation degrade handsfree and voice-recognition performance, noise suppression and an increased signal-to-noise ratio (SNR) of the microphone are required. In this study, a capacitive MEMS microphone module with a high SNR and a unidirectional characteristic is achieved through the design of the structure, package, and module of the MEMS microphone. To improve the SNR, the microphone is developed using a slit-edged membrane; the slit structure is designed to release the residual stress of the membrane and thereby improve sensitivity and SNR. The unidirectional characteristic of the microphone enables suppression of noise signals from undesired directions. It is realized by attaching a porous SU-8 filter that delays the sound arriving at one of the two acoustic ports on the package. Tests on the proposed unidirectional MEMS microphone package and module show that an SNR of 62.4 dB and a front-back ratio of 27.1 dB are achieved.
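Delaying the sound at one of two ports yields a first-order directional response; when the internal delay equals the external travel time between the ports, the ideal pattern is a cardioid with a null at the rear. A generic delay-and-subtract sketch (port spacing and frequency are illustrative, not the paper's SU-8 filter parameters):

```python
import numpy as np

C = 343.0    # speed of sound, m/s
D = 0.01     # assumed acoustic port spacing, m (illustrative)
TAU = D / C  # internal delay equal to external travel time -> cardioid

def response(theta_deg, f=1000.0):
    """Magnitude response of an ideal delay-and-subtract two-port element
    at angle theta (0 deg = front), frequency f."""
    theta = np.deg2rad(theta_deg)
    w = 2 * np.pi * f
    ext = D * np.cos(theta) / C  # external inter-port delay at this angle
    return abs(1 - np.exp(-1j * w * (TAU + ext)))

front = response(0.0)    # maximum response
back = response(180.0)   # ideal cardioid null: TAU cancels the external delay
```

In the ideal model the rear null is perfect (infinite front-back ratio); the 27.1 dB measured in the paper reflects real-world mismatch between the filter delay and the port geometry.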
The MEMS microphone is a representative device of the MEMS family that has attracted substantial research interest, and variants tailored to the human voice have earned distinct commercial success. Although development continues, challenges remain, such as residual stress, environmental noise, and the need for structural innovation. To collect and summarize recent advances in this subject, this paper presents a concise review of the transduction mechanisms, diverse mechanical structure topologies, and effective noise-reduction methods for high-performance MEMS microphones with a dynamic range covering the audible spectrum, aiming to provide a comprehensive and adequate analysis of this scope.
Bone-conduction and in-ear microphones pick up bone-conducted (BC) speech, which has traveled through bones and soft tissues. BC speech is less sensitive to surrounding noise than air-conducted (AC) speech, so there is interest in recording this signal. However, the intelligibility and quality of these microphones are known to limit their use; previous work has noted confusion between vowel sounds in terms of intelligibility. To evaluate quality and intelligibility, studies rely on subjective and objective tests. This paper aims to determine whether the standard objective methods for rating speech quality and intelligibility can be applied to BC speech recorded through these microphones. To estimate intelligibility, a subjective test based on vowel recognition was compared with STOI and with a new criterion based on the second-formant frequencies of oral vowels. For speech-quality estimation, MUSHRA and PESQ tests were compared. The results show the difficulty of substituting objective methods for subjective ones; for ongoing studies, it is therefore suggested to use subjective methods to evaluate speech quality and intelligibility.
• Objective and subjective evaluation of speech quality do not correlate.
• Introduction of a new objective metric to evaluate vowel recognition.
• The new objective metric for vowel recognition works only for in-ear microphones.
• Objective and subjective evaluation of intelligibility do not correlate.
We propose an integrated end-to-end automatic speech recognition (ASR) paradigm based on joint learning of front-end speech signal processing and back-end acoustic modeling. We believe that "only good signal processing can lead to top ASR performance" in challenging acoustic environments. This notion leads to a unified deep neural network (DNN) framework for distant speech processing that can achieve both high-quality enhanced speech and high-accuracy ASR simultaneously. Our goal is accomplished by two techniques: (i) a reverberation-time-aware DNN-based speech dereverberation architecture that can handle a wide range of reverberation times to enhance the quality of reverberant and noisy speech, followed by (ii) DNN-based multicondition training that takes both clean-condition and multicondition speech into consideration, leveraging data acquired and processed with multichannel microphone arrays, to improve ASR performance. The final end-to-end system is established by a joint optimization of the speech enhancement and recognition DNNs. The recent REverberant Voice Enhancement and Recognition Benchmark (REVERB) Challenge task is used as a test bed for evaluating the proposed framework. We first report objective measures of the enhanced speech superior to those listed in the 2014 REVERB Challenge Workshop on the simulated-data test set. Moreover, we obtain the best single-system word error rate (WER) of 13.28% on the 1-channel REVERB simulated data with the proposed DNN-based pre-processing algorithm and clean-condition training. Leveraging joint training with more discriminative ASR features and improved neural-network-based language models, a low single-system WER of 4.46% is attained. Next, a new multi-channel-condition joint learning and testing scheme delivers a state-of-the-art WER of 3.76% on the 8-channel simulated data with a single ASR system.
Finally, we also report on a preliminary yet promising experimentation with the REVERB real test data.
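The multicondition training data the framework relies on is typically synthesized by convolving clean utterances with room impulse responses (RIRs) and adding noise at controlled SNRs. A minimal sketch with toy signals (real pipelines use measured RIRs spanning many reverberation times; all signals here are synthetic placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_multicondition(clean, rir, snr_db):
    """Simulate one multicondition training example: reverberate a clean
    utterance with a room impulse response, then add noise at a given SNR."""
    reverberant = np.convolve(clean, rir)[: len(clean)]
    noise = rng.standard_normal(len(clean))
    # Scale the noise to the requested SNR relative to the reverberant signal.
    sig_pow = np.mean(reverberant ** 2)
    noise_pow = np.mean(noise ** 2)
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverberant + scale * noise

# Toy stand-ins: 1 s of "speech" at 16 kHz and a crude decaying "RIR".
clean = rng.standard_normal(16000)
rir = np.exp(-np.arange(800) / 200.0) * rng.standard_normal(800)
noisy = make_multicondition(clean, rir, snr_db=10.0)
```

Pairing each such noisy example with its clean counterpart gives the (input, target) data used for both the dereverberation DNN and multicondition acoustic-model training.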
Directional noise reduction is an essential and challenging task when using Bluetooth headsets to make phone calls, using hearing aids to receive sound, or interacting with intelligent robots. Most existing methods suffer from problems such as a fixed desired incidence direction and poor noise reduction. To address these problems, a directional noise-suppression method based on dual microphones with a preset desired direction is proposed. A signal model is first constructed by introducing a virtual microphone. Then, by balancing and offsetting the noise components in the dual-channel differential signals, the amplitude spectrum of the desired speech is recovered, with a determinant-based approach used to make the speech-presence decision. Finally, considering that the phase distortion caused by noise will affect speech quality, the phase spectrum of the noisy speech is optimized by a compensation function and then combined with the enhanced amplitude spectrum to reconstruct the desired speech. The experimental results show that the proposed method achieves better performance in suppressing directional noise.
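The core idea of cancelling a directional component in dual-channel differential signals can be illustrated with a generic per-bin delay-and-subtract null steer (this is a textbook sketch, not the paper's virtual-microphone formulation; spacing and angles are illustrative):

```python
import numpy as np

C, D = 343.0, 0.02  # speed of sound (m/s); assumed mic spacing (m)

def null_steer(x1_fft, x2_fft, freqs, null_deg):
    """Per-bin differential output with a spatial null preset at null_deg.
    A plane wave from null_deg reaches mic 2 tau seconds after mic 1;
    advancing mic 2 by tau makes the two channels cancel in the difference."""
    tau = D * np.cos(np.deg2rad(null_deg)) / C
    return x1_fft - x2_fft * np.exp(1j * 2 * np.pi * freqs * tau)

# Verify: a plane wave arriving from the preset null direction cancels.
freqs = np.array([500.0, 1000.0, 2000.0])
tau = D * np.cos(np.deg2rad(60.0)) / C
x1 = np.ones(3, dtype=complex)                   # interferer at mic 1
x2 = x1 * np.exp(-1j * 2 * np.pi * freqs * tau)  # same wave at mic 2
out = null_steer(x1, x2, freqs, null_deg=60.0)   # ~0 in every bin
```

Steering the null toward the interference while leaving the preset desired direction unattenuated is the spatial counterpart of the amplitude-spectrum balancing described in the abstract.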
Acoustic sensing through optical transduction represents a promising alternative to the conventional capacitive sensing used in MEMS microphones, especially for ultra-low-noise applications. The traditional acoustic-to-electrical transduction stages are decoupled by an intermediate conversion of the signal into the optical domain. As a result, the mechanical design of the sensor has no direct influence on the electrical readout performance. This allows a significant reduction of the MEMS transducer noise through aggressive, acoustically semi-transparent stator designs, which address one of the limits of standard capacitive technologies. This paper reports the design and modeling of the sensing elements of a MEMS optical microphone. The basic transduction mechanism is presented, the main design parameters and challenges are explained and analyzed with advanced modeling techniques, and the measurement results are finally compared with the expected performance.
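One common optical readout for MEMS microphones is the diffraction-grating interferometer, where membrane displacement modulates the intensity split between diffraction orders. A sketch of that idealized model (the wavelength is an assumed illustrative value, and this may differ from the specific readout used in the paper):

```python
import numpy as np

LAMBDA = 850e-9  # assumed laser wavelength, m (illustrative VCSEL-class value)

def order_intensities(gap):
    """Idealized grating-interferometer model: normalized intensity of the
    0th and 1st diffraction orders as a function of membrane-grating gap."""
    phase = 2 * np.pi * gap / LAMBDA
    return np.cos(phase) ** 2, np.sin(phase) ** 2

# Biasing the gap at lambda/8 (quadrature) places the readout on the
# steepest part of the fringe, i.e. maximum displacement sensitivity.
i0, i1 = order_intensities(LAMBDA / 8)  # both 0.5 at quadrature
```

In this model the two orders are complementary (their intensities sum to 1), so a differential photodiode readout at quadrature rejects common-mode laser intensity noise while doubling the displacement signal.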
This letter proposes a method for estimating a convolutional beamformer that can perform denoising and dereverberation simultaneously in an optimal way. Applying dereverberation based on a weighted prediction error (WPE) method followed by denoising based on a minimum variance distortionless response (MVDR) beamformer has conventionally been considered a promising approach; however, the optimality of this approach cannot be guaranteed. To realize the optimal integration of denoising and dereverberation, we present a method that unifies the WPE dereverberation method and a variant of the MVDR beamformer, namely a minimum power distortionless response (MPDR) beamformer, into a single convolutional beamformer, and we optimize it based on a single unified optimization criterion. The proposed beamformer is referred to as a weighted power minimization distortionless response (WPD) beamformer. Experiments show that the proposed method substantially improves speech enhancement performance in terms of both objective speech-enhancement measures and automatic speech recognition performance.
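A minimal numpy sketch of the MPDR sub-problem that the unified convolutional beamformer generalizes: the per-frequency weights w = R⁻¹v / (vᴴR⁻¹v) minimize output power subject to a distortionless constraint toward the steering vector v (toy covariance and steering vector; this is not the paper's full convolutional formulation):

```python
import numpy as np

def mpdr_weights(R, v):
    """Minimum power distortionless response beamformer weights:
    w = R^{-1} v / (v^H R^{-1} v), with R the observed spatial covariance."""
    Rinv_v = np.linalg.solve(R, v)
    return Rinv_v / (v.conj() @ Rinv_v)

# Toy example with a random Hermitian positive-definite covariance.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R = A @ A.conj().T + 4 * np.eye(4)                        # spatial covariance
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)  # steering vector
w = mpdr_weights(R, v)
# Distortionless constraint w^H v = 1 holds by construction.
```

The WPD beamformer extends these weights over stacked current-plus-delayed frames and reweights the covariance by the estimated speech power, which is what merges the WPE and MPDR objectives into one criterion.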