This letter proposes a method for estimating a convolutional beamformer that can perform denoising and dereverberation simultaneously in an optimal way. The application of dereverberation based on a weighted prediction error (WPE) method followed by denoising based on a minimum variance distortionless response (MVDR) beamformer has conventionally been considered a promising approach; however, the optimality of this approach cannot be guaranteed. To realize the optimal integration of denoising and dereverberation, we present a method that unifies the WPE dereverberation method and a variant of the MVDR beamformer, namely a minimum power distortionless response beamformer, into a single convolutional beamformer, and we optimize it based on a single unified optimization criterion. The proposed beamformer is referred to as a weighted power minimization distortionless response beamformer. Experiments show that the proposed method substantially improves the speech enhancement performance in terms of both objective speech enhancement measures and automatic speech recognition performance.
An important problem in ad-hoc microphone speech separation is how to guarantee the robustness of a system with respect to the locations and numbers of microphones. The former requires the system to be invariant to different indexing of the microphones with the same locations, while the latter requires the system to be able to process inputs with varying dimensions. Conventional optimization-based beamforming techniques satisfy these requirements by definition, while for deep learning-based end-to-end systems those constraints are not fully addressed. In this paper, we propose transform-average-concatenate (TAC), a simple design paradigm for channel permutation and number invariant multi-channel speech separation. Based on the filter-and-sum network (FaSNet), a recently proposed end-to-end time-domain beamforming system, we show how TAC significantly improves the separation performance across various numbers of microphones in noisy reverberant separation tasks with ad-hoc arrays. Moreover, we show that TAC also significantly improves the separation performance with a fixed-geometry array configuration, further proving the effectiveness of the proposed paradigm in the general problem of multi-microphone speech separation.
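The transform-average-concatenate idea can be illustrated with a minimal sketch. It assumes a single feature vector per microphone, random untrained weights, and plain ReLU layers; the names `tac`, `linear_relu`, and `random_matrix` are hypothetical, and details of the published design such as its choice of nonlinearity or any residual connection are not reproduced here.

```python
import random

def linear_relu(weights, x):
    """Apply y = ReLU(W x), with W given as a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def random_matrix(rows, cols, rng):
    """Illustrative untrained weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def tac(channels, w_transform, w_average, w_concat):
    """One simplified TAC step over a list of per-channel feature vectors."""
    # Transform: the same layer is applied to every channel independently.
    transformed = [linear_relu(w_transform, x) for x in channels]
    # Average: pooling across channels removes dependence on order and count.
    n = len(transformed)
    mean = [sum(col) / n for col in zip(*transformed)]
    pooled = linear_relu(w_average, mean)
    # Concatenate: append the pooled vector to each channel, then mix.
    return [linear_relu(w_concat, t + pooled) for t in transformed]

rng = random.Random(0)
feat, hidden = 4, 6
w_t = random_matrix(hidden, feat, rng)
w_a = random_matrix(hidden, hidden, rng)
w_c = random_matrix(feat, 2 * hidden, rng)

# Three microphones, then the same microphones in a different order.
mics = [[rng.uniform(-1.0, 1.0) for _ in range(feat)] for _ in range(3)]
out = tac(mics, w_t, w_a, w_c)
out_swapped = tac([mics[2], mics[0], mics[1]], w_t, w_a, w_c)

# The same weights also accept a different number of microphones.
more_mics = mics + [[rng.uniform(-1.0, 1.0) for _ in range(feat)] for _ in range(2)]
out_five = tac(more_mics, w_t, w_a, w_c)
```

Because the only cross-channel operation is a mean, reordering the inputs merely reorders the outputs, and the weight shapes never depend on the number of microphones.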
Microphones for hearing aid systems are required to have high sensitivity, an appropriate bandwidth, and a wide dynamic range. In this paper, a high-sensitivity microphone, 4 mm in diameter and using a multilayer graphene–PMMA laminated diaphragm that can be applied in hearing aids, is designed, optimized, and implemented. Typically, polyphenylene sulfide (PPS) has been used for the diaphragm of electret condenser microphones (ECM), and this method provides simple, low-cost mass production. Generally, the sensitivity of the commercial 4 mm diameter ECM is about −30 to 35 dB (0 dB = 1 V/Pa). A microphone using a nanometer-thick graphene diaphragm has been found to have higher sensitivity than the conventional ECM. However, nanometer-thick multilayer graphene is vulnerable to large mechanical shocks or high sound pressures, and the practical production of nanometer-thick diaphragms also poses a challenge. Conversely, if a multilayer graphene diaphragm of the same thickness as the conventional ECM is used, displacement during diaphragm vibration will be severely attenuated due to the high elastic modulus of graphene, and the microphone sensitivity will be greatly reduced. In this paper, we fabricate a multilayer graphene/poly(methyl methacrylate) (PMMA) laminated diaphragm with sensitivity higher than that of any other microphone currently available for hearing aids, with the appropriate bandwidth in the auditory range. The high sensitivity arises from the laminated structure of the thin graphene membrane with high elastic modulus and from the PMMA membrane with lower elastic modulus and higher dielectric constant. The optimal thickness ratio of the graphene–PMMA layered diaphragm was studied by both analytical and experimental methods, and then a fabricated diaphragm was assembled in a 4 mm diameter microphone package. The performance of the implemented microphone was evaluated, including the sensitivity and total harmonic distortion. 
It is demonstrated that the microphone using a multilayer graphene–PMMA diaphragm has an excellent sensitivity of −20 dB and a dynamic range of 90 dB, which is on average 9 dB higher than the microphone using the conventional ECM diaphragm.
This paper is dedicated to the design of fully steerable linear differential microphone arrays (LDMAs). We analyze the steerable ideal spatial responses and explain why conventional LDMAs consisting of only omnidirectional microphones have limited steering ability. In order to circumvent this limitation, we suggest using both omnidirectional and bidirectional (with a dipole-shaped directivity pattern) microphones. We discuss the minimum numbers of omnidirectional and bidirectional sensors required for achieving steerable spatial responses and present a method to design fully steerable differential beamformers with LDMAs through the Jacobi-Anger series expansion. Simulations validate the presented technique and the steering flexibility of the designed LDMAs.
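For reference, the Jacobi-Anger expansion in its standard textbook form is given below. Here $x$ stands for a generic real argument (in array processing, typically a product of wavenumber and sensor spacing), $\theta$ for the incidence angle, and $J_n$ for the Bessel function of the first kind of order $n$; these symbols are assumptions for illustration, and the sign convention and truncation order used in the paper itself may differ.

```latex
e^{j x \cos\theta}
  = \sum_{n=-\infty}^{\infty} j^{n} J_{n}(x)\, e^{j n \theta}
  = J_{0}(x) + 2 \sum_{n=1}^{\infty} j^{n} J_{n}(x) \cos(n\theta)
```

Truncating the sum at a finite order $N$ approximates the plane-wave phase term by a finite set of circular harmonics, which is what makes the expansion convenient for matching a desired directivity pattern.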
This article presents a lumped-parameter model providing a deeper understanding of the compliant backplate in capacitive MEMS microphones. Some previous models simplify the backplate as stationary, whereas others treat it as vibrating. This work not only models the backplate as vibrating but also considers the coupling effect between the mechanical and electrical domains. The extended model allows for a more detailed analysis of how the microphone converts sound into an electrical signal. Specifically, the theoretical derivations using Lagrange equations show how backplate motion can impact the microphone's performance. Analysis of the lumped-parameter model aligns well with the results of finite element analysis when the frequency is below the high-order resonance, validating the theoretical concepts. In particular, the model with electrical coupling of the vibrating backplate effectively captures the sensitivity dip resulting from the backplate resonance, unlike models lacking this coupling. The theoretical framework is also extended to the phenomenon of pull-in. A backplate that is overly compliant can narrow the operating frequency range and increase the likelihood of experiencing pull-in. Thus, there is a trade-off between optimizing the microphone's acoustic performance and ensuring its mechanical robustness. This work provides valuable insights into navigating these trade-offs.
This study proposes a complex spectral mapping approach for single- and multi-channel speech enhancement, where deep neural networks (DNNs) are used to predict the real and imaginary (RI) components of the direct-path signal from noisy and reverberant ones. The proposed system contains two DNNs. The first one performs single-channel complex spectral mapping. The estimated complex spectra are used to compute a minimum variance distortionless response (MVDR) beamformer. The RI components of beamforming results, which encode spatial information, are then combined with the RI components of the mixture to train the second DNN for multi-channel complex spectral mapping. With estimated complex spectra, we also propose a novel method of time-varying beamforming. State-of-the-art performance is obtained on the speech enhancement and recognition tasks of the CHiME-4 corpus. More specifically, our system obtains 6.82%, 3.19% and 1.99% word error rates (WER), respectively, on the single-, two-, and six-microphone tasks of CHiME-4, significantly surpassing the current best results of 9.15%, 3.91% and 2.24% WER.
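As background on the MVDR step, the standard narrowband solution for one frequency bin is w = R⁻¹d / (dᴴR⁻¹d), where R is the noise covariance matrix and d the steering vector. The sketch below evaluates it on a toy three-microphone example. The steering vectors, interferer power, and helper names `solve` and `mvdr_weights` are all hypothetical, and the covariance is built by hand, whereas the system described above estimates its statistics from DNN outputs.

```python
import cmath

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot magnitude for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0j] * n
    for r in reversed(range(n)):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x

def mvdr_weights(noise_cov, steering):
    """Return w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    r_inv_d = solve(noise_cov, steering)
    denom = sum(d.conjugate() * v for d, v in zip(steering, r_inv_d))
    return [v / denom for v in r_inv_d]

# Toy setup (assumed values): target and interferer steering vectors.
num_mics = 3
steering = [cmath.exp(-1j * 0.5 * m) for m in range(num_mics)]
interferer = [cmath.exp(-1j * 2.0 * m) for m in range(num_mics)]

# Noise covariance: a strong directional interferer plus unit-power white noise.
noise_cov = [
    [10.0 * interferer[i] * interferer[j].conjugate() + (1.0 if i == j else 0.0)
     for j in range(num_mics)]
    for i in range(num_mics)
]

w = mvdr_weights(noise_cov, steering)
# Beamformer response w^H v toward the target and toward the interferer.
target_response = sum(wi.conjugate() * di for wi, di in zip(w, steering))
interferer_response = sum(wi.conjugate() * ai for wi, ai in zip(w, interferer))
```

The target response equals one up to rounding, which is the distortionless constraint, while the response toward the interferer is attenuated.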
Acoustic feedback cancellation in hearing aids makes use of adaptive filters to continuously identify and track variations in the feedback path. One of the biggest problems remaining in using adaptive filters for feedback cancellation is the biased estimation of the filter's coefficients. In order to remove the undesired correlation between the loudspeaker and incoming signal, a recent alternative scheme proposes employing an additional microphone. This microphone can provide added information to obtain an incoming signal estimate. This estimate is removed from the primary microphone signal to create the error signal which adapts the canceler's coefficients. This letter provides the theoretical analysis for the two-microphone method. It presents analytic expressions showing that the optimal solution is no longer dependent on the aforementioned signal correlation but is now mainly determined by the additional feedback path. Finally, it demonstrates simulation results with the prediction error method in terms of misalignment and maximum gain for a proposed microphone placement. The results show that a more stable solution is obtained with the proposed two-microphone approach.
This paper presents a comprehensive literature survey of MEMS-based piezoelectric microphones along with the fabrication processes involved, application domains, and methodologies used for experimentation. Advantages and limitations of existing microphones are presented with the impact of process parameters during the thin film growth. This review identifies the issues faced by the microphone technologies spanning from the invention of microphones to the most recent state-of-the-art solutions implemented to overcome or address them. A detailed comparison of performance in terms of sensitivity and dynamic range is presented here that can be used to decide the piezoelectric material and process to be used to develop sensors based on the bandwidth requirement. Electrical and mechanical properties of different piezoelectric materials such as AlN, ZnO, quartz, PZT, PVDF, and other polymers that have great potential to be used as the sensing membrane in development and deployment of these microphones are presented along with the complications faced during the fabrication. Insights on the future of these sensors and emerging application domains are also discussed.
Microphone array techniques can improve the acoustic sensing performance of drones, compared with the use of a single microphone. However, multichannel sound acquisition systems are not available in current commercial drone platforms. We present an embedded multichannel sound acquisition and recording system with eight microphones mounted on a quadcopter. The system is developed based on Bela, an embedded computing system for audio processing. The system can record the sound from multiple microphones simultaneously, can store the data locally for on-device processing, and can transmit the multichannel audio via wireless communication to a ground terminal for remote processing. We disclose the technical details of the hardware and software design and development of the system. We implement two setups that place the microphone array at different locations on the drone body. We present experimental results obtained by applying state-of-the-art drone audition algorithms to the sound recorded by the embedded system flying with a drone. It is shown that the ego-noise reduction performance achieved by the microphone array varies depending on the array placement and the location of the target sound. This observation provides valuable insights into hardware development for drone audition.