Time delay estimation has been a research topic of significant practical importance in many fields (radar, sonar, seismology, geophysics, ultrasonics, hands-free communications, etc.). It is a first stage that feeds into subsequent processing blocks for identifying, localizing, and tracking radiating sources. This area has made remarkable advances in the past few decades and continues to progress, with the aim of creating processors that are tolerant to both noise and reverberation. This paper presents a systematic overview of the state of the art in time-delay estimation algorithms, ranging from the simple cross-correlation method to advanced techniques based on blind channel identification. We discuss the pros and cons of each algorithm and outline their inherent relationships. We also provide experimental results to illustrate their performance differences in room acoustic environments, where reverberation and noise are commonly encountered.
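As a point of reference, the simplest member of this family, the basic cross-correlation method, can be sketched in a few lines of NumPy. The function name and test signals below are illustrative, not from the paper:

```python
import numpy as np

def cc_tde(x1, x2):
    """Delay (in samples) of x2 relative to x1, taken from the peak of the
    full cross-correlation sequence."""
    cc = np.correlate(x2, x1, mode="full")
    return int(np.argmax(cc)) - (len(x1) - 1)

rng = np.random.default_rng(0)
x1 = rng.standard_normal(1000)
x2 = np.concatenate((np.zeros(7), x1))[:1000]  # x1 delayed by 7 samples
delay = cc_tde(x1, x2)                          # recovers the 7-sample delay
```

The GCC variants surveyed in the paper differ from this sketch mainly in that they weight the cross-spectrum before the inverse transform to sharpen the correlation peak against noise and reverberation.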
Noise reduction, which aims at estimating the clean speech from noisy observations, has attracted a considerable amount of research and engineering attention over the past few decades. In the single-channel scenario, an estimate of the clean speech can be obtained by passing the noisy signal picked up by the microphone through a linear filter/transformation. The core issue, then, is how to find an optimal filter/transformation such that, after the filtering process, the signal-to-noise ratio (SNR) is improved but the desired speech signal is not noticeably distorted. Most existing optimal filters (such as the Wiener filter and the subspace transformation) are formulated from the mean-square error (MSE) criterion. However, with the MSE formulation, many desired properties of the optimal noise-reduction filters, such as their SNR behavior, cannot be seen. In this paper, we present a new criterion based on the Pearson correlation coefficient (PCC). We show that, in the context of noise reduction, the squared PCC (SPCC) has many appealing properties and can be used as an optimization cost function to derive many optimal and suboptimal noise-reduction filters. The clear advantage of the SPCC over the MSE is that the noise-reduction performance (in terms of SNR improvement and speech distortion) of the resulting optimal filters can be easily analyzed. This shows that, as far as noise reduction is concerned, the SPCC-based cost function is a more natural criterion to optimize than the MSE.
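A minimal sketch of the SPCC itself, and of the direct SNR link that motivates it, using a unit-power white-noise stand-in for speech (signal names and values are illustrative, not from the paper):

```python
import numpy as np

def spcc(x, y):
    """Squared Pearson correlation coefficient (SPCC) between two signals."""
    x = x - np.mean(x)
    y = y - np.mean(y)
    return np.dot(x, y) ** 2 / (np.dot(x, x) * np.dot(y, y))

# For z = x + v with x and v uncorrelated, SPCC(x, z) ~= SNR / (1 + SNR):
# the SPCC exposes the SNR directly, which the MSE criterion does not.
rng = np.random.default_rng(0)
x = rng.standard_normal(200000)   # unit-power "speech" stand-in
v = rng.standard_normal(200000)   # unit-power noise, so SNR = 1
rho2 = spcc(x, x + v)             # close to 1/2 = SNR / (1 + SNR)
```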
The problem of noise reduction has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that have been developed, the optimal Wiener filter can be considered one of the most fundamental noise reduction approaches; it has been delineated in different forms and adopted in various applications. Although it is no secret that the Wiener filter may cause detrimental effects to the speech signal (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported that show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging, as we always expect an algorithm to achieve maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal, this knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion. When no a priori knowledge is available, we can still achieve better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter.
When multiple microphone sensors are available, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.
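The SNR and distortion behavior described above can be illustrated numerically with a toy two-band Wiener gain; the per-band speech and noise powers below are illustrative values, not taken from the paper:

```python
import numpy as np

# Illustrative per-band speech and noise powers
ps = np.array([4.0, 0.25])   # clean speech power in two frequency bands
pn = np.array([1.0, 1.0])    # noise power in the same bands

H = ps / (ps + pn)           # per-band Wiener gain

snr_prior = ps.sum() / pn.sum()                    # a priori fullband SNR
snr_post = (H**2 * ps).sum() / (H**2 * pn).sum()   # a posteriori SNR

# Speech-distortion index: power of (1 - H) * speech relative to speech power.
# It is nonzero whenever any band is attenuated, showing the trade-off:
# the SNR improves, but at the price of some speech deformation.
nu = ((1.0 - H)**2 * ps).sum() / ps.sum()
```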
Although many microphone-array beamforming algorithms have been developed over the past few decades, most of them offer only limited performance in practical acoustic environments. The reason behind this has not been fully understood, and further research on this matter is indispensable. In this paper, we treat a microphone array as a multiple-input multiple-output (MIMO) system and study its signal-enhancement performance. Our major contribution is fourfold. First, we develop a general framework for analyzing the performance of beamforming algorithms based on the acoustic MIMO channel impulse responses. Second, we study the bounds on the length of the beamforming filter, which in turn reveal the performance bounds of beamforming in terms of speech dereverberation and interference suppression. Third, we address the connection between beamforming and the multiple-input/output inverse theorem (MINT). Finally, we discuss the intrinsic relationships among different classical beamforming techniques and explain, from the channel-condition perspective, what the prerequisites are for those techniques to work.
This paper addresses the problem of noise reduction in the time domain, where the clean speech sample at every time instant is estimated by filtering a vector of the noisy speech signal. Such a clean speech estimate consists of both the filtered speech and residual noise (filtered noise), as the noisy vector is the sum of the clean speech and noise vectors. Traditionally, the filtered speech is treated as the desired signal after noise reduction. This paper proposes to decompose the clean speech vector into two orthogonal components: one correlated and the other uncorrelated with the current clean speech sample. While the correlated component helps estimate the clean speech, it is shown that the uncorrelated component interferes with the estimation, just as the additive noise does. Based on this orthogonal decomposition, the paper presents a way to define the error signal and cost functions and addresses the issue of how to design different optimal noise reduction filters by optimizing these cost functions. Specifically, it discusses how to design the maximum signal-to-noise ratio (SNR) filter, the Wiener filter, the minimum variance distortionless response (MVDR) filter, the tradeoff filter, and the linearly constrained minimum variance (LCMV) filter. It demonstrates that the maximum SNR, Wiener, MVDR, and tradeoff filters are identical up to a scaling factor. It also shows, from the orthogonal decomposition, that many performance measures can be defined that seem more appropriate than the traditional ones for evaluating noise reduction filters.
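The orthogonal decomposition can be illustrated with a first-order autoregressive process standing in for speech: here one entry of the clean-speech vector, the previous sample, is split into its components correlated and uncorrelated with the current sample. This is only a sketch under that stand-in assumption, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50000
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):            # AR(1) process as a correlated "speech" stand-in
    x[t] = 0.9 * x[t - 1] + e[t]

x0 = x[1:]                       # current clean samples x(k)
xv = x[:-1]                      # one entry of the clean-speech vector: x(k-1)

gamma = np.dot(xv, x0) / np.dot(x0, x0)  # normalized correlation (about 0.9)
xc = gamma * x0                           # component correlated with x(k)
xi = xv - xc                              # residual "interference" component
```

By construction `xi` is orthogonal to the current sample, which is why it acts like additive noise in the estimation rather than contributing useful information.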
Time delay estimation (TDE) is a basic technique for numerous applications where there is a need to localize and track a radiating source. The most important TDE algorithms for two sensors are based on the generalized cross-correlation (GCC) method. These algorithms perform reasonably well when reverberation or noise is not too high. In an earlier study by the authors, a more sophisticated approach was proposed. It employs more sensors and takes advantage of their delay redundancy to improve the precision of the time difference of arrival (TDOA) estimate between the first two sensors. The approach is based on the multichannel cross-correlation coefficient (MCCC) and was found to be more robust to noise and reverberation. In this letter, we show that this approach can also be developed on the basis of joint entropy. For Gaussian signals, we show that, in the search for the TDOA estimate, maximizing the MCCC is equivalent to minimizing the joint entropy. However, when the idea is generalized to non-Gaussian signals (e.g., speech), the new joint-entropy-based TDE algorithm shows the potential to outperform the MCCC-based method.
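A minimal sketch of the MCCC computation, taking the squared MCCC as one minus the determinant of the normalized spatial correlation matrix of the (candidate-delay-aligned) signals; the signal names are illustrative:

```python
import numpy as np

def mccc_sq(signals):
    """Squared MCCC of a set of aligned signals: 1 - det(R_tilde), where
    R_tilde is the normalized spatial correlation matrix."""
    X = np.stack([s - np.mean(s) for s in signals])
    R = X @ X.T
    d = np.sqrt(np.diag(R))
    return 1.0 - np.linalg.det(R / np.outer(d, d))

rng = np.random.default_rng(0)
s = rng.standard_normal(10000)
m_aligned = mccc_sq([s, s + 0.01 * rng.standard_normal(10000), s])  # near 1
m_indep = mccc_sq([rng.standard_normal(10000) for _ in range(3)])   # near 0
```

For jointly Gaussian signals the joint entropy is, up to additive constants, proportional to the log-determinant of this same correlation matrix, which is why minimizing it over candidate delays coincides with maximizing the MCCC.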
In this correspondence, we study the performance of differential microphone arrays (DMAs) in terms of noise reduction, speech distortion, and signal-to-noise ratio (SNR) gain. We also investigate their beampatterns and array gains. We start by establishing expressions for these performance measures involving general derivatives of the channel transfer functions. We then specialize these results to anechoic near-field and far-field propagation models.
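For orientation, the familiar first-order far-field DMA beampattern, a special case of the general formulation above under the usual small-spacing approximation, can be sketched as:

```python
import numpy as np

# First-order far-field DMA beampattern (small-spacing approximation):
#   B_a(theta) = a + (1 - a) * cos(theta),  with 0 <= a <= 1
def beampattern(a, theta):
    return a + (1.0 - a) * np.cos(theta)

theta = np.linspace(0.0, np.pi, 181)
cardioid = beampattern(0.5, theta)  # a = 1/2: unity at theta = 0, null at pi
dipole = beampattern(0.0, theta)    # a = 0: null at theta = pi/2
```

Other choices of `a` yield the hypercardioid and supercardioid patterns, each trading null placement against directivity and front-to-back rejection.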
Noise reduction for speech enhancement is a useful technique, but in general it is a challenging problem. While a single-channel algorithm is easy to use in practice, it inevitably introduces distortion to the desired speech signal while reducing noise. Today, the explosive growth in computational power and the continuous drop in the cost and size of acoustic electric transducers are driving interest in employing multiple microphones in speech processing systems. This opens new opportunities for noise reduction. In this paper, we present an analysis of three multichannel noise reduction algorithms, namely the Wiener filter, the subspace method, and spatial-temporal prediction, within a common framework. We investigate whether it is possible for multichannel noise reduction algorithms to reduce noise without speech distortion. Finally, we corroborate the theoretical analyses with simulations using real impulse responses measured in the varechoic chamber at Bell Labs.
This paper presents a practical, efficient implementation of nonlinear acoustic echo cancellation (NAEC). The echo path is modeled by a novel hybrid Taylor-Volterra pre-processor followed by a linear FIR filter. A cascaded block RLS and unconstrained FLMS adaptive algorithm is developed to jointly identify the pre-processor and the FIR filter. This implementation is validated via simulations.
Bell Laboratories layered space-time (BLAST) wireless systems are multiple-antenna communication schemes that can achieve very high spectral efficiencies in scattering environments with no increase in bandwidth or transmitted power. The most popular and, by far, the most practical architecture is the so-called vertical BLAST (V-BLAST). The signal detection algorithm of a V-BLAST system is computationally very intensive. If the number of transmitters is M and is equal to the number of receivers, this complexity is proportional to M^4 at each sample time. We propose a very simple and efficient algorithm that reduces this complexity by a factor of M.
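For context, here is a sketch of the standard zero-forcing V-BLAST detector with optimal ordering (assuming BPSK symbols and a noiseless test, both illustrative choices). Recomputing the pseudoinverse of the shrinking channel matrix at every cancellation step is the source of the M^4 per-sample cost that the proposed algorithm reduces:

```python
import numpy as np

def vblast_zf(H, y):
    """Zero-forcing V-BLAST detection with optimal ordering, assuming
    BPSK (+/-1) symbols. The pseudoinverse of the remaining channel
    matrix is recomputed at every step."""
    M = H.shape[1]
    s = np.zeros(M)
    active = list(range(M))
    y = y.astype(float).copy()
    for _ in range(M):
        G = np.linalg.pinv(H[:, active])
        k = int(np.argmin(np.sum(G**2, axis=1)))  # least noise amplification
        j = active[k]
        s[j] = 1.0 if G[k] @ y >= 0 else -1.0     # nulling and slicing
        y -= H[:, j] * s[j]                        # cancel the detected stream
        active.pop(k)
    return s

rng = np.random.default_rng(3)
H = rng.standard_normal((4, 4))          # 4 transmit, 4 receive antennas
s_true = rng.choice([-1.0, 1.0], size=4)
s_hat = vblast_zf(H, H @ s_true)         # noiseless detection
```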