Peer reviewed
  • A joint framework for blind...
    Rennies, Jan; Röttges, Saskia; Huber, Rainer; Hauth, Christopher F.; Brand, Thomas

    Hearing Research, December 2022, Volume 426
    Journal Article

    • First validation study of a perceptual model to predict the benefit of spatial unmasking in terms of perceived listening effort.
    • Unified framework for predicting both speech intelligibility and listening effort from the same model output.
    • No auxiliary information required, i.e., mixed binaural input signals can be used directly to derive model predictions.
    • Model framework implemented for online processing, making it applicable to speech perception monitoring in close to real time.

    Speech perception is strongly affected by noise and reverberation in the listening room, and binaural processing can substantially facilitate speech perception when target speech and maskers originate from different directions. Most studies and proposed models for predicting spatial unmasking have focused on speech intelligibility. The present study introduces a model framework that predicts both speech intelligibility and perceived listening effort from the same output measure. The framework combines a blind binaural processing stage, which employs a blind equalization-cancellation (EC) mechanism, with a blind backend based on phoneme probability classification. Neither the frontend nor the backend requires any additional information, such as the source directions, the signal-to-noise ratio (SNR), or the number of sources, allowing for a fully blind perceptual assessment of binaural input signals consisting of target speech mixed with noise. The model is validated against a recent data set in which speech intelligibility and perceived listening effort were measured for a range of acoustic conditions differing in reverberation and binaural cues (Rennies and Kidd, 2018, J. Acoust. Soc. Am. 144, 2147-2159). Predictions of the proposed model are compared with those of a non-blind binaural model consisting of a non-blind EC stage and a backend based on the speech intelligibility index.
The analyses indicated that all main trends observed in the experiments were correctly predicted by the blind model. For speech intelligibility, the overall proportion of variance explained by the blind model (R² = 0.94) was slightly lower than for the non-blind model (R² = 0.98). For listening effort predictions, both models showed lower prediction accuracy, but still explained significant proportions of the observed variance (R² = 0.88 and R² = 0.71 for the non-blind and blind model, respectively). Closer inspection showed that the differences between data and predictions were largest for binaural conditions at high SNRs, where the perceived listening effort of human listeners tended to be underestimated by the models, particularly by the blind version.
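
The R² values quoted above compare model predictions with measured data. As a rough illustration only, the sketch below computes R² as the squared Pearson correlation between observed and predicted values. The abstract does not state which definition of R² the study used, so this choice is an assumption, and the function name `r_squared` and the numbers in the usage note are hypothetical placeholders, not data from the study.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: R^2 is taken as the squared linear correlation; the study
    may have used a different definition (e.g. 1 - SS_res / SS_tot).
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Covariance and variance sums (unnormalised; the 1/n factors cancel).
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)

    if var_obs == 0 or var_pred == 0:
        raise ValueError("R^2 is undefined when either sequence is constant")

    return (cov * cov) / (var_obs * var_pred)
```

For example, with the placeholder sequences `[1, 2, 3, 4]` (observed) and `[1, 3, 2, 4]` (predicted), `r_squared` returns 0.64, meaning that under this definition 64% of the variance in the observed values is accounted for by a linear relation to the predictions.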