The performance of objective speech and audio quality measures for the prediction of the perceived quality of frequency-compressed speech in hearing aids is investigated in this paper. A number of existing quality measures have been applied to speech signals processed by a hearing aid, which compresses speech spectra along frequency in order to make information contained in higher frequencies audible for listeners with severe high-frequency hearing loss. Quality measures were compared with subjective ratings obtained from normal hearing and hearing impaired children and adults in an earlier study. High correlations were achieved with quality measures computed by quality models that are based on the auditory model of Dau et al., namely, the measure PSM, computed by the quality model PEMO-Q; the measure qC, computed by the quality model proposed by Hansen and Kollmeier; and the linear subcomponent of the HASQI. For the prediction of quality ratings by hearing impaired listeners, extensions of some models incorporating hearing loss were implemented and shown to achieve improved prediction accuracy. Results indicate that these objective quality measures can potentially serve as tools for assisting in the initial setting of frequency compression parameters.
A new method for the objective assessment and prediction of perceived audio quality is introduced. It represents an expansion of the speech quality measure qC, introduced by Hansen and Kollmeier, and is based on a psychoacoustically validated, quantitative model of the "effective" peripheral auditory processing by Dau et al. To evaluate the audio quality of a given distorted signal relative to a corresponding high-quality reference signal, the auditory model is employed to compute "internal representations" of the signals, which are partly assimilated in order to account for assumed cognitive aspects. The linear cross-correlation coefficient of the assimilated internal representations constitutes the perceptual similarity measure (PSM). PSM shows good correlations with subjective quality ratings if different types of audio signals are considered separately, whereas better signal-independent quality prediction is achieved by a second quality measure, PSMt, given by the fifth percentile of the sequence of instantaneous audio quality values PSM(t). The new measures were evaluated using a large database of subjective listening tests that were originally carried out on behalf of the International Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG) for the evaluation of various low bit-rate audio codecs. Additional tests with data unknown in the development phase of the model were carried out. Except for linear distortions, the new method shows a higher prediction accuracy than the ITU-R recommendation BS.1387 ("PEAQ") for the tested data.
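The two measures lend themselves to a compact illustration. The Python sketch below assumes that the auditory-model and assimilation stages described above have already produced the two internal representations as arrays; the function names are ours, not PEMO-Q's:

```python
import numpy as np

def psm(internal_ref: np.ndarray, internal_test: np.ndarray) -> float:
    # PSM: linear cross-correlation coefficient of the (assimilated)
    # internal representations of reference and test signal.
    return float(np.corrcoef(internal_ref.ravel(), internal_test.ravel())[0, 1])

def psm_t(instantaneous_psm: np.ndarray) -> float:
    # PSM_t: fifth percentile of the sequence of instantaneous
    # quality values PSM(t), computed frame by frame elsewhere.
    return float(np.percentile(instantaneous_psm, 5))
```

Taking a low percentile rather than the mean weights the poorest segments of the signal, which is one plausible reason for the better signal-independent prediction reported above.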
• First validation study of a perceptual model to predict the benefit of spatial unmasking in terms of perceived listening effort.
• Unified framework for predicting both speech intelligibility and listening effort from the same model output.
• No auxiliary information required, i.e., mixed binaural input signals can be used directly to derive model predictions.
• Model framework implemented for online processing, making it applicable to speech perception monitoring in close to real time.
Speech perception is strongly affected by noise and reverberation in the listening room, and binaural processing can substantially facilitate speech perception when target speech and maskers originate from different directions. Most studies and proposed models for predicting spatial unmasking have focused on speech intelligibility. The present study introduces a model framework that predicts both speech intelligibility and perceived listening effort from the same output measure. The framework is based on a combination of a blind binaural processing stage employing a blind equalization cancelation (EC) mechanism, and a blind backend based on phoneme probability classification. Neither frontend nor backend requires any additional information, such as the source directions, the signal-to-noise ratio (SNR), or the number of sources, allowing for a fully blind perceptual assessment of binaural input signals consisting of target speech mixed with noise. The model is validated against a recent data set in which speech intelligibility and perceived listening effort were measured for a range of acoustic conditions differing in reverberation and binaural cues (Rennies and Kidd, 2018, J. Acoust. Soc. Am. 144, 2147-2159). Predictions of the proposed model are compared with a non-blind binaural model consisting of a non-blind EC stage and a backend based on the speech intelligibility index. The analyses indicated that all main trends observed in the experiments were correctly predicted by the blind model. The overall proportion of variance explained for speech intelligibility (R² = 0.94) was slightly lower than for the non-blind model (R² = 0.98). For listening effort predictions, both models showed lower prediction accuracy, but still explained significant proportions of the observed variance (R² = 0.88 and R² = 0.71 for the non-blind and blind model, respectively). Closer inspection showed that the differences between data and predictions were largest for binaural conditions at high SNRs, where the perceived listening effort of human listeners tended to be underestimated by the models, specifically by the blind version.
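To make the EC idea concrete, here is a minimal, non-blind toy version in Python. The brute-force search over interaural delay and gain, the variable names, and the delay range are our assumptions; the blind EC stage of the model estimates its parameters without oracle-style exhaustive search:

```python
import numpy as np

def ec_cancel(left: np.ndarray, right: np.ndarray,
              fs: int, max_delay_ms: float = 0.7) -> np.ndarray:
    """Toy equalization-cancellation (EC) step: search over interaural
    delays and gains, equalize the right channel and subtract it from
    the left, and keep the residual with the lowest energy, i.e. the
    best cancellation of the interaurally coherent (masker) component.
    Illustrative only, not the model's blind EC mechanism."""
    max_lag = int(max_delay_ms * 1e-3 * fs)
    best, best_energy = left - right, np.sum((left - right) ** 2)
    for lag in range(-max_lag, max_lag + 1):
        shifted = np.roll(right, lag)
        # least-squares gain that minimizes the residual energy for this lag
        gain = np.dot(left, shifted) / (np.dot(shifted, shifted) + 1e-12)
        residual = left - gain * shifted
        energy = np.sum(residual ** 2)
        if energy < best_energy:
            best, best_energy = residual, energy
    return best
```

In a full model this operation is applied per frequency band and per time frame, and the residual is passed on to the intelligibility or effort backend.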
The effort required to listen to and understand noisy speech is an important factor in the evaluation of noise reduction schemes. This paper introduces a model for Listening Effort prediction from Acoustic Parameters (LEAP). The model is based on methods from automatic speech recognition, specifically on performance measures that quantify the degradation of phoneme posteriorgrams produced by a deep neural network: noise or artifacts introduced by speech enhancement often result in a temporal smearing of phoneme representations, which is measured by comparison of phoneme vectors. This procedure does not require a priori knowledge about the processed speech and is therefore single-ended. The proposed model was evaluated using three datasets of noisy speech signals with listening effort ratings obtained from normal hearing and hearing impaired subjects. The prediction quality was compared to several baseline models, such as the ITU-T standard P.563 for single-ended speech quality assessment, the American National Standard ANIQUE+ for single-ended speech quality assessment, and a single-ended SNR estimator. In all three datasets, the proposed model achieved clearly better prediction accuracies than the baseline models; correlations with subjective ratings were above 0.9. So far, the model has been trained on the specific noise types used in the evaluation. Future work will be concerned with overcoming this limitation by training the model on a variety of noise types in a multi-condition fashion so that it generalizes to unknown noise types.
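The abstract does not spell out the degradation measure, so the following Python sketch only illustrates the general idea of quantifying temporal smearing by comparing phoneme posterior vectors across time lags; the distance measure and lag range are assumptions, not the actual LEAP definition:

```python
import numpy as np

def posteriorgram_smearing(post: np.ndarray, max_lag: int = 20) -> np.ndarray:
    """Toy smearing measure for a phoneme posteriorgram (frames x phonemes):
    mean symmetric KL divergence between posterior vectors as a function
    of time lag. Noise and enhancement artifacts tend to flatten this
    curve, because neighbouring frames stay similar for longer."""
    eps = 1e-12
    p = post + eps
    curve = []
    for lag in range(1, max_lag + 1):
        a, b = p[:-lag], p[lag:]
        kl = np.sum(a * np.log(a / b) + b * np.log(b / a), axis=1)
        curve.append(kl.mean())
    return np.array(curve)
```

A scalar summary of such a curve could then be mapped to a listening effort rating; no reference signal is needed, which is what makes the approach single-ended.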
• A new single-ended listening effort prediction method is proposed.
• The method achieves high correlations (r > 0.9) with subjective ratings.
• It clearly outperforms standard methods for single-ended speech quality assessment.
Objective: The perceived qualities of nine different single-microphone noise reduction (SMNR) algorithms were to be evaluated and compared in subjective listening tests with normal hearing and hearing impaired (HI) listeners. Design: Speech samples with added traffic noise or party noise were processed by the SMNR algorithms. Subjects rated the amount of speech distortions, the intrusiveness of the background noise, listening effort and overall quality, using a simplified MUSHRA (ITU-R, 2003) assessment method. Study sample: 18 normal hearing and 18 moderately HI subjects participated in the study. Results: Significant differences between the rating behaviours of the two subject groups were observed: while normal hearing subjects clearly differentiated between different SMNR algorithms, HI subjects rated all processed signals very similarly. Moreover, in contrast to normal hearing subjects, HI subjects rated the speech distortions of the unprocessed, noisier signals as more severe than the distortions of the processed signals. Conclusions: It seems harder for HI listeners to distinguish between additive noise and speech distortions, and/or they may have a different understanding of the term "speech distortion" than normal hearing listeners have. The findings confirm that the evaluation of SMNR schemes for hearing aids should always involve HI listeners.
This study investigated the effects of different adjustment criteria and sound scenes on self-adjusted hearing-aid gain settings. Self-adjusted settings were evaluated for speech recognition in noise, perceived listening effort, and preference.
This study evaluated a semi-supervised self-adjustment fine-tuning procedure that presents realistic everyday sound scenes in a laboratory environment and uses a two-dimensional user interface enabling simultaneous changes in amplitude and spectral slope (a possible parameter mapping is sketched after this abstract). While exploring the two-dimensional space of parameter settings, the hearing-aid users were instructed to optimise either listening comfort or speech understanding.
Twenty experienced hearing aid users (median age 69.5 years) were invited to participate in this study.
Adjustment criterion and sound scenes had a significant effect on preferred gain settings. No differences in signal-to-noise ratios required for 50% speech intelligibility or in the perceived listening effort were observed between the adjusted settings of the two adjustment criteria. There was a preference for the self-adjusted settings over the prescriptive first fit.
Listeners could reliably select their preferred gains according to the two adjustment criteria and for different speech stimuli.
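As a purely illustrative sketch of how such a two-dimensional adjustment surface could be mapped to amplification parameters (the axis assignments, gain ranges, and pivot frequency below are assumptions, not the interface evaluated in the study):

```python
import numpy as np

def map_touch_to_gains(x: float, y: float, freqs_hz: np.ndarray) -> np.ndarray:
    """Map a normalised touch position (x, y in [-1, 1]) to per-band gain
    offsets in dB: y controls overall amplitude (+/-10 dB), x tilts the
    spectral slope (+/-6 dB per decade around a 1 kHz pivot).
    All ranges are illustrative assumptions."""
    overall = 10.0 * y
    tilt = 6.0 * x * np.log10(freqs_hz / 1000.0)
    return overall + tilt

# Example: gain offsets at a handful of audiometric frequencies
print(map_touch_to_gains(0.5, -0.2, np.array([250, 500, 1000, 2000, 4000])))
```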
A method for an automated system for speech audiometry is introduced and evaluated using pre-recorded responses as well as spontaneous utterances produced by listeners during a real measurement. A hearing test is performed using automatic speech recognition (ASR) and is based on the matrix sentence test, which is used clinically for diagnostics and the fitting of hearing devices as well as in psychoacoustic research. The test measures the speech reception threshold (SRT), i.e., the signal-to-noise ratio at which the subject achieves a 50% word recognition rate. A major disadvantage of current testing procedures is the requirement of a human expert who supervises the test and logs the listener's responses. An automated system reduces the required resources and therefore provides a tool for frequent assessment of the SRT, which can contribute to an early diagnosis of hearing loss. The accuracy of the ASR-based SRT measurement is compared to results obtained with a human supervisor. To this end, two databases are used that contain either well-controlled read utterances resembling typical responses during SRT measurements, produced by 17 speakers, or spontaneous responses collected during real SRT measurements using ASR. Twenty normal-hearing and seven slightly to moderately hearing-impaired subjects participated in the collection of this spontaneous speech. In order to assess the SRT accuracy for read speech, two simulation schemes are proposed that employ Monte Carlo tests to simulate a listener's profile and corresponding responses; these are validated with the real measurement data. We show that ASR deletion rates of 0.9% and insertion rates of 2.9% for matrix test words are sufficiently low to obtain accurate SRT measurements in the range of 0.5 dB SNR. This is comparable to the test-retest accuracy obtained with human supervisors. While ASR errors are overestimated when using the controlled speech material in comparison to spontaneous speech, this error type has minimal effect on SRT estimation. Hence, the use of pre-recorded, read speech material is sufficient when evaluating the accuracy of speech-controlled, automated listening tests.
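A minimal Monte Carlo sketch of the measurement principle in Python: a logistic psychometric function serves as the simulated listener profile and a simplified 1 dB up-down rule tracks 50% word recognition. The clinical matrix test uses a more refined adaptive procedure, so everything below (SRT, slope, step size, track length) is an assumed illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def word_score(snr_db, srt_db, slope_per_db=0.15, n_words=5):
    # Simulated listener: logistic psychometric function with the given SRT
    # and slope at the SRT; each matrix sentence contains 5 test words.
    p = 1.0 / (1.0 + np.exp(-4.0 * slope_per_db * (snr_db - srt_db)))
    return rng.binomial(n_words, p) / n_words

def measure_srt(true_srt_db, n_sentences=20, start_snr_db=0.0):
    # Simplified adaptive track aiming at 50% word recognition:
    # lower the SNR after >= 50% correct, raise it otherwise.
    snr, presented = start_snr_db, []
    for _ in range(n_sentences):
        presented.append(snr)
        snr += -1.0 if word_score(snr, true_srt_db) >= 0.5 else 1.0
    # SRT estimate: mean presented SNR over the second half of the track
    return float(np.mean(presented[n_sentences // 2:]))

# Monte Carlo: spread of SRT estimates for a listener with a true SRT of -7 dB SNR
estimates = [measure_srt(-7.0) for _ in range(200)]
print(np.mean(estimates), np.std(estimates))
```

Repeating the simulated track many times yields the distribution of SRT estimates, which is the kind of quantity that can be compared against the roughly 0.5 dB test-retest accuracy reported above; ASR deletions and insertions could be injected into the simulated responses to study their effect.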
Listening to the audio of TV broadcast signals can be challenging for hearing-impaired as well as normal-hearing listeners, especially when background sounds are prominent or too loud compared to the speech signal. This can result in reduced satisfaction and increased listening effort. Since the broadcast sound is usually premixed, we perform a subjective evaluation to quantify the potential of speech enhancement systems based on audio source separation and recurrent neural networks (RNN). Recently, RNNs have shown promising results in the context of sound source separation and real-time signal processing. In this paper, we separate the speech from the background signals and remix the separated sounds at a higher signal-to-noise ratio. This differs from classic speech enhancement, where usually only the extracted speech signal is exploited. The subjective evaluation with 20 normal-hearing subjects on real TV broadcast material shows that our proposed enhancement system is able to reduce the listening effort by around 2 points on a 13-point listening effort rating scale and to increase the perceived sound quality compared to the original mixture.
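The remixing step itself is simple once the separation has been done; a hedged Python sketch, where the 6 dB SNR improvement is an arbitrary example value rather than the setting used in the evaluation:

```python
import numpy as np

def remix(speech_est: np.ndarray, background_est: np.ndarray,
          snr_gain_db: float = 6.0) -> np.ndarray:
    # Recombine the separated speech and background, attenuating the
    # background so that the mixture SNR rises by snr_gain_db relative
    # to the original broadcast mix (assuming perfect separation).
    attenuation = 10.0 ** (-snr_gain_db / 20.0)
    return speech_est + attenuation * background_est
```

In practice the separation is imperfect, so artifacts in the estimated speech limit how strongly the background can be attenuated before quality degrades again; this trade-off is exactly what the subjective evaluation above quantifies.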
Objective: Against the background of the individualisation of hearing aids (HA) and pre-sets for audio devices, the aim of the study was to develop a questionnaire for profiling sound preferences and hearing habits, in order to gather additional information usable for HA fitting and for adjustment tools for audio devices. Methods: We developed a questionnaire consisting of 46 items. A postal survey was conducted with N = 622 users with a mean age of 66 years (47.9% aided with HA, 45.7% female). Results: Seven factors were identified by means of Exploratory and Confirmatory Factor Analyses: F1: 'Annoyance/distraction by background noise', F2: 'Importance of sound quality', F3: 'Noise Sensitivity', F4: 'Avoidance of unpredictable sounds', F5: 'Openness towards loud/new sounds', F6: 'Preferences for warm sounds', and F7: 'Details of environmental sounds/music'. Only the first of these factors was related to the audiogram of the user. None of the factors differed between HA users and non-users. In contrast, gender effects were found, with female respondents preferring warm sounds and being more sensitive to noise. Conclusions: The sound preference and hearing habits questionnaire (SP-HHQ) is a usable tool for profiling users with respect to sound preferences relevant for HA fitting and pre-sets for audio devices.
Objective: Two modifications of the standardised MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA), namely MUSHRA simple and MUSHRA drag&drop, were implemented and evaluated together with the original test method. The modifications were designed to maximise the accessibility of MUSHRA for elderly and technically non-experienced listeners, who constitute the typical target group in hearing aid evaluation. Design: The three MUSHRA variants were assessed based on subjective and objective measures, e.g. test-retest reliability, discrimination ability, time exposure and overall preference. With each method, participants repeated the task of rating the quality of several hearing aid algorithms four times. Study sample: Fifty listeners grouped into five subject classes were tested, including elderly and technically non-experienced participants with normal and impaired hearing. Normal-hearing, technically experienced students served as controls. Results: Both modifications can be used to obtain compatible rating results. Both were preferred over the classical MUSHRA procedure. Technically experienced listeners performed best with the modification MUSHRA drag&drop. Conclusions: The comprehensive comparison of the MUSHRA variants demonstrates that the intuitive modification MUSHRA drag&drop can be generally recommended. However, considering e.g. specific evaluation demands, we suggest a differentiated and careful application of listening test methods.