As one of the most representative applications built on deep learning, audio systems, including keyword spotting, automatic speech recognition, and speaker identification, have recently been demonstrated to be vulnerable to adversarial examples, which has raised general concerns in both academia and industry. Existing attacks follow the same adversarial example generation paradigm as computer vision, i.e., overlaying optimized additive perturbations on original voices. However, because additive perturbations are inherently audible to humans, balancing stealthiness and attack capability remains a challenging problem. In this paper, we rethink the stealthiness of audio adversarial examples and introduce another kind of audio distortion, i.e., reverberation, as a new perturbation format for stealthy adversarial example generation. Such convolutional adversarial perturbations are crafted as real-world impulse responses and behave as natural reverberation to deceive humans. Based on this idea, we propose AdvReverb to construct, optimize, and deliver phoneme-level convolutional adversarial perturbations on both speech and music carriers with a well-designed objective. Experimental results demonstrate that AdvReverb achieves attack success rates over 95% on three audio-domain tasks while achieving superior perceptual quality and remaining stealthy to human perception in over-the-air and over-the-line delivery scenarios.
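The difference between the two perturbation formats named above can be illustrated with a minimal sketch. It only contrasts adding a perturbation with convolving by an impulse response; it does not reproduce AdvReverb's optimization or its phoneme-level design. The test tone, the noise level, and the synthetic decaying impulse response are all assumptions chosen for illustration.

```python
import numpy as np

def additive_perturbation(x, delta):
    """Classic format: overlay a perturbation on the waveform."""
    return x + delta

def convolutional_perturbation(x, h):
    """Reverberation-like format: convolve the waveform with an impulse response."""
    return np.convolve(x, h)[: len(x)]  # truncate to the carrier length

rng = np.random.default_rng(0)
fs = 16000                                   # assumed sample rate in Hz
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t)        # stand-in carrier: 1 s, 440 Hz tone

# Assumed additive perturbation: low-level white noise.
delta = 0.01 * rng.standard_normal(len(x))

# Assumed impulse response: direct path plus a 50 ms exponentially decaying tail.
n = np.arange(int(0.05 * fs))
h = 0.1 * np.exp(-n / (0.01 * fs)) * rng.standard_normal(len(n))
h[0] = 1.0

x_add = additive_perturbation(x, delta)
x_rev = convolutional_perturbation(x, h)
```

In an actual attack the values of `delta` or `h` would be optimized against a target model; here they are random placeholders.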
Cascaded audio systems (CASs) widely exist in consumer electronics, within which volume control is one of the most basic functions. However, the current volume adjustment schemes cannot automatically achieve optimal control, leading to performance degradation of CASs. Meanwhile, there is a lack of mathematical models and quantitative analysis relating CAS performance to the volume control position. In this study, an optimal volume control architecture named local processing and amplification (LPA) is proposed to automatically minimize the adverse impact of volume control on CAS performance. First, a general CAS model is established through effective analysis and reasonable equivalence of diverse complicated CASs. Second, the quantitative relationship between the volume control and the CAS performance is theoretically analyzed using noise figure theory, which leads to the proposed LPA architecture. Finally, the performance improvement of the proposed LPA is analyzed and compared with other volume controls over a wide volume range. Simulations and experiments using multiple indexes show that the signal-to-noise ratio (SNR) and total harmonic distortion plus noise (THD+N) improvements of the proposed architecture are up to 5.2 dB and 4.8 dB, respectively, under a typical −9 dB volume.
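To see why the position of a volume control matters under noise figure theory, the textbook Friis formula for cascaded stages can serve as a rough sketch. This is not the paper's CAS model or the LPA architecture. The two-stage chain, the 20 dB gain and 3 dB noise figure of the amplifier, and the treatment of the −9 dB volume control as a passive attenuator with a 9 dB noise figure are illustrative assumptions.

```python
import math

def db_to_lin(db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10 ** (db / 10)

def cascade_noise_figure_db(stages):
    """Friis formula: F = F1 + (F2 - 1)/G1 + (F3 - 1)/(G1*G2) + ...

    stages: list of (gain_db, noise_figure_db) in signal-flow order.
    """
    total_f = 1.0
    gain_so_far = 1.0
    for gain_db, nf_db in stages:
        total_f += (db_to_lin(nf_db) - 1.0) / gain_so_far
        gain_so_far *= db_to_lin(gain_db)
    return 10 * math.log10(total_f)

amplifier = (20.0, 3.0)   # assumed: 20 dB gain, 3 dB noise figure
volume = (-9.0, 9.0)      # assumed: passive -9 dB attenuator, noise figure 9 dB

nf_volume_first = cascade_noise_figure_db([volume, amplifier])
nf_volume_last = cascade_noise_figure_db([amplifier, volume])
print(f"volume before amplifier: {nf_volume_first:.2f} dB")
print(f"volume after amplifier:  {nf_volume_last:.2f} dB")
```

Under these assumptions, attenuating before the gain stage adds the full attenuation to the overall noise figure, while attenuating after it has little effect, which is the kind of position dependence the abstract refers to.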
Recently, the medium for signal transmission in pro-audio equipment has changed from audio cable to LAN cable. Because transmitting audio signals over LAN cable solves the problems of loss and noise caused by long-distance transmission, various transmission technologies are emerging. In particular, some technologies change the configuration of the conventional sound system by mixing different audio sources in the receiving device, in addition to providing signal transmission. However, these technologies can only be implemented on an intranet; even if they are implemented on the Internet, each receiving device has only limited independent mixing. In this paper, each Internet-based receiver selects, controls, and mixes multiple audio sources. Consequently, it is possible to provide an acoustic environment suited to a user's taste rather than a uniform sound service. The results of this study show that each receiver selects, controls, and mixes the different audio sources independently, and then outputs these audio sources. It is expected that this study will become the standard in the related industry as a new audio network protocol in the field of Audio Service.
Frequency sampling structures are being effectively used in different signal processing applications, especially for higher filter lengths with a few non-zero samples in the passband. Although a few time domain-based polyphase Rational Sampling Rate Converter (RSRC) design approaches are available, a Frequency Sampling-based polyphase RSRC design approach has not yet been developed, and hence, it is analyzed in this presented work. In the proposed work, initially, we have derived a mathematical expression of the transfer function with real-valued parameters for the polyphase component of the FIR filter in the frequency domain. Stability analysis of the proposed filter with a radius $r < 1$ gives better results. The work also discusses the usage of derived expressions to indicate the effect of different values of $r$. Thereafter, it is established that the proposed structure has a lower requirement of Multiplications Per Output Sample (MPOS). For example, for the $M = 3$, $L = 2$, $N = 420$ configuration, allowing 38 more Additions Per Output Sample (APOS) reduces the count by 24 MPOS, whereas for the $M = 10$, $L = 9$, $N = 900$ configuration, just 9 more APOS enable the proposed structure to eliminate 20 MPOS. Finally, the synthesis result validates the efficacy of the proposed structure.
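For orientation, the time-domain polyphase idea that the abstract contrasts with its frequency-sampling design can be sketched as follows. This is not the proposed frequency-sampling structure. The sketch assumes `L` is the upsampling factor and `M` the downsampling factor (the abstract does not say which is which), and it uses a random length-12 filter purely as a placeholder.

```python
import numpy as np

def resample_naive(x, h, L, M):
    """Zero-stuff by L, filter with h, then keep every M-th sample."""
    xu = np.zeros(len(x) * L)
    xu[::L] = x
    y_full = np.convolve(xu, h)[: len(xu)]
    return y_full[::M]

def resample_polyphase(x, h, L, M):
    """Compute only the retained outputs, using only taps that meet non-zero inputs.

    Each output needs roughly len(h) / L multiplications instead of len(h).
    """
    n_out = -(-len(x) * L // M)          # ceil(len(x) * L / M)
    y = np.zeros(n_out)
    for m in range(n_out):
        n = m * M                        # index in the upsampled time grid
        # Taps k with k = n (mod L) are the only ones aligned with real samples.
        for k in range(n % L, len(h), L):
            i = (n - k) // L
            if 0 <= i < len(x):
                y[m] += h[k] * x[i]
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(50)
h = rng.standard_normal(12)              # placeholder filter, not a designed lowpass
y_ref = resample_naive(x, h, L=2, M=3)
y_poly = resample_polyphase(x, h, L=2, M=3)
```

Both functions produce the same samples; the polyphase form simply skips the multiplications by stuffed zeros and the outputs that would be discarded, which is the MPOS saving that time-domain designs already exploit.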
•Enhance the near field of a loudspeaker array to realize personal audio systems for close listening.
•Spatial distributions and frequency characteristics of the acoustic pressure were investigated in the speech frequency band.
•Robustness against a sphere the size of a human head was also investigated.
In personal audio systems, the loudspeaker array for enhancing the near field is designed based on the theory of generalized radiation modes. These modes are formulated as a generalized eigenvalue problem associated with acoustic transfer functions on the surface of the loudspeaker array. The eigenvectors, i.e., the modal shapes, represent combinations of the amplitude and polarity of individual loudspeakers. The eigenvalues indicate the reactive-to-active ratio of the acoustic power generated by the loudspeaker array. Hence, the loudspeaker array is designed according to the eigenvector with the largest eigenvalue to maximize the reactive-to-active ratio. In this study, a planar array consisting of nine loudspeakers is constructed, and the acoustic radiation characteristics are experimentally investigated.
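The design rule described above, choosing the eigenvector with the largest eigenvalue of a generalized eigenvalue problem, can be sketched numerically. The matrices below are random stand-ins and not measured acoustic transfer functions. `A` (reactive power) and `B` (active power) are hypothetical names, and `B` is assumed Hermitian positive definite so that a Cholesky factorization exists.

```python
import numpy as np

def dominant_mode(A, B):
    """Solve A q = lam B q (A Hermitian, B Hermitian positive definite).

    Returns the largest eigenvalue and its eigenvector, i.e. the driving
    weights that maximize the ratio (q^H A q) / (q^H B q).
    """
    Lc = np.linalg.cholesky(B)            # B = Lc Lc^H
    Linv = np.linalg.inv(Lc)
    C = Linv @ A @ Linv.conj().T          # reduced standard Hermitian problem
    C = (C + C.conj().T) / 2              # remove rounding asymmetry
    w, v = np.linalg.eigh(C)              # eigenvalues in ascending order
    q = Linv.conj().T @ v[:, -1]          # map back to the original coordinates
    return w[-1], q / np.linalg.norm(q)

rng = np.random.default_rng(2)
n_speakers = 9                            # nine-element planar array, as in the text
X = rng.standard_normal((n_speakers, n_speakers))
Y = rng.standard_normal((n_speakers, n_speakers))
A = X @ X.T                               # stand-in "reactive power" matrix
B = Y @ Y.T + np.eye(n_speakers)          # stand-in "active power" matrix

lam, q = dominant_mode(A, B)
ratio = (q @ A @ q) / (q @ B @ q)         # equals lam for the dominant mode
```

The signs and magnitudes of the entries of `q` play the role of the polarity and amplitude assigned to each loudspeaker.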
Auscultation is a fundamental diagnostic technique that provides valuable diagnostic information about different parts of the body. With the increasing prevalence of digital stethoscopes and telehealth applications, there is a growing trend towards digitizing the capture of bodily sounds, thereby enabling subsequent analysis using machine learning algorithms. This study introduces the SonicGuard sensor, which is a multichannel acoustic sensor designed for long-term recordings of bodily sounds. We conducted a series of qualification tests, with a specific focus on bowel sounds ranging from controlled experimental environments to phantom measurements and real patient recordings. These tests demonstrate the effectiveness of the proposed sensor setup. The results show that the SonicGuard sensor is comparable to commercially available digital stethoscopes, which are considered the gold standard in the field. This development opens up possibilities for collecting and analyzing bodily sound datasets using machine learning techniques in the future.
During the COVID-19 pandemic, smart home requirements have shifted toward entertainment at home. The purpose of this research project was therefore to develop a robotic audio system for home automation. High-end audio systems normally refer to multichannel home theaters. Although multichannel audio systems enable people to enjoy surround sound as they do at the cinema, stereo audio systems have been popularly used since the 1980s. The major shortcoming of a stereo audio system is its narrow listening area. If listeners are outside this area, the system has difficulty providing a stable sound field. This is because the head-shadow effect blocks high-frequency sound. The proposed system, by integrating computer vision and robotics, can track the head movement of a user and adjust the directions of the loudspeakers, thereby helping the sound waves travel directly to the listener. Unlike previous studies, in which only a small-scale scenario was built, in this work the idea was applied to a commercial 2.1 audio system, and listening tests were conducted. The theory and the simulation agree with the experimental results. The approximate rate of audio quality improvement is 31%. The experimental results are encouraging, especially for high-pitched music.
The purpose of this study was to evaluate the relationship between recreational sound exposure and potentially undiagnosed or subclinical hearing loss by assessing sound exposure history, threshold sensitivity, distortion product otoacoustic emission (DPOAE) amplitudes, and performance on the words-in-noise (WIN) test.
Survey data were collected from 74 adult participants (14 male and 60 female), 18 to 27 years of age, recruited via advertisements posted throughout the University of Florida campus. Of these participants, 70 completed both the survey and the additional functional test battery, and their preferred listening level was measured in a laboratory setting.
There were statistically significant relationships between hearing thresholds and DPOAE amplitude. In contrast, performance on the WIN was not reliably related to threshold sensitivity within this cohort with largely normal hearing. The most common exposure was attending bars or dance clubs, followed by music player use. There were no statistically significant relationships between individual or composite measures of recreational sound exposure (preferred listening level, years of music player use, number of reported sound exposures, previous impulse noise exposure, or previous noise-induced change in hearing) and functional measures (threshold, DPOAE amplitude, and WIN performance). Some subjects were highly consistent in listening level preferences, while others were more variable from song to song.
No reliable relationships between common recreational sound exposure or previous noise-induced changes in hearing were found during analysis of threshold sensitivity, DPOAE amplitude, or WIN performance in this cohort. However, the study sample was predominantly female and Caucasian, which limits generalizability of the results.
In the world of real audio systems, it is extremely important to model and identify their nonlinear behavior, especially in the case of professional audio devices. In this context, it is useful to have a quantitative estimate of the device's degree of nonlinearity, which can be obtained with an efficient and rapid measurement methodology. In this article, we propose an original estimation technique targeting third-order intermodulation distortion (IMD) and based on a single detection. The proposed technique can be implemented on devices operating both in baseband and in bandpass. Starting from the same single detection, the technique makes it possible both to estimate the third-order IMD at the signal level actually used and to extrapolate that estimate to other signal levels. Experimental verification on real audio devices validated the procedure in operational situations, thus confirming the validity of the proposed approach.
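Extrapolating a third-order IMD estimate to other signal levels can be illustrated with the standard small-signal rule for a weakly nonlinear, memoryless third-order device: each third-order product rises 3 dB per 1 dB of input level, so the IMD relative to the fundamentals rises 2 dB per dB. This is a textbook approximation and not necessarily the estimation technique proposed in the article. The function names and the example measurement (−60 dBc at −10 dBu) are made up for illustration.

```python
def extrapolate_imd3_dbc(ref_level_db, ref_imd_dbc, new_level_db):
    """Extrapolate relative third-order IMD (dBc) from one measured point.

    Assumes a weakly nonlinear third-order model, valid only well below
    compression: relative IMD changes 2 dB per 1 dB of signal level.
    """
    return ref_imd_dbc + 2.0 * (new_level_db - ref_level_db)

def third_order_intercept_db(ref_level_db, ref_imd_dbc):
    """Level at which the extrapolated IMD3 products would equal the fundamentals."""
    return ref_level_db - ref_imd_dbc / 2.0

# Hypothetical single measurement: IMD3 of -60 dBc at a level of -10 dBu.
imd_at_minus4 = extrapolate_imd3_dbc(-10.0, -60.0, -4.0)
ip3 = third_order_intercept_db(-10.0, -60.0)
print(imd_at_minus4)  # -48.0
print(ip3)            # 20.0
```

The intercept level is a convenient single-number summary: at that level the same rule predicts 0 dBc, which the second function makes explicit.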
More than a billion adolescents and youngsters are estimated to be at risk of acquiring recreational noise-induced hearing loss (RNIHL) due to the unsafe use of personal audio systems. RNIHL is preventable; therefore, the present study aimed to determine (i) the accuracy and reliability of dbTrack (Westone) sound-level monitoring earphones and (ii) the effect of sound-level monitoring earphones with smartphone feedback and hearing-health information as an intervention to promote healthy listening behaviors in young adults.
The study consisted of two phases: the first phase investigated the accuracy and reliability of dbTrack sound-level monitoring earphones. Accuracy was determined by comparing earphone measurements to sound level meter measurements. Intradevice reliability was determined by comparing earphone measurements during test-retest conditions. Nineteen participants were recruited through convenience sampling to determine within-subject reliability by comparing in-ear sound levels measured by the earphones during test-retest conditions. For the second phase of the study, a single-group pretest-posttest design was utilized. Forty participants, recruited through snowball sampling, utilized the sound-level monitoring earphones with the accompanying dbTrack smartphone application for 4 weeks. The application's smartphone feedback was disabled during the first 2 weeks (pretest condition) and enabled during the last 2 weeks (posttest condition). Average daily intensities, durations, and sound dosages measured during pre- and posttest conditions were compared.
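For readers unfamiliar with how intensity and duration combine into a sound dose, a minimal sketch follows. It assumes the NIOSH recommended exposure limit (85 dBA for 8 hours with a 3 dB exchange rate). The abstract does not state which criterion the dbTrack application uses, so the default parameters and the function names here are assumptions.

```python
def allowed_hours(level_dba, criterion_dba=85.0, exchange_db=3.0, criterion_hours=8.0):
    """Permissible daily duration at a given level; halves for every exchange_db increase."""
    return criterion_hours / 2 ** ((level_dba - criterion_dba) / exchange_db)

def daily_dose_percent(exposures, **criterion):
    """Sum of (actual duration / allowed duration) over all exposures, in percent.

    exposures: iterable of (level_dba, hours) pairs for one day.
    """
    return 100.0 * sum(
        hours / allowed_hours(level, **criterion) for level, hours in exposures
    )

# Hypothetical listening day: 2 h at 88 dBA and 4 h at 85 dBA.
dose = daily_dose_percent([(88.0, 2.0), (85.0, 4.0)])
print(f"{dose:.1f}%")  # 100.0%
```

Because the allowed time halves with every 3 dB, modest reductions in listening level lower the dose far more than comparable reductions in listening time.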
Phase 1 dbTrack earphone measurements were within 1 dB when compared with sound level meter measurements. Earphones were also within 1 dB in repeated measures across earphones and across participants. Phase 2 posttest average daily intensity decreased by 8.7 dB (18.3 SD), duration decreased by 7.6 minutes (46.6 SD), and sound dose decreased by 4128.4% (24965.5% SD). Posttest intensity and sound dose were significantly lower, with small and medium effect sizes, respectively.
This study's preliminary data indicate that dbTrack (Westone) sound-level monitoring earphones with a calibrated in-ear microphone can reliably and accurately measure sound exposure from personal audio systems. Preliminary results also suggest that feedback on sound exposure using the accurate sound-level monitoring earphones with the accompanying dbTrack application can potentially promote safe listening behavior in young adults and reduce the risk of acquiring RNIHL.