Precise control of ultrasonic acoustic waves with frequencies f ≳ 20 kHz is useful in a range of applications, from ultrasonic scanners to nondestructive testing and consumer haptic devices. A spatial sound modulator (SSM) is the acoustic analogue of the spatial light modulator (SLM) in optics and is highly sought after by acoustics researchers. A spatial sound modulator is subject to distinct practical constraints: it must be a reconfigurable device that modulates sound arbitrarily from a decoupled source. Here, a reflective phase-modulating device is realized whose local units can be tuned to imprint a phase signature on an incoming wave. It is manually reconfigurable and consists of 1024 rigidly ended square waveguides with sliding bottom surfaces that provide variable phase delays. Experiments demonstrate the ability of this device to focus ultrasonic waves in air at different points in space, generate accurate pressure landscapes, and perform multiplane holography. Moreover, thanks to the subwavelength nature of the unit cells, this device outperforms state-of-the-art phased-array transducers of the same size in the quality and energy distribution of generated acoustic holographic images. These results pave the way for the construction of electronically controlled reflective SSMs.
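The abstract above describes 1024 waveguides (a 32 × 32 grid) whose sliding bottoms add a round-trip path delay to the reflected wave. A minimal sketch of how per-cell plunger depths could be chosen to focus at a point follows; the operating frequency (40 kHz), cell pitch, and focal distance are illustrative assumptions, not values from the paper. Lowering a well bottom by depth d adds 2d of path, i.e. a phase delay of 2kd, so a phase of φ needs a depth of φλ/(4π).

```python
import numpy as np

def focusing_depths(nx=32, ny=32, pitch=4.3e-3, focus=(0.0, 0.0, 0.10),
                    f=40e3, c=343.0):
    """Per-cell well depths that focus a reflected plane wave at `focus`.

    A well bottom lowered by depth d adds a round-trip path of 2*d,
    i.e. a phase delay of 2*k*d. Depths are chosen so the total phase
    from each cell to the focal point is uniform (mod 2*pi).
    All geometry values here are illustrative assumptions.
    """
    k = 2 * np.pi * f / c          # wavenumber
    lam = c / f                    # wavelength
    # cell-centre coordinates of the nx-by-ny array
    x = (np.arange(nx) - (nx - 1) / 2) * pitch
    y = (np.arange(ny) - (ny - 1) / 2) * pitch
    X, Y = np.meshgrid(x, y, indexing="ij")
    # distance from each cell centre to the focal point
    r = np.sqrt((X - focus[0])**2 + (Y - focus[1])**2 + focus[2]**2)
    # required reflection phase: cancel the propagation phase (mod 2*pi)
    phi = (-k * r) % (2 * np.pi)
    # depth that realises that phase on the doubled (round-trip) path
    return phi * lam / (4 * np.pi)

depths = focusing_depths()
print(depths.shape)   # (32, 32)
```

Note that a full 2π of phase control only needs depths up to half a wavelength (about 4.3 mm at 40 kHz), which is why the round-trip geometry keeps the device thin.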
In this work, a manually reconfigurable spatial sound modulator for ultrasonic waves in air is presented. This device can locally modulate the phase of incoming waves due to its individually tuned unit cells. Experiments demonstrate the ability of this device to focus ultrasonic waves in air at different points in space, generate accurate pressure landscapes, and perform multiplane holography.
This paper reviews the current state of loudspeaker-based spatial sound reproduction methods from both a technical and a perceptual perspective. A nomenclature is developed that allows for a strict separation between these two perspectives. The physical fundamentals, practical realization, and results from perceptual studies are discussed for a number of well-established and emerging reproduction techniques. Further, the paper outlines novel approaches to spatial sound evaluation in terms of perceived quality and provides a comparison of current approaches.
Spatial audio reproduction using a loudspeaker array introduces a curvature effect that distorts the listening experience when the listener is in the near field. In the near field, the loudspeakers are modeled as point sources (spherical waves), which amplifies the mode vectors. The problem becomes more challenging for irregular loudspeaker arrangements, which cause an uneven energy distribution in the reproduction region. In this context, a near-field compensation is applied to the encoded ambisonics coefficients. An optimization problem is formulated such that the loudspeaker gains, encoded with spherical-harmonic basis coefficients, match the target ambisonics coefficients. Further, the in-phase and quadrature components of the energy localization vector are imposed as constraints to direct maximum energy into the reproduction region. The optimization problem is solved with a derivative-free solver. The performance of the proposed methods is evaluated for ITU-R recommended loudspeaker layouts using technical and perceptual evaluation attributes.
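The core matching step above — loudspeaker gains whose spherical-harmonic encoding reproduces the target ambisonics coefficients — can be sketched for the unconstrained first-order case. The paper's actual method adds energy-vector constraints and a derivative-free solver; this sketch replaces those with a plain least-squares solve, and the five-loudspeaker layout is a hypothetical example.

```python
import numpy as np

def encode_fo(azi, ele):
    """First-order ambisonics encoding (ACN order, SN3D norm) of a
    plane-wave direction given by azimuth/elevation in radians."""
    x = np.cos(ele) * np.cos(azi)
    y = np.cos(ele) * np.sin(azi)
    z = np.sin(ele)
    return np.array([1.0, y, z, x])   # channels W, Y, Z, X

# hypothetical 5-loudspeaker layout (azimuths in degrees, ear level)
spk_azi = np.radians([0, 30, -30, 110, -110])
Y = np.stack([encode_fo(a, 0.0) for a in spk_azi], axis=1)  # (4, n_spk)

# target: a source at 20 degrees azimuth
b = encode_fo(np.radians(20), 0.0)

# least-squares gains g such that Y @ g ~= b
g, *_ = np.linalg.lstsq(Y, b, rcond=None)
print(np.allclose(Y @ g, b))   # True: b lies in the column space of Y
```

The constrained formulation in the abstract would add the energy localization vector as an extra term or constraint on `g` before handing the problem to a derivative-free optimizer.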
Microperforated panel (MPP) absorbers have been widely used in noise reduction and are regarded as a promising alternative to traditional porous materials. However, their flat, panel-like shape restricts their practical use in actual rooms and buildings. To overcome this limitation, three-dimensional MPP spatial sound absorbers have been proposed. MPPs are mostly made of metal and are costly; cylinders and cubes are relatively easy to manufacture, while more complex structures are difficult. In view of this, this paper proposes a non-woven material that can replace the microperforated panel. This non-woven material is low-cost, flexible, easy to mould, and has good sound absorption performance. Impedance-tube experiments were employed to study the normal-incidence sound absorption coefficient of the non-woven fabric. The experimental results show that the sound absorption properties of the non-woven fabric are similar to those of an ultra-micro MPP. Three kinds of spatial sound absorbers were made from the non-woven material: hollow cylindrical, fan-shaped, and honeycomb-like. Reverberation-chamber measurements show that the honeycomb-like spatial sound absorber absorbs sound better than the hollow cylindrical and fan-shaped ones. Adding a non-woven cylindrical boundary layer further improves the sound absorption performance of the fan-shaped and honeycomb-like spatial absorbers.
To realize 3D spatial sound rendering with two-channel headphones, one needs head-related transfer functions (HRTFs) tailored to a specific user. However, measuring HRTFs requires a tedious and expensive procedure. To address this, we propose a fully perceptual HRTF fitting method for individual users based on machine learning. The user only needs to answer pairwise comparisons of test signals presented by the system during calibration, which reduces the effort needed to obtain individualized HRTFs. Technically, we present a novel adaptive variational AutoEncoder with a convolutional neural network. During training, this AutoEncoder analyzes publicly available HRTF datasets and identifies factors that depend on the individuality of users in a nonlinear space. During calibration, the AutoEncoder generates high-quality HRTFs fitted to a specific user by blending these factors. We validate the feasibility of our method through several quantitative experiments and a user study.
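The calibration loop described above — blending learned latent factors according to a user's pairwise-comparison answers — can be caricatured without the AutoEncoder itself. Everything here is a toy assumption (the number and dimensionality of factors, the weight-update rule, the comparison outcomes); it only illustrates how pairwise preferences could steer a blend toward one user.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical latent factors learned by the AutoEncoder, one per
# "prototype" listener; 8 prototypes and 16 dims are assumptions
factors = rng.normal(size=(8, 16))
w = np.ones(8) / 8                  # blending weights, start uniform

def update(w, winner, loser, lr=0.5):
    """Shift weight mass toward the prototype whose rendering the
    user preferred in one pairwise comparison (toy update rule)."""
    w = w.copy()
    w[winner] += lr * w[loser]
    w[loser] *= (1 - lr)
    return w / w.sum()

# a few (winner, loser) answers from a calibration session
for win, lose in [(2, 5), (2, 7), (4, 5)]:
    w = update(w, win, lose)

z = w @ factors   # blended latent code to feed the decoder
print(z.shape)    # (16,)
```

In the actual system the blended code would be decoded back into an HRTF; here the decoder is simply omitted.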
Visually guided spatial sound generation (VGSSG) is a well-suited multimodal learning method for dealing with recorded videos. However, existing methods are difficult to apply directly to spatial sound generation for movie clips, mainly because (1) the Cinematic Audiovisual Language (CAL) of movies makes it difficult to construct spatial sound mapping models through purely data-driven methods, and (2) model performance is limited by the large heterogeneous gap between audiovisual modalities. To solve these problems, we propose a VGSSG method based on CAL decision-making and hierarchical feature encoding and decoding, which effectively accomplishes spatial sound generation based on the CAL of movies. Specifically, to model the CAL, a multimodal information-guided movie audio rendering decision maker is established, which decides the rendering strategy based on the CAL of the current clip. To narrow the heterogeneous gap that hinders the fusion of audiovisual data, we propose a codec structure based on hierarchical fusion of audiovisual features and full-scale skip connections, which improves the comprehensive utilization of audiovisual data and demonstrates the effectiveness of shallow features in the VGSSG task. We integrate both 2-channel and 6-channel spatial audio generation into a unified framework. In addition, we establish a movie audiovisual bimodal dataset with hand-crafted CAL annotations. Experiments demonstrate that, compared with existing methods, our method achieves lower generation distortion.
• The existence of CAL makes it difficult to construct spatial sound rendering models through data-driven methods.
• Neglecting shallow structure features limits spatial sound generation performance.
• A codec structure based on hierarchical fusion of audiovisual features and full-scale skip connections can narrow the heterogeneous gap.
• A multimodal information-guided movie audio rendering decision maker is proposed to solve the problem of CAL modeling.
Augmented Reality (AR) involves the combination of synthetic and real stimuli and is not restricted to visual cues. For the inclusion of computer-generated sound in AR environments, it is often assumed that the distance attenuation model is the most intuitive and useful for all users, regardless of the characteristics of the environment. This model reduces the gain of a sound source as a function of the distance between the source and the listener. In this paper, we propose an attenuation model based not only on distance but also on listener orientation, so that users hear more clearly the objects they are looking at, rather than nearby objects outside their field of view and interest. We call this a directional attenuation model. To test the model, we developed an AR application with visual and sound stimuli to compare the traditional model with the new one across two tasks in two AR scenarios in which sound plays an important role. A total of 38 people participated in the experiments. The results show that the proposed model reduces workload for both tasks, requiring less time and effort and allowing users to explore the AR environment more easily and intuitively. This demonstrates that the alternative model has the potential to be more efficient for certain applications.
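A directional attenuation model of the kind described above can be sketched as a distance roll-off multiplied by an orientation term. The specific weighting below (linear in the cosine of the angle between the listener's forward vector and the source direction) is an illustrative assumption, not the formula from the paper.

```python
import numpy as np

def directional_gain(listener_pos, listener_fwd, source_pos,
                     ref_dist=1.0, side_atten=0.5):
    """Gain combining distance attenuation with listener orientation.

    Sources in front of the listener keep the full distance gain;
    sources to the side or behind are attenuated further. The exact
    orientation weighting here is an assumption for illustration.
    """
    v = np.asarray(source_pos, float) - np.asarray(listener_pos, float)
    d = np.linalg.norm(v)
    dist_gain = ref_dist / max(d, ref_dist)     # standard 1/d roll-off
    cos_a = np.dot(v / d, listener_fwd)         # 1 ahead, -1 behind
    dir_gain = (1 - side_atten) + side_atten * (cos_a + 1) / 2
    return dist_gain * dir_gain

fwd = np.array([0.0, 0.0, 1.0])
print(directional_gain([0, 0, 0], fwd, [0, 0, 2]))    # straight ahead: 0.5
print(directional_gain([0, 0, 0], fwd, [0, 0, -2]))   # behind, quieter: 0.25
```

With `side_atten=0` this reduces to the traditional distance-only model, which makes the two conditions of the experiment easy to A/B within one implementation.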
Augmented Reality (AR) technologies are increasingly used to provide immersive experiences to cultural-site visitors, mainly through visual superimposition of interactive digital elements onto the physical world. Recent research has investigated the use of Audio AR (AAR) in heritage sites, where visitors listen to spatially registered sound that can be attributed to ‘talking’ physical artefacts. A parallel trend in the audience engagement programs of cultural institutions is the employment of AI chatbots that converse with followers or visitors to provide meaningful responses to user questions. Here, we present Exhibot, an intelligent audio guide system aimed at enhancing the experience of cultural-site visitors. Exhibot combines AAR and chatbot technologies to enable natural visitor-exhibit interaction, while also leveraging IoT devices to contextualize the delivered information. The key contribution of the proposed system lies in the interplay of AAR, chatbot, and IoT technologies to create immersive learning experiences within an integrated cultural guide system. Exhibot has undergone field trials to validate its usability and utility under realistic operational conditions. As a case study, we chose the statue of a prominent politician situated in a central square in Heraklion, Greece. The evaluation results indicated a very positive attitude among users, attributed both to the sense of immersion evoked by the AAR-powered storytelling and to the natural, human-like conversation enabled by the chatbot.
Individual head-related transfer functions (HRTFs) typically show large left-right ear differences. This work evaluates HRTF left-right differences by means of an rms measure, the Root Mean Square Difference (RMSD). The RMSD was calculated for HRTFs measured on a group of 15 subjects in our laboratory, for HRTFs taken from the LISTEN database, and for an acoustic manikin. The results show that the RMSD varies with frequency and, as expected, is small for the more symmetrical HRTFs at low frequencies (0.3–1 kHz). In higher frequency bands (1–5 kHz and above 5 kHz), the left-right differences are larger as an effect of the complex filtering caused by the anatomical shape of the head and the pinnae. Results obtained for the subjects and for the LISTEN database were similar, whereas those for the acoustic manikin differed. This means that measurements with a manikin cannot be considered a perfect average representation of results obtained for people. The method and results of this study may be useful for assessing the symmetry of HRTFs and for further analysis and improvement of HRTF individualization and personalization algorithms.
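One plausible reading of the band-limited RMSD measure described above — the rms of the left-right HRTF magnitude difference in dB over a frequency band — can be written in a few lines. The dB-domain formulation and the toy spectra are assumptions made for the sketch; the paper's exact definition may differ.

```python
import numpy as np

def hrtf_rmsd_db(h_left, h_right, freqs, band):
    """RMS left-right difference of HRTF magnitudes (in dB) over a band.

    h_left / h_right: complex HRTF spectra sampled at `freqs` (Hz);
    band: (f_lo, f_hi). One plausible reading of the RMSD measure,
    written as an illustrative sketch.
    """
    m = (freqs >= band[0]) & (freqs <= band[1])
    dl = 20 * np.log10(np.abs(h_left[m]))
    dr = 20 * np.log10(np.abs(h_right[m]))
    return np.sqrt(np.mean((dl - dr) ** 2))

# toy spectra: identical left and right ears give zero RMSD
freqs = np.linspace(100, 16000, 512)
h = np.exp(1j * freqs / 1000) * (1 + 1e-4 * freqs)
print(hrtf_rmsd_db(h, h, freqs, (300, 1000)))   # 0.0
```

Evaluating the same function over the 1–5 kHz and above-5 kHz bands would reproduce the band-wise comparison described in the abstract.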