Cinematic virtual reality (VR) elicits new possibilities for the treatment of sound in space. Distinct from screen-based practices of filmmaking, diegetic sound-image relations in immersive environments present unique, potent affordances, in which content is at once imaginary and real. However, a reductive modelling of environmental realism, in the name of 'presence', predominates. Yet cross-modal perception is a noisy, flickering representation of worlds. Treating our perceptual apparatus as stable, objective transducers ignores the inter-subjective potential at the heart of immersive work and situates users as passive spectators. This condescends to audiences and discounts the historic symbiosis of sound-image signification, which comes to constitute notions of verisimilitude. We understand the tropes; we willingly suspend disbelief. This article examines spatial sound rendering in virtual environments, probing at diegetic realism. It calls for an experimental, aesthetic approach, suggesting several speculative strategies drawn from theories of embodied cognition and acousmatic practice (amongst others), which necessarily deal with space and time as contingencies of the immersive. VR affords a development of the dialectic between sound and image which distinctively involves our spatial attention. The lines between referent and signified blur; the mediation between representations invoked by practitioners and those experienced by audiences suggests new opportunities for co-authorship.
We present an interactive wave-based sound propagation system that generates accurate, realistic sound in virtual environments for dynamic (moving) sources and listeners. We propose a novel algorithm to accurately solve the wave equation for dynamic sources and listeners using a combination of precomputation techniques and GPU-based runtime evaluation. Our system can handle large environments typically used in VR applications, compute spatial sound corresponding to the listener's motion (including head tracking), and handle both omnidirectional and directional sources, all at interactive rates. Compared to prior wave-based techniques applied to large scenes with moving sources, we observe a significant reduction in runtime memory use. The overall sound-propagation and rendering system has been integrated with the Half-Life 2 game engine, the Oculus Rift head-mounted display, and the Xbox game controller to enable users to experience high-quality acoustic effects (e.g., amplification, diffraction low-passing, high-order scattering) and spatial audio based on their interactions in the VR application. We provide the results of preliminary user evaluations, conducted to study the impact of wave-based acoustic effects and spatial audio on users' navigation performance in virtual environments.
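As a hedged illustration of the precompute-then-evaluate pattern this abstract describes, the sketch below stores impulse responses sampled at a grid of listener probes and blends them at runtime for a moving listener. The class name, the four-neighbour inverse-distance blend, and the IR length are illustrative assumptions, not the authors' actual data structures or parameters (the real system evaluates wave-equation solutions on the GPU).

```python
# Minimal sketch of precomputed acoustics with runtime interpolation;
# all names and parameters here are assumptions for illustration only.
import numpy as np
from scipy.signal import fftconvolve

class PrecomputedAcoustics:
    """Stores impulse responses (IRs) sampled on a coarse spatial grid
    and renders audio for a moving listener by blending nearby IRs."""

    def __init__(self, grid_positions, impulse_responses):
        self.grid = np.asarray(grid_positions)    # (N, 3) probe points
        self.irs = np.asarray(impulse_responses)  # (N, L) precomputed IRs

    def ir_at(self, listener_pos):
        # Inverse-distance-weighted blend of the four nearest probes,
        # a placeholder for the interpolation a real system would use.
        d = np.linalg.norm(self.grid - listener_pos, axis=1)
        k = np.argsort(d)[:4]
        w = 1.0 / np.maximum(d[k], 1e-6)
        w /= w.sum()
        return (w[:, None] * self.irs[k]).sum(axis=0)

    def render(self, dry_signal, listener_pos):
        # Runtime evaluation: convolve the dry source with the blended IR.
        return fftconvolve(dry_signal, self.ir_at(listener_pos))[: len(dry_signal)]

# Usage: 10 random probe points with 256-tap IRs, one listener query.
rng = np.random.default_rng(0)
acoustics = PrecomputedAcoustics(rng.uniform(-5, 5, (10, 3)),
                                 rng.standard_normal((10, 256)) * 0.01)
out = acoustics.render(rng.standard_normal(48000), np.array([0.5, 0.0, 1.2]))
```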
The auditory brain circuits are biologically constructed to recognize and localize sounds by encoding a combination of cues that help individuals interpret sounds. The development of computational methods inspired by human capacities has established opportunities for improving machine hearing. Recent studies based on deep learning show that using convolutional recurrent neural networks (CRNNs) is a promising approach for sound event detection and localization in spatial sound. Nevertheless, depending on the sound environment, the performance of these systems is still far from reaching perfect metrics. Therefore, this work intends to boost the performance of state-of-the-art (SOTA) systems by using bio-inspired gammatone auditory filters and intensity vectors (IVs) for the acoustic feature extraction stage, along with the implementation of a temporal convolutional network (TCN) block into a CRNN model to capture long-term dependencies. Three data augmentation techniques are applied to increase the small number of samples in spatial audio datasets. The mentioned stages constitute our proposed Gammatone-based Sound Events Localization and Detection (G-SELD) system, which exceeded the SOTA results on four spatial audio datasets with different levels of acoustical complexity and with up to three sound sources overlapping in time.
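To make the TCN idea concrete, here is a minimal sketch of a residual dilated-convolution block of the kind the abstract adds to a CRNN. The channel count, kernel size, and dilation schedule are illustrative guesses, not the published G-SELD configuration.

```python
# Hedged sketch of a dilated temporal-convolution (TCN) block;
# hyperparameters are assumptions, not the paper's settings.
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """Residual block with a dilated 1-D convolution, capturing long-term
    temporal dependencies without recurrence."""

    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2   # "same" padding
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size,
                      padding=pad, dilation=dilation),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
        )

    def forward(self, x):                         # x: (batch, channels, time)
        return x + self.net(x)                    # residual connection

# Stack blocks with exponentially growing dilation to widen the receptive field.
tcn = nn.Sequential(*[TCNBlock(64, dilation=2 ** i) for i in range(4)])
features = torch.randn(8, 64, 100)                # e.g. CRNN feature maps
out = tcn(features)                               # same shape, larger context
```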
Ambisonics, i.e., full-sphere surround sound, is essential to accompany 360° visual content for a realistic virtual reality (VR) experience. While 360° visual content capture has gained a tremendous boost recently, the estimation of corresponding spatial sound is still challenging because it requires sound-field microphones or information about the sound-source locations. In this paper, we introduce a novel problem of generating Ambisonics in 360° videos using audiovisual cues. With this aim, firstly, a novel 360° audio-visual video dataset of 265 videos is introduced with annotated sound-source locations. Secondly, a pipeline is designed for the automatic Ambisonics estimation problem. Benefiting from deep-learning-based audiovisual feature-embedding and prediction modules, our pipeline estimates the 3D sound-source locations and further uses these locations to encode the audio into the B-format. To benchmark our dataset and pipeline, we additionally propose evaluation criteria to investigate the performance using different 360° input representations. Our results demonstrate the efficacy of the proposed pipeline and open up a new area of research in 360° audio-visual analysis for future investigations.
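The final encoding stage mentioned above rests on the standard first-order B-format panning equations, which depend only on the estimated source direction. The sketch below assumes the traditional Furse-Malham convention (W attenuated by 1/√2); the paper's exact channel ordering and normalization may differ.

```python
# First-order B-format (WXYZ) panning from an estimated source direction.
# Furse-Malham weighting (W scaled by 1/sqrt(2)) is assumed here.
import numpy as np

def encode_bformat(mono, azimuth, elevation):
    """Encode a mono signal at (azimuth, elevation), in radians,
    into first-order Ambisonics channels W, X, Y, Z."""
    w = mono / np.sqrt(2.0)
    x = mono * np.cos(azimuth) * np.cos(elevation)
    y = mono * np.sin(azimuth) * np.cos(elevation)
    z = mono * np.sin(elevation)
    return np.stack([w, x, y, z])

# A source 45 degrees to the left, slightly above the horizon:
sig = np.random.default_rng(0).standard_normal(48000)
bformat = encode_bformat(sig, np.deg2rad(45), np.deg2rad(10))  # (4, 48000)
```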
Sound source separation extracts only the sound sources of interest from a mixture of sound sources and is used as pre-processing for automatic speech recognition. For example, by reducing ambient noise it is expected to improve automatic speech recognition and speaker identification. A commonly used sound source separation method is beamforming, which employs a microphone array consisting of multiple microphones. Although beamforming can separate sound sources based on the direction obtained from inter-microphone time and level differences, it has the limitation that it cannot separate sound sources in the same direction. In this paper, we propose a location-specific source separation method using multiple microphone arrays to solve this problem. In the proposed method, each microphone array first separates the target sound source, and each separated sound still includes other noise sources in the same direction as the target. Since the target sound source is included in all the separated sounds, the proposed method extracts the signals commonly included in them to remove such noise sources. Preliminary numerical-simulation results showed that the proposed method, using non-negative matrix factorization for sound source separation, worked properly.
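A deliberately simplified stand-in for the "extract what is common" step is sketched below: since each array's separated output contains the target plus its own directional noise, a per-bin minimum across magnitude spectrograms keeps only energy present in all outputs. The paper uses non-negative matrix factorization for this step; the minimum rule here is an assumed proxy chosen only to illustrate the idea.

```python
# Simplified common-component extraction across per-array separated signals;
# the per-bin minimum is a proxy for the paper's NMF-based method.
import numpy as np
from scipy.signal import stft, istft

def common_component(separated_signals, fs=16000, nperseg=512):
    """separated_signals: list of time-domain outputs, one per mic array."""
    specs = [stft(s, fs=fs, nperseg=nperseg)[2] for s in separated_signals]
    mags = np.stack([np.abs(S) for S in specs])
    common_mag = mags.min(axis=0)            # energy shared by all arrays
    phase = np.angle(specs[0])               # reuse one array's phase
    _, y = istft(common_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return y

# Two arrays, same target buried in different directional noise:
rng = np.random.default_rng(0)
target = rng.standard_normal(32000)
est = common_component([target + 0.3 * rng.standard_normal(32000),
                        target + 0.3 * rng.standard_normal(32000)])
```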
High-quality acousto-holographic patterns and images, integral to applications like 3D displays, acoustophoresis, and midair haptics, require precise spatial distribution of ultrasound waves. Essential tools for this task are spatial sound modulators (SSMs), which control constituent elements to enable dynamic distribution of sound pressure. However, current ultrasonic SSMs face limitations due to high costs and the intricate actuation of numerous small, closely spaced units. This study introduces "segmented SSMs," novel devices that combine traditional acoustic metasurface pixel units into custom-shaped segmented elements. These segmented SSMs reduce actuation costs and complexity while retaining pressure-distribution quality. This approach includes a custom phase agglomeration algorithm (PAA) that offers a hierarchy of potential segmentation solutions for user selection. An SSM fabrication method is detailed using off-the-shelf 3D printers and bespoke control electronics, completing an end-to-end methodology from conception to realization. This approach is validated with two prototype SSM devices that focus sound waves and levitate polystyrene beads using dynamic segmented elements. Further enhancements to the technique are explored through hybrid SSM devices with both static and dynamic elements. The pipeline facilitates efficient SSM construction across diverse applications and invites the inception of future devices with varying sizes, uses, and actuation mechanisms.
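As a loose sketch of what phase agglomeration could look like, the snippet below merges metasurface pixels whose required phases are similar into segments, each driven by one actuator at the circular mean of its member phases. The actual PAA produces a hierarchy of segmentation solutions; the single agglomerative-clustering cut, the function names, and the segment count here are all assumptions made for illustration.

```python
# Illustrative pixel-to-segment merging by phase similarity; phase wrap-around
# is handled by clustering points on the unit circle. Not the published PAA.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def segment_pixels(target_phases, n_segments):
    """Group metasurface pixels into n_segments by phase similarity."""
    # Embed phases on the unit circle so 0 and 2*pi are neighbours.
    pts = np.column_stack([np.cos(target_phases), np.sin(target_phases)])
    labels = fcluster(linkage(pts, method="average"),
                      n_segments, criterion="maxclust")
    # Each segment is actuated with the circular mean of its member phases.
    seg_phase = {k: np.angle(np.exp(1j * target_phases[labels == k]).mean())
                 for k in np.unique(labels)}
    return labels, seg_phase

# 64 pixels of a focusing pattern collapsed to 8 actuated segments:
phases = np.random.default_rng(0).uniform(0, 2 * np.pi, 64)
labels, seg_phase = segment_pixels(phases, 8)
```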
This paper presents an experimental study of the usefulness of spatial sound for searching and navigating in augmented reality (AR) environments. Participants were asked to find three objects hidden within no-sound and spatial-sound AR environments. The experiment showed that participants in the spatialized-sound group performed faster and more efficiently than those working in the no-sound configuration. Moreover, 3D sound was a valuable cue for navigation in the AR environment. The collected data suggest that the use of spatial sound in AR environments can be a significant factor in searching for and navigating to hidden objects within indoor AR scenes. To conduct the experiment, the CARE approach was applied, and its CARL language was extended with new elements responsible for controlling audio in 3D space.
The sweet spot can be interpreted as the region where acoustic sources create a spatial auditory illusion. We study the problem of maximizing this sweet spot when reproducing a desired sound wave using an array of loudspeakers. To achieve this, we introduce a theoretical framework for spatial sound perception that can be used to define a sweet spot, and we develop a method that aims to generate a sound wave directly maximizing the sweet spot defined by a model within this framework. Our method incorporates perceptual principles from the outset and is flexible: it imposes little to no constraint on the regions of interest, the arrangement of loudspeakers, or their radiation patterns. However, the perceptual models must satisfy a convexity condition, which is fulfilled by state-of-the-art monaural perceptual models, but not by binaural ones. Proof-of-concept experiments show that our method, when implemented with van de Par's monaural model, outperforms state-of-the-art sound field synthesis methods in terms of binaural azimuth localization and binaural coloration properties.
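For context, the conventional starting point such a method departs from is regularized pressure matching: choose loudspeaker weights that reproduce the desired field at control points sampling the listening region. The sketch below implements that baseline with a made-up 2-D geometry and a free-field point-source model; the paper's contribution replaces this physical error with a convex perceptual sweet-spot objective, which is not reproduced here.

```python
# Baseline regularized pressure matching over a listening region;
# geometry, frequency, and regularization weight are illustrative.
import numpy as np

rng = np.random.default_rng(0)
k = 2 * np.pi * 1000 / 343.0                      # wavenumber at 1 kHz

speakers = rng.uniform(-2, 2, (16, 2))            # 16 loudspeaker positions
points = rng.uniform(-0.5, 0.5, (200, 2))         # control points (sweet spot)

def greens(src, rec):
    """Free-field monopole (point-source) transfer functions."""
    r = np.linalg.norm(rec[:, None, :] - src[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

G = greens(speakers, points)                      # (points, speakers)
virtual = np.array([[5.0, 0.0]])                  # desired virtual source
p_des = greens(virtual, points)[:, 0]

lam = 1e-3                                        # Tikhonov regularization
w = np.linalg.solve(G.conj().T @ G + lam * np.eye(16), G.conj().T @ p_des)
err = np.linalg.norm(G @ w - p_des) / np.linalg.norm(p_des)
```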
Sound environment reproduction of various flight conditions in aircraft mock-ups is a valuable tool for the study, prediction, demonstration and jury testing of interior aircraft sound quality and annoyance. To provide a faithful reproduced sound environment, time, frequency and spatial characteristics should be preserved. Physical sound field reproduction methods for spatial sound reproduction are mandatory to immerse the listener's body in the proper sound fields so that localization cues are recreated at the listener's ears. Vehicle mock-ups pose specific problems for sound field reproduction: confined spaces, the need for invisible sound sources and a very specific acoustical environment make open-loop sound field reproduction technologies such as wave field synthesis (based on free-field models of monopole sources) less than ideal. In this paper, experiments in an aircraft mock-up with multichannel least-squares methods and equalization are reported. The novelty is the actual implementation of sound field reproduction with 3180 transfer paths and trim-panel reproduction sources in laboratory conditions with a synthetic target sound field. The paper presents objective evaluations of the reproduced sound fields using various metrics, as well as sound field extrapolation and sound field characterization.
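A minimal sketch of frequency-domain multichannel least-squares inversion over measured transfer paths, the family of methods these experiments use, is given below. The dimensions, the random "measured" responses, and the regularization weight are placeholders (the actual experiment involved 3180 measured paths and trim-panel sources).

```python
# Per-frequency-bin regularized least-squares inversion of measured
# transfer paths; all dimensions and data here are stand-ins.
import numpy as np

n_mics, n_src, n_bins = 12, 6, 257
rng = np.random.default_rng(0)

# H[f] : measured transfer paths, shape (mics, sources) per frequency bin.
H = rng.standard_normal((n_bins, n_mics, n_src)) \
    + 1j * rng.standard_normal((n_bins, n_mics, n_src))
p_target = rng.standard_normal((n_bins, n_mics)) \
    + 1j * rng.standard_normal((n_bins, n_mics))

beta = 1e-2                                  # regularization (limits equalization gain)
Q = np.empty((n_bins, n_src), dtype=complex)
for f in range(n_bins):
    Hf = H[f]
    Q[f] = np.linalg.solve(Hf.conj().T @ Hf + beta * np.eye(n_src),
                           Hf.conj().T @ p_target[f])

# Normalized reproduction error per bin, a typical objective metric.
err = np.linalg.norm(np.einsum('fms,fs->fm', H, Q) - p_target, axis=1) \
    / np.linalg.norm(p_target, axis=1)
```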