Single-image generative adversarial networks learn from the internal distribution of a single training example to generate variations of it, removing the need for a large dataset. In this paper we introduce SpecSinGAN, an unconditional generative architecture that takes a single one-shot sound effect (e.g., a footstep; a character jump) and produces novel variations of it, as if they were different takes from the same recording session. We explore the use of multi-channel spectrograms to train the model on the various layers that comprise a single sound effect. A listening study compared our model to real recordings and to digital signal processing procedural audio models in terms of sound plausibility and variation. It revealed that, when using multi-channel spectrograms, SpecSinGAN is more plausible and varied than the procedural audio models considered. Sound examples can be found at the project website: https://www.adrianbarahonarios.com/specsingan/
Haptic interaction with user manipulation for smartphone Lee, Jong-uk; Lim, Jeong-Mook; Shin, Heesook ...
2013 IEEE International Conference on Consumer Electronics (ICCE),
2013-Jan., 20130101
Conference Proceeding, Journal Article
Peer reviewed
This paper presents the haptic interaction design and implementation for our smartphone bumper case, which provides an interactive and realistic physical feeling. A thin actuator is installed in the case to simulate a rapid, realistic response. We designed a software structure that guarantees a real-time physical response. The designed API can be used to provide realistic touch responses that correspond to sound effects in gaming applications.
From Cinema to Audiobook: Sharing Media Features Domingos, Ana Cláudia Munari; Garcia, Jaimeson Machado
Ekphrasis. Images, Cinema, Theory, Media,
01/2023, Volume 29, Issue 1
Journal Article
Open access
This essay is part of a research project that seeks to analyze, based on Intermediality, the boundaries between sound media products such as audiobooks and podcasts. One of the conclusions of this larger project concerns the way fictional sound media use cinema features to create what we can understand as immersion in the narrative diegesis. In this essay, we show how cinema, as a type of multimodal media, becomes fundamental to the construction of features for fiction audiobooks, such as soundtracks and sound effects, as well as vocal performance, a feature that, along with rhythm, is the oldest in literary history. Using Lars Elleström’s model in The modalities of media II (2020) and analyzing different audiobooks, we establish some categories. For soundtracks: iconic, based on the similarity with non-musical elements (Mickey Mousing); symbolic, the conventions in rhythms and other musical elements; and indexical, which refers to the manifestation of time, geographical location, or specific social group. Sound effects are distinguished into depictional sounds (to illustrate actions), deictional sounds (to create ambiance), and descriptional sounds (by convention). The methodology is comparative, seeking similarities and differences between these sound media products. Our intention is to expand Elleström’s model so that it can support the understanding of these new (and not-so-new) qualified media types. In addition to Elleström’s theories, we also draw on those of Agnes Petho, Jorgen Bruhn and André Bazin.
Emulating the human ability to solve the cocktail party problem, i.e., focus on a source of interest in a complex acoustic scene, is a long-standing goal of audio source separation research. Much of this research investigates separating speech from noise, speech from speech, musical instruments from each other, or sound events from each other. In this paper, we focus on the cocktail fork problem, which takes a three-pronged approach to source separation by separating an audio mixture such as a movie soundtrack or podcast into the three broad categories of speech, music, and sound effects (SFX - understood to include ambient noise and natural sound events). We benchmark the performance of several deep learning-based source separation models on this task and evaluate them with respect to simple objective measures such as signal-to-distortion ratio (SDR) as well as objective metrics that better correlate with human perception. Furthermore, we thoroughly evaluate how source separation can influence downstream transcription tasks. First, we investigate the task of activity detection on the three sources as a way to both further improve source separation and perform transcription. We formulate the transcription tasks as speech recognition for speech and audio tagging for music and SFX. We observe that, while the use of source separation estimates improves transcription performance in comparison to the original soundtrack, performance is still sub-optimal due to artifacts introduced by the separation process. Therefore, we thoroughly investigate how remixing of the three separated source stems at various relative levels can reduce artifacts and consequently improve the transcription performance. We find that remixing music and SFX interferences at a target SNR of 17.5 dB reduces speech recognition word error rate, and a similar impact from remixing is observed for tagging music and SFX content.
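The two quantities named in this abstract, SDR and remixing at a target SNR, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`power`, `remix_at_snr`, `sdr_db`) are hypothetical, the signals are plain lists of samples, and the SDR shown is the basic energy-ratio definition rather than any specific toolkit's variant.

```python
import math

def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)

def remix_at_snr(target, interference, target_snr_db):
    """Scale `interference` so the target-to-interference power ratio
    equals `target_snr_db` (in dB), then add it back to `target`."""
    p_target, p_interf = power(target), power(interference)
    if p_target == 0 or p_interf == 0:
        raise ValueError("both signals must have non-zero power")
    # SNR_dB = 10*log10(p_target / (gain^2 * p_interf)), solved for gain.
    gain = math.sqrt(p_target / (p_interf * 10 ** (target_snr_db / 10)))
    return [t + gain * n for t, n in zip(target, interference)]

def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB:
    10*log10(||reference||^2 / ||reference - estimate||^2)."""
    error = [r - e for r, e in zip(reference, estimate)]
    return 10 * math.log10(power(reference) / power(error))

# Toy stand-ins for a separated speech stem and a music+SFX stem.
speech = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
background = [0.8 * math.sin(2 * math.pi * 13 * n / 200) for n in range(200)]

remixed = remix_at_snr(speech, background, 17.5)
```

In this toy setup the only distortion in `remixed` is the scaled background, so `sdr_db(speech, remixed)` comes out at the requested 17.5 dB up to floating-point error.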
The paper presents the driving simulators most commonly used today. Based on an analysis of the operating conditions of rail vehicles, the criteria to be met by this type of vehicle simulator are characterized. In addition, the paper defines the animation requirements, with reference to both graphics and video presentation techniques. The paper also discusses the design requirements for the installation of a cab simulator and extras such as sound effects. The most important requirements are defined for programming the simulated routes, weather conditions, collisions, and other parameters that can significantly improve the quality of a training simulator. The paper discusses the trainer's function of controlling the course of the exercise and the requirements of the trainer's job. Finally, the paper discusses the negative impact of driving simulators on the human body and methods of preventing it.
Sound design involves creatively selecting, recording, and editing sound effects for various media like cinema, video games, and virtual/augmented reality. One of the most time-consuming steps when designing sound is synchronizing audio with video. In some cases, environmental recordings from video shoots are available, which can aid in the process. However, in video games and animations, no reference audio exists, requiring manual annotation of event timings from the video. We propose a system to extract the onsets of repetitive actions from a video, which are then used - in conjunction with audio or textual embeddings - to condition a diffusion model trained to generate a new synchronized sound effects audio track. In this way, we leave complete creative control to the sound designer while removing the burden of synchronization with video. Furthermore, editing the onset track or changing the conditioning embedding requires much less effort than editing the audio track itself, simplifying the sonification process. We provide sound examples, source code, and pretrained models to facilitate reproducibility.
Despite the significant positive characteristics game-based learning offers to pupil learning and assessment, preserving pupils' interest and keeping them engaged in an educational game is still a challenge. To this end, the study and implementation of motivation mechanisms in educational games are considered crucial. Typical examples of motivators in electronic games include points (coins), avatar icons, visualization of achievement levels, NPCs (non-player characters) giving helpful information to users, children-friendly graphics and sound effects, comparison with classmates, and leaderboards. In this paper, we conduct a preliminary study of the effectiveness of these GBL motivators in MG, an educational game for practicing and assessing multiplication skills. The study combined eye-tracking with a short, semi-structured interview session with the four elementary school students that took part in the experiment. Eye-tracking provides detailed monitoring and visualization of gaze behavior in the form of fixation (point and duration of visual focus) and saccade sequences. Given that the way users allocate their visual focus is spontaneous, the data collected and analyzed by eye-tracking are unbiased and give a new spectrum of insight into how users perceive a visual stimulus. In this study, we investigate how users visually respond to the implemented motivators and their visual behavior when deciding between two or more available answers and when given feedback after a wrong answer. The paper discusses useful eye-tracking metrics, provides adequate visualizations of the main findings, and concludes with the ways eye-tracking can help education scientists and practitioners gain a better understanding of the behavior of users of GBL applications and the motivation mechanisms they support.
Footsteps are among the most ubiquitous sound effects in multimedia applications. There is substantial research into understanding the acoustic features and developing synthesis models for footstep sound effects. In this paper, we present a first attempt at adopting neural synthesis for this task. We implemented two GAN-based architectures and compared the results with real recordings as well as six traditional sound synthesis methods. Our architectures reached realism scores as high as recorded samples, showing encouraging results for the task at hand.
This paper compares the sound design in the 2004 Chinese version of Letter from an Unknown Woman to the 1948 Hollywood version of the same film. Analysis of the issues of female discourse, female silence, and theme music in the two films shows how historical and cultural differences determine different sound discourses for women. Furthermore, sound design influences the ways we interpret and understand the "unknown woman" in the Chinese version of Letter from an Unknown Woman.