Human-machine addressee detection (H-M AD) is a modern paralinguistics and dialogue challenge that arises in multiparty conversations between several people and a spoken dialogue system (SDS), since the users may also talk to each other and even to themselves while interacting with the system. The SDS is supposed to determine whether it is being addressed or not. All existing studies on acoustic H-M AD were conducted on corpora designed in such a way that a human addressee and a machine played different dialogue roles. This peculiarity influences speakers' behaviour and increases vocal differences between human- and machine-directed utterances. In the present study, we consider the Restaurant Booking Corpus (RBC), which consists of complexity-identical human- and machine-directed phone calls and allows us to eliminate most of the factors that implicitly influence speakers' behaviour. The only remaining factor is the speakers' explicit awareness of their interlocutor (technical system or human being). Although complexity-identical H-M AD is substantially more challenging than the classical task, we achieved significant improvements using data augmentation (unweighted average recall (UAR) = 0.628) over native listeners (UAR = 0.596) and a baseline classifier presented by the RBC developers (UAR = 0.539).
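The unweighted average recall (UAR) reported above is simply the mean of per-class recalls, which keeps the two-class chance level at 0.5 regardless of class imbalance. A minimal sketch (a generic illustration, not the authors' code):

```python
# Unweighted average recall (UAR): the mean of per-class recalls,
# a standard metric in paralinguistics for imbalanced classes.
def uar(y_true, y_pred):
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        total = sum(1 for t in y_true if t == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)

# Toy example: "H" = human-directed, "M" = machine-directed utterances.
y_true = ["H", "H", "H", "M", "M"]
y_pred = ["H", "H", "M", "M", "M"]
print(round(uar(y_true, y_pred), 3))  # recall(H)=2/3, recall(M)=1.0 -> 0.833
```

Note that plain accuracy on the same toy data would be 0.8, dominated by the majority class; UAR weights both classes equally.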
With the spread of smart devices, people may obtain a variety of information on their surrounding environment thanks to sensing technologies. To design more context-aware systems, psychological user context (e.g., emotional status) is a substantial factor for providing useful information at an appropriate time. As a typical use case that has a high demand for context awareness but has not yet been widely tackled, we focus on the tourism domain. In this study, we aim to estimate the emotional status and satisfaction level of tourists during sightseeing by using unconscious and natural tourist actions. As tourist actions, behavioral cues (eye and head/body movement) and audiovisual data (facial/vocal expressions) were collected during sightseeing using an eye-gaze tracker, physical-activity sensors, and a smartphone. We then derived high-level features, e.g., head tilt and footsteps, from the behavioral cues. We also used existing databases of emotionally rich interactions to train emotion-recognition models and applied them in a cross-corpus fashion to generate emotional-state predictions for the audiovisual data. Finally, the features from the several modalities were fused to estimate the emotion of tourists during sightseeing. To evaluate our system, we conducted experiments with 22 tourists in two different touristic areas located in Germany and Japan. As a result, we confirmed the feasibility of estimating both the emotional status and satisfaction level of tourists. In addition, we found that the features effective for emotion and satisfaction estimation differ among tourists with different cultural backgrounds.
Humans and machines harmoniously collaborating and benefiting from each other is a long-standing dream for researchers in robotics and artificial intelligence. An important feature of efficient and rewarding cooperation is the ability to anticipate possible problematic situations and act in advance to prevent negative outcomes. This concept of assistance is known under the term proactivity. In this article, we investigate the development and implementation of proactive dialogues for fostering a trustworthy human-computer relationship and providing adequate and timely assistance. Here, we make several contributions. A formalisation of proactive dialogue in conversational assistants is provided, which forms a framework for integrating proactive dialogue in conversational applications. Additionally, we present a study showing the relations between proactive dialogue actions and several aspects of the perceived trustworthiness of a system, as well as effects on the user experience. The results of the experiments provide significant contributions to the line of proactive dialogue research. In particular, we provide insights into the effects of proactive dialogue on the human-computer trust relationship and into dependencies between proactive dialogue and user-specific and situational characteristics.
As emotions play a central role in human communication, automatic emotion recognition has attracted increasing attention in the last two decades. While multimodal systems achieve high performance on lab-controlled data, they are still far from providing ecological validity on non-lab-controlled, namely “in-the-wild”, data. This work investigates audiovisual deep learning approaches to the in-the-wild emotion recognition problem. Inspired by the outstanding performance of end-to-end and transfer learning techniques, we explored the effectiveness of architectures in which a modality-specific Convolutional Neural Network (CNN) is followed by a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN), using the AffWild2 dataset under the Affective Behavior Analysis in-the-Wild (ABAW) challenge protocol. We deployed unimodal end-to-end and transfer learning approaches within a multimodal fusion system, which generated final predictions using a weighted score fusion scheme. With the proposed deep-learning-based multimodal system, we reached a test set challenge performance measure of 48.1% on the ABAW 2020 Facial Expressions challenge, which exceeds the first runner-up's performance.
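A weighted score fusion scheme of the kind mentioned above can be sketched as follows; the modalities, weights, and probability vectors here are illustrative placeholders, not the values used in the challenge system:

```python
import numpy as np

# Weighted score fusion sketch: per-modality class-probability vectors
# are combined with scalar weights before taking the argmax.
def weighted_score_fusion(scores, weights):
    """scores: dict modality -> np.ndarray of class probabilities;
    weights: dict modality -> scalar fusion weight."""
    fused = sum(weights[m] * scores[m] for m in scores)
    return int(np.argmax(fused))

# Hypothetical softmax outputs over 3 emotion classes for one clip.
audio = np.array([0.2, 0.5, 0.3])
video = np.array([0.6, 0.3, 0.1])
pred = weighted_score_fusion({"audio": audio, "video": video},
                             {"audio": 0.4, "video": 0.6})
print(pred)  # fused scores [0.44, 0.38, 0.18] -> class 0
```

In practice the weights are typically tuned on a validation set so that more reliable modalities contribute more to the final decision.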
Information about a subjective user opinion towards an argument is crucial for argumentative systems in order to present appropriate content and adapt their behaviour to the individual user. However, requesting explicit feedback regarding the discussed arguments is often impractical and can hinder the interaction. To address this issue, we investigate the automatic recognition, from social signals, of user opinions towards arguments presented by a virtual avatar. We focus on two user opinion categories (convincing and interesting) and two types of social signals (facial expressions and eye movement). The recognition is addressed as a supervised learning problem and realized using the argument search evaluation data discussed in previous work. The overall performance is compared to a human annotation on a subset of the collected data. The results show that the machine learning performance is similar to human performance in both recognition tasks.
Numerous technologies exist for promoting a healthier lifestyle. These technologies are collectively referred to as "Behavior Change Support Systems". However, the majority of existing apps rely on quantitative data representations. Since it is difficult to understand the meaning behind quantitative data, this approach has been suggested to lower users' motivation and to fail to promote behavior change. Therefore, an interpretation of the quantitative data needs to be provided as a supplement. However, different descriptions of the same data may lead to different outcomes. In this paper, we explore the impact of different communication styles for interpretations of quantitative data on behavior change by developing and evaluating Walkeeper, a web-based app that provides interpretations of the users' daily step counts using different levels of elaborateness and indirectness with the aim of promoting walking. Through the quantitative analysis and results of a user study, we contribute new knowledge on designing such interpretations of quantitative data.
How to Win Arguments
Weber, Klaus; Rach, Niklas; Minker, Wolfgang
Datenbank-Spektrum, 06/2020, Volume 20, Issue 2
Journal Article, Peer-reviewed, Open Access
Every day, people make decisions and form opinions based on persuasion processes, whether through advertising, planning leisure activities with friends, or public speeches. Most of the time, however, subliminal persuasion processes triggered by behavioral cues (rather than by the content of the message) play a far more important role than most people are aware of. To raise awareness of the different aspects of persuasion (how and what), we present a multimodal dialog system consisting of two virtual agents that use synthetic speech in a discussion setting to present pros and cons of a controversial topic to a user. The agents are able to adapt their emotions based on explicit user feedback in order to increase their perceived persuasiveness during the interaction, using Reinforcement Learning.
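Feedback-driven emotion adaptation of this kind can be sketched with a simple reinforcement learning scheme; the epsilon-greedy bandit below, with incremental value updates from explicit user feedback, is our own minimal illustration under assumed emotion labels and reward values, not the authors' implementation:

```python
import random

# Minimal sketch: an agent picks an emotional expression via an
# epsilon-greedy policy and updates its value estimate from explicit
# user feedback (reward), nudging it towards persuasive expressions.
class EmotionBandit:
    def __init__(self, emotions, epsilon=0.1, alpha=0.2):
        self.q = {e: 0.0 for e in emotions}  # value estimate per emotion
        self.epsilon = epsilon               # exploration rate
        self.alpha = alpha                   # learning rate

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.q))  # explore
        return max(self.q, key=self.q.get)      # exploit best estimate

    def update(self, emotion, reward):
        # incremental update towards the observed feedback
        self.q[emotion] += self.alpha * (reward - self.q[emotion])

bandit = EmotionBandit(["neutral", "happy", "angry"])
bandit.update("happy", 1.0)   # user found the expression persuasive
bandit.update("angry", -1.0)  # negative feedback
print(max(bandit.q, key=bandit.q.get))  # "happy"
```

A full dialogue system would condition these values on the discussion state rather than keeping a single global estimate per emotion.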
With the emergence of new technologies, the surgical working environment becomes increasingly complex and comprises many medical devices that have to be taken care of. The goal, however, is to reduce the workload of the surgical team so that they can fully focus on the actual surgical procedure. Therefore, new strategies are needed to keep the working environment manageable. Existing research projects in the field of intelligent medical environments mostly concentrate on workflow modeling or single smart features rather than building up a complete intelligent environment. In this article, we present the concept of intelligent digital assistance for clinical operating rooms (IDACO), which provides the surgeon with assistance in many different situations before and during an ongoing procedure using natural spoken language. The speech interface enables the surgeon to concentrate on the surgery and control the technical environment at the same time, without having to consider how to interact with the system. Furthermore, the system observes the context of the surgery and autonomously controls several devices at the appropriate time during the procedure.
As computer technology develops, spoken dialogue is becoming ever-more important when interacting with a wide variety of technological devices, including Personal Digital Assistants, tablet PCs, and mobile phones. Using speech leads to more natural and user-friendly interfaces. More specifically, the authors of this volume contend that the experience of talking to our computerized gadgets may be greatly improved by dynamically adapting the system's dialogue interaction style to the user's profile and emotional status. In this book, a novel approach that combines speech-based emotion recognition with adaptive human-computer dialogue modeling is described. With the robust recognition of emotions from speech signals as their goal, the authors analyze the effectiveness of using a plain emotion recognizer, a speech-emotion recognizer combining speech and emotion recognition, and multiple speech-emotion recognizers at the same time. The semi-stochastic dialogue model employed relates user emotion management to the corresponding dialogue interaction history and allows the device to adapt itself to the context, including altering the stylistic realization of its speech. This comprehensive volume begins by introducing spoken language dialogue systems and providing an overview of human emotions, theories, categorization and emotional speech. It moves on to cover the adaptive semi-stochastic dialogue model and the basic concepts of speech-emotion recognition. Finally, the authors show how speech-emotion recognizers can be optimized, and how an adaptive dialogue manager can be implemented. The book, with its novel methods to perform robust speech-based emotion recognition at low complexity, will be of interest to a variety of readers involved in human-computer interaction.
In this work, the authors present a fully statistical approach to modelling non-native speakers' pronunciation. Second-language speakers pronounce words in many ways that differ from native speakers. Those deviations, be they phoneme substitutions, deletions, or insertions, can be modelled automatically with the new method presented here. The method is based on a discrete hidden Markov model used as a word pronunciation model, initialized from a standard pronunciation dictionary. The implementation and functionality of the methodology have been proven and verified on a test set of non-native English in the accent under consideration. The book is written for researchers with a professional interest in phonetics and automatic speech and speaker recognition.
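The substitution part of such a pronunciation model can be illustrated with a toy scorer; this strict left-to-right variant handles only substitutions (modelling insertions and deletions would require additional HMM transitions), and the function, phoneme labels, and probabilities are illustrative assumptions, not the book's implementation:

```python
# Toy sketch of scoring an observed phoneme sequence against the
# canonical dictionary pronunciation: one state per canonical phoneme,
# with a shared substitution probability mass for mismatched phonemes.
def score_pronunciation(canonical, observed, p_correct=0.8, alphabet=None):
    alphabet = alphabet or sorted(set(canonical) | set(observed))
    p_sub = (1.0 - p_correct) / (len(alphabet) - 1)  # uniform substitutions

    if len(canonical) != len(observed):
        return 0.0  # insertions/deletions not modelled in this sketch

    prob = 1.0
    for c, o in zip(canonical, observed):
        prob *= p_correct if o == c else p_sub
    return prob

# "the" canonically /dh ah/; a speaker substituting /z/ for /dh/:
print(round(score_pronunciation(["dh", "ah"], ["z", "ah"]), 2))  # 0.08
```

Training such a model would replace the uniform substitution mass with probabilities estimated from accented speech data, which is what initializing from a standard pronunciation dictionary and then re-estimating achieves.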