The opinion mining and human-agent interaction communities are currently addressing sentiment analysis from different perspectives that comprise, on the one hand, disparate sentiment-related phenomena and computational representations and, on the other hand, different detection and dialog management methods. In this paper we identify and discuss the growing opportunities for cross-disciplinary work that may accelerate advances in both fields. Sentiment/opinion detection methods are indeed rarely used in human-agent interaction and, when they are employed, they are no different from the ones used in opinion mining and consequently are not designed for socio-affective interactions (timing constraints of the interaction, sentiment analysis as both an input and an output of interaction strategies). To support our claims, we present a comparative state of the art that analyzes the sentiment-related phenomena and sentiment detection methods used in both communities and gives an overview of the goals of socio-affective human-agent strategies. We then propose different possibilities for mutual benefit, specifying several research tracks and discussing the open questions and prospects. To show the feasibility of the proposed general guidelines, we also approach them from a specific perspective by applying them to the Greta embodied conversational agent platform and discuss how they can be used to make sentiment analysis more meaningful for human-agent interactions in two different use cases: job interviews and dialogs with museum visitors.
Emotion recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as healthcare or road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizer, we propose a framework that consists of a Spatial Transformer Network pre-trained on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. The error analysis showed that frame-based systems can present problems when used directly to solve a video-based task despite domain adaptation, which opens a new line of research into ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, by combining these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset under a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users’ emotional state and that their combination improves system performance.
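The late fusion strategy mentioned above can be sketched as a weighted average of the per-modality class posteriors. This is a minimal illustration only: the probability values, the fusion weight, and the function name are assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical per-emotion posteriors from each modality for one clip
# (8 classes, as in the RAVDESS eight-emotion setup). Values are illustrative.
speech_probs = np.array([0.05, 0.10, 0.50, 0.05, 0.10, 0.05, 0.10, 0.05])
face_probs   = np.array([0.10, 0.05, 0.35, 0.20, 0.10, 0.05, 0.10, 0.05])

def late_fusion(p_speech, p_face, w_speech=0.5):
    """Weighted average of modality posteriors; w_speech is a tunable weight."""
    fused = w_speech * p_speech + (1.0 - w_speech) * p_face
    return fused / fused.sum()  # renormalise so the result is a distribution

fused = late_fusion(speech_probs, face_probs)
predicted_emotion = int(np.argmax(fused))  # index of the winning class
```

Because fusion happens at the posterior level, each modality can be trained and tuned independently before combination; only the weight needs to be selected on held-out data.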
Humans and machines harmoniously collaborating and benefiting from each other is a long-standing dream for researchers in robotics and artificial intelligence. An important feature of efficient and rewarding cooperation is the ability to anticipate possible problematic situations and act in advance to prevent negative outcomes. This concept of assistance is known as proactivity. In this article, we investigate the development and implementation of proactive dialogues for fostering a trustworthy human-computer relationship and providing adequate and timely assistance. We make several contributions. A formalisation of proactive dialogue in conversational assistants is provided, which forms a framework for integrating proactive dialogue in conversational applications. Additionally, we present a study showing the relations between proactive dialogue actions and several aspects of the perceived trustworthiness of a system, as well as effects on the user experience. The results of the experiments provide significant contributions to the line of proactive dialogue research. In particular, we provide insights into the effects of proactive dialogue on the human-computer trust relationship and into the dependencies between proactive dialogue and user-specific and situational characteristics.
This longitudinal study analyses 347 doctoral theses on scientific medical information retrieved from the TESEO database and defended in Spanish universities from 1977 to 2018. It also considers other factors, such as geographical scope, distinguishing between dissertations defended in the Spanish region of Levante and in the rest of Spain, and discusses the gender of the authors and whether the general scientometric topic is eminently bibliometric or rather related to the computerisation process. The overall longitudinal finding is a bimodal trend that rejects the Price model of science growth. Our study reveals that this production came initially from universities located in the Spanish Levante. Despite the recent increase of theses covering computerisation topics, theses on scientific medical information are mostly bibliometric. In addition, men were more productive than women, but no statistically significant differences in quantity or in longitudinal trends according to the authors’ gender were observed. Nevertheless, women show a growing linear pattern. An overall conclusion that can be inferred is that the prestige gained by Spanish scientific medical information is the result, to a considerable extent, of the remarkable efforts made by its researchers.
Building on a previous investigation, we propose a quantitative study aimed at identifying users’ preferences towards four synthetic voices of two different quality levels (classified through the sophistication of the synthesizer: low vs. high). The voices administered to participants were developed considering two main aspects: voice quality (high/low) and gender (male/female). 182 unpaid participants were recruited for the study, divided into four groups according to their age and therefore classified as adolescents, young adults, middle-aged, and seniors. To collect data on each voice, presented to participants in random order, the shortened version of the Virtual Agent Voice Acceptance Questionnaire (VAVAQ) was used. Outcomes of the previous study revealed that the high-quality voices, regardless of their gender, were rated more favourably by all participants than the corresponding two lower-quality voices. Conversely, findings of the current study suggest that the four new groups of participants agreed in showing a strong preference for the high-quality female voice over all the other voices considered. Regarding the two male voices, the high-quality one was considered more original and more capable of arousing positive emotional states than the low-quality one. Moreover, the high-quality male voice was judged as more natural than the low-quality female one. The results provide insights for future directions in the user experience and design field.
Currently, the diagnosis of major depressive disorder (MDD) and its subtypes is mainly based on subjective assessments and self-reported measures. However, objective criteria such as electroencephalography (EEG) features would be helpful in detecting depressive states at early stages to prevent the worsening of symptoms. The scientific community has widely investigated the effectiveness of EEG-based measures to discriminate between depressed and healthy subjects, with the aim of better understanding the mechanisms behind the disorder and finding biomarkers useful for diagnosis. This work offers a comprehensive review of the extant literature on EEG-based biomarkers for MDD and its subtypes, and identifies possible future directions for this line of research. The Scopus, PubMed and Web of Science databases were searched following the PRISMA guidelines. The initial screening of papers was based on titles and abstracts; the full texts of the identified articles were then examined, and a synthesis of findings was developed using tables and thematic analysis. After screening 1871 articles, 76 studies were identified as relevant and included in the systematic review. Reviewed markers include EEG frequency band power, EEG asymmetry, ERP components, and non-linear and functional connectivity measures. Results were discussed in relation to the different EEG measures assessed in the studies. The findings confirmed the effectiveness of those measures in discriminating between healthy and depressed subjects. However, the review highlights that the causal link between EEG measures and depressive subtypes needs to be investigated further, and points out that some methodological issues need to be solved to enhance future research in this field.
Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we propose an automatic emotion recognition system that consists of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that training was more robust when it did not start from scratch and the network’s prior knowledge was similar to the target task. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance of static models against sequential models. Results showed that sequential models beat static models by a narrow margin. The error analysis indicated that the visual systems could improve with a detector of high-emotional-load frames, which opens a new line of research into ways to learn from videos. Finally, by combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset under a subject-wise 5-CV evaluation, classifying eight emotions. The results demonstrated that these modalities carry relevant information to detect users’ emotional state and that their combination improves the final system performance.
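The subject-wise 5-CV protocol mentioned above partitions speakers, not clips, so that no actor appears in both train and test. A minimal sketch of such a split, assuming the 24 RAVDESS actors and illustrative clip counts (the function name and fold construction are ours, not the paper's):

```python
import numpy as np

def subject_wise_folds(subject_ids, n_folds=5, seed=0):
    """Split subjects (not clips) into folds so no speaker appears in both
    train and test, as in a subject-wise cross-validation protocol."""
    rng = np.random.default_rng(seed)
    subjects = np.array(sorted(set(subject_ids)))
    rng.shuffle(subjects)
    groups = np.array_split(subjects, n_folds)  # near-equal subject groups
    folds = []
    for test_subjects in groups:
        test_mask = np.isin(subject_ids, test_subjects)
        folds.append((np.where(~test_mask)[0], np.where(test_mask)[0]))
    return folds

# RAVDESS has 24 actors; simulate 3 clips per actor for illustration
ids = np.repeat(np.arange(24), 3)
folds = subject_wise_folds(ids, n_folds=5)
```

Grouping by subject prevents the model from exploiting speaker identity, which would inflate accuracy relative to a random clip-level split.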
Spoken dialogue systems have been proposed to enable a more natural and intuitive interaction with the environment and with human-computer interfaces. In this contribution, we present a framework based on neural networks that models the user’s intention during the dialogue and uses this prediction to dynamically adapt the system’s dialogue model, taking into consideration the user’s needs and preferences. We have evaluated our proposal by developing a user-adapted spoken dialogue system that provides tourist information and services, and we offer a detailed discussion of the positive influence of our proposal on the success of the interaction, the information and services provided, and the quality perceived by the users.
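The idea of predicting the user's intention and adapting the next system move can be sketched schematically. Everything below is hypothetical: the intent labels, the linear classifier standing in for the paper's neural network, the feature vector, and the prompt lookup are illustrative assumptions, not the authors' model.

```python
import numpy as np

# Illustrative stand-in for the intention-prediction network: a linear
# softmax classifier over a small dialog-state feature vector.
INTENTS = ["ask_hotel", "ask_restaurant", "ask_transport"]  # hypothetical labels
rng = np.random.default_rng(42)
W = rng.normal(size=(len(INTENTS), 4))  # weights over 4 hypothetical features

def predict_intent(features):
    """Return the most likely intent label for the current dialog state."""
    logits = W @ features
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return INTENTS[int(np.argmax(probs))]

# Adapting the dialogue model here means choosing the next system prompt
# conditioned on the predicted intent.
PROMPTS = {
    "ask_hotel": "Which area would you like to stay in?",
    "ask_restaurant": "What type of cuisine do you prefer?",
    "ask_transport": "Where are you travelling from?",
}

features = np.array([0.2, 0.9, 0.1, 0.4])  # e.g. an encoding of dialog history
next_prompt = PROMPTS[predict_intent(features)]
```

In a real system the classifier would be a trained neural network over the dialog history and the adaptation would cover the full dialogue model, not a single prompt table.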
This Special Issue presents the latest advances in research and novel applications of speech and language technologies based on the works presented at the sixth edition of the IberSPEECH conference, held in Granada in 2022, paying special attention to those focused on Iberian languages. IberSPEECH is the international conference of the Special Interest Group on Iberian Languages (SIG-IL) of the International Speech Communication Association (ISCA) and the Spanish Thematic Network on Speech Technologies (Red Temática en Tecnologías del Habla, or RTTH for short). Several researchers were invited to extend the contributions presented at IberSPEECH2022 due to their interest and quality. As a result, the Special Issue comprises 11 papers covering different research topics related to speech perception, speech analysis and enhancement, speaker verification and identification, speech production and synthesis, and natural language processing, together with several applications and evaluation challenges.