In this article, the behaviour of students in an e-learning environment is analyzed. A novel pipeline based on facial video processing is proposed. First, face detection, tracking and clustering techniques are applied to extract the sequence of faces of each student. Next, a single efficient neural network is used to extract emotional features from each frame. This network is pre-trained on face identification and fine-tuned for facial expression recognition on static images from AffectNet using a specially developed robust optimization technique. It is shown that the resulting facial features can be used for fast simultaneous prediction of students' engagement levels (from disengaged to highly engaged), individual emotions (happy, sad, etc.) and group-level affect (positive, neutral or negative). This model can be used for real-time video processing even on each student's mobile device, without the need to send their facial video to a remote server or the teacher's PC. In addition, it is demonstrated that a lesson summary can be prepared by saving short clips of the different emotions and engagement levels of all students. The experimental study on datasets from the EmotiW (Emotion Recognition in the Wild) challenges showed that the proposed network significantly outperforms existing single models.
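The abstract does not say how the per-frame emotional features are aggregated over a student's face sequence. A common choice is statistical pooling, which is assumed here purely for illustration: take the per-dimension mean and standard deviation over all frames. The resulting per-student descriptor could then feed lightweight engagement or emotion classifiers. Averaging the descriptors over students gives a group-level input. The function names and toy numbers are hypothetical.

```python
from statistics import mean, pstdev


def pool_student_features(frame_features):
    """Collapse one student's per-frame feature vectors into a single descriptor:
    per-dimension means followed by per-dimension (population) standard deviations.
    Hypothetical helper; the pooling scheme is an assumption, not the authors' method."""
    if not frame_features:
        raise ValueError("at least one frame is required")
    dims = len(frame_features[0])
    columns = [[frame[d] for frame in frame_features] for d in range(dims)]
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]


def pool_group_features(student_descriptors):
    """Group-level descriptor: element-wise average of the per-student descriptors."""
    dims = len(student_descriptors[0])
    return [mean(desc[i] for desc in student_descriptors) for i in range(dims)]


# Toy 2-D "emotional features" for two students (made-up numbers).
student_a = [[0.2, 0.8], [0.4, 0.6]]
student_b = [[0.6, 0.2], [0.6, 0.4]]
desc_a = pool_student_features(student_a)
desc_b = pool_student_features(student_b)
group = pool_group_features([desc_a, desc_b])
print(desc_a, group)
```

Pooling keeps the descriptor size fixed regardless of how many frames a student appears in. This is what allows a single small classifier per task to run on a mobile device.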
—
Starting from the definition of the fundamental tone (pitch) of the speaker's speech as the minimum frequency of the linear power spectrum of the voiced segments of the speech signal, the potentially achievable accuracy of its measurement in the presence of background interference such as white Gaussian noise is estimated. Based on this estimate, a suboptimal algorithm for measuring the pitch frequency over a short speech frame has been developed. The effectiveness of the algorithm is confirmed by the results of an experiment carried out with the author's software.
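The suboptimal estimator itself is not specified in the abstract. As a rough illustration of reading the fundamental tone as the lowest-frequency peak of the power spectrum of a voiced frame, the sketch makes three assumptions: a Hann-windowed periodogram, a pitch search band of 60–500 Hz, and a relative peak threshold. With those in place, it returns the first local maximum in the band that exceeds the threshold. The band, window, threshold and function names are assumptions, not the author's parameters.

```python
import cmath
import math


def power_at_bins(frame, bins):
    """|X[k]|^2 of the periodic-Hann-windowed frame at the given DFT bins (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    return {
        k: abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed))) ** 2
        for k in bins
    }


def estimate_pitch(frame, fs, f_min=60.0, f_max=500.0, rel_threshold=0.1):
    """Frequency (Hz) of the lowest spectral peak in [f_min, f_max] whose power is at
    least rel_threshold times the band maximum; None for a silent frame.
    Band limits and threshold are illustrative assumptions."""
    n = len(frame)
    k_lo = max(1, math.ceil(f_min * n / fs))
    k_hi = min(n // 2 - 1, math.floor(f_max * n / fs))
    power = power_at_bins(frame, range(k_lo - 1, k_hi + 2))  # +1 bin each side for the peak test
    band_max = max(power[k] for k in range(k_lo, k_hi + 1))
    if band_max == 0.0:
        return None  # silent frame: no spectral peak
    for k in range(k_lo, k_hi + 1):
        is_peak = power[k] >= power[k - 1] and power[k] >= power[k + 1]
        if is_peak and power[k] >= rel_threshold * band_max:
            return k * fs / n  # resolution is fs / n Hz
    return None


# Synthetic voiced frame: F0 = 200 Hz plus two weaker harmonics, 50 ms at 8 kHz.
fs, n = 8000, 400
frame = [sum(a * math.sin(2 * math.pi * h * 200 * i / fs) for h, a in ((1, 1.0), (2, 0.6), (3, 0.3)))
         for i in range(n)]
print(estimate_pitch(frame, fs))
```

The frequency resolution of this naive version is limited to fs / n (20 Hz here). This is one reason why the measurement accuracy over a short frame is the central question in the abstract.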
In this paper, we consider the problem of autoregressive modeling of a speech signal from its discrete Fourier transform computed over the interval of one speech frame (several milliseconds). Based on the information-theoretic approach, a novel method was developed in which two computational procedures, namely iterative optimization of the autoregressive parameters and their automatic amplitude scaling, are separated from each other. A full-scale experiment was set up and carried out. The main advantage of the new method over its known analogs is shown to be the extremely high rate of convergence of the iterations to the optimal solution.
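The iterative information-theoretic method is not described in enough detail to reproduce. For orientation only, the sketch shows the classical, non-iterative route from DFT data to an autoregressive model, which is a standard technique swapped in here and not the authors' method. The periodogram of a zero-padded frame is inverse-transformed into autocorrelations. The Levinson–Durbin recursion then returns the normalized filter coefficients (the spectral shape) and, separately, the prediction-error power (the amplitude scale). All function names are illustrative.

```python
import cmath
import math


def autocorrelation_via_dft(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] as the inverse DFT of the periodogram of
    the zero-padded frame (padding to 2n avoids circular wrap-around)."""
    n = len(frame)
    nfft = 2 * n
    spectrum = [sum(x * cmath.exp(-2j * math.pi * k * i / nfft) for i, x in enumerate(frame))
                for k in range(nfft)]
    power = [abs(X) ** 2 / n for X in spectrum]
    return [sum(p * cmath.exp(2j * math.pi * k * lag / nfft) for k, p in enumerate(power)).real / nfft
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations. Returns the prediction-error filter
    a = [1, a1, ..., ap] (shape only) and the prediction-error power (scale)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err  # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


frame = [0.9 ** i for i in range(200)]  # decaying exponential: r[l] is proportional to 0.9**l
coeffs, gain = levinson_durbin(autocorrelation_via_dft(frame, 2), 2)
coeffs3, gain3 = levinson_durbin(autocorrelation_via_dft([3 * x for x in frame], 2), 2)
print(coeffs, gain, gain3 / gain)
```

Scaling the frame by 3 leaves the coefficients unchanged and multiplies the error power by 9. This illustrates why shape estimation and amplitude scaling can be treated as separate procedures.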
—
The article proposes a new algorithm for the real-time detection of vowel speech sounds based on (R + 1)-element information and the whitening filter method. An example of practical application of the algorithm is described, and its efficiency is assessed. A full-scale experiment was conducted; its results indicate that the proposed algorithm achieves sufficiently high speed and a guaranteed significance level of decisions while placing minimal performance requirements on the computing equipment.
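The (R + 1)-element decision rule is not reproduced here. The sketch only illustrates the general whitening-filter idea, using the Itakura ratio as a swapped-in, standard criterion. Each reference vowel is represented by its AR prediction-error (whitening) filter. The ratio compares the output power of the observed frame through that filter with the minimum achievable by the frame's own optimal filter. The ratio is at least 1, equals 1 for a perfect match, and does not depend on the frame's amplitude. The vowel labels, filters and threshold are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation estimate r[0..max_lag]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) / n for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Optimal prediction-error filter [1, a1, ..., ap] and its error power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def filtered_power(a, r):
    """Output power of filter a driven by a signal with autocorrelation r."""
    p = len(a) - 1
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p + 1) for j in range(p + 1))


def itakura_ratio(frame, ref_filter):
    """>= 1; close to 1 when the reference whitening filter suits the frame."""
    order = len(ref_filter) - 1
    r = autocorrelation(frame, order)
    _, own_err = levinson_durbin(r, order)
    return filtered_power(ref_filter, r) / own_err


def detect_vowel(frame, references, threshold=1.5):
    """Label of the best-matching reference, or None if no ratio is below threshold."""
    ratios = {name: itakura_ratio(frame, a) for name, a in references.items()}
    best = min(ratios, key=ratios.get)
    return best if ratios[best] < threshold else None


# Hypothetical first-order references standing in for two vowel classes.
references = {"A": [1.0, -0.9], "B": [1.0, 0.9]}
frame_a = [0.9 ** i for i in range(200)]
print(detect_vowel(frame_a, references), detect_vowel([5 * x for x in frame_a], references))
```

Because only autocorrelations of short frames and a few multiply-adds per reference are needed, a detector of this shape is cheap enough for real-time use on modest hardware.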
The problem of determining the fundamental tone frequency of a speech signal in the presence of white Gaussian noise is examined. A method for measuring this frequency is proposed that takes into account the periodic structure of the power spectrum of voiced speech frames and is based on the principle of harmonic energy accumulation in the frequency domain. To this end, a procedure for equalizing the envelope of the power spectrum is introduced into the speech processing algorithm. The procedure uses a two-level autoregressive model of the observations: one level within a single fundamental-tone period and the other within an interval spanning several such periods. The order of the lower-level autoregression is adapted to the observed frame. An example of a practical implementation of the adaptive method based on the Burg method is examined. The main advantages of the adaptive method over its known analogs are high speed and enhanced noise immunity, both confirmed in a full-scale experiment. The adaptive method yielded a gain of 5–10 dB in the threshold signal.
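Harmonic energy accumulation can be illustrated with plain harmonic summation. For every F0 candidate on the DFT bin grid, add up the power at its first few harmonics and keep the candidate that collects the most. The optional `envelope` argument stands in for the equalization step. The Burg-based two-level AR envelope from the abstract is not reproduced here; any per-bin envelope estimate on the same grid can be passed in. The parameter names, the 60–500 Hz band and the number of harmonics are assumptions for illustration.

```python
import cmath
import math


def power_spectrum(frame):
    """|X[k]|^2 for k = 0..n/2 of the periodic-Hann-windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]


def harmonic_sum_pitch(frame, fs, f_min=60.0, f_max=500.0, n_harmonics=3, envelope=None):
    """F0 candidate (on the bin grid) whose first n_harmonics harmonics collect the
    most spectral power. `envelope` is a hypothetical per-bin envelope estimate."""
    n = len(frame)
    power = power_spectrum(frame)
    if envelope is not None:
        # Equalization: divide out the envelope so harmonics under weak formants
        # contribute on a par with those under strong formants.
        power = [p / e for p, e in zip(power, envelope)]
    k_lo = max(1, math.ceil(f_min * n / fs))
    k_hi = min(len(power) - 1, math.floor(f_max * n / fs))
    best_k, best_score = None, -1.0
    for k in range(k_lo, k_hi + 1):
        score = sum(power[h * k] for h in range(1, n_harmonics + 1) if h * k < len(power))
        if score > best_score:
            best_k, best_score = k, score
    return best_k * fs / n


# Synthetic voiced frame: F0 = 200 Hz with decaying harmonics, 50 ms at 8 kHz.
fs, n = 8000, 400
frame = [sum(a * math.sin(2 * math.pi * h * 200 * i / fs) for h, a in ((1, 1.0), (2, 0.6), (3, 0.3)))
         for i in range(n)]
print(harmonic_sum_pitch(frame, fs))
```

Summing several harmonics is what makes the estimate more noise-tolerant than picking a single peak. Equalizing the envelope first keeps one strong formant from dominating the sum.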
The article considers the problem of the "aging" of personal biometric data over time. A method is proposed to overcome this problem by automatically updating these data in the biometric system storage. The update uses the speech signals of registered users obtained during their latest identification requests and online service sessions. The proposed method uses a scale-invariant indicator of voice template quality. As a result, it guarantees reliable decisions over a wide dynamic range of the speech signal. It was established that the scale-invariant indicator provides a guaranteed significance level for the decisions made by a conventional observer. A full-scale experiment implementing the proposed method was set up and carried out using the authors' software, and the effectiveness of the method was justified in practice on real speech data. The results are intended for use in developing new, and modernizing existing, systems and technologies for automated quality control and updating of personal biometric data.
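The abstract does not define its scale-invariant indicator. One standard measure with this property is the Itakura–Saito distance minimized over an unknown gain, which reduces to the log ratio of the arithmetic and geometric means of the spectral ratio. It is swapped in below only to show why scale invariance matters: a louder or quieter recording of the same voice yields the same value. The acceptance threshold, the blending weight and the function names are hypothetical.

```python
import math


def scale_invariant_distance(psd, template):
    """Itakura-Saito distance minimized over an unknown gain c:
    log(mean(psd/template)) - mean(log(psd/template)).
    >= 0, zero iff the spectra are proportional, unchanged if psd is rescaled."""
    ratios = [p / t for p, t in zip(psd, template)]
    mean_ratio = sum(ratios) / len(ratios)
    mean_log = sum(math.log(r) for r in ratios) / len(ratios)
    return math.log(mean_ratio) - mean_log


def update_template(template, psd, threshold=0.1, alpha=0.2):
    """Blend a new observation into the stored template if it is close enough.
    Returns (new_template, accepted). threshold and alpha are illustrative values."""
    if scale_invariant_distance(psd, template) > threshold:
        return list(template), False
    gain = sum(p / t for p, t in zip(psd, template)) / len(template)  # optimal gain c
    return [(1 - alpha) * t + alpha * p / gain for p, t in zip(psd, template)], True


template = [1.0, 2.0, 4.0, 2.0]                                       # stored band powers (toy)
new_template, accepted = update_template(template, [10.0, 20.0, 40.0, 20.0])  # louder, same voice
_, rejected_ok = update_template(template, [1.0, 1.0, 1.0, 1.0])      # different spectral shape
print(accepted, rejected_ok)
```

The new observation is rescaled by the optimal gain before blending, so level changes between sessions do not drift the stored template.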
This paper addresses the face recognition task for offline mobile applications. Using AutoML techniques, we propose a novel approach to developing a fast neural network-based facial feature extractor for a specific device. First, the Once-for-All SuperNet is trained on a large facial dataset. Each device is characterized by its lookup table, which contains the inference time of each layer of the SuperNet. An evolutionary search is then used to select the most accurate subnetwork within a limit on the maximum expected latency. We propose training a neural architecture comparator using Gradient Boosted Trees to choose the better subnetwork during this search. Face verification and recognition experiments demonstrate the robustness of the proposed approach to various facial region positions. Our best model achieves an identification accuracy of 98.7% on the LFW dataset in less than 5 ms on the Qualcomm Snapdragon 865 GPU.
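The search loop described above can be sketched with a lookup table and a pairwise comparator. This is a simplified sketch, not the authors' implementation. The toy lookup table lists the latency of each candidate block in each layer, and a subnetwork's predicted latency is the sum of its entries. Binary tournaments are decided by a `better(a, b)` callable. In the paper that role is played by a Gradient Boosted Trees comparator; here it is a hypothetical proxy that prefers larger blocks. The population size, generation count and mutation scheme are assumptions.

```python
import random


def latency_ms(arch, lut):
    """Predicted latency: sum of per-layer lookup-table entries for the chosen blocks."""
    return sum(lut[layer][choice] for layer, choice in enumerate(arch))


def evolutionary_search(lut, better, budget_ms, population=20, generations=30, seed=0):
    """Search subnetworks (one block index per layer) under a latency budget.
    better(a, b) -> True if a is predicted to be more accurate than b (hypothetical)."""
    if sum(min(row) for row in lut) > budget_ms:
        raise ValueError("no architecture fits the latency budget")
    rng = random.Random(seed)

    def random_arch():
        while True:
            arch = tuple(rng.randrange(len(row)) for row in lut)
            if latency_ms(arch, lut) <= budget_ms:
                return arch

    def mutate(arch):
        child = list(arch)
        layer = rng.randrange(len(lut))
        child[layer] = rng.randrange(len(lut[layer]))
        return tuple(child)

    pop = [random_arch() for _ in range(population)]
    best = pop[0]
    for arch in pop[1:]:
        if better(arch, best):
            best = arch
    for _ in range(generations):
        children = [best]  # elitism: carry the current best forward
        while len(children) < population:
            a, b = rng.sample(pop, 2)  # binary tournament decided by the comparator
            child = mutate(a if better(a, b) else b)
            if latency_ms(child, lut) <= budget_ms:
                children.append(child)
                if better(child, best):
                    best = child
        pop = children
    return best


# Toy setup: 3 layers, 3 block options each; bigger index = slower but "more accurate".
lut = [[1.0, 2.0, 3.0]] * 3
best = evolutionary_search(lut, lambda a, b: sum(a) > sum(b), budget_ms=6.0)
print(best, latency_ms(best, lut))
```

A pairwise comparator only has to rank two candidates, which is an easier learning target than predicting absolute accuracy. Tournament selection needs nothing more than that ranking.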
The interaction of two modalities of an audiovisual information processing system was studied in the problem of evaluating the emotional state of users of dialogue information systems. To improve the precision of real-time estimation, it is proposed to use the audio modality to detect speech segments of increased emotionality. The intensity of the flow of vowel sounds in the user's speech signal at the input of the information system is used as an indicator of the degree of speech emotionality. A method has been developed for measuring this indicator from the empirical probability of occurrence of vowel sounds in the user's speech signal. An example of a practical implementation of the method in soft real time is presented. A full-scale experiment using the authors' software was set up, and its results are presented. The advantages of the proposed method are shown: high speed of operation and high sensitivity to changes in the level of users' speech emotionality. The results are intended for developers of advanced information systems with an audiovisual user interface.
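The indicator described above, the empirical probability of vowel occurrence, can be illustrated as a sliding-window rate. The sketch assumes a per-frame vowel detector is already available and outputs 1 for a vowel frame and 0 otherwise. It computes the fraction of vowel frames in each window and flags windows whose rate exceeds a multiple of a speaker baseline. The window length, baseline, ratio and function names are illustrative assumptions, not the authors' settings.

```python
def vowel_rate(decisions, window):
    """Fraction of vowel frames (1/True) in each sliding window of `window` frames."""
    if len(decisions) < window:
        raise ValueError("need at least one full window")
    count = sum(decisions[:window])
    rates = [count / window]
    for i in range(window, len(decisions)):
        count += decisions[i] - decisions[i - window]  # slide the window by one frame
        rates.append(count / window)
    return rates


def emotional_windows(decisions, window, baseline, ratio=1.5):
    """Start indices of windows whose vowel rate exceeds ratio * baseline."""
    return [i for i, r in enumerate(vowel_rate(decisions, window)) if r > ratio * baseline]


# Toy decisions: calm speech (1 vowel frame in 4), then denser vowels (3 in 4).
decisions = [1, 0, 0, 0] * 5 + [1, 1, 0, 1] * 5
flagged = emotional_windows(decisions, window=4, baseline=0.25)
print(flagged[0], flagged[-1])
```

The running count makes each update O(1) per frame, which suits the soft real-time setting mentioned in the abstract.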
We present the first extensive radio to γ-ray observations of a fast-rising blue optical transient, AT 2018cow, over its first ∼100 days. AT 2018cow rose over a few days to a peak luminosity $L_{\mathrm{pk}} \sim 4 \times 10^{44}\,\mathrm{erg\,s^{-1}}$, exceeding that of superluminous supernovae (SNe), before declining as $L \propto t^{-2}$. Initial spectra at $\delta t \lesssim 15$ days were mostly featureless and indicated large expansion velocities $v \sim 0.1c$ and temperatures reaching $T \sim 3 \times 10^{4}$ K. Later spectra revealed a persistent optically thick photosphere and the emergence of H and He emission features with $v \sim 4000\,\mathrm{km\,s^{-1}}$, with no evidence for ejecta cooling. Our broadband monitoring revealed a hard X-ray spectral component at $E \geq 10$ keV, in addition to luminous and highly variable soft X-rays, with properties unprecedented among astronomical transients. An abrupt change in the X-ray decay rate and variability appears to accompany the change in optical spectral properties. AT 2018cow showed bright radio emission consistent with the interaction of a blast wave with $v_{\mathrm{sh}} \sim 0.1c$ with a dense environment ( for $v_w = 1000\,\mathrm{km\,s^{-1}}$). While these properties exclude $^{56}$Ni-powered transients, our multiwavelength analysis instead indicates that AT 2018cow harbored a "central engine," either a compact object (magnetar or black hole) or an embedded internal shock produced by interaction with a compact, dense circumstellar medium. The engine released $\sim 10^{50}$–$10^{51.5}$ erg over $\sim 10^{3}$–$10^{5}$ s and resides within low-mass, fast-moving material with an equatorial–polar density asymmetry ($M_{\mathrm{ej,fast}} \lesssim 0.3\,M_{\odot}$). Successful SNe from low-mass H-rich stars (like electron-capture SNe) or failed explosions from blue supergiants satisfy these constraints. Intermediate-mass black holes are disfavored by the large environmental density probed by the radio observations.
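The peak luminosity and the $t^{-2}$ decline together set the order of magnitude of the radiated energy. As a rough check, assume (for illustration only; the peak time is not stated in the text) that $L = L_{\mathrm{pk}}(t/t_{\mathrm{pk}})^{-2}$ holds from a peak at $t_{\mathrm{pk}} \approx 3$ days $\approx 2.6 \times 10^{5}$ s, with $t$ measured from explosion. The post-peak radiated energy is then

$$E_{\mathrm{rad}} \approx \int_{t_{\mathrm{pk}}}^{\infty} L_{\mathrm{pk}} \left(\frac{t}{t_{\mathrm{pk}}}\right)^{-2} dt = L_{\mathrm{pk}}\, t_{\mathrm{pk}} \approx \left(4 \times 10^{44}\,\mathrm{erg\,s^{-1}}\right)\left(2.6 \times 10^{5}\,\mathrm{s}\right) \approx 10^{50}\,\mathrm{erg}.$$

Under this assumption the radiated energy is of the same order as the lower end of the engine energy range.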
The present paper discusses the problem of distortions in speech signals transmitted over a communication channel to a biometric system during voice-based remote identification. A possible remedy is a preliminary correction of the frequency spectrum of the received signal based on the pre-distortion principle. Taking a priori uncertainty into account, a new information indicator of speech signal distortion is proposed, along with a method for measuring it from small observation samples. An example of a fast practical implementation of the method based on a parametric spectral analysis algorithm is considered. Results of an experimental test of the proposed approach are provided for three different communication channels. It is shown that the proposed method brings an initially distorted speech signal into compliance with a registered voice template in terms of an acceptable information discrimination criterion. The described approach may be used in existing biometric systems and speaker identification technologies.
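The pre-distortion idea can be illustrated on band power spectra: estimate per-band correction gains once by comparing a received spectrum with the registered template, then apply the same gains to later signals from that channel. The sketch makes two assumptions that are not taken from the paper. It uses the gain-optimized Itakura–Saito distance as a stand-in for the paper's information indicator of distortion, and it estimates the gains by simple per-band division. All spectra and function names are hypothetical toy values.

```python
import math


def distortion_indicator(psd, template):
    """Gain-optimized Itakura-Saito distance: 0 iff psd is proportional to template."""
    ratios = [p / t for p, t in zip(psd, template)]
    mean_ratio = sum(ratios) / len(ratios)
    return math.log(mean_ratio) - sum(math.log(r) for r in ratios) / len(ratios)


def correction_gains(received_psd, template_psd):
    """Per-band gains that pre-equalize the channel (hypothetical estimator)."""
    return [t / r for r, t in zip(received_psd, template_psd)]


def apply_gains(psd, gains):
    return [p * g for p, g in zip(psd, gains)]


template = [1.0, 2.0, 4.0, 2.0]     # registered voice template (toy band powers)
channel = [1.0, 0.5, 0.25, 0.5]     # unknown channel frequency response (toy)
received = [5.0 * t * c for t, c in zip(template, channel)]
gains = correction_gains(received, template)

# A later utterance passing through the same channel, corrected with the same gains.
utterance = [2.0, 2.0, 3.0, 1.0]
received2 = [3.0 * u * c for u, c in zip(utterance, channel)]
before = distortion_indicator(received2, utterance)
after = distortion_indicator(apply_gains(received2, gains), utterance)
print(round(before, 4), after)
```

Because the indicator ignores overall gain, only the channel's spectral shaping is measured and corrected; the level difference between sessions is left to the identification stage.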