Video learning is an important task in computer vision and has attracted increasing interest in recent years. Since even a small number of videos easily comprises several million frames, methods that do not rely on frame-level annotation are of special importance. In this work, we propose a novel learning algorithm with a Viterbi-based loss that allows for online and incremental learning from weakly annotated video data. We moreover show that explicit context and length modeling leads to large improvements in video segmentation and labeling tasks, and we include these models in our framework. On several action segmentation benchmarks, we obtain an improvement of up to 10% compared to current state-of-the-art methods.
Human mobility patterns have significant applications in policy-decision scenarios and economic behavior research. The human mobility simulation task aims to generate human mobility trajectories given a small set of trajectory data, and it has attracted much attention due to the scarcity and sparsity of human mobility data. Existing methods mostly rely on the static relationships of locations, while largely neglecting the dynamic spatiotemporal effects of locations. On the one hand, spatiotemporal correspondences of visit distributions reveal the spatial proximity and the functional similarity of locations. On the other hand, the varying durations in different locations hinder the iterative generation process of the mobility trajectory. Therefore, we propose a novel framework to model the dynamic spatiotemporal effects of locations, namely SpatioTemporal-Augmented gRaph neural networks (STAR). The STAR framework designs various spatiotemporal graphs to capture the spatiotemporal correspondences and builds a novel dwell branch to simulate the varying durations in locations; the whole model is finally optimized in an adversarial manner. Comprehensive experiments on four real datasets for human mobility simulation have verified the superiority of STAR over state-of-the-art methods. Our code is available at https://github.com/Star607/STAR-TKDE.
Recently, a new acoustic model based on deep neural networks (DNN) has been introduced. While the DNN has generated significant improvements over GMM-based systems on several tasks, there has been no evaluation of the robustness of such systems to environmental distortion. In this paper, we investigate the noise robustness of DNN-based acoustic models and find that they can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation. This performance can be further improved by incorporating information about the environment into DNN training using a new method called noise-aware training. When combined with the recently proposed dropout training technique, a 7.5% relative improvement over the previously best published result on this task is achieved using only a single decoding pass and no additional decoding complexity compared to a standard DNN.
Adaptive Sojourn Time HSMM for Heart Sound Segmentation Oliveira, Jorge; Renna, Francesco; Mantadelis, Theofrastos ...
IEEE Journal of Biomedical and Health Informatics, March 2019, Volume 23, Issue 2
Journal Article
Peer reviewed
Heart sounds are difficult to interpret due to events with very short temporal onset between them (tens of milliseconds) and dominant frequencies that are out of the human audible spectrum. Computer-assisted decision systems may help, but they require robust signal processing algorithms. In this paper, we propose a new algorithm for heart sound segmentation using a hidden semi-Markov model (HSMM). The proposed algorithm infers more suitable sojourn time parameters than those currently suggested by the state of the art, through a maximum likelihood approach. We test our approach on three different datasets, including the publicly available PhysioNet and Pascal datasets. We also release a pediatric dataset composed of 29 heart sounds. In contrast with any other dataset available online, the annotations of the heart sounds in the released dataset contain information about the beginning and the ending of each heart sound event. Annotations were made by two cardiopulmonologists. The proposed algorithm is compared with the current state of the art. The results show a significant increase in segmentation performance, regardless of the dataset or the methodology presented. For example, when using the PhysioNet dataset to train and to evaluate the HSMMs, our algorithm achieved an average F-score of 92% compared to 89% achieved by the algorithm described in D. B. Springer, L. Tarassenko, and G. D. Clifford, "Logistic regression-HSMM-based heart sound segmentation," IEEE Transactions on Biomedical Engineering, vol. 63, no. 4, pp. 822-832, 2016. In this sense, the proposed approach to adapt sojourn time parameters represents an effective solution for heart sound segmentation problems, even when the training data does not perfectly express the variability of the testing data.
Our application requires a keyword spotting system with a small memory footprint, low computational cost, and high precision. To meet these requirements, we propose a simple approach based on deep neural networks. A deep neural network is trained to directly predict the keyword(s) or subword units of the keyword(s), followed by a posterior handling method producing a final confidence score. Keyword recognition results achieve 45% relative improvement with respect to a competitive Hidden Markov Model-based system, while performance in the presence of babble noise shows 39% relative improvement.
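The abstract does not spell out the posterior handling step, so the following is only a minimal sketch of one plausible form, assuming per-frame posteriors are smoothed with a moving average and the confidence is the geometric mean of each keyword label's peak smoothed posterior in a recent window. The function names, window sizes, and label layout are all hypothetical and are not taken from the paper.

```python
def smooth(posteriors, window):
    """Moving average of each label's posterior over the last `window` frames."""
    smoothed = []
    for t in range(len(posteriors)):
        start = max(0, t - window + 1)
        chunk = posteriors[start:t + 1]
        # zip(*chunk) groups the values of one label across the frames in the chunk
        smoothed.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return smoothed


def confidence(posteriors, keyword_labels, smooth_window=3, search_window=10):
    """Assumed scoring rule: geometric mean of the peak smoothed posterior
    of every keyword label within the last `search_window` frames."""
    recent = smooth(posteriors, smooth_window)[-search_window:]
    score = 1.0
    for label in keyword_labels:
        score *= max(frame[label] for frame in recent)
    return score ** (1.0 / len(keyword_labels))


# Hypothetical network output: label 0 = filler, labels 1 and 2 = the two keyword units.
frames = [[0.1, 0.9, 0.0]] * 3 + [[0.1, 0.0, 0.9]] * 3
silence = [[1.0, 0.0, 0.0]] * 6
print(confidence(frames, keyword_labels=[1, 2]))
print(confidence(silence, keyword_labels=[1, 2]))
```

A detector built on this sketch would fire when the score exceeds a tuned threshold; the threshold trades false alarms against misses.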
Despite the emerging importance of Speech Emotion Recognition (SER), the state-of-the-art accuracy is quite low and needs improvement to make commercial applications of SER viable. A key underlying reason for the low accuracy is the scarcity of emotion datasets, which is a challenge for developing any robust machine learning model in general. In this article, we propose a solution to this problem: a multi-task learning framework that uses auxiliary tasks for which data is abundantly available. We show that utilisation of this additional data can improve the primary task of SER, for which only limited labelled data is available. In particular, we use gender identification and speaker recognition as auxiliary tasks, which allow the use of very large datasets, e.g., speaker classification datasets. To maximise the benefit of multi-task learning, we further use an adversarial autoencoder (AAE) within our framework, which has a strong capability to learn powerful and discriminative features. Furthermore, the unsupervised AAE in combination with the supervised classification networks enables semi-supervised learning, which incorporates a discriminative component in the AAE unsupervised training pipeline. This semi-supervised learning helps to improve generalisation of our framework and thus leads to improvements in SER performance. The proposed model is rigorously evaluated for categorical and dimensional emotion, and in cross-corpus scenarios. Experimental results demonstrate that the proposed model achieves state-of-the-art performance on two publicly available datasets.
Plug-and-play priors (PnP) is a methodology for regularized image reconstruction that specifies the prior through an image denoiser. While PnP algorithms are well understood for denoisers performing maximum a posteriori probability (MAP) estimation, they have not been analyzed for the minimum mean squared error (MMSE) denoisers. This letter addresses this gap by establishing the first theoretical convergence result for the iterative shrinkage/thresholding algorithm (ISTA) variant of PnP for MMSE denoisers. We show that the iterates produced by PnP-ISTA with an MMSE denoiser converge to a stationary point of some global cost function. We validate our analysis on sparse signal recovery in compressive sensing by comparing two types of denoisers, namely the exact MMSE denoiser and the approximate MMSE denoiser obtained by training a deep neural net.
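The PnP-ISTA iteration named above alternates a gradient step on the data-fidelity term with a call to a denoiser. The sketch below assumes a least-squares data term 0.5 * ||Ax - y||^2 and, purely as a stand-in, uses soft thresholding as the denoiser. Soft thresholding is a MAP-style (proximal) denoiser, not the MMSE denoiser the letter analyzes; an MMSE denoiser would be passed in through the same `denoiser` argument. All function names are illustrative.

```python
def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(v, tau=0.5):
    """Stand-in denoiser: shrink every entry toward zero by tau."""
    return [max(abs(x) - tau, 0.0) * (1 if x > 0 else -1) for x in v]


def pnp_ista(A, y, denoiser, step, iters=200):
    """Iterate x <- denoiser(x - step * A^T (A x - y)), starting from zero."""
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        x = denoiser([xi - step * gi for xi, gi in zip(x, grad)])
    return x


# Toy problem with an identity measurement matrix, so the fixed point is easy to check.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.1, -2.0]
print(pnp_ista(A, y, soft_threshold, step=1.0))
```

For a general `A`, the step size would normally be chosen no larger than the reciprocal of the largest eigenvalue of A^T A, so that the gradient step does not overshoot.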
The recognition of a driver's braking intensity is of great importance for advanced control and energy management in electric vehicles. In this paper, the braking intensity is classified into three levels based on novel hybrid unsupervised and supervised learning methods. First, instead of manually selecting a threshold for each braking intensity level, an unsupervised Gaussian mixture model is used to cluster the braking events automatically by brake pressure. Then, a supervised Random Forest model is trained to classify the correct braking intensity levels from the state signals of the vehicle and powertrain. To obtain a more efficient classifier, critical features are analyzed and selected. Moreover, beyond the acquisition of a discrete braking intensity level, a novel continuous observation method based on artificial neural networks is proposed to quantitatively analyze and recognize the brake intensity using the previously determined features of vehicle states. Experimental data are collected in an electric vehicle under real-world driving scenarios. Finally, the classification and regression results of the proposed methods are evaluated and discussed. The results demonstrate the feasibility and accuracy of the proposed hybrid learning methods for braking intensity classification and quantitative recognition in various deceleration scenarios.
The process by which new ideas, innovations, and behaviors spread through a large social network can be thought of as a networked interaction game: Each agent obtains information from a certain number of agents in his friendship neighborhood, and adapts his idea or behavior to increase his benefit. In this paper, we are interested in how opinions about a certain topic form in social networks. We model opinions as continuous scalars ranging from 0 to 1, with 1 (0) representing an extremely positive (negative) opinion. Each agent has an initial opinion and incurs some cost depending on the opinions of his neighbors, his initial opinion, and his stubbornness about his initial opinion. Agents iteratively update their opinions based on their own initial opinions and on observing the opinions of their neighbors. The iterative update of an agent can be viewed as a myopic cost-minimization response (i.e., the so-called best response) to the others' actions. We study whether an equilibrium can emerge as a result of such local interactions and how such an equilibrium depends on the network structure, the initial opinions of the agents, the location of stubborn agents, and the extent of their stubbornness. We also study the convergence speed to such an equilibrium and characterize the convergence time as a function of the aforementioned factors. We also discuss the implications of these results for a few well-known graphs such as Erdős–Rényi random graphs and small-world graphs.
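The best-response update described above can be illustrated with a small simulation. The abstract does not give the cost function, so this sketch assumes a quadratic cost for agent i of the form 0.5 * sum over neighbors j of (x_i - x_j)^2 plus 0.5 * K_i * (x_i - x_i(0))^2, where K_i is the stubbornness. Under that assumption the minimizer is a weighted average of the neighbors' opinions and the agent's own initial opinion. The function names and the example graph are hypothetical.

```python
def best_response_step(opinions, initial, stubbornness, neighbors):
    """One synchronous round: every agent minimizes its assumed quadratic cost."""
    updated = []
    for i, nbrs in enumerate(neighbors):
        weight = len(nbrs) + stubbornness[i]
        if weight == 0:
            # Isolated, non-stubborn agent: the cost is flat, so keep the opinion.
            updated.append(opinions[i])
            continue
        total = sum(opinions[j] for j in nbrs) + stubbornness[i] * initial[i]
        updated.append(total / weight)
    return updated


def run_to_equilibrium(initial, stubbornness, neighbors, tol=1e-9, max_rounds=10_000):
    """Repeat best-response rounds until no opinion moves by more than `tol`."""
    opinions = list(initial)
    for rounds in range(1, max_rounds + 1):
        new = best_response_step(opinions, initial, stubbornness, neighbors)
        if max(abs(a - b) for a, b in zip(new, opinions)) < tol:
            return new, rounds
        opinions = new
    return opinions, max_rounds


# Path graph 0 - 1 - 2: the two end agents are stubborn, the middle agent is not.
neighbors = [[1], [0, 2], [1]]
initial = [1.0, 0.5, 0.0]
stubbornness = [1.0, 0.0, 1.0]
equilibrium, rounds = run_to_equilibrium(initial, stubbornness, neighbors)
print(equilibrium, rounds)
```

In this toy example the stubborn end agents are pulled only part of the way toward the middle, and the non-stubborn middle agent settles at the average of its two neighbors.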