Audio event recognition, the human-like ability to identify and relate sounds from audio, is a nascent problem in machine perception. Comparable problems such as object detection in images have reaped enormous benefits from comprehensive datasets, principally ImageNet. This paper describes the creation of Audio Set, a large-scale dataset of manually annotated audio events that endeavors to bridge the gap in data availability between image and audio research. Using a carefully structured hierarchical ontology of 632 audio classes guided by the literature and manual curation, we collect data from human labelers to probe the presence of specific audio classes in 10-second segments of YouTube videos. Segments are proposed for labeling using searches based on metadata, context (e.g., links), and content analysis. The result is a dataset of unprecedented breadth and size that will, we hope, substantially stimulate the development of high-performance audio event recognizers.
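In a hierarchical ontology, a label for a specific class can be read as also implying its more general ancestors. The sketch below shows that upward propagation, assuming a toy child-list representation and a handful of illustrative class names; it is not the dataset's actual file format, class list, or labeling rule.

```python
# Hypothetical toy fragment of a hierarchical sound ontology: each class maps
# to the list of its child classes (the real ontology has 632 classes).
ONTOLOGY = {
    "Sounds of things": ["Vehicle", "Domestic sounds"],
    "Vehicle": ["Car", "Train"],
    "Car": ["Car alarm"],
    "Train": [],
    "Car alarm": [],
    "Domestic sounds": ["Door"],
    "Door": [],
}


def parent_map(ontology):
    """Invert child lists into a child -> set-of-parents map (a class may have several parents)."""
    parents = {name: set() for name in ontology}
    for parent, children in ontology.items():
        for child in children:
            parents.setdefault(child, set()).add(parent)
    return parents


def expand_with_ancestors(labels, ontology):
    """Return the given labels plus every ancestor reachable by walking upward."""
    parents = parent_map(ontology)
    expanded = set()
    stack = list(labels)
    while stack:
        label = stack.pop()
        if label in expanded:
            continue
        expanded.add(label)
        stack.extend(parents.get(label, ()))
    return expanded


print(sorted(expand_with_ancestors({"Car alarm"}, ONTOLOGY)))
```

A set-based walk is used because an ontology of this kind need not be a strict tree; a class with two parents is still expanded only once.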
In many Internet of Things (IoT) applications, huge amounts of data are generated by sensor nodes, and processing them is complex. Offloading data classification and anomalous event detection tasks to sink nodes in sensor networks can reduce computing complexity, lower remote communication loads, and improve the response time of delay-sensitive IoT applications. Many existing classification and anomaly detection methods cannot be applied directly to these IoT applications because the computing and energy resources of sensors are limited. In this paper, a new feature-based learning system for IoT applications is proposed to classify data and detect anomalous events effectively. In particular, based on the theory of distributed compression, the sparsity and correlation of the data are exploited to obtain the classification features, which reduces computation overhead and energy consumption. Further, an RBF-BP hybrid neural network is employed to detect anomalous events based on the classification results, which significantly reduces the training time of the neural network and improves accuracy for users' decisions.
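The abstract does not specify how compressed features are formed. A common compressed-sensing choice, assumed here purely for illustration, is a random Gaussian measurement matrix that maps a sparse n-sample reading to m < n measurements, y = Φx. All sizes and data below are hypothetical.

```python
import math
import random


def gaussian_measurement_matrix(m, n, seed=0):
    """Random m x n matrix with i.i.d. N(0, 1/m) entries (an assumed, common CS choice)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Compute the m-dimensional measurement vector y = Phi x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]


# Hypothetical sparse 64-sample sensor reading with only 3 nonzero entries.
n, m = 64, 16
x = [0.0] * n
x[5], x[20], x[41] = 1.5, -2.0, 0.7

phi = gaussian_measurement_matrix(m, n)
y = compress(phi, x)
print(len(y))  # 16 compressed features instead of 64 raw samples
```

If every node shares the same seed, and therefore the same Φ, their compressed features live in a common space, which is one way correlated readings across nodes could be compared cheaply.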
This article proposes a nonlinear distributed cooperative control scheme that regulates power output to achieve efficient utilization of renewable energy in ac microgrids. The scheme ensures mean-square autonomous proportional power sharing over a nonlinear microgrid system via a sparse cyber network subject to noisy disturbances and limited bandwidth. Such noisy disturbances and bandwidth constraints can severely degrade the stability and quality of the whole system. To eliminate their adverse effects, we propose a robust distributed control strategy designed using partial feedback linearization of the nonlinear dynamical model of the microgrid. Moreover, a distributed event detection mechanism with a noise-dependent threshold is adopted to update the control signals while reducing unnecessary data communication. Using stochastic stability theory and a Lyapunov function, we derive the stability and convergence analysis of the proposed dynamic distributed event-detection conditions under noise interference. As a result, the suggested method decreases the sensitivity of the system to failures and increases its reliability. Finally, a modified IEEE 34-bus test system in MATLAB/Simulink is used to verify the effectiveness of the proposed controller design.
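The idea of event-triggered communication is that an agent broadcasts only when its state has drifted far enough from the last value it sent. The sketch below uses a simple scalar, deterministic rule with a hypothetical threshold of the form c0 + c1 · noise_std; the paper's dynamic, stochastic triggering condition is not reproduced here.

```python
def event_triggered_broadcasts(states, c0, c1, noise_std):
    """Return the time steps at which an agent would broadcast its state.

    Hypothetical rule: transmit when the deviation from the last broadcast
    value exceeds threshold = c0 + c1 * noise_std, so noisier links
    tolerate larger deviations before a new message is sent.
    """
    threshold = c0 + c1 * noise_std
    sent_steps = [0]  # the initial state is always broadcast
    last_sent = states[0]
    for k in range(1, len(states)):
        if abs(states[k] - last_sent) > threshold:
            sent_steps.append(k)
            last_sent = states[k]
    return sent_steps


states = [1.0, 1.02, 1.05, 1.2, 1.21, 1.5]
print(event_triggered_broadcasts(states, c0=0.05, c1=1.0, noise_std=0.05))
```

Raising the noise level raises the threshold, so fewer messages are sent; this trades tracking accuracy for reduced communication, which is the tension the abstract's noise-dependent threshold addresses.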
Home energy management requires accurate information about the consumption patterns of appliances. This information can help consumers save energy, control their usage by shifting it to off-peak hours, and reduce their electricity costs. Non-intrusive load monitoring (NILM), in which the power consumption profiles of individual appliances are extracted from the aggregate signal of a household, provides this information. For the NILM problem, training-based machine learning approaches require large training datasets for accurate disaggregation, whereas optimization-based approaches employ prior information about the characteristics of appliances. This paper proposes a novel event-based optimization algorithm. In its first stage, prior information about the appliances is extracted from the events in their consumption profiles by means of clustering. Then, a new event-based down-sampling method and transition filtering are designed to decrease the computation time of the optimization. In the last stage, post-processing that accounts for the ON durations and varying states of appliances is proposed to increase the accuracy of the power profile reconstruction. The proposed approach was successfully tested on the low-frequency data of a house from the REDD dataset. Numerical results show the advantages of the proposed algorithm: a marked improvement over classification-based NILM when the training dataset is small, and its applicability to disaggregating the power consumption measured by a smart meter.
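A minimal sketch of event detection and event-based down-sampling on an aggregate power signal, assuming events are simple step changes whose magnitude exceeds a fixed threshold. The threshold, the readings, and the rule of keeping only event samples are illustrative assumptions, not the paper's exact method.

```python
def detect_events(power, min_step):
    """Indices k where |P[k] - P[k-1]| >= min_step (a simple step-change detector)."""
    return [k for k in range(1, len(power)) if abs(power[k] - power[k - 1]) >= min_step]


def event_downsample(power, min_step):
    """Keep the first sample and every event sample.

    Between events the aggregate is treated as roughly constant, so the
    intermediate samples are dropped to shrink the optimization problem.
    """
    keep = [0] + detect_events(power, min_step)
    return [(k, power[k]) for k in keep]


def event_magnitudes(power, min_step):
    """Signed step sizes at each event; these could feed a clustering stage."""
    return [power[k] - power[k - 1] for k in detect_events(power, min_step)]


# Hypothetical 1 Hz aggregate readings in watts: an appliance switches on, then off.
power = [100, 101, 99, 1600, 1602, 1598, 1601, 200, 202, 201]
print(event_downsample(power, min_step=50))
print(event_magnitudes(power, min_step=50))
```

Here ten samples collapse to three, and the two step magnitudes (about +1500 W and -1400 W) are the kind of per-event features that could later be clustered to characterize an appliance.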
Sound Event Detection in the DCASE 2017 Challenge
Mesaros, Annamaria; Diment, Aleksandr; Elizalde, Benjamin ...
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 06/2019, Volume 27, Issue 6
Journal Article; Peer reviewed; Open access
Each edition of the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) has contained several tasks involving sound event detection in different setups. DCASE 2017 presented participants with three such tasks, each with specific datasets and detection requirements: Task 2, in which target sound events were very rare in both training and testing data; Task 3, with overlapping events annotated in real-life audio; and Task 4, in which only weakly labeled data were available for training. In this paper, we present the three tasks, including their datasets and baseline systems, and analyze the challenge entries for each task. We observe the popularity of methods using deep neural networks and the still widely used mel frequency-based representations, with only a few approaches standing out as radically different. Analysis of the systems' behavior reveals that task-specific optimization plays a major role in producing good performance; however, this optimization often closely follows the ranking metric, and maximizing or minimizing that metric does not result in universally good performance. We also introduce the calculation of confidence intervals based on a jackknife resampling procedure to perform statistical analysis of the challenge results. The analysis indicates that while the 95% confidence intervals of many systems overlap, there are significant differences in performance between the top systems and the baseline for all tasks.
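The jackknife recomputes a statistic n times, each time leaving one observation out, and estimates its variance as (n − 1)/n · Σ(θᵢ − θ̄)². The sketch below applies it to a micro-averaged F1 score, assuming each observation is one test file's (tp, fp, fn) counts and using a normal approximation for the 95% interval. The resampling unit, the metric, and the counts are illustrative assumptions, not the challenge's exact procedure or data.

```python
import math


def jackknife_ci(observations, statistic, z=1.96):
    """Leave-one-out jackknife standard error with a normal-approximation CI."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    estimate = statistic(observations)
    # Recompute the statistic n times, each time leaving one observation out.
    loo = [statistic(observations[:i] + observations[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    # Jackknife variance: (n - 1) / n times the sum of squared deviations.
    variance = (n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)
    half_width = z * math.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)


def micro_f1(counts):
    """Micro-averaged F1 from pooled per-file (tp, fp, fn) counts."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical per-file counts for one system; not data from the challenge.
per_file = [(8, 2, 1), (5, 1, 3), (9, 0, 2), (4, 3, 1)]
f1, (low, high) = jackknife_ci(per_file, micro_f1)
print(f"F1 = {f1:.3f}, approx. 95% CI [{low:.3f}, {high:.3f}]")
```

Because F1 computed from pooled counts is not a simple average over files, a resampling method such as the jackknife is a convenient way to attach an uncertainty estimate to it.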
Semantic attributes have been increasingly used in the past few years for multimedia event detection (MED), with promising results. The motivation is that multimedia events generally consist of lower-level components such as objects, scenes, and actions. By characterizing multimedia event videos with semantic attributes, one can exploit more informative cues for improved detection. Much existing work obtains semantic attributes from images, which may be suboptimal for video analysis because these image-inferred attributes do not carry the dynamic information that is essential for videos. To address this issue, we propose to learn semantic attributes from external videos using their semantic labels; we call these video attributes. In contrast with multimedia event videos, these external videos depict lower-level contents such as objects, scenes, and actions. To harness video attributes, we propose an algorithm based on a correlation vector that correlates them to a target event. Consequently, we can incorporate video attributes latently as extra information into the event detector learned from multimedia event videos in a joint framework. To validate our method, we perform experiments on the real-world, large-scale TRECVID MED 2013 and 2014 datasets and compare our method with several state-of-the-art algorithms. The experiments show that our method is advantageous for MED.
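To give intuition for a correlation vector, the sketch below computes, for each attribute, the Pearson correlation between its detector scores and the binary target-event labels on training videos, then adds the correlation-weighted attribute evidence to a base event score. This is a simplified late-fusion stand-in, not the paper's joint latent framework; the fusion weight, attribute scores, and labels are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_vector(attr_scores, labels):
    """One entry per attribute: how strongly its score tracks the target event label."""
    k = len(attr_scores[0])
    return [pearson([a[j] for a in attr_scores], labels) for j in range(k)]


def fused_score(base_score, attrs, corr, weight=0.5):
    """Hypothetical late fusion: base detector score plus weighted correlated attribute evidence."""
    return base_score + weight * sum(c * a for c, a in zip(corr, attrs))


# Hypothetical training data: 4 videos x 2 video-attribute scores, binary event labels.
train_attrs = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
labels = [1, 1, 0, 0]
corr = correlation_vector(train_attrs, labels)
print([round(c, 3) for c in corr])
```

Attributes whose scores rise with the event receive positive weight and contradictory ones receive negative weight, so the correlation vector acts as a learned, event-specific relevance signal over the attribute vocabulary.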
The availability of datasets annotated with events verified by the public is a necessary prerequisite for unleashing the potential of multimodal deep learning for news event detection. Publicly available datasets are either incompletely annotated because of high annotation costs, or ignore the verifiability of event labels, which are then susceptible to bias and errors introduced by a limited number of annotators. In this paper, we provide a YouTube dataset labelled with real-world news events that can be verified on Wikipedia-like crowdsourcing platforms, with the goal of advancing temporal event detection. The events in our dataset cover a wide range of topics, including public security, natural disasters, elections, sports, and entertainment. In the dataset, each sample is labelled with a real-world event that is verifiable by the public. We extensively evaluate the performance of 13 state-of-the-art algorithms on our dataset in a temporal manner, covering multiple relationships between training and testing event labels, and provide a thorough analysis of the findings. The dataset is available at https://github.com/zhengyang5/TED .
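Evaluating "in a temporal manner" generally means training on earlier data and testing on later data rather than shuffling. The sketch below performs such a split and then separates test event labels into those also seen in training and those that are new, one example of a train/test label relationship. The field names, dates, and event names are hypothetical and do not reflect the dataset's actual schema.

```python
from datetime import date


def temporal_split(samples, cutoff):
    """Train on samples dated before `cutoff`, test on the rest; no shuffling across time."""
    train = [s for s in samples if s["date"] < cutoff]
    test = [s for s in samples if s["date"] >= cutoff]
    return train, test


def label_relationship(train, test):
    """Split test event labels into those also seen in training and those that are new."""
    seen = {s["event"] for s in train}
    test_events = {s["event"] for s in test}
    return test_events & seen, test_events - seen


# Hypothetical samples; field names and event names are illustrative only.
samples = [
    {"video": "v1", "event": "storm_A", "date": date(2021, 1, 5)},
    {"video": "v2", "event": "election_B", "date": date(2021, 2, 1)},
    {"video": "v3", "event": "storm_A", "date": date(2021, 3, 9)},
    {"video": "v4", "event": "match_C", "date": date(2021, 3, 20)},
]
train, test = temporal_split(samples, date(2021, 3, 1))
shared, unseen = label_relationship(train, test)
print(sorted(shared), sorted(unseen))
```

Reporting results separately for continuing and newly emerging events shows whether a model genuinely detects new events or merely recognizes ones it has already seen.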
The detection of driving events could be useful for applications such as reducing accidents, fleet management, and setting insurance premiums. Currently, top-of-the-range vehicles and large fleets employ expensive driver monitoring systems; however, most drivers do not have access to such systems. A suitable monitoring platform would have to deliver the required performance while also being affordable and accessible. A candidate with considerable promise is the smartphone, whose built-in sensors could be exploited for the detection of driving events. However, to date it has not been possible to achieve the required correct, missed, and false detection rates together with the computational efficiency needed for real-time operation. This paper proposes a novel algorithm integrating bagging trees and dynamic time warping (DTW) for the detection of driving events, using acceleration and orientation data from a smartphone's low-cost three-axis accelerometers and gyroscopes. The bagging tree-based machine learning algorithm provides the initial maneuver detection results, as well as the locations of the event start and end points. Event detection is then achieved by calculating the similarity between the results predicted by the bagging tree algorithm and the corresponding templates extracted from the experience datasets, while applying a number of constraints to verify the calculated results. Field test results show that the proposed integrated algorithm is superior to the state of the art, achieving a high correct detection rate of 97.5%, a low missed detection rate of 2.5%, and a false detection rate of 2.9%. The corresponding results for the best alternative method are 90.2%, 9.8%, and 11.7%. Furthermore, the proposed approach improves computational efficiency by a factor of three to more than ten relative to the other state-of-the-art algorithms.
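The template-matching step relies on DTW, which aligns two sequences of different lengths by allowing one sample to match several samples of the other. Below is the classic dynamic-programming DTW distance with an absolute-difference local cost, followed by nearest-template selection. The templates are hypothetical one-dimensional signals; the paper's bagging tree stage and verification constraints are not included.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats against b
                                 cost[i][j - 1],      # b[j-1] repeats against a
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def best_template(segment, templates):
    """Return the name of the template closest to the segment under DTW."""
    return min(templates, key=lambda name: dtw_distance(segment, templates[name]))


# Hypothetical yaw-rate-like templates for two maneuvers.
templates = {
    "left_turn": [0, 1, 2, 1, 0],
    "right_turn": [0, -1, -2, -1, 0],
}
segment = [0, 0, 1, 2, 2, 1, 0]  # a slower left turn: longer, same shape
print(best_template(segment, templates))
```

Warping is what lets a slower execution of the same maneuver match its template at zero cost, which a fixed point-by-point distance could not do when the lengths differ. The O(n·m) table is also why restricting DTW to candidate segments proposed by a cheaper first stage can save computation.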
During action recognition in videos, irrelevant motions in the background can greatly degrade performance in recognizing the specific actions of interest. In this paper, a novel deep neural network, called the factorized action-scene network (FASNet), is proposed to encode and fuse the most relevant and informative semantic cues for action recognition. Specifically, we decompose FASNet into two components. One is a newly designed encoding network, named the content attention network (CANet), which encodes local spatial-temporal features to learn action representations that are robust to the noise of irrelevant motions. The other is a fusion network, which integrates the pretrained CANet to fuse the encoded spatial-temporal features with a contextual scene feature extracted from the same video, learning more descriptive and discriminative action representations. Moreover, unlike existing deep learning-based methods for generic action recognition, which use the softmax loss as training guidance, we formulate two loss functions to guide the proposed model in more specific action recognition tasks: a multilabel correlation loss for multilabel action recognition and a triplet loss for complex event detection. Extensive experiments on the Hollywood2 and TRECVID MEDTest 14 datasets show that our method achieves superior performance compared with state-of-the-art methods.
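A triplet loss pulls an anchor embedding toward a positive example of the same event and pushes it away from a negative one, penalizing the model only when the negative is not at least a margin farther away. The sketch uses the standard hinge form max(0, d(a, p) − d(a, n) + margin) with squared Euclidean distance; the abstract does not state the paper's distance or margin, so both are assumptions, and the vectors stand in for learned video embeddings.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


anchor, positive = [0.0, 0.0], [0.1, 0.0]
print(triplet_loss(anchor, positive, [3.0, 0.0]))  # negative far away: no penalty
print(triplet_loss(anchor, positive, [0.2, 0.0]))  # negative too close: positive loss
```

Unlike a softmax loss over a fixed set of classes, this objective only constrains relative distances, which suits event detection where the goal is to rank videos of the target event above everything else.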