Sound event detection (SED) is the task of detecting sound events in an audio recording. One challenge of the SED task is that many datasets, such as the Detection and Classification of Acoustic Scenes and Events (DCASE) datasets, are weakly labelled. That is, each audio clip has only audio tags, without the onset and offset times of sound events. To address the weakly labelled SED problem, we investigate segment-wise training and clip-wise training methods. The proposed systems are based on variants of convolutional neural networks (CNNs), including convolutional recurrent neural networks and our proposed CNN-transformers, for audio tagging and sound event detection. Another challenge of SED is that only the presence probabilities of sound events are predicted, so thresholds are required to decide the presence or absence of sound events. Previous work set these thresholds empirically, which is not an optimised solution. To solve this problem, we propose an automatic threshold optimization method. The first stage is to optimize the system with respect to metrics that do not depend on the thresholds, such as mean average precision (mAP). The second stage is to optimize the thresholds with respect to the metric that depends on those thresholds. The proposed automatic threshold optimization system achieved state-of-the-art audio tagging and SED F1 scores of 0.646 and 0.584, respectively, outperforming the scores of 0.629 and 0.564 obtained with the best manually selected thresholds.
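The second stage described above (tuning thresholds against a threshold-dependent metric) can be illustrated with a minimal sketch. It assumes the model is already trained and that per-class presence probabilities are available on held-out data. A plain grid search over candidate thresholds is used here for simplicity; it is a stand-in, not necessarily the optimization procedure used by the authors, and the names `f1_score`, `best_threshold`, and `optimize_thresholds` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 from 0/1 labels; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    y_true: Sequence[int], probs: Sequence[float], grid: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, F1) for the grid value that maximizes F1.

    Ties are resolved in favour of the first (lowest) threshold in the grid.
    """
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


def optimize_thresholds(
    y_true_by_class: Dict[str, List[int]],
    probs_by_class: Dict[str, List[float]],
    grid: Optional[Sequence[float]] = None,
) -> Dict[str, float]:
    """Pick one threshold per sound class (assumption: classes are tuned independently)."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # 0.01 ... 0.99
    return {
        name: best_threshold(y_true_by_class[name], probs_by_class[name], grid)[0]
        for name in y_true_by_class
    }
```

In practice the thresholds would be chosen on validation data and then applied unchanged to the test set, so that the reported F1 score is not optimistically biased.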
In many applications of the Internet of Things (IoT), huge amounts of data are generated by sensor nodes, and processing them is complex. Offloading data classification and anomaly event detection tasks to sink nodes in sensor networks can reduce the computing complexity, lower remote communication loads, and improve the response time for delay-sensitive IoT applications. Many existing classification and anomaly detection methods cannot be directly applied to these IoT applications, because the computing and energy resources of sensors are limited. In this paper, a new feature-based learning system for IoT applications is proposed to effectively classify data and detect anomaly events. In particular, based on the theory of distributed compression, the sparsity and relativity of the data are exploited to obtain the classification features, which can reduce the computation overhead and energy consumption. Further, an RBF-BP hybrid neural network is employed to detect anomaly events based on the classification results, by which the training time of the neural network can be significantly reduced and the accuracy can be improved for users' decisions.
This article proposes a nonlinear distributed cooperative control scheme that regulates power output to achieve efficient utilization of renewable energy in ac microgrids. The scheme ensures mean-square autonomous proportional power sharing over a nonlinear microgrid system via a sparse cyber network subject to noisy disturbances and limited bandwidth constraints. These disturbances and constraints severely reduce the stability and quality of the whole system. To eliminate their adverse effects, we propose a robust distributed control strategy designed by using partial feedback linearization for the dynamical nonlinear model of a microgrid system. Moreover, a distributed event detection mechanism with a noise-dependent threshold is adopted to update the control signals while reducing unnecessary data communication. Using stochastic stability theory and a Lyapunov function, the stability and convergence analysis of the proposed dynamic distributed event-detection conditions under noise interference is derived. As a result, the suggested method decreases the sensitivity of the system to failures and increases its reliability. Finally, a modified IEEE 34-bus test system in MATLAB/Simulink is utilized to verify the effectiveness of the proposed controller design scheme.
Home energy management requires accurate information about the appliances' consumption patterns. This information can help consumers save energy, control their usage by shifting it to off-peak hours, and reduce their electricity costs. Non-intrusive load monitoring (NILM), in which the power consumption profiles of appliances are extracted from the aggregated signal of a household, provides this information. For the NILM problem, machine learning approaches, as training-based solutions, require large training datasets for accurate disaggregation, while optimization-based approaches employ prior information about the characteristics of appliances. This paper proposes a novel event-based optimization algorithm. In its first stage, the prior information about appliances is extracted from the events of the consumption profiles of appliances by means of clustering. Then, a new event-based down-sampling method and transition filtering are designed to decrease the computation time of the optimization. In the last stage of the proposed algorithm, post-processing that considers the ON duration of appliances and varying states is proposed to increase the accuracy of the power profile reconstruction. The proposed approach was successfully tested on the low-frequency dataset of a house from the REDD. Numerical results show the advantages of the proposed algorithm: a marked improvement over classification-based NILM when the training dataset is small, and its applicability in disaggregating the power consumption measured by a smart meter.
• The authors propose an audio events detection system tailored to surveillance applications.
• The method has been tested on a huge and challenging data set, made publicly available.
• The performance analysis has been done for low SNR values and under various conditions.
• A comparative analysis with other methods from the literature has been performed.
In this paper we propose a novel method for the detection of audio events for surveillance applications. The method is based on the bag of words approach, adapted to deal with the specific issues of audio surveillance: the need to recognize both short and long sounds, the presence of a significant noise level and of superimposed background sounds of intensity comparable to the audio events to be detected. In order to test the proposed method in complex, realistic scenarios, we have built a large, publicly available dataset of audio events. The dataset has allowed us to evaluate the robustness of our method with respect to varying levels of the Signal-to-Noise Ratio; the experimentation has confirmed its applicability under real world conditions, and has shown a significant performance improvement with respect to other methods from the literature.
A Survey of Single-Scene Video Anomaly Detection
Ramachandra, Bharathkumar; Jones, Michael J.; Vatsavai, Ranga Raju
IEEE transactions on pattern analysis and machine intelligence, 05/2022, Volume 44, Issue 5
Journal Article, Peer reviewed, Open access
This article summarizes research trends on the topic of anomaly detection in video feeds of a single scene. We discuss the various problem formulations, publicly available datasets and evaluation criteria. We categorize and situate past research into an intuitive taxonomy and provide a comprehensive comparison of the accuracy of many algorithms on standard test sets. Finally, we also provide best practices and suggest some possible directions for future research.
Sound Event Detection in the DCASE 2017 Challenge
Mesaros, Annamaria; Diment, Aleksandr; Elizalde, Benjamin ...
IEEE/ACM transactions on audio, speech, and language processing, 06/2019, Volume 27, Issue 6
Journal Article, Peer reviewed, Open access
Each edition of the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) contained several tasks involving sound event detection in different setups. DCASE 2017 presented participants with three such tasks, each having specific datasets and detection requirements: Task 2, in which target sound events were very rare in both training and testing data; Task 3, having overlapping events annotated in real-life audio; and Task 4, in which only weakly labeled data were available for training. In this paper, we present the three tasks, including the datasets and baseline systems, and analyze the challenge entries for each task. We observe the popularity of methods using deep neural networks, and the still widely used mel frequency-based representations, with only a few approaches standing out as radically different. Analysis of the systems' behavior reveals that task-specific optimization has a big role in producing good performance; however, often this optimization closely follows the ranking metric, and its maximization/minimization does not result in universally good performance. We also introduce the calculation of confidence intervals based on a jackknife resampling procedure to perform statistical analysis of the challenge results. The analysis indicates that while the 95% confidence intervals for many systems overlap, there are significant differences in performance between the top systems and the baseline for all tasks.
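As an illustration of the jackknife-based confidence intervals mentioned above, the following is a minimal sketch of the standard leave-one-out jackknife. It assumes that the resampling unit is a single item (for example, one per-file score) and that a normal approximation with z = 1.96 is acceptable for a 95% interval; the abstract does not specify either choice, and the function name `jackknife_ci` is hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def jackknife_ci(
    values: Sequence[float],
    statistic: Callable[[List[float]], float],
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Leave-one-out jackknife: return (estimate, lower, upper).

    The statistic is recomputed n times, each time with one item removed,
    and the spread of those n estimates gives the standard error.
    """
    n = len(values)
    if n < 2:
        raise ValueError("jackknife needs at least two items")
    full_estimate = statistic(list(values))
    # One estimate per left-out item.
    loo = [statistic(list(values[:i]) + list(values[i + 1:])) for i in range(n)]
    loo_mean = sum(loo) / n
    # Jackknife variance: (n - 1) / n * sum of squared deviations.
    variance = (n - 1) / n * sum((e - loo_mean) ** 2 for e in loo)
    se = math.sqrt(variance)
    return full_estimate, full_estimate - z * se, full_estimate + z * se


def mean(xs: List[float]) -> float:
    """Example statistic; any function of the item list can be used instead."""
    return sum(xs) / len(xs)
```

For the sample mean, the jackknife standard error coincides with the usual standard error of the mean, which gives a quick sanity check. For metrics such as F1 score or error rate, `statistic` would recompute the metric from the retained items.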
Semantic attributes have been increasingly used in the past few years for multimedia event detection (MED) with promising results. The motivation is that multimedia events generally consist of lower level components such as objects, scenes, and actions. By characterizing multimedia event videos with semantic attributes, one could exploit more informative cues for improved detection results. Much existing work obtains semantic attributes from images, which may be suboptimal for video analysis since these image-inferred attributes do not carry dynamic information that is essential for videos. To address this issue, we propose to learn semantic attributes from external videos using their semantic labels. We name them video attributes in this paper. In contrast with multimedia event videos, these external videos depict lower level contents such as objects, scenes, and actions. To harness video attributes, we propose an algorithm established on a correlation vector that correlates them to a target event. Consequently, we could incorporate video attributes latently as extra information into the event detector learnt from multimedia event videos in a joint framework. To validate our method, we perform experiments on the real-world large-scale TRECVID MED 2013 and 2014 data sets and compare our method with several state-of-the-art algorithms. The experiments show that our method is advantageous for MED.
The availability of datasets annotated with events verified by the public is a necessary prerequisite for unleashing the potential of multimodal deep learning for news event detection. Publicly available datasets are either incompletely annotated due to the high cost of annotation, or ignore the verifiability of event labels, which are susceptible to bias and errors introduced by a limited number of annotators. In this paper, we provide a YouTube dataset labelled by real-world news events that can be verified by Wikipedia-like crowd sourcing platforms, with the target of advancing temporal event detection. The events in our dataset cover a wide range of topics, including public security, natural disasters, elections, sports, and entertainment events. In the dataset, each sample is labelled with a real-world event that is verifiable by the public. We extensively evaluate the performance of 13 state-of-the-art algorithms on our dataset in a temporal manner, involving the multiple relationships between training and testing event labels, and provide a thorough analysis of the findings. The dataset is available at https://github.com/zhengyang5/TED .