The availability of datasets annotated with publicly verifiable events is a necessary prerequisite for unleashing the potential of multimodal deep learning for news event detection. Publicly available datasets are either incompletely annotated because of the high annotation cost, or ignore the verifiability of event labels, which are then susceptible to bias and errors introduced by a limited number of annotators. In this paper, we provide a YouTube dataset labelled with real-world news events that can be verified on Wikipedia-like crowd-sourcing platforms, with the goal of advancing temporal event detection. The events in our dataset cover a wide range of topics, including public security, natural disasters, elections, sports, and entertainment. Each sample in the dataset is labelled with a real-world event that is verifiable by the public. We extensively evaluate 13 state-of-the-art algorithms on our dataset in a temporal manner, covering multiple relationships between training and testing event labels, and provide a thorough analysis of the findings. The dataset is available at https://github.com/zhengyang5/TED .
Sound event detection (SED) is the task of tagging the absence or presence of audio events and their corresponding intervals within a given audio clip. While SED can be done using supervised machine learning, where training data is fully labeled with per-event timestamps and durations, our work focuses on weakly supervised sound event detection (WSSED), where prior knowledge about an event's duration is unavailable. Recent research in the field focuses on improving segment- and event-level localization performance for specific datasets under specific evaluation metrics. In particular, well-performing event-level localization requires fully labeled development subsets to obtain event duration estimates, which significantly benefits localization performance. Moreover, well-performing segment-level localization models output predictions at a coarse scale (e.g., 1 second), hindering their deployment on datasets containing very short events (< 1 second). This work proposes a duration-robust CRNN (CDur) framework, which aims to achieve competitive performance in terms of segment- and event-level localization. This paper also proposes a new post-processing strategy named "Triple Threshold" and investigates two data augmentation methods along with a label smoothing method within the scope of WSSED. We evaluate our model on the DCASE2017 and DCASE2018 Task 4 datasets and on URBAN-SED. Our model outperforms other approaches on the DCASE2018 and URBAN-SED datasets without requiring prior duration knowledge. In particular, it achieves performance similar to strongly labeled supervised models on the URBAN-SED dataset. Lastly, ablation experiments reveal that without post-processing, our model's drop in localization performance is significantly smaller than that of other approaches.
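The abstract names a "Triple Threshold" post-processing strategy without defining it. As a hypothetical sketch of the multi-threshold family it belongs to, the following combines a clip-level gate with classic double thresholding over per-frame probabilities: frames above a low threshold form candidate segments, and a segment is kept only if it contains at least one frame above a high threshold. All threshold values and the exact combination are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def double_threshold(probs, t_high=0.75, t_low=0.2, t_clip=0.5):
    """Multi-threshold event post-processing (illustrative parameters).

    probs: 1-D array of per-frame event probabilities for one clip.
    Returns a list of (start, end) frame index pairs.
    """
    if probs.max() < t_clip:          # clip-level gate: no event in this clip
        return []
    above = probs >= t_low            # candidate frames
    events, i, n = [], 0, len(probs)
    while i < n:
        if above[i]:
            j = i
            while j < n and above[j]:  # extend contiguous candidate segment
                j += 1
            if probs[i:j].max() >= t_high:  # keep only strongly seeded segments
                events.append((i, j))
            i = j
        else:
            i += 1
    return events
```

The low threshold preserves event duration (weak frames adjacent to strong ones stay in), while the high threshold suppresses spurious weak activations, which is why such schemes are attractive when duration priors are unavailable.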
The Many Shades of Negativity. Zhigang Ma, Xiaojun Chang, and Yi Yang. IEEE Transactions on Multimedia, vol. 19, no. 7, July 2017. Journal article, peer reviewed.
Complex event detection has been progressively researched in recent years, given the broad interest in video indexing and retrieval. To perform event detection, one needs to train a classifier using both positive and negative examples. Current classifier training treats all negative videos as equally negative. However, we notice that many negative videos resemble the positive videos to different degrees. Intuitively, we may capture more informative cues from the negative videos if we assign them fine-grained labels, thus benefiting classifier learning. To this end, we apply a statistical method to both the positive and negative examples to obtain the decisive attributes of a specific event. Based on these decisive attributes, we assign fine-grained labels to the negative examples so that they can be exploited more effectively. The resulting fine-grained labels may not be optimal for capturing the discriminative cues in the negative videos. Hence, we propose to jointly optimize the fine-grained labels with the classifier learning, to their mutual benefit. Meanwhile, the labels of the positive examples are supposed to remain unchanged, so we additionally introduce a constraint for this purpose. We further leverage state-of-the-art deep convolutional neural network features in our approach to boost event detection performance. Extensive experiments on the challenging TRECVID MED 2014 dataset validate the efficacy of our proposed approach.
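The core idea above, grading negatives by how much they resemble the positives, can be sketched minimally. The snippet below buckets negative examples by cosine similarity to the positive-class centroid; this is only an illustration of the intuition, since the paper derives labels from statistically selected decisive attributes and then refines them jointly with the classifier. The function name, the centroid heuristic, and the number of bins are all assumptions.

```python
import numpy as np

def fine_grained_negative_labels(X_pos, X_neg, bins=3):
    """Assign each negative example a label in 0..bins-1, where higher
    means more positive-like (illustrative heuristic, not the paper's method).

    X_pos, X_neg: 2-D feature arrays (rows are examples).
    """
    centroid = X_pos.mean(axis=0)
    # cosine similarity of each negative to the positive centroid
    sim = X_neg @ centroid / (
        np.linalg.norm(X_neg, axis=1) * np.linalg.norm(centroid) + 1e-12)
    ranks = np.argsort(np.argsort(sim))        # 0..n-1, ascending similarity
    return (ranks * bins // len(sim)).astype(int)
```

Ranking before bucketing makes the labels invariant to the similarity scale, so each bin receives roughly the same number of negatives.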
Abnormal event detection aims to identify events that deviate from expected normal patterns. Existing methods usually extract normal spatio-temporal patterns of appearance and motion separately, which ignores low-level correlations between appearance and motion patterns and may fall short of capturing fine-grained spatio-temporal patterns. In this paper, we propose to learn appearance and motion simultaneously to obtain fine-grained spatio-temporal patterns. To this end, we present an adversarial 3D convolutional auto-encoder that learns the normal spatio-temporal patterns and then identifies abnormal events by their divergence from the learned normal patterns in videos. The encoder captures the low-level correlations between the spatial and temporal dimensions of videos and generates distinctive features representing visual spatio-temporal information. The decoder reconstructs the original video from the encoded features using 3D de-convolutions and learns the normal spatio-temporal patterns in an unsupervised manner. We introduce a denoising reconstruction error and an adversarial learning strategy to train the 3D convolutional auto-encoder to implicitly learn accurate data distributions that are considered normal patterns, which enhances the auto-encoder's ability to discriminate abnormal events through reconstruction. Both theoretical analysis and extensive experiments on four publicly available datasets demonstrate the effectiveness of our method.
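The detection step in reconstruction-based methods like the one above is generic: clips the auto-encoder reconstructs poorly are flagged as abnormal. A minimal sketch of that scoring stage, with the percentile threshold as an illustrative choice not taken from the paper:

```python
import numpy as np

def reconstruction_scores(x, x_rec):
    """Per-clip mean squared reconstruction error.

    x, x_rec: arrays of shape (n_clips, ...); high error suggests the clip
    deviates from the normal patterns the auto-encoder has learned.
    """
    return ((x - x_rec) ** 2).reshape(len(x), -1).mean(axis=1)

def flag_abnormal(scores, normal_scores, pct=99.0):
    """Flag clips whose error exceeds a high percentile of the errors
    measured on held-out normal data (threshold choice is illustrative)."""
    return scores > np.percentile(normal_scores, pct)
```

Calibrating the threshold on normal data only is what keeps the pipeline unsupervised: no abnormal examples are needed at training time.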
Event detection targets recognizing and localizing specified spatio-temporal patterns in videos. Most research on human activity recognition in the past decades experimented on relatively clean scenes with a limited number of actors performing explicit actions. Recently, more effort has been devoted to real-world surveillance videos, in which human activity recognition is more challenging due to large variations caused by factors such as scale, resolution, viewpoint, cluttered background, and crowdedness. In this paper, we systematically evaluate seven different types of low-level spatio-temporal features in the context of surveillance event detection (SED) using a uniform experimental setup. Fisher vectors are employed to aggregate the low-level features into a representation of each video clip. A set of random forests is then learned as the classification model. To bridge research efforts and real-world applications, we use the NIST TRECVID SED task as our testbed, in which seven predefined events involve different levels of human activity analysis. The strengths and limitations of each low-level feature type are analyzed and discussed.
Zero-shot complex event detection has emerged as a way to cope with the scarcity of labeled training videos in practice. Aiming to progress beyond the state of the art in zero-shot event detection, we propose a new approach that exploits the semantic correlation between an event and concepts. Based on concept detectors pre-trained on external sources, our method learns the semantic correlation from the concept vocabulary and emphasizes the concepts most related to the target event. In particular, a novel Event-Adaptive Concept Integration algorithm is introduced to estimate the effectiveness of semantically related concepts by assigning them different weights. Rather than assigning weights by a fixed strategy, we compute the weight of each concept from the area under its score curve. The assigned weights are incorporated into the confidence score vector statistically to better characterize the event-concept correlation. Our algorithm is shown to harness the related concepts in a manner discriminatively tailored to a target event. Extensive experiments on the challenging TRECVID event video datasets demonstrate the advantage of our approach over state-of-the-art methods.
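The weighting idea above, emphasizing concepts by the area under their score curves and by their semantic relation to the event, can be sketched as a simple weighted aggregation. This is one reading of the abstract, not the actual Event-Adaptive Concept Integration algorithm; the rectangle-rule area, the multiplicative combination, and all names are assumptions.

```python
import numpy as np

def weighted_event_scores(concept_scores, semantic_corr):
    """Per-video zero-shot event confidences (illustrative sketch).

    concept_scores: (n_videos, n_concepts) confidences from pre-trained
    concept detectors.
    semantic_corr: (n_concepts,) correlation of each concept to the event.
    """
    area = concept_scores.sum(axis=0)    # area under each concept's score curve
    w = area * semantic_corr             # emphasize related, confident concepts
    w = w / w.sum()                      # normalize weights
    return concept_scores @ w            # weighted combination per video
```

A concept that fires confidently across the collection but is semantically unrelated to the event gets its contribution zeroed out by the correlation term, which matches the abstract's claim of tailoring concepts to the target event.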
Adverse event detection is critical for many real-world applications, including the timely identification of product defects, disasters, and major socio-political incidents. In the health context, adverse drug events account for countless hospitalizations and deaths annually. Since users often begin their information seeking and reporting with online searches, examination of search query logs has emerged as an important detection channel. However, search context - including query intent and heterogeneity in user behaviors - is extremely important for extracting information from search queries, and yet the challenge of measuring and analyzing these aspects has precluded their use in prior studies. We propose DeepSAVE, a novel deep learning framework for detecting adverse events from user search query logs. DeepSAVE uses an enriched variational autoencoder encompassing a novel query embedding and a user modeling module that work in concert to address the context challenge in search-based detection of adverse events. Evaluation results on three large real-world event datasets show that DeepSAVE outperforms existing detection methods as well as competing deep learning autoencoders. Ablation analysis reveals that each component of DeepSAVE contributes significantly to its overall performance. Collectively, the results demonstrate the viability of the proposed architecture for detecting adverse events from search query logs.
Detecting events in real time from the Twitter data stream has gained substantial attention in recent years from researchers around the world, and different event detection approaches have been proposed as a result. One of the major challenges in this context is the high computational cost of real-time event detection. We propose TwitterNews+, an event detection system that incorporates specialized inverted indices and an incremental clustering approach to provide a low-computational-cost solution for detecting both major and minor newsworthy events in real time from the Twitter data stream. In addition, we conduct an extensive parameter sensitivity analysis to fine-tune the parameters of TwitterNews+ for the best performance. Finally, we evaluate the effectiveness of our system using a publicly available corpus as a benchmark dataset. The results show a significant improvement in recall and precision over the five state-of-the-art baselines we used.
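The combination described above, an inverted index used to prune the candidate clusters an incoming tweet is compared against, can be sketched in a few lines. The class below is a minimal single-pass clusterer over term sets; the names, the unit term weights, and the similarity threshold are illustrative assumptions, not TwitterNews+'s actual design.

```python
from collections import defaultdict

class IncrementalClusterer:
    """Single-pass clustering with an inverted index (illustrative sketch).

    Each incoming document (a set of terms) is compared only against
    clusters that share at least one term, which is what keeps the
    per-document cost low on a high-volume stream.
    """
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.clusters = []               # list of {term: weight} centroids
        self.index = defaultdict(set)    # term -> ids of clusters containing it

    def _cosine(self, a, b):
        num = sum(a[t] * b[t] for t in set(a) & set(b))
        na = sum(v * v for v in a.values()) ** 0.5
        nb = sum(v * v for v in b.values()) ** 0.5
        return num / (na * nb) if na and nb else 0.0

    def add(self, terms):
        """Assign the document to the best matching cluster, or start a new
        one; returns the cluster id."""
        vec = {t: 1.0 for t in terms}
        candidates = set().union(*(self.index[t] for t in terms)) if terms else set()
        best, best_sim = None, 0.0
        for cid in candidates:
            sim = self._cosine(vec, self.clusters[cid])
            if sim > best_sim:
                best, best_sim = cid, sim
        if best is not None and best_sim >= self.threshold:
            for t in terms:              # fold the document into the centroid
                self.clusters[best][t] = self.clusters[best].get(t, 0) + 1
                self.index[t].add(best)
            return best
        cid = len(self.clusters)         # no match: open a new cluster
        self.clusters.append(vec)
        for t in terms:
            self.index[t].add(cid)
        return cid
```

Because lookups go through the inverted index, a tweet about football is never compared against an earthquake cluster, so the cost per tweet scales with term overlap rather than with the total number of clusters.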
Line trip events widely exist in power systems. They can result in power outages and huge economic losses if not promptly detected and localized. To provide a fast and precise solution, this article presents a Complete Coverage of Voltage Measurement (CCVM)-based line trip event detection algorithm and a Relative Phase Angle (RPA)-based line trip event localization algorithm. First, the frequency and relative phase angle features during a line trip event are calculated. Then, the CCVM-based algorithm is developed from the perspectives of both frequency and rate-of-change-of-frequency estimation. Additionally, the RPA-based algorithm is presented, and two cases are studied to demonstrate its uniqueness. Various experiments are conducted, and the simulation results demonstrate that the proposed CCVM-based algorithm can detect a line trip event in as little as 2.07 ms. In addition, the RPA-based algorithm has 1.26 times higher localization accuracy than the frequency magnitude-based and phase angle-based algorithms. Experimental results on examples from two interconnected power systems in the U.S. verify the performance of the proposed algorithms in wide-area power systems.
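The abstract's rate-of-change-of-frequency (ROCOF) feature is the standard primitive behind such detectors: a line trip produces a sudden frequency excursion, so a threshold on |df/dt| flags the event. The sketch below is a plain frequency-derivative detector for illustration only; the paper's CCVM algorithm additionally exploits complete voltage-measurement coverage, and the sample rate and ROCOF limit here are assumed values.

```python
import numpy as np

def first_trip_time(freq, fs, rocof_limit=0.1):
    """Return the time (s) of the first sample whose rate of change of
    frequency exceeds rocof_limit (Hz/s), or None if no event is detected.

    freq: 1-D array of frequency samples (Hz); fs: sample rate (Hz).
    """
    rocof = np.gradient(freq) * fs               # central-difference df/dt in Hz/s
    hits = np.flatnonzero(np.abs(rocof) > rocof_limit)
    return hits[0] / fs if hits.size else None
```

In practice the limit must sit above the normal load-following ROCOF of the interconnection but below the excursion a trip causes, which is why the paper estimates both frequency and ROCOF carefully rather than thresholding raw samples.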
CYGNO is an international collaboration with the aim of operating a ▪ optical time projection chamber (TPC) for directional Dark Matter (DM) searches and solar neutrino spectroscopy, to be deployed at the Laboratori Nazionali del Gran Sasso (LNGS). A ▪/▪ (60/40) mixture is used, along with a triple Gas Electron Multiplier (GEM) cascade to amplify the ionisation signal. The scintillation produced in the electron avalanches is read out using a scientific complementary metal–oxide–semiconductor (sCMOS) camera. This solution has proven to provide very high sensitivity to interactions in the few ▪ energy range. The inclusion of a hydrogen-based gas will offer an even lighter target, resulting in a more efficient energy transfer in a DM particle collision and, consequently, a lower detection threshold. Additionally, the longer track lengths of light nuclear recoils are easier to detect, with a clearer direction. However, the addition of such a gas will quench the scintillation, jeopardizing the TPC performance. In this work, we demonstrate the feasibility of adding 1% to 5% isobutane to the ▪/▪ (60/40) mixture by measuring the respective absolute scintillation yield. The overall scintillation produced in the charge avalanches is not drastically suppressed by quenching due to the isobutane addition. The presence of Penning transfer from excited He atoms to isobutane molecules increases the number of electrons in the avalanches, partially compensating for the loss of scintillation due to quenching. For the highest applied GEM voltage, the total number of photons produced in the avalanche per ▪ deposited in the absorption region decreases by only a factor of about three, from 2.30(20)×10⁴ to 8.2(4)×10³ ▪, as the isobutane content increases from 0 to 5%. The quantification of the visible component of the scintillation shows that isobutane quenches both the visible and the ultraviolet (UV) photons emitted by ▪/▪.