Drones pose a hidden threat to public safety, and effective, accurate detection of drones in the environment is urgently needed; the key to improving the accuracy of drone sound event detection is weakening the influence of target sound source localization deviation and strong background interference noise on the detection task. In this paper, a method is proposed to improve the accuracy of drone acoustic event detection, named the linear shrinkage-subspace projection-power spectral density filter method (LSP). The method mainly comprises covariance matrix reconstruction, steering vector recalibration, and filter coefficient redesign. First, based on Minimum Variance Distortionless Response (MVDR) beamforming, the linear shrinkage method is used to suppress the interference and noise components in the signal-plus-interference covariance matrix, and the sample covariance matrix is reconstructed to eliminate background interference noise. Then, the correlation between the steering vector and the eigenvectors is used to eliminate the angle correlation term, and the subspace projection method is combined to recalibrate the steering vector, improving the beamformer's robustness to angle deviation and correcting the target source localization deviation. Next, the Wiener filter coefficients are redesigned based on the estimated power spectral density to further suppress background interference noise. To verify the accuracy of the proposed method, a complete drone sound event detection system is constructed by combining it with a deep-learning drone sound event detection classifier, and the system is evaluated under different angular deviations and interference sound distances.
In addition, a new evaluation criterion is proposed, named the Machine-Human Extreme Hearing Distance Rate (MHDR), which compares the system's detection ability with the human ear's auditory detection ability. The results indicate that the detection system achieves satisfactory accuracy when the proposed method is applied to a circular microphone array, with an improvement of 15.12 % over existing methods. The proposed method improves detection accuracy in the drone acoustic event detection task under sound source position deviation and strong background speech interference, and provides a reference for the development of anti-drone technology.
•The linear shrinkage method overcomes the influence of strong background noise on the drone detection task.
•The subspace projection method eliminates the impact of the angle deviation of the target source.
•The improved Wiener filter, redesigned from signal power spectral densities, further suppresses strong interference.
•The proposed LSP achieves high detection accuracy under larger mismatches or strong interference.
•The proposed MHDR provides a more quantitative description of drone sound event detection performance.
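The first stage of LSP, covariance shrinkage before MVDR weight formation, can be sketched in miniature. The snippet below is an illustrative numpy example only: it uses a fixed shrinkage intensity `rho` and a broadside uniform-linear-array steering vector, whereas the paper's method chooses the shrinkage data-adaptively and targets a circular array.

```python
import numpy as np

def shrinkage_covariance(snapshots, rho=0.1):
    """Linear shrinkage of the sample covariance toward a scaled identity.

    snapshots: (M, N) complex array, M sensors, N snapshots.
    rho is a fixed, illustrative shrinkage intensity (an assumption;
    the paper's method selects it adaptively)."""
    M, N = snapshots.shape
    R = snapshots @ snapshots.conj().T / N          # sample covariance
    target = np.trace(R).real / M * np.eye(M)       # scaled-identity target
    return (1.0 - rho) * R + rho * target

def mvdr_weights(R, a):
    """Classic MVDR weights w = R^{-1} a / (a^H R^{-1} a)."""
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / (a.conj() @ Ri_a)

# toy setup: 8-sensor uniform linear array, target at broadside
M, N = 8, 200
rng = np.random.default_rng(0)
a = np.ones(M, dtype=complex)                       # broadside steering vector
x = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
R = shrinkage_covariance(x, rho=0.1)
w = mvdr_weights(R, a)
# the distortionless constraint w^H a = 1 holds by construction
```

The shrinkage keeps the reconstructed covariance well-conditioned when snapshots are few, which is what stabilizes the beamformer against strong interference.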
Anomalous event detection in surveillance videos is a challenging and practical research problem in the image and video processing community. Compared to frame-level annotations of anomalous events, obtaining video-level annotations is quite fast and cheap, though such high-level labels may contain significant noise. More specifically, a video labeled anomalous may actually contain an anomaly only for a short duration while the rest of its frames are normal. In the current work, we propose a weakly supervised anomaly detection framework based on deep neural networks which is trained in a self-reasoning fashion using only video-level labels. To carry out the self-reasoning based training, we generate pseudo labels by binary clustering of spatio-temporal video features, which helps mitigate the noise present in the labels of anomalous videos. Our proposed formulation encourages the main network and the clustering to complement each other in achieving more accurate anomaly detection. The proposed framework has been evaluated on publicly available real-world anomaly detection datasets including UCF-Crime, ShanghaiTech and UCSD Ped2. The experiments demonstrate the superiority of our proposed framework over current state-of-the-art methods.
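The pseudo-labeling step described above can be sketched as a two-cluster k-means over per-segment features. The code below is a minimal stand-in, not the paper's implementation; in particular, the assumption that the higher-activation cluster is the anomalous one is purely illustrative.

```python
import numpy as np

def binary_cluster_pseudo_labels(features, n_iter=20):
    """Two-cluster k-means over per-segment features of an anomalous-labeled
    video; the cluster with the larger mean feature norm is (by assumption)
    treated as the anomalous one."""
    norms = np.linalg.norm(features, axis=1)
    # deterministic init: lowest- and highest-norm segments as seeds
    centers = features[[norms.argmin(), norms.argmax()]].astype(float)
    for _ in range(n_iter):
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):
                centers[k] = features[assign == k].mean(axis=0)
    anom = np.argmax(np.linalg.norm(centers, axis=1))
    return (assign == anom).astype(int)

# toy: 8 low-activation (normal) segments, 4 high-activation (anomalous) ones
feats = np.vstack([np.full((8, 4), 0.1), np.full((4, 4), 5.0)])
pseudo = binary_cluster_pseudo_labels(feats)   # 0 = normal, 1 = anomalous
```

In the full framework these pseudo labels would then supervise the main network, which in turn produces better features for the next clustering round.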
We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the low-resolution anomaly maps of the teachers by jointly applying standard and adversarial distillation, introducing an adversarial discriminator for each teacher to distinguish between target and generated anomaly maps. We conduct experiments on three benchmarks (Avenue, ShanghaiTech, UCSD Ped2), showing that our method is over 7 times faster than the fastest competing method, and between 28 and 62 times faster than object-centric models, while obtaining comparable results to recent methods. Our evaluation also indicates that our model achieves the best trade-off between speed and accuracy, due to its previously unheard-of speed of 1480 FPS. In addition, we carry out a comprehensive ablation study to justify our architectural design choices. Our code is freely available at: https://github.com/ristea/fast-aed.
•We introduce a novel teacher-student framework for anomaly detection in video.
•We learn to detect anomalies by distilling from multiple highly accurate object-level teachers.
•We propose adversarial knowledge distillation in the context of anomaly detection.
•To increase the speed, we replace fully connected layers with pointwise convolutions.
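The joint standard-plus-adversarial distillation objective can be illustrated with a toy loss computation. Everything below is an assumption-laden sketch: the discriminator here is a fixed linear model with a sigmoid output, whereas the paper trains one discriminator network per teacher alongside the student.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def distillation_losses(student_map, teacher_map, disc_w):
    """Combined objective for one teacher: a standard distillation term
    (MSE to the teacher's low-resolution anomaly map) plus an adversarial
    term in which the student tries to fool a discriminator D.
    disc_w parameterizes a toy linear discriminator (illustrative only)."""
    l_std = np.mean((student_map - teacher_map) ** 2)
    d_fake = sigmoid(float(student_map.ravel() @ disc_w))  # D(generated map)
    l_adv = -np.log(d_fake + 1e-8)                         # student fools D
    return l_std, l_adv, l_std + l_adv

teacher = np.array([[0.0, 0.2], [0.8, 0.1]])
student = np.array([[0.1, 0.2], [0.7, 0.1]])
w_d = np.zeros(4)                       # untrained D outputs 0.5 everywhere
l_std, l_adv, total = distillation_losses(student, teacher, w_d)
```

In training, the discriminator would be updated to separate teacher maps from student maps while the student minimizes the combined loss, the usual adversarial min-max.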
Real-time situational awareness and event analysis are crucial to the security of the modern power grid, a complicated nonlinear system that is hard to model completely. Massive data is collected, but the information has not been sufficiently leveraged. To effectively extract event features, this paper proposes a framework for event detection, localization, and classification in power grids based on semi-supervised learning. Specifically, event detection is realized by an invertible neural network (INN), which learns complex distributions of real-world measurements in a flexible way. The INN is trained on abundant normal measurements, and explicit log-likelihoods then serve as the indicator to distinguish events with adequate sensitivity. Moreover, risks induced by events are assessed and their spatial locations are determined. Since the majority of power system events are recorded without labels in practice, a pseudo label (PL) technique is leveraged to classify events with limited labels. The PL-based approach has an enhanced separating capability for events and outperforms other approaches under a low labeling rate. Case studies with simulated data in the IEEE 39-bus system and real-world measurements verify the effectiveness of the proposed framework.
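The log-likelihood-based detection idea can be sketched with the change-of-variables formula that underlies any invertible network: log p(x) = log p_base(z) + log|det dz/dx|. The single affine layer below is an illustrative assumption; a real INN stacks learned invertible coupling layers.

```python
import numpy as np

def affine_flow_loglik(x, scale, shift):
    """Log-likelihood under a toy invertible affine map z = (x - shift)/scale
    with a standard-normal base density:
        log p(x) = log N(z; 0, I) + log|det dz/dx|,
    where log|det dz/dx| = -sum(log scale)."""
    z = (x - shift) / scale
    log_base = -0.5 * np.sum(z ** 2, axis=-1) \
               - 0.5 * z.shape[-1] * np.log(2 * np.pi)
    log_det = -np.sum(np.log(scale))
    return log_base + log_det

# 'train' on abundant normal measurements, flag low log-likelihood as events
rng = np.random.default_rng(1)
normal = rng.normal(0.0, 1.0, size=(500, 3))
shift, scale = normal.mean(axis=0), normal.std(axis=0)
scores = affine_flow_loglik(normal, scale, shift)
threshold = np.quantile(scores, 0.01)   # ~1 % false-alarm rate on normal data
event = affine_flow_loglik(np.array([[8.0, -8.0, 8.0]]), scale, shift)
```

An abnormal measurement lands far in the tail of the learned density, so its log-likelihood falls well below a threshold calibrated on normal data.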
Gait event detection is essential for accurate gait recognition, and many studies use portable and reliable inertial measurement units (IMUs) for gait event detection. Popular methods mainly rely on rules for specific signals or build machine learning models of event occurrence, both of which overlook the differences in the characteristics coupled across multiple inputs. In this article, we propose a method based on fuzzy logic to quantify event possibility and use it to detect gait events from the angular velocities and accelerations of the lower limbs measured by IMUs. The proposed method identifies the events at which the heel and toe contact or leave the ground, making full use of the distribution characteristics of all extracted inputs without complex calculation. The mean absolute time differences between the detected and actual events for heel strike (HS), toe strike (TS), heel off (HO), and toe off (TO) are 34, 23, 28, and 38 ms, respectively, in walking. We aim to propose an analysis method and provide a reference for gait recognition in assisted-walking exoskeleton robots for healthy individuals, such as soldiers and workers.
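The fuzzy-logic idea of turning multiple IMU inputs into a single event possibility can be sketched with triangular membership functions combined by a product t-norm. The membership ranges below are illustrative assumptions, not the paper's tuned values.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b over support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def heel_strike_possibility(gyro, accel):
    """Fuzzy possibility of a heel strike from shank angular velocity (rad/s)
    and vertical acceleration (m/s^2), combined with a product t-norm.
    The ranges are hypothetical placeholders for tuned membership functions."""
    m_gyro = tri(gyro, -3.0, -1.5, 0.0)     # swing-to-stance reversal
    m_accel = tri(accel, 5.0, 15.0, 25.0)   # impact transient
    return m_gyro * m_accel

p = heel_strike_possibility(-1.5, 15.0)     # both memberships peak
```

An event would then be declared at the local maximum of the possibility signal over time, avoiding hard thresholds on any single input.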
Gait analysis serves as a pivotal tool in identifying abnormalities associated with various disorders. Recently, inertial measurement units (IMUs) have emerged as a feasible tool, showing promising results for continuous gait monitoring. However, current gait analysis algorithms often overlook the importance of sensor placement and the corresponding motion characteristics. Moreover, there has been limited effort to tailor gait analysis algorithms for optimal performance with sensors placed in specific locations. In response to this, we propose a novel gait analysis algorithm designed for heel-mounted IMUs. Our algorithm employs refined methods to accurately assess heel dynamics and calculate a comprehensive range of spatiotemporal gait parameters as well as parameters related to symmetry and variability. Experiments with straight walking and simulated daily activities were performed, with an optical motion capture (OMC) system used as the reference. The results demonstrated strong correlation (r > 0.9) and good agreement with common gait parameters even in daily conditions (stride length -0.009 ± 0.055 m, stride time -0.002 ± 0.023 s and walking speed -0.004 ± 0.048 m/s). All spatiotemporal gait parameters exhibit high reliability, as indicated by a minimum intraclass correlation coefficient (ICC) of 0.921. The findings affirm the potential of the proposed algorithm to perform daily gait analysis and monitoring tasks, offering a reliable tool for professionals in the field. By addressing the shortcomings of existing algorithms and focusing on the heel, our approach contributes to the advancement of gait analysis, paving the way for more accessible and accurate gait assessment methods in real life.
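The temporal side of such parameter extraction can be sketched from heel-strike timestamps alone. The snippet below is a simplified illustration: stride times come from successive heel strikes of one foot, variability is a coefficient of variation, and the odd/even-stride symmetry index is an illustrative stand-in for a true left/right comparison.

```python
import numpy as np

def temporal_gait_params(hs_times):
    """Stride times from successive heel strikes of one foot, plus a simple
    variability measure (coefficient of variation, %) and a symmetry index
    between odd and even strides (a hypothetical stand-in for left/right)."""
    strides = np.diff(np.asarray(hs_times))           # stride times, s
    mean_stride = strides.mean()
    cv = strides.std() / mean_stride * 100.0          # variability, %
    a, b = strides[0::2].mean(), strides[1::2].mean()
    si = abs(a - b) / (0.5 * (a + b)) * 100.0         # symmetry index, %
    return mean_stride, cv, si

hs = [0.0, 1.02, 2.00, 3.03, 4.01]    # heel-strike timestamps in seconds
mean_stride, cv, si = temporal_gait_params(hs)
```

Spatial parameters such as stride length would additionally require orientation tracking and zero-velocity-updated double integration of the heel acceleration, which is where the heel-specific refinements of the proposed algorithm matter.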
Sound event detection (SED) methods are tasked with labeling segments of audio recordings by the presence of active sound sources. SED is typically posed as a supervised machine learning problem, requiring strong annotations for the presence or absence of each sound source at every time instant within the recording. However, strong annotations of this type are both labor- and cost-intensive for human annotators to produce, which limits the practical scalability of SED methods. In this paper, we treat SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality. The models, however, must still produce temporally dynamic predictions, which must be aggregated (pooled) when comparing against static labels during training. To facilitate this aggregation, we develop a family of adaptive pooling operators - referred to as autopool - which smoothly interpolate between common pooling operators, such as min-, max-, or average-pooling, and automatically adapt to the characteristics of the sound sources in question. We evaluate the proposed pooling operators on three datasets, and demonstrate that in each case, the proposed methods outperform nonadaptive pooling operators for static prediction, and nearly match the performance of models trained with strong, dynamic annotations. The proposed method is evaluated in conjunction with convolutional neural networks, but can be readily applied to any differentiable model for time-series label prediction. While this paper focuses on SED applications, the proposed methods are general, and could be applied widely to MIL problems in any domain.
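The interpolation behavior of autopool is easy to see in a few lines: it is a softmax-weighted average along time whose temperature parameter selects the pooling regime. In the paper the parameter is learned per class; fixing it below is an illustrative simplification.

```python
import numpy as np

def autopool(x, alpha):
    """Auto-pooling over time:
        y = sum_t x_t * exp(alpha * x_t) / sum_t exp(alpha * x_t).
    alpha = 0 gives mean-pooling; alpha -> +inf approaches max-pooling;
    alpha -> -inf approaches min-pooling."""
    w = np.exp(alpha * (x - x.max()))   # shift for numerical stability
    return np.sum(x * w) / np.sum(w)

x = np.array([0.1, 0.9, 0.2])           # per-frame predictions for one class
mean_like = autopool(x, 0.0)            # equals x.mean()
max_like = autopool(x, 50.0)            # approaches x.max()
min_like = autopool(x, -50.0)           # approaches x.min()
```

Because the operator is differentiable in both the inputs and alpha, the network can learn, per class, how aggressively to pool frame-level predictions into a clip-level one.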
Objective: In this paper, we accurately detect the state-sequence first heart sound (S1)-systole-second heart sound (S2)-diastole, i.e., the positions of S1 and S2, in heart sound recordings. We propose an event detection approach without explicitly incorporating a priori information of the state duration. This renders it also applicable to recordings with cardiac arrhythmia and extendable to the detection of extra heart sounds (third and fourth heart sound), heart murmurs, as well as other acoustic events. Methods: We use data from the 2016 PhysioNet/CinC Challenge, containing heart sound recordings and annotations of the heart sound states. From the recordings, we extract spectral and envelope features and investigate the performance of different deep recurrent neural network (DRNN) architectures to detect the state sequence. We use virtual adversarial training, dropout, and data augmentation for regularization. Results: We compare our results with the state-of-the-art method and achieve an average score for the four events of the state sequence of F1 ≈ 96 % on an independent test set. Conclusion: Our approach shows state-of-the-art performance carefully evaluated on the 2016 PhysioNet/CinC Challenge dataset. Significance: In this work, we introduce a new methodology for the segmentation of heart sounds, suggesting an event detection approach with DRNNs using spectral or envelope features.
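One widely used envelope feature for heart sound segmentation is the normalized Shannon energy envelope, sketched below; it is one of several envelope choices, and the exact feature set used in the paper may differ.

```python
import numpy as np

def shannon_energy_envelope(x, frame=32):
    """Frame-averaged Shannon energy envelope of a normalized signal:
        E(t) = -x(t)^2 * log(x(t)^2),
    which emphasizes medium-intensity components such as S1/S2 bursts
    while suppressing low-level noise."""
    x = np.asarray(x, dtype=float)
    x = x / (np.max(np.abs(x)) + 1e-12)               # normalize to [-1, 1]
    se = -x ** 2 * np.log(x ** 2 + 1e-12)             # sample-wise energy
    n = len(se) // frame
    return se[:n * frame].reshape(n, frame).mean(axis=1)

# toy signal: silence, an S1-like burst, silence
sig = np.concatenate([np.zeros(128),
                      0.8 * np.sin(np.linspace(0.0, 20.0 * np.pi, 128)),
                      np.zeros(128)])
env = shannon_energy_envelope(sig)
```

In a DRNN pipeline, envelopes like this (alongside spectral features) form the per-frame input sequence from which the network predicts the S1-systole-S2-diastole state sequence.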