Automatic violence detection in video is a meaningful yet challenging task. Violent actions can be characterized both by intense sequential frames and by continuous spatial movements, which makes them more complex than other human actions. However, most existing approaches focus on general spatiotemporal features extracted with local convolutions and ignore full temporal inference based on the characteristics of violence. To this end, we propose a novel full temporal cross fusion network (FTCF Net) to investigate an effective inference approach for violence detection. Specifically, each FTCF block contains two essential neural components: a spatial processor and a temporal processor. The former captures the local structural features of each frame with a 3D CNN using a (3×3×1) filter to infer continuous spatial movements. The latter performs step-by-step cross-frame feature interaction for each channel through a group of processing units to infer the intense and wide-ranging variation of violence across the full temporal range. The two branches are efficiently fused at the end of each FTCF block. We conduct extensive experiments on four benchmark datasets: Hockey Fight, Movie Fight, Violent Flow, and Real-life Violence Situations. The results show that FTCF Net outperforms 20 comparison methods in predictive accuracy, reaching 99.5%, 100.0%, 98.0%, and 98.5% on the four datasets, respectively, which validates the effectiveness of the proposed approach for violence detection. Moreover, the proposed approach achieves relatively stable prediction performance, superior to existing methods, across training sets of different sizes. We hope this work can serve as a baseline for violence detection. The complete original code and pre-trained weights are publicly available at https://github.com/TAN-OpenLab/FTCF-NET.
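A (3×3×1) filter covers a 3×3 neighbourhood within one frame and a single frame in time, so every frame is filtered independently. The NumPy sketch below illustrates only that receptive field for a single-channel clip of shape (T, H, W). It assumes the filter dimensions are ordered height × width × time and is not the FTCF Net code, which also learns its weights and mixes input channels.

```python
import numpy as np

def spatial_conv_3x3x1(video, kernel):
    """Apply a (3x3x1) kernel to a single-channel clip of shape (T, H, W).

    The kernel spans 3x3 pixels and 1 frame, so each frame is filtered on its
    own (stride 1, zero 'same' padding, cross-correlation as in CNN layers).
    """
    T, H, W = video.shape
    padded = np.pad(video, ((0, 0), (1, 1), (1, 1)))  # pad the spatial axes only
    out = np.zeros_like(video, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[:, dy:dy + H, dx:dx + W]
    return out


rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16, 16))  # 8 synthetic frames of 16x16 pixels
k = rng.standard_normal((3, 3))          # temporal extent is 1, so a 2D array suffices
y = spatial_conv_3x3x1(clip, k)
```

Changing one frame of the input changes only the matching output frame. In PyTorch, a layer with the same receptive field would be nn.Conv3d(c_in, c_out, kernel_size=(1, 3, 3), padding=(0, 1, 1)) applied to inputs laid out as (N, C, T, H, W); whether FTCF Net uses exactly this layout is an assumption.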
Spatiotemporal modeling is key to action recognition in videos. In this paper, we propose a Spatial features Compression and Temporal features Fusion (SCTF) block, which consists of a Local Spatial features Compression (LSC) module and a Full Temporal features Fusion (FTF) module. We call the network equipped with SCTF blocks SCTF-NET; it is a human action recognition network better suited to violent video detection. In previous works, spatial extraction and temporal fusion are typically achieved by stacking large numbers of convolution layers or adding complex recurrent neural layers. In contrast, the SCTF block extracts the spatial information of video frames with the LSC module and fuses the temporal sequence information of continuous frames with the FTF module, which enables effective spatiotemporal modeling. Our approach achieves good performance on action recognition benchmarks such as HMDB51 and UCF101, while being more efficient in training and detection. Moreover, experiments on the violence datasets Hockey Fights, Movie Fight, and Violent Flow show that the proposed SCTF block is more suitable for violent action recognition. Our code is available at https://github.com/TAN-OpenLab/SCTF-Net.
The fault diagnosis of rolling bearings is a critical technique for realizing predictive maintenance in mechanical condition monitoring. In real industrial systems, the main challenges for rolling bearing fault diagnosis are accuracy and real-time requirements. Most existing methods focus on ensuring accuracy and often neglect the real-time requirement. In this paper, considering both requirements, we propose a novel fast fault diagnosis method for rolling bearings based on the extreme learning machine (ELM) and logistic mapping, named logistic-ELM. First, we identify 14 kinds of time-domain features from the original vibration signals according to mechanical vibration principles and adopt the sequential forward selection strategy to select the optimal features, ensuring basic predictive accuracy and efficiency. Next, we propose the logistic-ELM for fast fault classification, in which the biases in ELM are omitted and the random input weights are replaced by a chaotic logistic mapping sequence with lower correlation, which yields more accurate results with fewer hidden neurons. We conduct extensive experiments on the rolling bearing vibration signal dataset of the Case Western Reserve University Bearing Data Centre. The experimental results show that the proposed approach outperforms existing state-of-the-art comparison methods in predictive accuracy, with the highest accuracies of 100%, 99.71%, 98%, 100%, 100%, and 100%, respectively, in seven separate sub-data environments. Moreover, in terms of runtime cost, the results indicate that the proposed logistic-ELM can predict faults within 40 ms with high accuracy, 21 to 1858 times faster than existing methods based on support vector machines, convolutional neural networks, and multi-scale entropy.
Additional experiments on rolling bearing fault diagnosis under four different loads also indicate that logistic-ELM adapts to different operating conditions with high efficiency. The relevant code is publicly available at https://github.com/TAN-OpenLab/logistic-ELM.
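The logistic-ELM idea described above (no hidden biases, input weights taken from a chaotic logistic-map sequence, output weights solved by least squares) can be sketched in NumPy as follows. The map parameter r = 4, the seed x0, the burn-in length, the rescaling to [-1, 1], and the tanh activation are illustrative assumptions rather than values from the paper, and the toy data are synthetic Gaussian clusters, not CWRU features.

```python
import numpy as np

def logistic_sequence(n, x0=0.37, r=4.0, burn_in=100):
    """Return n values of the logistic map x_{k+1} = r * x_k * (1 - x_k).

    r = 4.0 puts the map in its chaotic regime on (0, 1); x0 and burn_in
    are illustrative choices, not values taken from the paper.
    """
    x = x0
    for _ in range(burn_in):  # discard the initial transient
        x = r * x * (1.0 - x)
    out = np.empty(n)
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = x
    return out


class LogisticELM:
    """Bias-free ELM whose input weights come from a logistic-map sequence (sketch)."""

    def __init__(self, n_hidden):
        self.n_hidden = n_hidden

    def fit(self, X, y):
        n_features = X.shape[1]
        # Rescale the chaotic sequence from [0, 1] to [-1, 1] and use it as input weights.
        seq = 2.0 * logistic_sequence(n_features * self.n_hidden) - 1.0
        self.W = seq.reshape(n_features, self.n_hidden)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        H = np.tanh(X @ self.W)            # hidden-layer outputs, no bias term
        self.beta = np.linalg.pinv(H) @ T  # least-squares output weights
        return self

    def predict(self, X):
        scores = np.tanh(X @ self.W) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


# Toy usage on two synthetic, well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 4)), rng.normal(2.0, 0.5, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
model = LogisticELM(n_hidden=20).fit(X, y)
train_acc = np.mean(model.predict(X) == y)
```

Because the logistic sequence is deterministic for a given x0, training is repeatable without a random seed; only the output weights are fitted, which is where ELM's speed comes from.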
In online social networks (OSNs), popularity prediction estimates the final size of online content from the observed cascade, and it has become a critical technology for online recommendation, viral marketing, and rumor detection. Recently, representation learning has helped infer the mapping between the dynamic cascade and final popularity efficiently, and it has become a new research paradigm for popularity prediction. However, these methods are vulnerable to structural disturbances when fine-grained supervision is lacking, because only the dynamic cascade is used. Therefore, we propose a novel trend and cascade based spatiotemporal evolution network (TCSE-Net), which preserves distinguishable structural patterns while eliminating potential noise by aligning and fusing temporal popularity and the cascade. Specifically, we first leverage Long Short-Term Memory (LSTM) and a recurrent graph convolutional network (GCN) to learn the trend representation and the corresponding cascade representation, respectively. Meanwhile, we represent each node by its layer, so that the hierarchy is preserved in the cascade representation through the GCN. Then, the trend and cascade representations are aligned in time sequence and selectively assembled by a set of shared parameters for popularity prediction. Extensive experimental results show that TCSE-Net outperforms state-of-the-art baselines on two real datasets. Related code will be publicly available at https://github.com/TAN-OpenLab/TCSE-Net.
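One plausible reading of "representing each node by its layer" is the hop distance of a node from the root post in the forwarding tree, encoded as a one-hot feature that a GCN could consume. The sketch below computes that encoding with a breadth-first search. The (parent, child) edge format, the one-hot encoding, and the clipping of deep layers at max_layer are assumptions for illustration, not TCSE-Net's implementation.

```python
from collections import deque

def node_layers(root, edges):
    """Return {node: layer}, where layer is the hop distance from the root post.

    edges are (parent, child) forwarding pairs; the root has layer 0.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    layers = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search assigns each node its shortest depth
        u = queue.popleft()
        for v in children.get(u, []):
            if v not in layers:
                layers[v] = layers[u] + 1
                queue.append(v)
    return layers


def one_hot_layer(layers, max_layer):
    """Encode each layer as a one-hot list of length max_layer + 1 (deeper layers are clipped)."""
    return {
        node: [1 if i == min(depth, max_layer) else 0 for i in range(max_layer + 1)]
        for node, depth in layers.items()
    }


cascade = [("a", "b"), ("a", "c"), ("b", "d"), ("d", "e")]
layers = node_layers("a", cascade)
feats = one_hot_layer(layers, max_layer=2)
```

Here layers is {"a": 0, "b": 1, "c": 1, "d": 2, "e": 3}, and node "e" is clipped into the last slot, giving [0, 0, 1]. Concatenating such a vector to each node's other features is one way a GCN could keep the cascade hierarchy visible.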
In recent multi-speaker speech separation research, the overall deep-learning-based architecture consists of three parts: an encoder, a separator, and a decoder. However, improvement strategies generally focus only on the separator in the middle, regardless of its input. The most common encoder structure at present is a single 1D convolution layer followed by a nonlinear activation function, ReLU. In this paper, we first propose a new encoder named Attention DE to improve the effectiveness of the separator's input. The new encoder adds extra 1D convolutional layers and a multi-head attention mechanism to enhance the feature aggregation ability for the input speech. Second, instead of RNNs, our separator uses SepFormer blocks to improve training efficiency and better learn speech sequence patterns. Experiments show that Attention DE can generally be applied to improve the performance of time-domain single-channel speech separation models. Combining Attention DE with SepFormer blocks achieves an SI-SNRi of 20.3 dB on WSJ0-2MIX. Code is publicly available at https://github.com/TAN-OpenLab/AttentionDE.
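SI-SNRi, the metric reported above, is the improvement in scale-invariant signal-to-noise ratio of the separated estimate over the unprocessed mixture. The sketch below uses the common zero-mean definition. The synthetic "speakers" are white noise and the separation error is simulated, so the numbers it produces say nothing about Attention DE.

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR (dB) of an estimate with respect to a reference signal."""
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Target component: orthogonal projection of the estimate onto the reference.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    s_target = scale * reference
    e_noise = estimate - s_target
    return 10.0 * np.log10((np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


rng = np.random.default_rng(0)
s1 = rng.standard_normal(8000)  # synthetic stand-in for speaker 1
s2 = rng.standard_normal(8000)  # synthetic stand-in for speaker 2
mixture = s1 + s2
estimate = s1 + 0.1 * rng.standard_normal(8000)  # simulated, imperfect separation of speaker 1
improvement = si_snr_improvement(estimate, mixture, s1)
```

Because the target is a projection onto the reference, rescaling the estimate leaves SI-SNR unchanged, which is why the metric ignores the arbitrary output gain of a separation model.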
Speaker recognition based on i-vectors has largely been replaced by deep-learning-based speaker recognition. Speaker recognition based on Convolutional Neural Networks (CNNs), which learn low-level speech representations from raw waveforms, has been widely used in recent years. On this basis, a CNN architecture called SincNet proposes a unique convolutional layer that implements band-pass filters. Compared with standard CNNs, SincNet learns the low and high cut-off frequencies of each filter. This paper proposes an improved CNN architecture called PF-Net, which encourages the first convolutional layer to implement more personalized filters than SincNet. PF-Net parameterizes the shape of the frequency response and realizes band-pass filters by learning a set of deformation points in the frequency domain. Compared with a standard CNN, PF-Net learns the characteristics of each filter; compared with SincNet, PF-Net learns more characteristic parameters than just the low and high cut-off frequencies. This provides a personalized filter bank for different tasks. Our experiments show that PF-Net converges faster than a standard CNN and performs better than SincNet. Our code is available at github.com/TAN-OpenLab/PF-NET.
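The difference between SincNet and PF-Net filters can be illustrated by how each builds an FIR filter. SincNet's band-pass filter is the difference of two windowed low-pass sinc filters, so only the two cut-off frequencies are free parameters. For PF-Net, the abstract says only that the frequency-domain shape is parameterized by learned deformation points; the shaped_filter function below is a hypothetical reading in which the points define a piecewise-linear magnitude response that is turned into a zero-phase FIR filter by an inverse DFT. The filter length, the Hamming window, and the point values are illustrative choices.

```python
import numpy as np

def sinc_bandpass(f1, f2, n_taps=101):
    """SincNet-style band-pass FIR: difference of two ideal low-pass sinc filters.

    f1 < f2 are cut-off frequencies in cycles/sample (0 < f < 0.5). NumPy's
    np.sinc(x) is sin(pi*x)/(pi*x), so 2*f*np.sinc(2*f*n) is a low-pass at f.
    """
    n = np.arange(n_taps) - (n_taps - 1) / 2
    g = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return g * np.hamming(n_taps)  # window to reduce ripple from truncation


def shaped_filter(point_freqs, point_gains, n_taps=101, n_fft=1024):
    """Hypothetical 'deformation point' filter (not PF-Net's published code).

    The (frequency, gain) points define a piecewise-linear magnitude response,
    which is converted to a zero-phase FIR filter by an inverse real DFT.
    """
    freqs = np.fft.rfftfreq(n_fft)                     # 0 .. 0.5 cycles/sample
    magnitude = np.interp(freqs, point_freqs, point_gains)
    h = np.fft.irfft(magnitude, n_fft)                 # real, even impulse response
    h = np.roll(h, n_taps // 2)[:n_taps]               # centre it and truncate
    return h * np.hamming(n_taps)


def gain_at(h, f, n_fft=1024):
    """Magnitude response of FIR filter h at the DFT bin nearest to f (cycles/sample)."""
    freqs = np.fft.rfftfreq(n_fft)
    return np.abs(np.fft.rfft(h, n_fft))[np.argmin(np.abs(freqs - f))]


sinc_filter = sinc_bandpass(0.10, 0.20)
# Six hypothetical points describing a trapezoidal pass band between 0.15 and 0.25 cycles/sample.
pf_filter = shaped_filter([0.0, 0.10, 0.15, 0.25, 0.30, 0.5], [0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
```

With the sinc form the pass band is always a flat rectangle whose edges are the only learnable quantities, whereas a point-based shape can tilt, narrow, or add shoulders by moving individual points, which is the extra freedom the abstract attributes to PF-Net.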
The fault diagnosis of rolling bearings is a critical technique for realizing predictive maintenance in mechanical condition monitoring. In real industrial systems, the main challenges for rolling bearing fault diagnosis are accuracy and real-time requirements. Most existing methods focus on ensuring accuracy and often neglect the real-time requirement. In this paper, considering both requirements, we propose a novel fast fault diagnosis method for rolling bearings based on the extreme learning machine (ELM) and logistic mapping, named logistic-ELM. First, we identify 14 kinds of time-domain features from the original vibration signals according to mechanical vibration principles and adopt the sequential forward selection (SFS) strategy to select the optimal features, ensuring basic predictive accuracy and efficiency. Next, we propose the logistic-ELM for fast fault classification, in which the biases in ELM are omitted and the random input weights are replaced by a chaotic logistic mapping sequence with lower correlation, which yields more accurate results with fewer hidden neurons. We conduct extensive experiments on the rolling bearing vibration signal dataset of the Case Western Reserve University (CWRU) Bearing Data Centre. The experimental results show that the proposed approach outperforms existing state-of-the-art (SOTA) comparison methods in predictive accuracy, with the highest accuracy reaching 100% in seven separate sub-data environments. The relevant code is publicly available at https://github.com/TAN-OpenLab/logistic-ELM.
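The feature-extraction step pairs hand-crafted time-domain statistics with sequential forward selection (SFS), a greedy search that repeatedly adds whichever remaining feature most improves a selection criterion. The sketch below computes ten common vibration statistics (a subset; the paper's full list of 14 is not given here) and runs SFS with nearest-centroid training accuracy as a hypothetical criterion. The synthetic signals, with periodic impulses standing in for a bearing defect, are an assumption and not CWRU data.

```python
import numpy as np

def time_domain_features(x):
    """Ten common time-domain vibration statistics (a subset, not the paper's 14)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    std = x.std()
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    mean_abs = np.mean(np.abs(x))
    return {
        "mean": x.mean(),
        "std": std,
        "rms": rms,
        "peak": peak,
        "peak_to_peak": x.max() - x.min(),
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / std ** 4,
        "crest_factor": peak / rms,
        "shape_factor": rms / mean_abs,
        "impulse_factor": peak / mean_abs,
    }


def sequential_forward_selection(score, candidates, k):
    """Greedy SFS: repeatedly add the candidate that maximizes score(selected + [c])."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda c: score(selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


def centroid_accuracy(F, y):
    """Hypothetical SFS criterion: training accuracy of a nearest-centroid classifier."""
    Z = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)  # standardize each feature
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.mean(classes[np.argmin(dists, axis=1)] == y)


# Synthetic data: healthy = white noise, faulty = noise plus periodic impulses.
rng = np.random.default_rng(0)
y = np.array([i % 2 for i in range(40)])
signals = []
for label in y:
    x = rng.standard_normal(2048)
    if label == 1:
        x[::128] += 8.0  # crude stand-in for a localized bearing defect
    signals.append(x)

names = list(time_domain_features(signals[0]))
rows = [time_domain_features(s) for s in signals]
F = np.array([[row[n] for n in names] for row in rows])
chosen = sequential_forward_selection(
    lambda cols: centroid_accuracy(F[:, [names.index(c) for c in cols]], y), names, k=3)
```

Impulse-sensitive statistics such as kurtosis and crest factor separate these two synthetic classes well, which is the usual motivation for including them among bearing features; SFS costs one criterion evaluation per remaining feature per step.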
With the rapid development of online social networks (OSNs), the influence of all kinds of information in OSNs on people's daily lives has increased accordingly. Therefore, it is of great significance to study the laws of information diffusion in OSNs. Many information diffusion models have been proposed in previous studies, and many of them consider only the structure of online social networks. However, such models cannot capture the complex laws of information diffusion in social networks. In this paper, considering the initial influence of information at the beginning of diffusion and the activity of users in forwarding information, we design an information diffusion model inspired by particle dynamics, called the particle-model. Then, through experimental analysis on the public wiki_Vote dataset and a real Sina Weibo dataset, we compare how the number of activated users changes over time when competing information spreads in the network under the particle-model with the same trend under classical models, including the IC model and the SI epidemic model. Finally, we compare the change in the number of users who forwarded a message extracted from the real Weibo dataset to illustrate the advantages of the proposed particle-model.
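For reference, the Independent Cascade (IC) model used as a baseline above gives each newly activated user a single chance to activate each inactive follower with some probability. The sketch below simulates one IC run and records the cumulative number of active users per step, the kind of curve the comparison examines. The uniform edge probability p and the adjacency-list graph format are simplifying assumptions, and the particle-model itself is not sketched because its equations are not given here.

```python
import random

def independent_cascade(graph, seeds, p, rng):
    """One run of the Independent Cascade (IC) model.

    graph maps each user to the users who see their posts; every edge uses the
    same activation probability p (a simplifying assumption). Returns the
    cumulative number of active users after each propagation step.
    """
    active = set(seeds)
    frontier = list(seeds)
    history = [len(active)]
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each newly active user gets one chance to activate each inactive neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
        if newly_active:
            history.append(len(active))
    return history


follower_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"], "e": []}
curve = independent_cascade(follower_graph, seeds=["a"], p=0.5, rng=random.Random(42))
```

Averaging many such runs with different random generators gives the expected growth curve that a diffusion model like the particle-model would be compared against.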