Bus arrival time prediction aims to improve the level of service provided by transportation agencies. Intuitively, many stochastic factors affect the predictability of the arrival time, e.g., weather and local events. Moreover, the arrival time prediction for the current station is closely correlated with that of multiple previously passed stations. Motivated by the observations above, this paper proposes to exploit the long-range dependencies among multiple time steps for bus arrival prediction via a recurrent neural network (RNN). Concretely, an RNN with long short-term memory (LSTM) blocks is used to "correct" the prediction for a station using the correlated multiple passed stations. When correlating multiple stations, one-hot coding is introduced to fuse heterogeneous information into a unified vector space. The proposed framework therefore leverages both dynamic measurements (i.e., historical trajectory data) and static observations (i.e., statistics of the infrastructure) for bus arrival time prediction. To enable fair comparison with state-of-the-art methods, we have released what is, to the best of our knowledge, the largest data set for this task. The experimental results demonstrate the superior performance of our approach on this data set.
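The one-hot fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's actual feature design: the field choices and the sizes `n_stations`, `n_weather`, and `max_time` are hypothetical assumptions.

```python
import numpy as np

def one_hot(index, size):
    """Return a one-hot row vector of the given size."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

def fuse_step_features(station_id, weather_id, travel_time,
                       n_stations=50, n_weather=5, max_time=3600.0):
    """Fuse heterogeneous observations for one time step (one passed
    station) into a single unified vector: categorical fields are
    one-hot encoded, numeric fields are scaled to a common range."""
    return np.concatenate([
        one_hot(station_id, n_stations),   # static: which station
        one_hot(weather_id, n_weather),    # stochastic: weather state
        [travel_time / max_time],          # dynamic: measured travel time
    ])

# A sequence of such vectors (one per passed station) would then be fed
# to an LSTM that "corrects" the prediction for the current station.
seq = np.stack([fuse_step_features(s, 2, 600.0 + 30 * s) for s in range(4)])
print(seq.shape)  # (4, 56): 4 time steps, 50 + 5 + 1 features each
```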
Academic performance prediction is a fundamental and active topic in educational data mining (EDM). Recently, researchers have proposed a series of effective machine learning (ML) based classification strategies to predict students' academic performance. However, prior work typically focuses on individual models and neglects the associations among students, which can considerably affect the integrity of academic-performance-related representations. Meanwhile, students' multi-view behavioral data contains complex relations among students. Therefore, we propose a Multi-View Hypergraph Neural Network (MVHGNN) for predicting students' academic performance. MVHGNN uses hypergraphs to model high-order relations among students, and the semantic information implied by multiple behaviors is consolidated through meta-paths. Further, a Cascade Attention Transformer (CAT) module is introduced to learn the weights of different behaviors via the self-attention mechanism. Our method is evaluated on real campus student behavioral datasets. The experimental results demonstrate that our method outperforms state-of-the-art ones.
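As an illustration of how a hypergraph can encode high-order relations among students, here is a minimal sketch of one hypergraph convolution layer in the standard HGNN formulation. The abstract does not specify MVHGNN's exact propagation rule, so this formulation, and the toy incidence matrix, are assumptions.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_weights=None):
    """One hypergraph convolution layer (HGNN-style propagation):
    X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta."""
    n, m = H.shape
    w = np.ones(m) if edge_weights is None else edge_weights
    Dv = (H * w).sum(axis=1)           # vertex degrees
    De = H.sum(axis=0)                 # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(Dv))
    A = Dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / De) @ H.T @ Dv_inv_sqrt
    return A @ X @ Theta

# 4 students, 2 hyperedges (e.g., "same dorm", "same course"); each
# hyperedge connects several students at once (a high-order relation,
# unlike a pairwise graph edge).
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]], dtype=float)
X = np.eye(4)        # toy node features
Theta = np.eye(4)    # toy layer weights
out = hypergraph_conv(X, H, Theta)
print(out.shape)  # (4, 4)
```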
Although object detection algorithms based on deep learning have been widely used in many scenarios, they face challenges under degraded conditions such as low light. A conventional solution applies image enhancement as a separate pre-processing module to improve the quality of the degraded image. However, this two-step approach makes it difficult to unify the goals of enhancement and detection; that is, low-light enhancement operations are not always helpful for subsequent object detection. Recently, some works have tried to integrate enhancement and detection in an end-to-end network, but they still suffer from complex network structures, training convergence problems, and the need for reference images. To address the above problems, a plug-and-play image enhancement model, namely the low-light image enhancement (LLIE) model, is proposed in this paper; it can be easily embedded into off-the-shelf object detection methods in an end-to-end manner. LLIE is composed of a parameter estimation module and an image processing module. The former learns to regress lighting enhancement parameters according to the feedback of the detection network, and the latter adaptively enhances the degraded image to improve the subsequent detection model under low-light conditions. Extensive object detection experiments on several low-light image data sets show that the performance of the detector is significantly improved when LLIE is integrated.
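A minimal sketch of the "regress a parameter, then apply it" pattern, assuming a simple gamma curve as the enhancement operator; the paper's actual parameterization is not specified in the abstract, so the curve and the fixed `gamma` value here are illustrative.

```python
import numpy as np

def enhance(image, gamma):
    """Apply a gamma curve to an image in [0, 1]; gamma < 1 lifts
    dark regions, making a low-light image easier to detect in."""
    return np.clip(image, 0.0, 1.0) ** gamma

# In the paper's setting, the parameter would be regressed by the
# parameter estimation module from detection-loss feedback; here it
# is fixed for illustration.
dark = np.full((4, 4), 0.04)      # a uniformly under-exposed patch
bright = enhance(dark, gamma=0.5)
print(float(bright[0, 0]))  # 0.2: dark pixels are lifted toward mid-tones
```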
This paper introduces an L1-norm-based probabilistic principal component analysis model on 2D data (L1-2DPPCA), built on the assumption of a Laplacian noise model. The Laplacian or L1 density function can be expressed as a superposition of an infinite number of Gaussian distributions. Under this expression, Bayesian inference can be established based on the variational expectation maximization approach, and all the key parameters in the probabilistic model can be learned by the proposed variational algorithm. It is experimentally demonstrated that the newly introduced hidden variables in the superposition serve as an effective indicator of data outliers. Experiments on several publicly available databases show that the performance of L1-2DPPCA is largely improved after identifying and removing sample outliers, resulting in more accurate image reconstruction than existing PCA-based methods. The feature extraction performance of the proposed method also generally outperforms existing algorithms in terms of reconstruction errors and classification accuracy.
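The superposition the abstract refers to is the classical Gaussian scale-mixture identity: a zero-mean Gaussian whose variance $v$ follows an exponential distribution marginalizes to a Laplace density. With scale $b$ and mixing rate $\lambda = 1/(2b^2)$:

```latex
p_{\text{Lap}}(x \mid 0, b)
  = \frac{1}{2b}\exp\!\left(-\frac{|x|}{b}\right)
  = \int_0^\infty \mathcal{N}(x \mid 0, v)\,
    \mathrm{Exp}\!\left(v \,\Big|\, \lambda = \tfrac{1}{2b^2}\right)\mathrm{d}v
```

The per-entry latent variance $v$ is the hidden variable referred to above: entries that the model can explain only by inferring a large $v$ are flagged as likely outliers.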
In multicamera video surveillance, it is challenging to properly represent videos from different cameras and fuse them efficiently for specific applications such as human activity recognition and clustering. In this paper, a novel representation for multicamera video data, namely the product Grassmann manifold (PGM), is proposed to model video sequences as points on the Grassmann manifold and integrate them as a whole in product manifold form. In addition, with a new geometric metric on the product manifold, the conventional low rank representation (LRR) model is extended onto the PGM, and the new LRR model can be used for clustering nonlinear data such as multicamera video data. To evaluate the proposed method, a number of clustering experiments are conducted on several multicamera video data sets of human activity, including the Dongzhimen Transport Hub Crowd action data set, the ACT 42 Human Action data set, and the SKIG action data set. The experimental results show that the proposed method outperforms many state-of-the-art clustering methods.
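A minimal sketch of the representation: each camera's video becomes a subspace (a Grassmann point) via a thin SVD, and a multicamera sequence is the tuple of such points on the product manifold. The projection metric and the subspace dimension `p=3` used here are illustrative assumptions, not necessarily the paper's choices.

```python
import numpy as np

def grassmann_point(video, p=3):
    """Represent a video (frames x pixels) as a point on the Grassmann
    manifold: the span of its top-p right singular vectors."""
    _, _, Vt = np.linalg.svd(video, full_matrices=False)
    return Vt[:p].T                     # pixels x p, orthonormal columns

def projection_dist(Y1, Y2):
    """Projection metric between two Grassmann points."""
    p = Y1.shape[1]
    return np.sqrt(max(p - np.linalg.norm(Y1.T @ Y2, 'fro') ** 2, 0.0))

def product_dist(points1, points2):
    """Distance on the product Grassmann manifold: one Grassmann point
    per camera, combined as a whole (root of summed squared terms)."""
    return np.sqrt(sum(projection_dist(a, b) ** 2
                       for a, b in zip(points1, points2)))

rng = np.random.default_rng(0)
cams_a = [grassmann_point(rng.standard_normal((10, 20))) for _ in range(2)]
print(product_dist(cams_a, cams_a))  # ~0.0: identical multicamera videos
```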
Traditional synthesis/analysis sparse representation models signals in a one-dimensional (1D) way, in which a multidimensional (MD) signal is converted into a 1D vector. 1D modeling cannot efficiently handle MD signals of high dimensionality under limited computational resources and memory, as it breaks the data structure and inherently ignores the diversity of MD signals (tensors). We utilize the multilinearity of tensors to establish a redundant basis of the space of multilinear maps with a sparsity constraint, and further propose MD synthesis/analysis sparse models to effectively and efficiently represent MD signals in their original form. The dimensional features of MD signals are captured by a series of dictionaries simultaneously and collaboratively. The corresponding dictionary learning algorithms and unified MD signal restoration formulations are proposed. The effectiveness of the proposed models and dictionary learning algorithms is demonstrated through experiments on MD signal denoising, image super-resolution, and texture classification. Experiments show that the proposed MD models outperform state-of-the-art 1D models in terms of signal representation quality, computational overhead, and memory usage. Moreover, the proposed MD sparse models generalize the 1D sparse models and are flexible and adaptive to both homogeneous and inhomogeneous properties of MD signals.
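The "one dictionary per mode" idea can be sketched with mode products: an MD signal is synthesized from a sparse core and a series of small per-mode dictionaries, without ever forming the large Kronecker dictionary a 1D model would need. The dictionary sizes here are arbitrary toy values, and the dictionaries are random rather than learned.

```python
import numpy as np

def md_synthesize(S, dicts):
    """Synthesize an MD signal from a sparse core tensor S and one
    dictionary per mode: X = S x_1 D1 x_2 D2 ... (mode products)."""
    X = S
    for mode, D in enumerate(dicts):
        # Contract D's columns with X's current mode, then move the
        # resulting axis back into place.
        X = np.moveaxis(np.tensordot(D, X, axes=(1, mode)), 0, mode)
    return X

# 2D case: X = D1 @ S @ D2.T. A 1D model would instead vectorize X and
# use the much larger Kronecker dictionary (D2 kron D1).
rng = np.random.default_rng(1)
D1, D2 = rng.standard_normal((8, 5)), rng.standard_normal((9, 6))
S = np.zeros((5, 6)); S[0, 0] = 2.0; S[3, 4] = -1.0   # sparse core
X = md_synthesize(S, [D1, D2])
print(np.allclose(X, D1 @ S @ D2.T))  # True
```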
Nowadays, face detection and head pose estimation have many applications, such as face recognition, aiding gaze estimation, and modeling attention. These two tasks are usually handled by two different models. However, the head pose estimation model often depends on a region of interest (ROI) detected in advance, which means a serial face detector is needed. Even the lightest face detector slows down the whole forward inference and cannot achieve real-time performance when estimating the head poses of multiple people. Since both face detection and head pose estimation need face features, a shared face feature map can be used between them. In this paper, a multi-task learning model is proposed that solves both problems simultaneously. We directly detect the location of the center point of the face bounding box; at this location, we regress the size of the bounding box and the head pose. We evaluate our model's performance on the AFLW dataset. The proposed model is highly competitive with multi-stage face attribute analysis models, and it achieves real-time performance.
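The center-point decoding step can be sketched as follows, assuming the model outputs a center heatmap plus per-location size and pose regression maps (the head names and map layout are hypothetical; the abstract does not give the exact output format).

```python
import numpy as np

def decode_center(heatmap, size_map, pose_map):
    """Decode one face from a center-point head: take the heatmap peak,
    then read box size and head pose from the same location, so both
    tasks share one face feature map instead of a two-stage pipeline."""
    cy, cx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    w, h = size_map[:, cy, cx]
    yaw, pitch, roll = pose_map[:, cy, cx]
    return (int(cx), int(cy), float(w), float(h)), \
           (float(yaw), float(pitch), float(roll))

# Toy 8x8 output maps with a single synthetic face centered at (5, 3).
heat = np.zeros((8, 8)); heat[3, 5] = 0.9
size = np.zeros((2, 8, 8)); size[:, 3, 5] = (40.0, 48.0)
pose = np.zeros((3, 8, 8)); pose[:, 3, 5] = (10.0, -5.0, 0.0)
box, angles = decode_center(heat, size, pose)
print(box)  # (5, 3, 40.0, 48.0)
```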
The different reconstruction parameters of CT imaging lead to domain shifts, which limit the generalization of deep learning models and their application in computer-aided diagnosis systems. In this paper, we investigate the multi-source domain generalization (DG) problem in the context of lung nodule detection from CT images. We first identify the reconstruction convolution kernel as the key parameter leading to domain shifts. Accordingly, we reorganize the public LUNA16 dataset into a domain generalization benchmark, i.e., LUNA-DG. Then, we propose a novel DG method based on adversarial frequency alignment (AFA). Specifically, we devise an adaptive transition module (ATM) to learn a frequency attention map that can align images from different domains in a common frequency domain. For this purpose, a fidelity discriminator and a multi-domain discriminator are used to train the ATM alternately and adversarially. In addition, to mitigate the issue of ineffective gradient back-propagation in naive multi-domain adversarial learning, we propose a novel random domain adversarial learning (RDAL) strategy that back-propagates effective gradient signals and gradually reduces the gap between multiple domains. The ATM can be combined with nodule detection models through the differentiable Fast Fourier Transform (FFT) and inverse FFT, allowing end-to-end training. Experimental results on both LUNA-DG and our in-house datasets validate the superiority of AFA over representative DG methods.
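The FFT-based pattern that makes the ATM end-to-end trainable can be sketched as follows. This shows only the forward-transform, spectrum-reweighting, inverse-transform structure; the ATM's actual architecture and learned attention map are not given in the abstract.

```python
import numpy as np

def frequency_align(image, attention):
    """Apply a frequency attention map: FFT the image, reweight its
    spectrum element-wise, and invert the FFT. Both transforms are
    differentiable, so this can sit in front of a nodule detector
    and be trained end to end."""
    spectrum = np.fft.fft2(image)
    return np.real(np.fft.ifft2(spectrum * attention))

# An all-ones attention map is the identity; a trained ATM would
# instead suppress the kernel-dependent frequency bands that differ
# across reconstruction domains.
rng = np.random.default_rng(2)
slice_ = rng.standard_normal((16, 16))
aligned = frequency_align(slice_, np.ones((16, 16)))
print(np.allclose(aligned, slice_))  # True
```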
Detecting salient objects in complicated scenarios is a challenging problem. In addition to semantic features from the RGB image, spatial information from the depth image provides sufficient cues about the object. Therefore, it is crucial to rationally integrate RGB and depth features for the RGB-D salient object detection task. Most existing RGB-D saliency detectors modulate RGB semantic features with absolute depth values. However, they ignore the appearance contrast and structure knowledge indicated by the relative depth values between pixels. In this work, we propose a depth-induced network (DIN) for RGB-D salient object detection that takes full advantage of both absolute and relative depth information and, further, enforces in-depth fusion of the RGB-D cross-modalities. Specifically, an absolute depth-induced module (ADIM) is proposed to hierarchically integrate absolute depth values and RGB features, allowing interaction between appearance and structural information in the encoding stage. A relative depth-induced module (RDIM) is designed to capture detailed saliency cues by exploring contrastive and structural information from relative depth values in the decoding stage. By combining the ADIM and RDIM, we can accurately locate salient objects with clear boundaries, even in complex scenes. The proposed DIN is a lightweight network whose model size is much smaller than that of state-of-the-art algorithms. Extensive experiments on six challenging benchmarks show that our method outperforms most existing RGB-D salient object detection models.
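To see why relative depth carries contrast cues that absolute depth alone misses, consider this toy illustration (not the paper's RDIM): each pixel is scored by its signed depth difference from the rest of the image, which makes a near object pop out regardless of the scene's absolute depth range.

```python
import numpy as np

def relative_depth_contrast(depth):
    """Contrast cue from relative rather than absolute depth: each
    pixel's mean signed depth difference to the whole image.
    Closer-than-average pixels (likely foreground) score high."""
    return depth.mean() - depth

depth = np.full((4, 4), 5.0)
depth[1:3, 1:3] = 2.0             # a near object on a far background
contrast = relative_depth_contrast(depth)
print(contrast[1, 1] > contrast[0, 0])  # True: object beats background
```

Shifting every depth value by a constant leaves this map unchanged, which is exactly the invariance that absolute-depth modulation lacks.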
Considering the increasing importance of screen content, the High Efficiency Video Coding (HEVC) standard includes screen content coding as one of its requirements. In this paper, we demonstrate that enabling frame-level block searching in HEVC can significantly improve coding efficiency on screen content. We propose a hash-based block matching scheme for the intra block copy mode and the motion estimation process, which enables frame-level block searching in HEVC without changing the HEVC syntax. In the proposed scheme, the blocks sharing the same hash value as the current block are selected as prediction candidates, and hash-based block selection is then employed to choose the best candidates. To achieve the best coding efficiency, rate-distortion optimization is further employed to improve the proposed scheme by balancing the coding cost of motion vectors against the prediction difference. Compared with HEVC, the proposed scheme achieves 21% and 37% bitrate savings under the all-intra and low-delay configurations, respectively, while also reducing encoding time. Up to 59% bitrate saving can be achieved on sequences with large motion.
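The hash-based candidate search can be sketched as follows; Python's built-in `hash` over raw bytes stands in for the codec's actual hash function, and the aligned 8x8 block grid is a simplifying assumption.

```python
import numpy as np
from collections import defaultdict

def build_hash_table(frame, block=8):
    """Index every aligned block of the frame by a cheap content hash,
    so candidates for intra block copy / motion estimation can be
    found at frame level without an exhaustive search."""
    table = defaultdict(list)
    h, w = frame.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            key = hash(frame[y:y + block, x:x + block].tobytes())
            table[key].append((y, x))
    return table

def candidates(frame, table, y, x, block=8):
    """Blocks sharing the current block's hash are the prediction
    candidates; rate-distortion optimization would then pick the best
    one by weighing motion vector cost against prediction difference."""
    key = hash(frame[y:y + block, x:x + block].tobytes())
    return table[key]

# Screen content repeats exactly (text, UI widgets), so identical
# blocks collide under the hash and become candidates for each other.
frame = np.zeros((16, 16), dtype=np.uint8)
frame[0:8, 0:8] = frame[8:16, 8:16] = 7      # two identical blocks
tbl = build_hash_table(frame)
print(candidates(frame, tbl, 0, 0))  # positions of all matching blocks
```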