Specialized services and management must understand students' behavioral patterns in a timely and accurate manner. Based on these patterns, targeted rules can be established, especially for unexpected patterns. To perform this type of work, a questionnaire method is usually used to collect data and analyze students' behavioral states. However, the effectiveness of this method is greatly influenced by the timeliness and validity of the feedback data. To address this problem, we propose an unsupervised ensemble clustering framework that uses student behavioral data to discover behavioral patterns. Because the behavioral data produced by students on campus are available in real time and without intentional bias, clustering analysis can be relatively efficient and reliable. The proposed framework extracts behavior features from the two perspectives of statistics and entropy and then combines density-based spatial clustering of applications with noise (DBSCAN) and k-means algorithms to discover behavioral patterns. To evaluate the performance of the proposed framework, we carry out experiments on six types of behavioral data produced by undergraduates at a university in Beijing and analyze the relationships between different behavioral patterns and students' grade point averages (GPAs). The results show that the framework can not only detect anomalous behavioral patterns but also find mainstream patterns. The findings from this research can assist student-related departments in providing better services and management, such as psychological consulting and academic guidance.
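The statistics-plus-entropy feature view combined with a clustering stage can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the DBSCAN outlier-flagging stage is omitted, only a bare k-means is shown, and the "meal-card swipe" data, feature choices, and all names are hypothetical.

```python
import numpy as np

def entropy_feature(counts):
    """Shannon entropy of a student's behavior-count distribution.

    High entropy: behavior spread evenly over time slots;
    low entropy: behavior concentrated in a few slots.
    """
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal k-means (Lloyd's algorithm) as the second clustering stage."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Hypothetical data: hourly meal-card swipe counts for 6 students, 24 slots.
rng = np.random.default_rng(1)
regular = rng.poisson(lam=np.tile([0, 5, 0, 5, 0, 5], 4), size=(4, 24))
erratic = rng.poisson(lam=2.0, size=(2, 24))
counts = np.vstack([regular, erratic]).astype(float) + 1e-9

# One feature vector per student: [mean, std, entropy] (statistics + entropy view).
features = np.array([[c.mean(), c.std(), entropy_feature(c)] for c in counts])
labels, _ = kmeans(features, k=2)
print(labels)
```

In the actual framework, DBSCAN would first separate noise points (anomalous students) before k-means groups the remaining mainstream patterns.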
Citywide crowd flow prediction is crucial for a city to ensure the productivity, safety, and management of its citizens. However, crowd flow may be affected by many factors, such as weather, working times, events, and seasons. In this paper, we propose the Attentive Spatio-Temporal Inception ResNet (ASTIR), which aims to address the difficulty of crowd flow prediction. ASTIR is based on the Inception-ResNet structure combined with Convolution-LSTM layers and an attention module to better capture movement pattern changes. We build our deep neural network framework from four distinct parts, by which we can capture short-term, long-term, and periodic properties, as well as external factors that can affect crowd flow behaviors. To show the performance of the proposed method, we use widely applied benchmarks for crowd flow prediction (Taxi Beijing and Bike New York) and obtain notable improvements over state-of-the-art approaches.
Screen content, such as cartoons, captures of typical computer screens, or video with text overlays or news tickers, is an important category of video that needs new techniques beyond existing video coding techniques. In this paper, we analyze the characteristics of screen content and the coding efficiency of HEVC on screen content. We propose a new coding scheme that adopts a non-transform representation, separating screen content into a color component and a structure component. Based on the proposed representation, two coding modes are designed for screen content to exploit the directional correlation and non-translational changes in screen video sequences. The proposed scheme is then seamlessly incorporated into the HEVC structure and implemented in the HEVC range extension reference software HM9.0. Experimental results show that the proposed scheme achieves up to 52.6% bitrate saving compared with HM9.0. On average, bitrate savings of 35.1%, 29.2%, and 23.6% are achieved with the intra, random-access, and low-delay configurations, respectively. The visual quality of the decoded video sequences is also significantly improved by reducing ringing artifacts around sharp edges and preserving the shape of text without blur.
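The color/structure separation can be illustrated with a palette-style toy example. This is only a sketch of the general non-transform idea, not the paper's actual representation or coding modes: the 4x4 block values and the use of `np.unique` as the "palette" builder are illustrative assumptions.

```python
import numpy as np

# Hypothetical 4x4 screen-content block with only three distinct sample values,
# as is typical for rendered text and UI graphics.
block = np.array([
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [90, 90, 200, 200],
    [90, 90,  10,  10],
])

# Color component: the palette of distinct sample values in the block.
palette, index_map = np.unique(block, return_inverse=True)
# Structure component: per-pixel indices into the palette.
index_map = index_map.reshape(block.shape)

# Reconstruction is lossless, so sharp text edges survive without the
# ringing that a transform-based representation would introduce.
reconstructed = palette[index_map]
assert np.array_equal(reconstructed, block)
print(palette)  # [ 10  90 200]
```

Coding the small palette and the index map separately is what lets such schemes keep text shapes crisp at low bitrates.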
Dockless bike sharing plays an important role in complementing urban transportation systems and promoting the sustainable development of cities worldwide. To improve system operational efficiency, it is critical to study the spatiotemporal patterns of dockless bike sharing demand as well as the factors influencing these patterns. Based on bicycle trip data from Mobike, Point of Interest (POI) data, and smart card data in Beijing, we built a spatially embedded network and implemented the Infomap algorithm, a community detection method, to uncover usage patterns. Then, the Gradient Boosting Decision Tree (GBDT) model was adopted to investigate the effects of the built environment and public transit services while controlling for temporal variables. The spatiotemporal distribution shows imbalanced characteristics: about half of the total trips occur in the morning and evening rush hours and at noon. The community detection results further reveal a polycentric pattern of trip demand distribution and 120 sub-regions with significant differences in connection strength and scale. The results of the GBDT model indicate that factors including subway ridership, bus ridership, hour of day, residence density, and office density have considerable impacts on trip demand, contributing about 62.6% of the total influence. These factors also exhibit complex nonlinear relationships with dockless bike sharing usage. The effect range of each factor was identified, indicating that rebalancing schemes could be adapted according to spatial location. These findings may help planners and policymakers determine a reasonable scale of bike deployment and improve the efficiency of redistribution in local regions while reducing rebalancing costs.
Referring image segmentation identifies object masks from images with the guidance of input natural language expressions. Nowadays, many remarkable cross‐modal decoders are devoted to this task. However, these models face two main challenges. One is that they usually fail to extract fine‐grained boundary information and gradient information from images. The other is that they usually fail to explore language associations among image pixels. In this work, a Multi‐scale Gradient balanced Central Difference Convolution (MG‐CDC) and a Graph convolutional network‐based Language and Image Fusion (GLIF) for the cross‐modal encoder, together called Graph‐RefSeg, are designed. Specifically, in the shallow layers of the encoder, the MG‐CDC captures comprehensive fine‐grained image features. It enhances the perception of target boundaries and provides effective guidance for deeper encoding layers. In each encoder layer, the GLIF is used for cross‐modal fusion. It explores the correlation between every pixel and its corresponding language vectors with a graph neural network. Since the encoder achieves robust cross‐modal alignment and context mining, a lightweight decoder can be used for segmentation prediction. Extensive experiments show that the proposed Graph‐RefSeg outperforms state‐of‐the‐art methods on three public datasets. Code and models will be made publicly available at https://github.com/ZYQ111/Graph_refseg.
Crowd counting provides an important foundation for public security and urban management. Due to the existence of small targets and large density variations in crowd images, crowd counting is a challenging task. Mainstream methods usually apply convolutional neural networks (CNNs) to regress a density map, which requires annotations of individual persons and counts. Weakly-supervised methods can avoid detailed labeling and only require counts as image annotations, but existing methods fail to achieve satisfactory performance because a global receptive field and multi-level information are usually ignored. We propose a weakly-supervised method, DTCC, which effectively combines multi-level dilated convolution and transformer methods to realize end-to-end crowd counting. Its main components are a recursive Swin transformer and a multi-level dilated convolution regression head. The recursive Swin transformer combines a pyramid vision transformer with a fine-tuned recursive pyramid structure to capture deep multi-level crowd features, including global features. The multi-level dilated convolution regression head includes multi-level dilated convolution and a linear regression head for the feature extraction module. This module can capture low- and high-level features simultaneously to enlarge the receptive field. In addition, two regression head fusion mechanisms realize dynamic and mean fusion counting. Experiments on four well-known benchmark crowd counting datasets (UCF_CC_50, ShanghaiTech, UCF_QNRF, and JHU-Crowd++) show that DTCC achieves results superior to other weakly-supervised methods and comparable to fully-supervised methods.
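The receptive-field enlargement that multi-level dilated convolution provides can be demonstrated in one dimension. This is a generic sketch of dilated convolution, not DTCC's regression head: the kernel, input, and dilation schedule (1, 2, 4) are assumptions for illustration.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """1-D dilated convolution (valid padding): kernel taps are `dilation` apart."""
    k = len(w)
    span = (k - 1) * dilation + 1  # receptive field of this single layer
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ])

x = np.arange(16, dtype=float)
w = np.array([1.0, 1.0, 1.0])  # kernel size 3

# Stacking dilations 1, 2, 4 grows the effective receptive field to
# 1 + 2*(1 + 2 + 4) = 15 samples while the parameter count stays at 3 per layer.
y = x
for d in (1, 2, 4):
    y = dilated_conv1d(y, w, dilation=d)
print(len(y))  # 2 outputs remain; each one has seen 15 input samples
```

The same exponential growth of the receptive field with a fixed kernel size is what lets a dilated regression head mix low- and high-level context cheaply.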
Benefiting from good physical interpretability and low computational complexity, non‐negative matrix factorization (NMF) has attracted wide attention in data representation learning tasks. Some graph‐based NMF approaches make the learned representation encode the topological structure through a local graph Laplacian regularizer, which improves the discriminant ability of the data representation. However, the performance of graph‐based NMF methods depends heavily on the quality of the predefined graph, and the complexity of such models is high. Here, a globality constrained adaptive graph regularized non‐negative matrix factorization for data representation (GCAG‐NMF) model is proposed, which not only uses the self‐representation characteristics of the data to learn an adaptive graph that describes the sample relationships more accurately, but also introduces a graph factorization technique to reduce the complexity of the model and improve the discriminative ability of the data representation. Then, an iterative optimization strategy with low complexity and a strict convergence guarantee is developed to optimize the objective function. Experimental results on several databases demonstrate the effectiveness of the proposed model.
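The base factorization that graph-regularized variants such as GCAG-NMF build on can be sketched with the classic Lee-Seung multiplicative updates. This shows plain NMF only; the adaptive-graph and globality terms of the proposed model are omitted, and the data are synthetic.

```python
import numpy as np

def nmf(X, r, n_iter=500, seed=0):
    """Basic NMF via Lee-Seung multiplicative updates: X ~ W @ H, all entries >= 0.

    Multiplicative updates keep W and H non-negative by construction and
    monotonically decrease the Frobenius reconstruction error.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    eps = 1e-10  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical exactly rank-3 non-negative data matrix.
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 30))
W, H = nmf(X, r=3)
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative error: {err:.4f}")
```

A graph-regularized variant would add extra terms to these update rules so that samples close in the (learned) graph receive similar columns of H.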
As the core technology of deep learning, convolutional neural networks have been widely applied in a variety of computer vision tasks and have achieved state-of-the-art performance. However, it is difficult and inefficient for them to deal with high-dimensional image signals due to the dramatic increase in training parameters. In this paper, we present a lightweight and efficient MS-Net for multi-dimensional (MD) image processing, which provides a promising way to handle MD images, especially on devices with limited computational capacity. It takes advantage of a series of one-dimensional convolution kernels and introduces a separable structure into the ConvNet throughout the learning process to handle MD image signals. Meanwhile, multiple group convolutions with kernel size 1 × 1 are used to extract channel information. The information of each dimension and channel is then fused by a fusion module to extract complete image features. Thus, the proposed MS-Net significantly reduces training complexity, parameters, and memory cost. MS-Net is evaluated on the 2D benchmarks CIFAR-10 and CIFAR-100 and the 3D benchmark KTH. Extensive experimental results show that MS-Net achieves competitive performance with greatly reduced computational and memory cost compared with state-of-the-art ConvNet models.
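The parameter saving from separable 1-D kernels plus 1 × 1 channel convolutions can be seen with simple counting. The layer sizes below are illustrative assumptions, not MS-Net's exact architecture, and the decomposition shown (per-channel 1-D kernels followed by pointwise channel mixing) is one common separable scheme.

```python
# Parameter-count sketch of the separable idea: replace one dense k x k x k
# 3-D convolution with three length-k 1-D kernels applied per channel, plus
# a 1x1 convolution to fuse channel information.
c_in, c_out, k = 64, 64, 3

dense_3d = c_in * c_out * k ** 3   # a full 3-D kernel per in/out channel pair
separable_1d = c_in * 3 * k        # three 1-D kernels (one per axis) per channel
pointwise = c_in * c_out           # 1x1 convolution for channel fusion

ratio = dense_3d / (separable_1d + pointwise)
print(dense_3d, separable_1d + pointwise)  # 110592 vs 4672
print(f"~{ratio:.1f}x fewer parameters")   # ~23.7x
```

The gap widens further as the kernel size or the number of spatial dimensions grows, which is why the approach suits devices with limited computational capacity.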
Spatiotemporal traffic data exhibit a multi‐granular low‐rank structure due to their periodicity along different timelines. Traditional low-rank data completion methods fail to characterize such properties and produce unsatisfactory results for data imputation. In this paper, a tensorial weighted Schatten‐p norm minimization (TWSN) is proposed for spatiotemporal traffic data imputation. TWSN consists of an approximation term and a low‐rank regularization term over the recovered tensor data, where the latter combines the weighted Schatten‐p norms of the matrix unfoldings of each mode of the tensor. For each mode, TWSN uses a selection scheme for the mode‐wise weights to capture the different properties of the singular values of that mode. Overall, TWSN not only strikes a balance between the rank function and the nuclear norm but also captures the anisotropic correlation of the singular values of each mode of the tensor. TWSN is evaluated on four real‐world datasets with different sampling intervals (2, 5, and 10 min), and its performance is compared with several state‐of‐the‐art methods. The experimental results show that TWSN outperforms the other methods under various data-missing scenarios.
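The building block of the regularizer, the weighted Schatten-p norm of a matrix, is easy to compute from a singular value decomposition. This is a sketch of the norm itself as applied to one mode's unfolding, not of the full TWSN optimization; the test matrix and weight values are arbitrary.

```python
import numpy as np

def weighted_schatten_p(M, weights, p):
    """Weighted Schatten-p norm: (sum_i w_i * sigma_i**p) ** (1/p),
    where sigma_i are the singular values of M in descending order.
    With p = 1 and unit weights it reduces to the nuclear norm."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.sum(weights * s ** p) ** (1.0 / p)

M = np.diag([3.0, 2.0, 1.0])

# Unit weights, p = 1: the plain nuclear norm, 3 + 2 + 1 = 6.
nuclear = weighted_schatten_p(M, np.ones(3), p=1.0)
print(nuclear)

# p < 1 pushes the penalty toward the rank function, and down-weighting the
# leading singular values (as a mode-wise weight selection scheme can do)
# penalizes the dominant structure less during completion.
wsp = weighted_schatten_p(M, np.array([0.2, 0.5, 1.0]), p=0.5)
```

Applying this norm to each mode's unfolding with its own weights is what gives the regularizer its anisotropic, per-mode behavior.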
Traffic forecasting is an important part of realising intelligent traffic management, helping traffic controllers and travellers make effective decisions. However, traffic forecasting accuracy is often affected by missing traffic data due to hardware and software failures. Therefore, accurate prediction based on incomplete traffic data is an important problem as well as a challenge. Although many approaches recover the missing values before prediction, the errors from the data‐filling step are likely to introduce additional bias into the prediction result. Moreover, this two-step tactic makes it difficult to guarantee timeliness and may impede real‐time prediction. In this case, a traffic forecasting model is proposed that directly predicts traffic data with missing values. This model develops a tensor-formed dynamic mode decomposition, recording the dynamic information of traffic data in a state transition tensor. In addition, the model takes the low-rank property of the dynamic tensor and the similarity of temporal variation trends into consideration. To verify the effectiveness and robustness of the proposed model, experiments were performed on two real‐world time series datasets. The results demonstrate that the model achieves better forecasting performance than other baseline approaches under the impact of missing data.
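The core of dynamic mode decomposition, fitting a linear state-transition operator from snapshot pairs and rolling it forward, can be sketched in matrix form. This is standard DMD on complete data, not the paper's tensor-formed, missing-data-aware variant; the two-sensor "traffic" series is synthetic.

```python
import numpy as np

def dmd_predict(X, steps):
    """Dynamic mode decomposition forecast on a snapshot matrix X
    (one column per time step): fit A with X2 ~ A @ X1 in the least-squares
    sense, then iterate A forward from the last observed snapshot."""
    X1, X2 = X[:, :-1], X[:, 1:]
    A = X2 @ np.linalg.pinv(X1)  # one-step state-transition operator
    preds = []
    x = X[:, -1]
    for _ in range(steps):
        x = A @ x
        preds.append(x)
    return np.column_stack(preds)

# Hypothetical traffic readings from two sensors following a damped
# oscillation, which is exactly linear in the two-dimensional state.
t = np.arange(30)
X = np.vstack([np.cos(0.3 * t), np.sin(0.3 * t)]) * 0.98 ** t

future = dmd_predict(X[:, :25], steps=5)
err = np.abs(future - X[:, 25:30]).max()
print(f"max prediction error: {err:.2e}")
```

The proposed model generalizes this idea by storing the transition information in a tensor and exploiting its low-rank structure, so that forecasting remains possible when entries of X are missing.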