Given the increasing importance of screen content, the High Efficiency Video Coding (HEVC) standard includes screen content coding as one of its requirements. In this paper, we demonstrate that enabling frame-level block searching in HEVC can significantly improve coding efficiency on screen content. We propose a hash-based block matching scheme for the intra block copy mode and the motion estimation process, which enables frame-level block searching in HEVC without changing the HEVC syntax. In the proposed scheme, the blocks that share the same hash value as the current block are selected as prediction candidates. Hash-based block selection is then employed to select the best candidates. To achieve the best coding efficiency, rate-distortion optimization is further employed to balance the coding cost of motion vectors against the prediction difference. Compared with HEVC, the proposed scheme achieves 21% and 37% bitrate savings under the all-intra and low-delay configurations, respectively, with a reduction in encoding time. Up to 59% bitrate saving can be achieved on sequences with large motion.
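The candidate lookup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the CRC32 hash, the function names, the toy 6x6 frame, and the exact sample comparison that filters hash collisions are all assumptions made for illustration, and the rate-distortion step is omitted.

```python
import zlib
from collections import defaultdict


def block_bytes(frame, y, x, size):
    """Serialize the size x size block with top-left corner (y, x) to bytes."""
    return bytes(frame[y + dy][x + dx] for dy in range(size) for dx in range(size))


def build_hash_table(frame, size):
    """Index every block position of a frame by the hash of its samples."""
    table = defaultdict(list)
    for y in range(len(frame) - size + 1):
        for x in range(len(frame[0]) - size + 1):
            # CRC32 is an assumed stand-in; the abstract does not name the hash.
            table[zlib.crc32(block_bytes(frame, y, x, size))].append((y, x))
    return table


def find_matches(table, reference, current, y, x, size):
    """Return reference positions whose block is identical to the current block."""
    target = block_bytes(current, y, x, size)
    candidates = table.get(zlib.crc32(target), [])
    # Compare samples as well, because different blocks can share a hash value.
    return [pos for pos in candidates if block_bytes(reference, *pos, size) == target]


# Toy 8-bit reference frame in which every sample value is distinct.
reference = [[7 * y + 13 * x for x in range(6)] for y in range(6)]

# Current frame: flat background with a copy of the reference block at (2, 3).
current = [[255] * 4 for _ in range(4)]
for dy in range(2):
    for dx in range(2):
        current[dy][dx] = reference[2 + dy][3 + dx]

table = build_hash_table(reference, 2)
matches = find_matches(table, reference, current, 0, 0, 2)
print(matches)  # [(2, 3)]
```

Because the table is built once per frame, each lookup costs one hash computation regardless of how far away the matching block lies, which is what makes a frame-level search affordable in this sketch.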
Relation prediction aims to infer the missing relations among entities in knowledge graphs. Inductive relation prediction is especially popular because it can be applied to emerging entities. Most existing approaches learn logical compositional rules or use subgraphs to predict the missing relation. Although great progress has been made in performance, current models are still suboptimal because of their limited ability to capture topological information that is critical for local relation prediction. To address this problem, we propose a novel inductive relation prediction approach called substructure-aware subgraph reasoning, which incorporates the substructure information of subgraphs into the reasoning process and thus makes relation prediction more precise. Specifically, we extract the entities and relations around the target entities to form the subgraph and then encode the structural information of nodes and edges by counting the number of certain substructures. Next, the structural information is explicitly applied to message passing for more accurate reasoning. To further improve performance, we also use the semantic correlations between relations as auxiliary information. Experimental results on three benchmark datasets show the effectiveness of the proposed approach for inductive relation prediction.
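As a rough illustration of encoding edges by substructure counts, the sketch below counts the triangles that each edge of an undirected subgraph takes part in. The abstract does not say which substructures are counted, so the choice of triangles, the undirected treatment of edges, and the name `triangle_counts` are assumptions.

```python
from collections import defaultdict


def triangle_counts(edges):
    """Map each undirected edge (u, v) to the number of triangles containing it."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    # A triangle through (u, v) exists for every neighbour shared by u and v.
    return {(u, v): len(neighbours[u] & neighbours[v]) for u, v in edges if u != v}


subgraph = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
counts = triangle_counts(subgraph)
print(counts)  # {('a', 'b'): 1, ('b', 'c'): 1, ('a', 'c'): 1, ('c', 'd'): 0}
```

Counts of this kind could then be attached to edges as extra features during message passing; how the paper does this is not specified in the abstract.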
Compared with natural images, underwater images are usually degraded by blur, scale variation, colour shift and texture distortion, which pose considerable challenges for computer vision tasks such as object detection. In this setting, generic object detection methods usually fail to achieve satisfactory performance. The main reason is considered to be that the feature representations of current methods are not discriminative enough for degraded underwater images. A novel multi‐scale feature representation and interaction network for underwater object detection is proposed, in which two core modules are designed to enhance the discriminativeness of the feature representation for underwater images. The first is the Context Integration Module, which extracts rich context information from high‐level features and is integrated with the feature pyramid network to enhance the feature representation in a multi‐scale way. The second is the Dual‐refined Attention Interaction Module, which further enhances the feature representation through attention‐based interactions between different levels of features in both the channel and spatial domains. The proposed model is evaluated on four public underwater datasets. Comparisons with state‐of‐the‐art object detection methods show that the proposed model has leading performance, which verifies that it is effective for underwater object detection. In addition, object detection experiments are conducted on the foggy Real‐world Task‐driven Testing Set (RTTS) and on the natural image dataset of pattern analysis, statistical modelling and computational learning, visual object classes (PASCAL VOC). The results show that the proposed model can be applied to the degraded RTTS dataset but fails on PASCAL VOC.
We propose an underwater object detection method. The experimental results, compared with state‐of‐the‐art object detection methods, show that the proposed model has leading performance, which verifies that it is effective for underwater object detection.
Dimension reduction for high-order tensors is a challenging problem. In conventional approaches, dimension reduction for higher-order tensors is implemented via Tucker decomposition to obtain lower-dimensional tensors. This paper introduces a probabilistic vectorial dimension reduction model for tensorial data. The model represents a tensor as a linear combination of basis tensors of the same order, and thus offers a learning approach that directly reduces a tensor to a vector. Under this expression, the projection base of the model is based on the tensor CANDECOMP/PARAFAC (CP) decomposition, and the number of free parameters in the model grows only linearly with the number of modes rather than exponentially. Bayesian inference is established via the variational Expectation Maximization (EM) approach. A criterion for setting the parameters (the factor number of the CP decomposition and the number of extracted features) is given empirically. The model outperforms several existing principal component analysis-based methods and CP decomposition on several publicly available databases in terms of classification and clustering accuracy.
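The linear-versus-exponential claim can be checked with simple counting. The sketch below compares the number of entries in unstructured basis tensors with the number of entries in rank-R CP factor matrices. It only counts basis entries under an assumed parameterization (one independent set of CP factors per extracted feature); the paper's exact model, including any noise or prior parameters, may differ, and both function names are hypothetical.

```python
from math import prod


def dense_basis_params(dims, num_features):
    """Entries needed when each basis tensor is stored without structure."""
    return num_features * prod(dims)


def cp_basis_params(dims, num_features, rank):
    """Entries needed when each basis tensor is a rank-`rank` CP model."""
    # One factor matrix of shape (dim, rank) per mode.
    return num_features * rank * sum(dims)


# Keep every mode at size 10 and add modes one at a time.
for modes in (2, 3, 4):
    dims = (10,) * modes
    print(modes, dense_basis_params(dims, 5), cp_basis_params(dims, 5, 3))
# 2 500 300
# 3 5000 450
# 4 50000 600
```

Each added mode multiplies the dense count by the mode size, while the CP count only gains one more factor matrix per basis tensor.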
Medical images exhibit multi‐granularity and high obscurity along boundaries. As representative work, the U‐Net and its variants exhibit two shortcomings in medical image segmentation: (a) they expand the range of receptive fields by applying addition or concatenation operators to features with different receptive fields, which disrupts the distribution of the essential features of objects; (b) they use downsampling or atrous convolution to characterize multi‐granular features of objects, which yields a large range of receptive fields but leads to blurred object boundaries. A Shuffling Atrous Convolutional U‐Net (SACNet) is proposed to circumvent these issues. The key component of SACNet is the Shuffling Atrous Convolution (SAC) module, which fuses different atrous convolutional layers by using a shuffle concatenate operation, so that the features from the same channel (which correspond to the same attribute of objects) are merged together. Besides the SAC modules, SACNet uses an edge‐preserving (EP) module at the fine and medium levels to enhance the boundaries of objects, and a Transformer module at the coarse level to capture the overall correlation of pixels. Experiments on three medical image segmentation tasks (abdominal organ, cardiac, and skin lesion segmentation) demonstrate that SACNet outperforms several state‐of‐the‐art methods and can be easily transplanted to other semantic segmentation tasks.
We propose a Shuffling Atrous Convolutional U‐Net (SACNet) for medical image segmentation. SACNet uses a fine‐medium‐coarse U‐Net architecture. Both the fine level and the medium level consist of a shuffling atrous convolution module and an edge‐preserving module, which are organized in parallel and fused by a residual block.
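One plausible reading of the shuffle concatenate operation is a channel interleave: instead of stacking the branches one after another, channels with the same index from each atrous branch are placed next to each other. The sketch below shows only this reordering on channel labels. The interleaving order, the branch labels, and the name `shuffle_concat` are assumptions rather than details confirmed by the abstract.

```python
def shuffle_concat(branches):
    """Interleave per-branch channel lists so equal channel indices are adjacent."""
    num_channels = len(branches[0])
    if any(len(branch) != num_channels for branch in branches):
        raise ValueError("all branches must have the same number of channels")
    # Outer loop over channel index, inner loop over branches.
    return [branch[c] for c in range(num_channels) for branch in branches]


# Three hypothetical atrous branches (dilation 1, 2, 4) with two channels each.
branches = [["d1c0", "d1c1"], ["d2c0", "d2c1"], ["d4c0", "d4c1"]]
shuffled = shuffle_concat(branches)
print(shuffled)  # ['d1c0', 'd2c0', 'd4c0', 'd1c1', 'd2c1', 'd4c1']
```

With this ordering, a grouped convolution applied afterwards would see all dilation rates of one channel in the same group, whereas a plain concatenation would group channels by branch.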
The three-dimensional human body curve-skeleton is widely used in pose estimation, skeleton animation and other fields. This paper proposes an improved ℓ1-median model that can extract the three-dimensional human body curve-skeleton. The model includes three-dimensional human body reconstruction from multi-view images, interpolation curve-skeleton extraction, ℓ1-median skeleton completion, and continuous-frame curve-skeleton optimization. Through the completion and optimization processes, the curve-skeleton we extract is smoother and more complete than those of previous methods. We conduct experiments on a multi-view human body image dataset collected with a light field acquisition system. Both quantitative and qualitative results demonstrate the effectiveness of our model.
The emerging low-rank matrix approximation (LRMA) method provides an energy-efficient scheme for data collection in wireless sensor networks (WSNs) by randomly sampling a subset of sensor nodes for data sensing. However, existing LRMA-based methods generally underutilize the spatial or temporal correlation of the sensing data, resulting in uneven energy consumption and thus a shorter network lifetime. In this paper, we propose a correlated spatio-temporal data collection method for WSNs based on LRMA. In the proposed method, both the temporal consistency and the spatial correlation of the sensing data are integrated simultaneously under a new LRMA model. Moreover, network energy consumption is considered in the node sampling procedure. We use the Gini index to measure both the spatial distribution of the selected nodes and the evenness of the network energy status, and then formulate and solve an optimization problem to achieve optimized node sampling. The proposed method is evaluated on both simulated and real wireless networks and compared with state-of-the-art methods. The experimental results show that the proposed method efficiently reduces the energy consumption of the network and prolongs the network lifetime with high data recovery accuracy and good stability.
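For reference, the Gini index of a set of non-negative values, such as the residual energy of each node, can be computed as in the sketch below. This is the standard sorted-rank formula. How the paper plugs the index into its sampling objective is not stated in the abstract, and the energy values are made up for illustration.

```python
def gini_index(values):
    """Gini index of non-negative values: 0 means perfectly even."""
    xs = sorted(values)
    if any(x < 0 for x in xs):
        raise ValueError("values must be non-negative")
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the values sorted in ascending order.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


even_energy = [5, 5, 5, 5]     # every node has the same residual energy
uneven_energy = [0, 0, 0, 10]  # one node holds all the remaining energy
print(gini_index(even_energy))    # 0.0
print(gini_index(uneven_energy))  # 0.75
```

For n values the index ranges from 0 to (n - 1) / n, so a sampling strategy that keeps it low spreads the energy drain more evenly across nodes.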
At present, underwater exploration and salvage, underwater archaeology, and other underwater operations still rely mainly on professional underwater operators. Because manual underwater operation suffers from a small exploration scope, a poor working environment, and low work efficiency, using robots to replace manual underwater operation is the future trend in these fields. Most current underwater robots are remotely controlled by human operators and lack intelligent detection and autonomous grasping systems. In this paper, a grasping robot equipped with an AI computing platform is developed to enable the autonomous grasping of underwater targets by using stereo vision technology. To address the difficulty of detecting underwater targets that are small or occluded, this paper proposes Cascade DetNet, which can improve recognition accuracy. The experimental results show that the proposed method achieves the best performance on the URPC dataset compared with several mainstream methods. In addition, we carry out autonomous grasping of seafood in a real marine environment to verify the autonomous grasping performance of the underwater vehicle.
Real‐time passenger‐flow anomaly detection at all metro stations is a critical task for advanced Internet management. Robust principal component analysis (RPCA) based methods have often been employed for anomaly detection in multivariate time series data. However, they ignore the spatio‐temporal features of regular passenger‐flow patterns, which reduces the accuracy of anomaly detection. In this paper, an RT‐STRPCA model that integrates temporal periodicity and spatial similarity is proposed to address this issue. The RT‐STRPCA model detects anomalies by decomposing the observation data into a normal component and an anomaly component. The spatio‐temporal constraints are taken into account to extract anomalies more accurately. Real‐time anomaly detection is realized with a sliding window. The performance of the RT‐STRPCA model is evaluated on synthetic and real‐world datasets. The experimental results on synthetic datasets demonstrate that the method achieves more accurate real‐time anomaly detection than baseline approaches, and the results on real‐world datasets verify the utility and effectiveness of the proposed method.
The Matrix-variate Restricted Boltzmann Machine (MVRBM), a variant of the Restricted Boltzmann Machine, has demonstrated excellent capacity for modelling matrix variables. However, MVRBM is still an unsupervised generative model and is usually used for feature extraction or for the initialization of deep neural networks. When MVRBM is used for classification, additional classifiers must be added. To make MVRBM itself supervised, in this paper we propose improved MVRBMs for classification, which can classify 2D data directly and accurately. To this end, on the one hand, a classification constraint is added to MVRBM to obtain the Matrix-variate Restricted Boltzmann Machine Classification Model (ClassMVRBM). On the other hand, a Fisher discriminant analysis criterion for matrix-style variables is proposed and applied to the hidden variable, so that the extracted features are more discriminative and the classification performance of ClassMVRBM is enhanced. We call the resulting model the Matrix-variate Restricted Boltzmann Machine Classification Model with Fisher discriminant analysis (ClassMVRBM-MVFDA). Experimental results on several publicly available databases demonstrate the superiority of the proposed models. Among them, the image classification accuracy of ClassMVRBM is higher than that of the conventional unsupervised RBM, its variants, and the supervised Restricted Boltzmann Machine Classification Model (ClassRBM) for vector variables. In particular, the image classification accuracy of the proposed ClassMVRBM-MVFDA is higher than that of supervised ClassMVRBM and vectorial RBM-FDA.