As the first step of machine-learning-based protein structure and function prediction, amino acid encoding plays a fundamental role in the final success of those methods. Unlike protein sequence encoding, amino acid encoding can be used in both residue-level and sequence-level prediction of protein properties by combining it with different algorithms. However, it has not attracted enough attention in the past decades, and there are no comprehensive reviews or assessments of encoding methods so far. In this article, we make a systematic classification and propose a comprehensive review and assessment of various amino acid encoding methods. These methods are grouped into five categories according to their information sources and information extraction methodologies: binary encoding, physicochemical property encoding, evolution-based encoding, structure-based encoding, and machine-learning encoding. Then, 16 representative methods from the five categories are selected and compared on protein secondary structure prediction and protein fold recognition tasks using large-scale benchmark datasets. The results show that the evolution-based, position-dependent encoding method PSSM achieves the best performance, and the structure-based and machine-learning encoding methods also show potential for further application; the neural-network-based distributed representation of amino acids in particular may bring new light to this area. We hope that this review and assessment will be useful for future studies of amino acid encoding.
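As a minimal illustration of the simplest of the five categories, binary (one-hot) encoding represents each residue as an indicator vector over the 20-letter amino acid alphabet. The alphabet ordering and the example sequence below are illustrative assumptions, not taken from the benchmark datasets in the article:

```python
# Sketch of binary (one-hot) amino acid encoding, the first of the five
# categories reviewed above. Alphabet order is an arbitrary choice here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_encode(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        vec = [0] * len(AMINO_ACIDS)
        vec[index[residue]] = 1
        encoded.append(vec)
    return encoded

features = one_hot_encode("MKV")  # 3 residues -> 3 x 20 matrix
```

Position-dependent encodings such as PSSM replace each indicator vector with a column of evolutionary substitution scores, which is what gives them their advantage in the comparison above.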
Along with the development of information technologies such as the mobile Internet, information acquisition, cloud computing and big data, traditional knowledge engineering and knowledge-based software engineering have undergone fundamental changes in which the network plays an increasingly important role. Within this context, new methodologies and technical tools are required for network-based knowledge representation, knowledge services and knowledge engineering. Obviously, the term "network" has different meanings in different scenarios. Meanwhile, breakthroughs in several bottleneck problems of complex networks are promoting the development of these new methodologies and technical tools. This paper first reviews some recent advances in complex networks and then, in conjunction with knowledge graphs, proposes a framework of networked knowledge that models knowledge and its relationships from the perspective of complex networks. Given the unique advantages of deep learning in acquiring and processing knowledge, this paper reviews its development and emphasizes the role it has played in the development of knowledge engineering. Finally, some challenges and future trends are discussed.
In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Researchers have found that predicted residue-residue contacts can effectively constrain the conformational search space, which is significant for de novo protein structure prediction. Over the last few decades, various methods have been developed to predict residue-residue contacts; in particular, significant performance has been achieved by fusion methods in recent years. In this work, a novel fusion method based on a ranking strategy is proposed to predict contacts. Unlike traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated-mutation methods and ensemble machine-learning classifiers; the proposed method then uses a learning-to-rank algorithm to predict the contact probability of each residue pair.
First, we perform two benchmark tests for the proposed fusion method (RRCRank) on the CASP11 and CASP12 datasets. The results show that RRCRank outperforms other well-developed methods, especially for medium- and short-range contacts. Second, to verify the superiority of the ranking strategy, we predict contacts using traditional regression and classification strategies based on the same features as the ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for all three contact types, in particular for long-range contacts. Third, RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that RRCRank achieves comparable prediction precision and outperforms three of the methods on most assessment metrics.
A learning-to-rank algorithm is introduced to develop a novel rank-based method for protein residue-residue contact prediction, which achieves state-of-the-art performance in extensive assessments.
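Because the method outputs a ranked list of residue pairs rather than hard labels, it is naturally assessed with top-k precision over the highest-ranked pairs (CASP-style evaluations use cutoffs such as top L/5, where L is the sequence length). The pairs, scores, and true contacts below are toy values, not RRCRank outputs:

```python
# Hedged sketch: scoring a ranked contact list with top-k precision,
# the kind of assessment metric used in CASP contact evaluations.

def top_k_precision(scored_pairs, true_contacts, k):
    """Precision of the k highest-scoring residue pairs (i, j, score)."""
    ranked = sorted(scored_pairs, key=lambda p: p[2], reverse=True)[:k]
    hits = sum(1 for i, j, _ in ranked if (i, j) in true_contacts)
    return hits / k

pairs = [(1, 9, 0.9), (2, 8, 0.7), (3, 30, 0.4), (4, 40, 0.2)]
truth = {(1, 9), (3, 30)}
p = top_k_precision(pairs, truth, k=2)  # top-2 holds one true contact -> 0.5
```

This also illustrates why a ranking loss can fit the task better than regression or classification: only the ordering of pairs matters to the metric, not the absolute scores.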
Deep metric learning (DML) has achieved great results on visual understanding tasks by seamlessly integrating conventional metric learning with deep neural networks. Existing DML methods focus on designing pair-based distance losses that decrease intra-class distance while increasing inter-class distance. However, these methods fail to preserve the geometric structure of the data in the embedding space, which leads to spatial-structure shift across mini-batches and may slow down the convergence of embedding learning. To alleviate these issues, assuming that the input data is embedded in a lower-dimensional sub-manifold, we propose a novel deep Riemannian metric learning (DRML) framework that exploits non-Euclidean geometric structural information. Since the curvature of the data measures how much the Riemannian (non-Euclidean) metric deviates from the Euclidean metric, we leverage geometric flow, a geometric evolution equation, to characterize the relation between the Riemannian metric and its curvature. DRML not only regularizes the local neighborhood connections of the embeddings at the hidden layer but also adapts the embeddings to preserve the geometric structure of the data. On several benchmark datasets, DRML outperforms all existing methods, demonstrating its effectiveness.
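For context, the pair-based distance loss that existing DML methods build on can be sketched in its classic contrastive form: pull same-class pairs together, push different-class pairs beyond a margin. The embeddings and margin below are toy assumptions; DRML's geometric-flow regularizer is not reproduced here.

```python
# Illustrative contrastive pair loss, the baseline that DRML improves on.
import math

def contrastive_loss(x, y, same_class, margin=1.0):
    """Pair-based loss on two embedding vectors x and y."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    if same_class:
        return d ** 2                      # decrease intra-class distance
    return max(0.0, margin - d) ** 2       # increase inter-class distance

loss_pos = contrastive_loss([0.0, 0.0], [0.3, 0.4], same_class=True)
loss_neg = contrastive_loss([0.0, 0.0], [0.3, 0.4], same_class=False)
```

The abstract's point is that such losses constrain only pairwise distances, not the manifold structure of the whole embedding, which is what the curvature-based regularization targets.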
Knowledge Engineering with Big Data. Wu, Xindong; Chen, Huanhuan; Wu, Gongqing; … IEEE Intelligent Systems, Sept.–Oct. 2015, Volume 30, Issue 5. Journal article, peer reviewed.
In the era of big data, knowledge engineering faces fundamental challenges induced by fragmented knowledge from heterogeneous, autonomous sources with complex and evolving relationships. The knowledge representation, acquisition, and inference techniques developed in the 1970s and 1980s, driven by the research and development of expert systems, must be updated to cope with both fragmented knowledge from multiple sources in the big data revolution and in-depth knowledge from domain experts. This article presents BigKE, a knowledge engineering framework that handles fragmented knowledge modeling and online learning from multiple information sources, nonlinear fusion of fragmented knowledge, and automated demand-driven knowledge navigation.
Quantum-dot cellular automata (QCA) have been widely considered as a replacement candidate for complementary metal-oxide-semiconductor (CMOS) technology. The fundamental logic device in QCA is the majority gate. In this paper, we propose an efficient methodology for majority logic synthesis of arbitrary Boolean functions. We prove that our method provides a minimal majority expression and an optimal QCA layout for any given three-variable Boolean function. To obtain high-quality decomposed Boolean networks, we introduce a new decomposition scheme that can decompose all Boolean networks efficiently. Furthermore, our method removes all the redundancies produced in the process of converting a decomposed network into a majority network, whereas existing methods do not consider these redundancies. We have built a majority logic synthesis tool based on our method and several existing logic synthesis tools. Experiments on 40 multiple-output benchmarks indicate that, compared with existing methods, our method optimizes 37 of the benchmarks, with reductions of up to 31.6%, 78.2%, 75.5%, and 83.3% in level count, gate count, gate-input count, and inverter count (averages of 4.7%, 14.5%, 13.3%, and 26.4%), respectively. We have also implemented QCA layouts for 10 benchmarks using our method. The results indicate reductions of up to 33.3%, 76.7%, and 75.5% in delay, cell count, and area compared with existing methods (averages of 8.1%, 28.9%, and 29.0%), respectively.
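The three-input majority function at the heart of QCA logic is MAJ(a, b, c) = ab + bc + ca; fixing one input to a constant yields AND or OR, which is why arbitrary Boolean functions can be synthesized from majority gates plus inverters. A minimal sketch:

```python
# Three-input majority function, the fundamental QCA logic primitive.
def maj(a, b, c):
    """MAJ(a, b, c) = ab + bc + ca over bits {0, 1}."""
    return (a & b) | (b & c) | (c & a)

# Fixing one input to a constant recovers the basic two-input gates.
AND = lambda a, b: maj(a, b, 0)
OR = lambda a, b: maj(a, b, 1)
```

Majority logic synthesis, as described above, searches for a minimal expression of a target function in terms of nested `maj` calls and inversions.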
Starting in July 2016, the Ministry of Science and Technology of China, along with several other national agencies, has sponsored a 54-month, 45-million-RMB (Chinese Yuan) project on knowledge engineering with Big Data (www.bigke.org), funding 15 top research and development institutions to study the fundamental theory and the applications of BigKE, a big-data knowledge engineering framework that handles fragmented knowledge modeling and online learning from multiple information sources, nonlinear fusion of fragmented knowledge, and automated demand-driven knowledge navigation. The project seeks to provide petabyte-scale data and knowledge services in identified application domains. In this paper, we discuss our BigKE framework and present a novel application scenario for BigKE services.
After a short introduction to the concepts of knowware, knowware engineering and knowledge middleware, this paper proposes the study of software/knowware co-engineering. Different from the traditional software engineering process, it is a mixed process involving both software engineering and knowware engineering issues. The technical subtleties of such a mixed process are discussed, and guidelines for building models of it are proposed. The process involves three parallel lines of developing system components of different types; its key issues are how to guarantee the correctness and appropriateness of system composition and decomposition. The ladder principle, a modification of the waterfall model, and the tower principle, a modification of the fountain model, are proposed. We also study the possibility of equipping the co-engineering process with a formal semantics. The core problem in establishing such a theory is giving a formal semantics to an open knowledge source, and we have found a suitable tool for this purpose: co-algebra. We also give a preliminary delineation of a co-algebraic semantics for a typical example of an open knowledge source, the knowledge distributed on the World Wide Web.
The remarkable development of communication technology is enabling intelligent driving systems by providing V2X networks to exchange and process transportation data. However, security, a fundamental requirement, is still lacking in many aspects of current V2X networks. Low latency, a critical requirement for V2X networks, restricts the use of traditional security functions, since security-related operations such as Public Key Infrastructure (PKI) systems also introduce latency. In this article, we propose an efficient scheme that intelligently distributes keys for authentication in V2X networks. The general design is to distribute key pairs that are valid according to location information for vehicles and RoadSide Units (RSUs). Based on this location-dependent authentication scheme, keys can be pre-distributed according to vehicles' future locations. We further propose using a Recurrent Neural Network (RNN) to predict future routes and locations, so that key requests can be initiated from the vehicle side in advance. The key idea is to provide an intelligent and efficient key distribution protocol for V2X networks. Experimental results demonstrate the efficiency of our proposal compared with an existing solution.
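The pre-distribution idea can be sketched as a lookup keyed by road region: each region maps to a key valid only there, and a vehicle whose route has been predicted (e.g., by the RNN) fetches the keys for upcoming regions before arriving. The region names, key table, and the stand-in route predictor below are all hypothetical, not part of the article's protocol.

```python
# Toy sketch of location-keyed pre-distribution for V2X authentication.
# Region names and keys are illustrative placeholders.
REGION_KEYS = {"region_A": "key_A", "region_B": "key_B", "region_C": "key_C"}

def predicted_route(current_region):
    """Stand-in for the RNN route predictor described in the article."""
    order = ["region_A", "region_B", "region_C"]
    i = order.index(current_region)
    return order[i + 1:]

def prefetch_keys(current_region):
    """Fetch keys for predicted future regions before the vehicle arrives."""
    return {r: REGION_KEYS[r] for r in predicted_route(current_region)}

keys = prefetch_keys("region_A")  # keys for region_B and region_C
```

The latency win comes from moving key acquisition off the critical authentication path: by the time the vehicle reaches a region, the matching key is already cached locally.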
Automatic text summarization (ATS) has recently achieved impressive performance thanks to advances in deep learning and the availability of large-scale corpora. However, there is still no guarantee that generated summaries are grammatical and concise and convey all the salient information of the original documents. To make summarization results more faithful, this paper presents an unsupervised approach that combines rhetorical structure theory, deep neural models, and domain knowledge for ATS. The architecture contains three main components: domain knowledge base construction based on representation learning, an attentional encoder-decoder model for rhetorical parsing, and a subroutine-based model for text summarization. Domain knowledge can be used effectively for unsupervised rhetorical parsing, so that a rhetorical structure tree can be derived for each document. In the unsupervised rhetorical parsing module, the idea of translation is adopted to alleviate the problem of data scarcity. The subroutine-based summarization model depends purely on the derived rhetorical structure trees and can generate content-balanced results. To evaluate summaries without a gold standard, we propose an unsupervised evaluation metric whose hyper-parameters are tuned by supervised learning. Experimental results on a large-scale Chinese dataset show that our approach obtains performance comparable to existing methods.
• Knowledge-guided unsupervised rhetorical parsing.
• A domain knowledge base construction approach based on representation learning.
• Subroutine-based text summarization model.
• An unsupervised evaluation metric for text summarization.
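To make the "purely tree-dependent" summarization idea concrete, a rhetorical structure tree distinguishes nucleus spans (central content) from satellite spans (supporting content), and a summarizer can recursively keep only nuclei. The tiny tree shape below is an illustrative assumption; the article's parser is neural and its subroutines are more elaborate.

```python
# Hedged sketch of subroutine-style summarization over an RST-like tree:
# recursively keep nucleus spans, drop satellites.

def summarize(node):
    """Collect the text of nucleus descendants of an RST-like tree."""
    if "text" in node:                      # leaf elementary discourse unit
        return [node["text"]]
    result = []
    for child in node["children"]:
        if child.get("role") == "nucleus":
            result.extend(summarize(child))
    return result

tree = {"children": [
    {"role": "nucleus", "text": "The model is unsupervised."},
    {"role": "satellite", "text": "For example, it needs no labels."},
]}
summary = summarize(tree)  # keeps only the nucleus EDU
```

Because the selection walks the whole tree rather than scoring sentences independently, it tends to keep one central unit per rhetorical relation, which is one reading of the "content-balanced" claim above.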