The interactions between proteins and aptamers are prevalent in organisms and play an important role in various life activities. Thanks to the rapid accumulation of protein-aptamer interaction data, it is both necessary and feasible to construct an accurate and effective computational model to predict aptamers binding to proteins of interest and to predict protein-aptamer interactions, which would benefit our understanding of the mechanisms of protein-aptamer interactions and improve aptamer-based therapies. In this study, a novel web server named PPAI is developed to predict aptamers and protein-aptamer interactions using key sequence features of proteins/aptamers and a machine learning framework that integrates AdaBoost and random forest. A new method for extracting key sequence features of both proteins and aptamers is presented: the protein features are extracted from amino acid composition, pseudo-amino acid composition, grouped amino acid composition, C/T/D composition and sequence-order-coupling number, while the aptamer features are extracted from nucleotide composition, pseudo-nucleotide composition (PseKNC) and the normalized Moreau-Broto autocorrelation coefficient. On the basis of these feature sets, with the samples balanced by the SMOTE algorithm, we validate the performance of PPAI on an independent test set. The results demonstrate that the Area Under the Curve (AUC) is 0.907 for aptamer prediction, while the AUC reaches 0.871 for prediction of protein-aptamer interactions. These results indicate that PPAI can query aptamers and proteins, predict aptamers, and predict protein-aptamer interactions in batch mode precisely and efficiently, making it a novel bioinformatics tool for research on protein-aptamer interactions. The PPAI web server is freely available at http://39.96.85.9/PPAI.
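To make the feature-extraction step concrete, here is a minimal sketch of the simplest of the protein feature groups named above, plain amino acid composition (AAC). The function name and example sequence are our own illustration, not part of the PPAI code.

```python
# 20 standard amino acids, in conventional alphabetical one-letter order
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return the 20-dimensional AAC vector: frequency of each residue."""
    sequence = sequence.upper()
    length = len(sequence)
    return [sequence.count(aa) / length for aa in AMINO_ACIDS]

features = amino_acid_composition("MKTAYIAKQR")  # toy sequence
print(len(features))  # 20-dimensional feature vector
```

The other feature groups (PseAAC, C/T/D, sequence-order-coupling number) extend this idea by encoding order and physicochemical information on top of composition.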
Modeling complex spatial and temporal dependencies in multivariate time series data is crucial for traffic forecasting. Graph convolutional networks have proved effective in predicting multivariate time series. Although a predefined graph structure can help the model converge to good results quickly, its static nature also limits further improvement of the model. In addition, current methods may fail to converge on some datasets whose graph structure is difficult to learn. Motivated by this, we propose a novel model named Dynamic Correlation Graph Convolutional Network (DCGCN) in this paper. The model constructs adjacency matrices from the input data using a correlation coefficient; dynamic correlation graph convolution is then used to capture spatial dependencies, while gated temporal convolution is used to model temporal dependencies. Finally, we performed extensive experiments to evaluate our proposed method against ten well-recognized baseline methods on two original and four public datasets.
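The adjacency-construction idea can be sketched as follows: build a graph from the input window itself via a correlation coefficient. This is a hedged illustration of the general technique; the variable names, threshold, and exact construction are assumptions, and the actual DCGCN formulation may differ in detail.

```python
import numpy as np

def correlation_adjacency(x, threshold=0.5):
    """x: (timesteps, nodes) window of a multivariate series.
    Returns a (nodes, nodes) adjacency built from Pearson correlation."""
    corr = np.corrcoef(x.T)       # pairwise Pearson correlations
    adj = np.abs(corr)            # strength of linear dependence
    adj[adj < threshold] = 0.0    # sparsify weak links
    np.fill_diagonal(adj, 0.0)    # no self-loops
    return adj

rng = np.random.default_rng(0)
base = rng.normal(size=(64, 1))
x = np.hstack([base + 0.01 * rng.normal(size=(64, 1)),  # node 0
               base + 0.01 * rng.normal(size=(64, 1)),  # node 1: tied to node 0
               rng.normal(size=(64, 1))])               # node 2: independent
adj = correlation_adjacency(x)
print(adj.shape)        # (3, 3)
print(adj[0, 1] > 0.9)  # True: strongly correlated nodes are connected
```

Because the adjacency is recomputed from each input window, the graph can change over time, which is what distinguishes this approach from a fixed predefined graph.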
Acute myocardial infarction (AMI) is a common cause of mortality in developed countries. The feasibility of whole-genome gene expression analysis to identify outcome-related genes and dysregulated pathways remains unknown. Molecular markers such as BNP, CRP and other serum inflammatory markers have received attention in this regard. However, these biomarkers also exhibit elevated levels in patients with thyroid disease, renal failure and congestive heart failure. In this study, three microarray data sets (GSE66360, GSE48060, GSE29532), comprising 99, 52 and 55 samples, respectively, were collected from GEO. Weighted gene co-expression network analysis (WGCNA) was performed to obtain a classifier composed of related genes that best characterize AMI.
Here, this study obtained three microarray data sets (GSE66360, GSE48060, GSE29532) of AMI blood samples, comprising 99, 52 and 24 samples, respectively. In all, 4672, 3185 and 3660 genes were identified in the GSE66360, GSE48060 and GSE60993 modules, respectively. We performed WGCNA, GO and KEGG pathway enrichment analyses on these three data sets, finding functional enrichment of the differentially expressed genes in inflammation and immune response. Transcriptome analyses were performed on AMI patients at four time points, compared with CAD patients with no history of MI, to determine gene expression profiles and their possible changes during recovery from myocardial infarction.
The results suggest that the three genes overlapping between the two modules (FGFBP2, GFOD1 and MLC1) could potentially serve as gene biomarkers for the diagnosis of AMI.
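The core step of WGCNA used above can be sketched in Python for illustration (real analyses typically use the R WGCNA package): a soft-thresholding power turns a gene-gene correlation matrix into a weighted co-expression network, a_ij = |cor(i, j)|^beta. The power beta and matrix sizes here are illustrative assumptions.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """expr: (samples, genes) expression matrix.
    Returns the weighted gene-gene adjacency |cor|^beta used by WGCNA."""
    corr = np.corrcoef(expr.T)        # gene-gene Pearson correlation
    return np.abs(corr) ** beta       # soft threshold emphasizes strong links

rng = np.random.default_rng(1)
expr = rng.normal(size=(20, 5))       # toy data: 20 samples, 5 genes
adj = soft_threshold_adjacency(expr)
print(adj.shape)  # (5, 5)
```

Modules are then obtained by hierarchical clustering of a topological-overlap measure derived from this adjacency, which is the part elided in this sketch.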
Seismic data interpolation techniques are extremely economical for removing the negative effects of insufficient spatial sampling. In recent years, self-learning methods such as machine learning (ML) and deep learning have increasingly attracted the attention of many scholars. Support vector regression (SVR), a kind of ML method, can achieve good reconstruction performance for seismic data interpolation. However, its performance is directly influenced by the kernel function, which controls the map from the input space to the feature space. In other words, a good and suitable kernel function contributes to the extraction of good features and improves the self-learning ability. In this paper, a Ricker kernel function suited to seismic data, rather than the traditional Gaussian kernel function, is introduced and applied in a new SVR-based interpolation method. Since the Ricker wavelet is widely used to simulate seismic data, a Ricker kernel function is particularly appropriate for seismic data processing. With this specialized kernel function, the input time-space vector series are trained into a better model for predicting missing seismic data. Numerical experiments on synthetic and real field data show better reconstruction ability than that of SVR based on the traditional kernel function. Furthermore, we discuss the difference between the relatively shallow ML method (SVR) and deep learning neural network methods. Beyond computer hardware, deep learning methods impose stringent requirements on the training data, in terms of both quantity and quality. From this perspective, SVR shows more flexibility in the self-learning process.
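A hedged sketch of the approach: scikit-learn's SVR accepts a callable kernel, so a Ricker-shaped ("Mexican hat") kernel can be swapped in for the Gaussian RBF. The product form, width parameter, and toy trace below are our own assumptions; the paper's exact parameterization may differ.

```python
import numpy as np
from sklearn.svm import SVR

def ricker_kernel(X, Y, sigma=1.0):
    """Ricker-style kernel: per-dimension (1 - d^2/s^2) * exp(-d^2/(2 s^2))."""
    diff2 = (X[:, None, :] - Y[None, :, :]) ** 2 / sigma ** 2
    return np.prod((1.0 - diff2) * np.exp(-diff2 / 2.0), axis=2)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 80)[:, None]
trace = np.sin(2 * np.pi * 3 * t).ravel()  # toy stand-in for a seismic trace
keep = rng.random(80) > 0.3                # simulate missing samples

model = SVR(kernel=lambda X, Y: ricker_kernel(X, Y, sigma=0.2), C=10.0)
model.fit(t[keep], trace[keep])            # train on the observed samples
reconstructed = model.predict(t)           # interpolate the full trace
print(reconstructed.shape)  # (80,)
```

Since the Ricker wavelet has a nonnegative Fourier transform, a stationary kernel of this shape is positive definite by Bochner's theorem, so it is a legitimate SVR kernel rather than an ad hoc substitution.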
The new technology of single-cell RNA sequencing (scRNA-seq) can yield valuable insights into gene expression and give critical information about the cellular composition of complex tissues. In recent years, vast numbers of scRNA-seq datasets have been generated and made publicly available, enabling researchers to train supervised machine learning models for predicting or classifying various cell-level phenotypes. This has led to the development of many new methods for analyzing scRNA-seq data. Despite the popularity of such applications, there has as yet been no systematic investigation of the performance of these supervised algorithms on scRNA-seq datasets of various sizes. In this study, 13 popular supervised machine learning algorithms for cell phenotype classification were evaluated using published real and simulated datasets with diverse cell sizes. The benchmark comprises two parts. In the first, real datasets were used to assess the computing speed and cell phenotype classification performance of popular supervised algorithms. Classification performance was evaluated using the area under the receiver operating characteristic curve, F1-score, precision, recall, and false-positive rate. In the second part, gene-selection performance was evaluated using published simulated datasets with a known list of real genes. The results showed that ElasticNet with interactions performed best for small and medium-sized datasets. The NaiveBayes classifier was found to be another appropriate method for medium-sized datasets. With large datasets, the performance of the XGBoost algorithm was found to be excellent. Ensemble algorithms were not found to be significantly superior to individual machine learning methods. Including interactions in the ElasticNet algorithm caused a significant performance improvement for small datasets.
The linear discriminant analysis algorithm was found to be the best choice when speed is critical: it is the fastest method, it scales to large sample sizes, and its performance is not much worse than that of the top performers.
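The kind of comparison described above can be sketched with scikit-learn, fitting two of the mentioned families (an elastic-net penalized logistic model and LDA) and scoring them with ROC-AUC and F1. The synthetic data, model settings, and metric choices here are illustrative assumptions, not the study's actual protocol.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Toy stand-in for a small scRNA-seq classification task
X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "ElasticNet-logistic": LogisticRegression(penalty="elasticnet",
                                              l1_ratio=0.5, solver="saga",
                                              max_iter=5000),
    "LDA": LinearDiscriminantAnalysis(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]
    print(name,
          "AUC", round(roc_auc_score(y_te, prob), 3),
          "F1", round(f1_score(y_te, model.predict(X_te)), 3))
```

In a real benchmark, each model would also be timed and run across datasets of increasing cell counts to expose the speed/accuracy trade-offs the study reports.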
Early detection of malignant pulmonary nodules is of great help in the treatment of lung cancer. Yet it is difficult to establish a general diagnostic standard because of two main characteristics of pulmonary nodules: varying sizes and irregular shapes. To address this problem effectively, an improved pulmonary nodule detection model based on deformable convolution is proposed. Specifically, by adding a branch network to obtain the offsets, the feature extraction process better fits the shape of the nodule itself. Besides, a simple but effective strategy is proposed for the size variability of pulmonary nodules, which combines multilevel information with the fusion of feature maps of different sizes. Compared with the two-dimensional convolutional neural network and other advanced techniques, our method achieves a significant improvement, reaching a mean average precision of 82.7%.
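The key deformable-convolution mechanism can be sketched in plain numpy: instead of sampling the feature map on a fixed 3x3 grid, each tap is shifted by an offset and read back with bilinear interpolation. The random offsets below are a stand-in for what the branch network would predict; real models use optimized ops such as `torchvision.ops.deform_conv2d`.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Read feat at fractional coordinates (y, x) by bilinear interpolation."""
    h, w = feat.shape
    y = np.clip(y, 0, h - 1); x = np.clip(x, 0, w - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def deformable_tap_sum(feat, cy, cx, offsets, weights):
    """One output pixel: weighted sum over a 3x3 grid shifted by offsets."""
    grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    return sum(weights[k] * bilinear_sample(feat, cy + gy + oy, cx + gx + ox)
               for k, ((gy, gx), (oy, ox)) in enumerate(zip(grid, offsets)))

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8))
offsets = rng.normal(scale=0.5, size=(9, 2))  # stand-in for learned offsets
weights = rng.normal(size=9)                  # stand-in for kernel weights
out = deformable_tap_sum(feat, 4, 4, offsets, weights)
print(round(out, 4))
```

Because the offsets are produced per position by a learned branch, the receptive field can deform toward an irregular nodule boundary rather than staying square.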
With the availability of low-cost depth-visual sensing devices, such as Microsoft Kinect, we are experiencing a growing interest in indoor environment understanding, at the core of which is semantic segmentation of RGB-D images. The latest research shows that the convolutional neural network (CNN) still dominates the image semantic segmentation field. However, the down-sampling performed during the training of CNNs leads to unclear segmentation boundaries and poor classification accuracy. To address this problem, in this paper we propose a novel end-to-end deep architecture, termed FuseCRFNet, which seamlessly incorporates a fully-connected Conditional Random Fields (CRFs) model into a depth-based CNN framework. The proposed segmentation method uses pixel-to-pixel relationships to increase the accuracy of image semantic segmentation. More importantly, we formulate the CRF as one of the layers in FuseCRFNet, so that it refines the coarse segmentation in the forward pass while passing errors back to facilitate training. The performance of FuseCRFNet is evaluated on the SUN RGB-D dataset, and the results show that the proposed algorithm is superior to existing semantic segmentation algorithms, with an accuracy improvement of at least 2%, further verifying the effectiveness of the algorithm.
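A toy one-step mean-field update of a fully-connected CRF over unary scores, in the spirit of CRF-as-a-layer refinement. This is a drastic simplification for illustration: real dense-CRF implementations use efficient Gaussian filtering and bilateral (color + position) kernels, and learn the compatibility terms, all of which are elided here.

```python
import numpy as np

def mean_field_step(unary, positions, theta=1.0, w=1.0):
    """unary: (n_pixels, n_classes) logits; positions: (n_pixels, 2).
    Returns refined per-pixel class probabilities after one update."""
    q = np.exp(unary - unary.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)              # initial softmax
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    kernel = np.exp(-d2 / (2 * theta ** 2))        # spatial affinity only
    np.fill_diagonal(kernel, 0.0)                  # exclude self-messages
    message = kernel @ q                           # aggregate neighbor beliefs
    refined = unary + w * message                  # identity compatibility (simplified)
    refined = np.exp(refined - refined.max(axis=1, keepdims=True))
    return refined / refined.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
positions = np.array([(i, j) for i in range(4) for j in range(4)], float)
unary = rng.normal(size=(16, 3))                   # toy 4x4 image, 3 classes
q = mean_field_step(unary, positions)
print(q.shape)  # (16, 3)
```

Formulating this update as a differentiable layer is what lets the CRF refinement be trained end-to-end with the CNN backbone rather than applied as post-processing.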
The recommendation model based on the knowledge graph (KG) alleviates the problem of data sparsity in recommendation to a certain extent and further improves the accuracy, diversity, and interpretability of recommendations. Therefore, knowledge graph recommendation models have become a major research topic, and how to utilize the entity and relation information in a KG fully and effectively has become the focus of research. This paper proposes a knowledge graph recommendation model based on adversarial training (ATKGRM), which dynamically and adaptively adjusts the knowledge graph aggregation weights via adversarial training to learn user and item features more reasonably. First, the generator adopts a novel long- and short-term interest model to obtain user and item features and generates a high-quality set of candidate items. Then, the discriminator evaluates candidate items by comparing the user's scores for positive items, negative items, and candidate items. Finally, experimental studies on five real-world datasets, against multiple knowledge graph recommendation models and multiple adversarial training recommendation models, demonstrate the effectiveness of our model.
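The discriminator's pairwise judgment can be sketched schematically with a standard BPR-style sigmoid comparison of scores. The dot-product scoring and random embeddings below are illustrative stand-ins, not the actual ATKGRM layers, which learn embeddings from the knowledge graph.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
user = rng.normal(size=16)                 # stand-in user embedding
pos_item, cand_item = rng.normal(size=(2, 16))

s_pos = user @ pos_item    # score of an observed (positive) item
s_cand = user @ cand_item  # score of a generator-proposed candidate
# Probability the discriminator assigns to "positive ranks above candidate";
# the generator is rewarded when its candidates make this hard to decide.
p = sigmoid(s_pos - s_cand)
print(0.0 < p < 1.0)  # True: a valid probability
```

In the adversarial loop, this probability drives opposing gradients: the discriminator sharpens its ranking while the generator learns to propose candidates that resemble true positives.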