Essential genes encode functions that play a vital role in the life activities of organisms, encompassing growth, development, immune system functioning, and cell structure maintenance. Conventional experimental techniques for identifying essential genes are resource-intensive and time-consuming, and the accuracy of current machine learning models needs further enhancement. Therefore, it is crucial to develop a robust computational model to accurately predict essential genes.
In this study, we introduce GCNN-SFM, a computational model for identifying essential genes in organisms, based on graph convolutional neural networks (GCNN). GCNN-SFM integrates a graph convolutional layer, a convolutional layer, and a fully connected layer to model and extract features from the sequences of essential genes. Initially, the gene sequence is transformed into a feature map using coding techniques. Subsequently, a multi-layer GCN is employed to perform graph convolution operations, effectively capturing both local and global features of the gene sequence. Further feature extraction is performed, followed by integrating convolution and fully connected layers to generate prediction results for essential genes. Gradient descent is used to iteratively minimize the cross-entropy loss, thereby enhancing the accuracy of the prediction results. Meanwhile, model parameters are tuned to determine the combination that yields the best prediction performance during training.
Experimental evaluation demonstrates that GCNN-SFM surpasses various advanced essential gene prediction models and achieves an average accuracy of 94.53%. This study presents a novel and effective approach for identifying essential genes, which has significant implications for biology and genomics research.
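The encode-then-convolve step described above can be illustrated with a minimal sketch. It is not the GCNN-SFM implementation: the one-hot coding, the chain graph linking neighbouring sequence positions, the layer widths, and the function names are all assumptions made for illustration. The layer uses the widely used normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W).

```python
import numpy as np

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as an (L, 4) matrix; unknown bases become all-zero rows."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = NUCLEOTIDES.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x

def chain_adjacency(n):
    """Hypothetical graph: each position is linked to its sequence neighbours."""
    a = np.zeros((n, n))
    idx = np.arange(n - 1)
    a[idx, idx + 1] = 1.0
    a[idx + 1, idx] = 1.0
    return a

def gcn_layer(a, h, w):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = a + np.eye(a.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)

rng = np.random.default_rng(0)
x = one_hot("ATGCGTAC")                        # (8, 4) node features
a = chain_adjacency(len(x))                    # (8, 8) adjacency
h1 = gcn_layer(a, x, rng.normal(size=(4, 8)))  # first layer: local context
h2 = gcn_layer(a, h1, rng.normal(size=(8, 8))) # second layer: wider context
print(h2.shape)  # (8, 8)
```

Stacking layers widens the neighbourhood each position sees, which is one way a multi-layer GCN can capture both local and longer-range sequence features.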
Rapid advancements in protein sequencing technology have resulted in gaps between proteins with identified sequences and those with mapped structures. Although sequence-based predictions offer insights, they can be incomplete due to the absence of structural details. Conversely, structure-based methods face challenges with respect to newly sequenced proteins. AlphaFold Multimer has remarkable accuracy in predicting the structure of protein complexes. However, it cannot distinguish whether the input protein sequences can interact. Nonetheless, by analyzing the information in the models predicted by AlphaFold Multimer, we propose a highly accurate method for predicting protein interactions. This study uses deep neural networks to analyze protein complex structures predicted by AlphaFold Multimer. By transforming atomic coordinates and utilizing image-processing techniques, vital 3D structural details were extracted from protein complexes. Recognizing the significance of evaluating residue distances in protein interactions, this study leveraged image recognition approaches by integrating Densely Connected Convolutional Networks (DenseNet) and Deep Residual Network (ResNet) within 3D convolutional networks for protein 3D structure analysis. When benchmarked against leading protein-protein interaction prediction methods, such as SpeedPPI, D-script, DeepTrio, and PEPPI, our proposed method, named SpatialPPI, exhibited notable efficacy, emphasizing the promising role of 3D spatial processing in advancing structural biology. The SpatialPPI code is available at: https://github.com/ohuelab/SpatialPPI.
•SpatialPPI predicts protein-protein interactions based on the structure of protein complexes from AlphaFold Multimer results.
•SpatialPPI renders the predicted structures into 3D tensors through spatial-based rendering and analysis using deep neural networks.
•Achieves superior prediction performance of protein-protein interactions over other approaches.
•SpatialPPI boosts pathway analysis and drug target identification through accurate protein-protein interaction predictions.
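Rendering atomic coordinates into a 3D tensor can be sketched as a simple voxelization. This is an illustrative assumption, not the SpatialPPI rendering code: the function name `voxelize`, the channel assignment, the grid size, and the resolution are all hypothetical, and the repository linked above holds the actual procedure.

```python
import numpy as np

def voxelize(coords, channels, n_channels, grid_size=16, resolution=2.0):
    """Count atoms per voxel, returning a (C, G, G, G) tensor.

    coords:   (N, 3) atomic coordinates in angstroms
    channels: (N,) integer channel per atom (e.g. chain or atom type; hypothetical)
    """
    coords = np.asarray(coords, dtype=float)
    channels = np.asarray(channels, dtype=int)
    grid = np.zeros((n_channels, grid_size, grid_size, grid_size), dtype=np.float32)
    # Centre the complex on the middle of the grid.
    centred = coords - coords.mean(axis=0)
    idx = np.floor(centred / resolution).astype(int) + grid_size // 2
    # Atoms that fall outside the grid are dropped.
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    idx, ch = idx[inside], channels[inside]
    # np.add.at accumulates correctly even when several atoms share a voxel.
    np.add.at(grid, (ch, idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return grid

coords = [(-1, -1, -1), (1, 1, 1), (-1, 1, -1), (1, -1, 1)]
tensor = voxelize(coords, channels=[0, 0, 1, 1], n_channels=2)
print(tensor.shape)  # (2, 16, 16, 16)
```

A tensor of this shape can be passed directly to a 3D convolutional network, which is what makes image-recognition architectures applicable to predicted structures.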
A massive number of paper documents that include important information such as circuit schematics can be converted into digital documents by optical sensors like scanners or digital cameras. However, extracting the netlists of analog circuits from digital documents is an exceptionally challenging task. This process aids enterprises in digitizing paper-based circuit diagrams, enabling the reuse of analog circuit designs and the automatic generation of datasets required for intelligent design models in this domain. This paper introduces a bottom-up graph encoding model aimed at automatically parsing the circuit topology of analog integrated circuits from images. The model comprises an improved electronic component detection network based on the Swin Transformer, an algorithm for component port localization, and a graph encoding model. The detection network identifies component positions and types, port localization then generates the dataset automatically, and finally the graph encoding model predicts potential connections between circuit components. To validate the model's performance, we annotated an electronic component detection dataset and a circuit diagram dataset, comprising 1200 and 3552 training samples, respectively. Detailed experimental results demonstrate the superiority of our proposed enhanced algorithm over comparative algorithms across custom and public datasets. Furthermore, our proposed port localization algorithm significantly accelerates the annotation of circuit diagram datasets.
DNA methylation is critical to the regulation of gene expression because it affects the stability of DNA and changes the structure of chromosomes. Identifying DNA methylation modification sites lays a solid basis for gaining more insights into their biological functions. Existing machine learning-based methods of predicting DNA methylation have not fully exploited the hidden multidimensional information in DNA gene sequences, such that the prediction accuracy of models is significantly limited. Besides, most models have been built for a single methylation type. To address these issues, a deep learning-based method, termed the MEDCNN model, was proposed in this study for DNA methylation site prediction. The MEDCNN model is capable of extracting feature information from gene sequences in three dimensions (i.e., positional information, biological information, and chemical information). Moreover, the proposed method employs a convolutional neural network model with double convolutional layers and double fully connected layers, and uses gradient descent to iteratively minimize the cross-entropy loss function to increase the prediction accuracy of the model. Besides, the MEDCNN model can predict different types of DNA methylation sites. As indicated by the experimental results, the deep learning method based on coding from multiple dimensions outperformed single coding methods, and the MEDCNN model was highly applicable and outperformed existing models in predicting DNA methylation between different species. These findings indicate that the MEDCNN model can be effective in predicting DNA methylation sites.
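The relationship between gradient descent and the cross-entropy loss can be made concrete with a small sketch. It uses a single-layer logistic classifier on synthetic data rather than the MEDCNN network; the learning rate, step count, data, and function names are assumptions chosen only to show parameters being updated so that the loss decreases.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy between labels y and predicted probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def train(x, y, lr=0.5, steps=200):
    """Gradient descent on the weights; the loss is the quantity being minimized."""
    w = np.zeros(x.shape[1])
    b = 0.0
    losses = []
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        losses.append(cross_entropy(y, p))
        grad_z = (p - y) / len(y)   # gradient of the loss w.r.t. the logits
        w -= lr * (x.T @ grad_z)    # step against the gradient
        b -= lr * grad_z.sum()
    return w, b, losses

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))        # synthetic features (hypothetical)
y = (x[:, 0] > 0).astype(float)      # synthetic binary labels
w, b, losses = train(x, y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In a deep network the same loop applies, with the gradients for every layer obtained by backpropagation instead of the closed-form expression used here.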
The single track is the basis for melt pool modeling and physics work in laser powder bed fusion (LPBF). The melting state of a single track is closely related to defects such as porosity, lack of fusion, and balling, which have a significant impact on the mechanical properties of an LPBF-created part. To ensure the reliability of part quality and repeatability, process monitoring and feedback control are emerging to improve the melting states, and this is becoming a hot topic in both the industrial and academic communities. In this research, a simple and low-cost off-axial photodiode signal monitoring system was established to monitor the melt pools of single tracks. Nine groups of single-track experiments with different process parameter combinations were each carried out four times, yielding thirty-six LPBF tracks. The melting states were classified into three classes according to the morphologies of the tracks. A convolutional neural network (CNN) model was developed to extract the characteristics and identify the melting states. The raw one-dimensional photodiode signal data were converted into two-dimensional grayscale images. The average identification accuracy reached 95.81% and the computation time was 15 ms for each sample, which is promising for engineering applications. Compared with some classic deep learning models, the proposed CNN could distinguish the melting states with higher classification accuracy and efficiency. This work contributes to real-time multiple-sensor monitoring and feedback control.
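One simple way to turn a one-dimensional signal into a two-dimensional grayscale image is to rescale a window of samples to 0-255 and fold it row by row into a square. The sketch below assumes that scheme; the abstract does not state which conversion was used, and the function name and window size are hypothetical.

```python
import numpy as np

def signal_to_image(signal, side):
    """Map the first side*side samples to an 8-bit (side, side) grayscale image."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < side * side:
        raise ValueError("signal is shorter than side * side samples")
    window = signal[: side * side]
    lo, hi = window.min(), window.max()
    scale = hi - lo if hi > lo else 1.0   # avoid dividing by zero on flat signals
    pixels = np.round(255.0 * (window - lo) / scale).astype(np.uint8)
    return pixels.reshape(side, side)     # row-major fold: one row per segment

# Synthetic stand-in for a photodiode trace.
t = np.linspace(0.0, 1.0, 1024)
image = signal_to_image(np.sin(2 * np.pi * 5 * t), side=32)
print(image.shape, image.dtype)  # (32, 32) uint8
```

Folding the signal this way places samples that are one row-length apart next to each other vertically, so a 2D convolution can pick up periodic structure that a short 1D kernel would miss.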
Craniopharyngioma is a congenital brain tumor with clinical characteristics of hypothalamic-pituitary dysfunction, increased intracranial pressure, and visual field disorder, among other injuries. Its clinical diagnosis mainly depends on radiological examinations (such as Computed Tomography and Magnetic Resonance Imaging). However, assessing numerous radiological images manually is a challenging task, and the experience of doctors has a great influence on the diagnosis result. The development of artificial intelligence has brought about a great transformation in the clinical diagnosis of craniopharyngioma. This study reviewed the application of artificial intelligence technology in the clinical diagnosis of craniopharyngioma from the aspects of differential classification, prediction of tissue invasion and gene mutation, prognosis prediction, and so on. Based on the review, technical routes for intelligent diagnosis based on traditional machine learning models and deep learning models were further proposed. Additionally, in terms of the limitations and possibilities of the development of artificial intelligence in craniopharyngioma diagnosis, this study discussed issues requiring attention in future research, including few-shot learning, imbalanced data sets, semi-supervised models, and multi-omics fusion.
Satellite passive microwave (MW) remote sensing has a better ability to observe land surface temperature (LST) in cloudy conditions than thermal infrared (TIR) remote sensing. Due to the much greater thermal sampling depth (TSD) of MW, currently available MW LST estimates do not represent the thermodynamic temperature of the land surface and, therefore, yield systematic differences from TIR LST. The TSD effect is particularly prominent over barren land and sparsely vegetated surfaces. Here, we present a novel TSD correction (TSDC) method to estimate the MW LST over barren land. The core of this method is a new formulation of the passive MW radiation balance equation, which allows linking MW effective physical temperature to the soil temperature at a specific depth. The TSDC method is applied to the 6.9-GHz channel of AMSR-E in northwestern China-western Mongolia and western Namibia (WN). Evaluation shows that LST estimated by the TSDC method agrees well with the MODIS LST. Validation based on in situ LSTs measured at the Gobabeb site in WN demonstrates the high accuracy of the TSDC method: it yields a root mean squared error of about 2-3 K and a slight systematic error. In contrast, other methods without TSDC yield lower accuracies and significantly underestimate LST. Therefore, the TSDC method has the potential to generate MW LST with the same physical meaning and similar accuracy as TIR LST. This study provides implications for developing practical and accurate methods to estimate MW LST over other land surface types and at the global scale.
Enhancers are genomic DNA elements that regulate the expression of neighboring genes, which is crucial for biological processes like cell differentiation and stress response. However, current machine learning methods for predicting DNA enhancers often underutilize hidden features in gene sequences, limiting model accuracy. Hence, this article proposes the PDCNN model, a deep learning-based enhancer prediction method. PDCNN extracts statistical nucleotide representations from gene sequences, discerning positional distribution information of nucleotides in modifier-like DNA sequences. With a convolutional neural network structure, PDCNN employs dual convolutional and fully connected layers. A gradient descent algorithm iteratively minimizes the cross-entropy loss function, enhancing prediction accuracy. Model parameters are fine-tuned to select optimal combinations for training, achieving over 95% accuracy. Comparative analysis with traditional methods and existing models demonstrates PDCNN’s robust feature extraction capability. It outperforms advanced machine learning methods in identifying DNA enhancers, presenting an effective method with broad implications for genomics, biology, and medical research.
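A statistical, position-aware nucleotide representation can be sketched as follows. This is one plausible reading of the description, not the PDCNN encoder: it assumes equal-length sequences, estimates how often each nucleotide occurs at each position in a reference set, and encodes a sequence by the frequency of the base it carries at every position. All names are hypothetical.

```python
import numpy as np

NUCLEOTIDES = "ACGT"

def position_frequencies(seqs):
    """Return an (L, 4) matrix of per-position nucleotide frequencies."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    counts = np.zeros((length, 4))
    for s in seqs:
        for i, base in enumerate(s.upper()):
            j = NUCLEOTIDES.find(base)
            if j >= 0:                       # skip ambiguous bases such as N
                counts[i, j] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(totals, 1.0)  # guard against empty positions

def encode(seq, freqs):
    """Replace each base by its frequency at that position (0.0 if unknown)."""
    values = []
    for i, base in enumerate(seq.upper()):
        j = NUCLEOTIDES.find(base)
        values.append(freqs[i, j] if j >= 0 else 0.0)
    return np.array(values)

reference = ["ACGT", "ACGA", "TCGA"]         # toy reference set (hypothetical)
freqs = position_frequencies(reference)
print(encode("ACGT", freqs))
```

Computing such statistics separately for positive and negative training sequences would give a classifier a signal about where each class tends to place each nucleotide.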
•A deep learning model PDCNN for identifying DNA enhancers
•Improving model performance with position-aware encoders
•Comparative studies have shown that PDCNN is superior to existing models
Genetics; Bioinformatics
Single-cell RNA sequencing is a state-of-the-art technology to understand gene expression in complex tissues. With the growing amount of data being generated, the standardization and automation of data analysis are critical to generating hypotheses and discovering biological insights.
Here, we present scRNASequest, a semi-automated single-cell RNA-seq (scRNA-seq) data analysis workflow which allows (1) preprocessing from raw UMI count data, (2) harmonization by one or multiple methods, (3) reference-dataset-based cell type label transfer and embedding projection, (4) multi-sample, multi-condition single-cell level differential gene expression analysis, and (5) seamless integration with cellxgene VIP for visualization and with CellDepot for data hosting and sharing by generating compatible h5ad files.
We developed scRNASequest, an end-to-end pipeline for single-cell RNA-seq data analysis, visualization, and publishing. The source code is provided under the MIT open-source license at https://github.com/interactivereport/scRNASequest. We also prepared a bookdown tutorial for the installation and detailed usage of the pipeline: https://interactivereport.github.io/scRNAsequest/tutorial/docs/. Users have the option to run it on a local computer with a Linux/Unix system, including macOS, or to interact with SGE/Slurm schedulers on high-performance computing (HPC) clusters.
Osteoporosis is characterized by low bone mineral density (BMD). The advancement of high-throughput technologies and integrative approaches provided an opportunity for deciphering the mechanisms underlying osteoporosis. Here, we generated genomic, transcriptomic, methylomic, and metabolomic datasets from 119 subjects with high (n = 61) and low (n = 58) BMDs. By adopting sparse multiple discriminative canonical correlation analysis, we identified an optimal multi-omics biomarker panel with 74 differentially expressed genes (DEGs), 75 differentially methylated CpG sites (DMCs), and 23 differential metabolic products (DMPs). By linking genetic data, we identified 199 targeted BMD-associated expression/methylation/metabolite quantitative trait loci (eQTLs/meQTLs/metaQTLs). The reconstructed networks/pathways showed extensive biomarker interactions, and a substantial proportion of these biomarkers were enriched in RANK/RANKL, MAPK/TGF-β, and WNT/β-catenin pathways and G-protein-coupled receptor, GTP-binding/GTPase, telomere/mitochondrial activities that are essential for bone metabolism. Five biomarkers (FADS2, ADRA2A, FMN1, RABL2A, SPRY1) revealed causal effects on BMD variation. Our study provided an innovative framework and insights into the pathogenesis of osteoporosis.
•Multi-omics integration revealed 172 osteoporosis biomarkers with complex interaction
•Genetic variants have multi-level effects on osteoporosis biomarkers
•A substantial proportion of biomarkers enriched in bone-related pathways/activities
•Several osteoporosis biomarkers have causal effects on the BMD variation
Disease; Genomics; Metabolomics; Transcriptomics