Altered structural brain asymmetry in autism spectrum disorder (ASD) has been reported. However, findings have been inconsistent, likely due to limited sample sizes. Here we investigated 1,774 individuals with ASD and 1,809 controls, from 54 independent data sets of the ENIGMA consortium. ASD was significantly associated with alterations of cortical thickness asymmetry in mostly medial frontal, orbitofrontal, cingulate and inferior temporal areas, and also with asymmetry of orbitofrontal surface area. These differences generally involved reduced asymmetry in individuals with ASD compared to controls. Furthermore, putamen volume asymmetry was significantly increased in ASD. The largest case-control effect size was Cohen's d = -0.13, for asymmetry of superior frontal cortical thickness. Most effects did not depend on age, sex, IQ, severity or medication use. Altered lateralized neurodevelopment may therefore be a feature of ASD, affecting widespread brain regions with diverse functions. Large-scale analysis was necessary to quantify subtle alterations of brain structural asymmetry in ASD.
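To make the reported quantities concrete, the following minimal sketch computes a per-region asymmetry index and a pooled-standard-deviation Cohen's d. The abstract does not give either formula, so both are assumptions: the index (L - R) / (L + R) is one common convention, and the study itself may have derived effect sizes differently (for example from a regression model). The function names and the toy numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def asymmetry_index(left, right):
    """Assumed asymmetry index: (L - R) / (L + R); positive means leftward."""
    return (left - right) / (left + right)


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy, made-up asymmetry indices for illustration only (not study data).
cases = [0.010, 0.020, 0.015, 0.005]
controls = [0.020, 0.030, 0.025, 0.015]
print(cohens_d(cases, controls))  # negative: reduced asymmetry in cases
```

A negative d in this convention means the case group shows a lower (less leftward) asymmetry index than controls.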
A huge amount of metagenomic sequence data has been produced as a result of the rapidly increasing efforts worldwide in studying microbial communities as a whole. Most, if not all, sequenced metagenomes are complex mixtures of chromosomal and plasmid sequence fragments from multiple organisms, possibly from different kingdoms. Computational methods for prediction of genomic elements such as genes are significantly different for chromosomes and plasmids, hence raising the need for separation of chromosomal from plasmid sequences in a metagenome. We present a program for classification of a metagenome set into chromosomal and plasmid sequences, based on their distinguishing pentamer frequencies. On a large training set consisting of all the sequenced prokaryotic chromosomes and plasmids, the program achieves ∼92% classification accuracy. On a large set of simulated metagenomes with sequence lengths ranging from 300 bp to 100 kbp, the program has classification accuracy from 64.45% to 88.75%. On a large independent test set, the program achieves 88.29% classification accuracy. Availability: The program has been implemented as a standalone prediction program, cBar, which is available at http://csbl.bmb.uga.edu/∼ffzhou/cBar Contact: xyn@bmb.uga.edu Supplementary information: Supplementary data are available at Bioinformatics online.
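The pentamer-frequency features mentioned above can be sketched as follows. This is only an illustration of the feature-extraction step; the abstract does not describe cBar's classifier or any preprocessing, so the function name, the handling of ambiguous bases and the normalization are assumptions.

```python
from collections import Counter
from itertools import product


def pentamer_frequencies(seq, k=5):
    """Return normalized k-mer frequencies over ACGT in a fixed lexicographic order.

    Windows containing characters other than A, C, G, T are ignored (an assumption).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


features = pentamer_frequencies("ACGTACGTTGCAACGTNNACGTA")
print(len(features))  # 4**5 = 1024 features per sequence
```

Each sequence fragment becomes a fixed-length vector of 1,024 frequencies, which any standard classifier could then take as input.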
The novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has recently caused a major pandemic. Various diagnostic technologies have been under active development. The novel coronavirus disease (COVID-19) may induce pulmonary failure, and chest X-ray imaging has become one of the major confirmed diagnostic technologies. The very limited number of publicly available samples has rendered the training of deep neural networks unstable and inaccurate. This study proposed a two-step transfer learning pipeline and a deep residual network framework, COVID19XrayNet, for the COVID-19 detection problem based on chest X-ray images. COVID19XrayNet first tunes the transferred model on a large dataset of chest X-ray images, and the model is then further tuned using a small dataset of annotated chest X-ray images. The final model achieved an accuracy of 0.9108. The experimental data also suggested that the model may be improved as more training samples are released.
Graphic abstract
COVID19XrayNet, a two-step transfer learning framework designed for biomedical images.
Finding non-standard or new metabolic pathways has important applications in metabolic engineering, synthetic biology and the analysis and reconstruction of metabolic networks. Branched metabolic pathways dominate in metabolic networks and depict a more comprehensive picture of metabolism compared to linear pathways. Although progress has been made in finding branched metabolic pathways, few efforts have been made to identify them via atom group tracking. In this paper, we present a pathfinding method called BPFinder for finding branched metabolic pathways by atom group tracking, which aims to guide the synthetic design of metabolic pathways. BPFinder enumerates linear metabolic pathways by tracking the movements of atom groups in the metabolic network and merges the linear atom-group-conserving pathways into branched pathways. Two merging rules based on the structure of conserved atom groups are proposed to accurately merge the branched compounds of linear pathways to identify branched pathways. Furthermore, the integrated information of compound similarity, thermodynamic feasibility and conserved atom groups is also used to rank the pathfinding results for feasible branched pathways. Experimental results show that BPFinder is more capable of recovering known branched metabolic pathways than other existing methods, and is able to return biologically relevant branched pathways and discover alternative branched pathways of biochemical interest. The online server of BPFinder is available at http://114.215.129.245:8080/atomic/. The program, source code and data can be downloaded from https://github.com/hyr0771/BPFinder.
Each genome has a stable distribution of the combined frequency for each k-mer and its reverse complement, measured in sequence fragments as short as 1000 bp across the whole genome, for 1<k<6. The collection of these k-mer frequency distributions is unique to each genome and termed the genome's barcode.
We found that for each genome, the majority of its short sequence fragments have highly similar barcodes, while sequence fragments with different barcodes typically correspond to genes that are horizontally transferred or highly expressed. This observation has led to new and more effective ways of addressing two challenging problems: the metagenome binning problem and the identification of horizontally transferred genes. Our barcode-based metagenome binning algorithm substantially improves the state of the art in terms of both binning accuracy and scope of applicability. Other attractive properties of genome barcodes include (a) the barcodes have different and identifiable characteristics for different classes of genomes, such as prokaryotes, eukaryotes, mitochondria and plastids, and (b) barcode similarities are generally proportional to the genomes' phylogenetic closeness.
These and other properties of genome barcodes make them a new and effective tool for studying numerous genome and metagenome analysis problems.
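The barcode idea described above, combined frequencies of each k-mer and its reverse complement measured per fragment, can be sketched roughly as follows. The choice of k = 4, non-overlapping 1000 bp fragments, and all function names are assumptions made for illustration; the abstract does not specify how fragments are tiled or how frequencies are normalized.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a k-mer and its reverse complement by one shared key."""
    return min(kmer, reverse_complement(kmer))


def fragment_barcode(fragment, k=4):
    """Combined k-mer/reverse-complement frequencies for one fragment."""
    fragment = fragment.upper()
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = Counter()
    for i in range(len(fragment) - k + 1):
        kmer = fragment[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with ambiguous bases
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {key: (counts[key] / total if total else 0.0) for key in keys}


def genome_barcode(genome, k=4, fragment_len=1000):
    """One frequency profile per non-overlapping fragment (assumed tiling)."""
    starts = range(0, len(genome) - fragment_len + 1, fragment_len)
    return [fragment_barcode(genome[s:s + fragment_len], k) for s in starts]


rows = genome_barcode("ACGT" * 500, k=4, fragment_len=1000)
print(len(rows), len(rows[0]))
```

Comparing the per-fragment profiles against the genome-wide average is one way fragments with atypical barcodes could then be flagged.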
With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Single-cell RNA sequencing (scRNA-seq) data clustering can group cells belonging to the same cell type based on patterns embedded in gene expression. However, scRNA-seq data are high-dimensional, noisy, and sparse, owing to the limitations of existing scRNA-seq technologies. Traditional clustering methods are neither effective nor efficient for high-dimensional and sparse matrix computations. Therefore, several dimension reduction methods have been introduced. To validate a reliable and standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were progressively performed on two large scRNA-seq datasets using 20 models. Results showed that the feature selection method contributed positively to high-dimensional and sparse scRNA-seq data. Moreover, feature-extraction methods were able to improve clustering performance, although not in every case. Independent component analysis (ICA) performed well in small compressed feature spaces, whereas principal component analysis was steadier than all the other feature-extraction methods. In addition, ICA was not ideal for fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering achieved good results when combined with feature-extraction methods.
Measuring conditional relatedness between a pair of genes is a fundamental technique and still a significant challenge in computational biology. Such relatedness can be assessed by gene expression similarities, but these suffer from high false discovery rates. Meanwhile, other types of features, e.g., prior-knowledge-based similarities, are only viable for measuring global relatedness. In this paper, we propose a novel machine learning model, named Multi-Features Relatedness (MFR), for accurately measuring conditional relatedness between a pair of genes by incorporating expression similarities with prior-knowledge-based similarities in an assessment criterion. MFR is used to predict gene-gene interactions extracted from the COXPRESdb, KEGG, HPRD, and TRRUST databases using 10-fold cross-validation and test verification, and to identify gene-gene interactions collected from the GeneFriends and DIP databases for further verification. The results show that MFR achieves the highest area under curve (AUC) values for identifying gene-gene interactions in the development, test, and DIP datasets. Specifically, it obtains an average improvement of 1.1% in precision for detecting gene pairs with both high expression similarities and high prior-knowledge-based similarities in all datasets, compared with other linear models and coexpression analysis methods. For cancer gene network construction and gene function prediction, MFR also obtains results with more biological significance and higher average prediction accuracy than the other compared models and methods. A website of the MFR model and relevant datasets can be accessed from http://bmbl.sdstate.edu/MFR .
(1) Background: Obesity and diabetes continue to reach epidemic levels in the population with major health impacts that include a significantly increased risk of coronary atherosclerosis. The imbalance of trace elements in the body caused by nutritional factors can lead to the progression of coronary atherosclerosis. (2) Methods: We measured the concentrations of sodium (Na), potassium (K), magnesium (Mg), calcium (Ca), zinc (Zn), and iron (Fe) in peripheral blood samples from 4243 patients and performed baseline analysis and propensity matching of the patient datasets. The patients were grouped into acute myocardial infarction (AMI, 702 patients) and stable coronary heart disease (SCAD1, 253 patients) groups. Both of these groups were included in the AS that had a total of 1955 patients. The control group consisted of 2288 patients. The plasma concentrations of calcium, magnesium, and iron were measured using a colorimetric method. For comparison, 15 external quality assessment (EQA) samples were selected from the Clinical Laboratory Center of the Ministry of Health of China. SPSS software was used for statistical analysis. The average values and deviations of all of the indicators in each group were calculated, and a p-value threshold of <0.05 was used to indicate statistical significance. (3) Results: The iron ion concentrations of the acute myocardial infarction (AMI) group were significantly lower than those of the control group (p < 0.05, AUC = 0.724, AUC = 0.702), irrespective of propensity matching. Compared to the data from the stable coronary artery disease (SCAD) group, the concentration of iron ions in the acute myocardial infarction group was significantly lower (p < 0.05, AUC = 0.710, AUC = 0.682). Furthermore, the iron ion concentrations in the (AMI + SCAD) group were significantly lower (p < 0.05) than in the control group. (4) Conclusions: The data presented in this study strongly indicate that the concentration of iron ions in the peripheral blood is related to coronary atherosclerosis. Decreases in the levels of iron ions in the peripheral blood can be used as a predictive biomarker of coronary atherosclerosis.
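As an illustration of how an AUC for a single biomarker such as iron concentration can be computed, the sketch below uses the pairwise (Mann-Whitney) definition. The study reports using SPSS, so this is not its procedure; the function name and the concentration values are hypothetical. Because lower iron is associated with disease, the negated concentration is used as the score.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up iron concentrations for illustration only (not study data).
case_iron = [10.0, 12.0, 14.0]
control_iron = [13.0, 16.0, 18.0]

# Lower iron should indicate a case, so negate the values before scoring.
value = auc([-x for x in case_iron], [-x for x in control_iron])
print(round(value, 3))
```

An AUC of 0.5 would mean the marker separates the groups no better than chance, while 1.0 would mean perfect separation.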
The recent outbreak of coronavirus disease 2019 (COVID-19) has posed serious challenges to human society in China and across the world. COVID-19 induces pneumonia in human hosts and is highly contagious between persons. COVID-19 patients may develop severe symptoms, and some may even die of major organ failure. This study used machine learning algorithms to build a COVID-19 severity detection model. A support vector machine (SVM) demonstrated promising detection accuracy after 32 features were found to be significantly associated with COVID-19 severity. These 32 features were further screened for inter-feature redundancies. The final SVM model was trained using 28 features and achieved an overall accuracy of 0.8148. This work may facilitate estimating the risk that a COVID-19 patient will develop severe symptoms. The 28 severity-associated biomarkers may also be investigated for the underlying mechanisms by which they are involved in COVID-19 infection.
Pulmonary hypertension (PH) is a common disease that affects the normal functioning of the human pulmonary arteries. Peripheral blood mononuclear cells (PBMCs) serve as an ideal source for minimally invasive disease diagnosis. This study hypothesized that the transcriptional fluctuations in the PBMCs exposed to the PH arteries may stably reflect the disease. However, the dimension of a human transcriptome is much higher than the number of samples in all the existing datasets. Therefore, an ensemble feature selection algorithm, EnRank, was proposed to integrate the ranking information of four popular feature selection algorithms, i.e., T-test (Ttest), Chi-squared test (Chi2), ridge regression (Ridge), and Least Absolute Shrinkage and Selection Operator (Lasso). Our results suggested that the EnRank-detected biomarkers combined useful information from these four feature selection algorithms and achieved very good accuracy in predicting PH patients. Many of the EnRank-detected biomarkers were also supported by the literature.
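One simple way to integrate rankings from several feature selection methods is to average each feature's rank across methods, as sketched below. The abstract does not say how EnRank combines the four rankings, so mean-rank aggregation is an assumption used purely for illustration; the function name, gene labels and rankings are hypothetical.

```python
def mean_rank_aggregate(rankings):
    """Order features by their mean rank across methods (best first).

    rankings maps a method name to a list of features ordered best first.
    Every method must rank the same set of features exactly once.
    """
    methods = list(rankings)
    features = sorted(set(rankings[methods[0]]))
    for method in methods:
        if sorted(rankings[method]) != features:
            raise ValueError(f"{method} does not rank the same feature set")
    mean_rank = {
        f: sum(rankings[m].index(f) + 1 for m in methods) / len(methods)
        for f in features
    }
    # Ties in mean rank are broken alphabetically for determinism.
    return sorted(features, key=lambda f: (mean_rank[f], f))


# Hypothetical rankings from the four methods named in the abstract.
example = {
    "Ttest": ["g1", "g2", "g3", "g4"],
    "Chi2": ["g2", "g1", "g3", "g4"],
    "Ridge": ["g1", "g3", "g2", "g4"],
    "Lasso": ["g1", "g2", "g4", "g3"],
}
print(mean_rank_aggregate(example))
```

Averaging ranks rather than raw scores avoids having to put t-statistics, chi-squared values and regression coefficients on a common scale.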