The 2019 novel coronavirus (renamed SARS-CoV-2, and generally referred to as the COVID-19 virus) has spread to 184 countries with over 1.5 million confirmed cases. Such major viral outbreaks demand early elucidation of taxonomic classification and origin of the virus genomic sequence, for strategic planning, containment, and treatment. This paper identifies an intrinsic COVID-19 virus genomic signature and uses it together with a machine learning-based alignment-free approach for an ultra-fast, scalable, and highly accurate classification of whole COVID-19 virus genomes. The proposed method combines supervised machine learning with digital signal processing (MLDSP) for genome analyses, augmented by a decision tree approach to the machine learning component, and a Spearman's rank correlation coefficient analysis for result validation. These tools are used to analyze a large dataset of over 5000 unique viral genomic sequences, totalling 61.8 million bp, including the 29 COVID-19 virus sequences available on January 27, 2020. Our results support a hypothesis of a bat origin and classify the COVID-19 virus as Sarbecovirus, within Betacoronavirus. Our method achieves 100% accurate classification of the COVID-19 virus sequences, and discovers the most relevant relationships among over 5000 viral genomes within a few minutes, ab initio, using raw DNA sequence data alone, and without any specialized biological knowledge, training, gene or genome annotations. This suggests that, for novel viral and pathogen genome sequences, this alignment-free whole-genome machine-learning approach can provide a reliable real-time option for taxonomic classification.
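The abstract uses Spearman's rank correlation coefficient to validate results. As a minimal sketch (not the authors' code), Spearman's rho is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. Which quantities the authors ranked is not restated here; the usage line assumes two hypothetical lists of distances from one query genome to five others.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last element of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical distances produced by two different methods for the same genomes.
print(spearman([0.10, 0.35, 0.20, 0.80, 0.55], [0.12, 0.40, 0.18, 0.70, 0.65]))  # prints 1.0
```

Because only ranks matter, rho equals 1.0 whenever two methods order the genomes identically, even if their raw distance values differ.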
We present a novel Deep Learning method for the Unsupervised Clustering of DNA Sequences (DeLUCS) that does not require sequence alignment, sequence homology, or (taxonomic) identifiers. DeLUCS uses Frequency Chaos Game Representations (FCGR) of primary DNA sequences, and generates "mimic" sequence FCGRs to self-learn data patterns (genomic signatures) through the optimization of multiple neural networks. A majority voting scheme is then used to determine the final cluster assignment for each sequence. The clusters learned by DeLUCS match true taxonomic groups for large and diverse datasets, with accuracies ranging from 77% to 100%: 2,500 complete vertebrate mitochondrial genomes, at taxonomic levels from sub-phylum to genera; 3,200 randomly selected 400 kbp-long bacterial genome segments, into clusters corresponding to bacterial families; three viral genome and gene datasets, averaging 1,300 sequences each, into clusters corresponding to virus subtypes. DeLUCS significantly outperforms two classic clustering methods (K-means++ and Gaussian Mixture Models) for unlabelled data, by as much as 47%. DeLUCS is highly effective: it is able to cluster datasets of unlabelled primary DNA sequences totalling over 1 billion bp of data, and it bypasses common limitations to classification resulting from the lack of sequence homology, variation in sequence length, and the absence or instability of sequence annotations and taxonomic identifiers. Thus, DeLUCS offers fast and accurate DNA sequence clustering for previously intractable datasets.
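The FCGR input that DeLUCS clusters can be sketched as follows. This is an illustration under assumptions, not DeLUCS code: the corner convention (A bottom-left, C top-left, G top-right, T bottom-right), the `grid[y][x]` orientation, and the hypothetical `normalize` helper are choices made here, and implementations differ. Each k-mer lands in the cell where the chaos game would place its final point, which reduces to reading the k-mer's corner bits with the last nucleotide as the most significant bit; a value such as k = 6 yields a 64 × 64 grid.

```python
from typing import Dict, List, Tuple

# Assumed corner convention on the unit square, as (x, y); other tools may differ.
CORNERS: Dict[str, Tuple[int, int]] = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence: str, k: int) -> List[List[int]]:
    """Count k-mers on a 2^k x 2^k grid, indexed as grid[y][x]."""
    if k < 1:
        raise ValueError("k must be at least 1")
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        x = y = 0
        for i, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            # Later nucleotides set higher-order bits, matching where the chaos
            # game (move halfway toward each corner) leaves its final point.
            x |= cx << i
            y |= cy << i
        grid[y][x] += 1
    return grid


def normalize(grid: List[List[int]]) -> List[List[float]]:
    """Hypothetical helper: convert counts to k-mer frequencies summing to 1."""
    total = sum(map(sum, grid))
    return [[(count / total) if total else 0.0 for count in row] for row in grid]


image = normalize(fcgr("ACGTACGTNNACGT", k=2))
```

The mimic-sequence generation, the neural networks, and the majority vote described above are not modelled here.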
The aim of this study was to investigate associations of yogurt and dairy consumption with energy, macronutrient, calcium, and vitamin D intakes, and associations with indicators of overweight/obesity in U.S. children in the National Health and Nutrition Examination Survey (NHANES 2005-2008). Using 24-hour recall data, children 8-18 years of age were classified into dairy consumption groups of <1, 1 to <2, or 2+ dairy servings, and yogurt consumers were those who reported eating yogurt during at least one of two dietary intake interviews. NHANES anthropometric measurements were used, and BMI and BMI-for-age percentiles were calculated. Yogurt and dairy consumption were associated with higher intakes of calcium, vitamin D and protein. Yogurt intake was associated with lower total fat and saturated fat intakes and lower body fat as measured by subscapular skinfold thickness. This study supports consumption of yogurt and higher amounts of dairy as eating patterns associated with greater intake of specific shortfall nutrients, and lower body fat in U.S. children.
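The BMI and grouping steps can be sketched as below. This is a hedged illustration, not the study's procedure: the abstract does not say which tool computed percentiles. The sketch assumes the LMS method used with CDC growth charts, where a measurement X is converted to a z-score with age- and sex-specific parameters L, M, and S and then to a percentile via the normal CDF. The L, M, S values in the usage line are placeholders, not real CDC values.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def lms_z_score(x: float, L: float, M: float, S: float) -> float:
    """z-score of measurement x under the LMS (Box-Cox) growth-reference model."""
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)


def percentile_from_z(z: float) -> float:
    """Standard normal CDF of z, expressed as a percentile (0-100)."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dairy_group(daily_servings: float) -> str:
    """Bin daily dairy servings into the three groups described in the text."""
    if daily_servings < 1:
        return "<1"
    if daily_servings < 2:
        return "1 to <2"
    return "2+"


# Hypothetical child: 45 kg, 1.55 m. L, M, S are placeholders, NOT CDC table values.
value = bmi(45.0, 1.55)
z = lms_z_score(value, L=-1.5, M=18.5, S=0.13)
print(round(value, 1), round(percentile_from_z(z), 1), dairy_group(1.5))
```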
Abstract
Summary
We present an interactive Deep Learning-based software tool for Unsupervised Clustering of DNA Sequences (iDeLUCS), that detects genomic signatures and uses them to cluster DNA sequences, without the need for sequence alignment or taxonomic identifiers. iDeLUCS is scalable and user-friendly: its graphical user interface, with support for hardware acceleration, allows the practitioner to fine-tune the different hyper-parameters involved in the training process without requiring extensive knowledge of deep learning. The performance of iDeLUCS was evaluated on a diverse set of datasets: several real genomic datasets from organisms in kingdoms Animalia, Protista, Fungi, Bacteria, and Archaea, three datasets of viral genomes, a dataset of simulated metagenomic reads from microbial genomes, and multiple datasets of synthetic DNA sequences. The performance of iDeLUCS was compared to that of two classical clustering algorithms (k-means++ and GMM) and two clustering algorithms specialized in DNA sequences (MeShClust v3.0 and DeLUCS), using both intrinsic cluster evaluation metrics and external evaluation metrics. In terms of unsupervised clustering accuracy, iDeLUCS outperforms the two classical algorithms by an average of ∼20%, and the two specialized algorithms by an average of ∼12%, on the datasets of real DNA sequences analyzed. Overall, our results indicate that iDeLUCS is a robust clustering method suitable for the clustering of large and diverse datasets of unlabeled DNA sequences.
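Unsupervised clustering accuracy is commonly defined as the fraction of sequences correctly labelled under the best one-to-one matching between cluster IDs and true classes. The sketch below assumes that definition, since the abstract does not spell it out. It brute-forces all matchings with `itertools.permutations`, which is only practical for a handful of clusters; for larger numbers of clusters, the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) finds the same optimum efficiently.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def clustering_accuracy(true_labels: Sequence[Hashable], cluster_ids: Sequence[Hashable]) -> float:
    """Best-case accuracy over one-to-one matchings of clusters to classes."""
    if len(true_labels) != len(cluster_ids) or not true_labels:
        raise ValueError("need two non-empty label sequences of equal length")
    classes = sorted(set(true_labels), key=repr)
    clusters = sorted(set(cluster_ids), key=repr)
    # Contingency counts: how many sequences of each class fall in each cluster.
    overlap = Counter(zip(cluster_ids, true_labels))
    slots = max(len(classes), len(clusters))
    best = 0
    # Each permutation gives every cluster a distinct slot; slots beyond the
    # number of classes mean "this cluster is matched to no class".
    for assignment in permutations(range(slots), len(clusters)):
        matched = sum(
            overlap[(cluster, classes[slot])]
            for cluster, slot in zip(clusters, assignment)
            if slot < len(classes)
        )
        best = max(best, matched)
    return best / len(true_labels)


print(clustering_accuracy(["a", "a", "b", "b", "c"], [2, 2, 0, 0, 0]))  # prints 0.8
```

Cluster IDs are arbitrary, so accuracy is only meaningful after this matching step. That is why the metric uses the best assignment rather than comparing label values directly.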
Availability and implementation
iDeLUCS is available at https://github.com/Kari-Genomics-Lab/iDeLUCS under the terms of the MIT licence.
Although software tools abound for the comparison, analysis, identification, and classification of genomic sequences, taxonomic classification remains challenging due to the magnitude of the datasets and the intrinsic problems associated with classification. The need exists for an approach and software tool that addresses the limitations of existing alignment-based methods, as well as the challenges of recently proposed alignment-free methods.
We propose a novel combination of supervised Machine Learning with Digital Signal Processing, resulting in ML-DSP: an alignment-free software tool for ultrafast, accurate, and scalable genome classification at all taxonomic levels. We test ML-DSP by classifying 7396 full mitochondrial genomes at various taxonomic levels, from kingdom to genus, with an average classification accuracy of >97%. A quantitative comparison with state-of-the-art classification software tools is performed on two small benchmark datasets and one large dataset of 4322 vertebrate mtDNA genomes. Our results show that ML-DSP overwhelmingly outperforms the alignment-based software MEGA7 (alignment with MUSCLE or CLUSTALW) in terms of processing time, while having comparable classification accuracies for small datasets and superior accuracies for the large dataset. Compared with the alignment-free software FFP (Feature Frequency Profile), ML-DSP has significantly better classification accuracy, and is overall faster. We also provide preliminary experiments indicating the potential of ML-DSP to be used for other datasets, by classifying 4271 complete dengue virus genomes into subtypes with 100% accuracy, and 4710 bacterial genomes into phyla with 95.5% accuracy. Lastly, our analysis shows that the "Purine/Pyrimidine", "Just-A" and "Real" numerical representations of DNA sequences outperform ten other such numerical representations used in the Digital Signal Processing literature for DNA classification purposes.
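The signal-processing half of ML-DSP can be sketched in four steps: map each sequence to numbers with a numerical representation, bring the signals to a common length, take the magnitude of the discrete Fourier transform, and compare spectra with a Pearson-correlation-based distance. This is a simplified illustration, not the published tool. The numeric values for the two representations, the plain zero-padding or truncation to a median length, and the (1 − r)/2 distance are assumptions made here. The naive O(n²) DFT keeps the sketch dependency-free; `numpy.fft.fft` would be the practical choice.

```python
import cmath
import math
from typing import Dict, List, Sequence

# Two representations named in the text; these exact numeric values are assumptions.
PURINE_PYRIMIDINE: Dict[str, float] = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}
JUST_A: Dict[str, float] = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}


def to_signal(sequence: str, mapping: Dict[str, float]) -> List[float]:
    """Map nucleotides to numbers, dropping symbols outside the mapping."""
    return [mapping[base] for base in sequence.upper() if base in mapping]


def fit_length(signal: List[float], length: int) -> List[float]:
    """Assumed simplification: zero-pad or truncate to a common length."""
    return (signal + [0.0] * length)[:length]


def magnitude_spectrum(signal: Sequence[float]) -> List[float]:
    """|DFT| of the signal, computed with a naive O(n^2) DFT."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
        for f in range(n)
    ]


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation; assumes neither spectrum is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return cov / den


def distance_matrix(sequences: Sequence[str], mapping: Dict[str, float]) -> List[List[float]]:
    """Pairwise distances (1 - r) / 2 between magnitude spectra, in [0, 1]."""
    signals = [to_signal(s, mapping) for s in sequences]
    length = sorted(len(s) for s in signals)[len(signals) // 2]  # a median length
    spectra = [magnitude_spectrum(fit_length(s, length)) for s in signals]
    k = len(spectra)
    return [[(1 - pearson(spectra[i], spectra[j])) / 2 for j in range(k)] for i in range(k)]


d = distance_matrix(["ACGTACGT", "CGTACGTA", "AACCGGTT"], PURINE_PYRIMIDINE)
```

The DFT magnitude ignores circular shifts and sign flips, so the first two toy sequences (rotations of each other) come out at distance close to 0. In ML-DSP, a distance matrix of this kind is what the supervised classifiers then operate on.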
Due to its superior classification accuracy, speed, and scalability to large datasets, ML-DSP is highly relevant in the classification of newly discovered organisms, in distinguishing genomic signatures and identifying their mechanistic determinants, and in evaluating genome integrity.
Plant-Based Diets in CKD. Clegg, Deborah J; Hill Gallant, Kathleen M
Clinical Journal of the American Society of Nephrology, 01/2019, Volume 14, Issue 1
Journal Article
Chronic kidney disease (CKD) affects approximately 10% of adults worldwide. Dysregulation of phosphorus homeostasis which occurs in CKD leads to development of CKD-Mineral Bone Disorder (CKD-MBD) and contributes to increased morbidity and mortality in these patients. Phosphorus is regulated by multiple hormones (parathyroid hormone (PTH), 1,25-dihydroxyvitamin D (1,25D), and fibroblast growth factor 23 (FGF23)) and tissues (kidney, intestine, parathyroid glands, and bone) to maintain homeostasis. In health, the kidneys are the major site of regulation for phosphorus homeostasis. However, as kidney function declines, the ability of the kidneys to adequately excrete phosphorus is reduced. The hormonal changes that occur with CKD would suggest that the intestine should compensate for impaired renal phosphorus excretion by reducing fractional intestinal phosphorus absorption. However, limited studies in CKD animal models and patients with CKD suggest that there may be a break in this homeostatic response where the intestine fails to compensate. As many existing therapies for phosphate management in CKD are aimed at reducing absolute intestinal phosphorus absorption, better understanding of the factors that influence fractional and absolute absorption, the mechanism by which intestinal phosphate absorption occurs, and how CKD modifies these is a much-needed area of study.
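The distinction between fractional and absolute absorption is simple arithmetic: absolute absorption equals dietary intake multiplied by the fraction absorbed. The hypothetical numbers below are illustrations only, not data from the source. They show the two separate levers: the homeostatic response acts on the fraction, while lowering intake reduces absolute absorption even when the fraction is unchanged.

```python
def absolute_absorption(intake_mg: float, fractional: float) -> float:
    """Absolute absorption (mg/day) = dietary intake (mg/day) x fractional absorption."""
    if not 0.0 <= fractional <= 1.0:
        raise ValueError("fractional absorption must lie between 0 and 1")
    return intake_mg * fractional


# Hypothetical values chosen only to illustrate the two levers.
baseline = absolute_absorption(1200, 0.65)        # no compensatory response
lower_fraction = absolute_absorption(1200, 0.50)  # intestine reduces the fraction absorbed
lower_intake = absolute_absorption(900, 0.65)     # intake reduced, fraction unchanged
print(baseline, lower_fraction, lower_intake)
```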
Osteoporosis-related bone fragility fractures are a major public health concern. Given the potential for adverse side effects of pharmacological treatment, many have sought alternative treatments, including dietary changes. Based on recent evidence that polyphenol-rich foods, like blueberries, increase calcium absorption and bone mineral density (BMD), we hypothesized that blueberry polyphenols would improve bone biomechanical properties. To test this, 5-month-old ovariectomized Sprague-Dawley rats (n = 10/group) were orally gavaged for 90 days with either a purified extract of blueberry polyphenols (0–1000 mg total polyphenols/kg bw/day) or lyophilized blueberries (50 mg total polyphenols/kg bw/day). Upon completion of the dosing regimen, the right femur, right tibia, and L1–L4 vertebrae were harvested and assessed for BMD, with femurs further analyzed for biomechanical properties via three-point bending. There were no differences in BMD at any of the sites analyzed. For bone mechanical properties, the only statistically significant difference was greater ultimate stress in the high-dose group than in the medium-dose group; in the absence of differences in other measures of bone mechanical properties, we concluded that this result, while statistically significant, had little biological significance. Our results indicate that blueberry polyphenols had little impact on BMD or bone mechanical properties in an animal model of estrogen deficiency-induced bone loss.
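In three-point bending, ultimate stress is usually obtained from the ultimate (maximum) load through elementary beam theory. The peak midspan moment is M = F·L/4, so the stress is σ = M·c/I = F·L·c/(4·I), where L is the support span, c is the distance from the neutral axis to the outer surface, and I is the cross-sectional area moment of inertia. The abstract does not state how the authors obtained c and I (typically from cross-sectional imaging), so the inputs below are hypothetical. The sketch assumes a linear-elastic beam loaded at midspan.

```python
def three_point_bending_stress(load_n: float, span_mm: float, c_mm: float, inertia_mm4: float) -> float:
    """Peak midspan bending stress in MPa: sigma = F * L * c / (4 * I).

    Assumes a linear-elastic beam loaded at midspan. Using the ultimate (maximum)
    load as load_n gives ultimate stress. Units: N, mm, mm, mm^4 -> N/mm^2 = MPa.
    """
    if min(load_n, span_mm, c_mm, inertia_mm4) <= 0:
        raise ValueError("all inputs must be positive")
    return load_n * span_mm * c_mm / (4.0 * inertia_mm4)


# Hypothetical rat-femur-scale inputs, not values from the study.
print(three_point_bending_stress(load_n=100.0, span_mm=20.0, c_mm=1.5, inertia_mm4=5.0))
```

Normalizing by geometry is what lets ultimate stress be compared across bones of different sizes, unlike raw ultimate load.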
Bone calcium balance is the net gain, loss, or equilibrium of calcium moving to and from bone, which reflects bone balance. There are currently no clinically available tools for measuring real-time bone balance. In this issue, Shroff et al. demonstrate the use of natural stable calcium isotope ratios as a novel biomarker of bone balance in children with chronic kidney disease on dialysis that is highly repeatable and associated with radiological and biochemical markers of bone metabolism.
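Natural stable isotope ratios are conventionally reported in delta notation: the per-mil deviation of a sample's isotope ratio (for example, 44Ca/42Ca for δ44/42Ca) from that of a reference standard, δ = (R_sample / R_standard − 1) × 1000. As commonly described in this literature, bone mineralization preferentially takes up lighter calcium isotopes, which is why shifts in δ values in blood or urine can track net bone balance. The ratios below are hypothetical, and the choice of isotope pair and standard is an assumption of this sketch.

```python
def delta_permil(ratio_sample: float, ratio_standard: float) -> float:
    """Delta value in per mil: (R_sample / R_standard - 1) * 1000."""
    if ratio_standard <= 0:
        raise ValueError("reference ratio must be positive")
    return (ratio_sample / ratio_standard - 1.0) * 1000.0


# Hypothetical measured ratios, not data from Shroff et al.
print(round(delta_permil(0.31242, 0.31221), 3))
```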
Abstract
Summary
Machine Learning with Digital Signal Processing and Graphical User Interface (MLDSP-GUI) is an open-source, alignment-free, ultrafast, computationally lightweight, and standalone software tool with an interactive GUI for comparison and analysis of DNA sequences. MLDSP-GUI is a general-purpose tool that can be used for a variety of applications, such as taxonomic classification, disease classification, virus subtype classification, and evolutionary analyses.
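For taxonomic classification, tools in the MLDSP family train supervised classifiers on features derived from a pairwise distance matrix. As a deliberately simple stand-in (an assumption, not the classifiers MLDSP-GUI ships), the sketch below scores a precomputed distance matrix with leave-one-out 1-nearest-neighbour classification: each sequence is labelled with the class of its closest other sequence. The toy matrix and labels are hypothetical.

```python
from typing import Hashable, Sequence


def loo_1nn_accuracy(dist: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Leave-one-out 1-NN accuracy from a square, precomputed distance matrix."""
    n = len(labels)
    if n < 2 or len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError("need an n x n distance matrix and n >= 2 labels")
    correct = 0
    for i in range(n):
        # Nearest other sequence; ties go to the lowest index.
        nearest = min((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        correct += labels[nearest] == labels[i]
    return correct / n


# Hypothetical 4 x 4 distance matrix with two tight groups.
toy = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]]
print(loo_1nn_accuracy(toy, ["x", "x", "y", "y"]))
```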
Availability and implementation
MLDSP-GUI is open-source, cross-platform compatible, and is available under the terms of the Creative Commons Attribution 4.0 International license (http://creativecommons.org/licenses/by/4.0/). The executable and dataset files are available at https://sourceforge.net/projects/mldsp-gui/.
Supplementary information
Supplementary data are available at Bioinformatics online.