Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them.
SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data.
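The sensitivity and specificity figures above can be made concrete with a small sketch. It assumes the convention common in gene-prediction benchmarks, where sensitivity is the fraction of reference genes recovered and specificity is the fraction of predicted genes that match a reference gene. The abstract does not state which definition or matching criterion SnowyOwl's evaluation used, so the exact-match rule, the function name `gene_level_accuracy` and the toy coordinates below are illustrative assumptions only.

```python
def gene_level_accuracy(predicted, reference):
    """Return (sensitivity, specificity) under an exact-match criterion.

    Assumption: a predicted gene counts as correct only if its contig,
    strand and every exon boundary equal those of a reference gene.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Sensitivity: share of reference genes that were predicted exactly.
    sensitivity = true_positives / len(reference) if reference else 0.0
    # Specificity (gene-prediction sense): share of predictions that are correct.
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical gene models: (contig, strand, ((exon_start, exon_end), ...))
reference = {
    ("contig1", "+", ((100, 400), (500, 900))),
    ("contig1", "-", ((1500, 2100),)),
    ("contig2", "+", ((50, 350), (420, 800), (900, 1200))),
    ("contig2", "-", ((3000, 3600),)),
}
predicted = {
    ("contig1", "+", ((100, 400), (500, 900))),
    ("contig1", "-", ((1500, 2100),)),
    ("contig2", "+", ((50, 350), (420, 800), (900, 1200))),
    ("contig2", "-", ((3000, 3500),)),  # wrong exon end, so not a match
    ("contig3", "+", ((10, 300),)),     # no reference counterpart
}

sensitivity, specificity = gene_level_accuracy(predicted, reference)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Running more prediction rounds can only add candidates, which tends to raise sensitivity; a selection step that discards unsupported candidates is what protects the second ratio.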
SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) are freely available from http://sourceforge.net/projects/snowyowl/.
Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results.
To address these challenges, we have developed the Mercury analysis pipeline and deployed it on local hardware and in the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts.
By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.
We describe a comprehensive genomic characterization of adrenocortical carcinoma (ACC). Using this dataset, we expand the catalogue of known ACC driver genes to include PRKAR1A, RPL22, TERF2, CCNE1, and NF1. Genome-wide DNA copy-number analysis revealed frequent occurrence of massive DNA loss followed by whole-genome doubling (WGD), which was associated with aggressive clinical course, suggesting WGD is a hallmark of disease progression. Corroborating this hypothesis were increased TERT expression, decreased telomere length, and activation of cell-cycle programs. Integrated subtype analysis identified three ACC subtypes with distinct clinical outcome and molecular alterations which could be captured by a 68-CpG probe DNA-methylation signature, proposing a strategy for clinical stratification of patients based on molecular markers.
• Standardized molecular data from 91 cases of adrenocortical carcinoma
• Driver genes including TP53, ZNRF3, CTNNB1, PRKAR1A, CCNE1, and TERF2
• Whole-genome doubling event is a marker for ACC progression
• Three prognostic molecular subtypes captured by a DNA-methylation signature
Zheng et al. perform comprehensive genomic characterization of 91 cases of adrenocortical carcinoma (ACC). This analysis expands the list of driver genes in ACC, reveals whole-genome doubling as a hallmark of ACC progression, and identifies three ACC subtypes with distinct clinical outcome.
Diverse epidemiological factors are associated with hepatocellular carcinoma (HCC) prevalence in different populations. However, the global landscape of the genetic changes in HCC genomes underpinning different epidemiological and ancestral backgrounds still remains uncharted. Here a collection of data from 503 liver cancer genomes from different populations uncovered 30 candidate driver genes and 11 core pathway modules. Furthermore, a collaboration of two large-scale cancer genome projects comparatively analyzed the trans-ancestry substitution signatures in 608 liver cancer cases and identified unique mutational signatures that predominantly contribute to Asian cases. This work elucidates previously unexplored ancestry-associated mutational processes in HCC development. A combination of hotspot TERT promoter mutation, TERT focal amplification and viral genome integration occurs in more than 68% of cases, implicating TERT as a central and ancestry-independent node of hepatocarcinogenesis. Newly identified alterations in genes encoding metabolic enzymes, chromatin remodelers and a high proportion of mTOR pathway activations offer potential therapeutic and diagnostic opportunities.
Altered gut microbiota community dynamics are implicated in diverse human diseases including inflammatory disorders such as neuro-Behçet's disease (NBD) and multiple sclerosis (MS). Traditionally, microbiota communities are analysed uniformly across control and disease groups, but recent reports of subsample clustering indicate a potential need for analytical stratification. The objectives of this study are to analyse and compare faecal microbiota community signatures of ethno-geographical, age and gender matched adult healthy controls (HC), MS and NBD individuals.
Faecal microbiota community compositions in adult HC (n=14), NBD patients (n=13) and MS (n=13) were analysed by 16S rRNA gene sequencing and standard bioinformatics pipelines. Bipartite networks were then used to identify and re-analyse dominant compositional clusters in respective groups.
We identified Prevotella and Bacteroides dominated subsample clusters in HC, MS, and NBD cohorts. Our study confirmed previous reports that Prevotella is a major dysbiotic target in these diseases. We demonstrate that subsample stratification is required to identify significant disease-associated microbiota community shifts with increased Clostridiales evident in Prevotella-stratified NBD and Bacteroides-stratified MS patients.
Patient cohort stratification may be needed to facilitate identification of common microbiota community shifts for causation testing in disease.
Accurate diagnosis and stratification of children with irritable bowel syndrome (IBS) remain challenging. Given the central role of recurrent abdominal pain in IBS, we evaluated the relationships of pediatric IBS and abdominal pain with intestinal microbes and fecal metabolites using a comprehensive clinical characterization and multiomics strategy. Using rigorous clinical phenotyping, we identified preadolescent children (aged 7 to 12 years) with Rome III IBS (n = 23) and healthy controls (n = 22) and characterized their fecal microbial communities using whole-genome shotgun metagenomics and global unbiased fecal metabolomic profiling. Correlation-based approaches and machine learning algorithms identified associations between microbes, metabolites, and abdominal pain. IBS cases differed from controls with respect to key bacterial taxa (eg, Flavonifractor plautii and Lachnospiraceae bacterium 7_1_58FAA), metagenomic functions (eg, carbohydrate metabolism and amino acid metabolism), and higher-order metabolites (eg, secondary bile acids, sterols, and steroid-like compounds). Significant associations between abdominal pain frequency and severity and intestinal microbial features were identified. A random forest classifier built on metagenomic and metabolic markers successfully distinguished IBS cases from controls (area under the curve, 0.93). Leveraging multiple lines of evidence, intestinal microbes, genes/pathways, and metabolites were associated with IBS, and these features were capable of distinguishing children with IBS from healthy children. These multi-omics features, and their links to childhood IBS coupled with nutritional interventions, may lead to new microbiome-guided diagnostic and therapeutic strategies.
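The area under the curve reported for the classifier has a direct probabilistic reading: it is the chance that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes it that way from scores and labels. It is a generic illustration and not the study's analysis code; the function name `roc_auc` and the toy scores are hypothetical, and no claim is made about how the published value of 0.93 was estimated or cross-validated.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for a case, 0 for a control.
    scores: classifier output, higher meaning "more likely a case".
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    # Each case/control pair scores 1 if ranked correctly, 0.5 if tied.
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted probabilities for five children.
toy_scores = [0.90, 0.80, 0.40, 0.35, 0.10]
toy_labels = [1, 1, 0, 1, 0]
auc = roc_auc(toy_scores, toy_labels)
print(f"AUC = {auc:.3f}")
```

With cohorts of a few dozen participants, the number of case/control pairs is small, so an AUC estimated this way is best read together with an interval or a held-out evaluation.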
Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primate in biomedical research, have the largest natural geographic distribution of any nonhuman primate, and have been the focus of much evolutionary and behavioral investigation. Consequently, rhesus macaques are one of the most thoroughly studied nonhuman primate species. However, little is known about genome-wide genetic variation in this species. A detailed understanding of extant genomic variation among rhesus macaques has implications for the use of this species as a model for studies of human health and disease, as well as for evolutionary population genomics. Whole-genome sequencing analysis of 133 rhesus macaques revealed more than 43.7 million single-nucleotide variants, including thousands predicted to alter protein sequences, transcript splicing, and transcription factor binding sites. Rhesus macaques exhibit 2.5-fold higher overall nucleotide diversity and slightly elevated putative functional variation compared with humans. This functional variation in macaques provides opportunities for analyses of coding and noncoding variation, and its cellular consequences. Despite modestly higher levels of nonsynonymous variation in the macaques, the estimated distribution of fitness effects and the ratio of nonsynonymous to synonymous variants suggest that purifying selection has had stronger effects in rhesus macaques than in humans. Demographic reconstructions indicate this species has experienced a consistently large but fluctuating population size. Overall, the results presented here provide new insights into the population genomics of nonhuman primates and expand genomic information directly relevant to primate models of human disease.
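Nucleotide diversity, the quantity compared between macaques and humans above, is the average number of differences per site between two chromosomes drawn from the sample. The sketch below uses the standard unbiased estimator for biallelic sites. The abstract does not describe how the study computed its estimate, so the function name `nucleotide_diversity`, its inputs and the toy counts are assumptions made for illustration.

```python
def nucleotide_diversity(alt_allele_counts, n_chromosomes, n_sites):
    """Per-site nucleotide diversity (pi) from biallelic variant counts.

    alt_allele_counts: alternate-allele count at each variable site.
    n_chromosomes: chromosomes sampled (2 per diploid individual).
    n_sites: total callable sites, variable or not.
    """
    if n_chromosomes < 2 or n_sites <= 0:
        raise ValueError("need at least two chromosomes and one site")
    total = 0.0
    for count in alt_allele_counts:
        p = count / n_chromosomes
        # Expected pairwise difference at this site, with the
        # n / (n - 1) correction for sampling without replacement.
        total += 2.0 * p * (1.0 - p) * n_chromosomes / (n_chromosomes - 1)
    return total / n_sites


# Toy example: 4 chromosomes, 10 callable sites, 2 of them variable.
pi = nucleotide_diversity([1, 2], n_chromosomes=4, n_sites=10)
print(f"pi = {pi:.4f}")
```

Invariant sites contribute zero to the sum but still count in `n_sites`, which is why the size of the callable genome matters as much as the variant list when comparing species.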
e15549
Background: Liquid biopsy is a candidate for detection of minimal residual disease (MRD) in early colorectal cancer (CRC). Detection of circulating tumor (ct)DNA in early stage cancer is challenging given the low abundance of ctDNA molecules. We developed a tissue agnostic MRD Test to optimize sensitivity and specificity of ctDNA detection by integrating tumor-specific genomic mutation and DNA methylation signatures coupled to noise-reduction of non-tumor background cfDNA signals.
Methods: This MRD Test was developed following CLIA, Nex-StoCT Working Group, and AMP/CAP guidance and validation principles. Using a single input plasma sample, data from somatic and epigenomic targeted capture panels are integrated to generate a final test result: ctDNA detected or ctDNA not detected. ctDNA detected is defined by the presence of mutation or epigenomic classifier inputs exceeding a defined threshold. Quality control (QC) measures were implemented in both process and final sequencing results. Analytical specificity, sensitivity, and accuracy were determined using 130 samples.
Results: The MRD Test specificity was 95% (70/74) using colonoscopy screened negative samples. A retrospective review of the clinical histories of the false-positive samples showed high risk clinical features such as smoking, family history of CRC, and sub-optimal bowel prep prior to the colonoscopy. Analytical sensitivity (LoD) was established by diluting late stage CRC samples at two clinically relevant cfDNA inputs at three dilutions (30ng or 10ng, ranging 0.1% to 0.75% tumor fraction). The 95% LoD was determined to be 0.3% for 10ng and < 0.1% for 30ng, the lowest level tested for each input. Accuracy was determined by diluting advanced CRC samples to 0.5-0.75% MAF tested at 30ng or 10ng cfDNA with a detection result of 100% (38/38).
Conclusions: We present a CLIA validated assay with performance characteristics sufficient to detect residual disease in Stage II/III CRC post-curative intent therapy. This assay is currently being used in multiple interventional clinical trials.
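The reported proportions can be checked with a few lines of arithmetic. The sketch below recomputes the specificity (70/74) and the detection rate (38/38) and attaches a Wilson score interval to each. The interval is an addition made here to show the uncertainty carried by sample sizes of this order; the abstract reports no confidence intervals, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total
                         + z * z / (4.0 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# Counts taken from the Results section of the abstract.
specificity = 70 / 74
spec_low, spec_high = wilson_interval(70, 74)
detection = 38 / 38
det_low, det_high = wilson_interval(38, 38)

print(f"specificity {specificity:.1%} (interval {spec_low:.1%} to {spec_high:.1%})")
print(f"detection   {detection:.1%} (interval {det_low:.1%} to {det_high:.1%})")
```

The Wilson form is used because it stays well behaved when the observed proportion is at or near 100%, where the simpler normal approximation collapses to a zero-width interval.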
Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods.
We demonstrate Parliament's efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus.
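One common way to decide that calls from different technologies describe the same event is a reciprocal-overlap test, sketched below. This is a generic illustration of consensus SV calling and not Parliament's actual merging logic, which the abstract does not describe; the `Call` record, the 50% overlap threshold, the two-source requirement and the toy coordinates are all assumptions.

```python
from typing import NamedTuple


class Call(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    svtype: str  # e.g. "DEL", "DUP"
    source: str  # data type or caller that produced the call


def reciprocal_overlap(a, b):
    """Smaller of the two fractions of each call covered by the other."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    if a.end <= a.start or b.end <= b.start:
        return 0.0  # zero-length calls (e.g. insertions) are not handled here
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def supported_calls(calls, min_overlap=0.5, min_sources=2):
    """Keep calls corroborated by at least `min_sources` distinct sources."""
    kept = []
    for call in calls:
        sources = {call.source}
        for other in calls:
            if reciprocal_overlap(call, other) >= min_overlap:
                sources.add(other.source)
        if len(sources) >= min_sources:
            kept.append(call)
    return kept


# Hypothetical calls from three data types.
calls = [
    Call("chr1", 1000, 2000, "DEL", "short_read"),
    Call("chr1", 1100, 2100, "DEL", "long_read"),   # 90% reciprocal overlap
    Call("chr1", 50000, 50400, "DEL", "array"),     # no corroboration
    Call("chr2", 1000, 2000, "DEL", "long_read"),   # different chromosome
]
consensus = supported_calls(calls)
print(len(consensus), "of", len(calls), "calls have multi-source support")
```

A strict multi-source rule like this one would discard events that only one technology can see, so a real pipeline needs a separate path for well-supported single-source calls such as those found only in long-read data.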
HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.