BACKGROUND: The mountain pine beetle, Dendroctonus ponderosae Hopkins, is the most serious insect pest of western North American pine forests. A recent outbreak destroyed more than 15 million hectares of pine forests, with major environmental effects on forest health and economic effects on the forest industry. The outbreak has in part been driven by climate change, and will contribute to increased carbon emissions through decaying forests. RESULTS: We developed a genome sequence resource for the mountain pine beetle to better understand the unique aspects of this insect's biology. A draft de novo genome sequence was assembled from paired-end, short-read sequences from an individual field-collected male pupa, and scaffolded using mate-paired, short-read genomic sequences from pooled field-collected pupae, paired-end short-insert whole-transcriptome shotgun sequencing reads of mRNA from adult beetle tissues, and paired-end Sanger EST sequences from various life stages. We describe the cytochrome P450, glutathione S-transferase, and plant cell wall-degrading enzyme gene families important to the survival of the mountain pine beetle in its harsh and nutrient-poor host environment, and examine genome-wide single-nucleotide polymorphism variation. A horizontally transferred bacterial sucrose-6-phosphate hydrolase was evident in the genome, and its tissue-specific transcription suggests a functional role for this beetle. CONCLUSIONS: Despite Coleoptera being the largest insect order with over 400,000 described species, including many agricultural and forest pest species, this is only the second genome sequence reported in Coleoptera, and will provide an important resource for the Curculionoidea and other insects.
Insights into Conifer Giga-Genomes
De La Torre, Amanda R.; Birol, Inanc; Bousquet, Jean ...
Plant physiology (Bethesda), 12/2014, Volume 166, Issue 4
Journal Article
Peer reviewed
Open access
Insights from sequenced genomes of major land plant lineages have advanced research in almost every aspect of plant biology. Until recently, however, assembled genome sequences of gymnosperms have been missing from this picture. Conifers of the pine family (Pinaceae) are a group of gymnosperms that dominate large parts of the world's forests. Despite their ecological and economic importance, conifers long seemed out of reach for complete genome sequencing, due in part to their enormous genome size (20-30 Gb) and the highly repetitive nature of their genomes. Technological advances in genome sequencing and assembly enabled the recent publication of three conifer genomes: white spruce (Picea glauca), Norway spruce (Picea abies), and loblolly pine (Pinus taeda). These genome sequences revealed distinctive features compared with other plant genomes and may represent a window into the past of seed plant genomes. This Update highlights recent advances, remaining challenges, and opportunities in light of the publication of the first conifer and gymnosperm genomes.
In clinical oncology, many diagnostic tasks rely on the identification of cells in histopathology images. While supervised machine learning techniques require labels, providing manual cell annotations is time-consuming. In this paper, we propose a self-supervised framework (enVironment-aware cOntrastive cell represenTation learning: VOLTA) for cell representation learning in histopathology images using a technique that accounts for the cell's mutual relationship with its environment. We subject our model to extensive experiments on data collected from multiple institutions comprising over 800,000 cells and six cancer types. To showcase the potential of our proposed framework, we apply VOLTA to ovarian and endometrial cancers and demonstrate that our cell representations can be utilized to identify the known histotypes of ovarian cancer and provide insights that link histopathology and molecular subtypes of endometrial cancer. Unlike supervised models, we provide a framework that can empower discoveries without any annotation data, even in situations where sample sizes are limited.
JAGuaR is an alignment protocol for RNA-seq reads that uses an extended reference to increase alignment sensitivity. It uses BWA to align reads to the genome and reference transcript models (including annotated exon-exon junctions), specifically allowing for the possibility of a single read spanning multiple exons. Reads aligned to the transcript models are then remapped onto genomic coordinates, transforming alignments that span multiple exons into large-gapped alignments on the genome. While JAGuaR does not detect novel junctions, we demonstrate how JAGuaR generates fast and accurate transcriptome alignments, which allows for both sensitive and specific SNV calling.
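The remapping step described above can be illustrated with a minimal Python sketch. It is not JAGuaR's actual code: the function name `transcript_to_genome`, the exon coordinates, and the assumption of forward-strand, 0-based half-open exon intervals are all hypothetical choices made for illustration. It projects an ungapped alignment in transcript coordinates onto the genome and emits a CIGAR string in which each skipped intron becomes an `N` operation, which is how a multi-exon alignment turns into a large-gapped genomic alignment.

```python
def transcript_to_genome(exons, t_start, read_len):
    """Project an ungapped transcript-space alignment onto the genome.

    exons:    list of (start, end) genomic intervals, 0-based half-open,
              sorted, forward strand (assumptions of this sketch).
    t_start:  0-based offset of the read within the spliced transcript.
    read_len: number of aligned bases.
    Returns (genomic_start, cigar_string).
    """
    cigar = []
    genomic_start = None
    remaining = read_len
    offset = t_start      # transcript bases to skip before the read begins
    prev_end = None       # genomic end of the last exon the read touched

    for start, end in exons:
        if genomic_start is None:
            exon_len = end - start
            if offset >= exon_len:
                # The read starts in a later exon; skip this one entirely.
                offset -= exon_len
                continue
            genomic_start = start + offset
            seg_start = genomic_start
        else:
            # The read continues into the next exon: the intron is a gap.
            cigar.append(f"{start - prev_end}N")
            seg_start = start

        take = min(remaining, end - seg_start)
        cigar.append(f"{take}M")
        remaining -= take
        prev_end = end
        if remaining == 0:
            break

    if genomic_start is None or remaining > 0:
        raise ValueError("read extends past the end of the transcript")
    return genomic_start, "".join(cigar)


# Hypothetical three-exon transcript model.
exons = [(100, 150), (200, 260), (400, 450)]
print(transcript_to_genome(exons, 40, 30))   # (140, '10M50N20M')
```

A real implementation would also have to handle reverse-strand transcripts, indels within the transcript alignment, and reads that align equally well to the genome and to a transcript model; none of that is attempted here.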
Imprinting is a critical part of normal embryonic development in mammals, controlled by defined parent-of-origin (PofO) differentially methylated regions (DMRs) known as imprinting control regions. Direct nanopore sequencing of DNA provides a means to detect allelic methylation and to overcome the drawbacks of methylation array and short-read technologies. Here, we used publicly available nanopore sequencing data for 12 standard B-lymphocyte cell lines to acquire the genome-wide mapping of imprinted intervals in humans. Using the sequencing data, we were able to phase 95% of the human methylome and detect 94% of the previously well-characterized, imprinted DMRs. In addition, we found 42 novel imprinted DMRs (16 germline and 26 somatic), which were confirmed using whole-genome bisulfite sequencing (WGBS) data. Analysis of WGBS data in mouse (Mus musculus), rhesus monkey (Macaca mulatta), and chimpanzee (Pan troglodytes) suggested that 17 of these imprinted DMRs are conserved. Some of the novel imprinted intervals are within or close to imprinted genes without a known DMR. We also detected subtle parental methylation bias, spanning several kilobases at seven known imprinted clusters. At these blocks, hypermethylation occurs at the gene body of expressed allele(s) with mutually exclusive H3K36me3 and H3K27me3 allelic histone marks. These results expand upon our current knowledge of imprinting and the potential of nanopore sequencing to identify imprinting regions using only parent-offspring trios, as opposed to the large multi-generational pedigrees that have previously been required.
The role of routine whole genome and transcriptome analysis (WGTA) for poor prognosis pediatric cancers remains undetermined. Here, we characterize somatic mutations, structural rearrangements, copy number variants, gene expression, immuno-profiles and germline cancer predisposition variants in children and adolescents with relapsed, refractory or poor prognosis malignancies who underwent somatic WGTA and matched germline sequencing. Seventy-nine participants with a median age at enrollment of 8.8 y (range 6 months to 21.2 y) are included. Germline pathogenic/likely pathogenic variants are identified in 12% of participants, of which 60% were not previously known. Therapeutically actionable variants are identified by targeted gene report and whole genome in 32% and 62% of participants, respectively, and increase to 96% after integrating transcriptome analyses. Thirty-two molecularly informed therapies are pursued in 28 participants, with a clinical benefit rate of 54% (objective response or stable disease ≥6 months). Integrated WGTA identifies therapeutically actionable variants in almost all tumors and is directly translatable to clinical care of children with poor prognosis cancers.
14-3-3 proteins are ubiquitously expressed regulators of various cellular functions, including proliferation, metabolism, and differentiation, and altered 14-3-3 expression is associated with development and progression of cancer. We report a transforming 14-3-3 oncoprotein, which we identified through conventional cytogenetics and whole-transcriptome sequencing analysis as a highly recurrent genetic mechanism in a clinically aggressive form of uterine sarcoma: high-grade endometrial stromal sarcoma (ESS). The 14-3-3 oncoprotein results from a t(10;17) genomic rearrangement, leading to fusion between 14-3-3ε (YWHAE) and either of two nearly identical FAM22 family members (FAM22A or FAM22B). Expression of YWHAE-FAM22 fusion oncoproteins was demonstrated by immunoblot in t(10;17)-bearing frozen tumor and cell line samples. YWHAE-FAM22 fusion gene knockdowns were performed with shRNAs and siRNAs targeting various FAM22A exons in a t(10;17)-bearing ESS cell line (ESS1); fusion protein expression was inhibited, with corresponding reduction in cell growth and migration. YWHAE-FAM22 maintains a structurally and functionally intact 14-3-3ε (YWHAE) protein-binding domain, which is directed to the nucleus by a FAM22 nuclear localization sequence. In contrast to classic ESS, harboring JAZF1 genetic fusions, YWHAE-FAM22 ESS display high-grade histologic features, a distinct gene-expression profile, and a more aggressive clinical course. Fluorescence in situ hybridization analysis demonstrated absolute specificity of YWHAE-FAM22A/B genetic rearrangement for high-grade ESS, with no fusions detected in other uterine and nonuterine mesenchymal tumors (55 tumor types, n = 827). These discoveries reveal diagnostically and therapeutically relevant models for characterizing aberrant 14-3-3 oncogenic functions.
Ovarian carcinoma has the highest mortality of all female reproductive cancers and current treatment has become histotype-specific. Pathologists diagnose five common histotypes by microscopic examination; however, histotype determination is not straightforward, with only moderate interobserver agreement between general pathologists (Cohen's kappa 0.54-0.67). We hypothesized that machine learning (ML)-based image classification models may be able to recognize ovarian carcinoma histotype sufficiently well that they could aid pathologists in diagnosis. We trained four different artificial intelligence (AI) algorithms based on deep convolutional neural networks to automatically classify hematoxylin and eosin-stained whole slide images. Performance was assessed through cross-validation on the training set (948 slides corresponding to 485 patients), and on an independent test set of 60 patients from another institution. The best-performing model achieved a diagnostic concordance of 81.38% (Cohen's kappa of 0.7378) in our training set, and 80.97% concordance (Cohen's kappa 0.7547) on the external dataset. Eight cases misclassified by ML in the external set were reviewed by two subspecialty pathologists blinded to the diagnoses, molecular and immunophenotype data, and ML-based predictions. Interestingly, in 4 of 8 cases from the external dataset, the expert review pathologists rendered diagnoses, based on blind review of the whole section slides classified by AI, that were in agreement with AI rather than the integrated reference diagnosis. The performance characteristics of our classifiers indicate potential for improved diagnostic performance if used as an adjunct to conventional histopathology.
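For readers unfamiliar with the agreement statistic quoted above, the following Python sketch shows how Cohen's kappa relates to raw concordance. It is an illustration only: the function name `cohens_kappa` and the example labels are made up, and the sketch does not reproduce the study's evaluation pipeline. Kappa rescales the observed agreement by the agreement expected by chance from each rater's label frequencies, which is why a concordance near 81% can correspond to a kappa near 0.74.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of cases where both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over classes of the product of each rater's
    # marginal frequency for that class.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Made-up labels for four cases: reference diagnosis vs. model prediction.
reference = ["X", "X", "Y", "Y"]
predicted = ["X", "Y", "Y", "Y"]
print(cohens_kappa(reference, predicted))   # 0.5 (concordance is 0.75)
```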
Pink salmon (Oncorhynchus gorbuscha) adults are the smallest of the five Pacific salmon native to the western Pacific Ocean. Pink salmon are also the most abundant of these species and account for a large proportion of the commercial value of the salmon fishery worldwide. A two-year life history of pink salmon generates temporally isolated populations that spawn either in even-years or odd-years. To uncover the influence of this genetic isolation, reference genome assemblies were generated for each year-class and whole genome re-sequencing data were collected from salmon of both year-classes. The salmon were sampled from six Canadian rivers and one Japanese river. At multiple centromeres we identified peaks of Fst between year-classes that were millions of base-pairs long. The largest Fst peak was also associated with a million base-pair chromosomal polymorphism found in the odd-year genome near a centromere. These Fst peaks may be the result of a centromere drive or a combination of reduced recombination and genetic drift, and they could influence speciation. Other regions of the genome influenced by odd-year and even-year temporal isolation and tentatively under selection were mostly associated with genes related to immune function, organ development/maintenance, and behaviour.
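The Fst statistic mentioned above can be sketched for a single biallelic site as follows. This is a textbook Wright-style calculation under stated assumptions (two populations weighted equally, allele frequencies known exactly); the abstract does not say which estimator the authors used, and the function name `fst_two_pops` and the example frequencies are hypothetical. Fst compares the mean within-population heterozygosity to the heterozygosity of the pooled population, so it approaches 1 when the two year-classes are fixed for different alleles and 0 when their frequencies match.

```python
def fst_two_pops(p1, p2):
    """Wright-style Fst at one biallelic site for two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")

    # Mean expected heterozygosity within populations.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2

    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)

    if h_total == 0.0:
        # Site is monomorphic overall; Fst is undefined, and this sketch
        # reports 0.0 by its own convention.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in even-year and odd-year fish.
print(fst_two_pops(0.2, 0.8))
```

Genome scans of the kind described would typically average such per-site values in sliding windows and use sample-size-corrected estimators; that is beyond this sketch.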
Follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) are the two most common non-Hodgkin lymphomas (NHLs). Here we sequenced tumour and matched normal DNA from 13 DLBCL cases and one FL case to identify genes with mutations in B-cell NHL. We analysed RNA-seq data from these and another 113 NHLs to identify genes with candidate mutations, and then re-sequenced tumour and matched normal DNA from these cases to confirm 109 genes with multiple somatic mutations. Genes with roles in histone modification were frequent targets of somatic mutation. For example, 32% of DLBCL and 89% of FL cases had somatic mutations in MLL2, which encodes a histone methyltransferase, and 11.4% and 13.4% of DLBCL and FL cases, respectively, had mutations in MEF2B, a calcium-regulated gene that cooperates with CREBBP and EP300 in acetylating histones. Our analysis suggests a previously unappreciated disruption of chromatin biology in lymphomagenesis.