Many genomes have been sequenced to high-quality draft status using Sanger capillary electrophoresis and/or newer short-read sequence data and whole genome assembly techniques. However, even the best draft genomes contain gaps and other imperfections due to limitations in the input data and the techniques used to build draft assemblies. Sequencing biases, repetitive genomic features, genomic polymorphism, and other complicating factors all come together to make some regions difficult or impossible to assemble. Traditionally, draft genomes were upgraded to "phase 3 finished" status using time-consuming and expensive Sanger-based manual finishing processes. For more facile assembly and automated finishing of draft genomes, we present here an automated approach to finishing using long reads from the Pacific Biosciences RS (PacBio) platform. Our algorithm and associated software tool, PBJelly (publicly available at https://sourceforge.net/projects/pb-jelly/), automates the finishing process using long sequence reads in a reference-guided assembly process. PBJelly also provides "lift-over" coordinate tables to easily port existing annotations to the upgraded assembly. Using PBJelly and long PacBio reads, we upgraded the draft genome sequences of a simulated Drosophila melanogaster, the version 2 draft Drosophila pseudoobscura, an assembly of the Assemblathon 2.0 budgerigar dataset, and a preliminary assembly of the Sooty mangabey. With 24× mapped coverage of PacBio long reads, we addressed 99% of gaps and were able to close 69% and improve 12% of all gaps in D. pseudoobscura. With 4× mapped coverage of PacBio long reads, we saw reads address 63% of gaps in our budgerigar assembly, of which 32% were closed and 63% improved. With 6.8× mapped coverage of mangabey PacBio long reads, we addressed 97% of gaps, closed 66% of addressed gaps, and improved 19%. The accuracy of gap closure was validated by comparison to Sanger sequencing on gaps from the original D. pseudoobscura draft assembly and shown to be dependent on initial reference quality.
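To make the notion of a "gap" in a draft assembly concrete, here is a minimal Python sketch that locates gaps in a sequence string. It assumes gaps are encoded as runs of N, which is a common convention in draft assemblies; it is not PBJelly's algorithm, and the function names `find_gaps` and `gap_summary` are hypothetical.

```python
import re


def find_gaps(sequence, min_len=1):
    """Return half-open (start, end) intervals for each run of N/n.

    Assumption: gaps are represented as runs of 'N', as is common in
    draft assemblies. This is an illustration, not PBJelly's method.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len
    ]


def gap_summary(sequence):
    """Summarize the number of gaps and the share of bases they occupy."""
    gaps = find_gaps(sequence)
    gap_bases = sum(end - start for start, end in gaps)
    return {
        "gap_count": len(gaps),
        "gap_bases": gap_bases,
        "gap_fraction": gap_bases / len(sequence) if sequence else 0.0,
    }


if __name__ == "__main__":
    # Toy scaffold with two gaps (hypothetical data).
    draft = "ACGTNNNNNACGTACGTNNACGT"
    print(find_gaps(draft))
    print(gap_summary(draft))
```

Running the same summary on an assembly before and after finishing gives a rough count of how many gaps were removed, although it cannot distinguish a closed gap from one that was only shortened.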
A golden goat genome
Worley, Kim C
Nature genetics, 04/2017, Volume 49, Issue 4
Journal Article
Peer reviewed
Open access
The newly described de novo goat genome sequence is the most contiguous diploid vertebrate assembly generated thus far using whole-genome assembly and scaffolding methods. The contiguity of this assembly is approaching that of the finished human and mouse genomes and suggests an affordable roadmap to high-quality references for thousands of species.
Our understanding of the evolutionary history of primates is undergoing continual revision due to ongoing genome sequencing efforts. Bolstered by growing fossil evidence, these data have led to increased acceptance of once controversial hypotheses regarding phylogenetic relationships, hybridization and introgression, and the biogeographical history of primate groups. Among these findings is a pattern of recent introgression between species within all major primate groups examined to date, though little is known about introgression deeper in time. To address this and other phylogenetic questions, here, we present new reference genome assemblies for 3 Old World monkey (OWM) species: Colobus angolensis ssp. palliatus (the black and white colobus), Macaca nemestrina (southern pig-tailed macaque), and Mandrillus leucophaeus (the drill). We combine these data with 23 additional primate genomes to estimate both the species tree and individual gene trees using thousands of loci. While our species tree is largely consistent with previous phylogenetic hypotheses, the gene trees reveal high levels of genealogical discordance associated with multiple primate radiations. We use strongly asymmetric patterns of gene tree discordance around specific branches to identify multiple instances of introgression between ancestral primate lineages. In addition, we exploit recent fossil evidence to perform fossil-calibrated molecular dating analyses across the tree. Taken together, our genome-wide data help to resolve multiple contentious sets of relationships among primates, while also providing insight into the biological processes and technical artifacts that led to the disagreements in the first place.
Chemosensory-related gene (CRG) families have been studied extensively in insects, but their evolutionary history across the Arthropoda had remained relatively unexplored. Here, we address current hypotheses and prior conclusions on CRG family evolution using a more comprehensive data set. In particular, odorant receptors were hypothesized to have proliferated during terrestrial colonization by insects (hexapods), but their association with other pancrustacean clades and with independent terrestrial colonizations in other arthropod subphyla has been unclear. We also examine hypotheses on which arthropod CRG family is most ancient. Thus, we reconstructed phylogenies of CRGs, including those from new arthropod genomes and transcriptomes, and mapped CRG gains and losses across arthropod lineages. Our analysis was strengthened by including crustaceans, especially copepods, which reside outside the hexapod/branchiopod clade within the subphylum Pancrustacea. We generated the first high-resolution genome sequence of the copepod Eurytemora affinis and annotated its CRGs. We found odorant receptors and odorant binding proteins present only in hexapods (insects) and absent from all other arthropod lineages, indicating that they are not universal adaptations to land. Gustatory receptors likely represent the oldest chemosensory receptors among CRGs, dating back to the Placozoa. We also clarified and confirmed the evolutionary history of antennal ionotropic receptors across the Arthropoda. All antennal ionotropic receptors in E. affinis were expressed more highly in males than in females, suggestive of an association with male mate-recognition behavior. This study is the most comprehensive comparative analysis to date of CRG family evolution across the largest and most speciose metazoan phylum, Arthropoda.
Analysing population genomic data from killer whale ecotypes, which we estimate have globally radiated within less than 250,000 years, we show that genetic structuring including the segregation of potentially functional alleles is associated with socially inherited ecological niche. Reconstruction of ancestral demographic history revealed bottlenecks during founder events, likely promoting ecological divergence and genetic drift resulting in a wide range of genome-wide differentiation between pairs of allopatric and sympatric ecotypes. Functional enrichment analyses provided evidence for regional genomic divergence associated with habitat, dietary preferences and post-zygotic reproductive isolation. Our findings are consistent with expansion of small founder groups into novel niches by an initial plastic behavioural response, perpetuated by social learning imposing an altered natural selection regime. The study constitutes an important step towards an understanding of the complex interaction between demographic history, culture, ecological adaptation and evolution at the genomic level.
A comprehensive transcriptome analysis has been performed on protein-coding RNAs of Strongylocentrotus purpuratus, including 10 different embryonic stages, six feeding larval and metamorphosed juvenile stages, and six adult tissues. In this study, we pooled the transcriptomes from all of these sources and focused on the insights they provide for gene structure in the genome of this recently sequenced model system. The genome had initially been annotated by use of computational gene model prediction algorithms. A large fraction of these predicted genes were recovered in the transcriptome when the reads were mapped to the genome and appropriately filtered and analyzed. However, in a manually curated subset, we discovered that more than half the computational gene model predictions were imperfect, containing errors such as missing exons, prediction of nonexistent exons, erroneous intron/exon boundaries, fusion of adjacent genes, and prediction of multiple genes from single genes. The transcriptome data have been used to provide a systematic upgrade of the gene model predictions throughout the genome, very greatly improving the research usability of the genomic sequence. We have constructed new public databases that incorporate information from the transcriptome analyses. The transcript-based gene model data were used to define average structural parameters for S. purpuratus protein-coding genes. In addition, we constructed a custom sea urchin gene ontology, and assigned about 7000 different annotated transcripts to 24 functional classes. Strong correlations became evident between given functional ontology classes and structural properties, including gene size, exon number, and exon and intron size.
The value of new genome references
Worley, Kim C.; Richards, Stephen; Rogers, Jeffrey
Experimental cell research, 09/2017, Volume 358, Issue 2
Journal Article
Peer reviewed
Open access
Genomic information has become a ubiquitous and almost essential aspect of biological research. Over the last 10–15 years, the cost of generating sequence data from DNA or RNA samples has dramatically declined and our ability to interpret those data has increased just as remarkably. Although it is still possible for biologists to conduct interesting and valuable research on species for which genomic data are not available, the impact of having access to a high quality whole genome reference assembly for a given species is nothing short of transformational. Research on a species for which we have no DNA or RNA sequence data is restricted in fundamental ways. In contrast, even access to an initial draft quality genome (see below for definitions) opens a wide range of opportunities that are simply not available without that reference genome assembly. Although a complete discussion of the impact of genome sequencing and assembly is beyond the scope of this short paper, the goal of this review is to summarize the most common and highest impact contributions that whole genome sequencing and assembly have made to comparative and evolutionary biology.
Domestication fundamentally reshaped animal morphology, physiology and behaviour, offering the opportunity to investigate the molecular processes driving evolutionary change. Here we assess sheep domestication and artificial selection by comparing genome sequence from 43 modern breeds (Ovis aries) and their Asian mouflon ancestor (O. orientalis) to identify selection sweeps. Next, we provide a comparative functional annotation of the sheep genome, validated using experimental ChIP-Seq of sheep tissue. Using these annotations, we evaluate the impact of selection and domestication on regulatory sequences and find that sweeps are significantly enriched for protein coding genes, proximal regulatory elements of genes and genome features associated with active transcription. Finally, we find individual sites displaying strong allele frequency divergence are enriched for the same regulatory features. Our data demonstrate that remodelling of gene expression is likely to have been one of the evolutionary forces that drove phenotypic diversification of this common livestock species.
Human chromosome 19 has many unique characteristics including gene density more than double the genome-wide average and 20 large tandemly clustered gene families. It also has the highest GC content of any chromosome, especially outside gene clusters. The high GC content and concomitant high content of hypermutable CpG sites raise the possibility that chromosome 19 exhibits higher levels of nucleotide diversity both within and between species, and may possess greater variation in DNA methylation that regulates gene expression.
We examined GC and CpG content of chromosome 19 orthologs across representatives of the primate order. In all 12 primate species with suitable genome assemblies, chromosome 19 orthologs have the highest GC content of any chromosome. CpG dinucleotides and CpG islands are also more prevalent in chromosome 19 orthologs than in other chromosomes. GC and CpG content are generally higher outside the gene clusters. Intra-species variation based on SNPs in human common dbSNP, rhesus, crab-eating macaque, baboon and marmoset datasets is most prevalent on chromosome 19 and its orthologs. Inter-species comparisons based on phyloP conservation show accelerated nucleotide evolution for chromosome 19 promoter flanking and enhancer regions. These same regulatory regions show the highest CpG density of any chromosome, suggesting they possess considerable methylome regulatory potential.
The pattern of high GC and CpG content in chromosome 19 orthologs, particularly outside gene clusters, is present from human to mouse lemur representing 74 million years of primate evolution. Much CpG variation exists both within and between primate species with a portion of this variation occurring in regulatory regions.
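The two quantities compared across chromosomes above, GC content and CpG prevalence, can be sketched in a few lines of Python. This is an illustrative calculation only, not the authors' pipeline. The observed/expected CpG ratio used here is the common (CpG count × length) / (C count × G count) measure, which the abstract does not specify, and the function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Assumption: this widely used ratio stands in for the CpG measure
    in the abstract, which does not state its exact definition.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    return seq.count("CG") * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    # Toy sequences (hypothetical data), one CpG-rich and one CpG-poor.
    for name, seq in [("rich", "CGCGGCGCCGAT"), ("poor", "ATTTAACAGTTA")]:
        print(name, round(gc_content(seq), 3), round(cpg_obs_exp(seq), 3))
```

Applied per chromosome, or in fixed windows inside and outside gene clusters, the same two functions would support the kind of comparison the abstract describes.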
Studies of Y Chromosome evolution have focused primarily on gene decay, a consequence of suppression of crossing-over with the X Chromosome. Here, we provide evidence that suppression of X-Y crossing-over unleashed a second dynamic: selfish X-Y arms races that reshaped the sex chromosomes in mammals as different as cattle, mice, and men. Using super-resolution sequencing, we explore the Y Chromosome of Bos taurus (bull) and find it to be dominated by massive, lineage-specific amplification of testis-expressed gene families, making it the most gene-dense Y Chromosome sequenced to date. As in mice, an X-linked homolog of a bull Y-amplified gene has become testis-specific and amplified. This evolutionary convergence implies that lineage-specific X-Y coevolution through gene amplification, and the selfish forces underlying this phenomenon, were dominatingly powerful among diverse mammalian lineages. Together with Y gene decay, X-Y arms races molded mammalian sex chromosomes and influenced the course of mammalian evolution.