Transcriptome sequencing and assembly represent a great resource for the study of non-model species, and many metrics have been used to evaluate and compare these assemblies. Unfortunately, it is still unclear which of these metrics accurately reflect assembly quality.
We simulated sequencing transcripts of Drosophila melanogaster. By assembling these simulated reads using both a "perfect" and a modern transcriptome assembler while varying read length and sequencing depth, we evaluated quality metrics to determine whether they 1) revealed perfect assemblies to be of higher quality, and 2) revealed perfect assemblies to be more complete as data quantity increased. Several commonly used metrics were not consistent with these expectations, including average contig coverage and length, though they became consistent when singletons were included in the analysis. We found several annotation-based metrics to be consistent and informative, including contig reciprocal best hit count and contig unique annotation count. Finally, we evaluated a number of novel metrics such as reverse annotation count, contig collapse factor, and the ortholog hit ratio, discovering that each assesses assembly quality in a unique way.
Although much attention has been given to transcriptome assembly, little research has focused on determining how best to evaluate assemblies, particularly in light of the variety of options available for read length and sequencing depth. Our results provide an important review of these metrics and give researchers tools to produce the highest quality transcriptome assemblies.
Introgressive hybridization is now recognized as a widespread phenomenon, but its role in evolution remains contested. Here, we use newly available reference genome assemblies to investigate phylogenetic relationships and introgression in a medically important group of Afrotropical mosquito sibling species. We have identified the correct species branching order to resolve a contentious phylogeny and show that lineages leading to the principal vectors of human malaria were among the first to split. Pervasive autosomal introgression between these malaria vectors means that only a small fraction of the genome, mainly on the X chromosome, has not crossed species boundaries. Our results suggest that traits enhancing vectorial capacity may be gained through interspecific gene flow, including between nonsister species.
Mosquito adaptability across genomes
Virtually everyone has first-hand experience with mosquitoes. Few recognize the subtle biological distinctions among these bloodsucking flies that render some bites mere nuisances and others the initiation of a potentially life-threatening infection. By sequencing the genomes of several mosquitoes in depth, Neafsey et al. and Fontaine et al. reveal clues that explain the mystery of why only some species of one genus of mosquitoes are capable of transmitting human malaria (see the Perspective by Clark and Messer).
Science, this issue 10.1126/science.1258524 and 10.1126/science.1258522; see also p. 27
Chromosomal inversions can lead to reproductive isolation and adaptation in insects such as Drosophila melanogaster and the non-model malaria vector Anopheles gambiae. Inversions can be detected and characterized using principal component analysis (PCA) of single nucleotide polymorphisms (SNPs). To aid in developing such methods, we formed a new benchmark derived from three publicly available insect data sets. We then used this benchmark to perform an extended validation of our software for inversion analysis (Asaph). Through that process, we identified and characterized several problematic test cases liable to misinterpretation that can help guide PCA-based inversion detection. Lastly, we re-analyzed the 2R chromosome arm of 150 An. gambiae and An. coluzzii samples and observed two inversions (2Rc and 2Rd) that were previously known but not annotated in these particular individuals. The resulting benchmark data set and methods will be useful for future inversion detection based solely on SNP data.
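To illustrate why PCA of SNP genotypes can expose an inversion, the following is a minimal sketch, not the Asaph implementation. It assumes genotypes are coded as 0, 1, or 2 copies of an allele and that the SNPs inside the inversion are in strong linkage, so that the two homozygous karyotypes and the heterozygotes fall at three positions along the first principal component. The function name `first_pc_scores` and the toy genotype matrix are hypothetical.

```python
import math


def first_pc_scores(genotypes, iterations=100):
    """Return first-principal-component scores (up to scale and sign).

    genotypes: list of samples, each a list of 0/1/2 allele counts per SNP.
    Uses power iteration on the sample-by-sample Gram matrix of the
    column-centered genotype matrix.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Center each SNP (column) on its mean across samples.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Gram matrix: inner products between centered samples.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k]))
             for k in range(n)] for i in range(n)]
    # Deterministic, non-constant start vector for the power iteration.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            return [0.0] * n  # no variation among samples
        vec = [x / norm for x in nxt]
    return vec


# Toy data (hypothetical): two standard homozygotes, two heterozygotes,
# and two inverted homozygotes at four perfectly linked SNPs.
toy_genotypes = [
    [0, 0, 0, 0], [0, 0, 0, 0],
    [1, 1, 1, 1], [1, 1, 1, 1],
    [2, 2, 2, 2], [2, 2, 2, 2],
]
scores = first_pc_scores(toy_genotypes)
```

In this idealized case the heterozygotes sit midway between the two homozygous groups on the first component. Real data are noisier, and population structure can produce similar patterns, which is the kind of problematic case the abstract refers to.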
Improved computational modeling of protein translation rates, including better prediction of where translational slowdowns along an mRNA sequence may occur, is critical for understanding co-translational folding. Because codons within a synonymous codon group are translated at different rates, many computational translation models rely on analyzing synonymous codons. Some models rely on genome-wide codon usage bias (CUB), on the assumption that globally rare and common codons are the most informative of slow and fast translation, respectively. Others use the CUB observed only in highly expressed genes, which should be under selective pressure to be translated efficiently (and whose CUB may therefore be more indicative of translation rates). No prior work has analyzed these models for their ability to predict translational slowdowns. Here, we evaluate five models for their association with slowly translated positions as denoted by two independent ribosome footprint (RFP) count experiments from S. cerevisiae, because RFP data are often considered a "ground truth" for translation rates across mRNA sequences. We show that all five considered models strongly associate with the RFP data and therefore have potential for estimating translational slowdowns. However, we also show that there is only a weak correlation between RFP counts for the same genes originating from independent experiments, even when their experimental conditions are similar. This raises concerns about the efficacy of using current RFP experimental data for estimating translation rates and highlights a potential advantage of using computational models to understand translation rates instead.
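As a rough illustration of what "codon usage within a synonymous codon group" means, the sketch below computes, for each codon, its relative frequency among the codons that encode the same amino acid. This is only one simple way to quantify codon usage bias and is not one of the five models evaluated in the abstract. The standard genetic code is assumed, and the function name `relative_codon_usage` and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def relative_codon_usage(cds_list):
    """Map each observed codon to its share within its synonymous group.

    cds_list: iterable of in-frame coding sequences (DNA alphabet).
    Codons containing characters outside A/C/G/T are skipped.
    """
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for pos in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[pos:pos + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    # Total observations per amino acid (synonymous group).
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


# Hypothetical toy input: two short coding sequences.
usage = relative_codon_usage(["ATGAAAAAGAAATAA", "ATGAAAGCTGCC"])
```

Passing all coding sequences of a genome would give a genome-wide estimate, while passing only highly expressed genes would give the second kind of estimate the abstract describes. Which genes count as highly expressed is left to the caller.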
Several recent studies have demonstrated the use of Roche 454 sequencing technology for de novo transcriptome analysis. Low error rates and high coverage also allow for effective SNP discovery and genetic diversity estimates. However, genetically diverse datasets, such as those sourced from natural populations, pose challenges for assembly programs and subsequent analysis. Further, estimating the effectiveness of transcript discovery using Roche 454 transcriptome data is still a difficult task.
Using the Roche 454 FLX Titanium platform, we sequenced and assembled larval transcriptomes for two butterfly species: the Propertius duskywing, Erynnis propertius (Lepidoptera: Hesperiidae) and the Anise swallowtail, Papilio zelicaon (Lepidoptera: Papilionidae). The Expressed Sequence Tags (ESTs) generated represent a diverse sample drawn from multiple populations, developmental stages, and stress treatments. Despite this diversity, > 95% of the ESTs assembled into long (> 714 bp on average) and highly covered (> 9.6x on average) contigs. To estimate the effectiveness of transcript discovery, we compared the number of bases in the hit region of unigenes (contigs and singletons) to the length of the best-match silkworm (Bombyx mori) protein. This "ortholog hit ratio" gives a close estimate of the amount of the transcript discovered relative to a model lepidopteran genome. For each species, we tested two assembly programs and two parameter sets; although CAP3 is commonly used for such data, the assemblies produced by Celera Assembler with modified parameters were chosen over those produced by CAP3 based on contig and singleton counts as well as ortholog hit ratio analysis. In the final assemblies, 1,413 E. propertius and 1,940 P. zelicaon unigenes had a ratio > 0.8; 2,866 E. propertius and 4,015 P. zelicaon unigenes had a ratio > 0.5.
Ultimately, these assemblies and SNP data will be used to generate microarrays for ecoinformatics examining climate change tolerance of different natural populations. These studies will benefit from high quality assemblies with few singletons (less than 26% of bases for each assembled transcriptome are present in unassembled singleton ESTs) and effective transcript discovery (over 6,500 of our putative orthologs cover at least 50% of the corresponding model silkworm gene).
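The ortholog hit ratio described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the hit region is assumed to be given as 1-based inclusive nucleotide coordinates on the unigene (for example from a BLASTX-style search), and the protein length in amino acids is multiplied by 3 so that both quantities are in nucleotides. The function names and the example hits are hypothetical.

```python
def ortholog_hit_ratio(hit_start, hit_end, protein_length_aa):
    """Fraction of the best-match ortholog covered by the unigene's hit region.

    hit_start, hit_end: 1-based inclusive nucleotide coordinates of the hit
        region on the unigene; reverse-strand hits may have start > end.
    protein_length_aa: length of the best-match protein in amino acids.
    """
    if protein_length_aa <= 0:
        raise ValueError("protein length must be positive")
    hit_bases = abs(hit_end - hit_start) + 1
    # Assumption: convert protein length to nucleotides (3 bases per residue).
    return hit_bases / (3 * protein_length_aa)


def count_above(ratios, threshold):
    """Count unigenes whose ratio is strictly greater than the threshold."""
    return sum(1 for r in ratios if r > threshold)


# Hypothetical hits: (hit_start, hit_end, protein_length_aa).
hits = [(1, 900, 300), (10, 459, 300), (200, 1, 400)]
ratios = [ortholog_hit_ratio(*h) for h in hits]
n_above_half = count_above(ratios, 0.5)
```

A ratio near 1.0 suggests the unigene spans most of the ortholog, while a small ratio suggests a fragment. Counting unigenes above thresholds such as 0.5 and 0.8 mirrors the way the abstract summarizes its assemblies.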
VectorBase is a National Institute of Allergy and Infectious Diseases supported Bioinformatics Resource Center (BRC) for invertebrate vectors of human pathogens. Now in its 11th year, VectorBase currently hosts the genomes of 35 organisms including a number of non-vectors for comparative analysis. Hosted data range from genome assemblies with annotated gene features, transcript and protein expression data to population genetics including variation and insecticide-resistance phenotypes. Here we describe improvements to our resource and the set of tools available for interrogating and accessing BRC data, including the integration of Web Apollo to facilitate community annotation and the provision of Galaxy to support user-based workflows. VectorBase also actively supports our community through hands-on workshops and online tutorials. All information and data are freely available from our website at https://www.vectorbase.org/.
The fall armyworm (Spodoptera frugiperda (J.E. Smith)) is a highly polyphagous agricultural pest with long-distance migratory behavior threatening food security worldwide. This pest has a host range of > 80 plant species, but two host strains are recognized based on their association with corn (C-strain) or rice and smaller grasses (R-strain). The population genomics of the United States (USA) fall armyworm remains poorly characterized to date despite its agricultural threat.
In this study, the population structure and genetic diversity in 55 S. frugiperda samples from Argentina, Brazil, Kenya, Puerto Rico and the USA were surveyed to further our understanding of whole genome nuclear diversity. Comparisons at the genomic level suggest a panmictic S. frugiperda population, with only a minor reduction in gene flow between the two overwintering populations in the continental USA, which also correspond to distinct host strains at the mitochondrial level. Two maternal lines were detected from analysis of mitochondrial genomes. We found members from the Eastern Hemisphere interspersed within both continental USA overwintering subpopulations, suggesting multiple individuals were likely introduced to Africa.
Our research characterizes the largest diverse collection of United States S. frugiperda whole genome sequences to date, covering eight continental states and a USA territory (Puerto Rico). The genomic resources presented provide foundational information to understand gene flow at the whole genome level among S. frugiperda populations. Based on the genomic similarities found between host strains and between laboratory and field samples, our findings validate the experimental use of laboratory strains. The host strain differentiation based on mitochondrial and sex-linked genetic markers extends to minor genome-wide differences, with some exceptions showing that mixture between host strains is likely occurring in field populations.
454 DNA sequencing technology achieves significant throughput relative to traditional approaches. More than 261,000 ESTs were generated by 454 Life Sciences from cDNA isolated using laser capture microdissection (LCM) from the developmentally important shoot apical meristem (SAM) of maize (Zea mays L.). This single sequencing run annotated >25,000 maize genomic sequences and also captured approximately 400 expressed transcripts for which homologous sequences have not yet been identified in other species. Approximately 70% of the ESTs generated in this study had not been captured during a previous EST project conducted using a cDNA library constructed from hand-dissected apex tissue that is highly enriched for SAMs. In addition, at least 30% of the 454-ESTs do not align to any of the approximately 648,000 extant maize ESTs using conservative alignment criteria. These results indicate that the combination of LCM and the deep sequencing possible with 454 technology enriches for SAM transcripts not present in current EST collections. RT-PCR was used to validate the expression of 27 genes whose expression had been detected in the SAM via LCM-454 technology, but that lacked orthologs in GenBank. Significantly, transcripts from approximately 74% (20/27) of these validated SAM-expressed "orphans" were not detected in meristem-rich immature ears. We conclude that the coupling of LCM and 454 sequencing technologies facilitates the discovery of rare, possibly cell-type-specific transcripts.
Gap junctions are ubiquitous throughout the nervous system, mediating critical signal transmission and integration, as well as emergent network properties. In mammalian retina, gap junctions within the Aii amacrine cell-ON cone bipolar cell (CBC) network are essential for night vision and modulation of day vision, and contribute to visual impairment in retinal degenerations, yet neither the extended network topology nor its conservation is well established. Here, we map the network contribution of gap junctions using a high resolution connectomics dataset of an adult female rabbit retina. Gap junctions are prominent synaptic components of ON CBC classes, constituting 5-25% of all axonal synaptic contacts. Many of these mediate canonical transfer of rod signals from Aii cells to ON CBCs for night vision, and we find that the uneven distribution of Aii signals to ON CBCs is conserved in rabbit, including one class entirely lacking direct Aii coupling. However, the majority of gap junctions formed by ON CBCs unexpectedly occur between ON CBCs, rather than with Aii cells. Such coupling is extensive, creating an interconnected network with numerous lateral paths both within, and particularly across, these parallel processing streams. Coupling patterns are precise, with ON CBCs accepting and rejecting unique combinations of partnerships according to robust rulesets. Coupling specificity extends to both size and spatial topologies, thereby rivaling the synaptic specificity of chemical synapses. These ON CBC coupling motifs dramatically extend the coupled Aii-ON CBC network, with implications for signal flow in both scotopic and photopic retinal networks during visual processing and disease.
Electrical synapses mediated by gap junctions are fundamental components of neural networks. In retina, coupling within the Aii-ON CBC network shapes visual processing in both the scotopic and photopic networks. In retinal degenerations, these same gap junctions mediate oscillatory activity that contributes to visual impairment. Here, we use high resolution connectomics strategies to identify gap junctions and cellular partnerships. We describe novel, pervasive motifs both within and across classes of ON CBCs that dramatically extend the Aii-ON CBC network. These motifs are highly specific with implications for both signal processing within the retina and therapeutic interventions for blinding conditions. These findings highlight the underappreciated contribution of coupling motifs in retinal circuitry and the necessity of their detection in connectomics studies.
The molecular mechanisms and genetic architecture that facilitate adaptive radiation of lineages remain elusive. Polymorphic chromosomal inversions, due to their recombination‐reducing effect, are proposed instruments of ecotypic differentiation. Here, we study an ecologically diversifying lineage of Anopheles gambiae, known as the Bamako chromosomal form based on its unique complement of three chromosomal inversions, to explore the impact of these inversions on ecotypic differentiation. We used pooled and individual genome sequencing of Bamako, typical (non‐Bamako) An. gambiae and the sister species Anopheles coluzzii to investigate evolutionary relationships and genomewide patterns of nucleotide diversity and differentiation among lineages. Despite extensive shared polymorphism and limited differentiation from the other taxa, Bamako clusters apart from the other taxa, and forms a maximally supported clade in neighbour‐joining trees based on whole‐genome data (including inversions) or solely on collinear regions. Nevertheless, FST outlier analysis reveals that the majority of differentiated regions between Bamako and typical An. gambiae are located inside chromosomal inversions, consistent with their role in the ecological isolation of Bamako. Exceptionally differentiated genomic regions were enriched for genes implicated in nervous system development and signalling. Candidate genes associated with a selective sweep unique to Bamako contain substitutions not observed in sympatric samples of the other taxa, and several insecticide resistance gene alleles shared between Bamako and other taxa segregate at sharply different frequencies in these samples. Bamako represents a useful window into the initial stages of ecological and genomic differentiation from sympatric populations in this important group of malaria vectors.
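The idea behind an FST outlier scan can be sketched as follows. The abstract does not state which FST estimator or outlier threshold was used, so this example assumes the basic two-population definition FST = (HT - HS) / HT with equal population weights and no sample-size correction, and it flags a fixed top fraction of SNPs as outliers. The function names and allele frequencies are hypothetical.

```python
def wright_fst(p1, p2):
    """Per-SNP FST for two populations from their allele frequencies.

    Assumes equal weighting of the two populations and no correction
    for sample size (a simplification relative to published estimators).
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # SNP is monomorphic across both populations
    # Mean expected heterozygosity within populations.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def top_outliers(values, fraction):
    """Return sorted indices of the top `fraction` of values (at least one)."""
    k = max(1, int(len(values) * fraction))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical allele frequencies (population 1, population 2) at four SNPs.
freqs = [(0.5, 0.5), (0.1, 0.9), (0.4, 0.6), (0.0, 1.0)]
fsts = [wright_fst(p1, p2) for p1, p2 in freqs]
outlier_snps = top_outliers(fsts, 0.5)
```

In a genome scan the flagged SNPs or windows would then be compared with known inversion coordinates to ask whether differentiated regions are concentrated inside inversions, as the abstract reports.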