Next-generation sequencing is a powerful and cost-efficient tool for ultra-high-throughput genome and transcriptome analysis. One of its key applications is de novo whole-genome sequencing, but assembly and genome finishing remain major challenges because of the short reads these technologies generate. Mate-pair reads from 2kb-5kb libraries are combined with Illumina short paired-end reads to obtain better coverage across the genome. The standard 2kb-5kb Illumina mate-pair library construction protocol does not allow barcoding and has built-in limitations that prevent reading more than 36bp at either end, because increasing the read length can elevate the error rate. This is because junction reads cannot be identified easily in de novo assembly, or they are discarded because they do not align to the reference sequence. Here, we demonstrate a modified 2kb-5kb mate-pair library construction protocol for Illumina technologies that allows long, barcoded mate-paired reads without increasing error rates.
A large, non-coding ATTCT repeat expansion causes the neurodegenerative disorder spinocerebellar ataxia type 10 (SCA10). In a subset of SCA10 patients, interruption motifs are present at the 5' end of the expansion and strongly correlate with epileptic seizures. Thus, interruption motifs are a predictor of the epileptic phenotype and are hypothesized to act as a phenotypic modifier in SCA10. Yet, the exact internal sequence structure of SCA10 expansions remains unknown due to limitations in current technologies for sequencing across long extended tracts of tandem nucleotide repeats. We used the third-generation sequencing technology, Single Molecule Real Time (SMRT) sequencing, to obtain full-length contiguous expansion sequences, ranging from 2.5 to 4.4 kb in length, from three SCA10 patients with different clinical presentations. We obtained sequence spanning the entire length of the expansion and identified the structure of known and novel interruption motifs within the SCA10 expansion. The exact interruption patterns in expanded SCA10 alleles will allow us to further investigate the potential contributions of these interrupting sequences to the pathogenic modification leading to the epilepsy phenotype in SCA10. Our results also demonstrate that SMRT sequencing is useful for deciphering long tandem repeats that appear as "gaps" in the human genome sequence.
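To illustrate what locating interruption motifs within an ATTCT tract can mean computationally, the following is a minimal sketch, not the analysis used in the study. It assumes an error-free consensus sequence that starts in frame with the repeat unit; the function name `find_interruptions` and the toy input are hypothetical.

```python
def find_interruptions(seq, unit="ATTCT"):
    """Return (start, sequence) for every stretch that is not the repeat unit.

    Assumption: seq is an error-free consensus read that begins in frame
    with the repeat unit. Real SMRT data would need error correction first.
    """
    seq = seq.upper()
    interruptions = []
    i = 0
    start = None  # start index of the interruption currently being read
    while i < len(seq):
        if seq.startswith(unit, i):
            # A clean repeat unit closes any open interruption.
            if start is not None:
                interruptions.append((start, seq[start:i]))
                start = None
            i += len(unit)
        else:
            # Not a repeat unit: open an interruption and advance one base.
            if start is None:
                start = i
            i += 1
    if start is not None:
        interruptions.append((start, seq[start:]))
    return interruptions


# Toy tract: three pure units, one variant unit, two pure units.
toy_tract = "ATTCT" * 3 + "ATTCC" + "ATTCT" * 2
result = find_interruptions(toy_tract)
print(result)
```

A pure tract yields an empty list, so the length of the returned list gives a quick count of interrupted stretches in a given allele.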
The first Citrus tristeza virus (CTV) genomes completely sequenced (19.3-kb positive-sense RNA), from four biologically distinct isolates, are unexpectedly divergent in nucleotide sequence (up to 60% divergence). Understanding whether these large sequence differences resulted from recent evolution is important for the design of disease management strategies, particularly the use of genetically engineered mild (essentially symptomless)-strain cross protection and RNA-mediated transgenic resistance. The complete sequence of a mild isolate (T30) which has been endemic in Florida for about a century was found to be nearly identical to the genomic sequence of a mild isolate (T385) from Spain. Moreover, samples of sequences of other isolates from distinct geographic locations, maintained in different citrus hosts and also separated in time (B252 from Taiwan, B272 from Colombia, and B354 from California), were nearly identical to the T30 sequence. The sequence differences between these isolates were within or near the range of variability of the T30 population. A possible explanation for these results is that the parents of isolates T30, T385, B252, B272, and B354 have a common origin, probably Asia, and have changed little since they were dispersed throughout the world by the movement of citrus. Considering that the nucleotide divergence among the other known CTV genomes is much greater than that expected for strains of the same virus, the remarkable similarity of these five isolates indicates a high degree of evolutionary stasis in some CTV populations.
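Nucleotide divergence figures such as those quoted above are commonly expressed as the percentage of mismatched positions between two aligned sequences. The sketch below shows that calculation under stated assumptions: the inputs are already aligned to equal length, gap columns (`-`) are skipped, and no substitution model is applied. The abstract does not state how divergence was computed, and the function name and toy sequences are hypothetical.

```python
def percent_divergence(aligned_a, aligned_b):
    """Percentage of mismatched bases between two pre-aligned sequences.

    Assumptions: equal-length alignment rows, '-' marks a gap, and gap
    columns are excluded from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


# Toy example: 2 mismatches over 10 compared positions.
divergence = percent_divergence("ACGTACGTAC", "ACGTTCGTAA")
print(divergence)
```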
We present evidence that a newly discovered mosquito virus from Culex nigripalpus is an unusual member of the family Baculoviridae. Development of this virus was restricted to nuclei of midgut epithelial cells in the gastric caeca and posterior stomach. The globular occlusion bodies were not enveloped, measured around 400 nm in diameter, occurred exclusively in nuclei of infected cells and typically contained four, sometimes up to eight, virions. The developmental sequence involved two virion phenotypes: an occluded form (ODV) that initiated infection in the midgut epithelial cells, and a budded form that spread the infection in the midgut. Each ODV contained one rod-shaped enveloped nucleocapsid (40x200 nm). The double-stranded DNA genome was approximately 105-110 kbp with an estimated GC content of 52%. We have sequenced approximately one-third of the genome and detected 96 putative ORFs of 50 amino acids or more, including several genes considered to be unique to baculoviruses. Phylogenetic analysis of the amino acid sequences of DNApol and p74 placed this virus in a separate clade from the genera Nucleopolyhedrovirus and Granulovirus. We provisionally assign this virus to the genus Nucleopolyhedrovirus, henceforth abbreviated as CuniNPV (for Culex nigripalpus nucleopolyhedrovirus), and suggest that, awaiting additional data to clarify its taxonomic status, it may be a member of a new genus within the family Baculoviridae.
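The two sequence statistics quoted above, GC content and the number of putative ORFs of at least 50 amino acids, can be illustrated with a short sketch. The abstract does not describe how the ORFs were called, so the approach here is an assumption: a plain six-frame scan from the first ATG to the next in-frame stop codon, with the initiating methionine counted in the length. The function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=50):
    """Six-frame ATG-to-stop scan (an assumed, simplified ORF definition).

    Returns tuples of (strand, frame, start, end, length_in_codons), with
    coordinates relative to the scanned strand and the stop codon included
    in the end coordinate but not in the length.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    aa_len = (i - start) // 3
                    if aa_len >= min_aa:
                        orfs.append((strand, frame, start, i + 3, aa_len))
                    start = None
    return orfs


# Toy sequence: ATG, 60 alanine codons, then a stop codon.
toy = "ATG" + "GCC" * 60 + "TAA"
print(gc_content(toy), find_orfs(toy))
```

Raising or lowering `min_aa` changes the ORF count considerably on real genomes, which is why the 50-amino-acid threshold is stated alongside the count.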
Multiple recent publications on RNA-Seq have demonstrated the power of next-generation sequencing technologies in whole-transcriptome analysis. The vendor-specific protocols used for RNA library construction typically require at least 100ng of total RNA. However, under certain conditions, such as single cells, stem cells, difficult-to-isolate cell types, or fractionated cancer cells, only a small amount of material is available. In these cases, effective transcriptome profiling requires amplification of subnanogram amounts of RNA. Several RNA amplification kits are available for amplification prior to library construction and next-generation sequencing, but these kits have not been comprehensively field-evaluated for accuracy and performance of RNA-Seq for picogram amounts of RNA.
This study, conducted by the DNA Sequencing Research Group (DSRG), focuses on the evaluation of amplification kits for RNA-Seq. Four commercial amplification kits were chosen: Ovation v2 (NuGEN Technologies), SMARTer (Clontech), Seqplex (Sigma-Aldrich), and Super-AMP (Miltenyi Biotec). Starting material was 5ng, 500pg and 50pg of human total reference RNA (Clontech) spiked with Ambion ERCC control mix (Life Technologies) following the manufacturer's protocol. Each kit was tested at 3 different sites to assess reproducibility. Total RNA and ERCC RNA spike-in control mixes from the same lots were sent to 12 ABRF lab sites for amplification and cDNA generation. Ideally, this would have resulted in 36 different amplified samples, 3 from each input RNA. Libraries were constructed at one site from the amplified cDNAs using the TruSeq RNA library preparation kit on the Tecan Freedom EVO Liquid Handling Robot. As an unamplified control, ribosomal depletion and poly(A) selection were performed separately using 5ng, 100ng and 1ug of total RNA prior to library construction. All libraries were pooled and sequenced using the Illumina HiSeq platform. An overview of the study and the results will be presented.
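The sample count in the design above follows from simple enumeration, sketched here. The sketch assumes that each of the 12 sites handled a single kit (4 kits x 3 sites per kit), which the abstract implies but does not state outright; the site numbering and variable names are hypothetical.

```python
from itertools import product

# Kits and input amounts as listed in the abstract.
kits = ["Ovation v2", "SMARTer", "Seqplex", "Super-AMP"]
input_amounts_pg = [5000, 500, 50]  # 5ng, 500pg and 50pg of total RNA
sites_per_kit = 3                   # assumption: one kit per site

# One amplified sample per (kit, site replicate, input amount) combination.
design = [
    {"kit": kit, "site_replicate": site, "input_pg": amount}
    for kit, site, amount in product(
        kits, range(1, sites_per_kit + 1), input_amounts_pg
    )
]

n_sites = len({(d["kit"], d["site_replicate"]) for d in design})
print(len(design), "amplified samples across", n_sites, "sites")
```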
ABSTRACT
We characterized the copper resistance genes in strain XvP26 of Xanthomonas campestris pv. vesicatoria, which was originally isolated from a pepper plant in Taiwan. The copper resistance genes were localized to a 7,652-bp region which, based on pulsed-field gel electrophoresis and Southern hybridization, was determined to be located on the chromosome. These genes hybridized only weakly, as determined by Southern analysis, to other copper resistance genes in Xanthomonas and Pseudomonas strains. We identified five open reading frames (ORFs) whose products exhibited high levels of amino acid sequence identity to the products of previously reported copper genes. Mutations in ORF1, ORF3, and ORF4 removed copper resistance, whereas mutations in ORF5 resulted in an intermediate copper resistance phenotype and insertions in ORF2 had no effect on resistance conferred to a copper-sensitive recipient in transconjugant tests. Based on sequence analysis, ORF1 was determined to have high levels of identity with the CopR (66%) and PcoR (63%) genes in Pseudomonas syringae pv. tomato and Escherichia coli, respectively. ORF2 and ORF5 had high levels of identity with the PcoS gene in E. coli and the gene encoding a putative copper-containing oxidoreductase signal peptide protein in Sinorhizobium meliloti, respectively. ORF3 and ORF4 exhibited 23% identity to the gene encoding a cation efflux system membrane protein, CzcC, and 62% identity to the gene encoding a putative copper-containing oxidoreductase protein, respectively. The latter two ORFs were determined to be induced following exposure to low concentrations of copper, while addition of Co, Cd, or Zn resulted in no significant induction. PCR analysis of 51 pepper and 34 tomato copper-resistant X. campestris pv. vesicatoria strains collected from several regions in Taiwan between 1987 and 2000 and nine copper-resistant strains from the United States and South America showed that successful amplification of DNA was obtained only for strain XvP26. The organization of this set of copper resistance genes appears to be uncommon, and the set appears to occur rarely in X. campestris pv. vesicatoria.
Second-generation sequencing (SGS) technologies revolutionized the fields of genomics, transcriptomics and epigenomics. These technologies generate massive amounts of data at modest cost. While the introduction of second-generation sequencing reduced the per-nucleotide cost of DNA sequencing, whole-genome finishing currently remains prohibitively expensive. Single molecule, real time (SMRT) DNA sequencing (Eid et al., 2009) offers potential advantages over other current sequencing technologies: faster turnaround time, lower cost, and longer read lengths. Unlike SGS, single molecule sequencing does not rely on PCR and thus overcomes biases related to phasing and PCR amplification. However, sequencing DNA molecules in their native state imposes other challenges associated with possible impurities present in the sample preparation. While the purity of the DNA fragment library is important for good quality sequencing using any sequencing technology, SMRT technology is highly sensitive to carryover impurities introduced during library construction. Here, we demonstrate how impurities present in a commonly used DNA library protocol impact sequencing performance on PacBio.
We describe a modification of a protocol for the isolation of BAC DNA using a silica membrane-based kit designed for the isolation of plasmid DNA. The major advantages of this protocol are the expediency of the procedure, the high yield and purity, and the high quality of the BAC DNA that is suitable for direct sequencing.
Levels of three major dehydrins of 65, 60, and 14 kDa have been observed to increase in blueberry (Vaccinium spp.) floral buds during chill unit accumulation and cold acclimation and decrease during deacclimation and resumption of growth. Indeed, levels of the 65‐, 60‐, and 14‐kDa dehydrins increase such that they become the most predominant proteins visible on sodium dodecyl sulfate (SDS)‐polyacrylamide gels. The peptide sequence information from the 65‐ and 60‐kDa dehydrins was used to synthesize degenerate DNA primers for amplification of a part of the gene(s) encoding the dehydrins. One pair of primers amplified a 174‐bp fragment. The 174‐bp fragment was used to screen a cDNA library (prepared from RNA from cold‐acclimated blueberry floral buds) and resulted in the isolation of a clone with a 2.0‐kb insert. The cDNA was sequenced and found to be a full‐length clone encoding a K5‐type dehydrin (5 K boxes). Five high‐confidence peptide sequences, ranging from 9 to 25 amino acids long, obtained from the 60‐kDa dehydrin exactly matched sequences encoded within the cDNA clone. Furthermore, amino acid composition of the 60‐kDa dehydrin agreed well with the expected amino acid composition based on the cDNA sequence. However, the DNA sequence and coupled in vitro transcription/translation reactions of the cDNA clone indicated that it encodes a dehydrin with a native molecular mass of ∼40 kDa instead of 60 kDa. Experiments to determine if the dehydrins undergo post‐translational modifications revealed that the 65‐ and 60‐kDa dehydrins are glycosylated. Thus, our results indicate that the 2.0‐kb dehydrin cDNA encodes the native version of the 60‐kDa dehydrin. The dehydrin cDNA hybridized on RNA blots to two chilling/cold‐responsive messages of 2.0 and 0.5 kb. Both the 2.0‐ and 0.5‐kb messages increased to higher levels more quickly in the cold‐hardy cultivar Bluecrop than in the less hardy cultivar Tifblue. In addition, the 0.5‐kb message remained at a higher level longer in Bluecrop than in Tifblue.
Acid invertases are glycoproteins that catalyze the hydrolysis of sucrose to glucose and fructose and are associated with metabolic sink tissues in a variety of plant species. Acid invertases are divided into cell wall-bound invertases (INCW) and soluble invertases based on their location in the cell. We describe here the isolation and characterization of two cell wall invertase cDNA (Incw1 and Incw2) and genomic clones. Since the deduced amino acid sequences of Incw1 and Incw2 clones are more similar to carrot cell wall invertases than they are to maize soluble invertase, we conclude that Incw1 and Incw2 represent cell wall-bound invertases. Both genomic clones have six introns and seven exons, typical of most other acid invertase genes. Incw1 mRNA is present in cell suspension culture, etiolated shoots, roots and, at much reduced steady state levels, in developing endosperm. In contrast, Incw2 mRNA is present in shoots and developing endosperm, but lacking in roots and the miniature1 (mn1-1) mutant endosperm. In situ hybridization studies show that the Incw2 mRNA is confined to the basal endosperm transfer cells in a developing kernel.