Spatial and molecular characteristics determine tissue function, yet high-resolution methods to capture both concurrently are lacking. Here, we developed high-definition spatial transcriptomics, which captures RNA from histological tissue sections on a dense, spatially barcoded bead array. Each experiment recovers several hundred thousand transcript-coupled spatial barcodes at 2-μm resolution, as demonstrated in mouse brain and primary breast cancer. This opens the way to high-resolution spatial analysis of cells and tissues.
The human brain has enormously complex cellular diversity and connectivities fundamental to our neural functions, yet difficulties in interrogating individual neurons have impeded understanding of the underlying transcriptional landscape. We developed a scalable approach to sequence and quantify RNA molecules in isolated neuronal nuclei from a postmortem brain, generating 3227 sets of single-neuron data from six distinct regions of the cerebral cortex. Using an iterative clustering and classification approach, we identified 16 neuronal subtypes that were further annotated on the basis of known markers and cortical cytoarchitecture. These data demonstrate a robust and scalable method for identifying and categorizing single nuclear transcriptomes, revealing shared genes sufficient to distinguish previously unknown and orthologous neuronal subtypes as well as regional identity and transcriptomic heterogeneity within the human brain.
We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries composed of 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to > 1 megabase. These pools are "subhaploid," in that the lengths of fragments contained in each pool sum to ∼5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data is mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate "joins" are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof-of-concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
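The join-and-resolve idea described above can be illustrated with a minimal sketch. It is not the fragScaff implementation: the Jaccard overlap score, the `min_score` threshold, the function names, and the toy pool indices are all assumptions made for illustration. Contig ends are scored by how many indexed pools they share, candidate joins become weighted edges between contigs, and Kruskal's algorithm keeps a spanning forest that favors the strongest joins. A real scaffolder would also have to ensure that each contig end is used at most once, which this sketch omits.

```python
from itertools import combinations


def pool_overlap(pools_a, pools_b):
    """Jaccard index of two sets of pool indices (an assumed scoring choice)."""
    union = pools_a | pools_b
    return len(pools_a & pools_b) / len(union) if union else 0.0


def candidate_joins(end_pools, min_score=0.3):
    """Score every pair of contig ends that belong to different contigs.

    end_pools maps (contig_name, "L" or "R") to the set of pools in which
    reads mapped near that end. Returns (distance, end_a, end_b) tuples,
    where distance = 1 - overlap score.
    """
    joins = []
    for end_a, end_b in combinations(sorted(end_pools), 2):
        if end_a[0] == end_b[0]:  # skip the two ends of one contig
            continue
        score = pool_overlap(end_pools[end_a], end_pools[end_b])
        if score >= min_score:
            joins.append((1.0 - score, end_a, end_b))
    return joins


def spanning_forest(joins):
    """Kruskal's algorithm over contigs: accept joins in order of increasing
    distance and skip any join that would close a cycle."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    kept = []
    for dist, end_a, end_b in sorted(joins):
        root_a, root_b = find(end_a[0]), find(end_b[0])
        if root_a != root_b:
            parent[root_a] = root_b
            kept.append((end_a, end_b, round(1.0 - dist, 3)))
    return kept


# Hypothetical toy data: three contigs whose ends share pools.
toy_end_pools = {
    ("c1", "L"): {20, 21, 30},
    ("c1", "R"): {1, 2, 3, 4},
    ("c2", "L"): {1, 2, 3, 5},
    ("c2", "R"): {7, 8, 9},
    ("c3", "L"): {7, 8, 9, 10},
    ("c3", "R"): {20, 30, 31},
}
toy_scaffold = spanning_forest(candidate_joins(toy_end_pools))
```

In the toy data the three candidate joins form a cycle (c1 to c2, c2 to c3, c3 back to c1); the spanning forest keeps the two strongest and drops the weakest, which is the behavior a minimum spanning tree gives when edge weight is taken as one minus the overlap score.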
The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today.
We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets.
Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.
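As an illustration of what a rank-based enrichment statistic can look like, the sketch below computes an unweighted running-sum score for a query gene set against a ranked gene list, then averages it over several datasets. The abstract does not specify which statistic the framework uses, so the scoring rule, the function names, and the toy gene identifiers are assumptions, and the simple mean stands in for a proper meta-analysis.

```python
def running_sum_enrichment(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score in [-1, 1].

    Walk down the ranked list, stepping up for each gene in gene_set and
    down for each gene outside it; return the extreme deviation from zero.
    Positive values mean the set is concentrated near the top of the ranking.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0  # score is undefined without both hits and misses
    running = 0.0
    extreme = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


def mean_enrichment(rankings, gene_set):
    """Average the score over several ranked datasets (a crude stand-in
    for meta-analysis across independent experiments)."""
    scores = [running_sum_enrichment(r, gene_set) for r in rankings]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical rankings from two datasets, most up-regulated gene first.
toy_rankings = [
    ["g1", "g2", "g3", "g4", "g5", "g6"],
    ["g2", "g1", "g4", "g3", "g6", "g5"],
]
toy_query = {"g1", "g2"}
toy_score = mean_enrichment(toy_rankings, toy_query)
```

Requiring a consistent score across independent datasets, rather than a strong score in one, is one simple way to reflect the "multiple lines of evidence" criterion mentioned above.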
The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference MHC haplotype sequences that will enable future population scale studies of this complicated and clinically important region of the human genome.
Circulating tumor cells (CTCs) mediate metastatic spread of many solid tumors, and enumeration of CTCs is currently used as a prognostic indicator of survival in metastatic prostate cancer patients. Some evidence suggests that it is possible to derive additional information about tumors from expression analysis of CTCs, but the technical difficulty of isolating and analyzing individual CTCs has limited progress in this area. To assess the ability of a new generation of MagSweeper to isolate intact CTCs for downstream analysis, we performed mRNA-Seq on single CTCs isolated from the blood of patients with metastatic prostate cancer and on single prostate cancer cell line LNCaP cells spiked into the blood of healthy donors. We found that the MagSweeper effectively isolated CTCs with a capture efficiency that matched the CellSearch platform. However, unlike CellSearch, the MagSweeper facilitates isolation of individual live CTCs without contaminating leukocytes. Importantly, mRNA-Seq analysis showed that the MagSweeper isolation process did not have a discernible impact on the transcriptional profile of single LNCaPs isolated from spiked human blood, suggesting that any perturbations caused by the MagSweeper process on the transcriptional signature of isolated cells are modest. Although the RNA from patient CTCs showed signs of significant degradation, consistent with reports of short half-lives and apoptosis amongst CTCs, transcriptional signatures of prostate tissue and of cancer were readily detectable with single CTC mRNA-Seq. These results demonstrate that the MagSweeper provides access to intact CTCs and that these CTCs can potentially supply clinically relevant information.
Ban-Lan-Gen, the root tissues derived from several morphologically indistinguishable plant species, have been used widely in traditional Chinese medicines for many years. The identification of reliable markers to distinguish various source plant species is critical for the effective and safe use of products containing Ban-Lan-Gen. Here, we analyzed and characterized the complete chloroplast (cp) genome sequence of Strobilanthes cusia (Nees) Kuntze to identify high-resolution markers for the species determination of Southern Ban-Lan-Gen. Total DNA was extracted and subjected to next-generation sequencing. The cp genome was then assembled, and the gaps were filled using PCR amplification and Sanger sequencing. Genome annotation was conducted using the CpGAVAS web server. The genome was 144,133 bp in length, presenting a typical quadripartite structure of large (LSC; 91,666 bp) and small (SSC; 17,328 bp) single-copy regions separated by a pair of inverted repeats (IRs; 17,811 bp). The genome encodes 113 unique genes, including 79 protein-coding, 30 transfer RNA, and 4 ribosomal RNA genes. A total of 20 tandem, 2 forward, and 6 palindromic repeats were detected in the genome. A phylogenetic analysis based on 65 protein-coding genes showed that S. cusia was closely related to Andrographis paniculata and Ruellia breedlovei, which belong to the same family, Acanthaceae. One interesting feature is that the IR regions apparently undergo simultaneous contraction and expansion, resulting in the presence of single copies of rps19, rpl2, rpl23, and ycf2 in the LSC region and the duplication of psbA and trnH genes in the IRs. This study provides the first complete cp genome in the genus Strobilanthes, containing critical information for the classification of various Strobilanthes species in the future. This study also provides the foundation for precisely determining the plant sources of Ban-Lan-Gen.
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes are feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.
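The trade-off between high clone coverage and low per-clone read coverage can be made concrete with standard Poisson (Lander-Waterman style) coverage arithmetic. The sketch below is not part of SHRAP itself; the clone length and the two coverage values are hypothetical placeholders, and only the 200 bp read length comes from the text above.

```python
import math


def effective_read_coverage(clone_coverage, read_coverage_per_clone):
    """Total read depth over the genome when each base is spanned by
    clone_coverage clones, each sequenced to read_coverage_per_clone."""
    return clone_coverage * read_coverage_per_clone


def expected_fraction_covered(coverage):
    """Poisson estimate of the fraction of a target covered by at least
    one randomly placed fragment at the given mean coverage."""
    return 1.0 - math.exp(-coverage)


def reads_per_clone(clone_length_bp, read_length_bp, read_coverage_per_clone):
    """Number of reads needed to reach the per-clone coverage."""
    return math.ceil(clone_length_bp * read_coverage_per_clone / read_length_bp)


# Hypothetical parameters, chosen only to show the arithmetic.
CLONE_LENGTH_BP = 150_000      # assumed clone size
READ_LENGTH_BP = 200           # read length stated in the text
CLONE_COVERAGE = 7.5           # assumed "high" clone coverage
READ_COVERAGE_PER_CLONE = 2.0  # assumed "low" per-clone read coverage

total_depth = effective_read_coverage(CLONE_COVERAGE, READ_COVERAGE_PER_CLONE)
per_clone_fraction = expected_fraction_covered(READ_COVERAGE_PER_CLONE)
n_reads = reads_per_clone(CLONE_LENGTH_BP, READ_LENGTH_BP, READ_COVERAGE_PER_CLONE)
```

Under these assumed values, any single clone is only partly covered by its own reads, yet the overlapping clones together give substantial total depth, which is why the clones must be ordered before reads from neighboring clones can be pooled for local assembly.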
We have used multiplexed high-throughput sequencing to characterize changes in small RNA populations that occur during viral infection in animal cells. Small RNA-based mechanisms such as RNA interference (RNAi) have been shown in plant and invertebrate systems to play a key role in host responses to viral infection. Although homologs of the key RNAi effector pathways are present in mammalian cells, and can launch an RNAi-mediated degradation of experimentally targeted mRNAs, any role for such responses in mammalian host-virus interactions remains to be characterized. Six different viruses were examined in 41 experimentally susceptible and resistant host systems. We identified virus-derived small RNAs (vsRNAs) from all six viruses, with total abundance varying from "vanishingly rare" (less than 0.1% of cellular small RNA) to highly abundant (comparable to abundant micro-RNAs, "miRNAs"). In addition to the appearance of vsRNAs during infection, we saw a number of specific changes in host miRNA profiles. For several infection models investigated in more detail, the RNAi and Interferon pathways modulated the abundance of vsRNAs. We also found evidence for populations of vsRNAs that exist as duplexed siRNAs with zero to three nucleotide 3' overhangs. Using populations of cells carrying a Hepatitis C replicon, we observed strand-selective loading of siRNAs onto Argonaute complexes. These experiments define vsRNAs as one possible component of the interplay between animal viruses and their hosts.
Haplotype-resolved genome sequencing promises to unlock a wealth of information in population and medical genetics. However, for the vast majority of genomes sequenced to date, haplotypes have not been determined because of cumbersome haplotyping workflows that require fractions of the genome to be sequenced in a large number of compartments. Here we demonstrate barcode partitioning of long DNA molecules in a single compartment using "on-bead" barcoded tagmentation. The key to the method that we call "contiguity preserving transposition" sequencing on beads (CPTv2-seq) is transposon-mediated transfer of homogeneous populations of barcodes from beads to individual long DNA molecules that get fragmented at the same time (tagmentation). These are then processed to sequencing libraries wherein all sequencing reads originating from each long DNA molecule share a common barcode. Single-tube, bulk processing of long DNA molecules with ∼150,000 different barcoded bead types provides a barcode-linked read structure that reveals long-range molecular contiguity. This technology provides a simple, rapid, plate-scalable and automatable route to accurate, haplotype-resolved sequencing, and phasing of structural variants of the genome.
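A minimal sketch of how a barcode-linked read structure can be turned back into long-molecule spans is shown below. It is an illustration under stated assumptions rather than the CPTv2-seq analysis pipeline: the read tuple layout, the function name, and the 50 kb `max_gap` cutoff are hypothetical. Aligned reads are grouped by barcode and chromosome, and a new molecule is started whenever consecutive reads are farther apart than the cutoff, since one bead barcode may tag more than one molecule.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group aligned reads into putative long molecules.

    reads: iterable of (barcode, chromosome, position) tuples.
    Returns a sorted list of (barcode, chromosome, start, end, n_reads).
    """
    positions_by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        positions_by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(positions_by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


# Hypothetical aligned reads: barcode, chromosome, 1-based position.
toy_reads = [
    ("BC1", "chr1", 100),
    ("BC2", "chr1", 150),
    ("BC1", "chr1", 20_000),
    ("BC1", "chr1", 45_000),
    ("BC1", "chr1", 400_000),
    ("BC1", "chr1", 410_000),
    ("BC2", "chr1", 9_000),
]
toy_molecules = infer_molecules(toy_reads)
```

Heterozygous variants that fall within the same inferred molecule can then be assigned to the same haplotype, which is the basis of barcode-based phasing.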