We present a statistical model for patterns of genetic variation in samples of unrelated individuals from natural populations. This model is based on the idea that, over short regions, haplotypes in a population tend to cluster into groups of similar haplotypes. To capture the fact that, because of recombination, this clustering tends to be local in nature, our model allows cluster memberships to change continuously along the chromosome according to a hidden Markov model. This approach is flexible, allowing for both “block-like” patterns of linkage disequilibrium (LD) and gradual decline in LD with distance. The resulting model is also fast and, as a result, is practicable for large data sets (e.g., thousands of individuals typed at hundreds of thousands of markers). We illustrate the utility of the model by applying it to dense single-nucleotide–polymorphism genotype data for the tasks of imputing missing genotypes and estimating haplotypic phase. For imputing missing genotypes, methods based on this model are as accurate as or more accurate than existing methods. For haplotype estimation, the point estimates are slightly less accurate than those from the best existing methods (e.g., for unrelated Centre d'Etude du Polymorphisme Humain individuals from the HapMap project, switch error was 0.055 for our method vs. 0.051 for PHASE) but require a small fraction of the computational cost. In addition, we demonstrate that the model accurately reflects uncertainty in its estimates, in that probabilities computed using the model are approximately well calibrated. The methods described in this article are implemented in a software package, fastPHASE, which is available from the Stephens Lab Web site.
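The cluster-based hidden Markov model described above can be illustrated with a minimal sketch. It is not the fastPHASE implementation: it handles a single haplotype rather than unphased genotypes, it takes the parameters as given instead of estimating them, and the names `alpha` (cluster weights), `theta` (per-marker allele frequencies within each cluster), and `jump` (per-marker probability of re-drawing the cluster) are assumptions introduced for illustration only.

```python
def emit(allele, p_one):
    """Probability of an observed allele given a cluster's allele-1 frequency."""
    if allele is None:  # missing allele: marginalized out
        return 1.0
    return p_one if allele == 1 else 1.0 - p_one


def haplotype_likelihood(hap, alpha, theta, jump):
    """Forward algorithm: P(hap) summed over all hidden cluster paths.

    hap   : list of 0, 1, or None (missing), one entry per marker
    alpha : alpha[k] is the weight of cluster k (assumed to sum to 1)
    theta : theta[m][k] is P(allele 1 at marker m | cluster k)
    jump  : jump[m] is the probability that cluster membership is
            re-drawn from alpha between markers m-1 and m (jump[0] unused)
    """
    n_clusters = len(alpha)
    f = [alpha[k] * emit(hap[0], theta[0][k]) for k in range(n_clusters)]
    for m in range(1, len(hap)):
        total = sum(f)
        f = [
            ((1.0 - jump[m]) * f[k] + jump[m] * alpha[k] * total)
            * emit(hap[m], theta[m][k])
            for k in range(n_clusters)
        ]
    return sum(f)


def impute_allele(hap, m, alpha, theta, jump):
    """P(allele at marker m is 1 | all other observed alleles)."""
    with_one = list(hap)
    with_one[m] = 1
    with_zero = list(hap)
    with_zero[m] = 0
    l1 = haplotype_likelihood(with_one, alpha, theta, jump)
    l0 = haplotype_likelihood(with_zero, alpha, theta, jump)
    return l1 / (l1 + l0)


# Hypothetical toy parameters: two clusters, five markers.
alpha = [0.5, 0.5]
theta = [[0.01, 0.99] for _ in range(5)]
jump = [0.0, 0.05, 0.05, 0.05, 0.05]

observed = [1, 1, None, 1, 1]
p_missing_is_one = impute_allele(observed, 2, alpha, theta, jump)
print(f"P(missing allele = 1) = {p_missing_is_one:.3f}")
```

Small `jump` values keep a haplotype in the same cluster over long stretches, which gives block-like LD; larger values let membership change often, which gives a more gradual decay of LD with distance.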
Although many algorithms exist for estimating haplotypes from genotype data, none of them take full account of both the decay of linkage disequilibrium (LD) with distance and the order and spacing of genotyped markers. Here, we describe an algorithm that does take these factors into account, using a flexible model for the decay of LD with distance that can handle both “blocklike” and “nonblocklike” patterns of LD. We compare the accuracy of this approach with a range of other available algorithms in three ways: for reconstruction of randomly paired, molecularly determined male X chromosome haplotypes; for reconstruction of haplotypes obtained from trios in an autosomal region; and for estimation of missing genotypes in 50 autosomal genes that have been completely resequenced in 24 African Americans and 23 individuals of European descent. For the autosomal data sets, our new approach clearly outperforms the best available methods, whereas its accuracy in inferring the X chromosome haplotypes is only slightly superior. For estimation of missing genotypes, our method performed slightly better when the two subsamples were combined than when they were analyzed separately, which illustrates its robustness to population stratification. Our method is implemented in the software package PHASE (v2.1.1), available from the Stephens Lab Web site.
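Accuracy comparisons for haplotype reconstruction are often summarized by a switch error rate, the metric quoted in the fastPHASE abstract above. The abstract for PHASE does not say which metric was used, so the following is only a sketch of one common definition: the fraction of adjacent heterozygous-site pairs whose relative phase is wrong. The function name and the toy haplotypes are hypothetical, and the sketch assumes the estimated genotypes match the true genotypes at every site.

```python
def switch_error(true_h1, true_h2, est_h1, est_h2):
    """Fraction of adjacent heterozygous-site pairs phased inconsistently.

    Assumes (for illustration) that the estimated pair carries the same
    unordered genotype as the true pair at every site.
    """
    n_sites = len(true_h1)
    for i in range(n_sites):
        if sorted((true_h1[i], true_h2[i])) != sorted((est_h1[i], est_h2[i])):
            raise ValueError(f"genotype mismatch at site {i}")

    # Only heterozygous sites carry phase information.
    het_sites = [i for i in range(n_sites) if true_h1[i] != true_h2[i]]
    if len(het_sites) < 2:
        return 0.0

    # True where the first estimated haplotype follows the first true haplotype.
    orientation = [est_h1[i] == true_h1[i] for i in het_sites]
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(het_sites) - 1)


# Hypothetical example: one switch among three adjacent heterozygous pairs.
rate = switch_error([0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0])
print(rate)
```

Because the two haplotypes of an individual are unordered, swapping `est_h1` and `est_h2` leaves the result unchanged.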
Early detection of pancreatic ductal adenocarcinoma (PDAC) remains elusive. Precursor lesions of PDAC, specifically intraductal papillary mucinous neoplasms (IPMNs), represent a pathway to invasive neoplasia, although the molecular correlates of progression remain to be fully elucidated. Single-cell transcriptomics provides a unique avenue for dissecting both the epithelial and microenvironmental heterogeneities that accompany multistep progression from noninvasive IPMNs to PDAC.
Single-cell RNA sequencing was performed through droplet-based sequencing on 5,403 cells from 2 low-grade IPMNs (LGD-IPMNs), 2 high-grade IPMNs (HGD-IPMNs), and 2 PDACs (all surgically resected).
Analysis of single-cell transcriptomes revealed heterogeneous alterations within the epithelium and the tumor microenvironment during the progression of noninvasive dysplasia to invasive cancer. Although HGD-IPMNs expressed many core signaling pathways described in PDAC, LGD-IPMNs harbored subsets of single cells with a transcriptomic profile that overlapped with invasive cancer. Notably, a proinflammatory immune component was readily seen in low-grade IPMNs, composed of cytotoxic T cells, activated T-helper cells, and dendritic cells, which was progressively depleted during neoplastic progression, accompanied by infiltration of myeloid-derived suppressor cells. Finally, stromal myofibroblast populations were heterogeneous and acquired a previously described tumor-promoting and immune-evading phenotype during invasive carcinogenesis.
This study demonstrates the ability to perform high-resolution profiling of the transcriptomic changes that occur during multistep progression of cystic PDAC precursors to cancer. Notably, single-cell analysis provides an unparalleled insight into both the epithelial and microenvironmental heterogeneities that accompany early cancer pathogenesis and might be a useful substrate to identify targets for cancer interception.
Sequencing studies of breast tumour cohorts have identified many prevalent mutations, but provide limited insight into the genomic diversity within tumours. Here we developed a whole-genome and exome single cell sequencing approach called nuc-seq that uses G2/M nuclei to achieve 91% mean coverage breadth. We applied this method to sequence single normal and tumour nuclei from an oestrogen-receptor-positive (ER(+)) breast cancer and a triple-negative ductal carcinoma. In parallel, we performed single nuclei copy number profiling. Our data show that aneuploid rearrangements occurred early in tumour evolution and remained highly stable as the tumour masses clonally expanded. In contrast, point mutations evolved gradually, generating extensive clonal diversity. Using targeted single-molecule sequencing, many of the diverse mutations were shown to occur at low frequencies (<10%) in the tumour mass. Using mathematical modelling we found that the triple-negative tumour cells had an increased mutation rate (13.3×), whereas the ER(+) tumour cells did not. These findings have important implications for the diagnosis, therapeutic treatment and evolution of chemoresistance in breast cancer.
Most patients diagnosed with resected pancreatic adenocarcinoma (PDAC) survive less than 5 years, but a minor subset survives longer. Here, we dissect the role of the tumor microbiota and the immune system in influencing long-term survival. Using 16S rRNA gene sequencing, we analyzed the tumor microbiome composition in PDAC patients with short-term survival (STS) and long-term survival (LTS). We found higher alpha-diversity in the tumor microbiome of LTS patients and identified an intra-tumoral microbiome signature (Pseudoxanthomonas-Streptomyces-Saccharopolyspora-Bacillus clausii) highly predictive of long-term survivorship in both discovery and validation cohorts. Through human-into-mice fecal microbiota transplantation (FMT) experiments from STS, LTS, or control donors, we were able to differentially modulate the tumor microbiome and affect tumor growth as well as tumor immune infiltration. Our study demonstrates that PDAC microbiome composition, which cross-talks to the gut microbiome, influences the host immune response and natural history of the disease.
• PDAC long-term survivors display high tumor microbial diversity and immunoactivation
• A PDAC tumoral microbiome signature predicts PDAC long-term survival
• The gut microbiome modulates the PDAC tumor microbiome landscape
• Fecal microbial transplants can modulate tumor immunosuppression and growth
The distinct tumor microbiome from pancreatic cancer long-term survivors can be used to predict PDAC survival in humans, and transfer of long-term survivor gut microbiomes can alter the tumor microbiome and tumor growth in mouse models.
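The alpha-diversity comparison between LTS and STS tumors can be made concrete with a short sketch. The abstract does not state which diversity index was used; the Shannon index below is one common choice and is an assumption here, as are the function name and the taxon counts, which are invented purely for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical per-taxon read counts for two tumor samples (not real data).
even_sample = [25, 25, 25, 25]   # reads spread evenly over four taxa
skewed_sample = [97, 1, 1, 1]    # one dominant taxon

print(f"even:   H = {shannon_index(even_sample):.3f}")
print(f"skewed: H = {shannon_index(skewed_sample):.3f}")
```

A community dominated by one taxon scores close to zero, while an even community over the same number of taxa reaches the maximum, ln(number of taxa).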
Lynch syndrome is the most common cause of hereditary colorectal cancer and is secondary to germline alterations in one of four DNA mismatch repair (MMR) genes. Here we aimed to provide novel insights into the initiation of MMR-deficient (MMRd) colorectal carcinogenesis by characterizing the expression profile of MMRd intestinal stem cells (ISC). A tissue-specific MMRd mouse model (Villin-Cre;Msh2) was crossed with a reporter mouse to trace and isolate ISCs (Lgr5+) using flow cytometry. Three different ISC genotypes (Msh2-KO, Msh2-HET, and Msh2-WT) were isolated and processed for mRNA-seq and mass spectrometry, followed by bioinformatic analyses to identify expression signatures of complete MMRd and haplo-insufficiency. These findings were validated using qRT-PCR, IHC, and whole transcriptomic sequencing in mouse tissues, organoids, and a cohort of human samples, including normal colorectal mucosa, premalignant lesions, and early-stage colorectal cancers from patients with Lynch syndrome and patients with familial adenomatous polyposis (FAP) as controls.
Msh2-KO ISCs clustered together with differentiated intestinal epithelial cells from all genotypes. Gene-set enrichment analysis indicated inhibition of replication, cell-cycle progression, and the Wnt pathway and activation of epithelial signaling and immune reaction. An expression signature derived from MMRd ISCs successfully distinguished MMRd neoplastic lesions of patients with Lynch syndrome from FAP controls. SPP1 was specifically upregulated in MMRd ISCs and colocalized with LGR5 in Lynch syndrome colorectal premalignant lesions and tumors. These results show that expression signatures of MMRd ISC recapitulate the initial steps of Lynch syndrome carcinogenesis and have the potential to unveil novel biomarkers of early cancer initiation. SIGNIFICANCE: The transcriptomic and proteomic profile of MMR-deficient intestinal stem cells displays a unique set of genes with potential roles as biomarkers of cancer initiation and early progression.
Breast cancer is one of the most commonly diagnosed cancers in women. While there are several effective therapies for breast cancer and important single gene prognostic/predictive markers, more than 40,000 women die from this disease every year. The increasing availability of large-scale genomic datasets provides opportunities for identifying factors that influence breast cancer survival in smaller, well-defined subsets. The purpose of this study was to investigate the genomic landscape of various breast cancer subtypes and its potential associations with clinical outcomes. We used statistical analysis of sequence data generated by the Cancer Genome Atlas initiative including somatic mutation load (SML) analysis, Kaplan–Meier survival curves, gene mutational frequency, and mutational enrichment evaluation to study the genomic landscape of breast cancer. We show that ER+, but not ER−, tumors with high SML associate with poor overall survival (HR = 2.02). Further, these high mutation load tumors are enriched for coincident mutations in both DNA damage repair and ER signature genes. While it is known that somatic mutations in specific genes affect breast cancer survival, this study is the first to identify that SML may constitute an important global signature for a subset of ER+ tumors prone to high mortality. Moreover, although somatic mutations in individual DNA damage genes affect clinical outcome, our results indicate that coincident mutations in DNA damage response and signature ER genes may prove more informative for ER+ breast cancer survival. Next generation sequencing may prove an essential tool for identifying pathways underlying poor outcomes and for tailoring therapeutic strategies.
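The Kaplan–Meier survival curves mentioned in this abstract are built with the product-limit estimator, sketched below. This is a generic textbook version rather than the study's analysis pipeline; the function name and the follow-up data are hypothetical, and a real comparison of high-SML and low-SML groups would also need a significance test and a hazard-ratio model, which are omitted here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up data (months); the second patient is censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
for t, s in toy_curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored patients do not lower the curve directly, but they shrink the risk set, so each later death produces a larger drop.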
Most patients with pancreatic ductal adenocarcinoma (PDAC) present with surgically unresectable cancer. As a result, endoscopic ultrasound-guided fine-needle aspiration (EUS-FNA) is the most common biospecimen source available for diagnosis in treatment-naïve patients. Unfortunately, these limited samples are often not considered adequate for genomic analysis, precluding the opportunity for enrollment on precision medicine trials.
Applying an epithelial cell adhesion molecule (EpCAM)-enrichment strategy, we show the feasibility of using real-world EUS-FNA for in-depth, molecular-barcoded, whole-exome sequencing (WES) and somatic copy-number alteration (SCNA) analysis in 23 patients with PDAC.
Potentially actionable mutations were identified in >20% of patients. Further, an increased mutational burden and higher aneuploidy in WES data were associated with an adverse prognosis. To identify predictive biomarkers for first-line chemotherapy, we developed an SCNA-based complexity score that was associated with response to platinum-based regimens in this cohort.
Collectively, these results emphasize the feasibility of real-world cytology samples for in-depth genomic characterization of PDAC and show the prognostic potential of SCNA for PDAC diagnosis.