Cancers are caused by the accumulation of genomic alterations. Therefore, analyses of cancer genome sequences and structures provide insights for understanding cancer biology, diagnosis and therapy. The application of second-generation DNA sequencing technologies (also known as next-generation sequencing) - through whole-genome, whole-exome and whole-transcriptome approaches - is allowing substantial advances in cancer genomics. These methods are facilitating an increase in the efficiency and resolution of detection of each of the principal types of somatic cancer genome alterations, including nucleotide substitutions, small insertions and deletions, copy number alterations, chromosomal rearrangements and microbial infections. This Review focuses on the methodological considerations for characterizing somatic genome alterations in cancer and the future prospects for these approaches.
Clonal hematopoiesis of indeterminate potential (CHIP) refers to clonal expansion of hematopoietic stem cells attributable to acquired leukemic mutations in genes such as DNMT3A or TET2. In humans, CHIP associates with prevalent myocardial infarction. In mice, CHIP accelerates atherosclerosis and increases IL-6/IL-1β expression, raising the hypothesis that IL-6 pathway antagonism in CHIP carriers would decrease cardiovascular disease (CVD) risk.
We analyzed exome sequences from 35 416 individuals in the UK Biobank without prevalent CVD, to identify participants with DNMT3A or TET2 CHIP. We used the IL6R p.Asp358Ala coding mutation as a genetic proxy for IL-6 inhibition. We tested the association of CHIP status with incident CVD events (myocardial infarction, coronary revascularization, stroke, or death), and whether it was modified by IL6R p.Asp358Ala.
We identified 1079 (3.0%) individuals with CHIP, including 432 (1.2%) with large clones (allele fraction >10%). During 6.9-year median follow-up, CHIP associated with increased incident CVD event risk (hazard ratio, 1.27 [95% CI, 1.04-1.56]; P=0.019), with greater risk from large CHIP clones (hazard ratio, 1.59 [95% CI, 1.21-2.09]; P<0.001). IL6R p.Asp358Ala attenuated CVD event risk among participants with large CHIP clones (hazard ratio, 0.46 [95% CI, 0.29-0.73]; P<0.001) but not in individuals without CHIP (hazard ratio, 0.95 [95% CI, 0.89-1.01]; P=0.08; P for interaction=0.003). In 9951 independent participants, the association of CHIP status with myocardial infarction similarly varied by IL6R p.Asp358Ala (P for interaction=0.036).
CHIP is associated with increased risk of incident CVD. Among carriers of large CHIP clones, genetically reduced IL-6 signaling abrogated this risk.
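The hazard ratios and 95% confidence intervals reported above can be sanity-checked against their P values with a standard normal approximation on the log scale. The sketch below is illustrative only: it assumes the interval is a symmetric Wald interval for log(HR), which the abstract does not state, and the function name `p_from_hazard_ratio` is hypothetical.

```python
import math

def p_from_hazard_ratio(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate (SE, z, two-sided p) from a hazard ratio and its 95% CI.

    Assumes a symmetric Wald interval on the log scale (an assumption,
    not something stated in the abstract).
    """
    # Half-width of the log-scale interval equals z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

# Any CHIP: hazard ratio 1.27, 95% CI 1.04-1.56
se_any, z_any, p_any = p_from_hazard_ratio(1.27, 1.04, 1.56)
# Large clones: hazard ratio 1.59, 95% CI 1.21-2.09
se_large, z_large, p_large = p_from_hazard_ratio(1.59, 1.21, 2.09)

print(f"any CHIP:     SE={se_any:.3f}  z={z_any:.2f}  p={p_any:.4f}")
print(f"large clones: SE={se_large:.3f}  z={z_large:.2f}  p={p_large:.5f}")
```

Under this approximation the first P value comes out near 0.02 and the second below 0.001, in line with the reported values; small differences are expected because the published estimates are rounded.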
Repair of DNA interstrand crosslinks requires action of multiple DNA repair pathways, including homologous recombination. Here, we report a de novo heterozygous T131P mutation in RAD51/FANCR, the key recombinase essential for homologous recombination, in a patient with Fanconi anemia-like phenotype. In vitro, RAD51-T131P displays DNA-independent ATPase activity, no DNA pairing capacity, and a co-dominant-negative effect on RAD51 recombinase function. However, the patient cells are homologous recombination proficient due to the low ratio of mutant to wild-type RAD51 in cells. Instead, patient cells are sensitive to crosslinking agents and display hyperphosphorylation of Replication Protein A due to increased activity of DNA2 and WRN at the DNA interstrand crosslinks. Thus, proper RAD51 function is important during DNA interstrand crosslink repair outside of homologous recombination. Our study provides a molecular basis for how RAD51 and its associated factors may operate in a homologous recombination-independent manner to maintain genomic integrity.
• A dominant-negative mutation in RAD51 is identified in a Fanconi anemia-like patient
• RAD51 T131P-expressing cells are ICL repair defective but HR proficient
• RAD51 T131P has unregulated ATPase activity, poisoning wild-type RAD51
• Defective RAD51 function results in DNA2/WRN-dependent degradation of DNA
Defects in DNA interstrand crosslink (ICL) repair have detrimental consequences, including stem cell failure and tumorigenesis. Wang et al. uncover a new subtype of Fanconi anemia, FA-R, in which a de novo negative co-dominant RAD51 (FANCR) mutation results in ICL repair defect without affecting RAD51-dependent homologous recombination.
Pacific Biosciences technology offers a fundamentally new data type with the potential to overcome some limitations of current next-generation sequencing platforms through significantly longer reads, single-molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects.
We evaluated the Pacific Biosciences technology for SNP discovery in medical resequencing projects using the Genome Analysis Toolkit, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs. We assessed data quality: most errors were indels (~14%) with few apparent miscalls (~1%). In this work, we define a custom data processing pipeline for Pacific Biosciences data for human data analysis.
Critically, the error properties were largely free of the context-specific effects that affect other sequencing technologies. These data show excellent utility for follow-up validation and extension studies in human data and medical genetics projects, and the approach can be extended to other organisms with a reference genome.
Estimating an epidemic's trajectory is crucial for developing public health responses to infectious diseases, but case data used for such estimation are confounded by variable testing practices. We show that the population distribution of viral loads observed under random or symptom-based surveillance, in the form of cycle threshold (Ct) values obtained from reverse transcription quantitative polymerase chain reaction testing, changes during an epidemic. Thus, Ct values from even limited numbers of random samples can provide improved estimates of an epidemic's trajectory. Combining data from multiple such samples improves the precision and robustness of this estimation. We apply our methods to Ct values from surveillance conducted during the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic in a variety of settings and offer alternative approaches for real-time estimates of epidemic trajectories for outbreak management and response.
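Why the Ct distribution shifts with epidemic phase can be illustrated with a toy calculation. The sketch below is not the authors' model: the viral kinetics in `ct_at_age` (Ct falling from 40 to 20 over 5 days, then returning to 40 by day 20) and the growth rates of ±0.1 per day are invented purely for illustration. It relies only on the general fact that, when incidence grows exponentially at rate r, randomly sampled infections of age a are weighted in proportion to exp(-r·a), so a growing epidemic is dominated by recent infections.

```python
import math

def ct_at_age(days):
    """Hypothetical Ct trajectory by days since infection (illustrative only)."""
    if days < 5.0:
        return 40.0 - 4.0 * days               # viral load rising, Ct falling
    return 20.0 + (days - 5.0) * (20.0 / 15.0)  # slow clearance, Ct rising

def mean_ct(growth_rate, max_age=20.0, steps=2000):
    """Mean Ct among detectable infections sampled at random.

    With incidence proportional to exp(growth_rate * t), the share of
    sampled infections of age a is proportional to exp(-growth_rate * a).
    """
    da = max_age / steps
    weighted_sum = 0.0
    total_weight = 0.0
    for i in range(steps):
        age = (i + 0.5) * da                   # midpoint rule
        weight = math.exp(-growth_rate * age)
        weighted_sum += weight * ct_at_age(age)
        total_weight += weight
    return weighted_sum / total_weight

growing = mean_ct(0.1)     # epidemic growing at 10% per day (assumed)
flat = mean_ct(0.0)        # stable incidence
declining = mean_ct(-0.1)  # epidemic shrinking at 10% per day (assumed)

print(f"mean Ct  growing={growing:.2f}  flat={flat:.2f}  declining={declining:.2f}")
```

In this toy setting the mean Ct is lower while the epidemic grows than while it declines, which is the qualitative signal the abstract describes; the actual magnitude depends entirely on the assumed kinetics.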
Determining how somatic copy number alterations (SCNAs) promote cancer is an important goal. We characterized SCNA patterns in 4,934 cancers from The Cancer Genome Atlas Pan-Cancer data set. Whole-genome doubling, observed in 37% of cancers, was associated with higher rates of every other type of SCNA, TP53 mutations, CCNE1 amplifications and alterations of the PPP2R complex. SCNAs that were internal to chromosomes tended to be shorter than telomere-bounded SCNAs, suggesting different mechanisms underlying their generation. Significantly recurrent focal SCNAs were observed in 140 regions, including 102 without known oncogene or tumor suppressor gene targets and 50 with significantly mutated genes. Amplified regions without known oncogenes were enriched for genes involved in epigenetic regulation. When levels of genomic disruption were accounted for, 7% of region pairs were anticorrelated, and these regions tended to encompass genes whose proteins physically interact, suggesting related functions. These results provide insights into mechanisms of generation and functional consequences of cancer-related SCNAs.
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
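The idea behind step (iii), base quality score recalibration, can be sketched in a few lines: compare the quality a sequencer reports with the mismatch rate actually observed at sites assumed to be non-variant, and express that rate on the Phred scale, Q = -10·log10(p). The code below is a minimal illustration of this idea, not the Genome Analysis Toolkit's implementation; the function names, the grouping by reported quality alone, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import defaultdict

def phred(error_prob):
    """Phred-scaled quality for a given error probability."""
    return -10.0 * math.log10(error_prob)

def empirical_quality(observations):
    """Map each reported quality to an empirically observed quality.

    observations: iterable of (reported_quality, is_mismatch) pairs taken
    from bases at sites assumed to be non-variant. Add-one smoothing
    (an assumption of this sketch) avoids a zero error rate in small bins.
    """
    counts = defaultdict(lambda: [0, 0])  # quality -> [bases, mismatches]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += 1
        counts[reported_q][1] += int(is_mismatch)
    return {
        q: phred((mismatches + 1) / (bases + 2))
        for q, (bases, mismatches) in counts.items()
    }

# Toy data: 1000 bases reported as Q30, of which 10 mismatch the reference.
toy = [(30, i < 10) for i in range(1000)]
recalibrated = empirical_quality(toy)
print(f"reported Q30 -> empirical Q{recalibrated[30]:.1f}")
```

In the toy data a 1% observed mismatch rate corresponds to roughly Q20, so bases reported as Q30 would be treated as overconfident. A real recalibrator conditions on more covariates than the reported quality.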
Although autism has a clear genetic component, the high genetic heterogeneity of the disorder has been a challenge for the identification of causative genes. We used homozygosity analysis to identify probands from nonconsanguineous families that showed evidence of distant shared ancestry, suggesting potentially recessive mutations. Whole-exome sequencing of 16 probands revealed validated homozygous, potentially pathogenic recessive mutations that segregated perfectly with disease in 4/16 families. The candidate genes (UBE3B, CLTCL1, NCKAP5L, ZNF18) encode proteins involved in proteolysis, GTPase-mediated signaling, cytoskeletal organization, and other pathways. Furthermore, neuronal depolarization regulated the transcription of these genes, suggesting potential activity-dependent roles in neurons. We present a multidimensional strategy for filtering whole-exome sequence data to find candidate recessive mutations in autism, which may have broader applicability to other complex, heterogeneous disorders.
Genomic analysis of tumours has led to the identification of hundreds of cancer genes on the basis of the presence of mutations in protein-coding regions. By contrast, much less is known about cancer-causing mutations in non-coding regions. Here we perform deep sequencing in 360 primary breast cancers and develop computational methods to identify significantly mutated promoters. Clear signals are found in the promoters of three genes. FOXA1, a known driver of hormone-receptor positive breast cancer, harbours a mutational hotspot in its promoter leading to overexpression through increased E2F binding. RMRP and NEAT1, two non-coding RNA genes, carry mutations that affect protein binding to their promoters and alter expression levels. Our study shows that promoter regions harbour recurrent mutations in cancer with functional consequences and that the mutations occur at similar frequencies as in coding regions. Power analyses indicate that more such regions remain to be discovered through deep sequencing of adequately sized cohorts of patients.
Although a few cancer genes are mutated in a high proportion of tumours of a given type (>20%), most are mutated at intermediate frequencies (2-20%). To explore the feasibility of creating a comprehensive catalogue of cancer genes, we analysed somatic point mutations in exome sequences from 4,742 human cancers and their matched normal-tissue samples across 21 cancer types. We found that large-scale genomic analysis can identify nearly all known cancer genes in these tumour types. Our analysis also identified 33 genes that were not previously known to be significantly mutated in cancer, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Down-sampling analysis indicates that larger sample sizes will reveal many more genes mutated at clinically important frequencies. We estimate that near-saturation may be achieved with 600-5,000 samples per tumour type, depending on background mutation frequency. The results may help to guide the next stage of cancer genomics.