Evidence for the etiology of autism spectrum disorders (ASDs) has consistently pointed to a strong genetic component complicated by substantial locus heterogeneity. We sequenced the exomes of 20 individuals with sporadic ASD (cases) and their parents, reasoning that these families would be enriched for de novo mutations of major effect. We identified 21 de novo mutations, 11 of which were protein altering. Protein-altering mutations were significantly enriched for changes at highly conserved residues. We identified potentially causative de novo events in 4 out of 20 probands, particularly among more severely affected individuals, in FOXP1, GRIN2B, SCN1A and LAMC3. In the FOXP1 mutation carrier, we also observed a rare inherited CNTNAP2 missense variant, and we provide functional support for a multi-hit model for disease risk. Our results show that trio-based exome sequencing is a powerful approach for identifying new candidate genes for ASDs and suggest that de novo mutations may contribute substantially to the genetic etiology of ASDs.
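The trio design can be illustrated with a minimal sketch: a variant in the proband is a de novo candidate if it is called in neither parent. The `Variant` type, the function name and the coordinates are hypothetical and are not taken from the study. The abstract does not describe the actual filtering criteria, which in practice would also need adequate parental coverage, genotype-quality thresholds and independent validation.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """A simple variant key; all values used below are made up."""
    chrom: str
    pos: int
    ref: str
    alt: str


def de_novo_candidates(child: Iterable[Variant],
                       mother: Iterable[Variant],
                       father: Iterable[Variant]) -> List[Variant]:
    """Return the child's variants that are called in neither parent."""
    inherited = set(mother) | set(father)
    return sorted(set(child) - inherited)


# Toy trio: only the chr3 variant is absent from both parents.
child = [Variant("chr3", 1000, "C", "T"),
         Variant("chr7", 2000, "G", "A"),
         Variant("chr12", 3000, "A", "G")]
mother = [Variant("chr7", 2000, "G", "A")]
father = [Variant("chr12", 3000, "A", "G"), Variant("chr1", 500, "T", "C")]

found = de_novo_candidates(child, mother, father)
print(found)
```

A set difference keeps the sketch short. It treats a missing parental call as absence of the variant, which is exactly the assumption that coverage checks are meant to guard against.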
Exome sequencing - the targeted sequencing of the subset of the human genome that is protein coding - is a powerful and cost-effective new tool for dissecting the genetic basis of diseases and traits that have proved to be intractable to conventional gene-discovery strategies. Over the past 2 years, experimental and analytical approaches relating to exome sequencing have established a rich framework for discovering the genes underlying unsolved Mendelian disorders. Additionally, exome sequencing is being adapted to explore the extent to which rare alleles explain the heritability of complex diseases and health-related traits. These advances also set the stage for applying exome and whole-genome sequencing to facilitate clinical diagnosis and personalized disease-risk profiling.
Genome-wide association studies suggest that common genetic variants explain only a modest fraction of heritable risk for common diseases, raising the question of whether rare variants account for a significant fraction of unexplained heritability. Although DNA sequencing costs have fallen markedly, they remain far from what is necessary for rare and novel variants to be routinely identified at a genome-wide scale in large cohorts. We have therefore sought to develop second-generation methods for targeted sequencing of all protein-coding regions ('exomes'), to reduce costs while enriching for discovery of highly penetrant variants. Here we report on the targeted capture and massively parallel sequencing of the exomes of 12 humans. These include eight HapMap individuals representing three populations, and four unrelated individuals with a rare dominantly inherited disorder, Freeman-Sheldon syndrome (FSS). We demonstrate the sensitive and specific identification of rare and common variants in over 300 megabases of coding sequence. Using FSS as a proof-of-concept, we show that candidate genes for Mendelian disorders can be identified by exome sequencing of a small number of unrelated, affected individuals. This strategy may be extendable to diseases with more complex genetics through larger sample sizes and appropriate weighting of non-synonymous variants by predicted functional impact.
Kawasaki disease (KD) is a pediatric vasculitis that damages the coronary arteries in 25% of untreated and approximately 5% of treated children. Epidemiologic data suggest that KD is triggered by unidentified infection(s) in genetically susceptible children. To investigate genetic determinants of KD susceptibility, we performed a genome-wide association study (GWAS) in 119 Caucasian KD cases and 135 matched controls with stringent correction for possible admixture, followed by replication in an independent cohort and subsequent fine-mapping, for a total of 893 KD cases plus population and family controls. Significant associations of 40 SNPs and six haplotypes, identifying 31 genes, were replicated in an independent cohort of 583 predominantly Caucasian KD families, with NAALADL2 (rs17531088, p(combined) = 1.13 × 10^-6) and ZFHX3 (rs7199343, p(combined) = 2.37 × 10^-6) most significantly associated. Sixteen associated variants with a minor allele frequency of >0.05 that lay within or close to known genes were fine-mapped with HapMap tagging SNPs in 781 KD cases, including 590 from the discovery and replication stages. Original or tagging SNPs in eight of these genes replicated the original findings, with seven genes having further significant markers in adjacent regions. In four genes (ZFHX3, NAALADL2, PPP1R14C, and TCP1), the neighboring markers were more significantly associated than the originally associated variants. Investigation of functional relationships between the eight fine-mapped genes using Ingenuity Pathway Analysis identified a single functional network (p = 10^-13) containing five fine-mapped genes (LNX1, CAMK2D, ZFHX3, CSMD1, and TCP1) with functional relationships potentially related to inflammation, apoptosis, and cardiovascular pathology. Pair-wise blood transcript levels were measured during acute and convalescent KD for all fine-mapped genes, revealing a consistent trend of significantly reduced transcript levels prior to treatment.
This is one of the first GWAS in an infectious disease. We have identified novel, plausible, and functionally related variants associated with KD susceptibility that may also be relevant to other cardiovascular diseases.
RNA modifications, such as N6-methyladenosine (m6A), modulate functions of cellular RNA species. However, quantifying differences in RNA modifications has been challenging. Here we develop a computational method, xPore, to identify differential RNA modifications from nanopore direct RNA sequencing (RNA-seq) data. We evaluate our method on transcriptome-wide m6A profiling data, demonstrating that xPore identifies positions of m6A sites at single-base resolution, estimates the fraction of modified RNA species in the cell and quantifies the differential modification rate across conditions. We apply xPore to direct RNA-seq data from six cell lines and multiple myeloma patient samples without a matched control sample and find that many m6A sites are preserved across cell types, whereas a subset exhibit significant differences in their modification rates. Our results show that RNA modifications can be identified from direct RNA-seq data with high accuracy, enabling analysis of differential modifications and expression from a single high-throughput experiment.
Responses of tropical cyclones (TCs) to CO2 doubling are explored using coupled global climate models (GCMs) with increasingly refined atmospheric/land horizontal grids (~200 km, ~50 km and ~25 km). The three models exhibit similar changes in background climate fields thought to regulate TC activity, such as relative sea surface temperature (SST), potential intensity, and wind shear. However, global TC frequency decreases substantially in the ~50 km model, while the ~25 km model shows no significant change. The ~25 km model also has a substantial and spatially-ubiquitous increase of Category 3–4–5 hurricanes. Idealized perturbation experiments are performed to understand the TC response. Each model’s transient fully-coupled 2 × CO2 TC activity response is largely recovered by “time-slice” experiments using time-invariant SST perturbations added to each model’s own SST climatology. The TC response to SST forcing depends on each model’s background climatological SST biases: removing these biases leads to a global TC intensity increase in the ~50 km model, and a global TC frequency increase in the ~25 km model, in response to CO2-induced warming patterns and CO2 doubling. Isolated CO2 doubling leads to a significant TC frequency decrease, while isolated uniform SST warming leads to a significant global TC frequency increase; the ~25 km model has a greater tendency for frequency increase. Global TC frequency responds to both (1) changes in TC “seeds”, which increase due to warming (more so in the ~25 km model) and decrease due to higher CO2 concentrations, and (2) less efficient development of these “seeds” into TCs, largely due to the nonlinear relation between temperature and saturation specific humidity.
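The nonlinear temperature dependence of saturation specific humidity mentioned above can be made concrete with a short sketch. It assumes the widely used Bolton (1980) approximation for saturation vapour pressure over water and a fixed pressure of 1000 hPa. Neither choice comes from the study, whose models compute moisture internally, and the function names are illustrative.

```python
import math


def saturation_vapor_pressure_hpa(t_celsius: float) -> float:
    """Saturation vapour pressure over water in hPa (Bolton 1980 approximation)."""
    return 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))


def saturation_specific_humidity(t_celsius: float,
                                 pressure_hpa: float = 1000.0) -> float:
    """Saturation specific humidity in kg/kg at the given total pressure."""
    e_s = saturation_vapor_pressure_hpa(t_celsius)
    # 0.622 is the ratio of the gas constants of dry air and water vapour.
    return 0.622 * e_s / (pressure_hpa - 0.378 * e_s)


# Equal 5 degree steps give growing increments in saturation humidity.
step_cool = saturation_specific_humidity(25.0) - saturation_specific_humidity(20.0)
step_warm = saturation_specific_humidity(30.0) - saturation_specific_humidity(25.0)
print(f"20->25 C: +{step_cool:.5f} kg/kg, 25->30 C: +{step_warm:.5f} kg/kg")
```

Because the relation is close to exponential, the same warming raises the saturation value more at a warmer base state. This is the nonlinearity the abstract points to.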
Endoscopic mucosal biopsies of primary gastric cancers (GCs) are used to guide diagnosis, biomarker testing and treatment. Spatial intratumoural heterogeneity (ITH) may influence biopsy-derived information. We aimed to study ITH of primary GCs and matched lymph node metastasis (LNmet).
GC resection samples were annotated to identify primary tumour superficial (PTsup), primary tumour deep (PTdeep) and LNmet subregions. For each subregion, we determined (1) transcriptomic profiles (NanoString 'PanCancer Progression Panel', 770 genes); (2) next-generation sequencing (NGS, 225 gastrointestinal cancer-related genes); (3) DNA copy number profiles by multiplex ligation-dependent probe amplification (MLPA, 16 genes); and (4) histomorphological phenotypes.
NanoString profiling of 64 GCs revealed no differences between PTsup1 and PTsup2, while 43% of genes were differentially expressed between PTsup and PTdeep, and 38% between PTsup and LNmet. Only 16% of genes were differentially expressed between PTdeep and LNmet. Several genes with therapeutic potential were overexpressed in LNmet and PTdeep compared with PTsup. NGS data provided orthogonal support for the NanoString results, with 40% of mutations present in PTdeep and/or LNmet, but not in PTsup. Conversely, only 6% of mutations were present in PTsup and absent in PTdeep and LNmet. MLPA demonstrated significant ITH between subregions and progressive genomic changes from PTsup to PTdeep/LNmet.
In GC, regional lymph node metastases are likely to originate from deeper subregions of the primary tumour. Future clinical trials of novel targeted therapies must consider assessment of deeper subregions of the primary tumour and/or metastases as several therapeutically relevant genes are only mutated, overexpressed or amplified in these regions.
We demonstrate the first successful application of exome sequencing to discover the gene for a rare mendelian disorder of unknown cause, Miller syndrome (MIM%263750). For four affected individuals in three independent kindreds, we captured and sequenced coding regions to a mean coverage of 40x and sufficient depth to call variants at approximately 97% of each targeted exome. Filtering against public SNP databases and eight HapMap exomes for genes with two previously unknown variants in each of the four individuals identified a single candidate gene, DHODH, which encodes a key enzyme in the pyrimidine de novo biosynthesis pathway. Sanger sequencing confirmed the presence of DHODH mutations in three additional families with Miller syndrome. Exome sequencing of a small number of unrelated affected individuals is a powerful, efficient strategy for identifying the genes underlying rare mendelian disorders and will likely transform the genetic analysis of monogenic traits.
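The filtering step described above can be sketched as a set intersection: discard variants found in reference catalogs, then keep only genes that carry at least two remaining variants in every affected individual. The gene labels, variant identifiers and function name below are hypothetical placeholders. The study's actual pipeline (variant calling, annotation, database versions) is not specified here.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Call = Tuple[str, str]  # (gene, variant_id); both are illustrative labels


def candidate_genes(exomes: List[Iterable[Call]],
                    known_variants: Set[str],
                    min_novel: int = 2) -> Set[str]:
    """Genes with at least `min_novel` previously unknown variants in every exome."""
    per_individual = []
    for calls in exomes:
        # Drop variants present in the reference catalogs, de-duplicating calls.
        novel = {(gene, var) for gene, var in calls if var not in known_variants}
        counts = Counter(gene for gene, _ in novel)
        per_individual.append({g for g, n in counts.items() if n >= min_novel})
    return set.intersection(*per_individual) if per_individual else set()


# Toy data: only GENE_A has two novel variants in all three individuals.
known = {"rs_known_1", "rs_known_2"}
exomes = [
    [("GENE_A", "a1"), ("GENE_A", "a2"), ("GENE_B", "b1"), ("GENE_B", "rs_known_1")],
    [("GENE_A", "a3"), ("GENE_A", "a4"), ("GENE_B", "b2"), ("GENE_B", "b3")],
    [("GENE_A", "a1"), ("GENE_A", "a5"), ("GENE_C", "rs_known_2")],
]
print(candidate_genes(exomes, known))
```

Requiring two novel variants per individual reflects a recessive model. Setting `min_novel=1` would give the corresponding sketch for a dominant disorder.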
To catalog protein-altering mutations that may drive the development of prostate cancers and their progression to metastatic disease systematically, we performed whole-exome sequencing of 23 prostate cancers derived from 16 different lethal metastatic tumors and three high-grade primary carcinomas. All tumors were propagated in mice as xenografts, designated the LuCaP series, to model phenotypic variation, such as responses to cancer-directed therapeutics. Although corresponding normal tissue was not available for most tumors, we were able to take advantage of increasingly deep catalogs of human genetic variation to remove most germline variants. On average, each tumor genome contained ∼200 novel nonsynonymous variants, of which the vast majority was specific to individual carcinomas. A subset of genes was recurrently altered across tumors derived from different individuals, including TP53, DLK2, GPC6, and SDF4. Unexpectedly, three prostate cancer genomes exhibited substantially higher mutation frequencies, with 2,000–4,000 novel coding variants per exome. A comparison of castration-resistant and castration-sensitive pairs of tumor lines derived from the same prostate cancer highlights mutations in the Wnt pathway as potentially contributing to the development of castration resistance. Collectively, our results indicate that point mutations arising in coding regions of advanced prostate cancers are common but, with notable exceptions, very few genes are mutated in a substantial fraction of tumors. We also report a previously undescribed subtype of prostate cancers exhibiting "hypermutated" genomes, with potential implications for resistance to cancer therapeutics. Our results also suggest that increasingly deep catalogs of human germline variation may challenge the necessity of sequencing matched tumor-normal pairs.
Current methods for determining RNA structure with short-read sequencing cannot capture most differences between distinct transcript isoforms. Here we present RNA structure analysis using nanopore sequencing (PORE-cupine), which combines structure probing using chemical modifications with direct long-read RNA sequencing and machine learning to detect secondary structures in cellular RNAs. PORE-cupine also captures global structural features, such as RNA-binding-protein binding sites and reactivity differences at single-nucleotide variants. We show that shared sequences in different transcript isoforms of the same gene can fold into different structures, highlighting the importance of long-read sequencing for obtaining phase information. We also demonstrate that structural differences between transcript isoforms of the same gene lead to differences in translation efficiency. By revealing isoform-specific RNA structure, PORE-cupine will deepen understanding of the role of structures in controlling gene regulation.