Of 7028 disorders with suspected Mendelian inheritance, 1139 are recessive and have an established molecular basis. Although individually uncommon, Mendelian diseases collectively account for ~20% of infant mortality and ~10% of pediatric hospitalizations. Preconception screening, together with genetic counseling of carriers, has resulted in remarkable declines in the incidence of several severe recessive diseases, including Tay-Sachs disease and cystic fibrosis. However, extending preconception screening to most severe disease genes has hitherto been impractical. Here, we report a preconception carrier screen for 448 severe recessive childhood diseases. Rather than relying on costly complete sequencing of the human genome, we enriched 7717 regions from 437 target genes by hybrid capture or microdroplet polymerase chain reaction, sequenced them by next-generation sequencing (NGS) to a depth of up to 2.7 gigabases, and assessed the results with stringent bioinformatic filters. At the resulting 160x average target coverage, 93% of nucleotides had at least 20x coverage, and mutation detection/genotyping had ~95% sensitivity and ~100% specificity for substitution, insertion/deletion, splicing, and gross deletion mutations and single-nucleotide polymorphisms. In 104 unrelated DNA samples, the average genomic carrier burden for severe pediatric recessive mutations was 2.8 (range 0 to 7). The distribution of mutations among sequenced samples appeared random. Twenty-seven percent of mutations cited in the literature were found to be common polymorphisms or misannotated, underscoring the need for better mutation databases as part of a comprehensive carrier testing strategy. Given the magnitude of carrier burden and the lower cost of testing compared with treating these conditions, carrier screening by NGS made available to the general population may be an economical way to reduce the incidence of, and ameliorate the suffering associated with, severe recessive childhood disorders.
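The coverage and accuracy figures above reduce to simple ratios. The sketch below is a minimal illustration on made-up per-base depths (not data from the study): it computes mean target coverage, the fraction of bases at or above 20x, and sensitivity/specificity from hypothetical true/false positive and negative counts. The function names are hypothetical.

```python
from typing import Sequence


def coverage_summary(depths: Sequence[int], min_depth: int = 20) -> tuple[float, float]:
    """Return (mean depth, fraction of target bases with depth >= min_depth)."""
    if not depths:
        raise ValueError("no positions supplied")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, covered / len(depths)


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Toy per-base depths for a hypothetical 10-bp target (not study data).
depths = [150, 180, 12, 200, 170, 160, 25, 190, 5, 210]
mean_depth, frac_20x = coverage_summary(depths)
print(f"mean {mean_depth:.1f}x, {frac_20x:.0%} of bases >= 20x")  # mean 130.2x, 80% of bases >= 20x
```

In a real pipeline the depths would come from per-base coverage over the enriched target intervals; the arithmetic is the same.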
Although histones can form nucleosomes on virtually any genomic sequence, DNA sequences show considerable variability in their binding affinity. We have used DNA sequences of Saccharomyces cerevisiae whose nucleosome binding affinities have been experimentally determined (Yuan et al. 2005) to train a support vector machine to identify the nucleosome formation potential of any given sequence of DNA. The DNA sequences whose nucleosome formation potential is most accurately predicted are those that contain strong nucleosome forming or inhibiting signals and are found within nucleosome-length stretches of genomic DNA with continuous nucleosome formation or inhibition signals. We have accurately predicted the experimentally determined nucleosome positions across a well-characterized promoter region of S. cerevisiae and identified strong periodicity within 199 center-aligned mononucleosomes studied recently (Segal et al. 2006), despite no periodicity information having been used to train the support vector machine. Our analysis suggests that only a subset of nucleosomes are likely to be positioned by intrinsic sequence signals. This observation is consistent with the available experimental data and is inconsistent with the proposal of a nucleosome positioning code. Finally, we show that intrinsic nucleosome positioning signals are both more inhibitory and more variable in promoter regions than in open reading frames in S. cerevisiae.
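The abstract notes that periodicity emerged in the center-aligned mononucleosomes even though none was used in training. One simple way to look for such periodicity, shown here as an illustration rather than the authors' method, is to histogram the distances between A/T-rich dinucleotides (AA, TT, AT, TA) across the aligned sequences; a roughly 10-bp helical periodicity appears as peaks at multiples of ~10. The choice of dinucleotides, the `max_dist` cutoff, and the function names are assumptions.

```python
from collections import Counter

# Assumption: A/T-rich dinucleotides are used as the periodic signal.
WW = {"AA", "TT", "AT", "TA"}


def ww_positions(seq: str) -> list[int]:
    """Start positions of A/T-rich dinucleotides in a sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in WW]


def distance_histogram(seqs, max_dist: int = 50) -> Counter:
    """Count pairwise distances (1..max_dist) between WW dinucleotides in each sequence."""
    hist = Counter()
    for s in seqs:
        pos = ww_positions(s)
        for i, p in enumerate(pos):
            for q in pos[i + 1:]:
                d = q - p
                if d > max_dist:
                    break  # positions are sorted, so later ones are farther away
                hist[d] += 1
    return hist


# Synthetic sequence with an AA step every 10 bp (illustrative only).
hist = distance_histogram(["AAGCGCGCGC" * 15])
print(hist[10], hist[5])  # 14 0
```

With real data, peaks at 10, 20, 30 bp relative to neighboring distances would indicate the periodicity described above.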
Anaplastic lymphoma kinase (ALK) fusion is the most common mechanism of ALK overexpression and activation in non–small-cell lung carcinoma. Several ALK fusion partners have been reported, including echinoderm microtubule-associated protein-like 4 (EML4), TRK-fused gene (TFG), kinesin family member 5B (KIF5B), kinesin light chain 1 (KLC1), protein tyrosine phosphatase, non-receptor type 3 (PTPN3), and huntingtin interacting protein 1 (HIP1).
A 60-year-old Korean man presented with a lung mass that was diagnosed as a poorly differentiated adenocarcinoma with ALK overexpression. Using an Anchored Multiplex polymerase chain reaction assay followed by sequencing, we found that the tumor harbored a novel translocated promoter region (TPR)-ALK fusion. The fusion transcript was generated from an intact, in-frame fusion of TPR exon 15 and ALK exon 20 (t(1;2)(q31.1;p23)). The TPR-ALK fusion encodes a predicted protein of 1192 amino acids, with a coiled-coil domain encoded by the 5’ end of TPR and the juxtamembrane and kinase domains encoded by the 3’ end of ALK.
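Whether a fusion junction is in frame follows from codon arithmetic: the 3’ partner keeps its own reading frame when the number of 5’-partner coding nucleotides retained is congruent, modulo 3, to the number of 3’-partner coding nucleotides lying upstream of its first retained exon. The sketch below encodes that rule and the resulting predicted protein length. The nucleotide counts in the example are hypothetical, not the actual TPR or ALK coordinates, and the sketch assumes no new stop codon is created at the junction.

```python
def fusion_in_frame(five_coding_nt: int, three_offset_nt: int) -> bool:
    """Return True if a fusion junction keeps the 3' partner in its own frame.

    five_coding_nt: coding nucleotides of the 5' partner (from its start codon)
        retained up to the breakpoint.
    three_offset_nt: coding nucleotides of the 3' partner lying upstream of the
        first retained 3' exon, i.e. where the retained segment starts in its CDS.
    """
    return five_coding_nt % 3 == three_offset_nt % 3


def fusion_protein_length(five_coding_nt: int, three_cds_nt: int, three_offset_nt: int) -> int:
    """Predicted residue count of an in-frame fusion.

    three_cds_nt is the full 3' partner CDS length including its stop codon,
    so one codon is subtracted for the stop.
    """
    if three_cds_nt % 3 != 0:
        raise ValueError("3' CDS length must be a multiple of 3")
    if not fusion_in_frame(five_coding_nt, three_offset_nt):
        raise ValueError("junction is out of frame")
    fused_nt = five_coding_nt + (three_cds_nt - three_offset_nt)
    return fused_nt // 3 - 1


# Hypothetical coordinates, not the real TPR/ALK values.
print(fusion_in_frame(300, 150), fusion_protein_length(300, 600, 150))  # True 249
```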
The novel fusion gene and its protein product, TPR-ALK, which harbors coiled-coil and kinase domains, may have transforming potential and respond to treatment with ALK inhibitors. This case is the first report of a TPR-ALK fusion transcript in clinical tumor samples and may provide a novel candidate diagnostic and therapeutic target for patients with cancer, including non–small-cell lung carcinoma.
We developed a massive-scale RNA sequencing protocol, short quantitative random RNA libraries or SQRL, to survey the complexity, dynamics and sequence content of transcriptomes in a near-complete fashion. This method generates directional, random-primed, linear cDNA libraries that are optimized for next-generation short-tag sequencing. We surveyed the poly(A)(+) transcriptomes of undifferentiated mouse embryonic stem cells (ESCs) and embryoid bodies (EBs) at an unprecedented depth (10 Gb), using the Applied Biosystems SOLiD technology. These libraries capture the genomic landscape of expression, state-specific expression, single-nucleotide polymorphisms (SNPs), the transcriptional activity of repeat elements, and both known and new alternative splicing events. We investigated the impact of transcriptional complexity on current models of key signaling pathways controlling ESC pluripotency and differentiation, highlighting how SQRL can be used to characterize transcriptome content and dynamics in a quantitative and reproducible manner, and suggesting that our understanding of transcriptional complexity is far from complete.
Cancer is driven by mutation. Worldwide, tobacco smoking is the principal lifestyle exposure that causes cancer, exerting carcinogenicity through >60 chemicals that bind and mutate DNA. Using massively parallel sequencing technology, we sequenced a small-cell lung cancer cell line, NCI-H209, to explore the mutational burden associated with tobacco smoking. A total of 22,910 somatic substitutions were identified, including 134 in coding exons. Multiple mutation signatures testify to the cocktail of carcinogens in tobacco smoke and their proclivities for particular bases and surrounding sequence context. Effects of transcription-coupled repair and a second, more general, expression-linked repair pathway were evident. We identified a tandem duplication that duplicates exons 3-8 of CHD7 in frame, and two other cell lines carrying PVT1-CHD7 fusion genes, indicating that CHD7 may be recurrently rearranged in this disease. These findings illustrate the potential for next-generation sequencing to provide unprecedented insights into mutational processes, cellular repair pathways and gene networks associated with cancer.
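Mutation signatures that reflect "proclivities for particular bases and surrounding sequence context" are commonly summarized by classifying each substitution by its immediate 5′ and 3′ neighbors, reported on the pyrimidine strand (6 substitution types × 16 contexts = 96 classes). This is a widely used convention and an assumption here, not necessarily the exact scheme used in this study; for example, a G>T change is recorded as C>A on the opposite strand. The function names and the example calls are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = {"A", "C", "G", "T"}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def substitution_class(ref_context: str, alt: str) -> str:
    """Map a substitution to a pyrimidine-referenced trinucleotide class, e.g. 'T[C>A]G'.

    ref_context: 3-nt reference sequence centred on the mutated base.
    alt: the mutant base.
    """
    ref_context, alt = ref_context.upper(), alt.upper()
    if len(ref_context) != 3 or not set(ref_context) <= BASES or alt not in BASES:
        raise ValueError("expected a 3-nt ACGT context and a single ACGT alt base")
    if alt == ref_context[1]:
        raise ValueError("alt base equals reference base")
    if ref_context[1] in ("A", "G"):  # purine reference: report on the other strand
        ref_context = revcomp(ref_context)
        alt = alt.translate(COMPLEMENT)
    return f"{ref_context[0]}[{ref_context[1]}>{alt}]{ref_context[2]}"


# Hypothetical calls: (reference trinucleotide, alternate base).
calls = [("CGA", "T"), ("TCG", "A"), ("ACG", "T")]
spectrum = Counter(substitution_class(ctx, alt) for ctx, alt in calls)
print(spectrum.most_common())
```

Tallying such classes over thousands of substitutions gives the spectrum in which carcinogen-specific patterns become visible.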
Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow occurring only 7-8 generations ago.
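The 1.75-fold range follows directly from the two extremes quoted above: 1.0 / 0.57 ≈ 1.75. The sketch below shows that arithmetic, plus a per-kilobase heterozygosity rate computed from a hypothetical count of heterozygous sites over a hypothetical callable-genome length; the function names are illustrative.

```python
def het_per_kb(n_het_sites: int, callable_bp: int) -> float:
    """Heterozygous sites per kilobase of callable sequence."""
    if callable_bp <= 0:
        raise ValueError("callable_bp must be positive")
    return 1000 * n_het_sites / callable_bp


def fold_range(values) -> float:
    """Ratio of the largest to the smallest value."""
    return max(values) / min(values)


# Extremes quoted in the abstract (heterozygous sites per kb).
west_african, native_american_segments = 1.0, 0.57
print(f"{fold_range([west_african, native_american_segments]):.2f}-fold")  # 1.75-fold
```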
Abnormalities of genomic methylation patterns are lethal or cause disease, but the cues that normally designate CpG dinucleotides for methylation are poorly understood. We have developed a new method of methylation profiling that has single-CpG resolution and can address the methylation status of repeated sequences. We have used this method to determine the methylation status of >275 million CpG sites in human and mouse DNA from breast and brain tissues. Methylation density at most sequences was found to increase linearly with CpG density and to fall sharply at very high CpG densities, but transposons remained densely methylated even at higher CpG densities. The presence of histone H2A.Z and histone H3 di- or trimethylated at lysine 4 correlated strongly with unmethylated DNA and occurred primarily at promoter regions. We conclude that methylation is the default state of most CpG dinucleotides in the mammalian genome and that a combination of local dinucleotide frequencies, the interaction of repeated sequences, and the presence or absence of histone variants or modifications shields a population of CpG sites (most of which are in and around promoters) from DNA methyltransferases that lack intrinsic sequence specificity.
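Relating methylation density to CpG density, as described above, requires tabulating both quantities over the same genomic intervals. The sketch below assumes per-CpG methylation calls are already available as a set of methylated CpG positions; it reports, for each non-overlapping window, the CpG count and the fraction of those CpGs that are methylated. The window size, input format, and function names are assumptions, not the profiling method described in the abstract.

```python
def cpg_positions(seq: str) -> list[int]:
    """0-based positions of the C in each CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def window_profile(seq: str, methylated: set, window: int = 100):
    """Per non-overlapping window: (start, CpG count, fraction of CpGs methylated).

    methylated: hypothetical set of CpG positions called as methylated.
    A window without CpGs gets None for the methylated fraction.
    """
    sites = cpg_positions(seq)
    profile = []
    for start in range(0, len(seq), window):
        in_win = [p for p in sites if start <= p < start + window]
        frac = sum(p in methylated for p in in_win) / len(in_win) if in_win else None
        profile.append((start, len(in_win), frac))
    return profile


# Toy 20-bp sequence with CpGs at 1, 5, 11, 15; CpGs 1, 11 and 15 are methylated.
print(window_profile("ACGTACGTAA" * 2, {1, 11, 15}, window=10))
# [(0, 2, 0.5), (10, 2, 1.0)]
```

Plotting the second column against the third across many windows would show the linear rise and high-density drop described above.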
Methylation, the addition of methyl groups to cytosine (C), plays an important role in the regulation of gene expression in both normal and dysfunctional cells. During bisulfite conversion and subsequent PCR amplification, unmethylated Cs are converted into thymine (T), while methylated Cs are not converted. Sequencing of this bisulfite-treated DNA permits the detection of methylation at specific sites. With the introduction of next-generation sequencing (NGS) technologies, simultaneous analysis of methylation motifs across multiple regions makes hypothesis-free study of the entire methylome possible. Here we present a whole-methylome sequencing study that compares two different bisulfite conversion methods (in solution versus in gel), utilizing the high throughput of the SOLiD System. Advantages and disadvantages of the two bisulfite conversion methods for constructing sequencing libraries are discussed. Furthermore, the application of SOLiD bisulfite sequencing to larger and more complex genomes is demonstrated with preliminary in silico-generated, bisulfite-converted reads.
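The conversion rule described above, and the idea of generating bisulfite-converted reads in silico, can be illustrated with a minimal base-space sketch. It assumes a single top-strand sequence with a known set of methylated cytosine positions, collapses bisulfite treatment and PCR into one step, and ignores SOLiD color-space encoding, the opposite strand, and incomplete conversion. The function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions=frozenset()) -> str:
    """Convert every C to T except at positions listed as methylated (top strand only)."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def call_methylation(reference: str, converted_read: str) -> dict:
    """For each reference C, report True if the read keeps C (methylated), False if T."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), converted_read.upper())):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = True
            elif read_base == "T":
                calls[i] = False
            # any other base is ignored (sequencing error or SNP)
    return calls


read = bisulfite_convert("ACGTCCG", {1})
print(read)                                # ACGTTTG
print(call_methylation("ACGTCCG", read))   # {1: True, 4: False, 5: False}
```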
Acute leukemia is the most common pediatric malignancy. Some studies suggest early-life exposures to air pollution increase risk of childhood leukemia. Therefore, we explored the association between maternal residential proximity to major roadways and risk of acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). Information on cases with acute leukemia (n = 2030) was obtained for the period 1995-2011 from the Texas Cancer Registry. Birth certificate controls were frequency matched (10:1) on birth year (n = 20,300). Three residential proximity measures were assessed: (1) distance to nearest major roadway, (2) residence within 500 meters of a major roadway, and (3) roadway density. Multivariate logistic regression was used to generate adjusted odds ratios (aOR) and 95% confidence intervals (CI). Mothers who lived within 500 meters of a major roadway were not more likely to have a child who developed ALL (OR = 1.03; 95% CI: 0.91-1.16) or AML (OR = 0.84; 95% CI: 0.64-1.11). Mothers who lived in areas characterized by high roadway density were not more likely to have children who developed ALL (OR = 1.06, 95% CI: 0.93-1.20) or AML (OR = 0.83, 95% CI: 0.61-1.13). Our results do not support the hypothesis that maternal proximity to major roadways is strongly associated with childhood acute leukemia. Future assessments evaluating the role of early-life exposure to environmental factors on acute leukemia risk should explore novel methods for directly measuring exposures during relevant periods of development.
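Frequency matching on birth year at 10:1 means that, for each birth year, ten controls are drawn at random for every case born that year, so the control group's year distribution mirrors the cases'. The sketch below shows that sampling step under stated assumptions: the birth-certificate pool is a hypothetical list of `(record_id, birth_year)` tuples, and the fixed seed is only for reproducibility. It is not the registry's actual selection procedure.

```python
import random
from collections import Counter, defaultdict


def frequency_match(case_years, pool, ratio: int = 10, seed: int = 0):
    """Sample controls so each birth year has `ratio` times as many controls as cases.

    case_years: birth year of each case.
    pool: iterable of (record_id, birth_year) tuples (hypothetical birth records).
    """
    rng = random.Random(seed)
    needed = Counter(case_years)
    by_year = defaultdict(list)
    for rec_id, year in pool:
        by_year[year].append(rec_id)
    controls = []
    for year, n_cases in needed.items():
        k = ratio * n_cases
        if len(by_year[year]) < k:
            raise ValueError(f"not enough birth records for {year}")
        # Sampling without replacement within the year stratum.
        controls.extend((rid, year) for rid in rng.sample(by_year[year], k))
    return controls


cases = [1995, 1995, 1996]
pool = [(f"r{y}_{i}", y) for y in (1995, 1996) for i in range(50)]
controls = frequency_match(cases, pool)
print(Counter(year for _, year in controls))
```

Because matching is on year frequencies rather than individual pairs, the resulting data are usually analyzed with unconditional logistic regression, adjusting for the matching factor.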
Background
Lymphoma is one of the most common pediatric malignancies; however, there are few well‐established risk factors. Therefore, we investigated whether maternal and perinatal characteristics influence the risk of childhood lymphoma.
Procedure
Information on cases (n = 374) diagnosed with lymphoma and born in Texas during 1995–2011 was obtained from the Texas Cancer Registry. Birth certificate controls were randomly selected at a ratio of 10 controls per case for the same period. Unconditional logistic regression was used to generate unadjusted odds ratios (OR), adjusted odds ratios (aOR), and 95% confidence intervals (CI) for the following histologic subtypes: Hodgkin (HL), Burkitt (BL), and non‐BL non‐HLs (non‐BL NHLs).
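For a single binary exposure, the unadjusted odds ratio from logistic regression equals the cross-product ratio of the 2×2 table, and a common 95% CI is Woolf's log-based interval. The sketch below uses hypothetical counts (100 cases and 1000 controls, echoing a 10:1 design); the adjusted estimates reported in the Results require regression with covariates, which this sketch does not attempt. The function name is illustrative.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with Woolf (log-based) confidence interval.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf interval undefined without a continuity correction")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log_or)
    hi = math.exp(math.log(or_) + z * se_log_or)
    return or_, lo, hi


# Hypothetical 2x2 table: 20/100 cases exposed vs 100/1000 controls exposed.
or_, lo, hi = odds_ratio_ci(a=20, b=80, c=100, d=900)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 (as for the BL estimates in the Results) indicates an association at the conventional 5% level; one that spans 1 (as for the roadway-proximity estimates in the leukemia study) does not.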
Results
Overall, our findings indicate that specific maternal and perinatal characteristics influence childhood lymphoma risk. Mexico‐born mothers were more likely to have offspring who developed BL compared with mothers born in the United States (U.S.; aOR: 2.15; 95% CI: 1.06–4.36). Further, mothers who resided in a county on the U.S.‐Mexico border at the time of delivery were more likely to give birth to offspring who developed non‐BL NHL (aOR: 1.72; 95% CI: 1.11–2.67) compared with mothers not living on the U.S.‐Mexico border at the time of delivery. Last, infants born large‐for‐gestational‐age had a twofold increase in BL risk (aOR: 2.00; 95% CI: 1.10–3.65).
Conclusions
In this population‐based assessment, we confirmed previously reported risk predictors of childhood lymphoma, including infant sex, while highlighting novel risk factors that warrant assessment in future studies.