Human genetic variation is expected to play a central role in personalized medicine. Yet only a fraction of the natural genetic variation harbored by humans has been discovered to date. Here, we report almost 2 million small insertions and deletions (INDELs) that range from 1 bp to 10,000 bp in length in the genomes of 79 diverse humans. These variants include 819,363 small INDELs that map to human genes. Small INDELs were frequently found in the coding exons of these genes, and several lines of evidence indicate that such variation is a major determinant of human biological diversity. Microarray-based genotyping experiments revealed several interesting observations regarding the population genetics of small INDEL variation. For example, we found that many of our INDELs were in high linkage disequilibrium (LD) with both HapMap SNPs and high-scoring SNPs from genome-wide association studies. Overall, our study indicates that small INDEL variation is likely to be a key factor underlying inherited traits and diseases in humans.
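LD between an INDEL and a nearby SNP is commonly summarized with the r² statistic. The sketch below is a hypothetical illustration, not the study's pipeline: it computes a genotype-based r² (the squared Pearson correlation of alternate-allele dosages coded 0, 1 or 2) for two sites typed in the same individuals. This approximates haplotype-based r² when phase is unknown; the function name `genotype_r2`, the dosage coding and the example genotypes are assumptions.

```python
import numpy as np


def genotype_r2(indel_dosage, snp_dosage):
    """Genotype-based r^2: squared Pearson correlation of alternate-allele
    dosages (0, 1 or 2) for the same individuals at two biallelic sites."""
    a = np.asarray(indel_dosage, dtype=float)
    b = np.asarray(snp_dosage, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both sites must be genotyped in the same individuals")
    if a.std() == 0.0 or b.std() == 0.0:
        return float("nan")  # a monomorphic site has undefined r^2
    r = np.corrcoef(a, b)[0, 1]
    return float(r * r)


# Hypothetical genotypes for six individuals (not data from the study).
indel = [0, 1, 2, 1, 0, 2]
snp = [0, 1, 2, 2, 0, 2]
print(round(genotype_r2(indel, snp), 3))
```

Tag-SNP studies often treat pairs with r² ≥ 0.8 as strongly linked, but the threshold used in this study is not stated here.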
Although a large proportion (44%) of the human genome is occupied by transposons and transposon-like repetitive elements, only a small proportion (<0.05%) of these elements remain active today. Recent evidence indicates that ∼35–40 subfamilies of Alu, L1 and SVA elements (and possibly HERV-K elements) remain actively mobile in the human genome. These active transposons are of great interest because they continue to produce genetic diversity in human populations and also cause human diseases by integrating into genes. In this review, we examine these active human transposons and explore mechanistic factors that influence their mobilization.
Fluorine-rich granites and rhyolites occur throughout the southern Rocky Mountains, but the origin of F-enrichment has remained unclear. We test whether F-enrichment could be inherited from ancient mafic lower crust by: (1) measuring amphibole compositions, including F and Cl contents, of lower crustal mafic granulite xenoliths from northern Colorado to determine whether they are unusually enriched in halogens; (2) analyzing whole-rock elemental and Sr, Nd, and Pb isotopic compositions of upper crustal Cretaceous to Oligocene igneous rocks in Colorado to evaluate their sources; and (3) comparing batch melting models of mafic lower crustal source rocks with melt F and Cl abundances derived from biotite data from the F-rich silicic Never Summer batholith. This approach allows us to better determine whether the mafic lower crust was pre-enriched in F, whether it is F-rich enough to generate F-rich anatectic melts, and whether geochemical data support an ancient lower crustal origin for the F-rich rocks in the southern Rocky Mountains. Electron microprobe analyses of amphibole in lower crustal mafic granulite xenoliths show that it contains 0.56–1.38 wt% F and 0.45–0.73 wt% Cl. Titanium-in-calcium-amphibole thermometry indicates that the amphiboles equilibrated at high to ultrahigh temperature conditions (805 to 940 °C), and semiquantitative amphibole thermobarometry indicates that they equilibrated at 0.5 to 1.0 GPa prior to entrainment in magmas during the Devonian. Mass balance calculations based on these new measurements indicate that parts of the mafic lower crust in Colorado are at least 3.5 times more enriched in F than average mafic lower crust. Intrusions coeval with the Laramide Orogeny (75 to 38 Ma) predate F-rich magmatism in Colorado and have Sr and Nd isotopic compositions consistent with mafic lower crust ± mantle sources, but many of these intrusions have elevated Sr/Y ratios (>40), suggesting that amphibole was a stable phase during magma generation.
The F-rich igneous rocks from the Never Summer igneous complex and Colorado Mineral Belt also have Sr and Nd isotopic compositions that overlap with those of the lower crustal mafic granulite xenoliths, but they have lower Sr/Y, higher Nb and Y abundances, and distinctly less radiogenic ²⁰⁶Pb/²⁰⁴Pbᵢ compositions than the preceding Laramide magmatism. Batch melt modeling indicates that low-degree partial melts derived from rocks similar to the mafic lower crustal xenoliths we analyzed can yield silicic melts with >2000 ppm F, similar to estimated F concentrations for silicic melts interpreted to be parental to evolved leucogranites. We suggest that F-rich silicic melts in the southern Rocky Mountains were sourced from garnet-free mafic lower crust, and that fluid-absent breakdown of amphibole in ultrahigh temperature metamorphic rocks was a key process in their generation. Based on the composition of high-F amphibole measured in lower crustal xenoliths, the temperature of amphibole breakdown and melt generation for these F-enriched source rocks is likely >100 °C higher than for similar lower crust with low or average F abundances. As such, these source rocks melted only during periods of unusually high heat flow into the lower crust, such as during an influx of mantle-derived magmas related to rifting or the post-Laramide ignimbrite flare-up in the region. These data have direct implications for the genesis of porphyry Mo mineralization, because they indicate that pre-enrichment of F in the deep crust could be a necessary condition for later anatexis and generation of F-rich magmas.
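Batch melt modeling of this kind rests on the standard batch (equilibrium) melting equation, C_L = C_0 / (D + X(1 - D)), where C_0 is the concentration in the source, D the bulk partition coefficient and X the melt fraction. When D < 1, C_L approaches C_0/D as X approaches zero, which is why low-degree melts concentrate incompatible elements. The Python sketch below is a hypothetical illustration, not the study's model: the helper names, the assumption that amphibole hosts all of the source F, and the modal abundance and D values in the usage lines are illustrative placeholders, not values reported here.

```python
def batch_melt_concentration(c0_ppm, bulk_d, melt_fraction):
    """Concentration in the melt (ppm) from the batch melting equation
    C_L = C_0 / (D + X * (1 - D)), with X the melt fraction."""
    if not 0.0 < melt_fraction <= 1.0:
        raise ValueError("melt_fraction must be in (0, 1]")
    if bulk_d < 0.0:
        raise ValueError("bulk_d must be non-negative")
    return c0_ppm / (bulk_d + melt_fraction * (1.0 - bulk_d))


def source_f_ppm(amphibole_mode, amphibole_f_wt_pct):
    """Whole-rock F (ppm), assuming (hypothetically) that amphibole is the
    only F-bearing phase; 1 wt% = 10,000 ppm."""
    return amphibole_mode * amphibole_f_wt_pct * 1e4


if __name__ == "__main__":
    # Hypothetical inputs: 20% modal amphibole carrying 0.56 wt% F (the low
    # end of the measured range) and an assumed bulk D of 0.5 for F.
    c0 = source_f_ppm(0.20, 0.56)
    for x in (0.05, 0.10, 0.20):
        print(f"melt fraction {x:.2f}: {batch_melt_concentration(c0, 0.5, x):.0f} ppm F")
```

With these placeholder inputs, the lowest melt fraction yields the highest F concentration, mirroring the qualitative argument for low-degree melts.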
Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and occasionally can lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencing (WGS) data; however, they have limited ability to detect insertions in complex repetitive genomic regions. Here, we developed a computational tool (PALMER) and used it to identify 203 non-reference L1Hs insertions in the NA12878 benchmark genome. Using PacBio long-read sequencing data, we identified L1Hs insertions that were absent from previous short-read studies (90/203). Approximately 81% (73/90) of these previously undetected insertions reside within endogenous LINE-1 sequences in the reference assembly, and analysis of their unique breakpoint junction sequences revealed that 63% (57/90) could be genotyped in 1000 Genomes Project sequences. Moreover, we observed that amplification biases encountered in single-cell WGS experiments led to wide variation in L1Hs insertion detection rates among four individual NA12878 cells: under-amplification limited detection to 32% (65/203) of insertions, whereas over-amplification increased false positive calls. In sum, these data indicate that L1Hs insertions are often missed by standard short-read sequencing approaches and that long-read sequencing approaches can significantly improve the detection of L1Hs insertions present in individual genomes.
Transposable genetic elements are abundant in the genomes of most organisms, including humans. These endogenous mutagens can alter genes, promote genomic rearrangements, and may help to drive the speciation of organisms. In this study, we identified almost 11,000 transposon copies that are differentially present in the human and chimpanzee genomes. Most of these transposon copies were mobilized after the human and chimpanzee lineages diverged from their common ancestor, ∼6 million years ago.
Alu, L1, and SVA insertions accounted for >95% of the insertions in both species. Our data indicate that humans have supported higher levels of transposition than have chimpanzees during the past several million years and have amplified different transposon subfamilies. In both species, ∼34% of the insertions were located within known genes. These insertions represent a form of species-specific genetic variation that may have contributed to the differential evolution of humans and chimpanzees. In addition to providing an initial overview of recently mobilized elements, our collections will be useful for assessing the impact of these insertions on their hosts and for studying the transposition mechanisms of these elements.
The genetic basis for combined pituitary hormone deficiency (CPHD) is complex, involving 30 genes in a variety of syndromic and nonsyndromic presentations. Molecular diagnosis of this disorder is valuable for predicting disease progression, avoiding unnecessary surgery, and family planning. We expect that the application of high throughput sequencing will uncover additional contributing genes and eventually become a valuable tool for molecular diagnosis. For example, in the last 3 years, six new genes have been implicated in CPHD using whole-exome sequencing. In this review, we present a historical perspective on gene discovery for CPHD and predict approaches that may facilitate future gene identification projects conducted by clinicians and basic scientists. Guidelines for systematic reporting of genetic variants and assigning causality are emerging. We apply these guidelines retrospectively to reports of the genetic basis of CPHD and summarize modes of inheritance and penetrance for each of the known genes. In recent years, there have been great improvements in databases of genetic information for diverse populations. Several issues still make molecular diagnosis challenging in some cases. These include the inherent genetic complexity of this disorder, technical challenges such as uneven coverage, differing results from variant calling and interpretation pipelines, the number of tolerated genetic alterations, and imperfect methods for predicting pathogenicity. We discuss approaches for future research in the genetics of CPHD.
Active protein translation can be assessed and measured using ribosome profiling sequencing strategies. Prevailing analytical approaches applied to this technology use sequence fragment length profiling or reading frame occupancy enrichment to differentiate between active translation and background noise; however, they do not consider additional characteristics inherent to the technology, which limits their overall accuracy.
Here, we present an analytical tool that models the overall tri-nucleotide periodicity of ribosomal occupancy using a classifier based on spectral coherence. Our software, SPECtre, compares normalized ribosome profiling read coverage over a rolling series of windows along a transcript with an idealized reference signal, without requiring matched mRNA-Seq data.
A comparison of SPECtre against previously published methods on existing data shows a marked improvement in accuracy for detecting active translation, with high overall accuracy at a low false discovery rate. In addition, SPECtre performs comparably to a recently published method that is also based on spectral coherence, but with reduced runtime and memory requirements. SPECtre is available as an open source software package at https://github.com/mills-lab/spectre.
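To make the idea of spectral coherence concrete, the sketch below scores how strongly a coverage track follows a three-nucleotide period. It is a simplified, hypothetical illustration, not SPECtre's implementation: it splits a window into non-overlapping segments, takes the discrete Fourier transform of each, and computes the magnitude-squared coherence |S_xy|² / (S_xx S_yy) at the period-3 frequency against an idealized 1-0-0 reference signal. The function names, the 30-nt segment length, the 90-nt window and the 3-nt step are assumptions. Because coherence is unchanged when coverage is rescaled, the sketch applies no depth normalization.

```python
import numpy as np


def frame_coherence(coverage, segment_len=30):
    """Magnitude-squared coherence, at the period-3 frequency, between read
    coverage and an idealized 1-0-0 codon signal (hypothetical sketch)."""
    if segment_len % 3 != 0:
        raise ValueError("segment_len must be a multiple of 3")
    x = np.asarray(coverage, dtype=float)
    n_seg = len(x) // segment_len
    if n_seg == 0:
        raise ValueError("coverage is shorter than one segment")
    # Non-overlapping segments (rectangular window) for Welch-style averaging.
    segments = x[: n_seg * segment_len].reshape(n_seg, segment_len)
    # Idealized reference: all signal on the first nucleotide of each codon.
    reference = np.tile([1.0, 0.0, 0.0], segment_len // 3)
    k = segment_len // 3  # DFT bin whose period is exactly 3 nt
    X = np.fft.rfft(segments, axis=1)[:, k]
    Y = np.fft.rfft(reference)[k]
    s_xy = np.mean(X * np.conj(Y))   # averaged cross-spectrum
    s_xx = np.mean(np.abs(X) ** 2)   # averaged auto-spectrum of coverage
    s_yy = np.abs(Y) ** 2            # reference is identical in every segment
    if s_xx == 0 or s_yy == 0:
        return 0.0
    return float(np.abs(s_xy) ** 2 / (s_xx * s_yy))


def rolling_scores(coverage, window=90, step=3, segment_len=30):
    """Score each window along a transcript; a step that is a multiple of 3
    keeps every window aligned to the same reading frame."""
    return [
        frame_coherence(coverage[i:i + window], segment_len)
        for i in range(0, len(coverage) - window + 1, step)
    ]
```

Perfectly in-frame coverage such as `[5, 1, 1] * 30` scores near 1, while coverage whose dominant frame shifts from segment to segment, or that shows no periodicity, scores near 0.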
Mucoepidermoid carcinomas (MECs) are the most common malignancies of the salivary glands. Approximately 50% of all MEC cases are known to harbor gene fusions, but additional molecular drivers remain largely uncharacterized. Here, we sought to resolve the controversy surrounding the role of human papillomavirus (HPV) as a potential driver of mucoepidermoid carcinoma. Bioinformatics analysis was performed on 48 MEC transcriptomes, and targeted capture DNA sequencing was subsequently used to annotate HPV content and integration status in the host genome. HPV of any type was identified in only 1/48 (2%) of the MEC transcriptomes analyzed. Importantly, the one HPV16+ tumor expressed high levels of p16 and of the HPV16 oncogenes E6 and E7, and displayed a complex integration pattern that included breakpoints in 13 host genes as well as 9 non-genic regions. In this cohort, HPV is a rare driver of MEC but may have a substantial etiologic role in cases that harbor the virus. The genetic mechanisms of host genome integration are similar to those observed in other head and neck cancers.
Copy number variants (CNVs) represent a substantial source of genomic variation in vertebrates and have been associated with numerous human diseases. Despite this, the extent of CNVs in the zebrafish, an important model for human disease, remains unknown. Using 80 zebrafish genomes, representing three commonly used laboratory strains and one native population, we constructed a genome-wide, high-resolution CNV map for the zebrafish comprising 6,080 CNV elements and encompassing 14.6% of the zebrafish reference genome. This amount of copy number variation is four times that previously observed in other vertebrates, including humans. Moreover, 69% of the CNV elements exhibited strain specificity, with the highest number observed for Tubingen. This variation likely arose, in part, from Tubingen's large founding size and composite population origin. Additional population genetic studies also provided important insight into the origins and substructure of these commonly used laboratory strains. This extensive variation among and within zebrafish strains may have functional effects that impact phenotype. If not properly addressed, such extensive germ-line variation and population substructure in this commonly used model organism could confound studies intended for translation to human diseases.
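Reporting the fraction of a reference genome encompassed by a set of CNV elements requires counting overlapping calls only once. The sketch below is a hypothetical illustration rather than the study's pipeline: it merges half-open [start, end) intervals per chromosome and divides the merged length by the genome size. The function names, the dictionary-of-intervals input layout and the example coordinates are assumptions.

```python
def merged_coverage_bp(intervals):
    """Total bases covered by half-open [start, end) intervals on one
    chromosome, counting overlapping or adjacent intervals only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if end < start:
            raise ValueError("interval end precedes start")
        if cur_end is None or start > cur_end:
            # Gap before this interval: close out the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def genome_fraction(cnv_by_chrom, genome_size_bp):
    """Fraction of the genome covered by CNV intervals, keyed by chromosome."""
    covered = sum(merged_coverage_bp(iv) for iv in cnv_by_chrom.values())
    return covered / genome_size_bp


# Hypothetical CNV calls on a toy 100-bp "genome" (not data from the study).
print(genome_fraction({"chr1": [(0, 10), (5, 15)], "chr2": [(0, 5)]}, 100))
```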
Copy number variants (CNVs) account for the majority of human genomic diversity in terms of base coverage. Here, we have developed and applied a new method to combine high-resolution array comparative genomic hybridization (CGH) data with whole-genome DNA sequencing data to obtain a comprehensive catalog of common CNVs in Asian individuals. The genomes of 30 individuals from three Asian populations (Korean, Chinese and Japanese) were interrogated with an ultra-high-resolution array CGH platform containing 24 million probes. Whole-genome sequencing data from a reference genome (NA10851, with 28.3× coverage) and two Asian genomes (AK1, with 27.8× coverage and AK2, with 32.0× coverage) were used to transform the relative copy number information obtained from array CGH experiments into absolute copy number values. We discovered 5,177 CNVs, of which 3,547 were putative Asian-specific CNVs. These common CNVs in Asian populations will be a useful resource for subsequent genetic studies in these populations, and the new method of calling absolute CNVs will be essential for applying CNV data to personalized medicine.
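The conversion from relative to absolute copy number can be illustrated with a simplified model, which is an assumption and not the authors' published method. Array CGH reports log2(test/reference) ratios, so if the reference genome's absolute copy number in a region is known (here estimated from sequencing depth relative to a diploid genome-wide mean), the test sample's absolute copy number is the reference copy number multiplied by 2 raised to the log2 ratio. The function names and the diploid-baseline assumption are hypothetical.

```python
def read_depth_copy_number(region_depth, genome_mean_depth, baseline_ploidy=2):
    """Absolute copy number of the reference sample in a region, assuming the
    genome-wide mean sequencing depth corresponds to the baseline ploidy."""
    if genome_mean_depth <= 0:
        raise ValueError("genome_mean_depth must be positive")
    return baseline_ploidy * region_depth / genome_mean_depth


def absolute_copy_number(log2_ratio, reference_cn):
    """Convert an array CGH log2(test/reference) ratio into an absolute copy
    number for the test sample, given the reference's absolute copy number."""
    return reference_cn * 2.0 ** log2_ratio


def call_integer_copy_number(log2_ratio, reference_cn):
    """Round the continuous estimate to a non-negative integer copy number."""
    return max(0, round(absolute_copy_number(log2_ratio, reference_cn)))


# Hypothetical region: the reference shows 42x depth against a 28x genome mean,
# and the array reports no relative difference (log2 ratio of 0).
ref_cn = read_depth_copy_number(42.0, 28.0)
print(call_integer_copy_number(0.0, ref_cn))
```

In this toy case, a log2 ratio of 0 at a locus where the reference carries three copies implies three copies in the test sample as well, a gain relative to the diploid baseline that relative values alone would report as no change.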