The rDNA clusters and flanking sequences on human chromosomes 13, 14, 15, 21 and 22 represent large gaps in the current genomic assembly. The organization and the degree of divergence of the human rDNA units within an individual nucleolar organizer region (NOR) are only partially known. To address this lacuna, we previously applied transformation-associated recombination (TAR) cloning to isolate individual rDNA units from chromosome 21. That approach revealed an unexpectedly high level of heterogeneity in human rDNA, raising the possibility of corresponding variations in ribosome dynamics. We have now applied the same strategy to analyze an entire rDNA array end-to-end from a copy of chromosome 22. Sequencing of TAR isolates provided the entire NOR sequence, including proximal and distal junctions that may be involved in nucleolar function. Comparison of the newly sequenced rDNAs to reference sequence for chromosomes 22 and 21 revealed variants that are shared in human rDNA in individuals from different ethnic groups, many of them at high frequency. Analysis infers comparable intra- and inter-individual divergence of rDNA units on the same and different chromosomes, supporting the concerted evolution of rDNA units. The results provide a route to investigate further the role of rDNA variation in nucleolar formation and in the empirical associations of nucleoli with pathology.
DNA sequencing identifies common and rare genetic variants for association studies, but studies typically focus on variants in nuclear DNA and ignore the mitochondrial genome. In fact, analyzing variants in mitochondrial DNA (mtDNA) sequences presents special problems, which we resolve here with a general solution for the analysis of mtDNA in next-generation sequencing studies. The new program package comprises 1) an algorithm designed to identify mtDNA variants (i.e., homoplasmies and heteroplasmies), incorporating sequencing error rates at each base in a likelihood calculation and allowing allele fractions at a variant site to differ across individuals; and 2) an estimation of mtDNA copy number in a cell directly from whole-genome sequencing data. We also apply the methods to DNA sequence from lymphocytes of ~2,000 SardiNIA Project participants. As expected, mothers and offspring share all homoplasmies but a lesser proportion of heteroplasmies. Both homoplasmies and heteroplasmies show 5-fold higher transition/transversion ratios than variants in nuclear DNA. Also, heteroplasmy increases with age, though on average only ~1 heteroplasmy reaches the 4% level between ages 20 and 90. In addition, we find that mtDNA copy number averages ~110 copies/lymphocyte and is ~54% heritable, implying substantial genetic regulation of the level of mtDNA. Copy numbers also decrease modestly but significantly with age, and females on average have significantly more copies than males. The mtDNA copy numbers are significantly associated with waist circumference (p-value = 0.0031) and waist-hip ratio (p-value = 2.4 × 10⁻⁵), but not with body mass index, indicating an association with central fat distribution. To our knowledge, this is the largest population analysis to date of mtDNA dynamics, revealing the age-imposed increase in heteroplasmy, the relatively high heritability of copy number, and the association of copy number with metabolic traits.
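The two components described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' program package, and every function name in it is hypothetical. It rests on two stated assumptions: that copy number per cell can be approximated by scaling the ratio of mean mtDNA depth to mean autosomal depth by the nuclear ploidy, and that a sequencing error simply swaps the reference and alternate allele (real errors can produce any of the three other bases). Under those assumptions, the allele fraction at one site in one individual is the value that maximizes a likelihood built from per-base error rates.

```python
import math


def mtdna_copy_number(mt_mean_depth, autosomal_mean_depth, nuclear_ploidy=2):
    """Approximate mtDNA copies per cell from mean sequencing depths.

    Assumption: the autosomes are present in `nuclear_ploidy` copies per cell,
    so depth ratios translate directly into copy-number ratios.
    """
    if mt_mean_depth < 0 or autosomal_mean_depth <= 0:
        raise ValueError("depths must be non-negative, autosomal depth positive")
    return nuclear_ploidy * mt_mean_depth / autosomal_mean_depth


def site_log_likelihood(alt_fraction, is_alt, error_rates):
    """Log-likelihood of the bases observed at one site in one individual.

    is_alt[i] is True when read i shows the alternate allele; error_rates[i]
    is that base's error probability (for example, derived from its quality).
    Simplifying assumption: an error flips reference <-> alternate.
    """
    log_lik = 0.0
    for alt, err in zip(is_alt, error_rates):
        if not 0.0 < err < 0.5:
            raise ValueError("error rates must lie strictly between 0 and 0.5")
        # Probability that this read reports the alternate allele.
        p_alt = alt_fraction * (1.0 - err) + (1.0 - alt_fraction) * err
        log_lik += math.log(p_alt if alt else 1.0 - p_alt)
    return log_lik


def estimate_alt_fraction(is_alt, error_rates, grid_points=1000):
    """Grid-search maximum-likelihood estimate of the alternate allele fraction."""
    candidates = [i / grid_points for i in range(grid_points + 1)]
    return max(candidates, key=lambda f: site_log_likelihood(f, is_alt, error_rates))
```

For instance, `mtdna_copy_number(1650.0, 30.0)` returns 110.0, matching the scale of the average reported above. Because the allele fraction is estimated separately for each individual, a site can be called homoplasmic in one person and heteroplasmic in another.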
Despite the key role of the human ribosome in protein biosynthesis, little is known about the extent of sequence variation in ribosomal DNA (rDNA) or its pre-rRNA and rRNA products. We recovered ribosomal DNA segments from a single human chromosome 21 using transformation-associated recombination (TAR) cloning in yeast. Accurate long-read sequencing of 13 isolates covering ∼0.82 Mb of the chromosome 21 rDNA complement revealed substantial variation among tandem repeat rDNA copies, several palindromic structures and potential errors in the previous reference sequence. These clones revealed 101 variant positions in the 45S transcription unit and 235 in the intergenic spacer sequence. Approximately 60% of the 45S variants were confirmed in independent whole-genome or RNA-seq data, with 47 of these further observed in mature 18S/28S rRNA sequences. TAR cloning and long-read sequencing enabled the accurate reconstruction of multiple rDNA units and a new, high-quality 44,838 bp rDNA reference sequence, which we have annotated with variants detected from chromosome 21 of a single individual. The large number of variants observed reveals heterogeneity in human rDNA, opening up the possibility of corresponding variations in ribosome dynamics.
The obesity epidemic is responsible for a substantial economic burden in developed countries and is a major risk factor for type 2 diabetes and cardiovascular disease. The disease is the result not only of several environmental risk factors, but also of genetic predisposition. To take advantage of recent advances in gene-mapping technology, we executed a genome-wide association scan to identify genetic variants associated with obesity-related quantitative traits in the genetically isolated population of Sardinia. Initial analysis suggested that several SNPs in the FTO and PFKP genes were associated with increased BMI, hip circumference, and weight. Within the FTO gene, rs9930506 showed the strongest association with BMI (p = 8.6 × 10⁻⁷), hip circumference (p = 3.4 × 10⁻⁸), and weight (p = 9.1 × 10⁻⁷). In Sardinia, homozygotes for the rare "G" allele of this SNP (minor allele frequency = 0.46) were 1.3 BMI units heavier than homozygotes for the common "A" allele. Within the PFKP gene, rs6602024 showed very strong association with BMI (p = 4.9 × 10⁻⁶). Homozygotes for the rare "A" allele of this SNP (minor allele frequency = 0.12) were 1.8 BMI units heavier than homozygotes for the common "G" allele. To replicate our findings, we genotyped these two SNPs in the GenNet study. In European Americans (N = 1,496) and in Hispanic Americans (N = 839), we replicated significant association between rs9930506 in the FTO gene and BMI (p-value for meta-analysis of European American and Hispanic American follow-up samples, p = 0.001), weight (p = 0.001), and hip circumference (p = 0.0005). We did not replicate association between rs6602024 and obesity-related traits in the GenNet sample, although we found that in European Americans, Hispanic Americans, and African Americans, homozygotes for the rare "A" allele were, on average, 1.0-3.0 BMI units heavier than homozygotes for the more common "G" allele.
In summary, we have completed a whole genome-association scan for three obesity-related quantitative traits and report that common genetic variants in the FTO gene are associated with substantial changes in BMI, hip circumference, and body weight. These changes could have a significant impact on the risk of obesity-related morbidity in the general population.
β-Thalassemia and sickle cell disease both display a great deal of phenotypic heterogeneity, despite being generally thought of as simple Mendelian diseases. The reasons for this are not well understood, although the level of fetal hemoglobin (HbF) is one well characterized ameliorating factor in both of these conditions. To better understand the genetic basis of this heterogeneity, we carried out genome-wide scans with 362,129 common SNPs on 4,305 Sardinians to look for genetic linkage and association with HbF levels, as well as other red blood cell-related traits. Among major variants affecting HbF levels, SNP rs11886868 in the BCL11A gene was strongly associated with this trait (P < 10⁻³⁵). The C allele frequency was significantly higher in Sardinian individuals with elevated HbF levels, detected by screening for β-thalassemia, and patients with attenuated forms of β-thalassemia vs. those with thalassemia major. We also show that the same BCL11A variant is strongly associated with HbF levels in a large cohort of sickle cell patients. These results indicate that BCL11A variants, by modulating HbF levels, act as an important ameliorating factor of the β-thalassemia phenotype, and it is likely they could help ameliorate other hemoglobin disorders. We expect our findings will help to characterize the molecular mechanisms of fetal globin regulation and could eventually contribute to the development of new therapeutic approaches for β-thalassemia and sickle cell anemia.
We report sequencing-based whole-genome association analyses to evaluate the impact of rare and founder variants on stature in 6,307 individuals on the island of Sardinia. We identify two variants with large effects. One variant, which introduces a stop codon in the GHR gene, is relatively frequent in Sardinia (0.87% versus <0.01% elsewhere) and in the homozygous state causes Laron syndrome involving short stature. We find that this variant reduces height in heterozygotes by an average of 4.2 cm (-0.64 s.d.). The other variant, in the imprinted KCNQ1 gene (minor allele frequency (MAF) = 7.7% in Sardinia versus <1% elsewhere) reduces height by an average of 1.83 cm (-0.31 s.d.) when maternally inherited. Additionally, polygenic scores indicate that known height-decreasing alleles are at systematically higher frequencies in Sardinians than would be expected by genetic drift. The findings are consistent with selection for shorter stature in Sardinia and a suggestive human example of the proposed 'island effect' reducing the size of large mammals.
High serum uric acid levels promote gout, a pro-inflammatory crystal arthropathy, and place individuals at high risk for cardiovascular morbidity and mortality. Genome-wide scans in the genetically isolated Sardinian population identified variants associated with serum uric acid levels as a quantitative trait. They mapped within GLUT9, a Chromosome 4 glucose transporter gene predominantly expressed in liver and kidney. SNP rs6855911 showed the strongest association (p = 1.84 × 10⁻¹⁶), along with eight others (p = 7.75 × 10⁻¹⁶ to 6.05 × 10⁻¹¹). Individuals homozygous for the rare allele of rs6855911 (minor allele frequency = 0.26) had 0.6 mg/dl less uric acid than those homozygous for the common allele; the results were replicated in an unrelated cohort from Tuscany. Our results suggest that polymorphisms in GLUT9 could affect glucose metabolism and uric acid synthesis and/or renal reabsorption, influencing serum uric acid levels over a wide range of values.
Complex trait genome-wide association studies (GWAS) provide an efficient strategy for evaluating large numbers of common variants in large numbers of individuals and for identifying trait-associated variants. Nevertheless, GWAS often leave much of the trait heritability unexplained. We hypothesized that some of this unexplained heritability might be due to common and rare variants that reside in GWAS identified loci but lack appropriate proxies in modern genotyping arrays. To assess this hypothesis, we re-examined 7 genes (APOE, APOC1, APOC2, SORT1, LDLR, APOB, and PCSK9) in 5 loci associated with low-density lipoprotein cholesterol (LDL-C) in multiple GWAS. For each gene, we first catalogued genetic variation by re-sequencing 256 Sardinian individuals with extreme LDL-C values. Next, we genotyped variants identified by us and by the 1000 Genomes Project (totaling 3,277 SNPs) in 5,524 volunteers. We found that in one locus (PCSK9) the GWAS signal could be explained by a previously described low-frequency variant and that in three loci (PCSK9, APOE, and LDLR) there were additional variants independently associated with LDL-C, including a novel and rare LDLR variant that seems specific to Sardinians. Overall, this more detailed assessment of SNP variation in these loci increased estimates of the heritability of LDL-C accounted for by these genes from 3.1% to 6.5%. All association signals and the heritability estimates were successfully confirmed in a sample of ∼10,000 Finnish and Norwegian individuals. Our results thus suggest that focusing on variants accessible via GWAS can lead to clear underestimates of the trait heritability explained by a set of loci. Further, our results suggest that, as prelude to large-scale sequencing efforts, targeted re-sequencing efforts paired with large-scale genotyping will increase estimates of complex trait heritability explained by known loci.
To examine transcription factor (TF) network(s), we created mouse ESC lines, in each of which 1 of 50 TFs tagged with a FLAG moiety is inserted into a ubiquitously controllable tetracycline-repressible locus. Of the 50 TFs, Cdx2 provoked the most extensive transcriptome perturbation in ESCs, followed by Esx1, Sox9, Tcf3, Klf4, and Gata3. ChIP-Seq revealed that CDX2 binds to promoters of upregulated target genes. By contrast, genes downregulated by CDX2 did not show CDX2 binding but were enriched with binding sites for POU5F1, SOX2, and NANOG. Genes with binding sites for these core TFs were also downregulated by the induction of at least 15 other TFs, suggesting a common initial step for ESC differentiation mediated by interference with the binding of core TFs to their target genes. These ESC lines provide a fundamental resource to study biological networks in ESCs and mice.
In type I blepharophimosis/ptosis/epicanthus inversus syndrome (BPES), eyelid abnormalities are associated with ovarian failure. Type II BPES shows only the eyelid defects, but both types map to chromosome 3q23. We have positionally cloned a novel, putative winged helix/forkhead transcription factor gene, FOXL2, that is mutated to produce truncated proteins in type I families and larger proteins in type II. Consistent with an involvement in those tissues, FOXL2 is selectively expressed in the mesenchyme of developing mouse eyelids and in adult ovarian follicles; in adult humans, it appears predominantly in the ovary. FOXL2 represents a candidate gene for the polled/intersex syndrome XX sex-reversal goat.