There is growing recognition that epivariations, most often recognized as promoter hypermethylation events that lead to gene silencing, are associated with a number of human diseases. However, little ...information exists on the prevalence and distribution of rare epigenetic variation in the human population. In order to address this, we performed a survey of methylation profiles from 23,116 individuals using the Illumina 450k array. Using a robust outlier approach, we identified 4,452 unique autosomal epivariations, including potentially inactivating promoter methylation events at 384 genes linked to human disease. For example, we observed promoter hypermethylation of BRCA1 and LDLR at population frequencies of ∼1 in 3,000 and ∼1 in 6,000, respectively, suggesting that epivariations may underlie a fraction of human disease which would be missed by purely sequence-based approaches. Using expression data, we confirmed that many epivariations are associated with outlier gene expression. Analysis of variation data and monozygous twin pairs suggests that approximately two-thirds of epivariations segregate in the population secondary to underlying sequence mutations, while one-third are likely sporadic events that occur post-zygotically. We identified 25 loci where rare hypermethylation coincided with the presence of an unstable CGG tandem repeat, validated the presence of CGG expansions at several loci, and identified the putative molecular defect underlying most of the known folate-sensitive fragile sites in the genome. Our study provides a catalog of rare epigenetic changes in the human genome, gives insight into the underlying origins and consequences of epivariations, and identifies many hypermethylated CGG repeat expansions.
Certain human traits such as neurodevelopmental disorders (NDs) and congenital anomalies (CAs) are believed to be primarily genetic in origin. However, even after whole-genome sequencing (WGS), a ...substantial fraction of such disorders remain unexplained. We hypothesize that some cases of ND-CA are caused by aberrant DNA methylation leading to dysregulated genome function. Comparing DNA methylation profiles from 489 individuals with ND-CAs against 1534 controls, we identify epivariations as a frequent occurrence in the human genome. De novo epivariations are significantly enriched in cases, while RNAseq analysis shows that epivariations often have an impact on gene expression comparable to loss-of-function mutations. Additionally, we detect and replicate an enrichment of rare sequence mutations overlapping CTCF binding sites close to epivariations, providing a rationale for interpreting non-coding variation. We propose that epivariations contribute to the pathogenesis of some patients with unexplained ND-CAs, and as such likely have diagnostic relevance.
The recent identification of progenitor populations that contribute to the developing heart in a distinct spatial and temporal manner has fundamentally improved our understanding of cardiac ...development. However, the mechanisms that direct atrial versus ventricular specification remain largely unknown. Here we report the identification of a progenitor population that gives rise primarily to cardiovascular cells of the ventricles and only to few atrial cells (<5%) of the differentiated heart. These progenitors are specified during gastrulation, when they transiently express Foxa2, a gene not previously implicated in cardiac development. Importantly, Foxa2+ cells contribute to previously identified progenitor populations in a defined pattern and ratio. Lastly, we describe an analogous Foxa2+ population during differentiation of embryonic stem cells. Together, these findings provide insight into the developmental origin of ventricular and atrial cells, and may lead to the establishment of new strategies for generating chamber-specific cell types from pluripotent stem cells.
Although DNA methylation is the best characterized epigenetic mark, the mechanism by which it is targeted to specific regions in the genome remains unclear. Recent studies have revealed that local ...DNA methylation profiles might be dictated by cis-regulatory DNA sequences that mainly operate via DNA-binding factors. Consistent with this finding, we have recently shown that disruption of CTCF-binding sites by rare single nucleotide variants (SNVs) can underlie cis-linked DNA methylation changes in patients with congenital anomalies. These data raise the hypothesis that rare genetic variation at transcription factor binding sites (TFBSs) might contribute to local DNA methylation patterning. In this work, by combining blood genome-wide DNA methylation profiles, whole genome sequencing-derived SNVs from 247 unrelated individuals along with 133 predicted TFBS motifs derived from ENCODE ChIP-Seq data, we observed an association between the disruption of binding sites for multiple TFs by rare SNVs and extreme DNA methylation values at both local and, to a lesser extent, distant CpGs. While the majority of these changes affected only single CpGs, 24% were associated with multiple outlier CpGs within ±1kb of the disrupted TFBS. Interestingly, disruption of functionally constrained sites within TF motifs lead to larger DNA methylation changes at nearby CpG sites. Altogether, these findings suggest that rare SNVs at TFBS negatively influence TF-DNA binding, which can lead to an altered local DNA methylation profile. Furthermore, subsequent integration of DNA methylation and RNA-Seq profiles from cardiac tissues enabled us to observe an association between rare SNV-directed DNA methylation and outlier expression of nearby genes. In conclusion, our findings not only provide insights into the effect of rare genetic variation at TFBS on shaping local DNA methylation and its consequences on genome regulation, but also provide a rationale to incorporate DNA methylation data to interpret the functional role of rare variants.
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
Expansions of short tandem repeats are the cause of many neurogenetic disorders including familial amyotrophic lateral sclerosis, Huntington disease, and many others. Multiple methods have been ...recently developed that can identify repeat expansions in whole genome or exome sequencing data. Despite the widely recognized need for visual assessment of variant calls in clinical settings, current computational tools lack the ability to produce such visualizations for repeat expansions. Expanded repeats are difficult to visualize because they correspond to large insertions relative to the reference genome and involve many misaligning and ambiguously aligning reads.
We implemented REViewer, a computational method for visualization of sequencing data in genomic regions containing long repeat expansions and FlipBook, a companion image viewer designed for manual curation of large collections of REViewer images. To generate a read pileup, REViewer reconstructs local haplotype sequences and distributes reads to these haplotypes in a way that is most consistent with the fragment lengths and evenness of read coverage. To create appropriate training materials for onboarding new users, we performed a concordance study involving 12 scientists involved in short tandem repeat research. We used the results of this study to create a user guide that describes the basic principles of using REViewer as well as a guide to the typical features of read pileups that correspond to low confidence repeat genotype calls. Additionally, we demonstrated that REViewer can be used to annotate clinically relevant repeat interruptions by comparing visual assessment results of 44 FMR1 repeat alleles with the results of triplet repeat primed PCR. For 38 of these alleles, the results of visual assessment were consistent with triplet repeat primed PCR.
Read pileup plots generated by REViewer offer an intuitive way to visualize sequencing data in regions containing long repeat expansions. Laboratories can use REViewer and FlipBook to assess the quality of repeat genotype calls as well as to visually detect interruptions or other imperfections in the repeat sequence and the surrounding flanking regions. REViewer and FlipBook are available under open-source licenses at https://github.com/illumina/REViewer and https://github.com/broadinstitute/flipbook respectively.
Identification of imprinted genes, demonstrating a consistent preference towards the paternal or maternal allelic expression, is important for the understanding of gene expression regulation during ...embryonic development and of the molecular basis of developmental disorders with a parent-of-origin effect. Combining allelic analysis of RNA-Seq data with phased genotypes in family trios provides a powerful method to detect parent-of-origin biases in gene expression.
We report findings in 296 family trios from two large studies: 165 lymphoblastoid cell lines from the 1000 Genomes Project and 131 blood samples from the Genome of the Netherlands (GoNL) participants. Based on parental haplotypes, we identified > 2.8 million transcribed heterozygous SNVs phased for parental origin and developed a robust statistical framework for measuring allelic expression. We identified a total of 45 imprinted genes and one imprinted unannotated transcript, including multiple imprinted transcripts showing incomplete parental expression bias that was located adjacent to strongly imprinted genes. For example, PXDC1, a gene which lies adjacent to the paternally expressed gene FAM50B, shows a 2:1 paternal expression bias. Other imprinted genes had promoter regions that coincide with sites of parentally biased DNA methylation identified in the blood from uniparental disomy (UPD) samples, thus providing independent validation of our results. Using the stranded nature of the RNA-Seq data in lymphoblastoid cell lines, we identified multiple loci with overlapping sense/antisense transcripts, of which one is expressed paternally and the other maternally. Using a sliding window approach, we searched for imprinted expression across the entire genome, identifying a novel imprinted putative lncRNA in 13q21.2. Overall, we identified 7 transcripts showing parental bias in gene expression which were not reported in 4 other recent RNA-Seq studies of imprinting.
Our methods and data provide a robust and high-resolution map of imprinted gene expression in the human genome.
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
Short tandem repeats (STRs) contribute significantly to genetic diversity in humans, including disease-causing variation. Although the effect of STR variation on gene expression has been extensively ...assessed, their impact on epigenetics has been poorly studied and limited to specific genomic regions. Here, we investigated the hypothesis that some STRs act as independent regulators of local DNA methylation in the human genome and modify risk of common human traits. To address these questions, we first analyzed two independent data sets comprising PCR-free whole-genome sequencing (WGS) and genome-wide DNA methylation levels derived from whole-blood samples in 245 (discovery cohort) and 484 individuals (replication cohort). Using genotypes for 131,635 polymorphic STRs derived from WGS using HipSTR, we identified 11,870 STRs that associated with DNA methylation levels (mSTRs) of 11,774 CpGs (Bonferroni
< 0.001) in our discovery cohort, with 90% successfully replicating in our second cohort. Subsequently, through fine-mapping using CAVIAR we defined 585 of these mSTRs as the likely causal variants underlying the observed associations (fm-mSTRs) and linked a fraction of these to previously reported genome-wide association study signals, providing insights into the mechanisms underlying complex human traits. Furthermore, by integrating gene expression data, we observed that 12.5% of the tested fm-mSTRs also modulate expression levels of nearby genes, reinforcing their regulatory potential. Overall, our findings expand the catalog of functional sequence variants that affect genome regulation, highlighting the importance of incorporating STRs in future genetic association analysis and epigenetics data for the interpretation of trait-associated variants.
The human genome contains tens of thousands of large tandem repeats and hundreds of genes that show common and highly variable copy-number changes. Due to their large size and repetitive nature, ...these variable number tandem repeats (VNTRs) and multicopy genes are generally recalcitrant to standard genotyping approaches and, as a result, this class of variation is poorly characterized. However, several recent studies have demonstrated that copy-number variation of VNTRs can modify local gene expression, epigenetics, and human traits, indicating that many have a functional role. Here, using read depth from whole-genome sequencing to profile copy number, we report results of a phenome-wide association study (PheWAS) of VNTRs and multicopy genes in a discovery cohort of ∼35,000 samples, identifying 32 traits associated with copy number of 38 VNTRs and multicopy genes at 1% FDR. We replicated many of these signals in an independent cohort and observed that VNTRs showing trait associations were significantly enriched for expression QTLs with nearby genes, providing strong support for our results. Fine-mapping studies indicated that in the majority (∼90%) of cases, the VNTRs and multicopy genes we identified represent the causal variants underlying the observed associations. Furthermore, several lie in regions where prior SNV-based GWASs have failed to identify any significant associations with these traits. Our study indicates that copy number of VNTRs and multicopy genes contributes to diverse human traits and suggests that complex structural variants potentially explain some of the so-called “missing heritability” of SNV-based GWASs.
Variable number tandem repeats (VNTRs) are composed of large tandemly repeated motifs, many of which are highly polymorphic in copy number. However, because of their large size and repetitive nature, ...they remain poorly studied. To investigate the regulatory potential of VNTRs, we used read-depth data from Illumina whole-genome sequencing to perform association analysis between copy number of ∼70,000 VNTRs (motif size ≥ 10 bp) with both gene expression (404 samples in 48 tissues) and DNA methylation (235 samples in peripheral blood), identifying thousands of VNTRs that are associated with local gene expression (eVNTRs) and DNA methylation levels (mVNTRs). Using an independent cohort, we validated 73%–80% of signals observed in the two discovery cohorts, while allelic analysis of VNTR length and CpG methylation in 30 Oxford Nanopore genomes gave additional support for mVNTR loci, thus providing robust evidence to support that these represent genuine associations. Further, conditional analysis indicated that many eVNTRs and mVNTRs act as QTLs independently of other local variation. We also observed strong enrichments of eVNTRs and mVNTRs for regulatory features such as enhancers and promoters. Using the Human Genome Diversity Panel, we define sets of VNTRs that show highly divergent copy numbers among human populations and show that these are enriched for regulatory effects and preferentially associate with genes that have been linked with human phenotypes through GWASs. Our study provides strong evidence supporting functional variation at thousands of VNTRs and defines candidate sets of VNTRs, copy number variation of which potentially plays a role in numerous human phenotypes.
Fusarium wilt caused by Fusarium oxysporum f. sp. ciceris is one of the serious disease causes tremendous loss to crop throughout the world and has assumed serious proportions in the recent years. ...Wilt affected chickpea plant roots samples were collected from forty eight different regions of Maharashtra. The collected samples were characterized by designed ITS primers. The Fusarium oxysporum f. sp. ciceris (Foc) showed about 302 bp amplicon when subjected to PCR with these primers. Further the amplified product was digested with five restriction enzymes viz. AluI, EcoRI, HaeIII, MboI and MspI and restriction fragment length polymorphism (RFLP) pattern were analysed. A dendrogram derived from ITS-RFLP analysis of the rDNA region divided the Foc isolates into four clusters. The maximum genetic distance 0.75 exhibited by five Foc isolates viz. Foc-30, Foc-36, Foc-24, Foc-25 and Foc-48 belonging to Shahartakli, Shelur, Nashik, Satara, Kolhapur regions respectively and found to be more diverse among the other Foc isolates. Whereas isolate no. Foc-20, Foc-21 and Foc-8 from Ambajogai, Karjat and Rahata shown minimum genetic distance (0.13). This showed the genetic variability among the Foc isolates collected from different regions of Maharashtra.