The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at transcriptional and post-transcriptional level. Here, we studied the interdependence of ...transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing.
In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, both in proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtained evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells.
Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provides a basis to reveal largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.
SNP panels that uniquely identify an individual are useful for genetic and forensic research. Previously recommended SNP panels are based on DNA profiles and mostly contain intragenic SNPs. With the ...increasing interest in RNA expression profiles, we aimed for establishing a SNP panel for both DNA and RNA-based genotyping.
To determine a small set of SNPs with maximally discriminative power, genotype calls were obtained from DNA and blood-derived RNA sequencing data belonging to healthy, geographically dispersed, Dutch individuals. SNPs were selected based on different criteria like genotype call rate, minor allele frequency, Hardy-Weinberg equilibrium and linkage disequilibrium. A panel of 50 SNPs was sufficient to identify an individual uniquely: the probability of identity was 6.9 × 10
when assuming no family relations and 1.2 × 10
when accounting for the presence of full sibs. The ability of the SNP panel to uniquely identify individuals on DNA and RNA level was validated in an independent population dataset. The panel is applicable to individuals from European descent, with slightly lower power in non-Europeans. Whereas most of the genes containing the 50 SNPs are expressed in various tissues, our SNP panel needs optimization for other tissues than blood.
This first DNA/RNA SNP panel will be useful to identify sample mix-ups in biomedical research and for assigning DNA and RNA stains in crime scenes to unique individuals.
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
Unambiguous sequence variant descriptions are important in reporting the outcome of clinical diagnostic DNA tests. The standard nomenclature of the Human Genome Variation Society (HGVS) describes the ...observed variant sequence relative to a given reference sequence. We propose an efficient algorithm for the extraction of HGVS descriptions from two sequences with three main requirements in mind: minimizing the length of the resulting descriptions, minimizing the computation time and keeping the unambiguous descriptions biologically meaningful.
Our algorithm is able to compute the HGVS descriptions of complete chromosomes or other large DNA strings in a reasonable amount of computation time and its resulting descriptions are relatively small. Additional applications include updating of gene variant database contents and reference sequence liftovers.
The algorithm is accessible as an experimental service in the Mutalyzer program suite (https://mutalyzer.nl). The C++ source code and Python interface are accessible at: https://github.com/mutalyzer/description-extractor.
j.k.vis@lumc.nl.
Unambiguous variant descriptions are of utmost importance in clinical genetic diagnostics, scientific literature and genetic databases. The Human Genome Variation Society (HGVS) publishes a ...comprehensive set of guidelines on how variants should be correctly and unambiguously described. We present the implementation of the Mutalyzer 2 tool suite, designed to automatically apply the HGVS guidelines so users do not have to deal with the HGVS intricacies explicitly to check and correct their variant descriptions.
Mutalyzer is profusely used by the community, having processed over 133 million descriptions since its launch. Over a five year period, Mutalyzer reported a correct input in ∼50% of cases. In 41% of the cases either a syntactic or semantic error was identified and for ∼7% of cases, Mutalyzer was able to automatically correct the description.
Mutalyzer is an Open Source project under the GNU Affero General Public License. The source code is available on GitHub (https://github.com/mutalyzer/mutalyzer) and a running instance is available at: https://mutalyzer.nl.
Most disease-associated genetic variants are noncoding, making it challenging to design experiments to understand their functional consequences. Identification of expression quantitative trait loci ...(eQTLs) has been a powerful approach to infer the downstream effects of disease-associated variants, but most of these variants remain unexplained. The analysis of DNA methylation, a key component of the epigenome, offers highly complementary data on the regulatory potential of genomic regions. Here we show that disease-associated variants have widespread effects on DNA methylation in trans that likely reflect differential occupancy of trans binding sites by cis-regulated transcription factors. Using multiple omics data sets from 3,841 Dutch individuals, we identified 1,907 established trait-associated SNPs that affect the methylation levels of 10,141 different CpG sites in trans (false discovery rate (FDR) < 0.05). These included SNPs that affect both the expression of a nearby transcription factor (such as NFKB1, CTCF and NKX2-3) and methylation of its respective binding site across the genome. Trans methylation QTLs effectively expose the downstream effects of disease-associated variants.
Genetic risk factors often localize to noncoding regions of the genome with unknown effects on disease etiology. Expression quantitative trait loci (eQTLs) help to explain the regulatory mechanisms ...underlying these genetic associations. Knowledge of the context that determines the nature and strength of eQTLs may help identify cell types relevant to pathophysiology and the regulatory networks underlying disease. Here we generated peripheral blood RNA-seq data from 2,116 unrelated individuals and systematically identified context-dependent eQTLs using a hypothesis-free strategy that does not require previous knowledge of the identity of the modifiers. Of the 23,060 significant cis-regulated genes (false discovery rate (FDR) ≤ 0.05), 2,743 (12%) showed context-dependent eQTL effects. The majority of these effects were influenced by cell type composition. A set of 145 cis-eQTLs depended on type I interferon signaling. Others were modulated by specific transcription factors binding to the eQTL SNPs.
We show that epigenome- and transcriptome-wide association studies (EWAS and TWAS) are prone to significant inflation and bias of test statistics, an unrecognized phenomenon introducing spurious ...findings if left unaddressed. Neither GWAS-based methodology nor state-of-the-art confounder adjustment methods completely remove bias and inflation. We propose a Bayesian method to control bias and inflation in EWAS and TWAS based on estimation of the empirical null distribution. Using simulations and real data, we demonstrate that our method maximizes power while properly controlling the false positive rate. We illustrate the utility of our method in large-scale EWAS and TWAS meta-analyses of age and smoking.
The methylome is subject to genetic and environmental effects. Their impact may depend on sex and age, resulting in sex- and age-related physiological variation and disease susceptibility. Here we ...estimate the total heritability of DNA methylation levels in whole blood and estimate the variance explained by common single nucleotide polymorphisms at 411,169 sites in 2,603 individuals from twin families, to establish a catalogue of between-individual variation in DNA methylation. Heritability estimates vary across the genome (mean=19%) and interaction analyses reveal thousands of sites with sex-specific heritability as well as sites where the environmental variance increases with age. Integration with previously published data illustrates the impact of genome and environment across the lifespan at methylation sites associated with metabolic traits, smoking and ageing. These findings demonstrate that our catalogue holds valuable information on locations in the genome where methylation variation between people may reflect disease-relevant environmental exposures or genetic variation.
Different exposures, including diet, physical activity, or external conditions can contribute to genotype-environment interactions (G×E). Although high-dimensional environmental data are increasingly ...available and multiple exposures have been implicated with G×E at the same loci, multi-environment tests for G×E are not established. Here, we propose the structured linear mixed model (StructLMM), a computationally efficient method to identify and characterize loci that interact with one or more environments. After validating our model using simulations, we applied StructLMM to body mass index in the UK Biobank, where our model yields previously known and novel G×E signals. Finally, in an application to a large blood eQTL dataset, we demonstrate that StructLMM can be used to study interactions with hundreds of environmental variables.
Although it is assumed that epigenetic mechanisms, such as changes in DNA methylation (DNAm), underlie the relationship between adverse intrauterine conditions and adult metabolic health, evidence ...from human studies remains scarce. Therefore, we evaluated whether DNAm in whole blood mediated the association between prenatal famine exposure and metabolic health in 422 individuals exposed to famine in utero and 463 (sibling) controls. We implemented a two-step analysis, namely, a genome-wide exploration across 342,596 cytosine-phosphate-guanine dinucleotides (CpGs) for potential mediators of the association between prenatal famine exposure and adult body mass index (BMI), serum triglycerides (TG), or glucose concentrations, which was followed by formal mediation analysis. DNAm mediated the association of prenatal famine exposure with adult BMI and TG but not with glucose. DNAm at
(cg09349128), a gene involved in energy metabolism, mediated 13.4% 95% confidence interval (CI), 5 to 28% of the association between famine exposure and BMI. DNAm at six CpGs, including
(cg19693031), influencing β cell function, and
(cg07397296), affecting lipid metabolism, together mediated 80% (95% CI, 38.5 to 100%) of the association between famine exposure and TG. Analyses restricted to those exposed to famine during early gestation identified additional CpGs mediating the relationship with TG near
(glycolysis) and
(adipogenesis). DNAm at the CpGs involved was associated with gene expression in an external data set and correlated with DNAm levels in fat depots in additional postmortem data. Our data are consistent with the hypothesis that epigenetic mechanisms mediate the influence of transient adverse environmental factors in early life on long-term metabolic health. The specific mechanism awaits elucidation.