Technical variation plays an important role in microarray-based gene expression studies, and batch effects explain a large proportion of this noise. It is therefore mandatory to eliminate technical variation while maintaining biological variability. Several strategies have been proposed for the removal of batch effects, although they have not been evaluated in large-scale longitudinal gene expression data. In this study, we aimed to identify a suitable method for batch effect removal in a large study of microarray-based longitudinal gene expression. Monocytic gene expression was measured in 1092 participants of the Gutenberg Health Study at baseline and 5-year follow-up. Replicates of selected samples were measured at both time points to identify technical variability. Deming regression, Passing-Bablok regression, linear mixed models, non-linear models as well as ReplicateRUV and ComBat were applied to eliminate batch effects between replicates. In a second step, quantile normalization prior to batch effect correction was performed for each method. Technical variation between batches was evaluated by principal component analysis. Associations between body mass index and transcriptomes were calculated before and after batch removal. Results from association analyses were compared to evaluate maintenance of biological variability. Quantile normalization, separately performed in each batch, combined with ComBat successfully reduced batch effects and maintained biological variability. ReplicateRUV performed perfectly in the replicate data subset of the study, but failed when applied to all samples. None of the other methods substantially reduced batch effects in the replicate data subset. Quantile normalization plus ComBat appears to be a valuable approach for batch correction in longitudinal gene expression data.
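The first step of the retained strategy, quantile normalization performed separately in each batch, can be illustrated with a minimal NumPy sketch. This is not the study's pipeline: the function names `quantile_normalize` and `quantile_normalize_by_batch`, the genes-by-samples matrix layout, and the handling of ties (arbitrary order, no averaging) are assumptions made for illustration, and the subsequent ComBat step is not shown.

```python
import numpy as np

def quantile_normalize(expr):
    """Quantile-normalize a genes x samples matrix (hypothetical helper).

    Every sample (column) is forced onto the same reference distribution,
    defined as the mean of the sorted values across samples.
    Ties are broken arbitrarily; no tie averaging is done.
    """
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(expr, axis=0)                    # gene order within each sample
    sorted_vals = np.take_along_axis(expr, order, axis=0)
    reference = sorted_vals.mean(axis=1)                # mean value at each rank
    normalized = np.empty_like(expr)
    # Write the reference value of rank r at the position of the r-th smallest gene.
    np.put_along_axis(normalized, order, reference[:, None], axis=0)
    return normalized

def quantile_normalize_by_batch(expr, batches):
    """Apply quantile normalization independently within each batch label."""
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(expr)
    for label in np.unique(batches):
        cols = batches == label                         # samples belonging to this batch
        out[:, cols] = quantile_normalize(expr[:, cols])
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.normal(size=(5, 4))                       # 5 genes, 4 samples (simulated)
    print(quantile_normalize_by_batch(toy, ["baseline", "baseline", "followup", "followup"]))
```

After this step, all samples within a batch share an identical value distribution while the rank of each gene within a sample is preserved; remaining between-batch differences would then be handled by the batch correction method.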
Variability of gene expression in humans may link gene sequence variability and phenotypes; however, non-genetic variations, alone or in combination with genetics, may also influence expression traits and have a critical role in physiological and disease processes.
To get better insight into the overall variability of gene expression, we assessed the transcriptome of circulating monocytes, a key cell involved in immunity-related diseases and atherosclerosis, in 1,490 unrelated individuals and investigated its association with >675,000 SNPs and 10 common cardiovascular risk factors. Out of 12,808 expressed genes, 2,745 expression quantitative trait loci were detected (P < 5.78 × 10^-12), most of them (90%) being cis-modulated. Extensive analyses showed that associations identified by genome-wide association studies of lipids, body mass index or blood pressure were rarely compatible with a mediation by monocyte expression level at the locus. At a study-wide level (P < 3.9 × 10^-7), 1,662 expression traits (13.0%) were significantly associated with at least one risk factor. Genome-wide interaction analyses suggested that genetic variability and risk factors mostly acted additively on gene expression. Because of the structure of correlation among expression traits, the variability of risk factors could be characterized by a limited set of independent gene expressions which may have biological and clinical relevance. For example, expression traits associated with cigarette smoking were more strongly associated with carotid atherosclerosis than smoking itself.
This study demonstrates that the monocyte transcriptome is a potent integrator of genetic and non-genetic influences of relevance for disease pathophysiology and risk assessment.
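The two significance thresholds quoted above are numerically consistent with a Bonferroni correction at a family-wise level of 0.05, although the abstract does not name the correction method. The short check below makes that arithmetic explicit; the alpha level of 0.05 and the use of exactly 675,000 SNPs (the abstract reports ">675,000") are assumptions.

```python
# Assumed inputs: alpha = 0.05 and exactly 675,000 SNPs (abstract says ">675,000").
alpha = 0.05
n_genes = 12_808          # expressed genes reported in the abstract
n_snps = 675_000          # assumption: rounded SNP count
n_risk_factors = 10       # cardiovascular risk factors reported in the abstract

# Bonferroni threshold for all gene x SNP tests (eQTL analysis)
eqtl_threshold = alpha / (n_genes * n_snps)
# Bonferroni threshold for all gene x risk-factor tests (study-wide level)
risk_threshold = alpha / (n_genes * n_risk_factors)
# Share of expression traits associated with at least one risk factor
share_associated = 1_662 / n_genes

print(f"{eqtl_threshold:.2e}")      # 5.78e-12
print(f"{risk_threshold:.1e}")      # 3.9e-07
print(f"{share_associated:.1%}")    # 13.0%
```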
Dilated cardiomyopathy (DCM) is an important cause of heart failure with a strong familial component. We performed an exome-wide array-based association study (EWAS) to assess the contribution of missense variants to sporadic DCM.
116,855 single nucleotide variants (SNVs) were analyzed in 2796 DCM patients and 6877 control subjects from 6 populations of European ancestry. We confirmed two previously identified associations with SNVs in BAG3 and ZBTB17 and discovered six novel DCM-associated loci (Q-value<0.01). The lead-SNVs at novel loci are common and located in TTN, SLC39A8, MLIP, FLNC, ALPK3 and FHOD3. In silico fine mapping identified HSPB7 as the most likely candidate at the ZBTB17 locus. Rare variant analysis (MAF<0.01) demonstrated significant association for TTN variants only (P = 0.0085). All candidate genes but one (SLC39A8) exhibit preferential expression in striated muscle tissues and mutations in TTN, BAG3, FLNC and FHOD3 are known to cause familial cardiomyopathy. We also investigated a panel of 48 known cardiomyopathy genes. Collectively, rare (n = 228, P = 0.0033) or common (n = 36, P = 0.019) variants with elevated in silico severity scores were associated with DCM, indicating that the spectrum of genes contributing to sporadic DCM extends beyond those identified here.
We identified eight loci independently associated with sporadic DCM. The functions of the best candidate genes at these loci suggest that proteostasis regulation might play a role in DCM pathophysiology.
In this work, we assessed whether SERPINE1 expression could be under the influence of microRNAs (miRNAs) predicted to bind the SERPINE1 3'UTR region. We specifically focused on the 3'UTR region harboring a common polymorphism, rs1050955, that has been found to be associated with SERPINE1 monocyte expression, and investigated whether the presence of different alleles at rs1050955 could modify miRNA binding efficiency and affect PAI-1 protein levels. We demonstrated that, in human umbilical vein endothelial cells, both miR-421 and miR-30c directly interacted with PAI-1 mRNA to inhibit the expression of the associated protein. However, these inhibitory mechanisms were independent of the allele present at the rs1050955 locus. We further showed that miR-421 levels correlated with PAI-1 activity in the plasma samples of 40 patients with venous thrombosis. Our results strongly suggest that the regulation of the PAI-1 molecule could be under the influence of several miRNAs whose measurement in the plasma of patients could be envisaged as a biomarker for inflammatory and thrombotic disorders.
Venous thromboembolism is the third most common cardiovascular disease and comprises two entities, deep vein thrombosis (DVT) and its potentially fatal form, pulmonary embolism (PE). While PE is observed in ~40% of patients with documented DVT, there are few biomarkers that can help identify patients at high PE risk. To fill this need, we implemented an artificial neural network (ANN) with two hidden layers on 376 antibodies and 19 biological traits measured in the plasma of 1388 DVT patients, with or without PE, of the MARTHA study. We used the LIME algorithm to obtain a linear approximation of the resulting ANN prediction model. As MARTHA patients had been genotyped on DNA arrays, a genome-wide association study (GWAS) was conducted on the LIME estimate. Detected single nucleotide polymorphisms (SNPs) were tested for association with PE risk in MARTHA. Main findings were replicated in the EOVT study composed of 143 PE patients and 196 DVT-only patients. The derived ANN model for PE achieved an accuracy of 0.89 and 0.79 in our training and testing sets, respectively. A GWAS on the LIME approximation identified a strong statistical association peak (rs1424597: p = 5.3 × 10
) at the PLXNA4 locus. Homozygote carriers for the rs1424597-A allele were then more frequently observed in PE than in DVT patients from the MARTHA (2% vs. 0.4%, p = 0.005) and the EOVT (3% vs. 0%, p = 0.013) studies. In a sample of 112 COVID-19 patients known to have endotheliopathy leading to acute lung injury and an increased risk of PE, decreased PLXNA4 levels were associated (p = 0.025) with worsened respiratory function. Using an original integrated proteomics and genetics strategy, we identified PLXNA4 as a new susceptibility gene for PE whose exact role now needs to be further elucidated.
Rare variants outside the classical coagulation cascade might cause inherited thrombosis. We aimed to identify the variant(s) causing venous thromboembolism (VTE) in a family with multiple relatives affected with unprovoked VTE and no thrombophilia defects. We identified by whole exome sequencing an extremely rare Arg to Gln variant (Arg89Gln) in the Microtubule Associated Serine/Threonine Kinase 2 (MAST2) gene that segregates with VTE in the family. Free-tissue factor pathway inhibitor (f-TFPI) plasma levels were significantly decreased in affected family members compared to healthy relatives. Conversely, plasminogen activator inhibitor-1 (PAI-1) levels were significantly higher in affected members than in healthy relatives. RNA sequencing analysis of RNA interference experimental data conducted in endothelial cells revealed that, of the 13,387 detected expressed genes, 2,354 have their level of expression modified by MAST2 knockdown, including SERPINE1 coding for PAI-1 and TFPI. In HEK293 cells overexpressing the MAST2 Gln89 variant, TFPI and SERPINE1 promoter activities were respectively lower and higher than in cells overexpressing the MAST2 wild type. This study identifies a novel thrombophilia-causing Arg89Gln variant in the MAST2 gene that is here proposed as a new molecular player in the etiology of VTE by interfering with hemostatic balance of endothelial cells.
Hereditary Hemorrhagic Telangiectasia (HHT) is a rare, autosomal dominant, vascular disorder. About 80% of cases are caused by pathogenic variants in ACVRL1 (also known as ALK1) and ENG, with the remaining cases being unexplained. We identified two variants, c.-79C>T and c.-68G>A, in the 5’UTR of ENG in two unrelated patients. They create upstream AUGs at the origin of upstream overlapping open reading frames (uoORFs) ending at the same stop codon. To assess the pathogenicity of these variants, we performed functional assays based on the expression of wild-type and mutant constructs in human cells and evaluated their effect on ALK1 activity in a BMP-response element assay. This assay is mandatory for molecular diagnosis and has so far only been applied to coding ENG variants. These variants were associated with a decrease of protein levels in HeLa and HUVEC cells and a decreased ability to activate ALK1. We applied the same experiments to three additional uoORF-creating variants (c.-142A>T, c.-127C>T and c.-10C>T) located in the 5’UTR of ENG and previously reported in HHT patients. We found that all the analyzed variants alter protein levels and function. Additional experiments relying on an artificial deletion in our mutated constructs show that the identified uAUGs could initiate translation, indicating that the associated effect is translation-dependent. Overall, we have identified two 5’UTR ENG variations in HHT patients and shed new light on the role of upstream ORFs in ENG regulation. Our findings contribute to improving molecular diagnosis in HHT.
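How a single-nucleotide change in a 5’UTR can create an upstream AUG whose reading frame overlaps the coding sequence can be sketched in a few lines of Python. This is a toy illustration only: the function `find_upstream_orfs`, its classification labels, and both sequences are hypothetical and do not represent the ENG 5’UTR or the variants reported above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_upstream_orfs(transcript, cds_start):
    """List ATGs starting in the 5'UTR and classify the ORF each one opens.

    transcript: mRNA sequence written as DNA (A/C/G/T), 5' to 3'.
    cds_start:  0-based index of the main (annotated) ATG.
    Hypothetical helper for illustration, not a validated annotation tool.
    """
    seq = transcript.upper()
    hits = []
    for pos in range(cds_start):                      # ATGs starting before the main ATG
        if seq[pos:pos + 3] != "ATG":
            continue
        stop = None
        for i in range(pos + 3, len(seq) - 2, 3):     # walk codon by codon in the uATG frame
            if seq[i:i + 3] in STOP_CODONS:
                stop = i
                break
        in_frame = (cds_start - pos) % 3 == 0
        if stop is not None and stop + 3 <= cds_start:
            kind = "fully upstream"                   # uORF ends before the main ATG
        elif in_frame:
            kind = "in-frame N-terminal extension"    # runs into the CDS in the same frame
        elif stop is None:
            kind = "no in-frame stop found"
        else:
            kind = "overlapping"                      # out of frame, stops inside the CDS
        hits.append({"uaug": pos, "stop": stop, "in_frame": in_frame, "kind": kind})
    return hits

if __name__ == "__main__":
    cds = "ATGCTGAACCGGTAA"                           # toy CDS: ATG CTG AAC CGG TAA
    reference = "CCACGGCCTCCAG" + cds                 # toy 5'UTR without any ATG
    variant = "CCATGGCCTCCAG" + cds                   # toy C>T change creating an ATG
    print(find_upstream_orfs(reference, 13))          # []
    print(find_upstream_orfs(variant, 13))
```

In this toy case the variant creates an ATG at index 2 that is out of frame with the main ATG and reaches its first stop codon inside the coding sequence, which corresponds to the overlapping (uoORF) configuration. Whether such an ORF is actually translated cannot be inferred from sequence alone and requires functional assays such as those described in the abstract.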
High-throughput sequencing (HTS) technologies are revolutionizing the research and molecular diagnosis landscape by allowing the exploration of millions of nucleotide sequences at an unprecedented scale. These technologies are of particular interest in the identification of genetic variations contributing to the risk of rare (Mendelian) and common (multifactorial) human diseases. So far, they have led to numerous successes in identifying rare disease-causing mutations in coding regions, but few in non-coding regions that include introns, untranslated (UTR), and intergenic regions. One class of neglected non-coding variations is that of 5′UTR variants that alter upstream open reading frames (upORFs) of the coding sequence (CDS) of a natural protein coding transcript. Following a brief summary of the molecular bases of the origin and functions of upORFs, we will first review known 5′UTR variations altering upORFs and causing rare cardiovascular disorders (CVDs). We will then investigate whether upORF-affecting single nucleotide polymorphisms could be good candidates for explaining association signals detected in the context of genome-wide association studies for common complex CVDs.
Macrophages are key players involved in numerous pathophysiological pathways, and an in-depth characterization of their gene regulatory networks can help in better understanding how their dysfunction may affect human diseases. We here conducted a cross-species network analysis of macrophage gene expression data between human and mouse to identify conserved networks across both species, and assessed whether such networks could reveal new disease-associated regulatory mechanisms. From a sample of 684 individuals processed for genome-wide macrophage gene expression profiling, we identified 27 groups of coexpressed genes (modules). Six modules were found preserved (P < 10
) in macrophages from 86 mice of the Hybrid Mouse Diversity Panel. One of these modules was significantly false discovery rate (FDR) = 8.9 × 10
enriched for genes belonging to the oxidative phosphorylation (OXPHOS) pathway. This pathway was also found significantly (FDR < 10
) enriched in susceptibility genes for Alzheimer, Parkinson, and Huntington diseases. We further conducted an expression quantitative trait loci analysis to identify SNPs that could regulate macrophage OXPHOS gene expression in humans. This analysis identified the PARK2 rs192804963 as a trans-acting variant influencing (minimal P-value = 4.3 × 10
) the expression of most OXPHOS genes in humans. Further experimental work demonstrated that PARK2 knockdown was associated with increased OXPHOS gene expression in THP1 human macrophages. This work provided strong new evidence that PARK2 participates in the regulatory networks associated with oxidative phosphorylation and suggested that PARK2 genetic variations could act as a trans regulator of OXPHOS gene expression in human macrophages.