Abstract
We evaluate the shared genetic regulation of mRNA molecules, proteins and metabolites derived from whole blood from 3029 human donors. We find abundant allelic heterogeneity, where multiple variants regulate a particular molecular phenotype, and pleiotropy, where a single variant associates with multiple molecular phenotypes over multiple genomic regions. The highest proportion of shared genetic regulation is detected between gene expression and proteins (66.6%), with further median proportions of shared genetic associations of 78.3% and 62.4% between plasma proteins and gene expression across 49 different tissues. We represent the genetic and molecular associations in networks including 2828 known GWAS variants, showing that GWAS variants are more often connected to gene expression in trans than to other molecular phenotypes in the network. Our work provides a roadmap for understanding molecular networks and deriving the underlying mechanisms of action of GWAS variants using different molecular phenotypes in an accessible tissue.
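To make the network idea concrete, here is a minimal Python sketch, assuming a simple edge-list representation in which each edge links a GWAS variant to one molecular phenotype and is labelled cis or trans. The identifiers (var1, GENE_A, PROT_A, MET_X, and so on), the layer names, and both helper functions are hypothetical placeholders, not data or code from the study. The pleiotropy helper is a simplification that counts distinct phenotypes per variant and ignores genomic region.

```python
from collections import Counter

# Hypothetical toy edges: (GWAS variant, molecular phenotype, layer, cis/trans).
# These are placeholders, not results from the study.
EDGES = [
    ("var1", "GENE_A", "expression", "cis"),
    ("var1", "GENE_B", "expression", "trans"),
    ("var1", "PROT_A", "protein", "cis"),
    ("var2", "GENE_C", "expression", "trans"),
    ("var2", "MET_X", "metabolite", "trans"),
    ("var3", "PROT_B", "protein", "trans"),
]


def trans_edges_per_layer(edges):
    """Count trans associations for each molecular layer."""
    return Counter(layer for _, _, layer, mode in edges if mode == "trans")


def pleiotropic_variants(edges, min_phenotypes=2):
    """Variants linked to at least `min_phenotypes` distinct phenotypes (simplified)."""
    targets = {}
    for variant, phenotype, _, _ in edges:
        targets.setdefault(variant, set()).add(phenotype)
    return sorted(v for v, t in targets.items() if len(t) >= min_phenotypes)


print(trans_edges_per_layer(EDGES))
print(pleiotropic_variants(EDGES))  # ['var1', 'var2']
```

Keeping the layer and cis/trans label on every edge lets the same structure answer both questions in the abstract: which layers GWAS variants reach in trans, and which variants touch several phenotypes.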
Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinoma. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning.
We utilized the baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-derived liver fat content (<5% or ≥5%), available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe, and it is unknown how well these findings will translate to people of other ancestries or to people exposed to environmental risk factors that differ from those of the present cohort. Another key limitation of this study is that the prediction was done on a binary outcome of liver fat quantity (<5% or ≥5%) rather than a continuous one.
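The ROCAUC reported above can be read as the probability that a randomly chosen participant with liver fat ≥5% receives a higher predicted score than a randomly chosen participant with liver fat <5%. The Python sketch below computes it in that rank-based (Mann–Whitney) form. It assumes out-of-fold predicted probabilities from cross-validation are already available, and the labels and scores in the example are invented for illustration. It does not reproduce the study's LASSO and random forest pipeline.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: the fraction of (positive, negative) pairs in which
    the positive case has the higher score; tied pairs count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = liver fat >= 5%, 0 = liver fat < 5%.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]  # out-of-fold predicted probabilities
print(round(roc_auc(labels, scores), 3))  # 0.778
```

The pairwise loop is quadratic in sample size, which is acceptable for a cohort of this scale. A sorting-based rank computation gives the same value more efficiently.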
In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface (see: https://www.predictliverfat.org/) and made it available to the community.
ClinicalTrials.gov NCT03814915.
Biomarkers for early detection of breast cancer may complement population screening approaches to enable earlier and more precise treatment. The blood proteome is an important source for biomarker discovery, but so far few proteins have been associated with breast cancer risk. Here, we measure 2929 unique proteins in plasma from 598 women selected from the Karolinska Mammography Project to explore the association between protein levels, clinical characteristics, and gene variants, and to identify proteins with a causal role in breast cancer. We present 812 cis-acting protein quantitative trait loci for 737 proteins, which are used as instruments in Mendelian randomisation analyses of breast cancer risk. Of these, five proteins (CD160, DNPH1, LAYN, LRRC37A2 and TLR1) show a potential causal role in breast cancer risk, with confirmatory results in independent cohorts. Our study suggests that these proteins should be further explored as biomarkers and potential drug targets in breast cancer.
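When a single cis-pQTL serves as the instrument for a protein, a common Mendelian randomisation estimator is the Wald ratio: the variant's effect on breast cancer risk (for example, a log odds ratio) divided by its effect on the protein level. The Python sketch below is an illustration, not the study's analysis code. It computes that ratio with its first-order standard error and, for proteins with more than one cis-pQTL, a fixed-effect inverse-variance-weighted (IVW) combination. All effect sizes in the example are hypothetical, and the first-order standard error ignores uncertainty in the variant–protein association.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio causal estimate for one instrument, with its first-order SE.

    beta_exposure: variant effect on the protein (e.g. SD per allele).
    beta_outcome:  variant effect on the outcome (e.g. log odds ratio per allele).
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw(estimates, ses):
    """Fixed-effect inverse-variance-weighted combination of per-instrument estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


# Hypothetical cis-pQTL: +0.5 SD protein and +0.05 log-odds per allele.
est, se = wald_ratio(0.5, 0.05, 0.02)
print(f"OR per SD of protein: {math.exp(est):.3f} (SE of log OR {se:.3f})")
```

Exponentiating the Wald ratio gives an odds ratio per SD of genetically predicted protein level, which is the scale on which such causal estimates are usually reported.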
• Current risk prediction models use a variety of factors to identify women at risk of developing breast cancer.
• Proteins circulating in blood represent an attractive but still underrepresented source of candidate molecular risk factors.
• Plasma samples from women participating in a prospective breast cancer cohort study were analyzed for proteomic risk factors related to a future breast cancer diagnosis.
• Applying data-driven approaches to the levels of circulating proteins identified women with future breast cancers and previous use of menopausal hormone therapy.
• Menopausal hormone therapy was found to alter components of the circulating proteome even years after the treatment ended.
Accessible risk predictors are crucial for improving the early detection and prognosis of breast cancer. Blood samples are widely available and contain proteins that provide important information about human health and disease; however, little is known about the contribution of circulating proteins to breast cancer risk prediction. We profiled EDTA plasma samples collected before diagnosis from the Swedish KARMA breast cancer cohort to evaluate circulating proteins as molecular predictors. A data-driven analysis strategy was applied to molecular phenotypes built on 700 circulating proteins to identify and annotate clusters of women. The unsupervised analysis of 183 future breast cancer cases and 366 age-matched controls revealed five stable clusters with distinct proteomic plasma profiles. Among these women, those in the most stable cluster (N = 19; mean Jaccard index: 0.70 ± 0.29) were significantly more likely to have used menopausal hormonal therapy (MHT) and to receive a breast cancer diagnosis, and were older, compared with women in the remaining clusters. The circulating proteins associated with this cluster (FDR < 0.001) represented physiological processes related to cell junctions (F11R, CLDN15, ITGAL), DNA repair (RBBP8), and cell replication (TJP3), and included proteins found in female reproductive tissue (PTCH1, ZP4). This data-driven approach to plasma proteomics data revealed potential long-lasting molecular effects of MHT on the circulating proteome, even after women had ended their treatment. This provides valuable insights for proteomics efforts to identify molecular markers for breast cancer risk prediction.
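The abstract reports cluster stability as a mean Jaccard index (0.70 ± 0.29 for the most stable cluster) but does not describe the resampling scheme. One common approach, assumed in the Python sketch below, re-clusters resampled subsets of the women, finds the cluster in each resample that best matches the original cluster by Jaccard index, and summarizes those best-match values as mean ± SD. The sample IDs and clusterings in the example are hypothetical, and the re-clustering step itself is not shown.

```python
import statistics


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of sample IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(original, resamples):
    """Mean and SD of the best-match Jaccard index of `original` across resamples.

    `resamples` is a list of (ids_in_resample, clusters) pairs, where `clusters`
    is a list of sets of sample IDs from re-clustering that resample.
    Needs at least two usable resamples for the SD.
    """
    scores = []
    for ids, clusters in resamples:
        target = set(original) & set(ids)  # original members present in this resample
        if not target:
            continue
        scores.append(max(jaccard(target, c) for c in clusters))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical original cluster and two re-clustered resamples.
original = {"w1", "w2", "w3", "w4"}
resamples = [
    ({"w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8"},
     [{"w1", "w2", "w3", "w4"}, {"w5", "w6", "w7", "w8"}]),
    ({"w1", "w2", "w3", "w5", "w6", "w7"},
     [{"w1", "w2", "w5"}, {"w3", "w6", "w7"}]),
]
mean_j, sd_j = cluster_stability(original, resamples)
print(f"mean Jaccard {mean_j:.2f} ± {sd_j:.2f}")  # mean Jaccard 0.75 ± 0.35
```

Restricting the original cluster to the samples present in each resample avoids penalizing a cluster for members that were simply not drawn.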
Self-sampling of dried blood spots (DBS) offers new routes to gather valuable health-related information from the general population. Yet, the utility of deep proteome profiling of home-sampled DBS for obtaining clinically relevant insights into SARS-CoV-2 infections remains largely unexplored.
Our study involved 228 individuals from the general Swedish population who used a volumetric DBS sampling device and completed questionnaires at home during spring 2020 and summer 2021. Using multi-analyte COVID-19 serology, we stratified the donors by their response phenotypes, divided them into three study sets, and analyzed 276 proteins by proximity extension assays (PEA). After normalizing the data to account for variance in layman-collected samples, we investigated the association of DBS proteomes with serology and self-reported information.
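The abstract does not specify how the data were normalized to account for variance in layman-collected samples. One simple option, assumed here purely for illustration, is per-sample median centering of log2-scale PEA values. This removes an offset shared by all proteins in a sample, for instance one arising from differences in how much material was eluted from a spot. The Python sketch and its protein values are hypothetical, and PROT_X is a placeholder name.

```python
import statistics


def median_center(samples):
    """Per-sample median centering on a log2 scale.

    `samples` maps sample ID -> {protein: log2-scale level}. Each sample's
    median across proteins is subtracted, and the median of those sample
    medians is added back so values stay on a familiar scale.
    """
    medians = {sid: statistics.median(levels.values()) for sid, levels in samples.items()}
    grand = statistics.median(medians.values())
    return {
        sid: {prot: value - medians[sid] + grand for prot, value in levels.items()}
        for sid, levels in samples.items()
    }


# Hypothetical DBS samples: dbs2 is offset by +1 log2 unit across all proteins.
raw = {
    "dbs1": {"MMP3": 5.0, "GDF-15": 7.0, "PROT_X": 6.0},
    "dbs2": {"MMP3": 6.0, "GDF-15": 8.0, "PROT_X": 7.0},
}
normalized = median_center(raw)
print(normalized["dbs1"] == normalized["dbs2"])  # True
```

A trade-off of this choice is that it also removes any genuine sample-wide shift in protein levels, so it suits panels where most proteins are not expected to change together.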
Our three studies display highly consistent variance of protein levels and share associations of proteins with sex (e.g., MMP3) and age (e.g., GDF-15). Studying seropositive (IgG+) and seronegative (IgG−) donors from the first pandemic wave reveals a network of proteins reflecting immunity, inflammation, coagulation, and stress response. A comparison of the early-infection phase (IgM-positive) with the post-infection phase (IgM-negative, IgG-positive) indicates several proteins from the respiratory system. In DBS from the later pandemic wave, we find that levels of a virus receptor on B cells differ between seropositive (IgG+) and seronegative (IgG−) donors.
Proteome analysis of volumetric self-sampled DBS facilitates precise analysis of clinically relevant proteins, including those secreted into the circulation or found on blood cells, augmenting previous COVID-19 reports based on clinical blood collections. Our population surveys support the usefulness of DBS and underscore the importance of timing sample collection to complement clinical and precision health monitoring initiatives.
Precision medicine approaches aim to tackle diseases on an individual level through molecular profiling. Despite the growing knowledge about diseases and the reported diversity of molecular phenotypes, the descriptions of human health on an individual level have been far less elaborate.
To provide insights into the longitudinal protein signatures of well-being, we profiled blood plasma collected over one year from 101 clinically healthy individuals using multiplexed antibody assays. After applying an antibody validation scheme, we utilized > 700 protein profiles for in-depth analyses of the individuals’ short-term health trajectories.
We found signatures of circulating proteomes to be highly individual-specific. Considering technical and longitudinal variability, we observed that 49% of the protein profiles were stable over one year. We also identified eight networks of proteins in which 11–242 proteins covaried over time. For each participant, there were unique protein profiles of which some could be explained by associations to genetic variants.
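The abstract does not state how stability over one year was defined. A common way to summarize whether a protein is individual-specific and stable over time is the intraclass correlation coefficient (ICC): the share of total variance that lies between participants rather than within participants across visits. The Python sketch below computes a one-way random-effects ICC(1) for a single protein under a balanced design. It is an assumed illustration rather than the study's method, the data are hypothetical, and it does not separate technical from longitudinal within-person variance.

```python
import statistics


def icc_oneway(data):
    """One-way random-effects ICC(1) for one protein.

    `data` maps participant -> list of repeated measurements (same count each).
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between- and within-participant mean squares.
    """
    groups = list(data.values())
    n, k = len(groups), len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("balanced design required")
    grand = statistics.mean(x for g in groups for x in g)
    means = [statistics.mean(g) for g in groups]
    ssb = k * sum((m - grand) ** 2 for m in means)
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (n - 1)
    msw = ssw / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical protein: participants differ a lot, visits vary only slightly.
stable = {
    "p1": [1.0, 1.2, 0.8, 1.0],
    "p2": [3.0, 3.1, 2.9, 3.0],
    "p3": [5.0, 5.2, 4.8, 5.0],
}
print(f"ICC = {icc_oneway(stable):.3f}")  # ICC = 0.995
```

An ICC near 1 indicates an individual-specific, stable profile. Values near or below 0 indicate that visit-to-visit variation dominates.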
This observational and non-interventional study identified noticeable diversity among clinically healthy subjects, and facets of individual-specific signatures emerged by monitoring the variability of the circulating proteomes over time. To enable more personal and hence more precise assessments of health states, longitudinal profiling of circulating proteomes can provide a valuable component for precision medicine approaches.
This work was supported by the Erling Persson Foundation, the Swedish Heart and Lung Foundation, the Knut and Alice Wallenberg Foundation, Science for Life Laboratory, and the Swedish Research Council.
Current breast cancer risk prediction scores and algorithms could potentially be improved by including molecular markers. To this end, we studied the association of circulating plasma proteins, measured using the Proximity Extension Assay (PEA), with incident breast cancer risk.
In this study, we included 1577 women participating in the prospective KARMA mammographic screening cohort.
In a targeted panel of 164 proteins, we found 8 candidates nominally significantly associated with short-term breast cancer risk (P < 0.05). Similarly, in an exploratory panel consisting of 2204 proteins, 115 were found nominally significantly associated (P < 0.05). However, none of the identified protein levels remained significant after adjustment for multiple testing. This lack of statistically significant findings was not due to limited power, but attributable to the small effect sizes observed even for nominally significant proteins. Similarly, adding plasma protein levels to established risk factors did not improve breast cancer risk prediction accuracy.
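The abstract does not say which multiple-testing correction was applied. The Python sketch below, an illustration only with hypothetical P values, contrasts the Bonferroni threshold with the Benjamini–Hochberg false discovery rate procedure. With 164 tests, Bonferroni requires P < 0.05/164 ≈ 3.0 × 10⁻⁴, which shows how nominally significant results (P < 0.05) can disappear after correction.

```python
def bonferroni_threshold(alpha, m):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / m


def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR q.

    Find the largest rank k (1-based, P values sorted ascending) with
    p_(k) <= k / m * q, then reject the k smallest P values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k = rank
    return sorted(order[:k])


# Hypothetical panel of 164 proteins: 8 nominal hits, none very small.
pvals = [0.004, 0.01, 0.012, 0.02, 0.025, 0.03, 0.04, 0.045] + [0.5] * 156
print(sum(p < 0.05 for p in pvals))     # 8
print(bonferroni_threshold(0.05, 164))  # about 0.000305
print(benjamini_hochberg(pvals))        # []
```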
Our results indicate that the levels of the studied plasma proteins captured by the PEA method are unlikely to offer additional benefits for risk prediction of short-term overall breast cancer risk but could provide interesting insights into the biological basis of breast cancer in the future.
Despite the recognition of aging as a common risk factor for many human diseases, little is known about its molecular traits. To identify age-associated proteins circulating in human blood, we screened 156 individuals aged 50–92 using exploratory and multiplexed affinity proteomics assays. Profiling eight additional study sets (N = 3,987), performing antibody validation, and conducting a meta-analysis revealed a consistent age association (P = 6.61 × 10⁻⁶) for circulating histidine-rich glycoprotein (HRG). Sequence variants of HRG influenced how the protein was recognized in the immunoassays. Indeed, only the HRG profiles affected by rs9898 were associated with age and predicted the risk of mortality (HR = 1.25 per SD; 95% CI = 1.12–1.39; P = 6.45 × 10⁻⁵) during a follow-up period of 8.5 yr after blood sampling (IQR = 7.7–9.3 yr). Our affinity proteomics analysis found associations between particular molecular traits of circulating HRG and both age and all-cause mortality. The distinct profiles of this multipurpose protein could serve as an accessible and informative indicator of the physiological processes related to biological aging.
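As a worked check on the reported survival result, the standard error of log(HR) can be recovered from a 95% confidence interval, assuming the interval was computed symmetrically on the log scale, as is usual for Cox models. The Python sketch below applies this to HR = 1.25 (95% CI 1.12–1.39). Because the published figures are rounded to two decimals, the recovered two-sided P value is only expected to match the reported 6.45 × 10⁻⁵ in order of magnitude, not exactly.

```python
import math


def log_se_from_ci(lower, upper, z=1.959964):
    """SE of log(HR) implied by a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_p_from_ci(hr, lower, upper):
    """Approximate two-sided Wald P value for a ratio estimate and its 95% CI."""
    z = math.log(hr) / log_se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


se = log_se_from_ci(1.12, 1.39)
p = wald_p_from_ci(1.25, 1.12, 1.39)
print(f"SE(log HR) = {se:.3f}, P = {p:.1e}")
```

With these inputs the recovered P value comes out at roughly 5 × 10⁻⁵, consistent in magnitude with the published value given the rounding of the HR and its confidence limits.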