Generalizability of polygenic prediction models: how is the R² defined on test data?

Peer reviewed, open access

Staerk, Christian; Klinkhammer, Hannah; Wistuba, Tobias; Maj, Carlo; Mayr, Andreas

BMC Medical Genomics, 05/2024, Volume 17, Issue 1

Journal Article

Polygenic risk scores (PRS) quantify an individual's genetic predisposition for different traits and are expected to play an increasingly important role in personalized medicine. A crucial challenge in clinical practice is the generalizability and transferability of PRS models to populations with different ancestries. When assessing the generalizability of PRS models for continuous traits, the R² is a commonly used measure of prediction accuracy. While the R² is a well-defined goodness-of-fit measure for statistical linear models, there exist different definitions for its application on test data, which complicates the interpretation and comparison of results. Based on large-scale genotype data from the UK Biobank, we compare three definitions of the R² on test data for evaluating the generalizability of PRS models to different populations. Polygenic models for several phenotypes, including height, BMI and lipoprotein A, are derived from training data with European ancestry using state-of-the-art regression methods and are evaluated on various test populations with different ancestries. Our analysis shows that the choice of the R² definition can lead to considerably different results on test data, making the comparison of R² values from the literature problematic. While the definition as the squared correlation between predicted and observed phenotypes solely addresses discriminative performance and always yields values between 0 and 1, definitions of the R² based on the mean squared prediction error (MSPE) with reference to intercept-only models assess both discrimination and calibration. These MSPE-based definitions can yield negative values, indicating miscalibrated predictions for out-of-target populations.
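The abstract does not spell out the three definitions. One plausible formalization is given below. It assumes n test individuals with observed phenotypes y_i and PRS-based predictions ŷ_i, and it assumes that the two MSPE-based variants differ only in whether the intercept-only reference model predicts the training-set mean or the test-set mean; the subscript labels are ours, not the authors'.

```latex
R^2_{\mathrm{cor}} = \operatorname{cor}(y, \hat{y})^2

R^2_{\mathrm{MSPE,train}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y}_{\mathrm{train}})^2}

R^2_{\mathrm{MSPE,test}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y}_{\mathrm{test}})^2}
```

Under these assumptions the squared correlation lies in [0, 1] and is unchanged by any transformation a·ŷ + b with a ≠ 0, whereas both MSPE-based versions are at most 1 and become negative whenever the model's squared prediction error exceeds that of the constant reference prediction. For an ordinary least-squares fit with intercept evaluated on its own training data, all three coincide; they diverge only out of sample.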
We argue that the choice of the most appropriate definition depends on the aim of the PRS analysis, namely whether it primarily serves risk stratification or also individual phenotype prediction. Moreover, correlation-based and MSPE-based definitions of the R² can provide valuable complementary information. Awareness of the different definitions of the R² on test data is necessary to facilitate the reporting and interpretation of results on PRS generalizability. We recommend explicitly stating which definition was used when reporting R² values on test data. Further research is warranted to develop and evaluate well-calibrated polygenic models for diverse populations.
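The complementary nature of the two families of definitions can be illustrated with a small Python sketch (standard library only). It assumes that the two MSPE-based variants use the training-set mean and the test-set mean, respectively, as the intercept-only reference prediction. The predictions rank the test individuals perfectly but are shifted by a constant offset, as might happen when a model is transferred to a population with a different phenotype mean. The function names, the phenotype values and the training mean of 172.0 are hypothetical illustration values, not data or code from the study.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def r2_correlation(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted phenotypes.

    In [0, 1] whenever the predictions vary; insensitive to shifting or
    rescaling the predictions, so it reflects discrimination only.
    """
    my, mp = mean(y), mean(y_hat)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_hat))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((b - mp) ** 2 for b in y_hat)
    return cov ** 2 / (var_y * var_p)


def mspe(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared prediction error."""
    return mean([(a - b) ** 2 for a, b in zip(y, y_hat)])


def r2_mspe(y: Sequence[float], y_hat: Sequence[float], reference_mean: float) -> float:
    """1 - MSPE(model) / MSPE(intercept-only model predicting reference_mean).

    Reflects discrimination and calibration; negative when the model does
    worse than the constant reference prediction.
    """
    baseline = [reference_mean] * len(y)
    return 1.0 - mspe(y, y_hat) / mspe(y, baseline)


# Hypothetical test population; predictions shifted by +10 units
# (perfect ranking, poor calibration). Not data from the study.
y_test = [160.0, 165.0, 170.0, 175.0, 180.0]
y_hat = [170.0, 175.0, 180.0, 185.0, 190.0]
train_mean = 172.0  # hypothetical training-set phenotype mean

print(round(r2_correlation(y_test, y_hat), 3))         # 1.0
print(round(r2_mspe(y_test, y_hat, mean(y_test)), 3))  # -1.0
print(round(r2_mspe(y_test, y_hat, train_mean), 3))    # -0.852
```

Here the squared correlation is 1.0 while both MSPE-based values are negative, so reporting only one of them would hide either the perfect ranking or the miscalibration.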
