Peer-reviewed, Open access
  • Generalizability of polygen...
    Staerk, Christian; Klinkhammer, Hannah; Wistuba, Tobias; Maj, Carlo; Mayr, Andreas

    BMC medical genomics, 05/2024, Volume: 17, Issue: 1
    Journal Article

    Polygenic risk scores (PRS) quantify an individual's genetic predisposition for different traits and are expected to play an increasingly important role in personalized medicine. A crucial challenge in clinical practice is the generalizability and transferability of PRS models to populations with different ancestries. When assessing the generalizability of PRS models for continuous traits, the coefficient of determination $R^2$ is a commonly used measure of prediction accuracy. While $R^2$ is a well-defined goodness-of-fit measure for statistical linear models, there are different definitions for applying it to test data, which complicates the interpretation and comparison of results. Based on large-scale genotype data from the UK Biobank, we compare three definitions of $R^2$ on test data for evaluating the generalizability of PRS models to different populations. Polygenic models for several phenotypes, including height, BMI and lipoprotein A, are derived from training data of European ancestry using state-of-the-art regression methods and are evaluated on various test populations with different ancestries. Our analysis shows that the choice of $R^2$ definition can lead to considerably different results on test data, making the comparison of $R^2$ values from the literature problematic. The definition as the squared correlation between predicted and observed phenotypes addresses only discriminative performance and always yields values between 0 and 1. In contrast, definitions of $R^2$ based on the mean squared prediction error (MSPE) relative to intercept-only models assess both discrimination and calibration. These MSPE-based definitions can yield negative values, indicating miscalibrated predictions for out-of-target populations.
    We argue that the choice of the most appropriate definition depends on the aim of the PRS analysis: whether it primarily serves risk stratification or also individual phenotype prediction. Moreover, correlation-based and MSPE-based definitions of $R^2$ can provide valuable complementary information. Awareness of the different definitions of $R^2$ on test data is necessary to facilitate the reporting and interpretation of results on PRS generalizability. It is recommended to state explicitly which definition was used when reporting $R^2$ values on test data. Further research is warranted to develop and evaluate well-calibrated polygenic models for diverse populations.
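    The contrast between correlation-based and MSPE-based $R^2$ can be sketched in a few lines of Python. The abstract does not give the formulas, so this is a minimal sketch under stated assumptions: the correlation-based $R^2$ is taken as the squared Pearson correlation between observed and predicted phenotypes, and the two MSPE-based variants are assumed to compute 1 - MSPE(model) / MSPE(intercept-only model), differing only in whether the intercept-only model predicts the test-set mean or the training-set mean. The function names (r2_correlation, mspe, r2_mspe) and all numbers are hypothetical toy values chosen for illustration, not results from the study.

```python
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def r2_correlation(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Squared Pearson correlation of observed vs. predicted values.

    Reflects discrimination only: it is unchanged if the predictions are
    shifted or rescaled, and it always lies in [0, 1].
    """
    my, mp = _mean(y), _mean(y_hat)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_hat))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((b - mp) ** 2 for b in y_hat)
    if var_y == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov ** 2 / (var_y * var_p)


def mspe(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared prediction error."""
    return _mean([(a - b) ** 2 for a, b in zip(y, y_hat)])


def r2_mspe(y: Sequence[float], y_hat: Sequence[float], reference_mean: float) -> float:
    """1 - MSPE(model) / MSPE(intercept-only model always predicting reference_mean).

    Assumed form of an MSPE-based R^2: it penalizes both poor discrimination
    and miscalibration, and it becomes negative when the model predicts worse
    than the intercept-only reference.
    """
    baseline = mspe(y, [reference_mean] * len(y))
    if baseline == 0:
        raise ValueError("reference model has zero error; R^2 is undefined")
    return 1.0 - mspe(y, y_hat) / baseline


if __name__ == "__main__":
    y_test = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy observed phenotypes (hypothetical)
    y_pred = [v + 3.0 for v in y_test]    # perfectly ordered, but shifted by +3
    train_mean = 5.5                      # hypothetical training-set mean

    print(r2_correlation(y_test, y_pred))                  # 1.0
    print(r2_mspe(y_test, y_pred, _mean(y_test)))          # -3.5 (test-mean reference)
    print(round(r2_mspe(y_test, y_pred, train_mean), 3))   # -0.091 (training-mean reference)
```

    In this toy case the predictions are perfectly correlated with the observed values but systematically offset, so the correlation-based value is 1 while both MSPE-based values are negative, matching the distinction between discrimination and calibration drawn in the abstract. The two MSPE-based variants also disagree with each other, because the choice of reference mean changes the denominator.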