Polygenic scores (PGSs) aggregate the effects of variants across the genome to estimate genetic liability, but have lower performance in external study populations. A new study by Ding et al. has applied a novel framework to estimate the individual-level predictive accuracy of PGSs, and demonstrates that performance reduction occurs linearly with genetic distance.
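As a minimal sketch of the "aggregate the effects of variants" idea, a PGS is commonly computed as a weighted sum of effect-allele dosages. The function name, the weights, and the genotypes below are hypothetical illustrations, not values or code from Ding et al.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant effect sizes (e.g. GWAS betas) for three variants.
weights = [0.12, -0.05, 0.30]
# Hypothetical genotype of one individual at the same three variants.
person = [2, 1, 0]

score = polygenic_score(person, weights)
print(round(score, 4))
```

Real pipelines add steps this sketch omits, such as variant selection, adjustment for linkage disequilibrium, and standardization of the score against a reference distribution.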
Genome-wide association studies (GWASs) can require immense sample sizes to identify variants associated with human health across the frequency spectrum. As the Global Biobank Meta-analysis Initiative (GBMI), Zhou et al. describe a collaborative network across 23 biobanks and 2.2 million participants to address challenges of underrepresentation of diversity in genomic research.
The vast majority of genome-wide association studies (GWASs) are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g., linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWASs, we used published summary statistics to calculate polygenic risk scores for eight well-studied phenotypes. We identify directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk are typically highest in the population from which summary statistics were derived. We demonstrate that scores inferred from European GWASs are biased by genetic drift in other populations even when choosing the same causal variants and that biases in any direction are possible and unpredictable. This work cautions that summarizing findings from large-scale GWASs may have limited portability to other populations using standard approaches and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific. Additionally, effect sizes and the risk prediction scores derived from them in one population may not accurately extrapolate to other populations. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, as well as replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions, the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease. We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.
Nearly 300 million individuals live with chronic hepatitis B virus (HBV) infection (CHB), for which no curative therapy is available. As viral diversity is associated with pathogenesis and immunological control of infection, improved methods to characterize this diversity could aid drug development efforts. Conventionally, viral sequencing data are mapped/aligned to a reference genome, and only the aligned sequences are retained for analysis. Thus, reference selection is critical, yet selecting the most representative reference a priori remains difficult. We investigate an alternative pangenome approach which can combine multiple reference sequences into a graph which can be used during alignment. Using simulated short-read sequencing data generated from publicly available HBV genomes and real sequencing data from an individual living with CHB, we demonstrate alignment to a phylogenetically representative 'genome graph' can improve alignment, avoid issues of reference ambiguity, and facilitate the construction of sample-specific consensus sequences more genetically similar to the individual's infection. Graph-based methods can, therefore, improve efforts to characterize the genetics of viral pathogens, including HBV, and have broader implications in host-pathogen research.
Investigating genetic architecture of complex traits in ancestrally diverse populations is imperative to understand the etiology of disease. However, the current paucity of genetic research in people of African and Latin American ancestry, Hispanic and indigenous peoples in the United States is likely to exacerbate existing health disparities for many common diseases. The Population Architecture using Genomics and Epidemiology, Phase II (PAGE II), Study was initiated in 2013 by the National Human Genome Research Institute to expand our understanding of complex trait loci in ethnically diverse and well characterized study populations. To meet this goal, the Multi-Ethnic Genotyping Array (MEGA) was designed to substantially improve fine-mapping and functional discovery by increasing variant coverage across multiple ethnicities at known loci for metabolic, cardiovascular, renal, inflammatory, anthropometric, and a variety of lifestyle traits. Studying the frequency distribution of clinically relevant mutations, putative risk alleles, and known functional variants across multiple populations will provide important insight into the genetic architecture of complex diseases and facilitate the discovery of novel, sometimes population-specific, disease associations. DNA samples from 51,650 self-identified African ancestry (17,328), Hispanic/Latino (22,379), Asian/Pacific Islander (8,640), and American Indian (653) and an additional 2,650 participants of either South Asian or European ancestry, and other reference panels have been genotyped on MEGA by PAGE II. MEGA was designed as a new resource for studying ancestrally diverse populations. Here, we describe the methodology for selecting trait-specific content for use in multi-ethnic populations and how enriching MEGA for this content may contribute to deeper biological understanding of the genetic etiology of complex disease.
Polygenic risk scores (PRSs) aggregate the many small effects of alleles across the human genome to estimate the risk of a disease or disease-related trait for an individual. The potential benefits of PRSs include cost-effective enhancement of primary disease prevention, more refined diagnoses and improved precision when prescribing medicines. However, these must be weighed against the potential risks, such as uncertainties and biases in PRS performance, as well as potential misunderstanding and misuse of these within medical practice and in wider society. By addressing key issues including gaps in best practices, risk communication and regulatory frameworks, PRSs can be used responsibly to improve human health. Here, the International Common Disease Alliance's PRS Task Force, a multidisciplinary group comprising expertise in genetics, law, ethics, behavioral science and more, highlights recent research to provide a comprehensive summary of the state of polygenic score research, as well as the needs and challenges as PRSs move closer to widespread use in the clinic.
Polygenic risk scores (PRSs), which often aggregate results from genome-wide association studies, can bridge the gap between initial discovery efforts and clinical applications for the estimation of disease risk using genetics. However, there is notable heterogeneity in the application and reporting of these risk scores, which hinders the translation of PRSs into clinical care. Here, in a collaboration between the Clinical Genome Resource (ClinGen) Complex Disease Working Group and the Polygenic Score (PGS) Catalog, we present the Polygenic Risk Score Reporting Standards (PRS-RS), in which we update the Genetic Risk Prediction Studies (GRIPS) Statement to reflect the present state of the field. Drawing on the input of experts in epidemiology, statistics, disease-specific applications, implementation and policy, this comprehensive reporting framework defines the minimal information that is needed to interpret and evaluate PRSs, especially with respect to downstream clinical applications. Items span detailed descriptions of study populations, statistical methods for the development and validation of PRSs and considerations for the potential limitations of these scores. In addition, we emphasize the need for data availability and transparency, and we encourage researchers to deposit and share PRSs through the PGS Catalog to facilitate reproducibility and comparative benchmarking. By providing these criteria in a structured format that builds on existing standards and ontologies, the use of this framework in publishing PRSs will facilitate translation into clinical care and progress towards defining best practice.
Genome-wide association studies (GWASs) have been performed to identify host genetic factors for a range of phenotypes, including for infectious diseases. The use of population-based common control subjects from biobanks and extensive consortia is a valuable resource to increase sample sizes in the identification of associated loci with minimal additional expense. Non-differential misclassification of the outcome has been reported when the control subjects are not well characterized, which often attenuates the true effect size. However, for infectious diseases the comparison of affected subjects to population-based common control subjects regardless of pathogen exposure can also result in selection bias. Through simulated comparisons of pathogen-exposed cases and population-based common control subjects, we demonstrate that not accounting for pathogen exposure can result in biased effect estimates and spurious genome-wide significant signals. Further, the observed association can be distorted depending upon strength of the association between a locus and pathogen exposure and the prevalence of pathogen exposure. We also used a real data example from the hepatitis C virus (HCV) genetic consortium comparing HCV spontaneous clearance to persistent infection with both well-characterized control subjects and population-based common control subjects from the UK Biobank. We find biased effect estimates for known HCV clearance-associated loci and potentially spurious HCV clearance associations. These findings suggest that the choice of control subjects is especially important for infectious diseases or outcomes that are conditional upon environmental exposures.
Pathogen exposure is a necessary but insufficient cause of infectious disease. Through simulation and empirical genome-wide association comparisons, Duchen et al. show that ignoring pathogen exposure can bias genetic associations. Control selection is important to accurately characterize the genetics underlying outcomes conditional upon environmental exposures, including infectious diseases.
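The control-selection point can be illustrated with a small expected-value calculation. This is a sketch under stated assumptions, not the simulation design of Duchen et al.: it assumes a binary carrier status, a variant that raises the probability of pathogen exposure but has no effect on the outcome among exposed people, and population controls that simply mirror the general population. Every number and function name is hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def carrier_freq_among_exposed(carrier_freq, p_exp_carrier, p_exp_noncarrier):
    """Carrier frequency among pathogen-exposed people (Bayes' rule)."""
    exposed_carriers = carrier_freq * p_exp_carrier
    exposed_noncarriers = (1 - carrier_freq) * p_exp_noncarrier
    return exposed_carriers / (exposed_carriers + exposed_noncarriers)


# Hypothetical inputs: the variant is tied to exposure, not to the outcome.
carrier_freq = 0.3        # carrier frequency in the general population
p_exp_carrier = 0.4       # probability of exposure for carriers
p_exp_noncarrier = 0.2    # probability of exposure for non-carriers

# With no effect on the outcome, exposed cases and exposed controls
# share the same expected carrier frequency.
freq_cases = carrier_freq_among_exposed(carrier_freq, p_exp_carrier, p_exp_noncarrier)
freq_exposed_controls = freq_cases

# Assumption: population controls have the general-population frequency.
freq_population_controls = carrier_freq

or_exposed_controls = odds(freq_cases) / odds(freq_exposed_controls)
or_population_controls = odds(freq_cases) / odds(freq_population_controls)

print(f"OR vs exposed controls:    {or_exposed_controls:.2f}")
print(f"OR vs population controls: {or_population_controls:.2f}")
```

Under these assumed inputs the comparison against exposed controls gives an odds ratio of 1, while the comparison against population controls gives an odds ratio of about 2, even though the variant has no effect on the outcome itself. The spurious signal comes entirely from the variant's association with exposure.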