Data archiving and distribution are essential to scientific rigor and reproducibility of research. The National Center for Biotechnology Information's Database of Genotypes and Phenotypes (dbGaP) is a public repository for scientific data sharing. To support curation of thousands of complex data sets, dbGaP has detailed submission instructions that investigators must follow when archiving their data.
We developed dbGaPCheckup, an R package which implements a series of check, awareness, reporting, and utility functions to support data integrity and proper formatting of the subject phenotype data set and data dictionary prior to dbGaP submission. For example, as a tool, dbGaPCheckup ensures that the data dictionary contains all fields required by dbGaP, and additional fields required by dbGaPCheckup; the number and names of variables match between the data set and data dictionary; there are no duplicated variable names or descriptions; observed data values are not more extreme than the logical minimum and maximum values stated in the data dictionary; and more. The package also includes functions that implement a series of minor/scalable fixes when errors are detected (e.g., a function to reorder the variables in the data dictionary to match the order listed in the data set). Finally, we also include reporting functions that produce graphical and textual descriptives of the data to further reduce the likelihood of data integrity issues. The dbGaPCheckup R package is available on CRAN ( https://CRAN.R-project.org/package=dbGaPCheckup ) and developed on GitHub ( https://github.com/lwheinsberg/dbGaPCheckup ).
dbGaPCheckup is an innovative assistive and timesaving tool that fills an important gap for researchers by making dbGaP submission of large and complex data sets less error prone.
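The kinds of checks listed above can be illustrated with a minimal Python sketch. This is not the dbGaPCheckup API (the package itself is written in R); the function name `check_dictionary`, the in-memory layout, and the column names `VARNAME`, `VARDESC`, `MIN`, and `MAX` are assumptions made for illustration only.

```python
def check_dictionary(data_rows, dictionary):
    """Return a list of problems found; an empty list means every check passed.

    data_rows:  list of dicts, one per subject (hypothetical layout).
    dictionary: list of dicts with keys VARNAME, VARDESC, MIN, MAX
                (MIN and MAX may be None when no logical limit is stated).
    """
    problems = []
    data_names = list(data_rows[0].keys()) if data_rows else []
    dict_names = [entry["VARNAME"] for entry in dictionary]

    # The number and names of variables must match between the two files.
    if set(data_names) != set(dict_names):
        mismatched = sorted(set(data_names) ^ set(dict_names))
        problems.append(f"variables present in only one file: {mismatched}")
    elif data_names != dict_names:
        problems.append("variable order differs between data set and dictionary")

    # Variable names and descriptions must not be duplicated.
    for field in ("VARNAME", "VARDESC"):
        values = [entry[field] for entry in dictionary]
        dupes = sorted({v for v in values if values.count(v) > 1})
        if dupes:
            problems.append(f"duplicated {field} entries: {dupes}")

    # Observed values must lie within the stated logical minimum and maximum.
    for entry in dictionary:
        name, low, high = entry["VARNAME"], entry.get("MIN"), entry.get("MAX")
        for row in data_rows:
            value = row.get(name)
            if not isinstance(value, (int, float)):
                continue  # skip missing or non-numeric values
            if (low is not None and value < low) or (high is not None and value > high):
                problems.append(f"{name}: value {value} outside [{low}, {high}]")
    return problems


if __name__ == "__main__":
    data = [{"SUBJECT_ID": 1, "AGE": 34}, {"SUBJECT_ID": 2, "AGE": 130}]
    dd = [
        {"VARNAME": "SUBJECT_ID", "VARDESC": "Subject identifier", "MIN": None, "MAX": None},
        {"VARNAME": "AGE", "VARDESC": "Age in years", "MIN": 0, "MAX": 120},
    ]
    for problem in check_dictionary(data, dd):
        print(problem)
```

Returning a list of problems rather than stopping at the first failure mirrors the idea of a checkup report: the investigator sees every issue in one pass.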
Samoans are a unique founder population with a high prevalence of obesity, making them well suited for identifying new genetic contributors to obesity. We conducted a genome-wide association study (GWAS) in 3,072 Samoans, discovered a variant, rs12513649, strongly associated with body mass index (BMI) (P = 5.3 × 10^-14), and replicated the association in 2,102 additional Samoans (P = 1.2 × 10^-9). Targeted sequencing identified a strongly associated missense variant, rs373863828 (p.Arg457Gln), in CREBRF (meta P = 1.4 × 10^-20). Although this variant is extremely rare in other populations, it is common in Samoans (frequency of 0.259), with an effect size much larger than that of any other known common BMI risk variant (1.36-1.45 kg/m^2 per copy of the risk-associated allele). In comparison to wild-type CREBRF, the Arg457Gln variant when overexpressed selectively decreased energy use and increased fat storage in an adipocyte cell model. These data, in combination with evidence of positive selection of the allele encoding p.Arg457Gln, support a 'thrifty' variant hypothesis as a factor in human obesity.
Post hoc power is not informative. Heinsberg, Lacey W.; Weeks, Daniel E. Genetic Epidemiology, October 2022, Volume 46, Issue 7.
Post hoc power estimates are often requested by reviewers and/or performed by researchers after a study has been conducted. The purpose of this commentary is to provide a heuristic explanation of why post hoc power should not be used. To illustrate our point, we provide a detailed simulation study of two essentially identical research experiments hypothetically conducted in parallel at two separate universities. The simulation demonstrates that post hoc power calculations are misleading and simply not informative for data interpretation. As such, we encourage authors and peer‐reviewers to avoid using or requesting post hoc power calculations.
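The argument can be sketched with a small simulation. This is not the authors' simulation: it assumes a two-sample z-test with known standard deviation 1, and the sample size, true effect, seed, and function names are all hypothetical. It illustrates why "observed" power adds nothing: once the significance level is fixed, it is computed from the same test statistic as the P value.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()


def z_statistic(group_a, group_b, sd=1.0):
    """Two-sample z statistic, assuming equal group sizes and a known SD."""
    n = len(group_a)
    return (mean(group_a) - mean(group_b)) / (sd * (2.0 / n) ** 0.5)


def two_sided_p(z):
    """Two-sided P value for a z statistic."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def observed_power(z, alpha=0.05):
    """Post hoc power: power computed as if the observed effect were the true one."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.cdf(abs(z) - z_crit) + STD_NORMAL.cdf(-abs(z) - z_crit)


def run_site(rng, n=50, true_effect=0.4):
    """Simulate one site's experiment; both sites share the same true effect."""
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = z_statistic(treated, control)
    return two_sided_p(z), observed_power(z)


if __name__ == "__main__":
    rng = random.Random(2022)  # arbitrary seed, for repeatability only
    for site in ("University A", "University B"):
        p_value, power = run_site(rng)
        print(f"{site}: P = {p_value:.3f}, post hoc power = {power:.2f}")
```

Under these assumptions, a result with P exactly equal to 0.05 always has post hoc power of about 0.5, whatever the true power of the design. The two sites can therefore report different post hoc power from identical designs purely through sampling variation.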
Globally, autosomal recessive IFNAR1 deficiency is a rare inborn error of immunity underlying susceptibility to live attenuated vaccine and wild-type viruses. We report seven children from five unrelated kindreds of western Polynesian ancestry who suffered from severe viral diseases. All the patients are homozygous for the same nonsense IFNAR1 variant (p.Glu386*). This allele encodes a truncated protein that is absent from the cell surface and is loss-of-function. The fibroblasts of the patients do not respond to type I IFNs (IFN-α2, IFN-ω, or IFN-β). Remarkably, this IFNAR1 variant has a minor allele frequency >1% in Samoa and is also observed in the Cook, Society, Marquesas, and Austral islands, as well as Fiji, whereas it is extremely rare or absent in the other populations tested, including those of the Pacific region. Inherited IFNAR1 deficiency should be considered in individuals of Polynesian ancestry with severe viral illnesses.
Food allergy (FA) affects 2%-10% of US children and is a growing clinical and public health problem. Here we conduct the first genome-wide association study of well-defined FA, including specific subtypes (peanut, milk and egg), in 2,759 US participants (1,315 children and 1,444 parents) from the Chicago Food Allergy Study, and identify peanut allergy (PA)-specific loci in the HLA-DR and -DQ gene region at 6p21.32, tagged by rs7192 (P = 5.5 × 10^-8) and rs9275596 (P = 6.8 × 10^-10), in 2,197 participants of European ancestry. We replicate these associations in an independent sample of European ancestry. These associations are further supported by meta-analyses across the discovery and replication samples. Both single-nucleotide polymorphisms (SNPs) are associated with differential DNA methylation levels at multiple CpG sites (P < 5 × 10^-8), and differential DNA methylation of the HLA-DQB1 and HLA-DRB1 genes partially mediates the identified SNP-PA associations. This study suggests that the HLA-DR and -DQ gene region probably poses significant genetic risk for PA.
Recent successful discoveries of potentially causal single nucleotide polymorphisms (SNPs) for complex diseases hold great promise, and commercialization of genomics in personalized medicine has already begun. The hope is that genetic testing will benefit patients and their families, and encourage positive lifestyle changes and guide clinical decisions. However, for many complex diseases, it is arguable whether the era of genomics in personalized medicine is here yet. We focus on the clinical validity of genetic testing with an emphasis on two popular statistical methods for evaluating markers. The two methods, logistic regression and receiver operating characteristic (ROC) curve analysis, are applied to our age-related macular degeneration dataset. By using an additive model of the CFH, LOC387715, and C2 variants, the odds ratios are 2.9, 3.4, and 0.4, with p-values of 10^-13, 10^-13, and 10^-3, respectively. The area under the ROC curve (AUC) is 0.79, but assuming prevalences of 15%, 5.5%, and 1.5% (which are realistic for age groups 80 y, 65 y, and 40 y and older, respectively), only 30%, 12%, and 3% of the group classified as high risk are cases. Additionally, we present examples for four other diseases for which strongly associated variants have been discovered. In type 2 diabetes, our classification model of 12 SNPs has an AUC of only 0.64, and two SNPs achieve an AUC of only 0.56 for prostate cancer. Nine SNPs were not sufficient to improve the discrimination power over that of nongenetic predictors for risk of cardiovascular events. Finally, in Crohn's disease, a model of five SNPs, one with a quite low odds ratio of 0.26, has an AUC of only 0.66. Our analyses and examples show that strong association, although very valuable for establishing etiological hypotheses, does not guarantee effective discrimination between cases and controls.
The scientific community should be cautious to avoid overstating the value of association findings in terms of personalized medicine before their time.
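The dependence of the "fraction of high-risk individuals who are cases" on prevalence follows from Bayes' rule, and a short Python sketch makes it concrete. The sensitivity and specificity below are hypothetical values chosen for illustration, not the operating point used in the study, so the printed fractions are not expected to reproduce the figures quoted above; only the three prevalences are taken from the abstract.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of test-positive (high-risk) individuals who are true cases."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical classifier operating point (assumption, not from the study).
    sensitivity, specificity = 0.80, 0.70
    # Prevalences quoted in the abstract for three age groups.
    for prevalence in (0.15, 0.055, 0.015):
        ppv = positive_predictive_value(sensitivity, specificity, prevalence)
        print(f"prevalence {prevalence:.1%}: {ppv:.1%} of the high-risk group are cases")
```

With the classifier held fixed, the fraction of true cases in the high-risk group shrinks as prevalence falls, which is why a strongly associated marker can still discriminate poorly in a general population.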
Family- and population-based genetic studies have successfully identified multiple disease-susceptibility loci for age-related macular degeneration (AMD), one of the earliest and most successful applications of genome-wide association studies. However, most genetic studies to date have focused on case-control studies of late AMD (choroidal neovascularization or geographic atrophy). The genetic influences on disease progression are largely unexplored. We assembled unique resources to perform a genome-wide bivariate time-to-event analysis to test for association of time-to-late-AMD with ∼9 million variants in 2721 Caucasians from a large multi-center randomized clinical trial, the Age-Related Eye Disease Study. To our knowledge, this is the first genome-wide association study of disease progression (bivariate survival outcome) in AMD genetic studies, thus providing novel insights into AMD genetics. We used a robust Cox proportional hazards model to appropriately account for between-eye correlation when analyzing the progression time in the two eyes of each participant. We identified four previously reported susceptibility loci showing genome-wide significant association with AMD progression: ARMS2-HTRA1 (P = 8.1 × 10^-43), CFH (P = 3.5 × 10^-37), C2-CFB-SKIV2L (P = 8.1 × 10^-10) and C3 (P = 1.2 × 10^-9). Furthermore, we detected association of rs58978565 near TNR (P = 2.3 × 10^-8), rs28368872 near ATF7IP2 (P = 2.9 × 10^-8) and rs142450006 near MMP9 (P = 0.0006) with progression to choroidal neovascularization but not geographic atrophy. Secondary analysis limited to 34 reported risk variants revealed that LIPC and CTRB2-CTRB1 were also associated with AMD progression (P < 0.0015). Our genome-wide analysis thus expands the genetics of both development and progression of AMD and should assist in early identification of high-risk individuals.
Age-related macular degeneration (AMD) is a common cause of blindness in older individuals. To accelerate the understanding of AMD biology and help design new therapies, we executed a collaborative genome-wide association study, including >17,100 advanced AMD cases and >60,000 controls of European and Asian ancestry. We identified 19 loci associated at P < 5 × 10^-8. These loci show enrichment for genes involved in the regulation of complement activity, lipid metabolism, extracellular matrix remodeling and angiogenesis. Our results include seven loci with associations reaching P < 5 × 10^-8 for the first time, near the genes COL8A1-FILIP1L, IER3-DDR1, SLC16A8, TGFBR1, RAD51B, ADAMTS9 and B3GALTL. A genetic risk score combining SNP genotypes from all loci showed similar ability to distinguish cases and controls in all samples examined. Our findings provide new directions for biological, genetic and therapeutic studies of AMD.
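One common way to combine SNP genotypes into a genetic risk score is a weighted sum of risk-allele counts. The abstract does not state how its score was constructed, so the sketch below is an assumption; the SNP labels and weights are hypothetical placeholders rather than values from the study.

```python
def genetic_risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per SNP).

    allele_counts: dict mapping SNP label -> number of risk alleles carried.
    weights:       dict mapping SNP label -> per-allele weight, e.g. a log odds ratio.
    """
    if allele_counts.keys() != weights.keys():
        raise ValueError("genotypes and weights must cover the same SNPs")
    return sum(weights[snp] * allele_counts[snp] for snp in weights)


if __name__ == "__main__":
    # Hypothetical per-allele weights; a negative weight marks a protective allele.
    weights = {"snp1": 0.5, "snp2": 0.3, "snp3": -0.2}
    person = {"snp1": 2, "snp2": 0, "snp3": 1}
    print(f"risk score: {genetic_risk_score(person, weights):.2f}")
```

Scores computed this way for cases and controls can then be compared, for example with an ROC curve, to judge how well the score separates the two groups.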
The A allele of rs373863828 in CREB3 regulatory factor is associated with high body mass index (BMI), but lower odds of type 2 diabetes. These associations have been replicated elsewhere, but to date all studies have been cross-sectional. Our aims were (1) to describe the development of type 2 diabetes and change in fasting glucose between 2010 and 2018 among a longitudinal cohort of adult Samoans without type 2 diabetes or who were not using diabetes medications at baseline, and (2) to examine associations between fasting glucose rate-of-change (mmol/L per year) and the A allele of rs373863828. We describe and test differences in fasting glucose, the development of type 2 diabetes, body mass index, age, smoking status, physical activity, urbanicity of residence, and household asset scores between 2010 and 2018 among a cohort of n = 401 adult Samoans, selected to have a ~2:2:1 ratio of GG:AG:AA rs373863828 genotypes. Multivariate linear regression was used to test whether fasting glucose rate-of-change was associated with rs373863828 genotype and other baseline variables. By 2018, fasting glucose and BMI had significantly increased among all genotype groups, and a substantial portion of the sample had developed type 2 diabetes mellitus. The A allele was associated with a lower fasting glucose rate-of-change (beta = -0.05 mmol/L/year per allele, p = 0.058 among women; beta = -0.004 mmol/L/year per allele, p = 0.863 among men), after accounting for baseline variables. Mean fasting glucose and mean BMI increased over an eight-year period and a substantial number of individuals developed type 2 diabetes by 2018. However, fasting glucose rate-of-change and type 2 diabetes development were lower among females with AG and AA genotypes. Further research is needed to understand the effect of the A allele on fasting glucose and type 2 diabetes development.
Based on our observations that other risk factors increased over time, we advocate for the continued promotion of diabetes prevention and treatment programs, and the reduction of modifiable risk factors, in this setting.
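The outcome analyzed in this study, fasting glucose rate-of-change per year, and a per-allele effect can be sketched as follows. The study used multivariate regression adjusted for baseline variables; this sketch is a deliberately simplified, unadjusted least-squares slope, and the function names and toy numbers are hypothetical, not study data.

```python
def rate_of_change(baseline, follow_up, years):
    """Change in fasting glucose per year (mmol/L per year)."""
    return (follow_up - baseline) / years


def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y on x (unadjusted)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


if __name__ == "__main__":
    # Toy data: copies of the A allele (GG = 0, AG = 1, AA = 2) and glucose
    # measured 8 years apart. All values are invented for illustration.
    a_allele_copies = [0, 0, 1, 1, 2, 2]
    glucose_2010 = [5.2, 5.0, 5.1, 5.3, 5.0, 5.2]
    glucose_2018 = [6.1, 5.8, 5.6, 5.9, 5.3, 5.4]
    rates = [rate_of_change(b, f, 8) for b, f in zip(glucose_2010, glucose_2018)]
    beta = least_squares_slope(a_allele_copies, rates)
    print(f"per-allele change in rate: {beta:.3f} mmol/L/year")
```

A negative slope means each additional copy of the allele is associated with a slower rise in fasting glucose, which is how a per-allele beta in mmol/L/year is read.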
Age-related macular degeneration (AMD) is a multifactorial neurodegenerative disease and a leading cause of vision loss among the elderly in developed countries. AMD is one of the most successful applications of genome-wide association studies (GWAS): a large number of genetic studies have explored the genetic basis of AMD and its progression, and over 30 loci have been identified and confirmed. In this chapter, we review recent developments and findings of GWAS for AMD risk and progression. Then, we present emerging methods and models for predicting AMD development or its progression using large-scale genetic data. Finally, we discuss a set of novel statistical and analytical methods that were recently developed to tackle challenges such as analyzing bilateral, correlated eye-level outcomes that are subject to censoring together with high-dimensional genetic data. Future directions for analytical studies of AMD genetics are also proposed.