Genome-wide association studies (GWAS) produce a wealth of data. However, a substantial portion of the genetic heritability for complex diseases is not explained by the most highly associated markers. Researchers have recently demonstrated that they can explain a much larger proportion of the genetic variation by delving more deeply into the data. For instance, Yang et al. showed that approximately 50% of the heritability in human height is explained by about three hundred thousand markers. To extract this information, researchers are moving to more complex analyses that model the relationships between a trait and two or more genes. These complex analyses often use gene regions instead of markers as the unit of measure. We call this gene region analysis; determining how to represent each region is often an obstacle. Here, we lay the foundation for evaluating summary methods used in complex gene-based analyses by exploring three aspects of gene region analysis: (1) simulating a gene region, (2) adjusting for multiple testing, and (3) detecting association to a gene region using summary methods. We first compare simulation methods and find that the software program Hapgen produces replicates that give adequate sampling variability while retaining the unique characteristics of the gene region used for simulation. We then evaluate methods to adjust for multiple comparisons within a gene region. We find that extreme tail theory performs well but is computationally expensive, whereas Li & Ji's effective number of independent SNPs method is computationally efficient but does not always maintain the appropriate type I error rate. Finally, we find that using the marker with the lowest p-value to summarize a gene region often has the highest power for regions with moderate to high correlation, while a summary method based on BIC forward selection performs better in regions with low correlation.
These findings will help researchers design simulation studies to explore the performance of gene region summary measures in complex analyses, to adjust for multiple comparisons when testing markers in a gene region, and to use gene region summary measures to detect association between a region and a trait.
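As an illustration of the effective number of independent SNPs adjustment mentioned above, the following is a minimal sketch of the commonly cited form of Li & Ji's estimator, in which each eigenvalue of the SNP correlation matrix contributes its indicator of being at least 1 plus its fractional part, followed by a Šidák-style per-marker threshold. The function names, the use of a precomputed correlation matrix as input, and the default alpha are assumptions made for this example and are not taken from the study itself.

```python
import numpy as np

def li_ji_meff(corr):
    """Effective number of independent tests from a SNP correlation matrix.

    Assumed form: Meff = sum_i [ I(|lam_i| >= 1) + (|lam_i| - floor(|lam_i|)) ],
    where lam_i are the eigenvalues of the (symmetric) correlation matrix.
    """
    eig = np.abs(np.linalg.eigvalsh(np.asarray(corr, dtype=float)))
    whole = (eig >= 1).astype(float)      # one full test per eigenvalue >= 1
    fractional = eig - np.floor(eig)      # plus the fractional part
    return float(np.sum(whole + fractional))

def sidak_threshold(meff, alpha=0.05):
    """Per-marker significance threshold for meff effective tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / meff)

# Hypothetical example: two markers with pairwise correlation 0.5.
example_corr = np.array([[1.0, 0.5],
                         [0.5, 1.0]])
example_meff = li_ji_meff(example_corr)
example_alpha = sidak_threshold(example_meff)
```

Because the estimator needs only one eigendecomposition per region, it is cheap to compute, which is consistent with the efficiency noted above; the sketch says nothing about how well the resulting threshold controls the type I error rate in any particular region.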
We sought to find significant gene × gene interaction in a genome-wide association analysis of rheumatoid arthritis (RA) by performing pairwise tests of interaction among collections of single-nucleotide polymorphisms (SNPs) obtained by one of two methods. The first method involved screening the results of the genome-wide association analysis for SNPs with main-effect p-values < 1 × 10^-4. The second method used biological databases such as the Gene Ontology and the Kyoto Encyclopedia of Genes and Genomes to define gene collections that each contained one of four genes with known associations with RA: PTPN22, STAT4, TRAF1, and C5. We used a permutation approach to determine whether any of these SNP sets had empirical enrichment of significant interaction effects. We found that the SNP set obtained by the first method was significantly enriched with significant interaction effects (empirical p = 0.003). Additionally, we found that the "protein complex assembly" collection of genes from the Gene Ontology that contains the TRAF1 gene was significantly enriched with interaction effects with p-values < 1 × 10^-8 (empirical p = 0.012).
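The permutation approach is not described in detail here, so the following is only a generic sketch of how an empirical enrichment p-value of this kind is often computed: count the interaction tests in a SNP set that fall below a significance threshold, then compare that count with the counts obtained from permuted data. The function name, the input layout (one row of interaction p-values per permutation), the default threshold, and the add-one correction are all assumptions made for illustration, not the procedure used in the study.

```python
import numpy as np

def empirical_enrichment_p(observed_pvals, permuted_pvals, threshold=1e-4):
    """Empirical p-value for enrichment of significant interaction tests.

    observed_pvals: 1-D array of interaction p-values for one SNP set.
    permuted_pvals: 2-D array, one row of interaction p-values per
                    permutation (hypothetical layout).
    threshold:      cutoff defining a "significant" interaction (assumed).
    """
    observed_count = int(np.sum(np.asarray(observed_pvals) < threshold))
    # Number of significant interactions in each permuted data set.
    null_counts = np.sum(np.asarray(permuted_pvals) < threshold, axis=1)
    # Permutations with at least as many significant tests as observed.
    exceed = int(np.sum(null_counts >= observed_count))
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (len(null_counts) + 1)

# Hypothetical toy input: 3 interaction tests, 9 permutations.
toy_observed = np.array([1e-6, 1e-5, 0.5])
toy_permuted = np.full((9, 3), 0.5)
toy_p = empirical_enrichment_p(toy_observed, toy_permuted)
```

With this construction the smallest attainable empirical p-value is 1 / (number of permutations + 1), so the number of permutations limits how small a reported empirical p-value can be.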