The development of analytical methods for Genome-wide Association Studies (GWAS) has outpaced the evolution of simulation techniques and pipelines. This disparity underscores the importance of innovative simulation methods that can keep pace with the rapidly increasing scale of GWAS. The median sample size of GWAS over the past ten years has exceeded 50,000 individuals, a trend that emphasizes the need for simulation tools capable of generating data on a similar or larger scale. This paper introduces a novel method, the small-group originating (SGO) model, which uses the SLiM software to simulate individual-level GWAS data. Our standardized protocol facilitates the generation of tens of thousands of pseudo-individuals with millions of variants from small (30–90) open-access datasets.
Compared with the widely used resampling method in HapGen, SGO shows superior simulation efficiency for large samples (> 13,000) of unrelated individuals. This capability is particularly relevant given the current trajectory towards larger GWAS, which calls for tools that can simulate datasets reflective of this growth. Additionally, SGO provides customization options and can model dynamic life cycles and mating across generations, positioning it as a highly promising alternative for GWAS simulations.
In a case study, we conducted sensitivity analyses of chromosome-level principal component analysis and kinship coefficient estimation. The results highlighted the poor robustness of chromosome-level quality control (QC) indexes and the uneven distribution of population structure across chromosomes and ancestries, and they argue for caution against relying solely on chromosome-level QC statistics.
With its flexible and efficient approach to generating pseudo GWAS data, our standardized SGO protocol emerges as a crucial asset for method development, power analysis, and benchmarking in GWAS research. It is especially vital in the context of accommodating the demands for large-scale simulations, aligning with the current and future scale of GWAS.
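To illustrate why a chromosome-level kinship estimate can be unstable, the sketch below computes a simple moment estimate of kinship between two individuals separately for each chromosome. It is a minimal illustration under stated assumptions, not the estimator used in the study: the function name `kinship_by_chromosome`, the toy genotypes, and the assumption that reference allele frequencies are known are all hypothetical.

```python
from collections import defaultdict


def kinship_by_chromosome(geno_i, geno_j, freqs, chroms):
    """Moment estimate of kinship between two individuals, per chromosome.

    geno_i, geno_j: allele counts (0, 1, 2) per variant.
    freqs: reference allele frequency per variant (assumed known here).
    chroms: chromosome label per variant.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for gi, gj, p, c in zip(geno_i, geno_j, freqs, chroms):
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic variants carry no information
        # Half of the standardized genotype cross-product (GRM entry / 2).
        sums[c] += (gi - 2 * p) * (gj - 2 * p) / (4 * p * (1 - p))
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


# Toy data (hypothetical): identical genotypes on chromosome 1,
# opposite homozygotes on chromosome 2.
geno_a = [2, 0, 2, 0]
geno_b = [2, 0, 0, 2]
freqs = [0.5, 0.5, 0.5, 0.5]
chroms = ["1", "1", "2", "2"]
per_chrom = kinship_by_chromosome(geno_a, geno_b, freqs, chroms)
```

With only a few variants per chromosome, the two estimates in this toy example disagree sharply, which is the kind of chromosome-to-chromosome variability a sensitivity analysis would look for.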
● Introduced the SGO model using SLiM for simulating individual-level GWAS data, addressing the need for large-scale simulations (50,000+ individuals).
● Demonstrated SGO's superior efficiency over HapGen for samples >13,000, enabling creation of many pseudo-individuals from small datasets (N=30–90).
● Showcased SGO's flexibility in modeling dynamic life cycles and mating across generations, offering a promising alternative for GWAS simulations.
● Conducted a case study on chromosome-level PCA and kinship coefficient estimation, revealing limitations in chromosome-level quality control statistics.
● Established SGO as a crucial tool for method development, power analysis, and benchmarking in GWAS research, especially for large-scale simulations.
The vast majority of genome-wide association study (GWAS) findings reported to date are from populations with European ancestry (EA), and it is not yet clear how broadly the genetic associations described will generalize to populations of diverse ancestry. The Population Architecture Using Genomics and Epidemiology (PAGE) study is a consortium of multi-ancestry, population-based studies formed with the objective of refining our understanding of the genetic architecture of common traits emerging from GWAS. In the present analysis of five common diseases and traits, including body mass index, type 2 diabetes, and lipid levels, we compare direction and magnitude of effects for GWAS-identified variants in multiple non-EA populations against EA findings. We demonstrate that, in all populations analyzed, a significant majority of GWAS-identified variants have allelic associations in the same direction as in EA, with none showing a statistically significant effect in the opposite direction, after adjustment for multiple testing. However, 25% of tagSNPs identified in EA GWAS have significantly different effect sizes in at least one non-EA population, and these differential effects were most frequent in African Americans, where all differential effects were diluted toward the null. We demonstrate that differential LD between tagSNPs and functional variants within populations contributes significantly to the diluted effect sizes in this population. Although most variants identified from GWAS in EA populations generalize to all non-EA populations assessed, genetic models derived from GWAS findings in EA may generate spurious results in non-EA populations due to differential effect sizes. Regardless of the origin of the differential effects, caution should be exercised in applying any genetic risk prediction model based on tagSNPs outside of the ancestry group in which it was derived.
Models based directly on functional variation may generalize more robustly, but the identification of functional variants remains challenging.
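The claim that a "significant majority" of variants share the EA effect direction can be checked with an exact sign test. The sketch below is one way to do that, assuming variants are independent and that a concordant direction has probability 0.5 under the null; the abstract does not state which test was used, and the function name and counts are hypothetical.

```python
from math import comb


def sign_test_p(n_same, n_total):
    """One-sided exact binomial P value for observing at least n_same
    concordant effect directions out of n_total when p = 0.5."""
    tail = sum(comb(n_total, k) for k in range(n_same, n_total + 1))
    return tail / 2 ** n_total


# Hypothetical counts: 9 of 10 variants share the reference direction.
p_value = sign_test_p(9, 10)  # equals 11 / 1024
```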
Background & Aims
Heritable factors contribute to the development of colorectal cancer. Identifying the genetic loci associated with colorectal tumor formation could elucidate the mechanisms of pathogenesis.
Methods
We conducted a genome-wide association study that included 14 studies, 12,696 cases of colorectal tumors (11,870 cancer, 826 adenoma), and 15,113 controls of European descent. The 10 most statistically significant, previously unreported findings were followed up in 6 studies; these included 3056 colorectal tumor cases (2098 cancer, 958 adenoma) and 6658 controls of European and Asian descent.
Results
Based on the combined analysis, we identified a locus that reached the conventional genome-wide significance level of P < 5.0 × 10⁻⁸: an intergenic region on chromosome 2q32.3, close to nucleic acid binding protein 1 (most significant single nucleotide polymorphism: rs11903757; odds ratio [OR], 1.15 per risk allele; P = 3.7 × 10⁻⁸). We also found evidence for 3 additional loci with P values less than 5.0 × 10⁻⁷: a locus within the laminin gamma 1 gene on chromosome 1q25.3 (rs10911251; OR, 1.10 per risk allele; P = 9.5 × 10⁻⁸), a locus within the cyclin D2 gene on chromosome 12p13.32 (rs3217810; OR, 0.84 per risk allele; P = 5.9 × 10⁻⁸), and a locus in the T-box 3 gene on chromosome 12q24.21 (rs59336; OR, 0.91 per risk allele; P = 3.7 × 10⁻⁷).
Conclusions
In a large genome-wide association study, we associated polymorphisms close to nucleic acid binding protein 1 (which encodes a DNA-binding protein involved in DNA repair) with colorectal tumor risk. We also provided evidence for an association between colorectal tumor risk and polymorphisms in laminin gamma 1 (this is the second gene in the laminin family to be associated with colorectal cancers), cyclin D2 (which encodes cyclin D2), and T-box 3 (which encodes a T-box transcription factor and is a target of Wnt signaling to β-catenin). The roles of these genes and their products in cancer pathogenesis warrant further investigation.
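When only an odds ratio and a two-sided P value are reported, an approximate standard error and confidence interval can be back-calculated. The sketch below assumes the P value comes from a Wald test on the log odds ratio; that assumption, the function name, and the 95% level are not stated in the abstract, and the result is approximate because the reported values are rounded.

```python
from math import exp, log
from statistics import NormalDist


def se_and_ci_from_or_p(odds_ratio, p_value, level=0.95):
    """Back-calculate SE of log(OR) and a Wald CI from a two-sided P value."""
    std_normal = NormalDist()
    z = abs(std_normal.inv_cdf(p_value / 2))  # |z| implied by the P value
    beta = log(odds_ratio)
    se = abs(beta) / z
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    return se, exp(beta - z_crit * se), exp(beta + z_crit * se)


# Values reported for rs11903757: OR 1.15 per risk allele, P = 3.7e-8.
se, lower, upper = se_and_ci_from_or_p(1.15, 3.7e-8)
```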
Genome-wide association studies (GWASs) have associated approximately 50 loci with risk of colorectal cancer (CRC); nearly one third of these loci were initially associated with CRC in studies conducted in East Asian populations. We conducted a GWAS of East Asians to identify CRC risk loci and evaluate the generalizability of findings from GWASs of European populations to Asian populations.
We analyzed genetic data from 22,775 patients with CRC (cases) and 47,731 individuals without cancer (controls) from 14 studies in the Asia Colorectal Cancer Consortium. First, we performed a meta-analysis of 7 GWASs (10,625 cases and 34,595 controls) and identified 46,554 promising risk variants for replication by adding them to the Multi-Ethnic Global Array (MEGA) for genotype analysis in 6445 cases and 7175 controls. These data were analyzed, along with data from an additional 5705 cases and 5961 controls genotyped using the OncoArray. We also obtained data from 57,976 cases and 67,242 controls of European descent. Variants at identified risk loci were functionally annotated and evaluated in correlation with gene expression levels.
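Per-variant results from several GWASs are commonly pooled by inverse-variance weighting. The sketch below shows that calculation for a single variant. It assumes a fixed-effect model, which the abstract does not specify, and the function name and input values are hypothetical.

```python
from math import sqrt


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled_beta, pooled_se


# Hypothetical log odds ratios for one variant from two studies
# with equal precision.
beta, se = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
```

With equal standard errors the pooled estimate is the simple average of the two effects, and the pooled standard error shrinks by a factor of √2.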
A meta-analysis of all samples from people of Asian descent identified 13 loci and 1 new variant at a known locus (10q24.2) associated with risk of CRC at the genome-wide significance level of P < 5 × 10⁻⁸. We did not perform experiments to replicate these associations in additional individuals of Asian ancestry. However, the lead risk variant in 6 of these loci was also significantly associated with risk of CRC in European descendants. A strong association (44%–75% increase in risk per allele) was found for 2 low-frequency variants: rs201395236 at 1q44 (minor allele frequency, 1.34%) and rs77969132 at 12p11.21 (minor allele frequency, 1.53%). For 8 of the 13 associated loci, the variants with the highest levels of significant association were located inside or near the protein-coding genes L1TD1, EFCAB2, PPP1R21, SLCO2A1, HLA-G, NOTCH4, DENND5B, and GNAS. For other intergenic loci, we provided evidence for the possible involvement of the genes ALDH7A1, PRICKLE1, KLF5, WWOX, and GLP2R. We replicated findings for 41 of 52 previously reported risk loci.
We showed that most of the risk loci previously associated with CRC risk in individuals of European descent were also associated with CRC risk in East Asians. Furthermore, we identified 13 loci significantly associated with risk for CRC in Asians. Many of these loci contained genes that regulate the immune response, Wnt signaling to β-catenin, prostaglandin E2 catabolism, and cell pluripotency and proliferation. Further analysis of these genes and their variants is warranted, particularly for the 8 loci for which the lead CRC risk variants were not replicated in persons of European descent.
Background & Aims
Known genetic factors explain only a small fraction of genetic variation in colorectal cancer (CRC). We conducted a genome-wide association study to identify risk loci for CRC.
Methods
The discovery stage included 8027 cases and 22,577 controls of East-Asian ancestry. Promising variants were evaluated in studies including as many as 11,044 cases and 12,047 controls. Tumor-adjacent normal tissues from 188 patients were analyzed to evaluate correlations of risk variants with expression levels of nearby genes. Potential functionality of risk variants was evaluated using public genomic and epigenomic databases.
Results
We identified 4 loci associated with CRC risk; P values for the most significant variant in each locus ranged from 3.92 × 10⁻⁸ to 1.24 × 10⁻¹²: 6p21.1 (rs4711689), 8q23.3 (rs2450115, rs6469656), 10q24.3 (rs4919687), and 12p13.3 (rs11064437). We also identified 2 risk variants at loci previously associated with CRC: 10q25.2 (rs10506868) and 20q13.3 (rs6061231). These risk variants, conferring an approximate 10%–18% increase in risk per allele, are located either inside or near protein-coding genes that include TFEB (lysosome biogenesis and autophagy), eukaryotic translation initiation factor 3, subunit H (initiation of translation), cytochrome P450, family 17, subfamily A, polypeptide 1 (steroidogenesis), splA/ryanodine receptor domain and SOCS box containing 2 (proteasome degradation), and RPS21 (ribosome biogenesis). Gene expression analyses showed a significant association (P < .05) for rs4711689 with TFEB, rs6469656 with eukaryotic translation initiation factor 3, subunit H, rs11064437 with splA/ryanodine receptor domain and SOCS box containing 2, and rs6061231 with RPS21.
Conclusions
We identified susceptibility loci and genes associated with CRC risk, linking CRC predisposition to steroid hormone, protein synthesis and degradation, and autophagy pathways and providing added insight into the mechanism of CRC pathogenesis.
The number of Endometrial Carcinoma (EC) diagnoses is projected to increase substantially in coming decades. Although most ECs have a favorable prognosis, the aggressive, non-endometrioid subtypes are disproportionately concentrated in Black women and spread rapidly, making treatment difficult and resulting in poor outcomes. Therefore, this study offers an exploratory spatial epidemiological investigation of EC patients within a U.S.-based health system's institutional cancer registry (n = 1748) to search for and study geographic patterns. Clinical, demographic, and geographic characteristics were compared by histotype using chi-square tests for categorical variables and t-tests for continuous variables. Multivariable logistic regression evaluated the impact of risk factors on these histotypes. Cox proportional hazard models measured risks of overall and cancer-specific death. Cluster detection indicated that patients with the EC non-endometrioid histotypes exhibit geographic clustering in their home addresses, such that congregate buildings can be identified for targeted outreach. Furthermore, living in a high social vulnerability area was independently associated with non-endometrioid histotypes, whether social vulnerability was modeled as a continuous or a categorical variable. This study provides a methodological framework for early, geographically targeted intervention; social vulnerability associations require further investigation. We have begun to fill the knowledge gap of geography in gynecologic cancers, and geographic clustering of aggressive tumors may enable targeted intervention to improve prognoses.
Purpose
Previous literature shows that more bladder cancer patients overall die from causes other than the primary malignancy. Given known disparities in bladder cancer outcomes by race and sex, we aimed to characterize differences in cause-specific mortality for bladder cancer patients by these demographics.
Methods
We identified 215,252 patients diagnosed with bladder cancer from 2000 to 2017 in the SEER 18 database. We calculated cumulative incidence of death from seven causes (bladder cancer, COPD, diabetes, heart disease, external, other cancer, other) to assess differences in cause-specific mortality between race and sex subgroups. We used multivariable Cox proportional hazards regression and Fine-Gray competing risk models to compare risk of bladder cancer-specific mortality between race and sex subgroups overall and stratified by cancer stage.
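Cumulative incidence with competing causes of death can be estimated nonparametrically, as in the sketch below. It is a minimal illustration of the standard estimator (each cause-specific increment is weighted by all-cause event-free survival just before that time), not the study's implementation; the function name, cause codes, and toy follow-up data are hypothetical.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence for one cause.

    times: follow-up time per patient.
    causes: 0 = censored, any other value = cause-of-death code.
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0  # all-cause event-free probability just before time t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        while j < len(data) and data[j][0] == t:
            j += 1  # collect all patients tied at time t
        tied = [c for _, c in data[i:j]]
        d_all = sum(1 for c in tied if c != 0)
        d_k = sum(1 for c in tied if c == cause_of_interest)
        if d_all > 0:
            cif += surv * d_k / n_at_risk
            surv *= 1.0 - d_all / n_at_risk
            curve.append((t, cif))
        n_at_risk -= len(tied)
        i = j
    return curve


# Toy data (hypothetical): 1 = bladder cancer death, 2 = other cause,
# 0 = censored (alive at last follow-up).
curve = cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1], cause_of_interest=1)
```

Treating deaths from other causes as censoring instead would overstate the cumulative incidence, which is why a competing-risks estimator is used here.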
Results
17% of patients died from bladder cancer (n = 36,923), 30% died from other causes (n = 65,076), and 53% were alive (n = 113,253). Among those who died, the most common cause of death was bladder cancer, followed by other cancer and diseases of the heart. All race-sex subgroups were more likely than white men to die from bladder cancer. Compared to white men, white women (HR: 1.20, 95% CI: 1.17–1.23) and Black women (HR: 1.57, 95% CI: 1.49–1.66) had a higher risk of dying from bladder cancer, overall and stratified by stage.
Conclusion
Among bladder cancer patients, death from other causes, especially other cancer and heart disease, contributed a large proportion of mortality. We found differences in cause-specific mortality by race-sex subgroups, with Black women having a particularly high risk of dying from bladder cancer.
Background
Racial/ethnic disparities in metastatic colorectal cancer (mCRC) survival are well documented, as is the impact that tumor mutation of KRAS and BRAF has on prognosis. It has been suggested that frequency differences of KRAS- and BRAF-mutated tumors may partially explain this disparity. Demographic differences in mutation frequency are not well established, nor is it known whether mutation and microsatellite instability (MSI) differentially impact survival among groups.
Methods
Using data for 11,117 patients diagnosed with de novo mCRC from an electronic health record-derived database, we estimated adjusted odds ratios (aOR) to characterize the association between demographics and MSI and KRAS/NRAS/BRAF-mutation status. Stratified Cox models were used to identify differences in overall survival (OS), adjusting for treatment and demographics.
Results
Being female, compared to male (aOR for KRAS: 1.33 (1.23–1.44); aOR for BRAF: 1.84 (1.56–2.16)), and non-Hispanic Black (NHB) race, compared to non-Hispanic White (NHW) (aOR for KRAS: 1.62 (1.42–1.85); aOR for BRAF: 0.55 (0.38–0.77)), were associated with KRAS- or BRAF-mutant tumors. MSI prevalence was similar across race/ethnicity but higher in women. BRAF-mutant tumors were associated with poorer prognosis overall, especially among non-white patients. Among patients who had KRAS/NRAS/BRAF-WT tumors, we observed no difference in OS by race or MSI. Among patients with KRAS-mutant tumors, Hispanic patients had a more favorable prognosis (adjusted hazard ratio (aHR): 0.76 (0.65–0.89)) than their NHW counterparts. Among those with BRAF-mutant tumors, NHB patients had poorer prognosis than NHW patients (aHR: 1.78 (1.08–2.93)).
Conclusion
MSI and frequency of KRAS and BRAF mutations differed by demographics. Racial/ethnic disparities in OS differed by mutation. Future studies should explore biological and/or social determinants underlying these differences.
Abstract
Autism spectrum disorders (ASD) display both phenotypic and genetic heterogeneity, impeding the understanding of ASD and development of effective means of diagnosis and potential treatments. Genes affected by genomic variations for ASD converge in dozens of gene ontologies (GOs), but the relationship between the variations at the GO level has not been well elucidated. In the current study, multiple types of genomic variations were mapped to GOs and correlations among GOs were measured in ASD and control samples. Several ASD-unique GO correlations were found, suggesting the importance of co-occurrence of genomic variations in genes from different functional categories in ASD etiology. Combined with experimental data, several variations related to WNT signaling, neuron development, synapse morphology/function and organ morphogenesis were found to be important for ASD with macrocephaly, and novel co-occurrence patterns of them in ASD patients were found. Furthermore, we applied this gene ontology correlation analysis method to find genomic variations that contribute to ASD etiology in combination with changes in gene expression and transcription factor binding, providing novel insights into ASD with macrocephaly and a new methodology for the analysis of genomic variation.
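The mapping-then-correlation idea can be sketched as follows: count, for each sample, how many variant-affected genes fall in each GO term, then correlate those counts across samples for a pair of terms. This is only an illustration under assumptions: the abstract does not specify the burden definition or the correlation measure, so the per-term gene count and Pearson correlation used here are stand-ins, and the GO labels, gene names, and function names are hypothetical.

```python
from math import sqrt


def go_burden(sample_genes, go_to_genes):
    """Per GO term, count the sample's variant-affected genes in that term."""
    return {go: len(sample_genes & genes) for go, genes in go_to_genes.items()}


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical GO terms and per-sample sets of variant-affected genes.
go_to_genes = {"GO:A": {"g1", "g2"}, "GO:B": {"g3", "g4"}}
samples = [{"g1", "g3"}, {"g1", "g2", "g3", "g4"}, set()]

burdens = [go_burden(s, go_to_genes) for s in samples]
r = pearson([b["GO:A"] for b in burdens], [b["GO:B"] for b in burdens])
```

Running the same calculation separately in case and control samples and comparing the resulting correlations is one way to look for case-specific co-occurrence between functional categories.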