DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies.
Fifty percent of variability in HIV-1 susceptibility is attributable to host genetics. Thus, identifying genetic associations is essential to understanding pathogenesis of HIV-1 and important for targeting drug development. To date, however, CCR5 remains the only gene conclusively associated with HIV acquisition. To identify novel host genetic determinants of HIV-1 acquisition, we conducted a genome-wide association study among a high-risk sample of 3,136 injection drug users (IDUs) from the Urban Health Study (UHS). In addition to being IDUs, HIV-negative controls were frequency-matched to cases on environmental exposures to enhance detection of genetic effects. We tested independent replication in the Women's Interagency HIV Study (WIHS; N=2,533). We also examined publicly available gene expression data to link SNPs associated with HIV acquisition to known mechanisms affecting HIV replication/infectivity. Analysis of the UHS nominated eight genetic regions for replication testing. SNP rs4878712 in FRMPD1 met multiple testing correction for independent replication (P=1.38×10⁻⁴), although the UHS-WIHS meta-analysis p-value did not reach genome-wide significance (P=4.47×10⁻⁷ vs. P<5.0×10⁻⁸). Gene expression analyses provided promising biological support for the protective G allele at rs4878712 lowering risk of HIV: (1) the G allele was associated with reduced expression of FBXO10 (r=-0.49, P=6.9×10⁻⁵); (2) FBXO10 is a component of the Skp1-Cul1-F-box protein E3 ubiquitin ligase complex that targets Bcl-2 protein for degradation; (3) lower FBXO10 expression was associated with higher BCL2 expression (r=-0.49, P=8×10⁻⁵); (4) higher basal levels of Bcl-2 are known to reduce HIV replication and infectivity in human and animal in vitro studies. These results suggest new potential biological pathways by which host genetics affect susceptibility to HIV upon exposure for follow-up in subsequent studies.
Cardiomyocyte cell division and replication in mammals proceed through embryonic development and abruptly decline soon after birth. The process governing cardiomyocyte cell cycle arrest is poorly understood. Here we carry out whole-exome sequencing in an infant with evidence of persistent postnatal cardiomyocyte replication to determine the genetic risk factors. We identify compound heterozygous ALMS1 mutations in the proband, and confirm their presence in her affected sibling, one copy inherited from each heterozygous parent. Next, we recognize homozygous or compound heterozygous truncating mutations in ALMS1 in four other children with high levels of postnatal cardiomyocyte proliferation. Alms1 mRNA knockdown increases multiple markers of proliferation in cardiomyocytes, the percentage of cardiomyocytes in G2/M phases, and the number of cardiomyocytes by 10% in cultured cells. Homozygous Alms1-mutant mice have increased cardiomyocyte proliferation at 2 weeks postnatal compared with wild-type littermates. We conclude that deficiency of Alström protein impairs postnatal cardiomyocyte cell cycle arrest.
Insulin secretion has a crucial role in glucose homeostasis, and failure to secrete sufficient insulin is a hallmark of type 2 diabetes. Genome-wide association studies (GWAS) have identified loci contributing to insulin processing and secretion; however, a substantial fraction of the genetic contribution remains undefined. To examine low-frequency (minor allele frequency (MAF) 0.5-5%) and rare (MAF < 0.5%) nonsynonymous variants, we analyzed exome array data in 8,229 nondiabetic Finnish males using the Illumina HumanExome Beadchip. We identified low-frequency coding variants associated with fasting proinsulin concentrations at the SGSM2 and MADD GWAS loci and three new genes with low-frequency variants associated with fasting proinsulin or insulinogenic index: TBC1D30, KANK1 and PAM. We also show that the interpretation of single-variant and gene-based tests needs to consider the effects of noncoding SNPs both nearby and megabases away. This study demonstrates that exome array genotyping is a valuable approach to identify low-frequency variants that contribute to complex traits.
Microarray single-nucleotide polymorphism genotyping, combined with imputation of untyped variants, has been widely adopted as an efficient means to interrogate variation across the human genome. "Genomic coverage" is the total proportion of genomic variation captured by an array, either by direct observation or through an indirect means such as linkage disequilibrium or imputation. We have performed imputation-based genomic coverage assessments of eight current genotyping arrays that assay from ~0.3 to ~5 million variants. Coverage was determined separately in each of the four continental ancestry groups in the 1000 Genomes Project phase 1 release. We used the subset of 1000 Genomes variants present on each array to impute the remaining variants and assessed coverage based on correlation between imputed and observed allelic dosages. More than 75% of common variants (minor allele frequency > 0.05) are covered by all arrays in all groups except for African ancestry, and up to ~90% in all ancestries for the highest density arrays. In contrast, less than 40% of less common variants (0.01 < minor allele frequency < 0.05) are covered by low density arrays in all ancestries and 50-80% in high density arrays, depending on ancestry. We also calculated genome-wide power to detect variant-trait association in a case-control design, across varying sample sizes, effect sizes, and minor allele frequency ranges, and compared these array-based power estimates with a hypothetical array that would type all variants in 1000 Genomes. These imputation-based genomic coverage and power analyses are intended as a practical guide to researchers planning genetic studies.
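The coverage definition above (correlation between imputed and observed allelic dosages) can be illustrated with a minimal sketch. The function name `coverage` and the r² cutoff of 0.8 are assumptions made for illustration only; the abstract does not state the threshold or the software the authors used.

```python
import numpy as np

def coverage(imputed, observed, r2_threshold=0.8):
    """Fraction of variants whose imputed dosages track the observed ones.

    imputed, observed: array-likes of shape (n_variants, n_samples) holding
    allelic dosages in [0, 2]. r2_threshold is an assumed cutoff, not a value
    taken from the abstract.
    """
    imputed = np.asarray(imputed, dtype=float)
    observed = np.asarray(observed, dtype=float)
    covered = 0
    for imp, obs in zip(imputed, observed):
        # Correlation is undefined when either row has zero variance;
        # such variants are counted as not covered.
        if imp.std() == 0 or obs.std() == 0:
            continue
        r = np.corrcoef(imp, obs)[0, 1]
        if r * r >= r2_threshold:
            covered += 1
    return covered / len(observed)

# Toy data: three variants, four samples (hypothetical values).
observed = [[0, 1, 2, 1], [0, 0, 1, 2], [2, 1, 0, 1]]
imputed = [[0.1, 0.9, 1.9, 1.0], [1, 1, 1, 1], [1.0, 1.2, 0.9, 1.1]]
print(coverage(imputed, observed))
```

In the toy data only the first variant is imputed well, so one of three variants counts as covered. In practice the calculation would be repeated per ancestry group and per minor allele frequency bin.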
Studies of the allelotype of human cancers have provided valuable insights into those chromosomes targeted for genetic inactivation during tumorigenesis. We present the comprehensive allelotype of 82 xenografted pancreatic or biliary cancers using 386 microsatellite markers and spanning the entire genome at an average coverage of 10 cM. Allelic losses were nonrandomly distributed across the genome and most prevalent for chromosome arms 9p, 17p, and 18q (>60%), sites of the known tumor suppressor genes CDKN2A, TP53, and MADH4. Moderate rates of loss (at any one locus) were noted for chromosome arms 3p, 6q, 8p, 17q, 18p, 21q, and 22q (40-60%). A mapping of individual loci of allelic loss revealed 11 "hot spots" of loss of heterozygosity (>30%) in addition to loci near known tumor suppressor genes, corresponding to 3p, 4q, 5q, 6q, 8p, 12q, 14q, 21q, 22q, and the X chromosome. The average genomic fractional allelic loss was 15.3% of all tested markers for the 82 xenografted cancers, with allelic loss affecting as little as 1.5% to as much as 32.1% of tested loci, a remarkable 20-fold range. We determined the chromosome location (in cM) of each of the 386 markers used based on mapping data available from the National Center for Biotechnology Information, and we provide the first distance-based estimates of chromosome material lost in a human epithelial cancer. Specifically, we found that the cumulative size of allelic losses ranged from 58 to 1160 cM, with an average loss of 561.32 cM/tumor. We compared the genomic fractional allelic loss of each xenografted cancer with known clinicopathological features for each patient and found a significant correlation with smoking status (P < 0.01). These findings offer new loci for investigation of the genetic alterations common to pancreaticobiliary cancers and aid the understanding of mechanisms of allelic loss in human carcinogenesis.
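The fractional allelic loss figures above reduce to simple arithmetic, sketched here under stated assumptions. The function name `fractional_allelic_loss` and the boolean encoding of marker calls are hypothetical; the abstract only says that the denominator is the set of tested markers.

```python
def fractional_allelic_loss(loss_calls):
    """Share of tested markers showing allelic loss in one tumor.

    loss_calls: one boolean per tested marker, True where loss of
    heterozygosity was called (an assumed encoding, for illustration).
    """
    if not loss_calls:
        raise ValueError("at least one tested marker is required")
    return sum(bool(c) for c in loss_calls) / len(loss_calls)

# Hypothetical tumor with 2 losses among 4 tested markers.
print(fractional_allelic_loss([True, False, False, True]))

# Ratio of the reported extremes (32.1% vs. 1.5% of tested loci),
# which the abstract rounds to a 20-fold range.
print(round(32.1 / 1.5, 1))
```

The per-tumor value computed this way is the quantity the abstract then compares with clinicopathological features such as smoking status.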
Abstract
Background
In addition to the established association between general obesity and breast cancer risk, central obesity and circulating fasting insulin and glucose have been linked to the development of this common malignancy. Findings from previous studies, however, have been inconsistent, and the nature of the associations is unclear.
Methods
We conducted Mendelian randomization analyses to evaluate the association of breast cancer risk, using genetic instruments, with fasting insulin, fasting glucose, 2-h glucose, body mass index (BMI) and BMI-adjusted waist-hip-ratio (WHRadj BMI). We first confirmed the association of these instruments with type 2 diabetes risk in a large diabetes genome-wide association study consortium. We then investigated their associations with breast cancer risk using individual-level data obtained from 98 842 cases and 83 464 controls of European descent in the Breast Cancer Association Consortium.
Results
All sets of instruments were associated with risk of type 2 diabetes. Associations with breast cancer risk were found for genetically predicted fasting insulin [odds ratio (OR) = 1.71 per standard deviation (SD) increase, 95% confidence interval (CI) = 1.26-2.31, p = 5.09 × 10⁻⁴], 2-h glucose (OR = 1.80 per SD increase, 95% CI = 1.30-2.49, p = 4.02 × 10⁻⁴), BMI (OR = 0.70 per 5-unit increase, 95% CI = 0.65-0.76, p = 5.05 × 10⁻¹⁹) and WHRadj BMI (OR = 0.85, 95% CI = 0.79-0.91, p = 9.22 × 10⁻⁶). Stratified analyses showed that genetically predicted fasting insulin was more closely related to risk of estrogen receptor (ER)-positive cancer, whereas the associations with instruments of 2-h glucose, BMI and WHRadj BMI were consistent regardless of age, menopausal status, estrogen receptor status and family history of breast cancer.
Conclusions
We confirmed the previously reported inverse association of genetically predicted BMI with breast cancer risk, and showed a positive association of genetically predicted fasting insulin and 2-h glucose and an inverse association of WHRadj BMI with breast cancer risk. Our study suggests that genetically determined obesity and glucose/insulin-related traits have an important role in the aetiology of breast cancer.
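As a reading aid for the Results, the relationship between a reported odds ratio, its 95% confidence interval and its p-value can be sketched as follows. This is a generic normal-approximation check on the log-odds scale, not the authors' analysis code, and the function name `z_and_p_from_or_ci` is hypothetical.

```python
import math

def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the z statistic and two-sided p-value from an OR and its CI.

    Assumes the CI is symmetric on the log scale (Wald-type interval) and
    that z_crit matches the CI level (1.959964 for 95%).
    """
    log_or = math.log(odds_ratio)
    # A 95% Wald interval spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided p-value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Fasting insulin estimate from the Results: OR = 1.71, 95% CI = 1.26-2.31.
z, p = z_and_p_from_or_ci(1.71, 1.26, 2.31)
print(f"z = {z:.2f}, p = {p:.2e}")
```

Because the inputs are rounded to two decimals, the recovered p-value only approximates the reported one; for the fasting insulin estimate it comes out at roughly 5 × 10⁻⁴.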
Kabuki syndrome is a monogenic disorder caused by loss of function variants in either of two genes encoding histone-modifying enzymes. We performed targeted sequencing in a cohort of 27 probands with a clinical diagnosis of Kabuki syndrome. Of these, 12 had causative variants in the two known Kabuki syndrome genes. In 2, we identified presumptive loss of function de novo variants in KMT2A (missense and splice site variants), a gene that encodes another histone-modifying enzyme previously exclusively associated with Wiedemann-Steiner syndrome. Although Kabuki syndrome is a disorder of histone modification, we also find alterations in DNA methylation among individuals with a Kabuki syndrome diagnosis relative to matched normal controls, regardless of whether they carry a variant in KMT2A or KMT2D. Furthermore, we observed characteristic global abnormalities of DNA methylation that distinguished patients with a loss of function variant in KMT2D or missense or splice site variants in either KMT2D or KMT2A from normal controls. Our results provide new insights into the relationship of genotype to epigenotype and phenotype and indicate cross-talk between histone and DNA methylation machineries exposed by inborn errors of the epigenetic apparatus.
To identify molecular predictors of grade 3/4 neutropenic or leukopenic events (NLE) after chemotherapy using a genome-wide association study (GWAS).
A GWAS was performed on patients in the phase III chemotherapy study SUCCESS-A (n = 3,322). Genotyping was done using the Illumina HumanOmniExpress-12v1 array. Findings were functionally validated with cell culture models and the genotypes and gene expression of possible causative genes were correlated with clinical treatment response and prognostic outcomes.
One locus on chromosome 16 (rs4784750; NLRC5; P = 1.56E-8) and another locus on chromosome 13 (rs16972207; TNFSF13B; P = 3.42E-8) were identified at a genome-wide significance level. Functional validation revealed that expression of these two genes is altered by genotype-dependent and chemotherapy-dependent activity of two transcription factors. Genotypes also showed an association with disease-free survival in patients with an NLE.
Two loci in NLRC5 and TNFSF13B are associated with NLEs. The involvement of the MHC I regulator NLRC5 implies the possible involvement of immuno-oncological pathways.