DNA methylation patterns are well known to vary substantially across cell types or tissues. Hence, existing normalization methods may not be optimal if they do not take this into account. We therefore present a new R package for normalization of data from the Illumina Infinium Human Methylation450 BeadChip (Illumina 450 K) built on the concepts in the recently published funNorm method, and introducing cell-type or tissue-type flexibility.
funtooNorm is relevant for data sets containing samples from two or more cell or tissue types. A visual display of cross-validated errors informs the choice of the optimal number of components in the normalization. Benefits of cell (tissue)-specific normalization are demonstrated in three data sets. Improvement can be substantial; it is most striking on chromosome X, where methylation patterns have unique inter-tissue variability.
An R package is available at https://github.com/GreenwoodLab/funtooNorm, and has been submitted to Bioconductor at http://bioconductor.org.
In genetic studies of complex diseases, multiple measures of related phenotypes are often collected. Jointly analyzing these phenotypes may improve statistical power to detect sets of rare variants affecting multiple traits. In this work, we consider association testing between a set of rare variants and multiple phenotypes in family-based designs. We use a mixed linear model to express the correlations among the phenotypes and between related individuals. Given the many sources of correlations in this situation, deriving an appropriate test statistic is not straightforward. We derive a vector of score statistics, whose joint distribution is approximated using a copula. This allows us to have closed-form expressions for the p-values of several test statistics. A comprehensive simulation study and an application to Genetic Analysis Workshop 18 data highlight the gains associated with joint testing over univariate approaches, especially in the presence of pleiotropy or highly correlated phenotypes.
We consider the assessment of DNA methylation profiles for sequencing-derived data from a single cell type or from cell lines. We derive a kernel-smoothed EM algorithm that is capable of analyzing an entire chromosome at once, simultaneously corrects for experimental errors arising from either the pre-treatment steps or the sequencing stage, and takes into account spatial correlations between DNA methylation profiles at neighbouring CpG sites. The outcomes of our algorithm are then used to (i) call the true methylation status at each CpG site, (ii) provide accurate smoothed estimates of DNA methylation levels, and (iii) detect differentially methylated regions. Simulations show that the proposed methodology outperforms existing analysis methods that either ignore the correlation between DNA methylation profiles at neighbouring CpG sites or do not correct for errors. The use of the proposed inference procedure is illustrated through the analysis of a publicly available data set from a cell line of induced pluripotent H9 human embryonic stem cells and also a data set where methylation measures were obtained for a small genomic region in three different immune cell types separated from whole blood.
Common variants explain little of the variance of most common diseases, prompting large-scale sequencing studies to understand the contribution of rare variants to these diseases. Imputation of rare variants from genome-wide genotypic arrays offers a cost-efficient strategy to achieve the sample sizes required for adequate statistical power. To estimate the performance of imputation of rare variants, we imputed 153 individuals, each of whom was genotyped on 3 different genotype arrays including 317k, 610k and 1 million single nucleotide polymorphisms (SNPs), to two different reference panels: HapMap2 and 1000 Genomes pilot March 2010 release (1KGpilot) by using IMPUTE version 2. We found that more than 94% and 84% of all SNPs yield acceptable accuracy (info > 0.4) in HapMap2 and 1KGpilot-based imputation, respectively. For rare variants (minor allele frequency (MAF) ≤ 5%), the proportion of well-imputed SNPs increased as the MAF increased from 0.3% to 5% across all 3 genome-wide association study (GWAS) datasets. The proportion of well-imputed SNPs was 69%, 60% and 49% for SNPs with a MAF from 0.3% to 5% for 1M, 610k and 317k, respectively. None of the very rare variants (MAF ≤ 0.3%) were well imputed. We conclude that the imputation accuracy of rare variants increases with higher density of genome-wide genotyping arrays when the size of the reference panel is small. Variants with lower MAF are more difficult to impute. These findings have important implications in the design and replication of large-scale sequencing studies.
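As a rough illustration of how proportions of well-imputed SNPs like those above can be tabulated, the following Python sketch bins SNPs by MAF and reports the share with info > 0.4 in each bin. The function name, the bin edges, and the toy records are assumptions made for illustration; they are not the study's data or pipeline.

```python
def well_imputed_by_maf(snps, bins, info_threshold=0.4):
    """Return {(low, high): proportion of SNPs with info > threshold}.

    snps: iterable of (maf, info) pairs.
    bins: list of (low, high) MAF intervals, read as low < maf <= high.
    """
    result = {}
    for low, high in bins:
        infos = [info for maf, info in snps if low < maf <= high]
        # Report empty bins as None rather than dividing by zero.
        result[(low, high)] = (
            sum(info > info_threshold for info in infos) / len(infos)
            if infos else None
        )
    return result


# Toy (maf, info) records, invented for illustration only.
toy_snps = [(0.002, 0.10), (0.004, 0.35), (0.01, 0.55),
            (0.03, 0.80), (0.05, 0.45), (0.20, 0.95)]
# Hypothetical bins: very rare, rare, and common variants.
maf_bins = [(0.0, 0.003), (0.003, 0.05), (0.05, 0.5)]
proportions = well_imputed_by_maf(toy_snps, maf_bins)
```

Running the same tabulation once per genotyping array would allow the kind of array-by-array comparison reported in the abstract.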
Data integration is becoming an essential tool to cope with and make sense of the ever-increasing amount of biological data. Genomic data arise in various shapes and forms, including vectors, graphs or sequences; it is therefore essential to carefully consider strategies that best capture the information contained in each data type. The need for integration of heterogeneous data measured on the same individuals arises in a wide range of clinical applications as well. We propose weighted kernel Fisher discriminant (wKFD) analysis for integrating heterogeneous data sets. We use weights that measure the relative importance of each of the data sets to be integrated. Simulation studies are conducted to assess the performance of our proposed method. The results show that our method performs very well, including in the presence of noisy data. We also illustrate our method using gene expression and clinical data from breast cancer patients. Weighted integration of heterogeneous data leads to improved predictive accuracy. The amount of improvement, however, depends on the quality and informativeness of each of the data sets being integrated. If a data set is of poor quality and/or non-informative, one should not expect a significant improvement from adding this particular data set to other informative data sets. Likewise, substantial improvement might not be obtained if the data do not contain independent information, that is, if there is redundancy in the data.
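The kernel-integration step described above can be sketched as a weighted sum of per-data-set Gram matrices. This minimal Python sketch assumes the weights are non-negative and sum to one (a convex combination, which keeps the combined matrix a valid kernel); the kernel choices, function names, and toy data are hypothetical, and the discriminant analysis itself is not shown.

```python
import math


def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel with bandwidth parameter gamma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram(samples, kernel):
    """Gram matrix of pairwise kernel values for one data set."""
    return [[kernel(a, b) for b in samples] for a in samples]


def weighted_kernel(grams, weights):
    """Combine Gram matrices as K = sum_m w_m * K_m (assumed convex weights)."""
    if len(grams) != len(weights):
        raise ValueError("need one weight per Gram matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, grams))
             for j in range(n)] for i in range(n)]


# Toy data for three patients, invented for illustration only.
expression = [[0.1, 1.2], [0.4, 0.9], [1.5, 0.2]]  # two gene-expression values
clinical = [[1.0], [0.0], [1.0]]                   # one clinical covariate
k_expr = gram(expression, rbf_kernel)
k_clin = gram(clinical, linear_kernel)
k_combined = weighted_kernel([k_expr, k_clin], [0.7, 0.3])
```

Setting a weight close to zero effectively removes that data set from the combined kernel, which is one way to express that a noisy or redundant source contributes little.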
Deviations from a Mendelian 1:1 transmission ratio have been observed in human and mouse chromosomes. With few exceptions, the underlying mechanism of the transmission-ratio distortion remains obscure. We tested a hypothesis that grandparental-origin dependent transmission-ratio distortion is related to imprinting and possibly results from the loss of embryos which carry imprinted genes with imprinting marks that have been incorrectly reset. We analyzed transmission of alleles in four regions of the human genome that carry imprinted genes presumably critical for normal embryonic growth and development: 11p15.5 (H19, IGF2, HASH2, etc.), 11p13 (WT1), 7p11-12 (GRB10), and 6q25-q27 (IGF2R), among the offspring of 31 three-generation Centre d'Étude du Polymorphisme Humain (CEPH) families. Deviations from expected 1:1 ratios were found in the maternal chromosomes for regions 11p15.5, 11p13, and 6q25-q27 and in the paternal chromosomes for regions 11p15 and 7p11-p12. The likelihood of the results was assessed empirically to be statistically significant (p = 0.0008), suggesting that the transmission ratios in the imprinted regions significantly deviated from 1:1. We did not find deviations from a 1:1 transmission ratio in imprinted regions that are not crucial for embryo viability (13q14 and 15q11-q13). The analysis of a larger set of 51 families for the 11p15.5 region suggests that there is heterogeneity among the families with regard to the transmission of 11p15.5 alleles. The results of this study are consistent with the hypothesis that grandparental-origin dependent transmission-ratio distortion is related to imprinting and embryo loss.
Key words: imprinting, transmission-ratio distortion, grandparental origin, embryo loss.
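For a single region, a departure from a 1:1 transmission ratio can be checked with an exact binomial test, sketched below in Python. This is a simpler, per-region illustration and not the empirical assessment reported above; the function name and the example counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under success probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance guards against floating-point ties.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 40 of 60 informative transmissions carry the
# allele of grandmaternal origin; under Mendelian transmission p = 0.5.
p_value = binomial_two_sided_p(40, 60)
```

Because several regions and both parental chromosomes are examined, per-region p-values of this kind would need a multiple-testing adjustment or a joint empirical assessment before any overall conclusion is drawn.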
Background and Objective: Standard population genetic theory says that deleterious genetic variants are likely rare and fairly recently introduced. However, can this expectation lead to more powerful tests of association between diseases and rare genetic variation? The gene genealogy describes the relationships between haplotypes sampled from the general population. Although ancestral tree-based methods, inspired by the gene genealogy concept, have been developed for finding associations with common genetic variants, here we ask whether gene genealogies can help in identifying genomic regions containing multiple rare causal variants. Methods: With data simulated under several demographic models and using known gene genealogies, we developed and compared several tree-based statistics to determine which, if any, could detect the type of clustering expected with rare causal variants and whether the genealogic tree provides additional information about disease associations. Results and Conclusions: We found that a novel statistic based on the scaled distance between the tips of a tree performed better than other tree-based statistics. When data were simulated with mild population growth, this statistic outperformed two standard non-tree-based methods, showing that an ancestral tree-based approach has potential for rare variant discovery.
Recent technological advances in many domains including both genomics and brain imaging have led to an abundance of high-dimensional and correlated data being routinely collected. Classical multivariate approaches like Multivariate Analysis of Variance (MANOVA) and Canonical Correlation Analysis (CCA) can be used to study relationships between such multivariate datasets. Yet, special care is required with high-dimensional data, as the test statistics may be ill-defined and classical inference procedures break down. In this work, we explain how valid p-values can be derived for these multivariate methods even in high-dimensional datasets. Our main contribution is an empirical estimator for the largest root distribution of a singular double Wishart problem; this general framework underlies many common multivariate analysis approaches. From a small number of permutations of the data, we estimate the location and scale parameters of a parametric Tracy-Widom family that provides a good approximation of this distribution. Through simulations, we show that this estimated distribution also leads to valid p-values that can be used for high-dimensional inference. We then apply our approach to a pathway-based analysis of the association between DNA methylation and disease type in patients with systemic auto-immune rheumatic diseases.
Abstract
We reported that though >90% of high-grade serous ovarian carcinomas (HGSC) harbor somatic TP53 mutations, cases with missense mutations have significantly longer progression-free and overall survival than cases with “null” mutations, and that HGSCs defined by TP53 mutation type exhibit unique differences in their genomic landscapes. Large-scale molecular genetic analyses by The Cancer Genome Atlas (TCGA) have identified numerous genes/molecular pathways that could be targeted for therapy in HGSC. A large number of candidates were reported, whereby the results were affected by heterogeneity of samples. We have analyzed the transcriptomes from a unique HGSC cell line model with altered in vivo tumorigenic and in vitro growth characteristics to understand the biology of HGSC, as we have shown in previous analyses that the pathways affected intersected those found altered in tumors. We posit that investigating phenotypically defined cancer cell line models could facilitate the identification of targets for siRNA-based therapeutics (as an example). Towards this goal, we have characterized the transcriptomes (n~700 RNAs) derived from our parental HGSC cell line model and derived genetically modified cell lines. The cell line was derived from long-term passage of ovarian ascites, harbors a somatic missense TP53 mutation and exhibits suppression of tumorigenicity in murine models. The genetically modified derivative cell lines have lost tumorigenic potential and exhibit altered growth characteristics. We compared their transcriptomes, derived a list of genes correlated with these alterations and then compared them to transcriptomes from public data sets: (1) ovarian surface epithelial cells (n=10) and HGSC (n=56); and (2) TCGA samples (n~300). A defined list of known and new genes was identified by these comparative analyses, pointing to targets that could be used to affect tumorigenic potential.
The genetically modified model provides a new avenue of research that has the potential to elucidate pathways important in HGSC, especially those involved in tumorigenicity, as well as define targets for the development of RNA-based targeted therapeutics.
Citation Format: Patricia N. Tonin, Celia MT Greenwood, Diane MT Provencher, Anne-Marie Mes-Masson. Using a unique genetically modified ovarian cancer cell line model to identify the targets for siRNA directed therapies [abstract]. In: Proceedings of the AACR Special Conference on Advances in Ovarian Cancer Research: From Concept to Clinic; Sep 18-21, 2013; Miami, FL. Philadelphia (PA): AACR; Clin Cancer Res 2013;19(19 Suppl):Abstract nr B24.
Background: Observational studies have demonstrated an association between decreased vitamin D level and risk of multiple sclerosis (MS); however, it remains unclear whether this relationship is causal. We undertook a Mendelian randomization (MR) study to evaluate whether genetically lowered vitamin D level influences the risk of MS. Methods and Findings: We identified single nucleotide polymorphisms (SNPs) associated with 25-hydroxyvitamin D (25OHD) level from SUNLIGHT, the largest (n = 33,996) genome-wide association study to date for vitamin D. Four SNPs were genome-wide significant for 25OHD level (p-values ranging from 6 × 10⁻¹⁰ to 2 × 10⁻¹⁰⁹), and all four SNPs lay in, or near, genes strongly implicated in separate mechanisms influencing 25OHD. We then ascertained their effect on 25OHD level in 2,347 participants from a population-based cohort, the Canadian Multicentre Osteoporosis Study, and tested the extent to which the 25OHD-decreasing alleles explained variation in 25OHD level. We found that the count of 25OHD-decreasing alleles across these four SNPs was strongly associated with lower 25OHD level (n = 2,347, F-test statistic = 49.7, p = 2.4 × 10⁻¹²). Next, we conducted an MR study to describe the effect of genetically lowered 25OHD on the odds of MS in the International Multiple Sclerosis Genetics Consortium study, the largest genetic association study to date for MS (including up to 14,498 cases and 24,091 healthy controls). Alleles were weighted by their relative effect on 25OHD level, and sensitivity analyses were performed to test MR assumptions. MR analyses found that each genetically determined one-standard-deviation decrease in log-transformed 25OHD level conferred a 2.0-fold increase in the odds of MS (95% CI: 1.7-2.5; p = 7.7 × 10⁻¹²; I² = 63%, 95% CI: 0%-88%).
This result persisted in sensitivity analyses excluding SNPs possibly influenced by population stratification or pleiotropy (odds ratio [OR] = 1.7, 95% CI: 1.3-2.2; p = 2.3 × 10⁻⁵; I² = 47%, 95% CI: 0%-85%) and including only SNPs involved in 25OHD synthesis or metabolism (OR for synthesis = 2.1, 95% CI: 1.6-2.6, p = 1 × 10⁻⁹; OR for metabolism = 1.9, 95% CI: 1.3-2.7, p = 0.002). While these sensitivity analyses decreased the possibility that pleiotropy may have biased the results, residual pleiotropy is difficult to exclude entirely. Conclusions: A genetically lowered 25OHD level is strongly associated with increased susceptibility to MS. Whether vitamin D sufficiency can delay, or prevent, MS onset merits further investigation in long-term randomized controlled trials.
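To make the weighting idea concrete, the following Python sketch computes a standard fixed-effect inverse-variance-weighted (IVW) Mendelian randomization estimate from per-SNP summary statistics and converts it to an odds ratio with a 95% confidence interval. It is an assumption that this common estimator resembles the weighting used in the study; the function name and all numbers below are invented for illustration and are not the study's data.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    beta_exposure: per-SNP effects on the exposure (e.g. SD of log 25OHD).
    beta_outcome:  per-SNP effects on the outcome (log odds of disease).
    se_outcome:    standard errors of the outcome effects.
    """
    numerator = sum(bx * by / se**2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx**2 / se**2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Toy summary statistics for four SNPs, oriented so that each allele
# lowers the exposure (values invented for illustration only).
bx = [0.10, 0.20, 0.15, 0.05]   # decrease in exposure per allele
by = [0.07, 0.15, 0.09, 0.04]   # log odds ratio of disease per allele
se = [0.02, 0.03, 0.03, 0.02]   # standard errors of by

log_or, log_or_se = ivw_estimate(bx, by, se)
odds_ratio = math.exp(log_or)
ci_low = math.exp(log_or - 1.96 * log_or_se)
ci_high = math.exp(log_or + 1.96 * log_or_se)
```

Sensitivity analyses of the kind described above correspond to re-running the same calculation on a subset of SNPs, for example after dropping variants suspected of pleiotropy.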