Detecting gene-environment (G × E) interactions in the context of genome-wide association studies (GWAS) is a challenging problem because standard methods generally lack power. An additional difficulty arises from the fact that the causal exposure is seldom observed; usually only a proxy of this exposure is available. This further reduces power and explains why standard methods fail to detect interactions, even very strong ones. In this article, we treat the latent exposure as a source of heterogeneity and propose a new, powerful method named “Breakpoint Model for Logistic Regression” (BMLR), based on a breakpoint model, to detect G × E interactions when the causal exposure is unobserved. First, BMLR is compared through simulations with the ordered-subset analysis for case-control method, which was developed for the same purpose. These simulations highlight the ability of BMLR to detect heterogeneity and, therefore, to detect interactions with a latent exposure. Finally, BMLR is compared with standard methods, such as Plink, in a GWAS performed on a published realistic benchmark.
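The breakpoint idea can be illustrated with a minimal Python sketch. It is not the BMLR implementation: it assumes that individuals are ordered by the observed exposure proxy, that the genotype effect may change at a single breakpoint along that ordering, and it scans a hypothetical grid of candidate breakpoints (proxy quantiles), fitting a logistic regression by Newton-Raphson for each and returning the likelihood-ratio statistic against the model without a breakpoint. The helper names and the grid are illustrative assumptions; a p-value for the maximal statistic would require, for example, permutations.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def breakpoint_scan(g, proxy, y, quantiles=np.arange(0.10, 0.91, 0.05)):
    """Hypothetical breakpoint scan along the ordered exposure proxy (not the BMLR code).

    Null model:       logit P(y=1) = a + b*g
    Breakpoint model: logit P(y=1) = a + b*g + c*s + d*g*s,  s = 1[proxy >= threshold]
    Returns the best breakpoint (as a proxy quantile) and the maximal LR statistic.
    """
    ones = np.ones(len(g))
    ll_null = fit_logistic(np.column_stack([ones, g]), y)
    best_q, best_stat = None, -np.inf
    for q in quantiles:
        s = (proxy >= np.quantile(proxy, q)).astype(float)
        ll_alt = fit_logistic(np.column_stack([ones, g, s, g * s]), y)
        stat = 2.0 * (ll_alt - ll_null)
        if stat > best_stat:
            best_q, best_stat = float(q), stat
    return best_q, best_stat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 4000
    g = rng.binomial(2, 0.3, size=n).astype(float)   # genotype coded 0/1/2
    proxy = rng.uniform(size=n)                      # observed exposure proxy
    eta = -1.0 + 1.5 * g * (proxy > 0.7)             # genotype acts only in the top 30%
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
    print(breakpoint_scan(g, proxy, y))
```

In this simulated example the genotype only matters above the 70th percentile of the proxy, so a model with a single genotype effect dilutes it, while the scan should place the breakpoint close to that percentile.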
In genetic diseases with variable age of onset, accurate estimation of the survival function of mutation carriers, together with estimation of the effects of modifying factors, is important for managing asymptomatic gene carriers throughout life. Among the modifying factors, the gender of the parent transmitting the mutation (i.e., the parent-of-origin effect) has been shown to have a significant effect on survival curve estimation in transthyretin familial amyloid polyneuropathy (ATTRv) families. However, because most genotypes are unknown, the parent of origin must be inferred as a probability computed from the pedigree. In this article, we extend the method that provides mutation carrier survival estimates so that it also estimates the parent-of-origin effect. The method is validated on simulated data and applied to family samples with ATTRv.
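To make the notion of a parent-of-origin probability concrete, consider the simplest case: a carrier child whose two parents have carrier probabilities already computed from the pedigree. The Python sketch below applies Bayes' rule under strong simplifying assumptions that are ours, not the article's: independent parental carrier probabilities, transmission probability 1/2 from a heterozygous carrier, and no de novo mutations. The full method works on the whole pedigree instead.

```python
def paternal_origin_probability(p_father: float, p_mother: float) -> float:
    """P(the father transmitted the mutation | the child is a carrier).

    Illustrative assumptions: parental carrier probabilities are independent,
    a heterozygous carrier transmits with probability 1/2, no de novo mutations.
    If both parents transmit, the event still counts as a paternal transmission.
    """
    t_father = 0.5 * p_father                               # P(father transmits)
    t_mother = 0.5 * p_mother                               # P(mother transmits)
    p_child = 1.0 - (1.0 - t_father) * (1.0 - t_mother)     # P(child is a carrier)
    if p_child == 0.0:
        raise ValueError("the child cannot be a carrier under these probabilities")
    return t_father / p_child


if __name__ == "__main__":
    print(paternal_origin_probability(1.0, 0.0))   # father is a certain carrier, mother is not
    print(paternal_origin_probability(0.5, 0.5))   # symmetric uncertainty
```

For a rare disease, the case where both parents are carriers is negligible, so the paternal and maternal probabilities then sum to approximately one.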
Despite many technological advances in malaria parasite detection (e.g., high-resolution image acquisition), microscopic reading of thick blood smears (TBS) remains the gold standard. Although it is available in low-technology settings, TBS microscopy is slow and time-consuming. Moreover, microscopy is prone to errors at many levels and lacks quality control.
An electronic extension of the mechanical tally counter is proposed. In addition to the counts, it records the counting process itself, based on the time elapsed between two successive presses of the counting button, which yields a timed tally counter (TTC). The microscopist follows a specific reading instruction: in each high-power field (HPF), count leukocytes first and then parasites. The timestamps of all button presses are recorded along with the type of count. The data are stored internally in CSV format and can be exported. HPF locations and leukocyte/parasite counts per HPF are detected with a hidden semi-Markov model (with outliers), which accounts both for the known distribution of leukocytes per HPF (a negative binomial distribution) and for the pauses and hesitations of the microscopist during reading. Parameters are estimated with the expectation-maximization algorithm, hyperparameters are calibrated using expert annotations, and forward/backward recursions are used to obtain the HPF locations.
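The raw signal the TTC works from can be sketched in Python. The export format assumed here (columns `timestamp` in seconds and `type` equal to `L` or `P`) is hypothetical, and the segmentation rule is deliberately naive: it starts a new HPF whenever a leukocyte press follows a parasite press. It is not the hidden semi-Markov model; it only shows the kind of data the model segments and why a simple rule is not enough.

```python
import csv
import io


def read_presses(csv_text):
    """Parse a hypothetical TTC export with columns 'timestamp' (s) and 'type' ('L' or 'P')."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(float(r["timestamp"]), r["type"].strip()) for r in rows]


def inter_press_intervals(presses):
    """Time elapsed between successive button presses."""
    return [t2 - t1 for (t1, _), (t2, _) in zip(presses, presses[1:])]


def naive_hpf_counts(presses):
    """Naive baseline: a new HPF starts whenever a leukocyte press follows a parasite press.

    This rule cannot detect HPFs without parasites, nor pauses and hesitations,
    which is what a hidden semi-Markov model can account for.
    Returns a list of (leukocytes, parasites) per detected HPF.
    """
    counts, prev = [], None
    for _, kind in presses:
        if not counts or (kind == "L" and prev == "P"):
            counts.append([0, 0])
        counts[-1][0 if kind == "L" else 1] += 1
        prev = kind
    return [tuple(c) for c in counts]


if __name__ == "__main__":
    log = "timestamp,type\n0.0,L\n0.4,L\n0.9,L\n2.1,P\n2.6,P\n6.0,L\n6.5,L\n7.8,P\n"
    presses = read_presses(log)
    print(naive_hpf_counts(presses))
    print(inter_press_intervals(presses))
```

On this toy log the long gap between 2.6 s and 6.0 s coincides with the change of field; in real readings, such gaps also occur within a field (hesitations), which is why timing alone is modeled probabilistically.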
This approach provides richer data at no extra cost. It has been demonstrated that the method can derive parasites per HPF, leukocytes per HPF, and the parasite/leukocyte ratio with robust non-parametric confidence intervals. Moreover, direct digital data entry makes the process less expensive and removes time-consuming, error-prone manual data entry. Lastly, the TTC makes it possible to detect protocol breaches during reading and guards against fraud.
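One common way to obtain a non-parametric confidence interval for the parasite/leukocyte ratio is a percentile bootstrap that resamples HPFs. The source does not state which procedure was used, so the Python sketch below is an assumption; the per-HPF counts would come from the segmentation step.

```python
import numpy as np


def ratio_bootstrap_ci(leuko, para, n_boot=2000, level=0.95, seed=0):
    """Parasite/leukocyte ratio with a percentile bootstrap CI obtained by resampling HPFs.

    `leuko` and `para` are per-HPF counts. Assumes at least one leukocyte per resample.
    """
    leuko = np.asarray(leuko, dtype=float)
    para = np.asarray(para, dtype=float)
    ratio = para.sum() / leuko.sum()
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(leuko), size=(n_boot, len(leuko)))  # resampled HPF indices
    boot = para[idx].sum(axis=1) / leuko[idx].sum(axis=1)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(boot, [alpha, 1.0 - alpha])
    return float(ratio), float(lo), float(hi)


if __name__ == "__main__":
    print(ratio_bootstrap_ci([8, 10, 7, 12, 9, 11, 6, 10], [3, 5, 2, 6, 4, 4, 1, 5]))
```

Resampling whole HPFs, rather than individual presses, keeps the within-field dependence between leukocyte and parasite counts.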
Introducing a programmed digital device into the data acquisition of TBS reading makes it easy to develop new (possibly adaptive) reading protocols that the reader can follow easily, since they are embedded directly in the device. With the TTC, the reader only has to read HPFs, counting leukocytes first and parasites second, and the counter beeps when the protocol is completed.
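An embedded, possibly adaptive, stopping rule can be sketched in a few lines of Python. The rule and all thresholds below (200 leukocytes, extended to 500 when fewer than 100 parasites have been seen) are hypothetical parameters chosen for illustration, not values taken from the source; the point is only that the device, not the reader, decides when to beep.

```python
def protocol_completed(n_leukocytes, n_parasites,
                       first_target=200, low_parasite_cutoff=100, extended_target=500):
    """Hypothetical adaptive stopping rule (all thresholds are illustrative assumptions).

    Stop at `first_target` leukocytes if at least `low_parasite_cutoff` parasites were
    counted; otherwise keep reading until `extended_target` leukocytes.
    """
    if n_leukocytes >= extended_target:
        return True
    return n_leukocytes >= first_target and n_parasites >= low_parasite_cutoff


class TallyCounterSketch:
    """Toy device loop: counts presses and 'beeps' (returns True) once the rule is met."""

    def __init__(self, rule=protocol_completed):
        self.rule = rule
        self.counts = {"L": 0, "P": 0}

    def press(self, kind):
        if kind not in self.counts:
            raise ValueError(f"unknown button {kind!r}")
        self.counts[kind] += 1
        return self.rule(self.counts["L"], self.counts["P"])
```

Changing the protocol then amounts to replacing the `rule` function loaded on the device, without retraining the reader.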
Penalized selection criteria like AIC or BIC are among the most popular methods for variable selection. Their theoretical properties have been studied intensively and are well understood, but using them with high-dimensional data is difficult because of the non-convex optimization problem induced by L0 penalties. In this paper we introduce an adaptive ridge procedure (AR), in which iteratively weighted ridge problems are solved and the weights are updated so that the procedure converges toward selection with L0 penalties. After introducing AR, we study its specific shrinkage properties in the particular case of orthogonal linear regression. Based on extensive simulations for the non-orthogonal case as well as for Poisson regression, the performance of AR is studied and compared with SCAD and the adaptive LASSO. Furthermore, an efficient implementation of AR in the context of least-squares segmentation is presented. The paper ends with an illustrative example of applying AR to analyze GWAS data.
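The iteratively weighted ridge idea can be sketched for least squares in Python. We assume weights of the form w_j = 1/(β_j² + δ²), computed from the previous iterate, so that λ·w_j·β_j² is close to λ for non-zero coefficients and close to 0 otherwise; the penalty then approximates λ times the number of selected variables. With Gaussian noise of known variance σ², λ = σ² log(n) mimics BIC and λ = 2σ² mimics AIC. The values of δ, the tolerance, and the 0.99 selection threshold are illustrative choices.

```python
import numpy as np


def adaptive_ridge(X, y, lam, delta=1e-5, max_iter=1000, tol=1e-10):
    """Adaptive ridge sketch for least squares, approximating an L0 penalty of size `lam`.

    Iterates  beta = argmin ||y - X beta||^2 + lam * sum_j w_j * beta_j^2
    with weights w_j = 1 / (beta_j^2 + delta^2) taken from the previous iterate.
    Returns the final coefficients and the indices of the selected variables.
    """
    XtX = X.T @ X
    Xty = X.T @ y
    w = np.ones(X.shape[1])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        new_beta = np.linalg.solve(XtX + lam * np.diag(w), Xty)  # weighted ridge step
        w = 1.0 / (new_beta ** 2 + delta ** 2)                   # weight update
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    # w_j * beta_j^2 is close to 1 for retained coefficients and close to 0 otherwise
    selected = np.flatnonzero(w * beta ** 2 > 0.99)
    return beta, selected


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, p = 200, 10
    X = rng.standard_normal((n, p))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)   # noise variance 1
    beta, selected = adaptive_ridge(X, y, lam=np.log(n))          # BIC-like penalty
    print(selected, np.round(beta, 3))
```

Each iteration only requires solving a ridge (linear) system, which is what makes the approach usable where a direct combinatorial L0 search is not.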
In genetic diseases with variable age of onset, estimating the survival function of mutation carriers, as well as the effects of modifying factors, is essential for individual risk assessment, both for managing mutation carriers and for prevention strategies. In practice, this survival function is classically estimated from pedigree data in which most genotypes are unobserved. In this article, we present a unifying expectation-maximization (EM) framework that combines probabilistic computations in Bayesian networks with standard statistical survival procedures to provide mutation carrier survival estimates. The proposed approach yields previously published parametric estimates (e.g., Weibull survival) as particular cases, as well as more general non-parametric Kaplan-Meier estimates, which is the main contribution. Covariates can also be taken into account using a proportional hazards model. The whole methodology is validated on simulated data and applied to family samples with transthyretin-related hereditary amyloidosis (a rare autosomal dominant disease with highly variable age of onset), showing very promising results.
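A natural building block for a non-parametric estimate in such an EM scheme is a weighted Kaplan-Meier estimator, in which each individual contributes with a weight. In the sketch below the weights are assumed to be posterior carrier probabilities (which, in the framework described here, would come from the Bayesian network computations); how exactly they enter the authors' M-step is not reproduced.

```python
from collections import defaultdict


def weighted_kaplan_meier(times, events, weights):
    """Kaplan-Meier estimate in which individual i contributes with weight w_i.

    times: age at onset or censoring; events: 1 if onset observed, 0 if censored;
    weights: e.g. posterior probabilities of being a mutation carrier (assumption).
    Returns a list of (event time, survival) pairs.
    """
    event_w = defaultdict(float)            # weighted number of events at each time
    for t, e, w in zip(times, events, weights):
        if e:
            event_w[t] += w
    surv, curve = 1.0, []
    for t in sorted(event_w):
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)  # weighted risk set
        if at_risk > 0:
            surv *= 1.0 - event_w[t] / at_risk
        curve.append((t, surv))
    return curve
```

With all weights equal to 1 this reduces to the usual Kaplan-Meier estimator; individuals who are almost surely non-carriers barely influence the curve.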
Children born to mothers with placental malaria (PM) have been described as more susceptible to a first malaria infection. However, whether these children remain at higher risk during infancy has never been explored. We aimed to determine whether children born to mothers with PM are more susceptible to malaria and remain at higher risk between birth and 18 months.
Five hundred fifty children were followed up weekly with temperature checks; if the temperature was >37.5°C, both a rapid diagnostic test for malaria and a thick blood smear were performed. Taking into account the environmental risk of infection, the relationship between PM and the occurrence of malaria attacks from birth to 18 months was modeled using Cox models for recurrent events.
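Cox models for recurrent events are usually fitted on data in counting-process format, with one (start, stop] interval per child and per episode. The Python sketch below builds such rows for one child; the episode rank lets first and subsequent attacks be analyzed separately, matching the first-versus-subsequent comparison reported in this study. Which recurrent-event formulation the authors used is not stated here, and the sketch assumes time-fixed covariates, times in days, and no post-treatment period during which the child is not at risk.

```python
def to_counting_process(follow_up_end, attack_times, covariates):
    """Convert one child's malaria attacks into (start, stop, event, episode) rows.

    follow_up_end: end of follow-up (e.g. 18 months, in days; assumed unit)
    attack_times: times of malaria attacks; covariates: dict of time-fixed covariates.
    """
    rows, start = [], 0.0
    for k, t in enumerate(sorted(attack_times), start=1):
        rows.append({"start": start, "stop": t, "event": 1, "episode": k, **covariates})
        start = t
    if start < follow_up_end:  # censored interval after the last attack
        rows.append({"start": start, "stop": follow_up_end, "event": 0,
                     "episode": len(attack_times) + 1, **covariates})
    return rows


if __name__ == "__main__":
    for row in to_counting_process(540, [120, 300], {"pm": 1}):
        print(row)
```

Stratifying the Cox model by `episode` gives each attack rank its own baseline hazard, so the PM effect can be estimated separately for the first attack and for later ones.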
PM is not associated with an overall susceptibility to malaria but only with the time to the first malaria attack. Children born to mothers with PM tend to have an increased risk of a first malaria attack (hazard ratio [HR] = 1.33; P = .048) but not of subsequent ones (HR = 0.9; P = .46). Children who experienced 1 malaria attack were at strongly increased risk of developing subsequent infections, independent of placental infection and environmental exposure.
These results are consistent with the existence of an individual susceptibility to malaria unrelated to PM. From a public health point of view, protecting children born to mothers with an infected placenta remains a priority but seems insufficient, as it does not reach other frail children, for whom a biomarker of frailty still needs to be found.
Causal network inference is an important methodological challenge in biology as well as other areas of application. Although several causal network inference methods have been proposed in recent years, they are typically applicable to only a small number of genes, due to the large number of parameters to be estimated and the limited number of biological replicates available. In this work, we consider the specific case of transcriptomic studies made up of both observational and interventional data in which a single gene of biological interest is knocked out. We focus on a marginal causal estimation approach, based on the framework of Gaussian directed acyclic graphs, to infer causal relationships between the knocked-out gene and a large set of other genes. In a simulation study, we found that our proposed method accurately differentiates between downstream causal relationships and those that are upstream or simply associative. It also enables estimation of the total causal effects between the gene of interest and the remaining genes. Our method performed very similarly to a classical differential analysis for experiments with a relatively large number of biological replicates, but has the advantage of providing a formal causal interpretation. Our proposed marginal causal approach is computationally efficient and may be applied to several thousands of genes simultaneously. In addition, it may help highlight subsets of genes of interest for a more thorough subsequent causal network inference. The method is implemented in an R package called MarginalCausality (available on GitHub).
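Why a knockout separates downstream from upstream genes can be seen with a simple moment identity. Assume a linear Gaussian structural model and a perfect knockout that sets the target gene X to 0. Then E[Y | do(X = 0)] = E[Y] − β·E[X], where β is the total causal effect of X on Y, so β can be estimated from the observational and knockout means. The Python sketch below uses this identity; it is a simplified illustration under those assumptions, not the estimator implemented in MarginalCausality.

```python
import numpy as np


def knockout_total_effects(obs, ko, target):
    """Moment-based total causal effect of gene `target` on every gene.

    Assumes a linear Gaussian structural model and a perfect knockout (X set to 0):
        beta_XY = (mean_obs(Y) - mean_ko(Y)) / mean_obs(X)
    obs, ko: (samples x genes) expression arrays, observational and knockout.
    """
    mu_x = obs[:, target].mean()
    return (obs.mean(axis=0) - ko.mean(axis=0)) / mu_x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    # Observational data for the chain Z -> X -> Y (Z upstream, Y downstream of X)
    z = 5 + rng.standard_normal(n)
    x = 2 + 0.8 * z + rng.standard_normal(n)
    y = 1 + 1.5 * x + rng.standard_normal(n)
    obs = np.column_stack([z, x, y])
    # Knockout data: X forced to 0, Z unaffected, Y responds
    z_k = 5 + rng.standard_normal(n)
    x_k = np.zeros(n)
    y_k = 1 + 1.5 * x_k + rng.standard_normal(n)
    ko = np.column_stack([z_k, x_k, y_k])
    print(knockout_total_effects(obs, ko, target=1))  # approx [0, 1, 1.5]
```

Z is strongly correlated with X in the observational data, yet its estimated causal effect is close to zero, which is the upstream-versus-downstream distinction that a purely associative analysis cannot make.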
High-throughput post-genomic studies are now a routine and promising approach in biological and biomedical research. The main statistical approach for selecting genes differentially expressed between two groups is the t-test, which has been criticized in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context, and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To address this question, we compare eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison relies on four steps (gene list analysis, simulations, spike-in data and resampling) to formulate comprehensive and robust conclusions about test performance in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. Moreover, two tests (limma and VarMixt) show significant improvement over the t-test, in particular for small sample sizes. In addition, limma presents several practical advantages, so we advocate its use for analyzing gene expression data.
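The variance modeling idea behind limma's advantage with small samples can be sketched as a moderated t-statistic: each gene's pooled variance is shrunk toward a common prior value, which stabilizes the denominator when only a few replicates are available. In this Python sketch the prior degrees of freedom `d0` and prior variance `s0_sq` are fixed, hypothetical values; limma estimates them from all genes, and its full procedure is not reproduced here.

```python
import numpy as np


def moderated_t(group1, group2, d0=4.0, s0_sq=0.05):
    """Per-gene two-sample t-statistics with limma-style variance shrinkage.

    group1, group2: (genes x samples) arrays. The pooled variance s_g^2, with
    d = n1 + n2 - 2 degrees of freedom, is shrunk toward s0_sq with prior weight d0:
        s_tilde^2 = (d0 * s0_sq + d * s_g^2) / (d0 + d)
    With d0 = 0 this is the ordinary pooled-variance t-test.
    Returns (t statistics, degrees of freedom).
    """
    n1, n2 = group1.shape[1], group2.shape[1]
    d = n1 + n2 - 2
    diff = group1.mean(axis=1) - group2.mean(axis=1)
    s2 = ((n1 - 1) * group1.var(axis=1, ddof=1) + (n2 - 1) * group2.var(axis=1, ddof=1)) / d
    s2_tilde = (d0 * s0_sq + d * s2) / (d0 + d)
    t = diff / np.sqrt(s2_tilde * (1.0 / n1 + 1.0 / n2))
    return t, d0 + d
```

With three replicates per group, a gene whose sample variance happens to be tiny gets an inflated ordinary t-statistic; shrinkage toward `s0_sq` damps such spurious hits, at the cost of assuming that genes share information about their variances.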
Polymerase proofreading-associated polyposis is a dominantly inherited colorectal cancer syndrome caused by exonuclease domain missense variants in the DNA polymerases POLE and POLD1. Manifestations may also include malignancies at extracolonic sites. Cancer risks in this syndrome are not yet accurately quantified.
We sequenced the POLE and POLD1 exonuclease domains in 354 individuals with early-onset/familial colorectal cancer (CRC) or adenomatous polyposis. We assessed the pathogenicity of POLE variants with yeast fluctuation assays and structural modeling. We estimated the penetrance function for each cancer site in variant carriers with a previously published nonparametric method based on a survival analysis approach that can handle unknown genotypes.
Pathogenic POLE exonuclease domain variants P286L, M294R, P324L, N363K, D368N, L424V, K425R, and P436S were found in ten families. The estimated cumulative risk of CRC at 30, 50, and 70 years was 11.1% (95% confidence interval [CI]: 4.2-17.5), 48.5% (33.2-60.3), and 74% (51.6-86.1), respectively. The cumulative risk of glioblastoma was 18.7% (3.2-25.8) at 70 years. Variants interfering with DNA binding (P286L and N363K) had a significantly higher mutagenic effect than variants disrupting metal ion coordination at the exonuclease site.
The risk estimates derived from this study provide a rational basis on which to provide genetic counseling to POLE variant carriers.