In this paper, we describe SAFlex (Structural Alphabet Flexibility), an extension of an existing structural alphabet (HMM-SA), designed to better explore the increasing amount of protein three-dimensional structure information by encoding protein conformations in the presence of missing residues or uncertainties. An SA aims to reduce the complexity of protein three-dimensional conformations, and of their analysis and comparison, by simplifying any conformation into a series of structural letters. Our methodology presents several novelties. Firstly, it can account for encoding uncertainty by providing a wide range of encoding options: the maximum a posteriori encoding, the marginal posterior distribution, and the effective number of letters at each position. Secondly, our new algorithm handles missing data in protein structure files (which affect more than 75% of the proteins in the Protein Data Bank) within a rigorous probabilistic framework. Thirdly, SAFlex can encode, and build a consensus encoding from, different replicates of a single protein, such as several homomer chains. This makes it possible to localize structural differences between chains and to detect structural variability, which is essential for identifying protein flexibility. These improvements are illustrated on different proteins, such as the crystal structure of a eukaryotic small heat shock protein. They are promising for exploring the increasing redundancy of protein structure data and for obtaining useful quantifications of protein flexibility.
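As a hedged illustration of the encoding options listed above, the sketch below summarizes a per-position posterior distribution over structural letters. It assumes the posterior matrix is already available (computing it from atomic coordinates is not shown), takes the per-position maximum of the marginal posterior (which need not coincide with the jointly most probable letter sequence), and defines the effective number of letters as the exponential of the Shannon entropy, a common convention that may differ from SAFlex's exact definition. The replicate consensus is a plain average of posteriors across chains, again an assumption. The function names and the three-letter alphabet are hypothetical; HMM-SA itself uses a larger alphabet.

```python
import math

def summarize_posterior(posterior, letters):
    """posterior[i][j]: marginal probability of letter j at position i (hypothetical input)."""
    best_letters = []
    effective = []
    for probs in posterior:
        # letter with the highest marginal posterior probability (ties -> first letter)
        best = max(range(len(letters)), key=lambda j: probs[j])
        best_letters.append(letters[best])
        # effective number of letters = exp(Shannon entropy), an assumed definition
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        effective.append(math.exp(entropy))
    return "".join(best_letters), effective

def consensus_posterior(replicates):
    """Average per-position posteriors over replicate chains (simple mean, an assumption)."""
    n_rep = len(replicates)
    return [
        [sum(rep[i][j] for rep in replicates) / n_rep for j in range(len(replicates[0][i]))]
        for i in range(len(replicates[0]))
    ]

post = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3]]
print(summarize_posterior(post, "ABC"))
```

A position where two chains are encoded with different letters ends up, after averaging, with an effective number of letters close to 2, which is one simple way a consensus can flag structural variability.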
Analysis of Algorithms
In this paper we consider the distribution of a pattern of interest in a binary random (d, k)-sequence generated by a Markov source. Such constrained sequences are frequently encountered in communication systems. Unlike the previous approach based on generating functions, we have chosen here to use Markov chain embedding techniques. By doing so, we obtain both previous results (sequence constrained up to the rth occurrence) and new ones (sequence constrained up to its end). In both cases, we also provide efficient algorithms that use only basic linear algebra. We then compare our numerical results to previous ones and finally propose several interesting extensions of our method that further illustrate the usefulness of our approach: higher-order Markov chains, renewal rather than overlapping occurrences, heterogeneous models, more complex patterns of interest, and multistate trial sequences.
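The sketch below illustrates the Markov chain embedding idea on a simpler target than the full pattern distribution: the probability that a binary sequence of length n, generated by a first-order Markov source, satisfies a (d, k) constraint up to its end. It assumes one common convention (every run of zeros has length at most k, and every run of zeros between two 1s has length at least d); the paper's exact boundary conventions may differ. The embedded states record whether a 1 has been seen and the length of the current run of zeros, so each step amounts to a sparse vector-matrix product. Function names are hypothetical, and the brute-force enumeration is included only to cross-check small cases.

```python
from itertools import product

def dk_probability(n, d, k, init, trans):
    """P(X_1..X_n is a (d, k)-sequence) for a binary first-order Markov source.

    init[a] = P(X_1 = a); trans[a][b] = P(X_{t+1} = b | X_t = a).
    Embedded states: ("lead", z) = z zeros and no 1 yet, ("one", 0) = last symbol is 1,
    ("gap", z) = z zeros since the last 1.
    """
    dist = {("one", 0): init[1]}
    if k >= 1:
        dist[("lead", 1)] = init[0]
    for _ in range(n - 1):
        new = {}
        for (kind, z), p in dist.items():
            last = 1 if kind == "one" else 0
            # emit 0: extend the current zero run; the path dies if the run exceeds k
            if z + 1 <= k:
                key = ("lead" if kind == "lead" else "gap", z + 1)
                new[key] = new.get(key, 0.0) + p * trans[last][0]
            # emit 1: always allowed after leading zeros, otherwise the gap must be >= d
            if kind == "lead" or z >= d:
                new[("one", 0)] = new.get(("one", 0), 0.0) + p * trans[last][1]
        dist = new
    return sum(dist.values())

def is_dk(seq, d, k):
    """Direct check of the assumed (d, k) convention on one sequence."""
    ones = [i for i, x in enumerate(seq) if x == 1]
    if not ones:
        return len(seq) <= k
    gaps = [b - a - 1 for a, b in zip(ones, ones[1:])]
    leading, trailing = ones[0], len(seq) - 1 - ones[-1]
    return leading <= k and trailing <= k and all(d <= g <= k for g in gaps)

def dk_probability_brute(n, d, k, init, trans):
    """Exponential-time enumeration, only for cross-checking small n."""
    total = 0.0
    for seq in product((0, 1), repeat=n):
        if is_dk(seq, d, k):
            p = init[seq[0]]
            for a, b in zip(seq, seq[1:]):
                p *= trans[a][b]
            total += p
    return total
```

For a fair i.i.d. source with n = 3 and d = k = 1, only 010 and 101 are admissible, so both functions return 0.25; the embedded version grows linearly in n instead of exponentially.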
Estimating parameters in genetic diseases requires efficient algorithms to compute the likelihood of genetic models in pedigrees. The Elston-Stewart algorithm computes pedigree likelihoods with a complexity of $O(n \times g^{tw})$, where $n$ is the number of individuals, $g$ is the number of genotypes, and $tw$ is the tree-width of the pedigree ($tw = 3$ to $5$ for standard families). Computing the first and second derivatives of the likelihood function is of great interest, both to maximise the likelihood more efficiently and to obtain confidence intervals on parameters. These derivatives can be computed numerically, but this approach is slow and can lead to unstable computations. In this work, we present an extension of the Elston-Stewart algorithm that combines Mendelian laws and polynomial arithmetic to obtain exact derivatives of the likelihood function. For a univariate model (one parameter to estimate), our algorithm computes derivatives up to the $d$th order with a multiplicative complexity factor of $(d+1)(d+2)/2 = O(d^2)$ (3 for $d = 1$, 6 for $d = 2$). For a multivariate model with $p$ parameters, we obtain the likelihood, the gradient, and the Hessian with a complexity factor of $O(p^2)$. We illustrate the interest of our algorithm with two classical models in genetic epidemiology: 1) in segregation analysis, order-2 multivariate likelihood derivatives make it possible to compute joint confidence intervals for penetrance parameters and disease allele frequencies; 2) for two-point linkage, we establish the distribution of recombination rate estimates and derive pointwise confidence intervals for LOD scores.
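The polynomial-arithmetic idea can be sketched with truncated Taylor polynomials. If every quantity in the likelihood recursion is stored as the coefficients c_0, ..., c_d of f(θ0 + ε) truncated at ε^d, then sums stay cheap and one product costs 1 + 2 + ... + (d + 1) = (d+1)(d+2)/2 scalar multiplications, which matches the multiplicative factor quoted above; the k-th derivative is then k! c_k. This is a minimal univariate sketch under that assumption, with a hypothetical `Taylor` class and the toy function p²(1 − p) standing in for a pedigree likelihood. It does not implement the Elston-Stewart peeling itself.

```python
from math import factorial

class Taylor:
    """Truncated power series f(theta0 + e) = c[0] + c[1] e + ... + c[d] e^d."""
    mults = 0  # counts scalar coefficient multiplications, for illustration only

    def __init__(self, coeffs):
        self.c = list(coeffs)

    @classmethod
    def variable(cls, value, d):
        # the parameter itself, theta0 + e (requires d >= 1)
        if d < 1:
            raise ValueError("d must be at least 1")
        return cls([value, 1.0] + [0.0] * (d - 1))

    @classmethod
    def constant(cls, value, d):
        return cls([value] + [0.0] * d)

    def __add__(self, other):
        return Taylor([a + b for a, b in zip(self.c, other.c)])

    def __sub__(self, other):
        return Taylor([a - b for a, b in zip(self.c, other.c)])

    def __mul__(self, other):
        # Cauchy product truncated at degree d: (d+1)(d+2)/2 multiplications
        d = len(self.c) - 1
        out = [0.0] * (d + 1)
        for k in range(d + 1):
            for i in range(k + 1):
                out[k] += self.c[i] * other.c[k - i]
                Taylor.mults += 1
        return Taylor(out)

    def derivative(self, k):
        # k-th derivative at theta0
        return self.c[k] * factorial(k)

# toy "likelihood" p^2 (1 - p), expanded around p = 0.5 up to order 2
p = Taylor.variable(0.5, 2)
one = Taylor.constant(1.0, 2)
f = p * p * (one - p)
print(f.derivative(0), f.derivative(1), f.derivative(2))
```

A single evaluation thus returns the value together with exact first and second derivatives, with no finite-difference step to tune.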
Although bariatric surgery is proven to sustain weight loss in morbidly obese patients, its long-term adverse effects have yet to be fully characterized. This study compared the long-term consequences of two common forms of bariatric surgery, one-anastomosis gastric bypass (OAGB) and Roux-en-Y gastric bypass (RYGB), in a preclinical rat model. We evaluated the influence of biliopancreatic limb (BPL) length, malabsorption, and bile acid (BA) reflux on the esogastric mucosa. After 30 weeks of follow-up, Wistar rats that underwent RYGB, OAGB with a short BPL (15 cm, OAGB-15), or OAGB with a long BPL (35 cm, OAGB-35), as well as unoperated rats, exhibited no cases of esogastric cancer, metaplasia, dysplasia, or Barrett's esophagus. Compared to RYGB rats, OAGB-35 rats presented higher rates of esophagitis, fundic gastritis, and perianastomotic foveolar hyperplasia. OAGB-35 rats also showed the greatest weight loss and malabsorption. In contrast, BA concentrations were highest in the residual gastric pouch of OAGB-15 rats. Yet no association could be established between the esogastric lesions and malabsorption, weight loss, or gastric bile acid concentrations. In conclusion, in this preclinical rat model, RYGB results in a better long-term outcome than OAGB, as chronic signs of biliary reflux or reactive gastritis were observed after OAGB even when the BPL length was reduced.
In this paper, we provide the ML (Maximum Likelihood) and REML (REstricted ML) criteria for consistently estimating multivariate linear mixed-effects models with an arbitrary correlation structure between the random effects across dimensions, but independent (and possibly heteroscedastic) residuals. By factorizing the random-effects covariance matrix, we provide an explicit expression of the profiled deviance through a reparameterization of the model. This strategy can be viewed as a generalization of the estimation procedure used by Douglas Bates and his co-authors for fitting one-dimensional linear mixed-effects models. Besides its robustness to the choice of starting points, the approach yields a numerically consistent estimate of the random-effects covariance matrix, whereas classical alternatives such as the EM algorithm are usually not consistent. In a simulation study, we compare the estimates obtained with the present method to EM-based estimates. Finally, we apply the method to a study of the immune response to malaria in Benin.
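The abstract above relies on factorizing the random-effects covariance matrix. A common way to do this, assumed here and not confirmed by the abstract, is a lower-triangular Cholesky factor L with a log-transformed diagonal, so that any unconstrained parameter vector θ maps to a positive-definite Σ = L Lᵀ and an optimizer can search over θ freely. The helper below is hypothetical and shows only this mapping, not the profiled deviance itself.

```python
import math

def covariance_from_theta(theta, q):
    """Map an unconstrained vector theta of length q(q+1)/2 to a q x q covariance matrix.

    Entries of theta fill the lower triangle of L row by row; diagonal entries are
    exponentiated so that L has a strictly positive diagonal and Sigma = L L^T is
    positive definite.
    """
    if len(theta) != q * (q + 1) // 2:
        raise ValueError("theta must have q(q+1)/2 entries")
    L = [[0.0] * q for _ in range(q)]
    idx = 0
    for i in range(q):
        for j in range(i + 1):
            L[i][j] = math.exp(theta[idx]) if i == j else theta[idx]
            idx += 1
    # Sigma = L L^T
    return [[sum(L[i][m] * L[j][m] for m in range(q)) for j in range(q)] for i in range(q)]

print(covariance_from_theta([0.0, 0.5, 0.0], 2))
```

With q = 2 and θ = (0, 0.5, 0), the factor is L = [[1, 0], [0.5, 1]], giving Σ = [[1, 0.5], [0.5, 1.25]].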
The mechanisms underlying the heterogeneity of clinical malaria remain largely unknown. We hypothesized that differential gene expression contributes to phenotypic variation of parasites, which results in a specific interaction with the host leading to different clinical features of malaria. In this study, we analyzed the transcriptomes of isolates obtained from asymptomatic carriers and from patients with uncomplicated or cerebral malaria. We also investigated the transcriptomes of the 3D7 clone and of 3D7-Lib, which expresses a severe-malaria-associated variant surface antigen. Our findings revealed a specific up-regulation of genes involved in pathogenesis, adhesion to host cells, and erythrocyte aggregation in parasites from patients with cerebral malaria and in 3D7-Lib, compared to parasites from asymptomatic carriers and to 3D7, respectively. However, we did not find any significant difference between the transcriptomes of parasites from cerebral malaria and uncomplicated malaria, suggesting a similar transcriptomic pattern in these two parasite populations. The differences between isolates from asymptomatic children and from cerebral malaria concerned genes coding for exported proteins, Maurer's cleft proteins, transcription factors, proteins involved in protein transport, as well as Plasmodium conserved and hypothetical proteins. Interestingly, var genes with UPs A1, A2, A3 and B1 were predominantly found in cerebral malaria-associated isolates, while var genes containing the DC4, DC5 and DC13 architectural domains, together with their neighboring rif genes, were predominantly found in 3D7-Lib. Further investigations are therefore needed to analyze the actual role of these genes during malaria infection and to provide new knowledge on malaria pathology. In addition, the concomitant regulation of genes within the same chromosomal neighborhood suggests a common mechanism of gene regulation in P. falciparum.
In epidemiological or demographic studies with variable age at onset, a typical quantity of interest is the incidence of a disease (for example, cancer incidence). In these studies, individuals are usually highly heterogeneous in terms of date of birth (the cohort) and calendar time (the period), and appropriate estimation methods are needed. In this article, we present a new estimation method that extends classical age-period-cohort analysis by allowing interactions between age, period, and cohort effects. We introduce a bidimensional regularized estimate of the hazard rate, obtained by adding a penalty to the likelihood of the model. This penalty can be designed either to smooth the hazard rate or to force consecutive values of the hazard to be equal, leading to a parsimonious representation of the hazard rate. In the latter case, we use an iterative penalized likelihood scheme to approximate the $L_0$ norm, which makes the computation tractable. The method is evaluated on simulated data and applied to breast cancer survival data from the SEER program.
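To make the idea of forcing consecutive values to be equal concrete, here is a one-dimensional sketch of an iteratively reweighted (adaptive ridge) penalty on successive differences, one known way of approximating an L0-type penalty by a sequence of ridge problems. Several simplifications are assumptions: a Gaussian least-squares fit to a noisy signal replaces the hazard likelihood, a single axis replaces the bidimensional age-by-period grid, and the names `adaptive_ridge_1d`, `lam`, and `eps` are hypothetical. Each iteration solves a tridiagonal system, and the weights 1/((θ_{j+1} − θ_j)² + eps²) drive small differences to zero while leaving large jumps almost unpenalized.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; row i reads lower[i-1]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i]."""
    n = len(diag)
    c = [0.0] * n
    g = [0.0] * n
    for i in range(n):
        denom = diag[i] - (lower[i - 1] * c[i - 1] if i > 0 else 0.0)
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        g[i] = (rhs[i] - (lower[i - 1] * g[i - 1] if i > 0 else 0.0)) / denom
    x = [0.0] * n
    x[-1] = g[-1]
    for i in range(n - 2, -1, -1):
        x[i] = g[i] - c[i] * x[i + 1]
    return x

def adaptive_ridge_1d(y, lam, eps=1e-4, n_iter=50):
    """Minimize sum (y - theta)^2 + lam * sum w_j (theta_{j+1} - theta_j)^2, reweighting w each pass."""
    n = len(y)
    theta = list(y)
    for _ in range(n_iter):
        # adaptive weights: huge for near-zero differences, small for genuine jumps
        w = [1.0 / ((theta[j + 1] - theta[j]) ** 2 + eps ** 2) for j in range(n - 1)]
        # normal equations (I + lam * D^T W D) theta = y, D = first-difference matrix
        diag = [1.0] * n
        for j in range(n - 1):
            diag[j] += lam * w[j]
            diag[j + 1] += lam * w[j]
        off = [-lam * wj for wj in w]
        theta = solve_tridiagonal(off, diag, off, y)
    return theta

def jump_positions(theta, tol=1e-3):
    """Indices j where theta changes between positions j and j + 1."""
    return [j for j in range(len(theta) - 1) if abs(theta[j + 1] - theta[j]) > tol]

theta = adaptive_ridge_1d([1.0, 1.0, 1.0, 5.0, 5.0, 5.0], lam=1.0)
print(jump_positions(theta))
```

On this step signal the fit keeps a single jump between positions 2 and 3, slightly shrunk, and is flat elsewhere, which is the parsimonious piecewise-constant behaviour described above.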
A sentinel network, Obépine, has been designed to monitor SARS-CoV-2 viral load in wastewaters arriving at wastewater treatment plants (WWTPs) in France as an indirect macro-epidemiological parameter. The sources of uncertainty in such a monitoring system are numerous, and the concentration measurements it provides are left-censored and contain outliers, which biases the results of usual smoothing methods. Adapted pre-processing is therefore needed to evaluate the real daily amount of virus arriving at each WWTP. We propose a method based on an auto-regressive model adapted to censored data with outliers. Inference and prediction are performed via a discretized smoother, which makes it a very flexible tool. The method is validated both on simulations and on real data from Obépine. The resulting smoothed signal shows a good correlation with other epidemiological indicators and is currently used by Obépine to provide an estimate of virus circulation over the watersheds corresponding to about 200 WWTPs.
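The sketch below illustrates what a discretized smoother for a censored auto-regressive signal can look like. The latent log-concentration follows a Gaussian AR(1) process; each measurement is Gaussian around it, an outlier drawn uniformly over the grid range, missing, or left-censored below a detection limit; forward-backward recursions on a fixed grid then give posterior means. Every modelling choice here (AR(1) order, Gaussian noise, uniform outliers, and the parameter names `a`, `mu`, `sigma`, `tau`, `p_out`, `lo`, `hi`) is an assumption for illustration, not the model used by Obépine, and in practice the parameters would be estimated rather than fixed.

```python
import math

def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def smooth_censored_ar1(y, censored, a, mu, sigma, tau, p_out, lo, hi, k=150):
    """Posterior means of a latent AR(1) signal on a k-point grid over [lo, hi].

    y[t]: measurement (None if missing); censored[t]: True if y[t] is a detection
    limit and the true measurement is only known to lie below it.
    """
    grid = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    # discretized AR(1) transitions: x_t ~ N(a x_{t-1} + (1 - a) mu, sigma^2)
    trans = []
    for xi in grid:
        row = [normal_pdf(xj, a * xi + (1 - a) * mu, sigma) for xj in grid]
        total = sum(row)
        trans.append([r / total for r in row])

    def emission(t, x):
        if y[t] is None:
            return 1.0  # missing day: no information
        if censored[t]:
            # P(measurement < limit): clean Gaussian part plus uniform outlier part
            return (1 - p_out) * normal_cdf((y[t] - x) / tau) + p_out * (y[t] - lo) / (hi - lo)
        return (1 - p_out) * normal_pdf(y[t], x, tau) + p_out / (hi - lo)

    n = len(y)
    # forward pass, started from the stationary AR(1) distribution (needs |a| < 1)
    sd0 = sigma / math.sqrt(1 - a * a)
    pred = [normal_pdf(x, mu, sd0) for x in grid]
    alpha = []
    for t in range(n):
        if t > 0:
            prev = alpha[-1]
            pred = [sum(prev[i] * trans[i][j] for i in range(k)) for j in range(k)]
        cur = [pred[j] * emission(t, grid[j]) for j in range(k)]
        total = sum(cur)
        alpha.append([c / total for c in cur])
    # backward pass
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        e = [emission(t + 1, grid[j]) * beta[t + 1][j] for j in range(k)]
        b = [sum(trans[i][j] * e[j] for j in range(k)) for i in range(k)]
        total = sum(b)
        beta[t] = [v / total for v in b]
    # smoothed posterior mean at each time
    means = []
    for t in range(n):
        post = [alpha[t][j] * beta[t][j] for j in range(k)]
        total = sum(post)
        means.append(sum(p * x for p, x in zip(post, grid)) / total)
    return means
```

Because the outlier component gives every observation a floor likelihood, an isolated spike far from its neighbours is essentially ignored. A censored value only pushes the posterior below the detection limit instead of being replaced by an arbitrary constant.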
Hereditary transthyretin (ATTRv) amyloidosis is an autosomal dominant disease caused by a spectrum of mutations in the transthyretin (TTR) gene. ATTRV30M (p.Val50Met) is the most frequent substitution in Europe. Northern Sweden is a known cluster of ATTRV30M amyloidosis patients owing to the high prevalence of the mutation, including homozygous cases. First symptoms generally occur during the 6th decade. Previous studies reported low penetrance in this area and possible anticipation in families. To refine our knowledge of the genetic aspects, penetrance, and factors that influence the risk of disease, we performed a comprehensive study of ATTRV30M families in Sweden.
To assess anticipation, well-established ages at onset (AO) were compared in all informative parent-offspring pairs and in subgroups, after excluding ascertainment biases. Penetrance was estimated using a non-parametric method that makes it possible to study the effect of covariates on disease risk.
We analysed 114 Swedish ATTRV30M families, including 12 homozygous individuals. Among 131 parent-offspring pairs, we found an average anticipation of 11.7 years (standard deviation (SD) = 10.03), higher in case of maternal transmission (mean ± SD = 13.7 ± 8.4 years) than in case of paternal transmission (mean ± SD = 7.9 ± 11.5 years; p < .003). Anticipation remained significant after exclusion of ascertainment biases. In heterozygous ATTRV30M kindreds, penetrance was low, estimated below 10% (95% confidence interval (CI): 6-10) at 40 years of age and increasing to 71% (95% CI: 65-76) at 90 years. The risk was higher in male patients (p < .01) and in case of maternal transmission (p < .01), reflecting a parent-of-origin effect. We observed no difference in penetrance according to geographical origin. Finally, the disease risk was similar in heterozygous and homozygous ATTRV30M individuals.
Our study provides new data on the genetics of ATTRV30M families in Sweden, in particular on the occurrence of anticipation and on penetrance. Both are increased in case of maternal inheritance and in male patients. Overall, gender seems to be a factor that substantially modulates the AO of the disease in this area. Clinically, these findings are important to guide the management of sibships and the monitoring of mutation carriers.
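For readers unfamiliar with non-parametric penetrance estimation, the sketch below computes a plain Kaplan-Meier estimate of cumulative risk, 1 − S(t), from ages at onset (events) and ages at last follow-up (censored observations). This is only the textbook starting point and an assumption on our part. It ignores the ascertainment correction, family structure, and covariate effects that the method used in the study accounts for. The function names and the toy records are hypothetical.

```python
def km_penetrance(ages, onset):
    """Kaplan-Meier cumulative risk 1 - S(t) at each distinct onset age.

    ages[i]: age at onset if onset[i] is True, else age at last follow-up (censored).
    """
    curve = []
    surv = 1.0
    for t in sorted({a for a, e in zip(ages, onset) if e}):
        at_risk = sum(1 for a in ages if a >= t)  # still unaffected and followed at age t
        events = sum(1 for a, e in zip(ages, onset) if e and a == t)
        surv *= 1.0 - events / at_risk
        curve.append((t, 1.0 - surv))
    return curve

def penetrance_at(curve, age):
    """Step-function lookup: cumulative risk reached at or before `age`."""
    risk = 0.0
    for t, f in curve:
        if t > age:
            break
        risk = f
    return risk

curve = km_penetrance([40, 50, 50, 60, 70], [True, False, True, True, False])
print(penetrance_at(curve, 65))
```

Covariate effects such as sex or parent of origin could be explored crudely by computing one curve per subgroup, although a formal comparison would need a dedicated test.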
In bioinformatics, it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations, both for low- and high-complexity patterns, in the framework of homogeneous or heterogeneous Markov models.
The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also avoids any convolution product of the pattern distributions in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in complexity by reusing previous computations to obtain moment generating functions efficiently. In the particular case of low- or moderate-complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms, and their relative interest in comparison with existing ones, were then tested and discussed on a toy example and on three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation of concatenating all sequences of the data set into a single sequence.
Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures, for example). In addition, these exact algorithms avoid the edge effect observed under the single-sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model converges slowly toward the stationary distribution. We conclude with a discussion of our method and its potential improvements.
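Here is a simplified sketch of exact counting over a set of independent sequences. The pattern is a single word recognized by a deterministic automaton (built naively here), letters are i.i.d. (an order-0 homogeneous model, simpler than the Markov models treated in the paper), and the count distribution is propagated jointly with the automaton state. Restarting the automaton at each sequence boundary while keeping the running count is one way to avoid convolving per-sequence distributions, in the spirit of Algorithm 1. It is not the authors' implementation, and all names are hypothetical. The last lines contrast a two-sequence data set with its naive concatenation.

```python
def build_automaton(word, alphabet):
    """delta[(q, c)]: length of the longest prefix of `word` that is a suffix of word[:q] + c."""
    m = len(word)
    delta = {}
    for q in range(m + 1):
        for c in alphabet:
            s = word[:q] + c
            j = min(m, len(s))
            while j > 0 and not s.endswith(word[:j]):
                j -= 1
            delta[(q, c)] = j
    return delta

def count_distribution(word, probs, lengths):
    """Exact distribution of the total number of (overlapping) occurrences of `word`
    in independent i.i.d. sequences with the given lengths."""
    alphabet = list(probs)
    m = len(word)
    delta = build_automaton(word, alphabet)
    size = sum(lengths) + 1  # possible counts 0..total length
    dist = [1.0] + [0.0] * (size - 1)
    for n in lengths:
        # new independent sequence: automaton restarts in state 0, running count is kept
        by_state = {0: dist}
        for _ in range(n):
            nxt = {}
            for q, vec in by_state.items():
                for c in alphabet:
                    r = delta[(q, c)]
                    shift = 1 if r == m else 0  # reaching state m completes an occurrence
                    tgt = nxt.setdefault(r, [0.0] * size)
                    for i in range(size - shift):
                        tgt[i + shift] += probs[c] * vec[i]
            by_state = nxt
        dist = [sum(vec[i] for vec in by_state.values()) for i in range(size)]
    return dist

def mean(dist):
    return sum(i * p for i, p in enumerate(dist))

probs = {"A": 0.5, "B": 0.5}
print(mean(count_distribution("AA", probs, [2, 2])))  # two separate sequences
print(mean(count_distribution("AA", probs, [4])))     # naive concatenation
```

The concatenated version also counts occurrences that straddle the boundary between the two sequences, so its expected count (0.75 here) exceeds the correct value for the data set (0.5). This is a toy instance of the edge effect discussed in the abstract.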