We propose an extension to quantile normalization that removes unwanted technical variation using control probes. We adapt our algorithm, functional normalization, to the Illumina 450k methylation array and address the open problem of normalizing methylation data with global epigenetic changes, such as human cancers. Using data sets from The Cancer Genome Atlas and a large case-control study, we show that our algorithm outperforms all existing normalization methods with respect to replication of results between experiments, and yields robust results even in the presence of batch effects. Functional normalization can be applied to any microarray platform, provided suitable control probes are available.
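For orientation, the sketch below shows only the classic quantile normalization that the abstract takes as its starting point. It is not the functional normalization algorithm: the control-probe adjustment is omitted. The function name `quantile_normalize`, the column-per-sample layout, and the toy intensities are assumptions made for illustration.

```python
def quantile_normalize(columns):
    """Classic quantile normalization of equal-length sample columns.

    columns: list of samples, each a list of probe intensities.
    Returns new columns that all share one empirical distribution.
    Ties are broken by original position (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of probes")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]

    normalized = []
    for col in columns:
        # order[k] is the index of the k-th smallest value in this sample.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two samples, three probes each (made-up values).
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(samples)
```

After normalization every sample contains the same set of values, assigned according to each probe's rank within its own sample.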
Capturing the conditional covariances or correlations among the elements of a multivariate response vector based on covariates is important to various fields including neuroscience, epidemiology and biomedicine. We propose a new method called Covariance Regression with Random Forests (CovRegRF) to estimate the covariance matrix of a multivariate response given a set of covariates, using a random forest framework. Random forest trees are built with a splitting rule specially designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the performance of the proposed method and significance test through a simulation study which shows that the proposed method provides accurate covariance matrix estimates and that the Type-1 error is well controlled. An application of the proposed method to thyroid disease data is also presented. CovRegRF is implemented in a freely available R package on CRAN.
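The splitting idea can be sketched for a single covariate as follows. The exact criterion is not given in the abstract, so the score used here (the square root of the product of the child sizes, times the Euclidean distance between the upper-triangular entries of the two sample covariance matrices) is an assumption, and this is not the CovRegRF package code. The names `sample_covariance`, `split_score`, `best_split`, and `min_node_size` are hypothetical.

```python
import math


def sample_covariance(rows):
    """Unbiased sample covariance matrix; needs at least two rows."""
    n, q = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(q)]
    return [
        [sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
         for k in range(q)]
        for j in range(q)
    ]


def split_score(left_rows, right_rows):
    """Size-weighted distance between child-node covariance matrices.

    Assumed form: sqrt(n_L * n_R) times the Euclidean distance between
    the upper-triangular entries of the two sample covariance matrices.
    """
    cov_l = sample_covariance(left_rows)
    cov_r = sample_covariance(right_rows)
    q = len(cov_l)
    dist = math.sqrt(sum((cov_l[j][k] - cov_r[j][k]) ** 2
                         for j in range(q) for k in range(j, q)))
    return math.sqrt(len(left_rows) * len(right_rows)) * dist


def best_split(x, responses, min_node_size=3):
    """Scan thresholds on one covariate; return (threshold, score)."""
    best = (None, -1.0)
    for threshold in sorted(set(x))[:-1]:
        left = [y for xi, y in zip(x, responses) if xi <= threshold]
        right = [y for xi, y in zip(x, responses) if xi > threshold]
        if min(len(left), len(right)) < min_node_size:
            continue
        score = split_score(left, right)
        if score > best[1]:
            best = (threshold, score)
    return best


# Toy data: responses are positively correlated for x <= 3, negatively after.
x = [1, 2, 3, 4, 5, 6]
responses = [(0, 0), (1, 1), (2, 2), (0, 2), (1, 1), (2, 0)]
threshold, score = best_split(x, responses)
```

In a forest, a scan like this would be repeated over randomly chosen covariates at every node, with the highest-scoring split retained.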
Abstract
Motivation
The human microbiota is the collection of microorganisms colonizing the human body, and plays an integral part in human health. A growing trend in microbiome analysis is to construct a network to estimate the co-occurrence patterns among taxa through precision matrices. Existing methods do not facilitate investigation into how these networks change with respect to covariates.
Results
We propose a new model called Microbiome Differential Network Estimation (MDiNE) to estimate network changes with respect to a binary covariate. The counts of individual taxa in the samples are modeled through a multinomial distribution whose probabilities depend on a latent Gaussian random variable. A sparse precision matrix over all the latent terms determines the co-occurrence network among taxa. The model fit is obtained and evaluated using Hamiltonian Monte Carlo methods. The performance of our model is evaluated through an extensive simulation study and is shown to outperform existing methods in terms of estimation of network parameters. We also demonstrate an application of the model to estimate changes in the intestinal microbial network topology with respect to Crohn’s disease.
Availability and implementation
MDiNE is implemented in a freely available R package: https://github.com/kevinmcgregor/mdine.
Supplementary information
Supplementary data are available at Bioinformatics online.
•A network screening method using GPS data and surrogate safety measures is proposed.
•Surrogate measures are used as covariates in models of crash frequency and severity.
•In general, the effect of the covariates supported results from previous studies.
•To rank sites, model results are combined using the total crash cost per vehicle-km.
•The model captures between 30% and 45% of hotspots identified using crash data.
Crash frequency and injury severity are independent dimensions defining crash risk in road safety management and network screening. Traditional screening techniques model crashes using regression and historical crash data, making them intrinsically reactive. In response, surrogate measures of safety have become a popular proactive alternative. The purpose of this paper is to develop models for crash frequency and severity incorporating GPS-derived surrogate safety measures (SSMs) as predictive variables. SSMs based on vehicle manoeuvres and traffic flow were extracted from data collected in Quebec City. The mixed multivariate outcome is estimated using two models: a Full Bayes Spatial Negative Binomial model for crash frequency, estimated using the Integrated Nested Laplace Approximation approach, and a fractional Multinomial Logit model for crash severity. Model outcomes are combined to generate the posterior expected crash frequency at each severity level and to rank sites based on crash cost. The crash frequency model was accurate at the network scale, with the majority of proposed SSMs statistically significant at 95% confidence and the direction of their effect generally consistent with previous research. In the crash severity model, fewer variables were significant, yet the direction of the effect of all significant variables was again consistent with previous research. Correlations between rankings predicted by the mixed multivariate model and by the crash data were adequate for intersections (0.46) but were poorer for links (0.25). The ability to prioritize sites based on GPS data and SSMs rather than historical crash data represents a substantial contribution to the field of road safety.
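The ranking step can be illustrated with a minimal sketch. It assumes that each site has an expected crash frequency (from a frequency model), a set of severity shares summing to one (from a severity model), and an exposure in vehicle-km. The unit costs, site values, and function names below are made up for illustration and are not taken from the paper.

```python
def crash_cost_per_vehicle_km(expected_crashes, severity_shares,
                              unit_costs, vehicle_km):
    """Expected crash cost per vehicle-km for one site."""
    if abs(sum(severity_shares.values()) - 1.0) > 1e-9:
        raise ValueError("severity shares must sum to 1")
    # Expected crashes at each severity level, weighted by cost per crash.
    total_cost = sum(expected_crashes * share * unit_costs[level]
                     for level, share in severity_shares.items())
    return total_cost / vehicle_km


def rank_sites(sites, unit_costs):
    """Return site names ordered from highest to lowest cost per vehicle-km."""
    scored = {
        name: crash_cost_per_vehicle_km(site["expected_crashes"],
                                        site["severity_shares"],
                                        unit_costs,
                                        site["vehicle_km"])
        for name, site in sites.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical unit costs per crash and hypothetical sites.
unit_costs = {"fatal": 1000.0, "injury": 100.0, "pdo": 10.0}
sites = {
    "A": {"expected_crashes": 2.0, "vehicle_km": 1000.0,
          "severity_shares": {"fatal": 0.1, "injury": 0.4, "pdo": 0.5}},
    "B": {"expected_crashes": 4.0, "vehicle_km": 5000.0,
          "severity_shares": {"fatal": 0.0, "injury": 0.2, "pdo": 0.8}},
    "C": {"expected_crashes": 1.0, "vehicle_km": 400.0,
          "severity_shares": {"fatal": 0.2, "injury": 0.3, "pdo": 0.5}},
}
ranking = rank_sites(sites, unit_costs)
```

Site B has the most expected crashes in this toy example but ranks last, because its crashes are less severe and its exposure is much higher.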
Converting minor-approach-only stop (MAS) intersections to all-way-stop (AWS) intersections is a prevailing safety countermeasure in North American urban areas. Although the general population positively perceives the installation of stop signs in residential areas, little research has investigated the impact of AWS on road safety and road user behaviour. This paper investigated the safety effectiveness of converting MAS to AWS intersections using an observational before-and-after approach and surrogate measures of safety. More specifically, the safety impacts of AWS conversion were investigated using multiple indicators, including vehicle speed measures; vehicle-pedestrian, vehicle-cyclist, and vehicle-vehicle interactions; and yielding rates before and after the treatment implementation. A multi-level regression approach was adopted to determine the effect of stop signs, controlling for built environment, traffic exposure, and intersection geometry factors as well as site-specific unobserved heterogeneity.
A unique sample of 31 intersections was used in this before-after study. From this sample, video data were collected before and after implementing AWS. In total, 245 h of video were automatically processed and corrected using specialized computer vision software. More than 68,000 (37,668 before and 31,305 after AWS treatment) road user trajectories were obtained from 104 approaches. The results show that the conversion of MAS to AWS intersections significantly decreased vehicle speed and increased post-encroachment time. This work also shows that implementing AWS significantly increased the yielding rates from 45.7% to 76.7% in MAS conditions and reduced the average speed of motor vehicles. Using a multi-level regression model, it is estimated that when the intersection was converted from MAS to AWS, the minimum speed in the major approaches was reduced by 60.0%.
Abstract
Electroencephalography measures are of interest in developmental neuroscience as potentially reliable clinical markers of brain function. Features extracted from electroencephalography are most often averaged across individuals in a population with a particular condition and compared statistically to the mean of a typically developing group, or a group with a different condition, to define whether a feature is representative of the populations as a whole. However, there can be large variability within a population, and electroencephalography features often change dramatically with age, making comparisons difficult. Combined with often low numbers of trials and low signal-to-noise ratios in pediatric populations, establishing biomarkers can be difficult in practice. One approach is to identify electroencephalography features that are less variable between individuals and are relatively stable in a healthy population during development. To identify such features in resting-state electroencephalography, which can be readily measured in many populations, we introduce an innovative application of statistical measures of variance for the analysis of resting-state electroencephalography data. Using these statistical measures, we quantified electroencephalography features commonly used to measure brain development (including power, connectivity, phase-amplitude coupling, entropy, and fractal dimension) according to their intersubject variability. Results from 51 6-month-old infants revealed that the complexity measures, including fractal dimension and entropy, followed by connectivity, were the least variable features across participants. This stability was found to be greatest in the right parietotemporal region for both complexity features, but no significant region of interest was found for the connectivity feature.
This study deepens our understanding of physiological patterns of electroencephalography data in developing brains, provides an example of how statistical measures can be used to analyze variability in resting-state electroencephalography in a homogeneous group of healthy infants, contributes to the establishment of robust electroencephalography biomarkers of neurodevelopment through the application of variance analyses, and reveals that nonlinear measures may be the most relevant biomarkers of neurodevelopment.
IMPORTANCE: Copy number variants (CNVs) classified as pathogenic are identified in 10% to 15% of patients referred for neurodevelopmental disorders. However, their effect sizes on cognitive traits measured as a continuum remain mostly unknown because most of them are too rare to be studied individually using association studies. OBJECTIVE: To measure and estimate the effect sizes of recurrent and nonrecurrent CNVs on IQ. DESIGN, SETTING, AND PARTICIPANTS: This study identified all CNVs that were 50 kilobases (kb) or larger in 2 general population cohorts (the IMAGEN project and the Saguenay Youth Study) with measures of IQ. Linear regressions, including functional annotations of genes included in CNVs, were used to identify features to explain their association with IQ. Validation was performed using intraclass correlation that compared IQ estimated by the model with empirical data. MAIN OUTCOMES AND MEASURES: Performance IQ (PIQ), verbal IQ (VIQ), and frequency of de novo CNV events. RESULTS: The study included 2090 European adolescents from the IMAGEN study and 1983 children and parents from the Saguenay Youth Study. Of these, genotyping was performed on 1804 individuals from IMAGEN and 977 adolescents, 445 mothers, and 448 fathers (484 families) from the Saguenay Youth Study. We observed 4928 autosomal CNVs larger than 50 kb across both cohorts. For rare deletions, size, number of genes, and exons affect IQ, and each deleted gene is associated with a mean (SE) decrease in PIQ of 0.67 (0.19) points (P = 6 × 10⁻⁴); this is not so for rare duplications and frequent CNVs. Among 10 functional annotations, haploinsufficiency scores best explain the association of any deletions with PIQ with a mean (SE) decrease of 2.74 (0.68) points per unit of the probability of being loss-of-function intolerant (P = 8 × 10⁻⁵). Results are consistent across cohorts and unaffected by sensitivity analyses removing pathogenic CNVs.
There is a 0.75 concordance (95% CI, 0.39-0.91) between the effect size on IQ estimated by our model and IQ loss calculated in previous studies of 15 recurrent CNVs. There is a close association between effect size on IQ and the frequency at which deletions occur de novo (odds ratio, 0.86; 95% CI, 0.84-0.87; P = 2.7 × 10⁻⁸⁸). There is a 0.76 concordance (95% CI, 0.41-0.91) between de novo frequency estimated by the model and calculated using data from the DECIPHER database. CONCLUSIONS AND RELEVANCE: Models trained on nonpathogenic deletions in the general population reliably estimate the effect size of pathogenic deletions and suggest omnigenic associations of haploinsufficiency with IQ. This represents a new framework to study variants too rare to perform individual association studies and can help estimate the cognitive effect of undocumented deletions in the neurodevelopmental clinic.
•This paper models the effects of exposure, geometry, and signalization on pedestrian injuries at signalized intersections.
•Full Bayes spatial models were estimated using the INLA technique on a rich database of intersections in Montreal, Quebec.
•Traffic exposure, curb extensions, raised medians, and exclusive left turn lanes were associated with pedestrian injuries.
•Total lanes and commercial entrances increased injuries, while pedestrian priority phases reduced injuries.
•Hotspot analysis identified dangerous sites based on total crashes and crash rates.
Intersections represent the most dangerous sites in the road network for pedestrians: not only is modal separation often impossible, but elements of geometry, traffic control, and built environment further exacerbate crash risk. Evaluating the safety impact of intersection features requires methods to quantify relationships between different factors and pedestrian injuries. The purpose of this paper is to model the effects of exposure, geometry, and signalization on pedestrian injuries at urban signalized intersections using a Full Bayes spatial Poisson Log-Normal model that accounts for unobserved heterogeneity and spatial correlation. Using the Integrated Nested Laplace Approximation (INLA) technique, this work leverages a rich database of geometric and signalization variables for 1864 intersections in Montreal, Quebec. To collect exposure data, short-term pedestrian and vehicle counts were extrapolated to AADT using developed expansion factors. Results of the model confirmed the positive relationship between pedestrian and vehicle volumes and pedestrian injuries. Curb extensions, raised medians, and exclusive left turn lanes were all found to reduce pedestrian injuries, while the total number of lanes and the number of commercial entrances were found to increase them. Pedestrian priority phases reduced injuries while the green straight arrow increased injuries. Lastly, the posterior expected number of crashes was used to identify hotspots. The proposed ranking criteria identified many intersections close to the city centre where the expected number of crashes is highest and intersections along arterials with lower pedestrian volumes where individual pedestrian risk is elevated. Understanding the effects of intersection geometry and pedestrian signalization will aid in ensuring the safety of pedestrians at signalized intersections.
Crash data observed on a road network often exhibit spatial correlation due to unobserved effects with inherent spatial correlation following the structure of the road network. It is important to model this spatial correlation while accounting for the road network structure. In this study, we introduce the network process convolution (NPC) model. In this model, the spatial correlation among crash data is captured by a Gaussian Process (GP) approximated through a kernel convolution approach. The GP's covariance function is based on path distance computed between a limited set of knots and crash data points on the road network. The proposed model offers a straightforward approach for predicting crash frequency at unobserved locations where covariates are available, and for interpolating the GP values anywhere on the network. Inference is performed following the Bayesian paradigm and is implemented in R-INLA, which offers an estimation procedure that is very efficient compared to Markov Chain Monte Carlo sampling algorithms. We fitted our model to synthetic data and to crash data from Ottawa, Canada. We compared the proposed approach with a proper Conditional Autoregressive (pCAR) model, and with Poisson Regression (PR) and Negative Binomial (NB) models without latent effects. The results of the study indicated that although the pCAR model has comparable fitting performance, the NPC model outperforms pCAR when the main goal is to predict at unobserved locations of interest. The proposed model also offers lower mean absolute error rates for cross-validated crash counts, latent variable values, and fixed-effect coefficients, as well as shorter interval scores for singletons. The NPC provides a natural way to account for the road network structure when considering the inclusion of spatially structured latent random effects in the modelling of crash data. It also offers an improved predictive capability for crash data on a road network.
•The spatial structure in road crash frequency data follows the structure of the road network.
•The network process convolution model (NPC) incorporates the structure of the road network using path distances.
•The results indicate better model performance compared to the neighbourhood-based spatial modelling algorithms (CAR).
•NPC model implemented in INLA offers significant computational advantages.
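The kernel convolution idea can be sketched on a toy network: the latent field at a location is a kernel-weighted sum of values attached to a few knots, with the kernel evaluated on path (shortest-route) distance rather than straight-line distance. The Gaussian kernel, the fixed knot values, the placement of knots on graph nodes, and all names below are assumptions made for illustration. In the model described above, the knot values are latent Gaussian quantities estimated with R-INLA, which this sketch does not attempt.

```python
import heapq
import math


def path_distances(graph, source):
    """Shortest-path (network) distances from source, via Dijkstra."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, length in graph[node].items():
            nd = d + length
            if nd < dist.get(neighbour, math.inf):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return dist


def convolved_field(graph, knot_values, bandwidth):
    """w(s) = sum_j k(d(s, u_j)) * z_j with a Gaussian kernel on path distance."""
    field = {node: 0.0 for node in graph}
    for knot, z in knot_values.items():
        dist = path_distances(graph, knot)
        for node in graph:
            # Unreachable nodes get infinite distance and hence zero weight.
            d = dist.get(node, math.inf)
            field[node] += math.exp(-0.5 * (d / bandwidth) ** 2) * z
    return field


# Toy road network: three nodes on a line, each segment 1.0 long.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0},
}
# Hypothetical knot values at the two end nodes.
field = convolved_field(graph, {"a": 1.0, "c": -1.0}, bandwidth=1.0)
```

The middle node is equidistant from both knots along the network, so their opposite contributions cancel there.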
Transmission ratio distortion (TRD) occurs when one of the two alleles from either parent is preferentially transmitted to the offspring. This leads to a statistical departure from the Mendelian law of inheritance, which states that each of the two parental alleles is transmitted to offspring with a probability of 0.5. A number of mechanisms are thought to induce TRD, such as meiotic drive, gametic competition, and embryo lethality. TRD has been extensively studied in animals, but the prevalence of TRD in humans remains largely unknown. Nevertheless, understanding the TRD phenomenon and taking it into consideration in many aspects of human genetics has potential benefits that have not been sufficiently emphasized in the current literature. In this review, we discuss the importance of TRD in three distinct but related fields of genetics: developmental genetics, which studies the genetic abnormalities in zygotic and embryonic development; statistical genetics/genetic epidemiology, which utilizes population study designs and statistical models to interpret the role of genes in human health; and population genetics, which is concerned with genetic diversity in populations in an evolutionary context. From the perspective of developmental genetics, studying TRD leads to the identification of the processes and mechanisms for differential survival observed in embryos. As a result, TRD is a genetic force that affects allele frequency at the population level as well as at the organismal level. Therefore, it has implications for the genetic diversity of the population over time. From the perspective of genetic epidemiology, the TRD influence on a marker locus is a confounding factor which has to be adequately dealt with to correctly interpret linkage or association study results. These aspects are developed in this review. In addition to these theoretical notions, a brief summary of the empirical evidence of the TRD phenomenon in human and mouse studies is provided.
The objective of our paper is to show the potentially important role of TRD in many areas of genetics, and to create an incentive for future research.
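As a minimal illustration of the statistical departure from the Mendelian probability of 0.5, the sketch below runs an exact two-sided binomial test on the number of times one allele was transmitted among informative transmissions from heterozygous parents. The function name and the counts are hypothetical, and this is only one simple way to test for TRD, not a method taken from the review.

```python
from math import comb


def trd_binomial_test(transmitted, total, p0=0.5):
    """Exact two-sided binomial test of a transmission ratio against p0.

    transmitted: times the allele of interest was passed on.
    total: informative transmissions from heterozygous parents.
    Returns (observed transmission ratio, two-sided p-value).
    """
    probs = [comb(total, k) * p0 ** k * (1 - p0) ** (total - k)
             for k in range(total + 1)]
    observed = probs[transmitted]
    # Sum the probabilities of all outcomes no more likely than the
    # observed one; the small tolerance guards against rounding ties.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-7))
    return transmitted / total, min(1.0, p_value)


# Hypothetical counts: the allele is passed on in 70 of 100 transmissions.
ratio, p_value = trd_binomial_test(70, 100)
```

A small p-value indicates a transmission ratio that is hard to reconcile with Mendelian inheritance, although it does not by itself identify which mechanism is responsible.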