Single-cell RNA-seq data allow insight into normal cellular function and various disease states through molecular characterization of gene expression on the single cell level. Dimensionality reduction of such high-dimensional data sets is essential for visualization and analysis, but single-cell RNA-seq data are challenging for classical dimensionality-reduction methods because of the prevalence of dropout events, which lead to zero-inflated data. Here, we develop a dimensionality-reduction method, (Z)ero (I)nflated (F)actor (A)nalysis (ZIFA), which explicitly models the dropout characteristics, and show that it improves modeling accuracy on simulated and biological data sets.
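To make the notion of zero inflation concrete, the following minimal Python sketch simulates dropout for two genes with different mean expression. It assumes, purely for illustration, that the dropout probability decays exponentially with the squared expression level, exp(-decay * x^2); the decay value, noise level, gene means and all function names are assumptions and are not taken from the abstract.

```python
import math
import random


def dropout_probability(x, decay):
    """Assumed dropout model: probability of observing a zero, exp(-decay * x^2)."""
    return math.exp(-decay * x * x)


def simulate_gene(mean, n_cells, decay, noise_sd, rng):
    """Draw a latent expression value per cell, then zero it out with the dropout probability."""
    observed = []
    for _ in range(n_cells):
        x = rng.gauss(mean, noise_sd)  # latent (log-scale) expression
        if rng.random() < dropout_probability(x, decay):
            observed.append(0.0)  # dropout event: the gene is not detected
        else:
            observed.append(x)
    return observed


def zero_fraction(values):
    """Fraction of cells in which the gene was recorded as exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
low = simulate_gene(mean=0.5, n_cells=2000, decay=0.1, noise_sd=0.5, rng=rng)
high = simulate_gene(mean=5.0, n_cells=2000, decay=0.1, noise_sd=0.5, rng=rng)

print("zero fraction, low-expression gene: ", zero_fraction(low))
print("zero fraction, high-expression gene:", zero_fraction(high))
```

Under these assumptions the lowly expressed gene is recorded as zero in most cells while the highly expressed gene rarely is, which is the pattern that a method ignoring dropout would misread as biological variation.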
Recent major cancer genome sequencing studies have used whole-genome sequencing to detect various types of genomic variation. However, a number of these studies have continued to rely on SNP array information to provide additional results for copy number and loss-of-heterozygosity estimation and assessing tumour purity. OncoSNP-SEQ is a statistical model-based approach for inferring copy number profiles directly from high-coverage whole genome sequencing data that is able to account for unknown tumour purity and ploidy.
MATLAB code is available at the following URL: https://sites.google.com/site/oncosnpseq/.
Advances in single cell genomics provide a way of routinely generating transcriptomics data at the single cell level. A frequent requirement of single cell expression analysis is the identification of novel patterns of heterogeneity across single cells that might explain complex cellular states or tissue composition. To date, classical statistical analysis tools have been routinely applied, but there is considerable scope for the development of novel statistical approaches that are better adapted to the challenges of inferring cellular hierarchies.
We have developed a novel agglomerative clustering method that we call pcaReduce to generate a cell state hierarchy where each cluster branch is associated with a principal component of variation that can be used to differentiate two cell states. Using two real single cell datasets, we compared our approach to other commonly used statistical techniques, such as K-means and hierarchical clustering. We found that pcaReduce was able to give more consistent clustering structures when compared to broad and detailed cell type labels.
Our novel integration of principal components analysis and hierarchical clustering establishes a connection between the representation of the expression data and the number of cell types that can be discovered. In doing so we found that pcaReduce performs better than either technique in isolation in terms of characterising putative cell states. Our methodology is complementary to other single cell clustering techniques and adds to a growing palette of single cell bioinformatics tools for profiling heterogeneous cell populations.
The identification of tumor-specific molecular dependencies is essential for the development of effective cancer therapies. Genetic and chemical perturbations are powerful tools for discovering these dependencies. Even though chemical perturbations can be applied to primary cancer samples at large scale, the interpretation of experimental outcomes is often complicated by the fact that one chemical compound can affect multiple proteins. To overcome this challenge, Batzilla et al. (PLoS Comput Biol 18(8): e1010438, 2022) proposed DepInfeR, a regularized multi-response regression model designed to identify and estimate specific molecular dependencies of individual cancers from their ex-vivo drug sensitivity profiles. Inspired by their work, we propose a Bayesian extension to DepInfeR. Our proposed approach offers several advantages over DepInfeR, including the ability to handle missing values in both protein-drug affinity and drug sensitivity profiles without the need for data pre-processing steps such as imputation. Moreover, our approach uses Gaussian Processes to capture more complex molecular dependency structures, and provides probabilistic statements about whether a protein in the protein-drug affinity profiles is informative to the drug sensitivity profiles. Simulation studies demonstrate that our proposed approach achieves better prediction accuracy, and is able to discover unreported dependency structures.
Pseudotime algorithms can be employed to extract latent temporal information from cross-sectional data sets allowing dynamic biological processes to be studied in situations where the collection of time series data is challenging or prohibitive. Computational techniques have arisen from single-cell 'omics and cancer modelling where pseudotime can be used to learn about cellular differentiation or tumour progression. However, methods to date typically implicitly assume homogeneous genetic, phenotypic or environmental backgrounds, which becomes limiting as data sets grow in size and complexity. We describe a novel statistical framework that learns how pseudotime trajectories can be modulated through covariates that encode such factors. We apply this model to both single-cell and bulk gene expression data sets and show that the approach can recover known and novel covariate-pseudotime interaction effects. This hybrid regression-latent variable model framework extends pseudotemporal modelling from its most prevalent area of single cell genomics to wider applications.
Obstetric brachial plexus injuries (OBPIs) are rare but can have significant implications for those affected, their caregivers and the health system. Symptoms can range from restricted movement to complete paralysis of the arm. We investigated health-related quality of life in adults with OBPIs and parents of children with permanent OBPIs, compared these with population norms, and investigated whether certain socio-demographic or clinical factors were associated with the quality of life in these cohorts.
A cross-sectional study examined 50 affected adults and 78 parents. Participants completed EQ-5D-5L and characteristics questionnaires. EQ-5D-5L responses were mapped onto an EQ-5D-3L value set to generate utility scores. Mean utility scores were compared with English population norms. Univariable and multivariable linear regression models were fitted to assess associations between participant characteristics and the utility scores.
The overall mean utility scores for affected adults and parents were 0.56 (SD 0.28) and 0.80 (SD 0.19) respectively. Affected adults (95% CI (-0.38, -0.22), p < 0.001) and parents of children with permanent OBPIs (95% CI (-0.10, -0.02), p = 0.007) had lower mean utility scores, and therefore quality of life, compared to English population norms. For affected adults, previous OBPI surgery (95% CI (0.01, 0.25), p = 0.040), employment in non-manual work (95% CI (0.06, 0.30), p = 0.005) and having a partner (95% CI (0.04, 0.25), p = 0.009) appeared to be positively associated with the utility score. Affected adults receiving disability benefits related to OBPIs appeared to have worse utility scores than those not receiving any disability benefits (95% CI (-0.31, -0.06), p = 0.005). For parents, employment was associated with better utility scores (95% CI (0.02, 0.20), p = 0.024) but the presence of one or more medical conditions appeared to be associated with worse utility scores (95% CI (-0.16, -0.04), p = 0.001).
Adults with OBPIs and parents of children with permanent OBPIs reported worse utility scores, and therefore quality of life, compared to the English general population. We also identified certain characteristics as possible factors to consider when dealing with utility scores in these cohorts. The utility scores in this study can be used in future economic evaluations related to OBPIs.
Nonsense-mediated decay (NMD) eliminates transcripts with premature termination codons. Although NMD-induced loss-of-function has been shown to contribute to the genesis of particular cancers, its global functional consequence in tumours has not been characterized. Here we develop an algorithm to predict NMD and apply it to somatic mutations reported in The Cancer Genome Atlas. We identify more than 73K mutations that are predicted to elicit NMD (NMD-elicit). NMD-elicit mutations in tumour suppressor genes (TSGs) are associated with significant reduction in gene expression. We discover cancer-specific NMD-elicit signatures in TSGs and cancer-associated genes. Our analysis reveals a previously unrecognized dependence of hypermutated tumours on hypofunction of genes that are involved in chromatin remodelling and translation. Half of hypermutated stomach adenocarcinomas are associated with NMD-elicit mutations of the translation initiators LARP4B and EIF5B. Our results unravel strong therapeutic opportunities by targeting tumour dependencies on NMD-elicit mutations.
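The abstract does not describe the prediction algorithm itself. As a rough illustration of what a rule-based NMD predictor can look like, the minimal Python sketch below applies the commonly cited "50-nucleotide rule": a premature termination codon is flagged when it ends more than about 50 nucleotides upstream of the last exon-exon junction. This rule, the threshold, the coordinate convention and the function name are assumptions made for illustration and should not be read as the method used in the study.

```python
def predicts_nmd(ptc_start, exon_lengths, min_distance=50):
    """Apply the heuristic 50-nucleotide rule (an assumption, not the study's algorithm).

    ptc_start    -- 0-based transcript position of the first base of the premature stop codon
    exon_lengths -- lengths of the transcript's exons, in 5' to 3' order
    min_distance -- nucleotides required between the stop codon and the last junction
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so the rule cannot apply

    last_junction = sum(exon_lengths[:-1])  # transcript position where the last exon begins
    ptc_end = ptc_start + 3                 # first position after the stop codon
    return last_junction - ptc_end > min_distance


# Hypothetical three-exon transcript; the last junction sits at position 200.
exons = [100, 100, 100]
print(predicts_nmd(100, exons))  # stop codon far upstream of the last junction
print(predicts_nmd(180, exons))  # stop codon within 50 nt of the last junction
print(predicts_nmd(250, exons))  # stop codon in the last exon
```

A real predictor would work from annotated transcript models and would likely include further exceptions; the sketch only shows the positional logic.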
Objective
To quantify the incidence of intrapartum risk factors in labours with an adverse outcome, and compare them with the incidence of the same indicators in a series of consecutive labours without adverse outcome.
Design
Case–control study.
Setting
Twenty‐six maternity units in the UK.
Population or sample
Sixty‐nine labours with an adverse outcome and 198 labours without adverse outcome.
Methods
Observational study.
Main outcome measures
Incidence of risk factors in hourly assessments for 7 hours before birth in the two groups.
Results
A risk score combining suspected fetal growth restriction, tachysystole, meconium in the amniotic fluid and fetal heart rate abnormalities (baseline rate and variability, presence of decelerations) gave the best indication of likely outcome group.
Conclusions
Accurate risk assessment in labour requires fetal heart rate abnormalities to be considered in context with additional intrapartum risk factors.
Objective
To determine the economic impact of the introduction of carbetocin for the prevention of postpartum haemorrhage (PPH) at caesarean section, compared to oxytocin.
Study design
The model is a decision tree constructed from a UK National Health Service perspective. A total of 1500 caesarean sections (both elective and emergency) were modelled over a 12 month period. Efficacy data were taken from a published Cochrane meta-analysis, and costs from NHS Reference costs, the British National Formulary and the NHS electronic Medicines Information Tool. A combination of hospital audit data and expert input from an advisory board of clinicians was used to inform resource use estimates. The main outcome measures were the incidence of PPH and total cost over a one year time horizon, as a result of using carbetocin compared to oxytocin for prevention of PPH at caesarean section.
Results
The use of carbetocin compared to oxytocin for prevention of PPH at caesarean section was associated with a reduction of 30 (88 vs 58) PPH events (>500 ml blood loss), and a cost saving of £27,518. In probabilistic sensitivity analysis, carbetocin had a 91.5% probability of producing better outcomes, and a 69.4% chance of being dominant (both cheaper and more effective) compared to oxytocin.
Conclusion
At list price, the introduction of carbetocin appears to provide improved clinical outcomes along with cost savings, though this is subject to uncertainty regarding the underlying data in efficacy, resource use, and cost.
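The headline arithmetic of such a two-arm comparison can be sketched in a few lines of Python. The event counts (88 with oxytocin, 58 with carbetocin) and the cohort size of 1500 come from the abstract; the function names are hypothetical, and the unit costs in the final example are placeholders chosen only to show the calculation, not values from the study.

```python
def events_avoided(events_comparator, events_intervention):
    """Number of events prevented by switching from comparator to intervention."""
    return events_comparator - events_intervention


def number_needed_to_treat(n_patients, avoided):
    """Patients who must receive the intervention to prevent one event."""
    return n_patients / avoided


def incremental_cost(n_patients, extra_drug_cost, avoided, cost_per_event):
    """Extra drug spend minus the cost of the events avoided (negative means a saving)."""
    return n_patients * extra_drug_cost - avoided * cost_per_event


N_CAESAREANS = 1500   # from the abstract
PPH_OXYTOCIN = 88     # from the abstract
PPH_CARBETOCIN = 58   # from the abstract

avoided = events_avoided(PPH_OXYTOCIN, PPH_CARBETOCIN)
risk_reduction = avoided / N_CAESAREANS
nnt = number_needed_to_treat(N_CAESAREANS, avoided)
print(avoided, risk_reduction, nnt)

# Placeholder unit costs (NOT from the study), used only to exercise the formula.
PLACEHOLDER_EXTRA_DRUG_COST = 10.0
PLACEHOLDER_COST_PER_PPH = 1000.0
example = incremental_cost(
    N_CAESAREANS, PLACEHOLDER_EXTRA_DRUG_COST, avoided, PLACEHOLDER_COST_PER_PPH
)
print(example)
```

A full decision tree would branch further (for example by PPH severity and the resources each branch consumes) and would propagate uncertainty through probabilistic sensitivity analysis; this sketch covers only the deterministic top-level comparison.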
Clustering of joint single-cell RNA-Seq (scRNA-Seq) data is often challenged by confounding factors, such as batch effects and biologically relevant variability. Existing batch effect removal methods typically require strong assumptions on the composition of cell populations being near identical across samples. Here, we present CIDER, a meta-clustering workflow based on inter-group similarity measures. We demonstrate that CIDER outperforms other scRNA-Seq clustering methods and integration approaches in both simulated and real datasets. Moreover, we show that CIDER can be used to assess the biological correctness of integration in real datasets, while it does not require the existence of prior cellular annotations.