Normalization is an essential step with considerable impact on high-throughput RNA sequencing (RNA-seq) data analysis. Although there are numerous methods for read count normalization, choosing an optimal method remains a challenge because multiple factors contribute to read count variability, affecting overall sensitivity and specificity. To properly determine the most appropriate normalization method, it is critical to compare the performance and shortcomings of a representative set of normalization routines across datasets with different characteristics. Therefore, we set out to evaluate the performance of commonly used methods (DESeq, TMM-edgeR, FPKM-CuffDiff, TC, Med, UQ and FQ) and two new methods we propose: Med-pgQ2 and UQ-pgQ2 (per-gene normalization after per-sample median or upper-quartile global scaling). Our per-gene normalization approach allows comparisons between conditions based on similar count levels. Using the benchmark Microarray Quality Control Project (MAQC) and simulated datasets, we performed differential gene expression analysis to evaluate these methods. When evaluating MAQC2 with two replicates, we observed that Med-pgQ2 and UQ-pgQ2 achieved a slightly higher area under the receiver operating characteristic curve (AUC), a specificity > 85%, a detection power > 92% and an actual false discovery rate (FDR) under 0.06 at the nominal FDR (≤0.05). Although the most commonly used methods (DESeq and TMM-edgeR) yield a higher power (>93%) for MAQC2 data, this comes at the cost of reduced specificity (<70%) and a slightly higher actual FDR than our proposed methods. In addition, an analysis based on the qualitative characteristics of the sample distributions for the MAQC2 and human breast cancer datasets shows that only our gene-wise normalization methods corrected data skewed towards lower read counts. However, when we evaluated MAQC3, which has less variation across its five replicates, all methods performed similarly.
Thus, our proposed Med-pgQ2 and UQ-pgQ2 methods perform slightly better for differential gene expression analysis of RNA-seq data skewed towards lowly expressed read counts with high variation, improving specificity while maintaining good detection power and controlling the FDR at the nominal level.
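The two-step idea behind the proposed normalizers can be sketched in a few lines of numpy: scale each sample by its upper quartile, then scale each gene by twice its median scaled count. The function name and implementation details here are ours for illustration and are not the authors' published code.

```python
import numpy as np

def uq_pgq2_normalize(counts):
    """Two-step normalization sketch: per-sample upper-quartile (UQ)
    scaling, then per-gene scaling by twice the gene's median
    UQ-scaled count ("pgQ2").  Illustrative only; the published
    UQ-pgQ2 method may differ in detail.

    counts: genes x samples array of raw read counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Per-sample upper quartile, computed over genes with nonzero counts.
    uq = np.array([np.percentile(col[col > 0], 75) for col in counts.T])
    scaled = counts / uq * uq.mean()  # global UQ scaling per sample
    # Per-gene scaling: divide each gene by twice its median scaled
    # count (a small pseudo-count guards against zero medians).
    gene_med2 = 2.0 * np.median(scaled, axis=1) + 1e-8
    return scaled / gene_med2[:, None]
```

After this transformation each gene's median normalized count is roughly 0.5, which is what puts the two conditions on similar count levels before testing.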
With the rise of metabolomics, the development of methods to address analytical challenges in the analysis of metabolomics data is of great importance. Missing values (MVs) are pervasive, yet their treatment can have a substantial impact on downstream statistical analyses. The MV problem in metabolomics is quite challenging: a value can be missing because the metabolite is not biologically present in the sample, is present but at a concentration below the lower limit of detection (LOD), or is present but undetected due to technical issues in the sample pre-processing steps. The first two cases are considered missing not at random (MNAR), while the third is an example of missing at random (MAR). Typically, such MVs are substituted by a minimum value, which may lead to severely biased results in downstream analyses.
We develop a Bayesian model, called BayesMetab, that systematically accounts for missing values based on a Markov chain Monte Carlo (MCMC) algorithm that incorporates data augmentation by allowing MVs to be due to either truncation below the LOD or other technical reasons unrelated to the metabolite's abundance. Based on a variety of performance metrics (power for detecting differential abundance, area under the curve, bias and MSE for parameter estimates), our simulation results indicate that BayesMetab outperformed other imputation algorithms when there is a mixture of missingness due to MAR and MNAR. Further, our approach was competitive with other methods tailored specifically to MNAR in situations where missing data were completely MNAR. Applying our approach to an analysis of metabolomics data from a mouse model of myocardial infarction revealed several statistically significant metabolites not previously identified that were of direct biological relevance to the study.
Our findings demonstrate that BayesMetab has improved performance in imputing the missing values and performing statistical inference compared to other current methods when missing values are due to a mixture of MNAR and MAR. Analysis of real metabolomics data strongly suggests this mixture is likely to occur in practice, and thus, it is important to consider an imputation model that accounts for a mixture of missing data types.
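As a rough illustration of the data-augmentation idea, the left-censoring part of such an MCMC can be written as a single imputation step that draws values below the LOD from a truncated normal. The function below is a hypothetical stand-alone sketch, not BayesMetab itself, which additionally models MAR missingness and iterates a step like this within a full Markov chain.

```python
import numpy as np
from scipy.stats import truncnorm

def augment_below_lod(x, lod, mu, sigma, rng=None):
    """One data-augmentation step (sketch): replace NaNs assumed to be
    left-censored at the limit of detection (LOD) with draws from a
    normal(mu, sigma) truncated above at the LOD.  Illustrative only;
    the full model also handles MAR missingness and updates mu and
    sigma at each MCMC iteration.
    """
    rng = np.random.default_rng(42) if rng is None else rng
    x = np.asarray(x, dtype=float).copy()
    miss = np.isnan(x)
    if miss.any():
        b = (lod - mu) / sigma  # standardized upper truncation point
        x[miss] = truncnorm.rvs(-np.inf, b, loc=mu, scale=sigma,
                                size=miss.sum(), random_state=rng)
    return x
```

Observed values pass through unchanged; only the censored entries are filled in, and every imputed value lies below the LOD by construction.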
The Model for End-Stage Liver Disease (MELD) score has been successfully used to prioritize patients on the United States liver transplant waiting list since its adoption in 2002. The United Network for Organ Sharing (UNOS)/Organ Procurement and Transplantation Network (OPTN) allocation policy has evolved over the years, and notable recent changes include Share 35, inclusion of serum sodium in the MELD score, and a 'delay and cap' policy for hepatocellular carcinoma (HCC) patients. We explored the potential of a registrant's change in 30-day MELD scores (ΔMELD30) to improve allocation both before and after these policy changes. Current MELD and ΔMELD30 were evaluated using cause-specific hazards models for waitlist dropout based on US liver transplant registrants added to the waitlist between 06/30/2003 and 06/30/2013. Two composite scores were constructed and then evaluated on UNOS data spanning the current policy era (01/02/2016 to 09/07/2018). Predictive accuracy was evaluated using the C-index for model discrimination and by comparing observed and predicted waitlist dropout probabilities for model calibration. After the change to MELD-Na, increased dropout associated with ΔMELD30 jumps is no longer evident at MELD scores below 30. However, the adoption of Share 35 has potentially resulted in discrepancies in waitlist dropout for patients with sharp MELD increases at higher MELD scores. Use of the ΔMELD30 to add additional points or serve as a potential tiebreaker for patients with rapid deterioration may extend the benefit of Share 35 to better include those in most critical need.
Neoadjuvant therapy is increasingly being used before surgery for localized pancreatic cancer. Given the importance of completing multimodal therapy, the aim of this study was to characterize surgical resection rates after neoadjuvant therapy as well as the reasons for, and long-term prognostic impact of, not undergoing resection.
A systematic review and meta-analysis of prospective trials and high-quality retrospective studies since 2010 was performed to calculate pooled resection rates using a generalized random-effects model for potentially resectable, borderline resectable, and locally advanced pancreatic cancer. Median survival times were calculated using random-effects models for patients who did and did not undergo resection.
In 125 studies that met the inclusion criteria, neoadjuvant therapy consisted of chemotherapy (36.8 per cent), chemoradiation (15.2 per cent), or chemotherapy and radiation (48.0 per cent). Among 11 713 patients, the pooled resection rates were 77.4 (95 per cent c.i. 71.3 to 82.5), 60.6 (54.8 to 66.1), and 22.2 (16.7 to 29.0) per cent for potentially resectable, borderline resectable, and locally advanced pancreatic cancer, respectively. The most common reasons for not undergoing resection were distant progression for resectable and borderline resectable cancers, and local unresectability for locally advanced disease. Among 42 studies with survival data available, achieving surgical resection after neoadjuvant therapy was associated with improved survival for patients with potentially resectable (median 38.5 versus 13.3 months), borderline resectable (32.3 versus 13.9 months), and locally advanced (30.0 versus 14.6 months) pancreatic cancer (P < 0.001 for all).
Although rates of surgical resection after neoadjuvant therapy vary based on anatomical stage, surgery is associated with improved survival for all patients with localized pancreatic cancer. These pooled resection and survival rates may inform patient-provider decision-making and serve as important benchmarks for future prospective trials.
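Pooling proportions with a random-effects model, as done for the resection rates above, is commonly implemented with the DerSimonian-Laird estimator on the logit scale. The sketch below shows that standard approach; the paper's 'generalized random-effects model' may differ in detail, and all names here are illustrative.

```python
import numpy as np

def pooled_proportion_dl(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the
    logit scale (a common convention; illustrative, not the paper's
    exact model).  Returns the pooled proportion and a 95% CI.
    """
    events = np.asarray(events, dtype=float)
    totals = np.asarray(totals, dtype=float)
    # Logit-transformed study estimates with a 0.5 continuity correction.
    p = (events + 0.5) / (totals + 1.0)
    y = np.log(p / (1 - p))
    v = 1.0 / (events + 0.5) + 1.0 / (totals - events + 0.5)
    # Fixed-effect weights, Q statistic, and DL between-study variance.
    w = 1.0 / v
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)
    k = len(y)
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    # Random-effects weights and pooled estimate on the logit scale.
    wr = 1.0 / (v + tau2)
    y_re = np.sum(wr * y) / np.sum(wr)
    se = np.sqrt(1.0 / np.sum(wr))
    expit = lambda t: 1.0 / (1.0 + np.exp(-t))
    return expit(y_re), (expit(y_re - 1.96 * se), expit(y_re + 1.96 * se))
```

Back-transforming the pooled logit and its confidence limits keeps the interval inside (0, 1), which is why pooling is done on the logit rather than the raw proportion scale.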
The main purpose of this paper was to model the process by which patients enter, are seen by physicians, and are discharged from the Emergency Department (ED) at Nationwide Children's Hospital, as well as to identify modifiable factors associated with ED length of stay through the use of multistate modeling.
In this study, 75,591 patients admitted to the ED from March 1st, 2016 to February 28th, 2017 were analyzed using a multistate model of the ED process. Cox proportional hazards models with transition-specific covariates were used to model each transition in the multistate model and the Aalen-Johansen estimator was used to obtain transition probabilities and state occupation probabilities in the ED process.
Acuity level, season, time of day and number of ED physicians had significant and varying associations with the six transitions in the multistate model. Race and ethnicity were significantly associated with transition to left without being seen, but not with the other transitions. Conversely, age and gender were significantly associated with registration to room and subsequent transitions in the model, though the magnitude of association was not strong.
The multistate model presented in this paper decomposes the overall ED length of stay into constituent transitions for modeling covariate-specific effects on each transition. This allows physicians to understand the ED process and identify which potentially modifiable covariates would have the greatest impact on reducing the waiting times in each state in the model.
Differential scanning calorimetry (DSC) is used to determine thermally induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are considered functional data. In this paper we apply functional data analysis techniques to DSC data from individuals in the Lupus Family Registry and Repository (LFRR). The aim was to assess the effect of lupus disease status as well as additional covariates on the thermogram profiles, and to use functional data analysis methods to create models for classifying lupus vs. control patients on the basis of the thermogram curves.
Thermograms were collected for 300 lupus patients and 300 controls without lupus who were matched with diseased individuals based on sex, race, and age. First, functional regression with a functional response (DSC) and categorical predictor (disease status) was used to determine how thermogram curve structure varied according to disease status and other covariates including sex, race, and year of birth. Next, functional logistic regression with disease status as the response and functional principal component analysis (FPCA) scores as the predictors was used to model the effect of thermogram structure on disease status prediction. The prediction accuracy for patients with Osteoarthritis and Rheumatoid Arthritis but without Lupus was also calculated to determine the ability of the classifier to differentiate between Lupus and other diseases. Data were divided 1000 times into separate 2/3 training and 1/3 test data for evaluation of predictions. Finally, derivatives of thermogram curves were included in the models to determine whether they aided in prediction of disease status.
Functional regression with thermogram as a functional response and disease status as predictor showed a clear separation in thermogram curve structure between cases and controls. The logistic regression model with FPCA scores as the predictors gave the most accurate results, with a mean correct classification rate of 79.22%, a mean sensitivity of 79.70%, and a mean specificity of 81.48%. The model correctly classified OA and RA patients without lupus as controls at an average rate of 75.92%, with a mean sensitivity of 79.70% and specificity of 77.6%. Regression models including FPCA scores for derivative curves did not perform as well, nor did regression models including covariates.
Changes in thermograms observed in the disease state likely reflect covalent modifications of plasma proteins or changes in large protein-protein interacting networks resulting in the stabilization of plasma proteins towards thermal denaturation. By relating functional principal components from thermograms to disease status, our Functional Principal Component Analysis model provides results that are more easily interpretable compared to prior studies. Further, the model could also potentially be coupled with other biomarkers to improve diagnostic classification for lupus.
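For densely and regularly sampled curves, FPCA reduces to ordinary PCA of the discretized curves, so the modeling pipeline above can be sketched with scikit-learn. The data below are synthetic stand-ins for thermograms; the LFRR curves, component count, and train/test validation scheme in the study differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic "thermograms": 200 curves on a 100-point temperature grid,
# with cases (label 1) carrying an extra denaturation peak relative to
# controls (label 0).  Illustrative data only.
n, m = 200, 100
grid = np.linspace(45, 90, m)
labels = np.repeat([0, 1], n // 2)
curves = (np.exp(-0.5 * ((grid - 63) / 4) ** 2)[None, :]
          + 0.15 * labels[:, None] * np.exp(-0.5 * ((grid - 70) / 3) ** 2)
          + rng.normal(0, 0.05, (n, m)))

# FPCA scores (here: PCA of the discretized curves) feed a logistic
# classifier, mirroring the two-stage model described above.
clf = make_pipeline(PCA(n_components=5), LogisticRegression(max_iter=1000))
clf.fit(curves, labels)
acc = clf.score(curves, labels)
```

In practice the scores would be computed on training curves only and the classifier evaluated on held-out curves, as the repeated 2/3 - 1/3 splits in the study do.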
Hospital length of stay (LOS) and time for a patient to reach clinical stability (TCS) have increasingly become important outcomes when investigating ways in which to combat Community Acquired Pneumonia (CAP). Difficulties arise when deciding how to handle in-hospital mortality. Ad-hoc approaches that are commonly used to handle time to event outcomes with mortality can give disparate results and provide conflicting conclusions based on the same data. To ensure compatibility among studies investigating these outcomes, this type of data should be handled in a consistent and appropriate fashion.
Using both simulated data and data from the international Community Acquired Pneumonia Organization (CAPO) database, we evaluate two ad-hoc approaches for handling mortality when estimating the probability of hospital discharge and clinical stability: 1) restricting analysis to those patients who lived, and 2) assigning individuals who die the "worst" outcome (right-censoring them at the longest recorded LOS or TCS). Estimated probability distributions based on these approaches are compared with right-censoring the individuals who died at time of death (the complement of the Kaplan-Meier (KM) estimator), and treating death as a competing risk (the cumulative incidence estimator). Tests for differences in probability distributions based on the four methods are also contrasted.
The two ad-hoc approaches give different estimates of the probability of discharge and clinical stability. Analysis restricted to patients who survived is conceptually problematic, as estimation is conditioned on events that happen at a future time. Estimation based on assigning those patients who died the worst outcome (longest LOS and TCS) coincides with the complement of the KM estimator based on the subdistribution hazard, which has been previously shown to be equivalent to the cumulative incidence estimator. However, in either case the time to in-hospital mortality is ignored, preventing simultaneous assessment of patient mortality in addition to LOS and/or TCS. The power to detect differences in underlying hazards of discharge between patient populations differs for test statistics based on the four approaches, and depends on the underlying hazard ratio of mortality between the patient groups.
Treating death as a competing risk gives estimators which address the clinical questions of interest, and allows for simultaneous modelling of both in-hospital mortality and TCS / LOS. This article advocates treating mortality as a competing risk when investigating other time related outcomes.
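The contrast drawn above can be made concrete: the cumulative incidence (Aalen-Johansen) estimator accumulates cause-specific event mass weighted by overall survival, rather than censoring competing events as the KM complement does. A minimal numpy sketch (illustrative, not the CAPO analysis code):

```python
import numpy as np

def cumulative_incidence(times, events, cause, t_grid):
    """Cumulative incidence function for one cause in the presence of
    competing risks (Aalen-Johansen form).  events: 0 = censored,
    1, 2, ... = cause codes.  Evaluated as a step function on t_grid.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events)
    surv, cif, at_risk = 1.0, 0.0, len(times)
    steps = []
    for t in np.unique(times):
        d_all = np.sum((times == t) & (events > 0))      # deaths, all causes
        d_cause = np.sum((times == t) & (events == cause))
        cif += surv * d_cause / at_risk   # cause-specific mass, weighted
        surv *= 1.0 - d_all / at_risk     # overall survival update
        at_risk -= np.sum(times == t)
        steps.append((t, cif))
    ts = np.array([t for t, _ in steps])
    cs = np.array([c for _, c in steps])
    idx = np.searchsorted(ts, t_grid, side="right") - 1
    return np.where(idx >= 0, cs[np.clip(idx, 0, None)], 0.0)
```

Because each cause's incidence is weighted by the overall survival probability, the cause-specific curves and the survival curve sum to one, whereas KM complements computed per cause can sum to more than one.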
Differential scanning calorimetry (DSC) is a tool for measuring the thermal stability profiles of complex molecular interactions in biological fluids. DSC profiles (thermograms) of biofluids provide specific signatures which are being utilized as a new diagnostic approach for characterizing disease, but the development of these approaches is still in its infancy.
This article evaluates several approaches for the analysis of thermograms which could increase the utility of DSC for clinical application. Thermograms were analyzed using localized thermogram features and principal components (PCs). The performance of these methods was evaluated alongside six models for the classification of a data set comprised of 300 systemic lupus erythematosus (SLE) patients and 300 control subjects obtained from the Lupus Family Registry and Repository (LFRR).
Classification performance was substantially higher using the penalized algorithms relative to localized features/PCs alone. The models were grouped into two sets, the first having smoother solution vectors but lower classification accuracies than the second with seemingly noisier solution vectors.
Coupling thermogram technology with modern classification algorithms provides a powerful diagnostic approach for analysis of biological samples. The solution vectors from the models may reflect important information from the thermogram profiles for discriminating between clinical groups.
DSC thermograms show sensitivity to changes in the bulk plasma proteome that correlate with clinical status. To move this technology towards clinical application the development of new approaches is needed to extract discriminatory parameters from DSC profiles for the comparison and diagnostic classification of patients. This article is part of a Special Issue entitled Microcalorimetry in the BioSciences — Principles and Applications, edited by Fadi Bou-Abdallah.
• New approaches for the diagnostic analysis of thermograms were evaluated.
• Classification performance was assessed using a large dataset of lupus and controls.
• Thermogram feature metrics and principal components performed modestly.
• Classification performance was higher for modern classification algorithms.
• Uncovering biological drivers of thermogram changes can enhance diagnostic analysis.
Liver allocation policy was changed to reduce variance in median MELD scores at transplant (MMaT) in February 2020. “Acuity circles” replaced local allocation. Understanding the impact of the policy change on donor utilization is important. Ideal (I), standard (S), and non-ideal (NI) donors were defined; NI donors include older, higher-BMI donors with elevated transaminases or bilirubin, donors with a history of hepatitis B or C, and all DCD donors. Utilization of I, S, and NI donors was established before and after the allocation change and compared between low-MELD (LM) centers (MMaT ≤ 28 before the allocation change) and high-MELD (HM) centers (MMaT > 28). Following reallocation, transplant volume increased nationally (67 transplants/center/year before vs 74 after; P = 0.0006) and increased for both HM and LM centers. LM centers significantly increased use of NI donors and HM centers significantly increased use of I and S donors. Centers further stratify based on donor utilization phenotype. A subset of centers increased transplant volume despite rising MMaT by broadening organ acceptance criteria, increasing use of all donor types including DCD donors (98% increase), increasing living donation, and transplanting more frequently for alcohol-associated liver disease. Variance in donor utilization can undermine the intended effects of allocation policy change.
The authors critically examine center‐level utilization of donor livers following allocation policy change and find widely disparate utilization of available donor organs.