Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data.
We used CV to optimize the classification parameters for two kinds of classifiers: Shrunken Centroids and Support Vector Machines (SVM). Random training datasets were created, with no difference in the distribution of the features between the two classes. Using these "null" datasets, we selected classifier parameter values that minimized the CV error estimate. 10-fold CV was used for Shrunken Centroids while Leave-One-Out-CV (LOOCV) was used for the SVM. Independent test data were created to estimate the true error. With "null" and "non-null" (with differential expression between the classes) data, we also tested a nested CV procedure, where an inner CV loop is used to perform the tuning of the parameters while an outer CV loop is used to compute an estimate of the error. The CV error estimate for the classifier with the optimal parameters was found to be a substantially biased estimate of the true error that the classifier would incur on independent data. Even though there is no real difference between the two classes for the "null" datasets, the CV error estimate for the Shrunken Centroid with the optimal parameters was less than 30% on 18.5% of simulated training datasets. For SVM with optimal parameters the estimated error rate was less than 30% on 38% of "null" datasets. Performance of the optimized classifiers on the independent test set was no better than chance. The nested CV procedure reduces the bias considerably and gives an estimate of the error that is very close to that obtained on the independent testing set for both Shrunken Centroids and SVM classifiers for "null" and "non-null" data distributions.
We show that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. Proper use of CV for estimating true error of a classifier developed using a well defined algorithm requires that all steps of the algorithm, including classifier parameter tuning, be repeated in each CV loop. A nested CV procedure provides an almost unbiased estimate of the true error.
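The difference between the two procedures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation used in the study: the classifier is a hypothetical nearest-centroid rule restricted to the k most discriminating features (standing in for Shrunken Centroids or SVM), the tuned parameter is k, and the dataset sizes, fold counts, and function names are all chosen for illustration. The key point is that in `nested_cv_error` the tuning step is repeated inside every outer fold, so the held-out samples never influence the choice of k.

```python
import random
from statistics import mean

def make_null_data(n_samples, n_features, rng):
    """Gaussian features with no class difference; labels alternate 0/1."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y

def fit(X, y, k):
    """Illustrative classifier: class centroids on the k features with the largest mean difference."""
    n_features = len(X[0])
    c0 = [mean(row[j] for row, lab in zip(X, y) if lab == 0) for j in range(n_features)]
    c1 = [mean(row[j] for row, lab in zip(X, y) if lab == 1) for j in range(n_features)]
    keep = sorted(range(n_features), key=lambda j: -abs(c0[j] - c1[j]))[:k]
    return keep, c0, c1

def predict(model, row):
    keep, c0, c1 = model
    d0 = sum((row[j] - c0[j]) ** 2 for j in keep)
    d1 = sum((row[j] - c1[j]) ** 2 for j in keep)
    return 0 if d0 <= d1 else 1

def folds(n, n_folds):
    """Deterministic interleaved folds: fold f holds indices f, f + n_folds, ..."""
    return [list(range(f, n, n_folds)) for f in range(n_folds)]

def split(X, y, test_idx):
    held = set(test_idx)
    Xtr = [X[i] for i in range(len(X)) if i not in held]
    ytr = [y[i] for i in range(len(y)) if i not in held]
    return Xtr, ytr

def cv_error(X, y, k, n_folds):
    """Plain CV error for one fixed parameter value k."""
    errors = 0
    for test_idx in folds(len(X), n_folds):
        Xtr, ytr = split(X, y, test_idx)
        model = fit(Xtr, ytr, k)
        errors += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return errors / len(X)

def tune(X, y, grid, n_folds):
    """Pick the k with the smallest CV error; that minimum is the optimistic estimate."""
    scores = {k: cv_error(X, y, k, n_folds) for k in grid}
    best = min(grid, key=lambda k: scores[k])
    return best, scores[best]

def nested_cv_error(X, y, grid, outer_folds, inner_folds):
    """Outer loop estimates error; tuning is redone on each outer training set."""
    errors = 0
    for test_idx in folds(len(X), outer_folds):
        Xtr, ytr = split(X, y, test_idx)
        best_k, _ = tune(Xtr, ytr, grid, inner_folds)  # inner CV sees training data only
        model = fit(Xtr, ytr, best_k)
        errors += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return errors / len(X)

rng = random.Random(0)
X, y = make_null_data(n_samples=40, n_features=50, rng=rng)
grid = [1, 5, 20, 50]
best_k, tuned_error = tune(X, y, grid, n_folds=5)
nested_error = nested_cv_error(X, y, grid, outer_folds=5, inner_folds=4)
print(f"CV error of tuned classifier: {tuned_error:.3f} (k={best_k})")
print(f"Nested CV error:              {nested_error:.3f}")
```

On null data the true error is 50%. Repeating this sketch over many random datasets would be expected to show the tuned CV error falling below that level on average, while the nested estimate stays close to it; a single run can land on either side.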
Road construction involves large quantities of construction materials, primarily obtained from natural sources. Utilization of waste materials in construction activities can reduce the burden on these natural sources. To achieve sustainable development, the use of steel slag as a substitute for natural aggregates has gained significant attention. Steel slag as aggregate has many well-known applications in road construction, such as hot mix asphalt, cement concrete mix, antiskid layers, and granular base and subbase layers. This paper reviews developments in various aspects of utilizing steel slag aggregate in dense graded hot mix asphalt. Properties of steel slag can significantly influence the performance of roads. The chemical composition and the physical and mechanical properties of steel slag have been reviewed in consideration of its suitability in asphalt mix. Findings from asphalt mix design studies indicate that steel slag asphalt mix can be designed to satisfy both volumetric and mechanical properties. However, caution needs to be exercised to ensure that the addition of steel slag does not lead to undesirable changes in the performance of the mix. Laboratory and field performance evaluation studies related to skid, moisture, fatigue and rutting failures are also reviewed in the paper.
Background
Coronavirus disease (COVID‐19) has crippled life, families and oral healthcare delivery in India due to nationwide lockdown.
Aim
Using a cross‐sectional design, we investigated the impact of child's dental pain, caregiver's fear of SARS‐CoV‐2 and parental distress on oral health–related quality of life (OHRQOL) of preschoolers during the nationwide COVID‐19 pandemic lockdown.
Design
Preschool children self‐reported their pain using the Pieces of Hurt scale; caregiver SARS‐CoV‐2 fear was assessed using the Fear of COVID‐19 scale, and parental distress was evaluated using a 4‐item scale. Child's oral health was assessed using the dmft index, and OHRQOL was evaluated using the Early Childhood Oral Health Impact Scale. Bivariate and multivariate regression analyses were conducted to identify predictors; statistical significance was set at 5%.
Results
The mean age of the sample was 4.58 years, and about 69% were boys. Higher child‐reported pain scores due to decayed teeth (OR = 1.9), dmft > 5 (OR = 4.25), greater parental distress (OR = 4.13) and greater fear of SARS‐CoV‐2 (OR = 3.84) were significantly associated with poor OHRQOL during the COVID‐19 pandemic.
Conclusions
Greater parental distress and fear of COVID‐19 among caregivers, higher self‐perceived dental pain among children and caries experience are associated with poor OHRQOL of preschool children during the COVID‐19 pandemic.
Microarray batch effect (BE) has been the primary bottleneck for large-scale integration of data from multiple experiments. Current BE correction methods either need known batch identities (ComBat) or have the potential to overcorrect, by removing true but unknown biological differences (Surrogate Variable Analysis, SVA). It is well known that experimental conditions such as array or reagent batches, PCR amplification or ozone levels can affect the measured expression levels; often the direction of perturbation of the measured expression is the same in different datasets. However, there are no BE correction algorithms that attempt to estimate the individual effects of technical differences and use them to correct expression data. In this manuscript, we show that a set of signatures, each of which is a vector the length of the number of probes, calculated on a reference set of microarray samples can predict much of the batch effect in other validation sets. We present a rationale for selecting a reference set of samples designed to estimate technical differences without removing biological differences. Putting both together, we introduce the Batch Effect Signature Correction (BESC) algorithm that uses the batch effect signatures (BES) calculated on the reference set to efficiently predict and remove BE. Using two independent validation sets, we show that BESC is capable of removing batch effect without removing unknown but true biological differences. Much of the variation due to batch effect is shared between different microarray datasets. That shared information can be used to predict signatures (i.e. directions of perturbation) due to batch effect in new datasets. The correction can be precomputed without using the samples to be corrected (blind), done on each sample individually (single sample) and corrects only known technical effects without removing known or unknown biological differences (conservative).
Those three characteristics make it ideal for high-throughput correction of samples for a microarray data repository. We also compare the performance of BESC to three other batch correction methods: SVA, Removing Unwanted Variation (RUV) and Hidden Covariates with Prior (HCP). An R Package besc implementing the algorithm is available from http://explainbio.com.
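To make the "blind, single sample, conservative" idea concrete, the following Python sketch shows one simple way precomputed signatures could be removed from a new sample: each signature's contribution is estimated by least-squares projection and subtracted. This is an assumption made for illustration, not the published BESC procedure or the API of the besc R package; the function names and the toy four-probe vectors are hypothetical, and any centering or normalization of real expression data is omitted.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(signatures):
    """Gram-Schmidt: turn the signature vectors into an orthonormal basis."""
    basis = []
    for s in signatures:
        v = list(s)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-12:  # skip signatures that add no new direction
            basis.append([vi / norm for vi in v])
    return basis

def correct_sample(sample, basis):
    """Subtract the component of one sample that lies along each signature.

    Only the precomputed basis is needed, so each sample is corrected on its own,
    and anything orthogonal to the signatures is left untouched.
    """
    corrected = list(sample)
    for b in basis:
        c = dot(sample, b)  # least-squares coefficient for this direction
        corrected = [x - c * bi for x, bi in zip(corrected, b)]
    return corrected

# Toy example with made-up numbers: 4 probes, 2 technical signatures.
signatures = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
biology = [1.0, -1.0, 2.0, -2.0]   # orthogonal to both signatures
sample = [b + 3.0 * s1 - 2.0 * s2 for b, s1, s2 in zip(biology, *signatures)]

basis = orthonormalize(signatures)
corrected = correct_sample(sample, basis)
print(corrected)
```

In this toy case the biological vector is orthogonal to both signatures, so the correction recovers it up to floating-point error. Real biological differences that happen to align with a technical signature would be partly removed, which is why the choice of reference set matters.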
The metabolic basis of Alzheimer disease (AD) is poorly understood, and the relationships between systemic abnormalities in metabolism and AD pathogenesis are unclear. Understanding how global perturbations in metabolism are related to severity of AD neuropathology and the eventual expression of AD symptoms in at-risk individuals is critical to developing effective disease-modifying treatments. In this study, we undertook parallel metabolomics analyses in both the brain and blood to identify systemic correlates of neuropathology and their associations with prodromal and preclinical measures of AD progression.
Quantitative and targeted metabolomics (Biocrates AbsoluteIDQ identification and quantification p180) assays were performed on brain tissue samples from the autopsy cohort of the Baltimore Longitudinal Study of Aging (BLSA) (N = 44, mean age = 81.33, % female = 36.36) from AD (N = 15), control (CN; N = 14), and "asymptomatic Alzheimer's disease" (ASYMAD, i.e., individuals with significant AD pathology but no cognitive impairment during life; N = 15) participants. Using machine-learning methods, we identified a panel of 26 metabolites from two main classes-sphingolipids and glycerophospholipids-that discriminated AD and CN samples with accuracy, sensitivity, and specificity of 83.33%, 86.67%, and 80%, respectively. We then assayed these 26 metabolites in serum samples from two well-characterized longitudinal cohorts representing prodromal (Alzheimer's Disease Neuroimaging Initiative ADNI, N = 767, mean age = 75.19, % female = 42.63) and preclinical (BLSA) (N = 207, mean age = 78.68, % female = 42.63) AD, in which we tested their associations with magnetic resonance imaging (MRI) measures of AD-related brain atrophy, cerebrospinal fluid (CSF) biomarkers of AD pathology, risk of conversion to incident AD, and trajectories of cognitive performance. We developed an integrated blood and brain endophenotype score that summarized the relative importance of each metabolite to severity of AD pathology and disease progression (Endophenotype Association Score in Early Alzheimer's Disease EASE-AD). Finally, we mapped the main metabolite classes emerging from our analyses to key biological pathways implicated in AD pathogenesis. We found that distinct sphingolipid species including sphingomyelin (SM) with acyl residue sums C16:0, C18:1, and C16:1 (SM C16:0, SM C18:1, SM C16:1) and hydroxysphingomyelin with acyl residue sum C14:1 (SM (OH) C14:1) were consistently associated with severity of AD pathology at autopsy and AD progression across prodromal and preclinical stages. 
Higher log-transformed blood concentrations of all four sphingolipids in cognitively normal individuals were significantly associated with increased risk of future conversion to incident AD: SM C16:0 (hazard ratio HR = 4.430, 95% confidence interval CI = 1.703-11.520, p = 0.002), SM C16:1 (HR = 3.455, 95% CI = 1.516-7.873, p = 0.003), SM (OH) C14:1 (HR = 3.539, 95% CI = 1.373-9.122, p = 0.009), and SM C18:1 (HR = 2.255, 95% CI = 1.047-4.855, p = 0.038). The sphingolipid species identified map to several biologically relevant pathways implicated in AD, including tau phosphorylation, amyloid-β (Aβ) metabolism, calcium homeostasis, acetylcholine biosynthesis, and apoptosis. Our study has limitations: the relatively small number of brain tissue samples may have limited our power to detect significant associations, control for heterogeneity between groups, and replicate our findings in independent, autopsy-derived brain samples.
We present a novel framework to identify biologically relevant brain and blood metabolites associated with disease pathology and progression during the prodromal and preclinical stages of AD. Our results show that perturbations in sphingolipid metabolism are consistently associated with endophenotypes across preclinical and prodromal AD, as well as with AD pathology at autopsy. Sphingolipids may be biologically relevant biomarkers for the early detection of AD, and correcting perturbations in sphingolipid metabolism may be a plausible and novel therapeutic strategy in AD.
It is unclear whether abnormalities in brain glucose homeostasis are associated with Alzheimer's disease (AD) pathogenesis.
Within the autopsy cohort of the Baltimore Longitudinal Study of Aging, we measured brain glucose concentration and assessed the ratios of the glycolytic amino acids serine, glycine, and alanine to glucose. We also quantified protein levels of the neuronal (GLUT3) and astrocytic (GLUT1) glucose transporters. Finally, we assessed the relationships between plasma glucose measured before death and brain tissue glucose.
Higher brain tissue glucose concentration, reduced glycolytic flux, and lower GLUT3 are related to severity of AD pathology and the expression of AD symptoms. Longitudinal increases in fasting plasma glucose levels are associated with higher brain tissue glucose concentrations.
Impaired glucose metabolism due to reduced glycolytic flux may be intrinsic to AD pathogenesis. Abnormalities in brain glucose homeostasis may begin several years before the onset of clinical symptoms.
• Brain tissue glucose is associated with severity of Alzheimer's disease (AD) pathology and symptom onset.
• Reduced brain glycolytic flux is associated with severity of AD pathology and symptom onset.
• Neuronal glucose transporter-3 is lower in AD.
• Lower glucose transporter-3 levels are associated with more severe AD pathology.
• Increase in plasma glucose decades before death is related to higher brain glucose.
Abstract
CellMiner Cross-Database (CellMinerCDB, discover.nci.nih.gov/cellminercdb) allows integration and analysis of molecular and pharmacological data within and across cancer cell line datasets from the National Cancer Institute (NCI), Broad Institute, Sanger/MGH and MD Anderson Cancer Center (MDACC). We present CellMinerCDB 1.2 with updates to datasets from NCI-60, Broad Cancer Cell Line Encyclopedia and Sanger/MGH, and the addition of new datasets, including NCI-ALMANAC drug combination, MDACC Cell Line Project proteomic, NCI-SCLC DNA copy number and methylation data, and Broad methylation, genetic dependency and metabolomic datasets. CellMinerCDB (v1.2) includes several improvements over the previously published version: (i) new and updated datasets; (ii) support for pattern comparisons and multivariate analyses across data sources; (iii) updated annotations with drug mechanism of action information and biologically relevant multigene signatures; (iv) analysis speedups via caching; (v) a new dataset download feature; (vi) improved visualization of subsets of multiple tissue types; (vii) breakdown of univariate associations by tissue type; and (viii) enhanced help information. The curation and common annotations (e.g. tissues of origin and identifiers) provided here across pharmacogenomic datasets increase the utility of the individual datasets to address multiple researcher question types, including data reproducibility, biomarker discovery and multivariate analysis of drug activity.
There is growing evidence that Alzheimer disease (AD) is a pervasive metabolic disorder with dysregulation in multiple biochemical pathways underlying its pathogenesis. Understanding how perturbations in metabolism are related to AD is critical to identifying novel targets for disease-modifying therapies. In this study, we test whether AD pathogenesis is associated with dysregulation in brain transmethylation and polyamine pathways.
We first performed targeted and quantitative metabolomics assays using capillary electrophoresis-mass spectrometry (CE-MS) on brain samples from three groups in the Baltimore Longitudinal Study of Aging (BLSA) (AD: n = 17; Asymptomatic AD ASY: n = 13; Control CN: n = 13) (overall 37.2% female; mean age at death 86.118 ± 9.842 years) in regions both vulnerable and resistant to AD pathology. Using linear mixed-effects models within two primary brain regions (inferior temporal gyrus ITG and middle frontal gyrus MFG), we tested associations between brain tissue concentrations of 26 metabolites and the following primary outcomes: group differences, Consortium to Establish a Registry for Alzheimer's Disease (CERAD) (neuritic plaque burden), and Braak (neurofibrillary pathology) scores. We found significant alterations in concentrations of metabolites in AD relative to CN samples, as well as associations with severity of both CERAD and Braak, mainly in the ITG. These metabolites represented biochemical reactions in the (1) methionine cycle (choline: lower in AD, p = 0.003; S-adenosyl methionine: higher in AD, p = 0.005); (2) transsulfuration and glutathione synthesis (cysteine: higher in AD, p < 0.001; reduced glutathione GSH: higher in AD, p < 0.001); (3) polyamine synthesis/catabolism (spermidine: higher in AD, p = 0.004); (4) urea cycle (N-acetyl glutamate: lower in AD, p < 0.001); (5) glutamate-aspartate metabolism (N-acetyl aspartate: lower in AD, p = 0.002); and (6) neurotransmitter metabolism (gamma-amino-butyric acid: lower in AD, p < 0.001). Utilizing three Gene Expression Omnibus (GEO) datasets, we then examined mRNA expression levels of 71 genes encoding enzymes regulating key reactions within these pathways in the entorhinal cortex (ERC; AD: n = 25; CN: n = 52) and hippocampus (AD: n = 29; CN: n = 56). 
Complementing our metabolomics results, our transcriptomics analyses also revealed significant alterations in gene expression levels of key enzymatic regulators of biochemical reactions linked to transmethylation and polyamine metabolism. Our study has limitations: our metabolomics assays measured only a small proportion of all metabolites participating in the pathways we examined. Our study is also cross-sectional, limiting our ability to directly test how AD progression may impact changes in metabolite concentrations or differential-gene expression. Additionally, the relatively small number of brain tissue samples may have limited our power to detect alterations in all pathway-specific metabolites and their genetic regulators.
In this study, we observed broad dysregulation of transmethylation and polyamine synthesis/catabolism, including abnormalities in neurotransmitter signaling, urea cycle, aspartate-glutamate metabolism, and glutathione synthesis. Our results implicate alterations in cellular methylation potential and increased flux in the transmethylation pathways, increased demand on antioxidant defense mechanisms, perturbations in intermediate metabolism in the urea cycle and aspartate-glutamate pathways disrupting mitochondrial bioenergetics, increased polyamine biosynthesis and breakdown, as well as abnormalities in neurotransmitter metabolism that are related to AD.
High-throughput and high-content databases are increasingly important resources in molecular medicine, systems biology, and pharmacology. However, the information usually resides in unwieldy databases, limiting ready data analysis and integration. One resource that offers substantial potential for improvement in this regard is the NCI-60 cell line database compiled by the U.S. National Cancer Institute, which has been extensively characterized across numerous genomic and pharmacologic response platforms. In this report, we introduce a CellMiner (http://discover.nci.nih.gov/cellminer/) web application designed to improve the use of this extensive database. CellMiner tools allowed rapid data retrieval of transcripts for 22,379 genes and 360 microRNAs along with activity reports for 20,503 chemical compounds including 102 drugs approved by the U.S. Food and Drug Administration. Converting these differential levels into quantitative patterns across the NCI-60 clarified data organization and cross-comparisons using a novel pattern match tool. Data queries for potential relationships among parameters can be conducted in an iterative manner specific to user interests and expertise. Examples of the in silico discovery process afforded by CellMiner were provided for multidrug resistance analyses and doxorubicin activity; identification of colon-specific genes, microRNAs, and drugs; microRNAs related to the miR-17-92 cluster; and drug identification patterns matched to erlotinib, gefitinib, afatinib, and lapatinib. CellMiner greatly broadens applications of the extensive NCI-60 database for discovery by creating web-based processes that are rapid, flexible, and readily applied by users without bioinformatics expertise.
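The idea of matching quantitative patterns across cell lines can be sketched as follows. The example assumes that a match is scored by the Pearson correlation between a query profile and each candidate profile measured over the same cell lines; this scoring rule, the function names, and the five-value profiles are assumptions for illustration, not CellMiner's actual implementation or data.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def pattern_match(query, candidates):
    """Rank candidate profiles by correlation with the query, highest first."""
    scored = [(name, pearson(query, values)) for name, values in candidates.items()]
    return sorted(scored, key=lambda item: -item[1])

# Made-up activity and expression profiles over five hypothetical cell lines.
drug_activity = [1.0, 2.0, 3.0, 4.0, 5.0]
profiles = {
    "gene_A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "gene_B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "gene_C": [1.0, 3.0, 2.0, 5.0, 4.0],
}

ranking = pattern_match(drug_activity, profiles)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

Sorting by the signed correlation puts positively matched profiles first and inversely matched profiles last; ranking by absolute value instead would surface both kinds of relationship at the top.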
Advances in the high-throughput omic technologies have made it possible to profile cells in a large number of ways at the DNA, RNA, protein, chromosomal, functional, and pharmacological levels. A persistent problem is that some classes of molecular data are labeled with gene identifiers, others with transcript or protein identifiers, and still others with chromosomal locations. What has lagged behind is the ability to integrate the resulting data to uncover complex relationships and patterns. Those issues are reflected in full form by molecular profile data on the panel of 60 diverse human cancer cell lines (the NCI-60) used since 1990 by the U.S. National Cancer Institute to screen compounds for anticancer activity. To our knowledge, CellMiner is the first online database resource for integration of the diverse molecular types of NCI-60 and related meta data.
CellMiner enables scientists to perform advanced querying of molecular information on NCI-60 (and additional types) through a single web interface. CellMiner is a freely available tool that organizes and stores raw and normalized data that represent multiple types of molecular characterizations at the DNA, RNA, protein, and pharmacological levels. Annotations for each project, along with associated metadata on the samples and datasets, are stored in a MySQL database and linked to the molecular profile data. Data can be queried and downloaded along with comprehensive information on experimental and analytic methods for each data set. A Data Intersection tool allows selection of a list of genes (proteins) in common between two or more data sets and outputs the data for those genes (proteins) in the respective sets. In addition to its role as an integrative resource for the NCI-60, the CellMiner package also serves as a shell for incorporation of molecular profile data on other cell or tissue sample types.
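The behavior described for the Data Intersection tool can be illustrated with a short Python sketch. It assumes each data set is a mapping from a shared gene identifier to a measured value; the function name, data set names, and numbers are hypothetical, and the real tool works against the MySQL-backed database rather than in-memory dictionaries.

```python
def intersect_datasets(datasets):
    """Keep only genes present in every data set and return their values per data set."""
    if not datasets:
        return {}
    common = set.intersection(*(set(data) for data in datasets.values()))
    return {
        name: {gene: data[gene] for gene in sorted(common)}
        for name, data in datasets.items()
    }

# Made-up values keyed by gene symbol.
datasets = {
    "mrna": {"TP53": 1.2, "EGFR": 0.4, "ABCB1": 2.1},
    "protein": {"TP53": 0.9, "ABCB1": 1.7, "MYC": 0.3},
}

shared = intersect_datasets(datasets)
print(shared)
```

The sketch relies on all data sets using the same identifier type. Where one data set is keyed by gene and another by transcript or protein identifier, an identifier-mapping step would be needed before intersecting.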
CellMiner is a relational database tool for storing, querying, integrating, and downloading molecular profile data on the NCI-60 and other cancer cell types. More broadly, it provides a template to use in providing such functionality for other molecular profile data generated by academic institutions, public projects, or the private sector. CellMiner is available online at (http://discover.nci.nih.gov/cellminer/).