We have searched the literature for information on the risk of breast cancer (BC) in relation to gender, breast development, and gonadal function in the following 8 populations: 1) females with the Turner syndrome (45, XO); 2) females and males with congenital hypogonadotropic hypogonadism and the Kallmann syndrome; 3) pure gonadal dysgenesis (PGD) in genotypic and phenotypic females and genotypic males (Swyer syndrome); 4) males with the Klinefelter syndrome (47, XXY); 5) male-to-female transgender individuals; 6) female-to-male transgender individuals; 7) genotypic males, but phenotypic females with the complete androgen insensitivity syndrome, and 8) females with Mayer-Rokitansky-Küster-Hauser (MRKH) syndrome (müllerian agenesis). Based on this search, we have drawn 3 major conclusions. First, the presence of a Y chromosome protects against the development of BC, even when female-size breasts and female-level estrogens are present. Second, without menstrual cycles, BC hardly occurs, with an incidence comparable to that in males. There is a strong correlation between the lifetime number of menstrual cycles and the risk of BC. In our populations the BC risk in genetic females not exposed to progesterone (P4) is very low and comparable to that in males. Third, BC has been reported only once in genetic females with MRKH syndrome who have normal breasts and ovulating ovaries with normal levels of estrogens and P4. We hypothesize that the oncogenic glycoprotein WNT family member 4 is the link between the genetic cause of MRKH and the absence of BC in women with MRKH syndrome.
Class prediction models have been shown to have varying performances in clinical gene expression datasets. Previous evaluation studies, mostly done in the field of cancer, showed that the accuracy of class prediction models differs from dataset to dataset and depends on the type of classification function. While a substantial amount of information is known about the characteristics of classification functions, little has been done to determine which characteristics of gene expression data have an impact on the performance of a classifier. This study aims to empirically identify data characteristics that affect the predictive accuracy of classification models, outside of the field of cancer.
Datasets from twenty-five studies meeting predefined inclusion and exclusion criteria were downloaded. Nine classification functions were chosen, falling within the categories: discriminant analyses or Bayes classifiers, tree-based methods, regularization and shrinkage methods, and nearest-neighbor methods. Consequently, nine class prediction models were built for each dataset using the same procedure and their performances were evaluated by calculating their accuracies. The characteristics of each experiment were recorded (i.e., observed disease, medical question, tissue/cell types and sample size), together with characteristics of the gene expression data, namely the number of differentially expressed genes, the fold changes and the within-class correlations. Their effects on the accuracy of a class prediction model were statistically assessed by random effects logistic regression. The number of differentially expressed genes and the average fold change had a significant impact on the accuracy of a classification model and individually explained up to 72% and 57% of the variation in prediction accuracy, respectively. Multivariable random effects logistic regression with forward selection yielded the two aforementioned study factors and the within-class correlation as factors affecting the accuracy of classification functions, explaining 91.5% of the between-study variation.
We evaluated study- and data-related factors that might explain the varying performances of classification functions in non-cancerous datasets. Our results showed that the number of differentially expressed genes, the fold change, and the correlation in gene expression data significantly affect the accuracy of class prediction models.
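As a rough illustration of two quantities discussed in this abstract, the sketch below computes the accuracy of a classifier and the per-gene fold change on a toy dataset. It is not the study's pipeline: the leave-one-out scheme, the 1-nearest-neighbour classifier, the log2-scale toy values, and the function names `loo_accuracy_1nn` and `log2_fold_changes` are all assumptions made for illustration.

```python
import math

def loo_accuracy_1nn(samples, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Euclidean distance)."""
    correct = 0
    for i, x in enumerate(samples):
        best_dist, best_label = math.inf, None
        for j, y in enumerate(samples):
            if i == j:
                continue  # the held-out sample may not vote for itself
            d = math.dist(x, y)
            if d < best_dist:
                best_dist, best_label = d, labels[j]
        correct += best_label == labels[i]
    return correct / len(samples)

def log2_fold_changes(samples, labels, case="case", control="control"):
    """Per-gene difference in class means; on log2-scale data this is the log2 fold change."""
    def class_means(name):
        rows = [s for s, l in zip(samples, labels) if l == name]
        return [sum(col) / len(rows) for col in zip(*rows)]
    return [c - k for c, k in zip(class_means(case), class_means(control))]

# Hypothetical toy data: 6 samples x 3 genes, log2 expression; only gene 0 differs by class.
samples = [
    [8.0, 5.1, 6.0], [8.2, 5.0, 6.1], [7.9, 4.9, 5.9],     # controls
    [10.1, 5.0, 6.0], [9.9, 5.2, 6.2], [10.0, 4.8, 5.8],   # cases
]
labels = ["control"] * 3 + ["case"] * 3

accuracy = loo_accuracy_1nn(samples, labels)
fold_changes = log2_fold_changes(samples, labels)
```

In this toy example the single gene with a large fold change is enough for perfect leave-one-out accuracy, which is in line with the direction of the association reported above.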
Hybrid designs with both randomized arms and an external control cohort preserve key features of randomization and utilize external information to augment clinical trials. In this study, we propose to leverage high‐quality, patient‐level concurrent registries to enhance clinical trials and illustrate the impact on trial design for amyotrophic lateral sclerosis. The proposed methodology was evaluated in a randomized, placebo‐controlled clinical trial. We used patient‐level information from a well‐defined, population‐based registry that was running parallel to the randomized clinical trial to identify concurrently nonparticipating, eligible patients who could be matched with trial participants, and integrated them into the statistical analysis. We assessed the impact of the addition of the external controls on the treatment effect estimate, precision, and time to reach a conclusion. During the runtime of the trial, a total of 1,141 registry patients were alive; 473 (41.5%) of them fulfilled the eligibility criteria and 133 (11.7%) were enrolled in the study. A matched control population could be identified among the nonparticipating patients. Augmenting the randomized controls with matched external controls could have avoided unnecessary randomization of 17 patients (−12.8%) and reduced the study duration from 30.1 months to 22.6 months (−25.0%). Matching eligible external controls from a different calendar period led to bias in the treatment effect estimate. Hybrid trial designs utilizing a concurrent registry with rigorous matching can minimize bias due to a mismatch in calendar time and differences in standard of care, and may accelerate the development of new treatments.
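The abstract does not state which matching algorithm was used. Purely as an illustration of how eligible registry patients might be paired with trial participants, the sketch below performs greedy 1:1 nearest-neighbour matching without replacement on a single score, with a caliper. The score, the caliper value, the patient identifiers and the function name `greedy_match` are hypothetical.

```python
def greedy_match(trial, registry, caliper):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    trial, registry: dicts mapping a patient id to a matching score
    (for example a prognostic or propensity score; hypothetical here).
    Returns a dict mapping each matched trial id to a registry id.
    """
    available = dict(registry)
    pairs = {}
    # Process trial participants in order of score so the result is deterministic.
    for trial_id, score in sorted(trial.items(), key=lambda kv: kv[1]):
        if not available:
            break
        nearest = min(available, key=lambda rid: abs(available[rid] - score))
        if abs(available[nearest] - score) <= caliper:
            pairs[trial_id] = nearest
            del available[nearest]  # each external control is used at most once
    return pairs

# Hypothetical scores for three trial participants and four registry patients.
trial = {"T1": 0.30, "T2": 0.55, "T3": 0.90}
registry = {"R1": 0.28, "R2": 0.60, "R3": 0.52, "R4": 0.10}

pairs = greedy_match(trial, registry, caliper=0.1)
# T3 stays unmatched because no remaining registry patient lies within the caliper.
```

A caliper of this kind trades sample size for comparability: unmatched participants are simply not augmented with an external control.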
Class prediction with gene expression is widely used to generate diagnostic and/or prognostic models. The literature reveals that classification functions perform differently across gene expression datasets. The question of which classification function should be used for a given dataset remains to be answered. In this study, a predictive model for choosing an optimal function for class prediction on a given dataset was devised.
To achieve this, gene expression data were simulated for different values of gene-pair correlations, sample size, genes' variances, differentially expressed genes and fold changes. For each simulated dataset, ten classifiers were built and evaluated using ten classification functions. The resulting accuracies from 1152 different simulation scenarios by ten classification functions were then modeled using a linear mixed effects regression on the studied data characteristics, yielding a model that predicts the accuracy of the functions on a given dataset. An application of our model on eight real-life datasets showed positive correlations (0.33-0.82) between the predicted and expected accuracies.
The predictive model presented here might serve as a guide to choose an optimal classification function among the 10 studied functions, for any given gene expression data.
The R source code for the analysis and an R-package 'SPreFuGED' are available at Bioinformatics online.
v.l.jong@umcutecht.nl
Supplementary data are available at Bioinformatics online.
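The simulation design is not detailed in this abstract. The following is a minimal sketch of how two-class expression data with a chosen sample size, number of differentially expressed genes, fold change, variance and pairwise correlation could be generated. It assumes an equicorrelated normal model built from one shared factor per sample; this model and the names `simulate_expression` and `pearson` are illustrative assumptions, not the SPreFuGED implementation.

```python
import math
import random

def simulate_expression(n_per_class, n_genes, n_de, log2_fc, rho, sd=1.0, seed=0):
    """Simulate log2 expression for controls and cases.

    Every gene has standard deviation `sd`; every pair of genes has correlation
    `rho` (equicorrelation, an assumption). The first `n_de` genes are shifted
    by `log2_fc` in the case class.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this one-factor construction")
    rng = random.Random(seed)
    samples, labels = [], []
    for label in ("control", "case"):
        for _ in range(n_per_class):
            shared = rng.gauss(0.0, 1.0)  # factor common to all genes of one sample
            row = []
            for g in range(n_genes):
                noise = rng.gauss(0.0, 1.0)
                value = sd * (math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * noise)
                if label == "case" and g < n_de:
                    value += log2_fc
                row.append(value)
            samples.append(row)
            labels.append(label)
    return samples, labels

def pearson(x, y):
    """Sample Pearson correlation, used here to check the simulated correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

samples, labels = simulate_expression(
    n_per_class=2000, n_genes=4, n_de=2, log2_fc=1.0, rho=0.5, seed=1
)
```

Varying the arguments over a grid of values would give one simulated dataset per scenario, on which classifiers could then be trained and their accuracies recorded.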
Highlights
• We determined associations between patient and study-related factors and inclusion.
• Inclusion was associated with travel time and the number of placed advertorials.
• Age and male gender were associated with study inclusion.
• Introduction letters prior to invitation were not associated with study inclusion.
Although data from electronic health records (EHR) are often used for research purposes, systematic validation of these data prior to their use is not standard practice. Existing validation frameworks discuss validity concepts without translating these into practical implementation steps or addressing the potential influence of linking multiple sources. Therefore, we developed a practical approach for validating routinely collected data from multiple sources and applied it to a blood transfusion data warehouse to evaluate its usability in practice.
The approach consists of identifying existing validation frameworks for EHR data or linked data, selecting validity concepts from these frameworks and establishing quantifiable validity outcomes for each concept. The approach distinguishes external validation concepts (e.g. concordance with external reports, previous literature and expert feedback) and internal consistency concepts which use expected associations within the dataset itself (e.g. completeness, uniformity and plausibility). In an example case, the selected concepts were applied to a transfusion dataset and specified in more detail.
Application of the approach to a transfusion dataset resulted in a structured overview of data validity aspects. This allowed improvement of these aspects through further processing of the data and in some cases adjustment of the data extraction. For example, the proportion of transfused products that could not be linked to the corresponding issued products was initially 2.2% but could be reduced to 0.17% by adjusting the data extraction criteria.
This stepwise approach for validating linked multisource data provides a basis for evaluating data quality and enhancing interpretation. When the process of data validation is adopted more broadly, this contributes to increased transparency and greater reliability of research based on routinely collected electronic health records.
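A quantifiable validity outcome of the kind described in this abstract, such as the proportion of transfused products that cannot be linked to an issued product, could be computed along the following lines. This is a sketch under assumptions: the identifiers, the data layout and the function name `unlinked_proportion` are hypothetical and not taken from the data warehouse.

```python
def unlinked_proportion(transfused_ids, issued_ids):
    """Proportion of transfused product ids that have no matching issued product id."""
    transfused = list(transfused_ids)
    if not transfused:
        raise ValueError("no transfused products to evaluate")
    issued = set(issued_ids)  # set lookup keeps the check linear in the number of products
    unlinked = [pid for pid in transfused if pid not in issued]
    return len(unlinked) / len(transfused)

# Hypothetical product ids: one of four transfused products has no issue record.
proportion = unlinked_proportion(["A", "B", "C", "D"], ["A", "B", "C"])
```

Recomputing such a proportion after each change to the extraction criteria gives a simple before-and-after measure of linkage quality.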
Aggregating gene expression data across experiments via meta-analysis is expected to increase the precision of the effect estimates and to increase the statistical power to detect a certain fold change. This study evaluates the potential benefit of using a meta-analysis approach as a gene selection method prior to predictive modeling in gene expression data.
Six raw datasets from different gene expression experiments in acute myeloid leukemia (AML) and 11 different classification methods were used to build classification models to classify samples as either AML or healthy control. First, the classification models were trained on gene expression data from single experiments using conventional supervised variable selection and externally validated with the other five gene expression datasets (referred to as the individual-classification approach). Next, gene selection was performed through meta-analysis (MA) on four datasets, and predictive models were trained with the selected genes on the fifth dataset and validated on the sixth dataset. For some datasets, gene selection through meta-analysis helped classification models to achieve higher performance as compared to predictive modeling based on a single dataset; but for others, there was no major improvement. Synthetic datasets were generated from nine simulation scenarios. The effect of sample size, fold change and pairwise correlation between differentially expressed (DE) genes on the difference in performance between the MA- and individual-classification models was evaluated. The fold change and pairwise correlation significantly contributed to the difference in performance between the two methods. Gene selection via meta-analysis was more effective when it was conducted using a set of data with low fold change and high pairwise correlation on the DE genes.
Gene selection through meta-analysis on previously published studies potentially improves the performance of a predictive model on a given gene expression dataset.
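The abstract does not specify the meta-analysis model used for gene selection. As one possible reading, the sketch below pools per-study effect sizes for each gene with a fixed-effect inverse-variance model and keeps genes whose pooled z-score exceeds a threshold. The model choice, the threshold, the gene names and the functions `pooled_z` and `select_genes` are assumptions for illustration only.

```python
import math

def pooled_z(effects, variances):
    """Fixed-effect inverse-variance pooled effect divided by its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return pooled / standard_error

def select_genes(per_gene, z_threshold=1.96):
    """Keep genes whose absolute pooled z-score reaches the threshold.

    per_gene maps a gene name to (effects, variances), one entry per study.
    """
    return [
        gene
        for gene, (effects, variances) in per_gene.items()
        if abs(pooled_z(effects, variances)) >= z_threshold
    ]

# Hypothetical per-study log2 fold changes and variances from four studies.
per_gene = {
    "GENE_A": ([1.0, 1.2, 0.8, 1.0], [0.25, 0.25, 0.25, 0.25]),
    "GENE_B": ([0.1, -0.2, 0.0, 0.1], [0.25, 0.25, 0.25, 0.25]),
}

selected = select_genes(per_gene)
```

The selected genes would then serve as the feature set for a classifier trained on a separate dataset, mirroring the train-on-fifth, validate-on-sixth scheme described above.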
Estimators for the variance between treatment effects from randomized clinical trials (RCTs) in a meta-analysis may yield divergent or even contradictory results. In a sequential meta-analysis (SMA), their properties are even more important, as they influence the point in time at which definite conclusions are drawn. In this study, we evaluated the properties of estimators of heterogeneity to be used in an SMA. We conducted an extensive simulation study with dichotomous and continuous outcome data and applied the estimators in real life examples. Bias and variance of the estimators were used as primary evaluation criteria, as well as the number of RCTs and patients from the accumulating trials needed to get stable estimates. The simulation studies showed that the well-known DerSimonian–Laird (DL) estimator largely underestimates the true value for dichotomous outcomes. The two-step DL (DL2) significantly improves this behavior. In general, the DL2 and Paule–Mandel (PM) estimators are recommended for both dichotomous and continuous outcome data for use in an SMA.
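For readers unfamiliar with these estimators, the sketch below shows the DL estimator as a special case of the general method-of-moments estimator of the between-trial variance, and a two-step variant that reuses the DL estimate in the weights. It assumes that DL2 takes the usual two-step form with weights 1/(v_i + tau²_DL); the abstract itself does not give the formulas, and the function names are illustrative.

```python
def mom_tau2(effects, variances, weights):
    """General method-of-moments estimate of the between-trial variance tau^2.

    Solves E[Q] = Q for tau^2, where Q is the weighted sum of squared deviations
    from the weighted mean, and truncates the result at zero.
    """
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / s1
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))
    # Expected value of Q when tau^2 = 0.
    q_null = (
        sum(w * v for w, v in zip(weights, variances))
        - sum(w * w * v for w, v in zip(weights, variances)) / s1
    )
    return max(0.0, (q - q_null) / (s1 - s2 / s1))

def dl_tau2(effects, variances):
    """DerSimonian-Laird estimate: method of moments with weights 1 / v_i."""
    return mom_tau2(effects, variances, [1.0 / v for v in variances])

def dl2_tau2(effects, variances):
    """Two-step estimate (assumed form): weights 1 / (v_i + DL estimate)."""
    tau2_dl = dl_tau2(effects, variances)
    return mom_tau2(effects, variances, [1.0 / (v + tau2_dl) for v in variances])

# Hypothetical trial effects with equal within-trial variances.
effects = [0.0, 1.0, 2.0]
variances = [0.25, 0.25, 0.25]

tau2_dl = dl_tau2(effects, variances)    # 0.75
tau2_dl2 = dl2_tau2(effects, variances)  # 0.75
```

With equal within-trial variances both estimators reduce to the sample variance of the effects minus the within-trial variance (1.0 − 0.25 here), so they coincide; differences between them only appear when the trial variances are unequal.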
TRICALS: creating a highway toward a cure
van Eijk, Ruben P.A.; Kliest, Tessa; McDermott, Christopher J. ...
Amyotrophic lateral sclerosis and frontotemporal degeneration, 10/2020, Volume 21, Issue 7-8
Journal Article
Peer reviewed
Open access
A change in our current approach toward drug development is required to improve the likelihood of finding effective treatment for patients with amyotrophic lateral sclerosis (ALS). The aim of the Treatment Research Initiative to Cure ALS (TRICALS) is to extend the collective effort with industry and consolidate drug development paths. TRICALS has begun a series of meetings on how to best move the field forward collaboratively, thereby addressing five major topics in ALS clinical trials: (1) preclinical research, (2) biomarker development, (3) eligibility criteria, (4) efficacy endpoints and (5) innovative trial design. There is an appetite for ongoing discussions of these major topics in clinical trials between representatives from academia, patient advocacy groups, industry partners and funding bodies. Industry is open to fundamentally changing drug development for ALS and shortening the time to effective therapy for patients by implementing promising innovations in biomarker development, trial design, and patient selection. There is, however, a pressing need from all stakeholders for regulatory discussions and amendments of current guidelines to successfully adopt innovation in future clinical development lines.