The evolution of chemistries and instrument platforms for next‐generation sequencing has led to sequencing of genomic variants in tumor biopsies as well as in circulating tumor cells (CTCs) and cell‐free DNA liquid biopsies. The transition of these analytical platforms into clinical ones has led to challenges in product development as well as in regulatory strategies for the approval of diagnostic products with these platforms. Regulatory strategies for liquid biopsy diagnostics depend on a framework that has been developed over the past few years by the US Food and Drug Administration (FDA). This framework includes guidances that cover enrichment biomarkers and companion diagnostics, as well as regulatory approval precedents, which can be used to design regulatory strategies for new liquid biopsy diagnostic products. However, the regulatory paths for these liquid biopsy diagnostics can also be tortuous, as the example of CTC‐platform liquid biopsies shows. The ultimate success of regulatory pathways of liquid biopsy diagnostics has been driven by the incremental value of FDA approval for Clinical Laboratory Improvement Amendments (CLIA)‐developed tests and by the inherent complexity of these diagnostics, which are practical barriers for the widespread replication of these tests throughout CLIA laboratories. The framework for FDA approval of sequence information from these liquid biopsies has been focused on single‐site approvals of diagnostics where sequencing information is considered at different diagnostic risk levels, ranging from novel or follow‐on companion diagnostics to variant calls in genomic targets considered independently valuable for therapeutic decision making.
Biomarkers may be qualified using different qualification processes. A passive approach for qualification has been to accept the end of discussions in the scientific literature as an indication that a biomarker has been accepted. An active approach to qualification requires development of a comprehensive process by which a consensus may be reached about the qualification of a biomarker. Active strategies for qualification include those associated with context-independent as well as context-dependent qualifications.
Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.
The acceptance of microarray technology in regulatory decision-making is being challenged by the existence of various platforms and data analysis methods. A recent report (E. Marshall, Science, 306, 630-631, 2004), by extensively citing the study of Tan et al. (Nucleic Acids Res., 31, 5676-5684, 2003), portrays a disturbingly negative picture of the cross-platform comparability, and, hence, the reliability of microarray technology.
We reanalyzed Tan's dataset and found that the intra-platform consistency was low, indicating a problem in experimental procedures from which the dataset was generated. Furthermore, by using three gene selection methods (i.e., p-value ranking, fold-change ranking, and Significance Analysis of Microarrays (SAM)) on the same dataset we found that p-value ranking (the method emphasized by Tan et al.) results in much lower cross-platform concordance compared to fold-change ranking or SAM. Therefore, the low cross-platform concordance reported in Tan's study appears to be mainly due to a combination of low intra-platform consistency and a poor choice of data analysis procedures, instead of inherent technical differences among different platforms, as suggested by Tan et al. and Marshall.
Our results illustrate the importance of establishing calibrated RNA samples and reference datasets to objectively assess the performance of different microarray platforms and the proficiency of individual laboratories as well as the merits of various data analysis procedures. Thus, we are progressively coordinating the MAQC project, a community-wide effort for microarray quality control.
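The abstract above compares gene selection methods by the cross-platform concordance of the gene lists they produce. As a minimal sketch of what such a comparison could look like, the function below scores two gene lists by their percentage of shared genes. The metric, the function name `percent_overlap`, and the gene identifiers are illustrative assumptions; the abstract does not state which concordance measure the authors used.

```python
def percent_overlap(list_a, list_b):
    """Percentage of genes shared by two selected-gene lists.

    Hypothetical concordance measure (an assumption, not taken from the
    abstract): shared genes divided by the size of the shorter list.
    """
    a, b = set(list_a), set(list_b)
    if not a or not b:
        return 0.0  # no meaningful overlap for an empty list
    return 100.0 * len(a & b) / min(len(a), len(b))


# Made-up gene lists standing in for selections from two platforms.
platform_1 = ["g1", "g2", "g3", "g4"]
platform_2 = ["g3", "g4", "g5", "g6"]
print(percent_overlap(platform_1, platform_2))
```

Running the same scoring on lists produced by different ranking methods (p-value, fold change, SAM) would be one way to reproduce the kind of comparison the abstract describes.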
Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.
To validate and extend the findings of the MicroArray Quality Control (MAQC) project, a biologically relevant toxicogenomics data set was generated using 36 RNA samples from rats treated with three chemicals (aristolochic acid, riddelliine and comfrey) and each sample was hybridized to four microarray platforms. The MAQC project assessed concordance in intersite and cross-platform comparisons and the impact of gene selection methods on the reproducibility of profiling data in terms of differentially expressed genes using distinct reference RNA samples. The real-world toxicogenomic data set reported here showed high concordance in intersite and cross-platform comparisons. Further, gene lists generated by fold-change ranking were more reproducible than those obtained by t-test P value or Significance Analysis of Microarrays. Finally, gene lists generated by fold-change ranking with a nonstringent P-value cutoff showed increased consistency in Gene Ontology terms and pathways, and hence the biological impact of chemical exposure could be reliably deduced from all platforms analyzed.
We have evaluated the performance characteristics of three quantitative gene expression technologies and correlated their expression measurements to those of five commercial microarray platforms, based on the MicroArray Quality Control (MAQC) data set. The limit of detection, assay range, precision, accuracy and fold-change correlations were assessed for 997 TaqMan Gene Expression Assays, 205 Standardized RT (Sta)RT-PCR assays and 244 QuantiGene assays. TaqMan is a registered trademark of Roche Molecular Systems, Inc. We observed high correlation between quantitative gene expression values and microarray platform results and found few discordant measurements among all platforms. The main cause of variability was differences in probe sequence and thus target location. A second source of variability was the limited and variable sensitivity of the different microarray platforms for detecting weakly expressed genes, which affected interplatform and intersite reproducibility of differentially expressed genes. From this analysis, we conclude that the MAQC microarray data set has been validated by alternative quantitative gene expression platforms, thus supporting the use of microarray platforms for the quantitative characterization of gene expression.
Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resultant variety of existing and emerging methods exacerbates confusion and continuing debate in the microarray community on the appropriate choice of methods for identifying reliable DEG lists.
Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated the impact on the reproducibility of DEG lists of a few widely used gene selection procedures. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan - the widely regarded "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion with a non-stringent P-value cutoff filtering, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists solely based on P-value ranking is an expected mathematical consequence of the high variability of the t-values; the more stringent the P-value threshold, the less reproducible the DEG list is. These observations are also consistent with results from extensive simulation calculations.
We recommend the use of FC-ranking plus a non-stringent P cutoff as a straightforward and baseline practice in order to generate more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance to choose the appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
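The recommended procedure, FC-ranking combined with a non-stringent P cutoff, can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the names `GeneStat`, `select_by_fc_with_p_filter`, and `select_by_p_only`, the default cutoff of 0.05, and the toy values are all assumptions introduced here.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log2_fc: float  # log2 fold change between the two sample groups
    p_value: float  # e.g. from a simple t-test


def select_by_fc_with_p_filter(stats, n_genes, p_cutoff=0.05):
    """Keep genes passing a non-stringent P filter, then rank by |log2 FC|.

    The 0.05 default is an assumed placeholder, not a value from the abstract.
    """
    passing = [s for s in stats if s.p_value < p_cutoff]
    ranked = sorted(passing, key=lambda s: abs(s.log2_fc), reverse=True)
    return [s.gene for s in ranked[:n_genes]]


def select_by_p_only(stats, n_genes):
    """Rank solely by P value, the practice the abstract cautions against."""
    ranked = sorted(stats, key=lambda s: s.p_value)
    return [s.gene for s in ranked[:n_genes]]


# Made-up per-gene statistics for illustration only.
toy = [
    GeneStat("g1", 3.0, 0.04),
    GeneStat("g2", 0.2, 0.0001),
    GeneStat("g3", -2.5, 0.01),
    GeneStat("g4", 1.8, 0.20),
    GeneStat("g5", 0.9, 0.03),
]

print(select_by_fc_with_p_filter(toy, 2))  # ['g1', 'g3']
print(select_by_p_only(toy, 2))            # ['g2', 'g3']
```

In the toy data, P-only ranking picks `g2` despite its very small fold change, while the FC-ranked list favors genes with large effects that still pass the filter. Here the P cutoff only acts as a filter and the fold change determines the order, which mirrors the division of roles described in the abstract.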
The gap between development of exploratory biomarkers and their acceptance in drug development and regulatory review is a hurdle in the development of better therapies. The U.S. Food and Drug Administration has developed a regulatory process for biomarker qualification to accelerate the process by which new biomarkers are integrated in the development of therapies.
Heterogeneity in the underlying mechanisms of disease processes and inter-patient variability in drug responses are major challenges in drug development. To address these challenges, biomarker strategies based on a range of platforms, such as microarray gene-expression technologies, are increasingly being applied to elucidate these sources of variability and thereby potentially increase drug development success rates. With the aim of enhancing understanding of the regulatory significance of such biomarker data by regulators and sponsors, the US Food and Drug Administration initiated a programme in 2004 to allow sponsors to submit exploratory genomic data voluntarily, without immediate regulatory impact. In this article, a selection of case studies from the first 5 years of this programme, which is now known as the voluntary exploratory data submission programme and also involves collaboration with the European Medicines Agency, is discussed, and general lessons are highlighted.