Gliomas belong to a group of central nervous system tumors, and consist of various sub-regions. Gold standard labeling of these sub-regions in radiographic imaging is essential for both clinical and ...computational studies, including radiomic and radiogenomic analyses. Towards this end, we release segmentation labels and radiomic features for all pre-operative multimodal magnetic resonance imaging (MRI) (n=243) of the multi-institutional glioma collections of The Cancer Genome Atlas (TCGA), publicly available in The Cancer Imaging Archive (TCIA). Pre-operative scans were identified in both glioblastoma (TCGA-GBM, n=135) and low-grade-glioma (TCGA-LGG, n=108) collections via radiological assessment. The glioma sub-region labels were produced by an automated state-of-the-art method and manually revised by an expert board-certified neuroradiologist. An extensive panel of radiomic features was extracted based on the manually-revised labels. This set of labels and features should enable i) direct utilization of the TCGA/TCIA glioma collections towards repeatable, reproducible and comparative quantitative studies leading to new predictive, prognostic, and diagnostic assessments, as well as ii) performance evaluation of computer-aided segmentation methods, and comparison to our state-of-the-art method.
Grand challenges stimulate advances within the medical imaging research community; within a competitive yet friendly environment, they allow for a direct comparison of algorithms through a ...well-defined, centralized infrastructure. The tasks of the two-part PROSTATEx Challenges (the PROSTATEx Challenge and the PROSTATEx-2 Challenge) are (1) the computerized classification of clinically significant prostate lesions and (2) the computerized determination of Gleason Grade Group in prostate cancer, both based on multiparametric magnetic resonance images. The challenges incorporate well-vetted cases for training and testing, a centralized performance assessment process to evaluate results, and an established infrastructure for case dissemination, communication, and result submission. In the PROSTATEx Challenge, 32 groups apply their computerized methods (71 methods total) to 208 prostate lesions in the test set. The area under the receiver operating characteristic curve for these methods in the task of differentiating between lesions that are and are not clinically significant ranged from 0.45 to 0.87; statistically significant differences in performance among the top-performing methods, however, are not observed. In the PROSTATEx-2 Challenge, 21 groups apply their computerized methods (43 methods total) to 70 prostate lesions in the test set. When compared with the reference standard, the quadratic-weighted kappa values for these methods in the task of assigning a five-point Gleason Grade Group to each lesion range from
to 0.27; superiority to random guessing can be established for only two methods. When approached with a sense of commitment and scientific rigor, challenges foster interest in the designated task and encourage innovation in the field.
Individual MRI characteristics (e.g., volume) are routinely used to identify survival-associated phenotypes for glioblastoma (GBM). This study investigated whether combinations of MRI features can ...also stratify survival. Furthermore, the molecular differences between phenotype-induced groups were investigated.
Ninety-two patients with imaging, molecular, and survival data from the TCGA (The Cancer Genome Atlas)-GBM collection were included in this study. For combinatorial phenotype analysis, hierarchical clustering was used. Groups were defined based on a cutpoint obtained via tree-based partitioning. Furthermore, differential expression analysis of microRNA (miRNA) and mRNA expression data was performed using GenePattern Suite. Functional analysis of the resulting genes and miRNAs was performed using Ingenuity Pathway Analysis. Pathway analysis was performed using Gene Set Enrichment Analysis.
Clustering analysis reveals that image-based grouping of the patients is driven by 3 features: volume-class, hemorrhage, and T1/FLAIR-envelope ratio. A combination of these features stratifies survival in a statistically significant manner. A cutpoint analysis yields a significant survival difference in the training set (median survival difference: 12 months, p = 0.004) as well as a validation set (p = 0.0001). Specifically, a low value for any of these 3 features indicates favorable survival characteristics. Differential expression analysis between cutpoint-induced groups suggests that several immune-associated (natural killer cell activity, T-cell lymphocyte differentiation) and metabolism-associated (mitochondrial activity, oxidative phosphorylation) pathways underlie the transition of this phenotype. Integrating data for mRNA and miRNA suggests the roles of several genes regulating proliferation and invasion.
A 3-way combination of MRI phenotypes may be capable of stratifying survival in GBM. Examination of molecular processes associated with groups created by this combinatorial phenotype suggests the role of biological processes associated with growth and invasion characteristics.
Online public repositories for sharing research data allow investigators to validate existing research or perform secondary research without the expense of collecting new data. Patient data made ...publicly available through such repositories may constitute a breach of personally identifiable information if not properly de-identified. Imaging data are especially at risk because some intricacies of the Digital Imaging and Communications in Medicine (DICOM) format are not widely understood by researchers. If imaging data still containing protected health information (PHI) were released through a public repository, a number of different parties could be held liable, including the original researcher who collected and submitted the data, the original researcher's institution, and the organization managing the repository. To minimize these risks through proper de-identification of image data, one must understand what PHI exists and where that PHI resides, and one must have the tools to remove PHI without compromising the scientific integrity of the data. DICOM public elements are defined by the DICOM Standard. Modality vendors use private elements to encode acquisition parameters that are not yet defined by the DICOM Standard, or the vendor may not have updated an existing software product after DICOM defined new public elements. Because private elements are not standardized, a common de-identification practice is to delete all private elements, removing scientifically useful data as well as PHI. Researchers and publishers of imaging data can use the tools and process described in this article to de-identify DICOM images according to current best practices.
The purpose of this work is to describe the LUNGx Challenge for the computerized classification of lung nodules on diagnostic computed tomography (CT) scans as benign or malignant and report the ...performance of participants' computerized methods along with that of six radiologists who participated in an observer study performing the same Challenge task on the same dataset. The Challenge provided sets of calibration and testing scans, established a performance assessment process, and created an infrastructure for case dissemination and result submission. Ten groups applied their own methods to 73 lung nodules (37 benign and 36 malignant) that were selected to achieve approximate size matching between the two cohorts. Area under the receiver operating characteristic curve (AUC) values for these methods ranged from 0.50 to 0.68; only three methods performed statistically better than random guessing. The radiologists' AUC values ranged from 0.70 to 0.85; three radiologists performed statistically better than the best-performing computer method. The LUNGx Challenge compared the performance of computerized methods in the task of differentiating benign from malignant lung nodules on CT scans, placed in the context of the performance of radiologists on the same task. The continued public availability of the Challenge cases will provide a valuable resource for the medical imaging research community.
The COVID-19 pandemic was accompanied by an unprecedented surveillance effort. The resulting data were and will continue to be critical for surveillance and control of SARS-CoV-2. However, some ...genomic surveillance methods experienced challenges as the virus evolved, resulting in incomplete and poor quality data. Complete and quality coverage, especially of the S-gene, is important for supporting the selection of vaccine candidates. As such, we developed a robust method to target the S-gene for amplification and sequencing. By focusing on the S-gene and imposing strict coverage and quality metrics, we hope to increase the quality of surveillance data for this continually evolving gene. Our technique is currently being deployed globally to partner laboratories, and public health representatives from 79 countries have received hands-on training and support. Expanding access to quality surveillance methods will undoubtedly lead to earlier detection of novel variants and better inform vaccine strain selection.
Data sharing creates potential cost savings, supports data aggregation, and facilitates reproducibility to ensure quality research; however, data from heterogeneous systems require retrospective ...harmonization. This is a major hurdle for researchers who seek to leverage existing data. Efforts focused on strategies for data interoperability largely center around the use of standards but ignore the problems of competing standards and the value of existing data. Interoperability remains reliant on retrospective harmonization. Approaches to reduce this burden are needed.
The Cancer Imaging Archive (TCIA) is an example of an imaging repository that accepts data from a diversity of sources. It contains medical images from investigators worldwide and substantial nonimage data. Digital Imaging and Communications in Medicine (DICOM) standards enable querying across images, but TCIA does not enforce other standards for describing nonimage supporting data, such as treatment details and patient outcomes. In this study, we used 9 TCIA lung and brain nonimage files containing 659 fields to explore retrospective harmonization for cross-study query and aggregation. It took 329.5 hours, or 2.3 months, extended over 6 months to identify 41 overlapping fields in 3 or more files and transform 31 of them. We used the Genomic Data Commons (GDC) data elements as the target standards for harmonization.
We characterized the issues and have developed recommendations for reducing the burden of retrospective harmonization. Once we harmonized the data, we also developed a Web tool to easily explore harmonized collections.
While prospective use of standards can support interoperability, there are issues that complicate this goal. Our work recognizes and reveals retrospective harmonization issues when trying to reuse existing data and recommends national infrastructure to address these issues.
Data sharing is increasingly recognized as critical to cross-disciplinary research and to assuring scientific validity. Despite National Institutes of Health and National Science Foundation policies ...encouraging data sharing by grantees, little data sharing of clinical data has in fact occurred. A principal reason often given is the potential of inadvertent violation of the Health Insurance Portability and Accountability Act privacy regulations. While regulations specify the components of private health information that should be protected, there are no commonly accepted methods to de-identify clinical data objects such as images. This leads institutions to take conservative risk-averse positions on data sharing. In imaging trials, where images are coded according to the Digital Imaging and Communications in Medicine (DICOM) standard, the complexity of the data objects and the flexibility of the DICOM standard have made it especially difficult to meet privacy protection objectives. The recent release of DICOM Supplement 142 on image de-identification has removed much of this impediment. This article describes the development of an open-source software suite that implements DICOM Supplement 142 as part of the National Biomedical Imaging Archive (NBIA). It also describes the lessons learned by the authors as NBIA has acquired more than 20 image collections encompassing over 30 million images.