Purpose To identify the molecular basis of quantitative imaging characteristics of tumor-adjacent parenchyma at dynamic contrast material-enhanced magnetic resonance (MR) imaging and to evaluate their prognostic value in breast cancer. Materials and Methods In this institutional review board-approved, HIPAA-compliant study, 10 quantitative imaging features depicting tumor-adjacent parenchymal enhancement patterns were extracted and screened for prognostic features in a discovery cohort of 60 patients. By using data from The Cancer Genome Atlas (TCGA), a radiogenomic map for the tumor-adjacent parenchymal tissue was created and molecular pathways associated with prognostic parenchymal imaging features were identified. Furthermore, a multigene signature of the parenchymal imaging feature was built in a training cohort (n = 126), and its prognostic relevance was evaluated in two independent cohorts (n = 879 and 159). Results One image feature measuring heterogeneity (ie, information measure of correlation) was significantly associated with prognosis (false-discovery rate < 0.1), and at a cutoff of 0.57 stratified patients into two groups with different recurrence-free survival rates (log-rank P = .024). The tumor necrosis factor signaling pathway was identified as the top enriched pathway (hypergeometric P < .0001) among genes associated with the image feature. A 73-gene signature based on the tumor profiles in TCGA achieved good association with the tumor-adjacent parenchymal image feature (R = 0.873), and it stratified patients into groups with different recurrence-free survival (log-rank P = .029) and overall survival (log-rank P = .042) in an independent TCGA cohort. The prognostic value was confirmed in another independent cohort (Gene Expression Omnibus GSE1456), with log-rank P = .00058 for recurrence-free survival and log-rank P = .0026 for overall survival. Conclusion Heterogeneous enhancement patterns of tumor-adjacent parenchyma at MR imaging are associated with the tumor necrosis factor signaling pathway and poor survival in breast cancer.
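As a hedged illustration of the survival stratification described above (not the study's code), the sketch below splits patients at the reported cutoff of 0.57 on the heterogeneity feature and compares recurrence-free survival with a log-rank test using the lifelines package; the file and column names are hypothetical.

# Illustrative sketch: stratify patients by an imaging-feature cutoff and
# compare recurrence-free survival with a log-rank test (lifelines package).
# "cohort.csv" and its column names are hypothetical placeholders.
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

df = pd.read_csv("cohort.csv")           # one row per patient (hypothetical file)
CUTOFF = 0.57                             # cutoff reported for the heterogeneity feature
high = df["info_measure_corr"] >= CUTOFF

result = logrank_test(
    df.loc[high, "rfs_months"], df.loc[~high, "rfs_months"],
    event_observed_A=df.loc[high, "recurrence"],
    event_observed_B=df.loc[~high, "recurrence"],
)
print("log-rank p =", result.p_value)

km = KaplanMeierFitter()
km.fit(df.loc[high, "rfs_months"], df.loc[high, "recurrence"], label="high heterogeneity")
ax = km.plot_survival_function()
km.fit(df.loc[~high, "rfs_months"], df.loc[~high, "recurrence"], label="low heterogeneity")
km.plot_survival_function(ax=ax)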
•Proposed two distinct deep learning models: (i) CNN Word – Glove, and (ii) domain phrase attention-based hierarchical recurrent neural network (DPA-HNN), for synthesizing information on pulmonary emboli (PE) from clinical thoracic CT free-text radiology reports.
•Visualization methods have been developed to identify the impact of input words on the output decision for both deep learning models.
•Models are trained only on the Stanford dataset (2512 reports) and are tested on datasets from four major healthcare centers: Stanford (1000 reports), Duke (1000 reports), Colorado Children's (1000 reports), and University of Pittsburgh Medical Center (858 reports).
•Comparative effectiveness of the deep learning models is judged against the current state of the art, PEFinder, as well as traditional machine learning models (SVM and AdaBoost with bag-of-words features).
•This work provides experimental insight into the proficiency of CNNs and RNNs for automating the analysis of unstructured imaging reports.
This paper explores cutting-edge deep learning methods for information extraction from medical imaging free-text reports at a multi-institutional scale and compares them with a state-of-the-art domain-specific rule-based system (PEFinder) and traditional machine learning methods (SVM and AdaBoost). We proposed two distinct deep learning models: (i) CNN Word – Glove, and (ii) a domain phrase attention-based hierarchical recurrent neural network (DPA-HNN), for synthesizing information on pulmonary emboli (PE) from over 7370 clinical thoracic computed tomography (CT) free-text radiology reports collected from four major healthcare centers. Our proposed DPA-HNN model encodes domain-dependent phrases into an attention mechanism and represents a radiology report through a hierarchical RNN structure composed of word-level, sentence-level, and document-level representations. Experimental results suggest that the deep learning models, although trained on a single institutional dataset, perform better than the rule-based PEFinder on our multi-institutional test sets. The best F1 score for the presence of PE was 0.99 (DPA-HNN) in an adult patient population and 0.99 (HNN) in a pediatric population, showing that deep learning models trained on adult data generalized to a pediatric population with comparable accuracy. Our work suggests the feasibility of broader usage of neural network models in automated classification of multi-institutional imaging text reports for a variety of applications, including evaluation of imaging utilization, imaging yield, clinical decision support tools, and automated classification of large corpora for medical imaging deep learning work.
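A minimal PyTorch sketch of a hierarchical attention classifier of the kind described above follows; it uses plain learned word- and sentence-level attention rather than the paper's domain-phrase attention mechanism, and all layer sizes and names are illustrative assumptions.

# Minimal hierarchical attention network sketch (not the published DPA-HNN).
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)

    def forward(self, h):                                     # h: (batch, seq, dim)
        scores = self.context(torch.tanh(self.proj(h)))       # (batch, seq, 1)
        weights = torch.softmax(scores, dim=1)
        return (weights * h).sum(dim=1)                       # (batch, dim)

class HierarchicalClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hid=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.word_rnn = nn.GRU(emb_dim, hid, batch_first=True, bidirectional=True)
        self.word_attn = Attention(2 * hid)
        self.sent_rnn = nn.GRU(2 * hid, hid, batch_first=True, bidirectional=True)
        self.sent_attn = Attention(2 * hid)
        self.out = nn.Linear(2 * hid, n_classes)

    def forward(self, docs):                  # docs: (batch, n_sents, n_words) token ids
        b, s, w = docs.shape
        words = self.emb(docs.reshape(b * s, w))              # word embeddings per sentence
        word_h, _ = self.word_rnn(words)
        sent_vecs = self.word_attn(word_h).view(b, s, -1)     # attended sentence vectors
        sent_h, _ = self.sent_rnn(sent_vecs)
        doc_vec = self.sent_attn(sent_h)                      # attended document vector
        return self.out(doc_vec)                              # logits over PE classes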
Audit logs in electronic health record (EHR) systems capture interactions of providers with clinical data. We determine if machine learning (ML) models trained using audit logs in conjunction with clinical data ("observational supervision") outperform ML models trained using clinical data alone in clinical outcome prediction tasks, and whether they are more robust to temporal distribution shifts in the data.
Using clinical and audit log data from Stanford Healthcare, we trained and evaluated various ML models, including logistic regression, support vector machine (SVM) classifiers, neural networks, random forests, and gradient boosted machines (GBMs), on clinical EHR data, with and without audit logs, for two clinical outcome prediction tasks: major adverse kidney events within 120 days of ICU admission (MAKE-120) in acute kidney injury (AKI) patients and 30-day readmission in acute stroke patients. We further tested the best-performing models using patient data acquired during different time intervals to evaluate the impact of temporal distribution shifts on model performance.
Performance generally improved for all models when trained with clinical EHR data and audit log data compared with those trained with only clinical EHR data, with GBMs tending to have the overall best performance. GBMs trained with clinical EHR data and audit logs outperformed GBMs trained without audit logs in both clinical outcome prediction tasks: AUROC 0.88 (95% CI: 0.85-0.91) vs. 0.79 (95% CI: 0.77-0.81), respectively, for MAKE-120 prediction in AKI patients, and AUROC 0.74 (95% CI: 0.71-0.77) vs. 0.63 (95% CI: 0.62-0.64), respectively, for 30-day readmission prediction in acute stroke patients. The performance of GBM models trained using audit log and clinical data degraded less in later time intervals than that of models trained using only clinical data.
Observational supervision with audit logs improved the performance of ML models trained to predict important clinical outcomes in patients with AKI and acute stroke, and improved robustness to temporal distribution shifts.
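The comparison described above can be sketched as follows (an illustration, not the study's pipeline): train a gradient boosted model on clinical features alone and on clinical plus audit-log features, then compare AUROC. The feature files and label array are hypothetical.

# Illustrative comparison of clinical-only vs. clinical + audit-log features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def evaluate(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    model = GradientBoostingClassifier().fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# Hypothetical inputs: labs/vitals/demographics vs. counts and timing of EHR access events.
X_clinical = np.load("clinical_features.npy")
X_audit = np.load("audit_log_features.npy")
y = np.load("outcome_make120.npy")            # e.g., MAKE-120 label

auc_clinical = evaluate(X_clinical, y)
auc_combined = evaluate(np.hstack([X_clinical, X_audit]), y)
print(f"clinical only: {auc_clinical:.3f}  clinical + audit logs: {auc_combined:.3f}")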
The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations between image features and clinical data, such as survival, and for building predictive models. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans containing 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to GitHub, and (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (hh:mm) using one core and in 1:04 (hh:mm) using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.
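QIFE itself is distributed as MATLAB code, but the object-level parallelization trade-off it exploits can be sketched in Python: each tumor is an independent work item, so feature computation scales across processes at the cost of holding more image volumes in memory at once. The toy features below are illustrative stand-ins, not QIFE components.

# Sketch of object-level (per-tumor) parallel feature computation.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def extract_3d_features(volume, mask):
    # Two toy features standing in for shape/intensity/texture stages.
    voxels = volume[mask > 0]
    return {"volume_voxels": int(mask.sum()), "mean_intensity": float(voxels.mean())}

def compute_features(case):
    tumor_id, volume, mask = case
    return tumor_id, extract_3d_features(volume, mask)

def run_engine(cases, n_workers=4):
    # Each case is processed by a separate worker process; more workers means
    # shorter wall-clock time but more volumes resident in memory simultaneously.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return dict(pool.map(compute_features, cases))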
•Unsupervised method combining neural embedding with semantic dictionary mapping.
•Creates a dense vector representation of unstructured radiology reports.
•Semi-automates report categorization based on diagnosis of pulmonary embolism (PE).
•Lowest generalization error with highest F1 scores.
•Outperformed state-of-the-art rule-based system – PEFinder.
•Can be extended to a different domain with minimal human effort.
We proposed an unsupervised hybrid method, Intelligent Word Embedding (IWE), that combines a neural embedding method with a semantic dictionary mapping technique to create a dense vector representation of unstructured radiology reports. We applied IWE to generate embeddings of chest CT radiology reports from two healthcare organizations and utilized the vector representations to semi-automate report categorization according to clinically relevant categories related to the diagnosis of pulmonary embolism (PE). We benchmarked the performance against a state-of-the-art rule-based tool, PeFinder, and out-of-the-box word2vec. On the Stanford test set, the IWE model achieved an average F1 score of 0.97, whereas PeFinder scored 0.90 and the original word2vec scored 0.94. On the UPMC dataset, the IWE model's average F1 score was 0.94, while PeFinder scored 0.92 and word2vec scored 0.85. The IWE model had the lowest generalization error and the highest F1 scores. Of particular interest, the IWE model (trained on the Stanford dataset) outperformed PeFinder on the UPMC dataset, which was originally used to tailor the PeFinder model.
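A rough sketch of the IWE idea (not the authors' implementation): domain phrases are mapped to canonical dictionary terms before embedding, and each report is represented as the average of its word vectors for downstream classification. The tiny phrase dictionary and reports below are made up for illustration; gensim 4 or later is assumed.

# Illustrative sketch: dictionary mapping + word2vec report embeddings.
import re
import numpy as np
from gensim.models import Word2Vec

PHRASE_DICT = {"pulmonary embolism": "pe_finding", "no evidence of": "negation_phrase"}

def normalize(report):
    text = report.lower()
    for phrase, term in PHRASE_DICT.items():
        text = text.replace(phrase, term)       # semantic dictionary mapping
    return re.sub(r"[^\w\s]", " ", text).split()

reports = ["No evidence of pulmonary embolism.",
           "Acute pulmonary embolism in the right lower lobe."]
tokens = [normalize(r) for r in reports]
w2v = Word2Vec(tokens, vector_size=50, window=5, min_count=1, epochs=50)

def report_vector(words):
    vecs = [w2v.wv[w] for w in words if w in w2v.wv]
    return np.mean(vecs, axis=0)                # dense report representation for a classifier

X = np.vstack([report_vector(t) for t in tokens])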
Technologies leveraging big data, including predictive algorithms and machine learning, are playing an increasingly important role in the delivery of healthcare. However, evidence indicates that such algorithms have the potential to worsen disparities currently intrinsic to the contemporary healthcare system, including racial biases. Blame for these deficiencies has often been placed on the algorithm, but the underlying training data bears greater responsibility for these errors, as biased outputs are inexorably produced by biased inputs. The utility, equity, and generalizability of predictive models depend on population-representative training data with robust feature sets. So while the conventional paradigm of big data is deductive in nature (clinical decision support), a future model harnesses the potential of big data for inductive reasoning. This may be conceptualized as clinical decision questioning, intended to liberate the human predictive process from preconceived lenses in data solicitation and/or interpretation. Efficacy, representativeness, and generalizability are all heightened in this schema. Thus, the possible risks of biased big data arising from the inputs themselves must be acknowledged and addressed. Awareness of data deficiencies, structures for data inclusiveness, strategies for data sanitation, and mechanisms for data correction can help realize the potential of big data for a personalized medicine era. Applied deliberately, these considerations could help mitigate the risk of perpetuating health inequity amidst widespread adoption of novel applications of big data.
•We describe an automatic drusen segmentation method for SD-OCT retinal images.
•We developed a projection method to generate an en face retinal image from SD-OCT images.
•Experimental results demonstrate the effectiveness of our method.
•The qualitative and quantitative drusen evaluation may be clinically useful.
Spectral domain optical coherence tomography (SD-OCT) is a useful tool for the visualization of drusen, a retinal abnormality seen in patients with age-related macular degeneration (AMD); however, objective assessment of drusen is thwarted by the lack of a method to robustly quantify these lesions on serial OCT images. Here, we describe an automatic drusen segmentation method for SD-OCT retinal images, which leverages a priori knowledge of normal retinal morphology and anatomical features. The highly reflective and locally connected pixels located below the retinal nerve fiber layer (RNFL) are used to generate a segmentation of the retinal pigment epithelium (RPE) layer. The observed and expected contours of the RPE layer are obtained by interpolating and fitting the shape of the segmented RPE layer, respectively. The areas located between the interpolated and fitted RPE shapes (which have nonzero area when drusen are present) are marked as drusen. To enhance drusen quantification, we also developed a novel method of retinal projection to generate an en face retinal image based on the RPE extraction, which improves the quality of drusen visualization over the current approach of producing retinal projections from SD-OCT images based on a summed-voxel projection (SVP), and it provides a means of obtaining quantitative features of drusen in the en face projection. Visualization of the segmented drusen is refined through several post-processing steps: drusen detection to eliminate false-positive detections on consecutive slices, drusen refinement on a projection view of the drusen, and drusen smoothing. Experimental evaluation results demonstrate that our method is effective for drusen segmentation. In a preliminary analysis of the potential clinical utility of our methods, quantitative drusen measurements, such as area and volume, can be correlated with drusen progression in non-exudative AMD, suggesting that our approach may produce useful quantitative imaging biomarkers to follow this disease and predict patient outcome.
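The core quantification step can be sketched as follows (simplified, with illustrative parameters rather than the paper's values): given the observed RPE depth per A-scan column, a smooth polynomial fit approximates the expected druse-free contour, and columns where the observed contour is elevated relative to the fit are marked as drusen.

# Simplified sketch of drusen quantification from an RPE depth profile.
import numpy as np

def drusen_columns(rpe_depth, poly_order=4, min_lift_px=3):
    # rpe_depth: observed RPE row index per column (NaN where segmentation failed).
    x = np.arange(len(rpe_depth))
    valid = ~np.isnan(rpe_depth)
    observed = np.interp(x, x[valid], rpe_depth[valid])                    # interpolated contour
    expected = np.polyval(np.polyfit(x[valid], rpe_depth[valid], poly_order), x)  # fitted contour
    lift = expected - observed            # drusen elevate the RPE (smaller depth value)
    drusen_mask = lift > min_lift_px
    drusen_area_px = float(np.sum(lift[drusen_mask]))   # area between the two contours
    return drusen_mask, drusen_area_px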
Age-related macular degeneration (AMD) is the leading cause of blindness among elderly individuals. Geographic atrophy (GA) is a phenotypic manifestation of the advanced stages of non-exudative AMD. Determination of GA extent in SD-OCT scans allows the quantification of GA-related features, such as radius or area, which could be of important value to monitor AMD progression and possibly identify regions of future GA involvement. The purpose of this work is to develop an automated algorithm to segment GA regions in SD-OCT images. An en face GA fundus image is generated by averaging the axial intensity within an automatically detected sub-volume of the three-dimensional SD-OCT data, where an initial coarse GA region is estimated by an iterative threshold segmentation method and an intensity profile set, and subsequently refined by a region-based Chan-Vese model with a local similarity factor. Two image data sets, consisting of 55 SD-OCT scans from twelve eyes in eight patients with GA and 56 SD-OCT scans from 56 eyes in 56 patients with GA, respectively, were utilized to quantitatively evaluate the automated segmentation algorithm. We compared results obtained by the proposed algorithm, manual segmentation by graders, a previously proposed method, and experimental commercial software. When compared with a manually determined gold standard, our algorithm presented a mean overlap ratio (OR) of 81.86% and 70% for the first and second data sets, respectively, while the OR of the previously proposed method was 72.60% and 65.88% for the first and second data sets, respectively, and the OR of the experimental commercial software was 62.40% for the second data set.
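For reference, an overlap-ratio evaluation of the kind reported above can be computed as below; the intersection-over-union definition is one common choice and may differ in detail from the metric used in the paper.

# Overlap ratio between an automated and a manual GA segmentation mask.
import numpy as np

def overlap_ratio(auto_mask, manual_mask):
    auto_mask = auto_mask.astype(bool)
    manual_mask = manual_mask.astype(bool)
    intersection = np.logical_and(auto_mask, manual_mask).sum()
    union = np.logical_or(auto_mask, manual_mask).sum()
    return 100.0 * intersection / union if union else 0.0   # percentage, as reported above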
Despite the relative ease of locating organs in the human body, automated organ segmentation has been hindered by the scarcity of labeled training data. Due to the tedium of labeling organ boundaries, most datasets are limited to either a small number of cases or a single organ. Furthermore, many are restricted to specific imaging conditions unrepresentative of clinical practice. To address this need, we developed a diverse dataset of 140 CT scans containing six organ classes: liver, lungs, bladder, kidney, bones, and brain. For the lungs and bones, we expedited annotation using unsupervised morphological segmentation algorithms, which were accelerated by 3D Fourier transforms. Demonstrating the utility of the data, we trained a deep neural network which requires only 4.3 s to simultaneously segment all the organs in a case. We also show how to efficiently augment the data to improve model generalization, providing a GPU library for doing so. We hope this dataset and code, available through TCIA, will be useful for training and evaluating organ segmentation models.
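A hedged sketch of the FFT-accelerated 3D morphology mentioned above: binary dilation implemented as an FFT convolution with a spherical structuring element followed by thresholding. The radius and the Hounsfield threshold in the usage comment are illustrative assumptions, not the dataset's actual parameters.

# FFT-based 3D binary dilation sketch.
import numpy as np
from scipy.signal import fftconvolve

def spherical_element(radius):
    r = np.arange(-radius, radius + 1)
    zz, yy, xx = np.meshgrid(r, r, r, indexing="ij")
    return (zz**2 + yy**2 + xx**2 <= radius**2).astype(np.float32)

def fft_dilate(mask, radius=3):
    kernel = spherical_element(radius)
    conv = fftconvolve(mask.astype(np.float32), kernel, mode="same")
    return conv > 0.5    # voxel turns on if any neighbor under the kernel was on

# Example (illustrative threshold): clean up a coarse air mask for the lungs.
# lungs = fft_dilate(ct_volume < -320, radius=2)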
Radiological measurements are reported in free-text reports, and it is challenging to extract such measures for treatment planning tasks such as lesion summarization and cancer response assessment. The purpose of this work is to develop and evaluate a natural language processing (NLP) pipeline that can extract measurements and their core descriptors, such as temporality, anatomical entity, imaging observation, RadLex descriptors, series number, image number, and segment, from a wide variety of radiology reports (MR, CT, and mammogram). We created a hybrid NLP pipeline that integrates rule-based feature extraction modules and a conditional random field (CRF) model for extraction of the measurements from the radiology reports and links them with clinically relevant features such as anatomical entities or imaging observations. The pipeline was trained on 1117 CT/MR reports, and performance of the system was evaluated on an independent set of 100 expert-annotated CT/MR reports and also tested on 25 mammography reports. The system detected 813 measurements against the 806 annotated in the CT/MR reports; 784 were true positives, 29 were false positives, and 0 were false negatives. Similarly, from the mammography reports, 96% of the measurements with their modifiers were extracted correctly. Our approach could enable the development of computerized applications that can utilize summarized lesion measurements from radiology reports of varying modalities and improve practice by tracking the same lesions across multiple radiologic encounters.
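As a toy illustration of the measurement-detection step only (without the CRF-based linking to anatomy, temporality, or series/image numbers described above), a regular expression can pull dimension strings and units from report text; the pattern and example sentence are illustrative.

# Toy regex for radiology measurement strings such as "1.2 x 0.8 cm".
import re

MEASUREMENT = re.compile(
    r"(?P<dims>\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?){0,2})\s*(?P<unit>mm|cm)",
    re.IGNORECASE,
)

sentence = "Stable 1.2 x 0.8 cm nodule in the right upper lobe (series 4, image 32)."
for m in MEASUREMENT.finditer(sentence):
    print(m.group("dims"), m.group("unit"))   # prints the dimensions and unit found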