Transfer learning (TL) with convolutional neural networks aims to improve performance on a new task by leveraging knowledge from similar tasks learned in advance. It has made a major contribution to medical image analysis, as it overcomes the data scarcity problem and saves time and hardware resources. However, transfer learning has been arbitrarily configured in the majority of studies. This review paper attempts to provide guidance for selecting a model and TL approaches for the medical image classification task.
A total of 425 peer-reviewed articles published in English up until December 31, 2020, were retrieved from two databases, PubMed and Web of Science. Articles were assessed by two independent reviewers, with the aid of a third reviewer in the case of discrepancies. We followed the PRISMA guidelines for the paper selection, and 121 studies were regarded as eligible for the scope of this review. We investigated articles focused on selecting backbone models and TL approaches including feature extractor, feature extractor hybrid, fine-tuning and fine-tuning from scratch.
The majority of studies (n = 57) empirically evaluated multiple models, followed by studies using deep models (n = 33) and shallow models (n = 24). Inception, one of the deep models, was the most frequently employed model in the literature (n = 26). With respect to TL, the majority of studies (n = 46) empirically benchmarked multiple approaches to identify the optimal configuration. The remaining studies applied only a single approach, of which feature extractor (n = 38) and fine-tuning from scratch (n = 27) were the two most favored. Only a few studies applied feature extractor hybrid (n = 7) and fine-tuning (n = 3) with pretrained models.
The investigated studies demonstrated the efficacy of transfer learning despite data scarcity. We encourage data scientists and practitioners to use deep models (e.g., ResNet or Inception) as feature extractors, which can save computational costs and time without degrading predictive power.
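As an illustration of the feature-extractor approach recommended above, the following is a minimal PyTorch/torchvision sketch: the pretrained backbone is frozen and only a new classification head is trained. The two-class setup, learning rate, and optimizer are placeholder assumptions, not taken from any of the reviewed studies.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-50 backbone.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze all backbone parameters so it acts as a fixed feature extractor.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new head for the target
# task; a hypothetical two-class medical image problem is assumed here.
num_features = backbone.fc.in_features  # 2048 for ResNet-50
backbone.fc = nn.Linear(num_features, 2)

# Only the new head is trained; the learning rate is a placeholder.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
```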
In the context of the Medical Informatics Initiative, medical data integration centers (DICs) have implemented complex data flows to transfer routine health care data into research data repositories for secondary use. Data management practices are of importance throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge can lead to validity risks and reduce the confidence and quality of the processed data. The need to implement maintainable data management practices is undisputed, but there is a great lack of clarity about their current status. Our study examines the current data management practices throughout the data life cycle within the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium. We present a framework for the maturity status of data management practices and present recommendations to enable a trustful dissemination and reuse of routine health care data. In this mixed methods study, we conducted semistructured interviews with stakeholders from 10 DICs between July and September 2021. We used a self-designed questionnaire, tailored to the MIRACUM DICs, to collect qualitative and quantitative data. Our study method is compliant with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist. Our study provides insights into the data management practices at the MIRACUM DICs. We identify several traceability issues that can be partially explained by a lack of contextual information within nonharmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete information about the computational environment. Based on the identified shortcomings, we suggest a data management maturity framework to reach more clarity and to help define enhanced data management strategies. The data management maturity framework supports the production and dissemination of accurate and provenance-enriched data for secondary use. Our work serves as a catalyst for the derivation of an overarching data management strategy, upholding data integrity and provenance characteristics as key factors. We envision that this work will lead to the generation of FAIRer, well-maintained health research data of high quality.
The usefulness of the 3D Portable Document Format (PDF) for clinical, educational, and research purposes has recently been shown. However, the lack of a simple tool for converting biomedical data into the model data in the necessary Universal 3D (U3D) file format is a drawback for the broad acceptance of this new technology. A new module for the image processing and rapid prototyping framework MeVisLab not only provides a platform-independent way to create surface meshes from biomedical/DICOM and other data and to export them into U3D; it also lets the user add metadata to these meshes to predefine colors and names that can be processed by PDF authoring software while generating 3D PDF files. Furthermore, the source code of the respective module is available and well documented so that it can easily be modified for one's own purposes.
Data from the electronic medical record comprise numerous structured but uncoded elements, which are not linked to standard terminologies. Reuse of such data for secondary research purposes has gained in importance recently. However, the identification of relevant data elements and the creation of database jobs for extraction, transformation and loading (ETL) are challenging: with current methods such as data warehousing, it is not feasible to efficiently maintain and reuse semantically complex data extraction and transformation routines. We present an ontology-supported approach to overcome this challenge by making use of abstraction: instead of defining ETL procedures at the database level, we use ontologies to organize and describe the medical concepts of both the source system and the target system. Instead of using unique, specifically developed SQL statements or ETL jobs, we define declarative transformation rules within ontologies and illustrate how these constructs can then be used to automatically generate SQL code to perform the desired ETL procedures. This demonstrates how a suitable level of abstraction may not only aid the interpretation of clinical data, but can also foster the reutilization of methods for unlocking it.
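The paper's actual ontologies and rule language are not reproduced in this abstract, but the core idea of generating SQL from declarative transformation rules can be sketched as follows. All table, column, and concept names in this Python example are invented for illustration.

```python
# Hypothetical declarative transformation rules: each rule maps a source
# concept in the clinical system to a target concept in the research
# repository. All names are invented for illustration.
RULES = [
    {
        "target_concept": "SystolicBloodPressure",
        "source_table": "vitals",
        "source_column": "value",
        "condition": "parameter_code = 'SBP'",
    },
    {
        "target_concept": "BodyWeight",
        "source_table": "vitals",
        "source_column": "value",
        "condition": "parameter_code = 'WEIGHT'",
    },
]

def generate_etl_sql(rule: dict, target_table: str = "research_facts") -> str:
    """Translate one declarative rule into an INSERT ... SELECT statement."""
    return (
        f"INSERT INTO {target_table} (patient_id, concept, value)\n"
        f"SELECT patient_id, '{rule['target_concept']}', {rule['source_column']}\n"
        f"FROM {rule['source_table']}\n"
        f"WHERE {rule['condition']};"
    )

for rule in RULES:
    print(generate_etl_sql(rule))
```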
Visual analysis and data delivery in the form of visualizations are of great importance in health care, as such forms of presentation can reduce errors and improve care and can also help provide new insights into long-term disease progression. Information visualization and visual analytics also address the complexity of long-term, time-oriented patient data by reducing inherent complexity and facilitating a focus on underlying and hidden patterns. This review aims to provide an overview of visualization techniques for time-oriented data in health care, supporting the comparison of patients. We systematically collected literature and report on the visualization techniques supporting the comparison of time-based data sets of single patients with those of multiple patients or their cohorts and summarized the use of these techniques. This scoping review used the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist. After all collected articles were screened by 16 reviewers according to the criteria, 6 reviewers extracted the set of variables under investigation. The characteristics of these variables were based on existing taxonomies or identified through open coding. Of the 249 screened articles, we identified 22 (8.8%) that fit all criteria and reviewed them in depth. We collected and synthesized findings from these articles for medical aspects such as medical context, medical objective, and medical data type, as well as for the core investigated aspects of visualization techniques, interaction techniques, and supported tasks. The extracted articles were published between 2003 and 2019 and were mostly situated in clinical research. These systems used a wide range of visualization techniques, most frequently showing changes over time. Timelines and temporal line charts occurred 8 times each, followed by histograms with 7 occurrences and scatterplots with 5 occurrences. We report on the findings quantitatively through visual summarization, as well as qualitatively. The articles under review in general mitigated complexity through visualization and supported diverse medical objectives. We identified 3 distinct patient entities: single patients, multiple patients, and cohorts. Cohorts were typically visualized in condensed form, either through prior data aggregation or through visual summarization, whereas visualization of individual patients often contained finer details. All the systems provided mechanisms for viewing and comparing patient data. However, explicitly comparing a single patient with multiple patients or a cohort was supported only by a few systems. These systems mainly use basic visualization techniques, with some using novel visualizations tailored to a specific task. Overall, we found the visual comparison of measurements between single and multiple patients or cohorts to be underdeveloped, and we argue both for further research in the form of a systematic review and for the usefulness of a design space.
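One of the recurring patterns described above, a cohort shown in condensed (aggregated) form with a single patient overlaid in finer detail, can be sketched in a few lines of matplotlib. The data here are synthetic and the chart is only a schematic stand-in for the reviewed systems.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)

# Synthetic example: one measurement over 24 months for a cohort of
# 50 patients plus a single patient of interest.
months = np.arange(24)
cohort = 5 + 0.05 * rng.normal(0.0, 0.8, size=(50, 24)).cumsum(axis=1)
patient = cohort[0] + rng.normal(0.0, 0.05, size=24)

fig, ax = plt.subplots()
# Condensed cohort view: median line with an interquartile band.
q25, q50, q75 = np.percentile(cohort, [25, 50, 75], axis=0)
ax.fill_between(months, q25, q75, alpha=0.3, label="cohort IQR")
ax.plot(months, q50, linestyle="--", label="cohort median")
# Detailed single-patient view overlaid on the aggregated cohort.
ax.plot(months, patient, marker="o", label="patient")
ax.set_xlabel("month")
ax.set_ylabel("measurement")
ax.legend()
plt.show()
```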
Secondary investigations into digital health records, including electronic patient data from German medical data integration centers (DICs), pave the way for enhanced future patient care. However, only limited information is captured regarding the integrity, traceability, and quality of the (sensitive) data elements. This lack of detail diminishes trust in the validity of the collected data. From a technical standpoint, adhering to the widely accepted FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for data stewardship necessitates enriching data with provenance-related metadata. Provenance offers insights into the readiness of a data element for reuse and serves as a foundation for data governance.
The primary goal of this study is to augment the reusability of clinical routine data within a medical DIC for secondary utilization in clinical research. Our aim is to establish provenance traces that underpin the status of data integrity, reliability, and, consequently, trust in electronic health records, thereby enhancing the accountability of the medical DIC. We present the implementation of a proof-of-concept provenance library integrating international standards as an initial step.
We adhered to a customized road map for a provenance framework and examined the data integration steps across the ETL (extract, transform, and load) phases. Following a maturity model, we derived requirements for a provenance library. Using this research approach, we formulated a provenance model with associated metadata and implemented a proof-of-concept provenance class. Furthermore, we seamlessly incorporated the internationally recognized World Wide Web Consortium (W3C) provenance standard, aligned the resultant provenance records with the interoperable health care standard Fast Healthcare Interoperability Resources (FHIR), and presented them in various representation formats. Ultimately, we conducted a thorough assessment of provenance trace measurements.
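The study's own provenance library is not shown here; as a rough orientation, a minimal sketch of recording a W3C PROV trace for one ETL step, using the Python `prov` package, might look as follows. The namespace and all identifiers are invented, and the alignment with FHIR Provenance resources is omitted.

```python
from prov.model import ProvDocument

# Minimal W3C PROV document for one ETL step; the namespace and all
# identifiers are invented and do not reflect the study's library.
doc = ProvDocument()
doc.add_namespace("dic", "https://example.org/dic/")

raw = doc.entity("dic:raw-lab-record-42")       # source data element
fhir = doc.entity("dic:fhir-observation-42")    # transformed data element
etl = doc.activity("dic:etl-run-2021-09-01")
pipeline = doc.agent("dic:etl-pipeline-v1")

doc.wasAssociatedWith(etl, pipeline)
doc.used(etl, raw)
doc.wasGeneratedBy(fhir, etl)
doc.wasDerivedFrom(fhir, raw)

# Serialize as PROV-JSON; PROV-N, PROV-XML, and RDF are also supported.
print(doc.serialize(indent=2))
```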
This study marks the inaugural implementation of integrated provenance traces at the data element level within a German medical DIC. We devised and executed a practical method that combines robust, quality-guided and health-standard-guided (meta)data management practices. Our measurements indicate commendable pipeline execution times, attaining notable levels of accuracy and reliability in processing clinical routine data, thereby ensuring accountability in the medical DIC. These findings should inspire the development of additional tools aimed at providing evidence-based and reliable electronic health record services for secondary use.
The research method outlined for the proof-of-concept provenance class has been crafted to promote effective and reliable core data management practices. It aims to enhance biomedical data by imbuing it with meaningful provenance, thereby bolstering the benefits for both research and society. Additionally, it facilitates the streamlined reuse of biomedical data. As a result, the system mitigates risks, as data analysis without knowledge of the origin and quality of all data elements is rendered futile. While the approach was initially developed for the medical DIC use case, these principles can be universally applied throughout the scientific domain.
The purpose of this feasibility study is to investigate whether latent diffusion models (LDMs) are capable of generating contrast-enhanced (CE) MRI-derived subtraction maximum intensity projections (MIPs) of the breast that are conditioned on lesions. We trained an LDM with n = 2832 CE-MIPs of breast MRI examinations of n = 1966 patients (median age: 50 years) acquired between the years 2015 and 2020. The LDM was subsequently conditioned with n = 756 segmented lesions from n = 407 examinations, indicating their location and BI-RADS scores. By applying the LDM, synthetic images were generated from the segmentations of an independent validation dataset. Lesions, anatomical correctness, and realistic impression of synthetic and real MIP images were further assessed in a multi-rater study with five independent raters, each evaluating n = 204 MIPs (50% real/50% synthetic images). The detection of synthetic MIPs by the raters was akin to random guessing, with an AUC of 0.58. Interrater reliability of the lesion assessment was high for both real (Kendall's W = 0.77) and synthetic images (W = 0.85). A higher AUC was observed for the detection of suspicious lesions (BI-RADS ≥ 4) in synthetic MIPs (0.88 vs. 0.77; p = 0.051). Our results show that LDMs can generate lesion-conditioned MRI-derived CE subtraction MIPs of the breast; however, they also indicate that the LDM tended to generate rather typical or 'textbook representations' of lesions.
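The study's LDM is not described at the code level; as a rough orientation, the following is a toy-scale sketch of conditional reverse diffusion in the DDPM style, where a lesion segmentation mask enters as the conditioning input. The network is a placeholder and, unlike a true latent diffusion model, this sketch runs directly in image space.

```python
import torch

# Toy-scale DDPM-style reverse diffusion conditioned on a lesion mask.
T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class DummyEpsModel(torch.nn.Module):
    """Stand-in for a trained U-Net predicting noise from (x_t, t, cond)."""
    def forward(self, x, t, cond):
        return torch.zeros_like(x)  # placeholder; a real model is trained

model = DummyEpsModel()
cond = torch.zeros(1, 1, 64, 64)  # lesion segmentation mask (placeholder)
x = torch.randn(1, 1, 64, 64)     # start from pure Gaussian noise

for t in reversed(range(T)):
    eps = model(x, t, cond)  # the condition enters through the network
    coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
    mean = (x - coef * eps) / torch.sqrt(alphas[t])
    noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
    x = mean + torch.sqrt(betas[t]) * noise
# x now approximates a synthetic image consistent with the condition.
```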
The secondary use of electronic health records (EHRs) promises to facilitate medical research. We reviewed general data requirements in observational studies and analyzed the feasibility of conducting observational studies with structured EHR data, in particular diagnosis and procedure codes.
After reviewing published observational studies from the University Hospital of Erlangen for general data requirements, we identified three different study populations for the feasibility analysis with eligibility criteria from three exemplary observational studies. For each study population, we evaluated the availability of relevant patient characteristics in our EHR, including outcome and exposure variables. To assess data quality, we computed distributions of relevant patient characteristics from the available structured EHR data and compared them to those of the original studies. We implemented computed phenotypes for patient characteristics where necessary. In random samples, we evaluated how well structured patient characteristics agreed with a gold standard from manually interpreted free texts. We categorized our findings using the four data quality dimensions "completeness", "correctness", "currency" and "granularity".
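As a minimal illustration of a computed phenotype derived from structured diagnosis codes, consider the following pandas sketch. The ICD-10 codes and the phenotype rule are invented for illustration and are not the computed phenotypes used in the study.

```python
import pandas as pd

# Toy structured EHR extract; the ICD-10 codes and the phenotype rule
# are invented for illustration, not taken from the study.
diagnoses = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3],
    "icd10_code": ["E11.9", "I10", "I10", "E11.9", "N18.3"],
})

# Computed phenotype: patients with both a diabetes code (E11*) and a
# hypertension code (I10) documented at least once.
has_dm = diagnoses.loc[diagnoses["icd10_code"].str.startswith("E11"), "patient_id"]
has_htn = diagnoses.loc[diagnoses["icd10_code"] == "I10", "patient_id"]
phenotype = set(has_dm) & set(has_htn)
print(sorted(phenotype))  # -> [1]
```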
Reviewing general data requirements, we found that some investigators supplement routine data with questionnaires, interviews and follow-up examinations. We included 847 subjects in the feasibility analysis (Study 1 n = 411, Study 2 n = 423, Study 3 n = 13). All eligibility criteria from two studies were available in structured data, while one study required computed phenotypes in eligibility criteria. In one study, we found that all necessary patient characteristics were documented at least once in either structured or unstructured data. In another study, all exposure and outcome variables were available in structured data, while in the remaining study unstructured data had to be consulted. The comparison of patient characteristic distributions, as computed from structured data, with those from the original study yielded similar distributions as well as indications of underreporting. We observed violations in all four data quality dimensions.
While we found relevant patient characteristics available in structured EHR data, data quality problems may mean that it remains a case-by-case decision whether diagnosis and procedure codes are sufficient to underpin observational studies. Free-text data or subsequently collected supplementary study data may be important to complement a comprehensive patient history.
Prehabilitation has shown its potential for most intra-cavity surgery patients in enhancing preoperative functional capacity and postoperative outcomes. However, its large-scale implementation is limited by several constraints, such as: i) unsolved practicalities of the service workflow; ii) challenges associated with change management in collaborative care; iii) insufficient access to prehabilitation; iv) a relevant percentage of program drop-outs; v) the need for program personalization; and vi) economic sustainability. Transferability of prehabilitation programs from the hospital setting to the community would potentially provide a new scenario with greater accessibility, as well as offer an opportunity to effectively address the aforementioned issues and, thus, optimize healthcare value generation. A core aspect to take into account for optimal management of prehabilitation programs is the use of proper technological tools enabling: i) customizable and interoperable integrated care pathways facilitating personalization of the service and effective engagement among stakeholders; ii) remote monitoring (i.e. of physical activity, physiological signs, and patient-reported outcome and experience measures) to support patient adherence to the program and empowerment for self-management; and iii) use of health risk assessment to support decision making for personalized service selection. The current manuscript details a proposal to bring digital innovation to community-based prehabilitation programs. Moreover, this approach has the potential to be adopted by programs supporting long-term management of cancer patients and chronic patients and the prevention of multimorbidity in subjects at risk.
Data capture is one of the most expensive phases in the conduct of a clinical trial, and the increasing use of electronic health records (EHR) offers significant savings to clinical research. To facilitate these secondary uses of routinely collected patient data, it is beneficial to know what data elements are captured in clinical trials. Therefore, our aim is to determine the most commonly used data elements in clinical trials and their availability in hospital EHR systems.
Case report forms for 23 clinical trials in differing disease areas were analyzed. Through an iterative and consensus-based process involving medical informatics professionals from academia and trial experts from the European pharmaceutical industry, data elements were compiled for all disease areas, with special focus on the reporting of adverse events. Afterwards, data elements were identified and statistics acquired from hospital sites providing data to the EHR4CR project.
The analysis identified 133 unique data elements. Fifty elements were congruent with a published data inventory for patient recruitment, and 83 new elements were identified for clinical trial execution, including adverse event reporting. Demographic and laboratory elements lead the list of available elements in hospital EHR systems. For the reporting of serious adverse events, only a few elements could be identified in the patient records.
Common data elements in clinical trials have been identified and their availability in hospital systems elucidated. Several elements, often those related to reimbursement, are frequently available whereas more specialized elements are ranked at the bottom of the data inventory list. Hospitals that want to obtain the benefits of reusing data for research from their EHR are now able to prioritize their efforts based on this common data element list.