The increased availability and usage of modern medical imaging induced a strong need for automatic medical image segmentation. Still, current image segmentation platforms do not provide the required functionalities for plain setup of medical image segmentation pipelines. Already implemented pipelines are commonly standalone software, optimized on a specific public data set. Therefore, this paper introduces the open-source Python library MIScnn.
The aim of MIScnn is to provide an intuitive API allowing fast building of medical image segmentation pipelines including data I/O, preprocessing, data augmentation, patch-wise analysis, metrics, a library with state-of-the-art deep learning models and model utilization like training, prediction, as well as fully automatic evaluation (e.g. cross-validation). Similarly, high configurability and multiple open interfaces allow full pipeline customization.
Running a cross-validation with MIScnn on the Kidney Tumor Segmentation Challenge 2019 data set (multi-class semantic segmentation with 300 CT scans) resulted in a powerful predictor based on the standard 3D U-Net model.
With this experiment, we could show that the MIScnn framework enables researchers to rapidly set up a complete medical image segmentation pipeline by using just a few lines of code. The source code for MIScnn is available in the Git repository: https://github.com/frankkramer-lab/MIScnn .
In the last decade, research on artificial intelligence has seen rapid growth with deep learning models, especially in the field of medical image segmentation. Various studies demonstrated that these models have powerful prediction capabilities and achieved results similar to those of clinicians. However, recent studies revealed that the evaluation in image segmentation studies lacks reliable model performance assessment and showed statistical bias caused by incorrect metric implementation or usage. Thus, this work provides an overview and interpretation guide on the following metrics for medical image segmentation evaluation in binary as well as multi-class problems: Dice similarity coefficient, Jaccard, Sensitivity, Specificity, Rand index, ROC curves, Cohen's Kappa, and Hausdorff distance. Furthermore, common issues like class imbalance and statistical as well as interpretation biases in evaluation are discussed. As a summary, we propose a guideline for standardized medical image segmentation evaluation to improve evaluation quality, reproducibility, and comparability in the research field.
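As a minimal sketch of four of the metrics listed above, the following Python code computes the Dice similarity coefficient, Jaccard index, Sensitivity, and Specificity from the confusion counts of two flattened binary masks. The function names and the toy masks are illustrative assumptions and are not taken from the paper.

```python
def confusion_counts(truth, pred):
    """Count TP, FP, TN, FN for two equally long binary (0/1) sequences."""
    if len(truth) != len(pred):
        raise ValueError("masks must contain the same number of pixels")
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def segmentation_metrics(truth, pred):
    """Return overlap and rate metrics for one binary segmentation."""
    tp, fp, tn, fn = confusion_counts(truth, pred)

    def ratio(numerator, denominator):
        # An undefined metric (empty denominator) is reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical flattened masks: 1 = region of interest, 0 = background.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = segmentation_metrics(truth, pred)
```

Specificity depends on the true negatives, so it grows with the amount of background in the image, whereas Dice and Jaccard ignore true negatives. This is one way the class imbalance issue mentioned above shows up in practice.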
Networks are a common methodology used to capture increasingly complex associations between biological entities. They serve as a resource of biological knowledge for bioinformatics analyses, and also comprise the subsequent results. However, the interpretation of biological networks is challenging and requires suitable visualizations dependent on the contained information. The most prominent software in the field for the visualization of biological networks is Cytoscape, a desktop modeling environment also including many features for analysis. A further challenge when working with networks is their distribution. Within a typical collaborative workflow, even slight changes of the network data force one to repeat the visualization step as well. Likewise, even minor adjustments to the visual representation require the networks to be transferred back and forth. Collaboration on the same resources requires specific infrastructure to avoid redundancies, or worse, the corruption of the data. A well-established solution is provided by the NDEx platform where users can upload a network, share it with selected colleagues or make it publicly available. NDExEdit is a web-based application where simple changes can be made to biological networks within the browser, and which does not require installation. With our tool, plain networks can be enhanced easily for further usage in presentations and publications. Since the network data is only stored locally within the web browser, users can edit their private networks without concerns of unintentional publication. The web tool is designed to conform to the Cytoscape Exchange (CX) format as a data model, which is used for the data transmission by both tools, Cytoscape and NDEx. Therefore, the modified network can be directly exported to the NDEx platform or saved as a compatible CX file, in addition to standard image formats like PNG and JPEG.
The coronavirus disease 2019 (COVID-19) affects billions of lives around the world and has a significant impact on public healthcare. For quantitative assessment and disease monitoring, medical imaging like computed tomography offers great potential as an alternative to RT-PCR methods. For this reason, automated image segmentation is highly desired as clinical decision support. However, publicly available COVID-19 imaging data is limited, which leads to overfitting of traditional approaches.
To address this problem, we propose an innovative automated segmentation pipeline for COVID-19 infected regions, which is able to handle small datasets by utilizing them as variant databases. Our method focuses on on-the-fly generation of unique and random image patches for training by performing several preprocessing methods and exploiting extensive data augmentation. For further reduction of the overfitting risk, we implemented a standard 3D U-Net architecture instead of new or computationally complex neural network architectures.
Through a k-fold cross-validation on 20 COVID-19 CT scans used for training and validation, we were able to develop a highly accurate as well as robust segmentation model for lungs and COVID-19 infected regions without overfitting on limited data. We performed a detailed analysis and discussion of the robustness of our pipeline through a sensitivity analysis based on the cross-validation and of the impact of the applied preprocessing techniques on model generalizability. Our method achieved Dice similarity coefficients for COVID-19 infection between predicted and annotated segmentation from radiologists of 0.804 on validation and 0.661 on a separate testing set consisting of 100 patients.
We demonstrated that the proposed method outperforms related approaches, advances the state-of-the-art for COVID-19 segmentation and improves robust medical image analysis based on limited data.
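The k-fold cross-validation over 20 CT scans described above can be sketched as follows. This is only an illustration of the splitting step: the abstract does not state the value of k, so k = 5, the fixed seed, and the function name `k_fold_splits` are assumptions made for this example.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so that the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = sorted(folds[i])
        training = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield training, validation


# 20 samples as in the abstract; k = 5 is an assumed value.
splits = list(k_fold_splits(n_samples=20, k=5))
```

Each sample is used for validation exactly once and for training in the remaining folds, which is what allows a performance estimate on every scan despite the small data set.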
Data mining in the field of medical data analysis often needs to rely solely on the processing of unstructured data to retrieve relevant data. For German natural language processing, few open medical neural named entity recognition (NER) models have been published before this work. A major issue can be attributed to the lack of German training data.
We developed a synthetic data set and a novel German medical NER model for public access to demonstrate the feasibility of our approach. In order to bypass legal restrictions due to potential data leaks through model analysis, we did not make use of internal, proprietary data sets, which is a frequent veto factor for data set publication.
The underlying German data set was retrieved by translation and word alignment of a public English data set. The data set served as a foundation for model training and evaluation. For demonstration purposes, our NER model follows a simple network architecture that is designed for low computational requirements.
The obtained data set consisted of 8599 sentences including 30,233 annotations. The model achieved a class frequency-averaged F1 score of 0.82 on the test set after training across 7 different NER types. Artifacts in the synthesized data set with regard to translation and alignment induced by the proposed method were exposed. The annotation performance was evaluated on an external data set and measured in comparison with an existing baseline model that has been trained on a dedicated German data set in a traditional fashion. We discussed the drop in annotation performance on an external data set for our simple NER model. Our model is publicly available.
We demonstrated the feasibility of obtaining a data set and training a German medical NER model by the exclusive use of public training data through our suggested method. The discussion on the limitations of our approach includes ways to further mitigate remaining problems in future work.
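To illustrate what a class frequency-averaged F1 score means, the sketch below weights each class's F1 score by the share of true labels belonging to that class. Interpreting "class frequency-averaged" as support weighting is an assumption, and the entity labels and predictions are hypothetical toy data, not results from the paper.

```python
from collections import Counter


def frequency_weighted_f1(true_labels, pred_labels):
    """Average per-class F1 scores, weighted by each class's true frequency."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    support = Counter(true_labels)
    total = len(true_labels)
    pairs = list(zip(true_labels, pred_labels))
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        # The denominator is positive because every class here has count >= 1.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (count / total) * f1
    return score


# Hypothetical entity labels for four annotated tokens.
true_labels = ["Drug", "Drug", "Drug", "Dose"]
pred_labels = ["Drug", "Drug", "Dose", "Dose"]
score = frequency_weighted_f1(true_labels, pred_labels)
```

With this weighting, frequent entity types dominate the final score, so a model can obtain a high value while still performing poorly on rare types.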
Novel and high-performance medical image classification pipelines are heavily utilizing ensemble learning strategies. The idea of ensemble learning is to assemble diverse models or multiple predictions and, thus, boost prediction performance. However, it is still an open question to what extent as well as which ensemble learning strategies are beneficial in deep learning based medical image classification pipelines. In this work, we proposed a reproducible medical image classification pipeline for analyzing the performance impact of the following ensemble learning techniques: Augmenting, Stacking, and Bagging. The pipeline consists of state-of-the-art preprocessing and image augmentation methods as well as 9 deep convolutional neural network architectures. It was applied to four popular medical imaging datasets with varying complexity. Furthermore, 12 pooling functions for combining multiple predictions were analyzed, ranging from simple statistical functions like unweighted averaging up to more complex learning-based functions like support vector machines. Our results revealed that Stacking achieved the largest performance gain of up to 13% F1-score increase. Augmenting showed consistent improvement capabilities by up to 4% and is also applicable to single model based pipelines. Cross-validation based Bagging demonstrated significant performance gain close to Stacking, which resulted in an F1-score increase of up to 11%. Furthermore, we demonstrated that simple statistical pooling functions are equal or often even better than more complex pooling functions. We concluded that the integration of ensemble learning techniques is a powerful method for any medical image classification pipeline to improve robustness and boost performance.
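Two simple statistical pooling functions of the kind mentioned above can be sketched as follows: unweighted averaging of class probabilities and majority voting over each model's predicted class. The function names, the tie-breaking rule, and the example probabilities are assumptions for illustration and do not reproduce the paper's implementation.

```python
def mean_pooling(predictions):
    """Average the class probabilities of several models for one sample."""
    n_models = len(predictions)
    n_classes = len(predictions[0])
    return [sum(p[c] for p in predictions) / n_models for c in range(n_classes)]


def argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=lambda i: values[i])


def majority_vote(predictions):
    """Return the class predicted by most models; ties go to the lowest class."""
    votes = [argmax(p) for p in predictions]
    return max(sorted(set(votes)), key=votes.count)


# Hypothetical softmax outputs of three models for one image, three classes.
predictions = [
    [0.9, 0.1, 0.0],
    [0.4, 0.6, 0.0],
    [0.3, 0.7, 0.0],
]
pooled = mean_pooling(predictions)
mean_class = argmax(pooled)
voted_class = majority_vote(predictions)
```

In this toy case the two functions disagree: averaging favors class 0 because one model is very confident, while majority voting selects class 1. This shows why the choice of pooling function can change the ensemble's prediction.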
Contemporary deep learning approaches show cutting-edge performance in a variety of complex prediction tasks. Nonetheless, the application of deep learning in healthcare remains limited since deep learning methods are often considered as non-interpretable black-box models. However, the machine learning community has recently developed interpretability methods that explain data point-specific decisions of deep learning techniques. We believe that such explanations can support personalized precision medicine decisions by explaining patient-specific predictions.
Layer-wise Relevance Propagation (LRP) is a technique to explain decisions of deep learning methods. It is widely used to interpret Convolutional Neural Networks (CNNs) applied on image data. Recently, CNNs started to extend towards non-Euclidean domains like graphs. Molecular networks are commonly represented as graphs detailing interactions between molecules. Gene expression data can be assigned to the vertices of these graphs. In other words, gene expression data can be structured by utilizing molecular network information as prior knowledge. Graph-CNNs can be applied to structured gene expression data, for example, to predict metastatic events in breast cancer. Therefore, there is a need for explanations showing which part of a molecular network is relevant for predicting an event, e.g., distant metastasis in cancer, for each individual patient.
We extended the procedure of LRP to make it available for Graph-CNN and tested its applicability on a large breast cancer dataset. We present Graph Layer-wise Relevance Propagation (GLRP) as a new method to explain the decisions made by Graph-CNNs. We demonstrate a sanity check of the developed GLRP on a hand-written digits dataset and then apply the method on gene expression data. We show that GLRP provides patient-specific molecular subnetworks that largely agree with clinical knowledge and identify common as well as novel, and potentially druggable, drivers of tumor progression.
The developed method could potentially be highly useful for interpreting classification results in the context of different omics data and prior knowledge molecular networks on the individual patient level, as for example in precision medicine approaches or a molecular tumor board.
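For readers unfamiliar with LRP, the following sketch shows one commonly used propagation rule, the epsilon rule, for a single fully connected layer without bias. It is not the GLRP method of the paper, which extends LRP to graph convolutional layers; the function name, the weights, and the relevance values are hypothetical.

```python
def lrp_epsilon_dense(activations, weights, relevance_out, eps=1e-9):
    """Redistribute output relevance to the inputs of one dense layer.

    activations:   input activations a_i
    weights:       weights[i][j] connects input i to output j (no bias term)
    relevance_out: relevance R_j assigned to each output neuron
    """
    n_in, n_out = len(activations), len(relevance_out)
    # Pre-activations z_j = sum_i a_i * w_ij.
    z = [
        sum(activations[i] * weights[i][j] for i in range(n_in))
        for j in range(n_out)
    ]
    relevance_in = []
    for i in range(n_in):
        r = 0.0
        for j in range(n_out):
            # The stabilizer keeps the sign of z_j and avoids division by zero.
            stabilizer = eps if z[j] >= 0 else -eps
            contribution = activations[i] * weights[i][j]
            r += contribution / (z[j] + stabilizer) * relevance_out[j]
        relevance_in.append(r)
    return relevance_in


# Hypothetical layer with two inputs and two outputs.
activations = [1.0, 2.0]
weights = [[0.5, -1.0], [0.25, 1.0]]
relevance_out = [1.0, 2.0]
relevance_in = lrp_epsilon_dense(activations, weights, relevance_out)
```

Each input receives relevance in proportion to its contribution to every output's pre-activation, so for a very small `eps` the total relevance is approximately conserved from one layer to the next.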
Biological pathway data integration has become a topic of interest in the past years. This interest originates essentially from the continuously increasing size of existing prior knowledge as well as from the many challenges scientists face when studying biological pathways. Multipath is a framework that aims at helping re-trace the use of specific pathway knowledge in specific publications, and easing the data integration of multiple pathway types and further influencing knowledge sources. Multipath thus helps scientists to increase the reproducibility of their code and analysis by allowing the integration of numerous data sources and documentation of their integration steps while doing so. In this paper, we present the package Multipath, and we describe how it can be used for data integration and tracking pathway modifications. We present a multilayer model built from the Wnt Pathway as a demonstration.
The modelling of complex biological networks such as pathways has been a necessity for scientists over the last decades. The study of these networks also imposes a need to investigate different aspects of nodes or edges within the networks, or other biomedical knowledge related to them. Our aim is to provide a generic modelling framework to integrate multiple pathway types and further knowledge sources influencing these networks. This framework is defined by a multi-layered model allowing automatic network transformations and documentation. By providing a tool that generates this model, we aim to facilitate the data integration, boost the reproducibility and increase the interoperability between different sources and databases in the field of pathways. We present an R package that allows the user to create, modify and visualize multilayered graphs. The package is implemented with features to specifically handle multilayered graphs.
The estrogen receptor-α (ERα) determines the phenotype of breast cancers where it serves as a positive prognostic indicator. ERα is a well-established target for breast cancer therapy, but strategies to target its function remain of interest to address therapeutic resistance and further improve treatment. Recent findings indicate that proteasome inhibition can regulate estrogen-induced transcription, but how ERα function might be regulated was uncertain. In this study, we investigated the transcriptome-wide effects of the proteasome inhibitor bortezomib on estrogen-regulated transcription in MCF7 human breast cancer cells and showed that bortezomib caused a specific global decrease in estrogen-induced gene expression. This effect was specific because gene expression induced by the glucocorticoid receptor was unaffected by bortezomib. Surprisingly, we observed no changes in ERα recruitment or assembly of its transcriptional activation complex on ERα target genes. Instead, we found that proteasome inhibition caused a global decrease in histone H2B monoubiquitination (H2Bub1), leading to transcriptional elongation defects on estrogen target genes and to decreased chromatin dynamics overall. In confirming the functional significance of this link, we showed that RNA interference-mediated knockdown of the H2B ubiquitin ligase RNF40 decreased ERα-induced gene transcription. Surprisingly, RNF40 knockdown also supported estrogen-independent cell proliferation and activation of cell survival signaling pathways. Most importantly, we found that H2Bub1 levels decrease during tumor progression. H2Bub1 was abundant in normal mammary epithelium and benign breast tumors but absent in most malignant and metastatic breast cancers. Taken together, our findings show how ERα activity is blunted by bortezomib treatment as a result of reducing the downstream ubiquitin-dependent function of H2Bub1.
In supporting a tumor suppressor role for H2Bub1 in breast cancer, our findings offer a rational basis to pursue H2Bub1-based therapies for future management of breast cancer.