We have implemented the molecular design laboratory's antimicrobial peptides package (modlAMP), a Python-based software package for the design, classification and visual representation of peptide data. modlAMP offers functions for molecular descriptor calculation and the retrieval of amino acid sequences from public or local sequence databases, and provides instant access to precompiled datasets for machine learning. The package also contains methods for the analysis and representation of circular dichroism spectra.
The modlAMP Python package is available under the BSD license from URL http://doi.org/10.5905/ethz-1007-72 or via pip from the Python Package Index (PyPI).
gisbert.schneider@pharma.ethz.ch.
Supplementary data are available at Bioinformatics online.
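To make the idea of sequence-derived molecular descriptors concrete, the following minimal sketch computes three very simple ones in plain Python. It is an illustration only and does not use or reproduce the modlAMP API; the function name `simple_descriptors`, the residue groupings, and the example peptide are assumptions chosen for this sketch.

```python
from collections import Counter

# Side chains treated as charged near neutral pH. This is a simplification
# that ignores histidine, the termini and exact pKa values.
POSITIVE = set("KR")
NEGATIVE = set("DE")
# One commonly used grouping of hydrophobic residues (an assumption here).
HYDROPHOBIC = set("AILMFVWY")


def simple_descriptors(sequence):
    """Return length, approximate net charge and hydrophobic fraction."""
    seq = sequence.upper()
    counts = Counter(seq)
    positive = sum(counts[aa] for aa in POSITIVE)
    negative = sum(counts[aa] for aa in NEGATIVE)
    hydrophobic = sum(counts[aa] for aa in HYDROPHOBIC)
    return {
        "length": len(seq),
        "net_charge": positive - negative,
        "hydrophobic_fraction": hydrophobic / len(seq) if seq else 0.0,
    }


# Hypothetical example peptide, used only for illustration.
example = simple_descriptors("KLAKLAKKLAKLAK")
print(example)
```

Descriptor vectors of this kind are what a downstream classifier would take as input; a dedicated package would normally provide many more descriptors and proper pH-dependent charge calculations.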
Membranolytic anticancer peptides represent a potential strategy in the fight against cancer. However, our understanding of the underlying structure-activity relationships and the mechanisms driving their cell selectivity is still limited. We developed a computational approach as a step towards the rational design of potent and selective anticancer peptides. This machine learning model distinguishes between peptides with and without anticancer activity. This classifier was experimentally validated by synthesizing and testing a selection of 12 computationally generated peptides. In total, 83% of these predictions were correct. We then utilized an evolutionary molecular design algorithm to improve the peptide selectivity for cancer cells. This simulated molecular evolution process led to a five-fold selectivity increase with regard to human dermal microvascular endothelial cells and more than ten-fold improvement towards human erythrocytes. The results of the present study advocate for the applicability of machine learning models and evolutionary algorithms to design and optimize novel synthetic anticancer peptides with reduced hemolytic liability and increased cell-type selectivity.
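The general shape of an evolutionary sequence optimization loop can be sketched as follows. This is a generic elitist mutate-and-select scheme, not the algorithm used in the study; `toy_fitness` is a hypothetical stand-in for an experimentally or computationally derived selectivity score, and the mutation rate, population size and starting sequence are arbitrary assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq):
    """Hypothetical score: fraction of K, R, L and W residues (illustration only)."""
    return sum(seq.count(aa) for aa in "KRLW") / len(seq)


def mutate(seq, rng, rate=0.1):
    """Replace each residue by a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def evolve(parent, fitness, generations=20, offspring=10, seed=0):
    """Keep the best sequence seen so far; accept a child only if it is no worse."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best = parent
    for _ in range(generations):
        children = [mutate(best, rng) for _ in range(offspring)]
        candidate = max(children, key=fitness)
        if fitness(candidate) >= fitness(best):
            best = candidate
    return best


start = "GAGAGAGAGA"  # hypothetical starting peptide
optimized = evolve(start, toy_fitness)
print(start, toy_fitness(start))
print(optimized, toy_fitness(optimized))
```

In a real design campaign the fitness would come from a trained model or from measured activity against target and non-target cells, and each generation would typically involve synthesis and testing rather than a purely in silico score.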
Double negative (DN) (CD19 CD20 CD27 IgD) B cells are expanded in patients with autoimmune and infectious diseases; however, their role in the humoral immune response remains unclear. Using systematic flow cytometric analyses of peripheral blood B cell subsets, we observed an inflated DN B cell population in patients with a variety of active inflammatory conditions: myasthenia gravis, Guillain-Barré syndrome, neuromyelitis optica spectrum disorder, meningitis/encephalitis, and rheumatic disorders. Furthermore, we were able to induce DN B cells in healthy subjects following vaccination against influenza and tick-borne encephalitis virus. Transcriptome analysis revealed a gene expression profile in DN B cells that clustered with naïve B cells, memory B cells, and plasmablasts. Immunoglobulin VH transcriptome sequencing and analysis of recombinant antibodies revealed clonal expansion of DN B cells that were targeted against the vaccine antigen. Our study suggests that DN B cells are expanded in multiple inflammatory neurologic diseases and represent an inducible B cell population that responds to antigenic stimulation, possibly through an extra-follicular maturation pathway.
As technical developments in omics and biomedical imaging increase the throughput of data generation in life sciences, the need for information systems capable of managing heterogeneous digital assets is increasing. This applies in particular to systems supporting the findability, accessibility, interoperability, and reusability (FAIR) principles of scientific data management.
We propose a Service Oriented Architecture approach for integrated management and analysis of multi-omics and biomedical imaging data. Our architecture introduces an image management system into a FAIR-supporting, web-based platform for omics data management. Interoperable metadata models and middleware components implement the required data management operations. The resulting architecture allows for FAIR management of omics and imaging data, facilitating metadata queries from software applications. The applicability of the proposed architecture is demonstrated using two technical proofs of concept and a use case, aimed at molecular plant biology and clinical liver cancer research, which integrate various imaging and omics modalities.
We describe a data management architecture for integrated, FAIR-supporting management of omics and biomedical imaging data, and exemplify its applicability for basic biology research and clinical studies. We anticipate that FAIR data management systems for multi-modal data repositories will play a pivotal role in data-driven research, including studies which leverage advanced machine learning methods, as the joint analysis of omics and imaging data, in conjunction with phenotypic metadata, becomes not only desirable but necessary to derive novel insights into biological processes.
Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool to study the immune state in health and following immune challenges such as infectious diseases, (auto)immune diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow which integrates the Immcantation Framework following BCR and TCR sequencing data analysis best practices. The Immcantation Framework is a comprehensive toolset, which allows the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19 infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets.
Immune checkpoint inhibitors (ICIs) belong to the therapeutic armamentarium in advanced hepatocellular carcinoma (HCC). However, only a minority of patients benefit from immunotherapy. Therefore, we aimed to identify indicators of therapy response. This multicenter analysis included 99 HCC patients. Progression-free (PFS) and overall survival (OS) were studied by Kaplan-Meier analyses for clinical parameters using weighted log-rank testing. Next-generation sequencing (NGS) was performed in a subset of 15 patients. The objective response (OR) rate was 19%, and median OS (mOS) was 16.7 months. Forty-one percent reached a PFS > 6 months; these patients had a significantly longer mOS (32.0 vs. 8.5 months). Child-Pugh (CP) A and B patients showed a mOS of 22.1 and 12.1 months, respectively. Ten of thirty CP-B patients reached PFS > 6 months, including 3 patients with an OR. Tumor mutational burden (TMB) could not predict responders. Of note, antibiotic treatment within 30 days around ICI initiation was associated with significantly shorter mOS (8.5 vs. 17.4 months). Taken together, this study shows favorable outcomes for OS with low AFP, OR, and PFS > 6 months. No specific genetic pattern, including TMB, could identify responders. Antibiotics around treatment initiation were associated with worse outcome, suggesting an influence of the host microbiome on therapy success.
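For readers unfamiliar with Kaplan-Meier analysis, the product-limit estimator behind median survival figures such as those above can be sketched in a few lines. This is a minimal, self-contained illustration with made-up follow-up data, not the analysis code of the study; it does not implement the weighted log-rank test, and the function names are assumptions.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct time with an observed event.

    durations: follow-up time per patient
    events:    1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored patients leave the risk set
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up times in months; 0 marks a censored observation.
months = [2, 4, 4, 6, 8, 10]
observed = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(months, observed)
print(curve)
print(median_survival(curve))
```

Censored patients reduce the number at risk without lowering the survival estimate, which is why Kaplan-Meier curves can differ noticeably from a naive fraction of patients alive at a given time.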
B cells are acknowledged as crucial players in the pathogenesis of multiple sclerosis (MS). Several disease modifying drugs including cladribine have been shown to exert differential effects on peripheral blood B cell subsets. However, little is known regarding functional changes within the peripheral B cell populations. In this study, we obtained a detailed picture of B cell repertoire changes under cladribine treatment on a combined immunoglobulin (Ig) transcriptome and proteome level.
We performed next-generation sequencing of Ig heavy chain (IGH) transcripts and Ig mass spectrometry in cladribine-treated patients with relapsing-remitting multiple sclerosis (n = 8) at baseline and after 6 and 12 months of treatment in order to generate Ig transcriptome and Ig peptide libraries. Ig peptides were overlapped with the corresponding IGH transcriptome in order to analyze B cell clones on a combined transcriptome and proteome level.
The analysis of peripheral blood B cell percentages pointed towards a significant decrease of memory B cells and an increase of naive B cells following cladribine therapy. While basic IGH repertoire parameters (e.g. variable heavy chain family usage and Ig subclasses) were only slightly affected by cladribine treatment, a significantly decreased number of clones and significantly lower diversity in the memory subset were noticeable at 6 months following treatment and were sustained at 12 months. When looking at B-cell clones comprising sequences from the different time-points, clones spanning all three time-points were significantly more frequent than clones including sequences from two time-points. Furthermore, Ig proteome analyses showed that Ig transcriptome specific peptides could mostly be aligned equally to all three time-points, pointing towards a proportion of B-cell clones that are maintained during treatment.
Our findings suggest that peripheral B cell related treatment effects of cladribine tablets might be exerted through a reduction of possibly disease relevant clones in the memory B cell subset without disrupting the overall clonal composition of B cells. Our results might, at least partially, explain the relatively mild side effects regarding infections and the sustained immune response after vaccinations during treatment. However, exact disease driving B cell subsets and their effects remain unknown and should be addressed in future studies.
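Repertoire diversity can be quantified in several ways, and the text above does not state which measure was used. As one common option, the sketch below computes the Shannon entropy of a clone size distribution; the function name and the example clone sizes are assumptions for illustration only.

```python
import math


def shannon_diversity(clone_sizes):
    """Shannon entropy (natural log) of a list of clone sizes.

    Each entry is the number of sequences assigned to one clone.
    """
    total = sum(clone_sizes)
    if total == 0:
        return 0.0
    entropy = 0.0
    for size in clone_sizes:
        if size > 0:
            p = size / total
            entropy -= p * math.log(p)
    return entropy


# Hypothetical repertoires: one even, one dominated by a single clone.
even_repertoire = [25, 25, 25, 25]
skewed_repertoire = [97, 1, 1, 1]
print(shannon_diversity(even_repertoire))
print(shannon_diversity(skewed_repertoire))
```

A repertoire with the same number of clones can therefore have very different diversity values depending on how evenly sequences are spread across those clones, which is why clone counts and diversity are usually reported side by side.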
During the last few decades, the role of B cells has been well established and redefined in neuro-inflammatory diseases, including multiple sclerosis and autoantibody-associated diseases. In particular, B cell maturation and trafficking across the blood–brain barrier (BBB) have recently been deciphered with the development of next-generation sequencing (NGS) approaches, which allow the assessment of representative cerebrospinal fluid (CSF) and peripheral blood B cell repertoires. In this review, we perform literature research focusing on NGS studies that allow further insights into B cell pathophysiology during neuro-inflammation. Besides the analysis of CSF B cells, the parallel assessment of the peripheral blood B cell repertoire provides deep insights into not only the CSF compartment but also B cell trafficking patterns across the BBB. In multiple sclerosis, CSF-specific B cell maturation, in combination with a bidirectional exchange of B cells across the BBB, is consistently detectable. These data suggest that B cells most likely encounter antigen(s) within the CSF and migrate across the BBB, with further maturation also taking place in the periphery. Autoantibody-mediated diseases, such as neuromyelitis optica spectrum disorder and LGI1 / NMDAR encephalitis, also show features of a CSF-specific B cell maturation and clonal connectivity with peripheral blood. In conclusion, these data suggest an intense exchange of B cells across the BBB, possibly feeding autoimmune circuits. Further developments in sequencing technologies will help to dissect the exact pathophysiologic mechanisms of B cells during neuro-inflammation.
Abstract
Motivation
Machine learning has shown extensive growth in recent years and is now routinely applied to sensitive areas. To allow appropriate verification of predictive models before deployment, models must be deterministic. Solely fixing all random seeds is not sufficient for deterministic machine learning, as major machine learning libraries default to the usage of nondeterministic algorithms based on atomic operations.
Results
Various machine learning libraries have released deterministic counterparts to the nondeterministic algorithms. We evaluated the effect of these algorithms on determinism and runtime. Based on these results, we formulated a set of requirements for deterministic machine learning and developed a new software solution, the mlf-core ecosystem, which helps machine learning projects meet and maintain these requirements. We applied mlf-core to develop deterministic models in various biomedical fields including a single-cell autoencoder with TensorFlow, a PyTorch-based U-Net model for liver-tumor segmentation in computed tomography scans, and a liver cancer classifier based on gene expression profiles with XGBoost.
Availability and implementation
The complete data together with the implementations of the mlf-core ecosystem and use case models are available at https://github.com/mlf-core.
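One reason why fixed random seeds alone do not guarantee identical results is that floating-point addition is not associative: if parallel accumulation (for example via atomic operations) adds the same numbers in a different order from run to run, the rounded result can change. The following plain-Python sketch demonstrates only this numerical effect; it is not taken from mlf-core, and the function name and values are assumptions chosen to make the effect visible.

```python
def sum_in_order(values, order):
    """Accumulate `values` in the index order given by `order`."""
    total = 0.0
    for index in order:
        total += values[index]
    return total


# The same three numbers, grouped differently, round to different results.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# The same four numbers, accumulated in two different orders.
values = [1e16, 1.0, -1e16, 1.0]
print(sum_in_order(values, [0, 1, 2, 3]))  # 1.0: the first 1.0 is lost to rounding
print(sum_in_order(values, [0, 2, 1, 3]))  # 2.0: the large terms cancel first
```

Deterministic algorithm variants avoid this by fixing the accumulation order, typically at some runtime cost, which is the trade-off evaluated in the Results paragraph above.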
Abstract
Data analysis tools are continuously changed and improved over time. In order to test how these changes influence the comparability between analyses, the outputs of different workflow options of the nf-core/rnaseq pipeline were compared. Five different pipeline settings (STAR+Salmon, STAR+RSEM, STAR+featureCounts, HISAT2+featureCounts, pseudoaligner Salmon) were run on three datasets (human, Arabidopsis, zebrafish) containing spike-ins of the External RNA Control Consortium (ERCC). Fold change ratios and differential expression of genes and spike-ins were used for comparative analyses of the different tools and version settings of the pipeline. An overlap of 85% for differential gene classification between pipelines could be shown. Genes interpreted with a bias were mostly those present at lower concentration. The number of isoforms and exons per gene were also determinants. Previous pipeline versions using featureCounts showed a higher sensitivity to detect one-isoform genes like ERCC. To ensure data comparability in long-term analysis series, it is advisable either to stay with the pipeline version the series was initialized with or to run both versions during a transition time in order to ensure that the target genes are addressed the same way.
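The two quantities used for the comparison, fold change ratios and the overlap of differential expression calls, can be illustrated with a small sketch. The abstract does not specify how the overlap was computed, so the definition below (fraction of shared genes receiving the same label in two pipeline runs) is an assumption, as are the function names, the pseudocount and the gene labels.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classification_overlap(calls_a, calls_b):
    """Fraction of genes present in both runs that receive the same label."""
    shared = set(calls_a) & set(calls_b)
    if not shared:
        return 0.0
    agreeing = sum(calls_a[gene] == calls_b[gene] for gene in shared)
    return agreeing / len(shared)


# Hypothetical differential expression calls from two pipeline settings.
run_a = {"gene1": "up", "gene2": "down", "gene3": "unchanged", "gene4": "up"}
run_b = {"gene1": "up", "gene2": "down", "gene3": "up", "gene4": "up"}

print(log2_fold_change(7, 1))
print(classification_overlap(run_a, run_b))
```

Low-count genes are the most sensitive to the pseudocount and to small quantification differences, which is consistent with the observation above that biased calls mostly concern genes present at lower concentration.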