Abstract
Major histocompatibility complex (MHC) molecules are expressed on the cell surface, where they present peptides to T cells, which gives them a key role in the development of T-cell immune responses. MHC molecules come in two main variants: MHC Class I (MHC-I) and MHC Class II (MHC-II). MHC-I predominantly presents peptides derived from intracellular proteins, whereas MHC-II predominantly presents peptides from extracellular proteins. In both cases, the binding between MHC and antigenic peptides is the most selective step in the antigen presentation pathway. Therefore, the prediction of peptide binding to MHC is a powerful tool for predicting the possible specificity of a T-cell immune response. Commonly, MHC binding prediction tools are trained on either binding affinity data or mass spectrometry-eluted ligands. Recent studies have, however, demonstrated that integrating both data types can boost predictive performance. Inspired by this, we here present NetMHCpan-4.1 and NetMHCIIpan-4.0, two web servers created to predict binding between peptides and MHC-I and MHC-II, respectively. Both methods exploit tailored machine learning strategies to integrate different training data types, resulting in state-of-the-art performance and outperforming their competitors. The servers are available at http://www.cbs.dtu.dk/services/NetMHCpan-4.1/ and http://www.cbs.dtu.dk/services/NetMHCIIpan-4.0/.
Antibodies have become an indispensable tool for many biotechnological and clinical applications. They bind their molecular target (antigen) by recognizing a portion of its structure (epitope) in a highly specific manner. The ability to predict epitopes from antigen sequences alone is a complex task. Despite substantial effort, limited advancement has been achieved over the last decade in the accuracy of epitope prediction methods, especially for those that rely on the sequence of the antigen only. Here, we present BepiPred-2.0 (http://www.cbs.dtu.dk/services/BepiPred/), a web server for predicting B-cell epitopes from antigen sequences. BepiPred-2.0 is based on a random forest algorithm trained on epitopes annotated from antibody-antigen protein structures. This new method was found to outperform other available tools for sequence-based epitope prediction both on epitope data derived from solved 3D structures, and on a large collection of linear epitopes downloaded from the IEDB database. The method displays results in a user-friendly and informative way, both for computer-savvy and non-expert users. We believe that BepiPred-2.0 will be a valuable tool for the bioinformatics and immunology community.
Biological interpretation of gene/protein lists resulting from -omics experiments can be a complex task. A common approach consists of reviewing Gene Ontology (GO) annotations for entries in such lists and searching for enrichment patterns. Unfortunately, there is a gap between machine-readable output of GO software and its human-interpretable form. This gap can be bridged by allowing users to simultaneously visualize and interact with term-term and gene-term relationships.
We created the open-source GOnet web application (available at http://tools.dice-database.org/GOnet/), which takes a list of gene or protein entries from human or mouse data and performs GO term annotation analysis (mapping of provided entries to GO subsets) or GO term enrichment analysis (scanning for GO categories overrepresented in the input list). The application is capable of producing parsable data formats and, importantly, interactive visualizations of the GO analysis results. The interactive results allow exploration of genes and GO terms as a graph that depicts the natural hierarchy of the terms and retains relationships between terms and genes/proteins. As a result, GOnet provides insight into the functional interconnection of the submitted entries.
The application can be used for GO analysis of any biological data sources resulting in gene/protein lists. It can be helpful for experimentalists as well as computational biologists working on biological interpretation of -omics data resulting in such lists.
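As an illustration of what "scanning for GO categories overrepresented in the input list" can mean in practice, the sketch below computes an upper-tail hypergeometric p-value for a single GO term. This is a minimal sketch under stated assumptions: the abstract does not say which statistical test GOnet uses, and the function name and parameters are hypothetical, chosen only for this example.

```python
from math import comb


def enrichment_p_value(population_size, annotated_in_population,
                       list_size, annotated_in_list):
    """Upper-tail hypergeometric probability P(X >= annotated_in_list).

    Assumption (not from the source): enrichment of one GO term is scored
    by asking how likely it is to draw at least `annotated_in_list`
    annotated genes when sampling `list_size` genes without replacement
    from a background of `population_size` genes, of which
    `annotated_in_population` carry the term.
    """
    max_hits = min(annotated_in_population, list_size)
    not_annotated = population_size - annotated_in_population
    total = comb(population_size, list_size)
    tail = sum(
        comb(annotated_in_population, k) * comb(not_annotated, list_size - k)
        for k in range(annotated_in_list, max_hits + 1)
    )
    return tail / total


# Toy numbers, purely illustrative: 20 background genes, 5 annotated with
# the term, and a submitted list of 5 genes that are all annotated.
p = enrichment_p_value(20, 5, 5, 5)
```

A small p-value suggests the term is overrepresented in the submitted list; in a real analysis the p-values for all tested terms would also need multiple-testing correction.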
Effective countermeasures against the recent emergence and rapid expansion of the 2019 novel coronavirus (SARS-CoV-2) require the development of data and tools to understand and monitor its spread and immune responses to it. However, little information is available about the targets of immune responses to SARS-CoV-2. We used the Immune Epitope Database and Analysis Resource (IEDB) to catalog available data related to other coronaviruses. This includes SARS-CoV, which has high sequence similarity to SARS-CoV-2 and is the best-characterized coronavirus in terms of epitope responses. We identified multiple specific regions in SARS-CoV-2 that have high homology to the SARS-CoV virus. Parallel bioinformatic predictions identified a priori potential B and T cell epitopes for SARS-CoV-2. The independent identification of the same regions using two approaches reflects the high probability that these regions are promising targets for immune recognition of SARS-CoV-2. These predictions can facilitate effective vaccine design against this virus of high priority.
•Ten experimentally defined regions within SARS-CoV have high homology with SARS-CoV-2
•Parallel bioinformatics predicted potential B and T cell epitopes for SARS-CoV-2
•Independent approaches identified the same immunodominant regions
•The conserved immune regions have implications for vaccine design against multiple CoVs
Grifoni et al. identify potential targets for immune responses to the 2019 novel coronavirus (SARS-CoV-2) by sequence homology with closely related SARS-CoV and by a priori epitope prediction using bioinformatics approaches. This analysis provides essential information for understanding human immune responses to this virus and for evaluating diagnostic and vaccine candidates.
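The homology comparison described above can be illustrated with a simple percent-identity calculation between two equal-length, pre-aligned regions. This is only a sketch: the abstract does not state how homology was scored, the function name is hypothetical, and the two sequences are invented toy strings, not real SARS-CoV or SARS-CoV-2 regions.

```python
def percent_identity(region_a, region_b):
    """Percentage of aligned positions with identical residues.

    Assumption (not from the source): both regions are already aligned
    and have the same length, so positions can be compared one to one.
    """
    if len(region_a) != len(region_b):
        raise ValueError("regions must be pre-aligned to the same length")
    if not region_a:
        raise ValueError("regions must not be empty")
    matches = sum(1 for a, b in zip(region_a, region_b) if a == b)
    return 100.0 * matches / len(region_a)


# Invented toy peptides that differ at the last position only.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
```

Under this measure, an epitope region from one virus would be flagged as conserved in the other when its identity exceeds a chosen threshold; the threshold itself is a design decision not specified here.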
Cytotoxic T cells are of central importance in the immune system's response to disease. They recognize defective cells by binding to peptides presented on the cell surface by MHC class I molecules. Peptide binding to MHC molecules is the single most selective step in the Ag-presentation pathway. Therefore, in the quest for T cell epitopes, the prediction of peptide binding to MHC molecules has attracted widespread attention. In the past, predictors of peptide-MHC interactions have primarily been trained on binding affinity data. Recently, an increasing number of MHC-presented peptides identified by mass spectrometry have been reported containing information about peptide-processing steps in the presentation pathway and the length distribution of naturally presented peptides. In this article, we present NetMHCpan-4.0, a method trained on binding affinity and eluted ligand data leveraging the information from both data types. Large-scale benchmarking of the method demonstrates an increase in predictive performance compared with state-of-the-art methods when it comes to identification of naturally processed ligands, cancer neoantigens, and T cell epitopes.
Major histocompatibility complex II (MHC II) molecules play a vital role in the onset and control of cellular immunity. In a highly selective process, MHC II presents peptides derived from exogenous antigens on the surface of antigen-presenting cells for T cell scrutiny. Understanding the rules defining this presentation holds critical insights into the regulation and potential manipulation of the cellular immune system. Here, we apply the NNAlign_MA machine learning framework to analyze and integrate large-scale eluted MHC II ligand mass spectrometry (MS) data sets to advance prediction of CD4+ epitopes. NNAlign_MA allows integration of mixed data types, handling ligands with multiple potential allele annotations, encoding of ligand context, leveraging information between data sets, and has pan-specific power allowing accurate predictions outside the set of molecules included in the training data. Applying this framework, we identified accurate binding motifs of more than 50 MHC class II molecules described by MS data, particularly expanding coverage for DP and DQ beyond that obtained using current MS motif deconvolution techniques. Furthermore, in large-scale benchmarking, the final model termed NetMHCIIpan-4.0 demonstrated improved performance beyond current state-of-the-art predictors for ligand and CD4+ T cell epitope prediction. These results suggest that NNAlign_MA and NetMHCIIpan-4.0 are powerful tools for analysis of immunopeptidome MS data, prediction of T cell epitopes, and development of personalized immunotherapies.
Reliable prediction of antibody, or B-cell, epitopes remains challenging yet highly desirable for the design of vaccines and immunodiagnostics. A correlation between antigenicity, solvent accessibility, and flexibility in proteins was demonstrated. Subsequently, Thornton and colleagues proposed a method for identifying continuous epitopes in the protein regions protruding from the protein's globular surface. The aim of this work was to implement that method as a web-tool and evaluate its performance on discontinuous epitopes known from the structures of antibody-protein complexes.
Here we present ElliPro, a web-tool that implements Thornton's method and, together with a residue clustering algorithm, the MODELLER program and the Jmol viewer, allows the prediction and visualization of antibody epitopes in a given protein sequence or structure. ElliPro has been tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures of antibody-protein complexes. In comparison with six other structure-based methods that can be used for epitope prediction, ElliPro performed the best and gave an AUC value of 0.732, when the most significant prediction was considered for each protein. Since the rank of the best prediction was at most in the top three for more than 70% of proteins and never exceeded five, ElliPro is considered a useful research tool for identifying antibody epitopes in protein antigens. ElliPro is available at http://tools.immuneepitope.org/tools/ElliPro.
The results from ElliPro suggest that further research on antibody epitopes considering more features that discriminate epitopes from non-epitopes may further improve predictions. As ElliPro is based on the geometrical properties of protein structure and does not require training, it might be more generally applied for predicting different types of protein-protein interactions.
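For readers unfamiliar with the AUC figure quoted above, the sketch below shows one standard way to compute the area under the ROC curve from per-residue prediction scores, using the pairwise (Mann-Whitney) formulation. It is a sketch under stated assumptions: the abstract does not describe the exact evaluation protocol, the function name is hypothetical, and the scores and labels are toy values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    AUC equals the probability that a randomly chosen positive
    (label 1, e.g. an epitope residue) scores higher than a randomly
    chosen negative (label 0); ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four residues, two annotated as epitope (label 1).
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for interpreting the reported value of 0.732.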
Understanding immune memory to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is critical for improving diagnostics and vaccines and for assessing the likely future course of the COVID-19 pandemic. We analyzed multiple compartments of circulating immune memory to SARS-CoV-2 in 254 samples from 188 COVID-19 cases, including 43 samples at ≥6 months after infection. Immunoglobulin G (IgG) to the spike protein was relatively stable over 6+ months. Spike-specific memory B cells were more abundant at 6 months than at 1 month after symptom onset. SARS-CoV-2-specific CD4+ T cells and CD8+ T cells declined with a half-life of 3 to 5 months. By studying antibody, memory B cell, CD4+ T cell, and CD8+ T cell memory to SARS-CoV-2 in an integrated manner, we observed that each component of SARS-CoV-2 immune memory exhibited distinct kinetics.
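To make the reported half-life of 3 to 5 months concrete, the sketch below converts a half-life into the fraction of cells expected to remain after a given time. It assumes simple single-exponential decay, which is an assumption of this example and not a model stated in the abstract; the function name is hypothetical.

```python
def fraction_remaining(months_elapsed, half_life_months):
    """Fraction remaining after `months_elapsed` under exponential decay.

    Assumption (not from the source): the population halves every
    `half_life_months`, i.e. N(t) / N(0) = 0.5 ** (t / half_life).
    """
    if half_life_months <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (months_elapsed / half_life_months)


# Bounds implied by a 3 to 5 month half-life, evaluated at 6 months.
lower = fraction_remaining(6, 3)  # shorter half-life, faster decline
upper = fraction_remaining(6, 5)  # longer half-life, slower decline
```

Under this assumption, a 3-month half-life leaves one quarter of the starting level after 6 months, while a 5-month half-life leaves somewhat less than half.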
The task of epitope discovery and vaccine design is increasingly reliant on bioinformatics analytic tools and access to depositories of curated data relevant to immune reactions and specific pathogens. The Immune Epitope Database and Analysis Resource (IEDB) was indeed created to assist biomedical researchers in the development of new vaccines, diagnostics, and therapeutics. The Analysis Resource is freely available to all researchers and provides access to a variety of epitope analysis and prediction tools. The tools include validated and benchmarked methods to predict MHC class I and class II binding. The predictions from these tools can be combined with tools predicting antigen processing, TCR recognition, and B cell epitope prediction. In addition, the resource contains a variety of secondary analysis tools that allow the researcher to calculate epitope conservation, population coverage, and other relevant analytic variables. The researcher involved in vaccine design and epitope discovery will also be interested in accessing experimental published data, relevant to the specific indication of interest. The database component of the IEDB contains a vast amount of experimentally derived epitope data that can be queried through a flexible user interface. The IEDB is linked to other pathogen-specific and immunological database resources.
While many genetic variants have been associated with risk for human diseases, how these variants affect gene expression in various cell types remains largely unknown. To address this gap, the DICE (database of immune cell expression, expression quantitative trait loci eQTLs, and epigenomics) project was established. Considering all human immune cell types and conditions studied, we identified cis-eQTLs for a total of 12,254 unique genes, which represent 61% of all protein-coding genes expressed in these cell types. Strikingly, a large fraction (41%) of these genes showed a strong cis-association with genotype only in a single cell type. We also found that biological sex is associated with major differences in immune cell gene expression in a highly cell-specific manner. These datasets will help reveal the effects of disease risk-associated genetic polymorphisms on specific immune cell types, providing mechanistic insights into how they might influence pathogenesis (https://dice-database.org).
•Cis-eQTLs for 12,254 unique genes were identified in 13 human immune cell types
•41% of eGenes showed strong cis-association with genotype in a single cell type
•GWAS variants were linked to cell types where their effects are most pronounced
•Biological sex was associated with major differences in immune cell gene expression
Surveying gene expression and SNP genotypes across immune cell types from healthy humans reveals cis-eQTLs affecting over half of all expressed genes and demonstrates that variant effects often manifest in cell types other than those with highest gene expression.