Peer-reviewed, Open access
  • Precision Neoantigen Discov...
    Pyke, Rachel Marty; Mellacheruvu, Datta; Dea, Steven; Abbott, Charles; Zhang, Simo V.; Phillips, Nick A.; Harris, Jason; Bartha, Gabor; Desai, Sejal; McClory, Rena; West, John; Snyder, Michael P.; Chen, Richard; Boyle, Sean Michael

    Molecular & cellular proteomics, 04/2023, Volume: 22, Issue: 4
    Journal Article

    Major histocompatibility complex (MHC)–bound peptides that originate from tumor-specific genetic alterations, known as neoantigens, are an important class of anticancer therapeutic targets. Accurately predicting peptide presentation by MHC complexes is a key aspect of discovering therapeutically relevant neoantigens. Technological improvements in mass spectrometry–based immunopeptidomics and advanced modeling techniques have vastly improved MHC presentation prediction over the past 2 decades. However, improvement in the accuracy of prediction algorithms is needed for clinical applications like the development of personalized cancer vaccines, the discovery of biomarkers for response to immunotherapies, and the quantification of autoimmune risk in gene therapies. Toward this end, we generated allele-specific immunopeptidomics data using 25 monoallelic cell lines and created Systematic Human Leukocyte Antigen (HLA) Epitope Ranking Pan Algorithm (SHERPA), a pan-allelic MHC-peptide algorithm for predicting MHC-peptide binding and presentation. In contrast to previously published large-scale monoallelic data, we used an HLA-null K562 parental cell line and a stable transfection of HLA allele to better emulate native presentation. Our dataset includes five previously unprofiled alleles that expand MHC diversity in the training data and extend allelic coverage in underprofiled populations. To improve generalizability, SHERPA systematically integrates 128 monoallelic and 384 multiallelic samples with publicly available immunoproteomics data and binding assay data. Using this dataset, we developed two features that empirically estimate the propensities of genes and specific regions within gene bodies to engender immunopeptides to represent antigen processing. 
Using a composite model constructed with gradient boosting decision trees, multiallelic deconvolution, and 2.15 million peptides encompassing 167 alleles, we achieved a 1.44-fold improvement of positive predictive value compared with existing tools when evaluated on independent monoallelic datasets and a 1.17-fold improvement when evaluating on tumor samples. With a high degree of accuracy, SHERPA has the potential to enable precision neoantigen discovery for future clinical applications.

• Generated 25 stably transfected monoallelic cell lines and applied immunopeptidomics.
• Harmonized 512 public immunopeptidomic samples through systematic reprocessing.
• Developed pan-allele MHC binding algorithm (SHERPA) utilizing 167 human HLA alleles.
• Employed empirically derived antigen-processing features to predict MHC presentation.
• SHERPA demonstrates up to 1.44-fold increased precision over competing algorithms.

Accurately identifying neoantigens is critical for many clinical applications. We generated immunopeptidomics data from 25 stably transfected monoallelic cell lines. Then, we systematically reprocessed a large corpus of public data to improve MHC binding pocket diversity and to empirically learn the rules of antigen presentation. In applying these datasets, we trained SHERPA, an MHC binding and presentation prediction algorithm. SHERPA improves performance compared with existing tools by 1.44-fold in held-out monoallelic data and 1.11-fold for known immunogenic epitopes.
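
The fold improvements in positive predictive value (PPV) quoted above can be read as a ratio of two PPVs measured on the same evaluation set. The sketch below is illustrative only: the abstract does not state how the top-ranked set was chosen, so the top-k cutoff is an assumption, and the function names `ppv_at_top_k` and `fold_improvement` and the toy scores are hypothetical and not taken from SHERPA.

```python
from typing import Sequence


def ppv_at_top_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """PPV among the k highest-scoring peptides.

    labels: 1 for a peptide observed as presented, 0 for a decoy.
    The top-k cutoff is an assumption made for this sketch.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Rank peptides from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(label for _, label in ranked[:k])
    return true_positives / k


def fold_improvement(ppv_new: float, ppv_baseline: float) -> float:
    """Ratio of the new model's PPV to the baseline's PPV."""
    if ppv_baseline <= 0:
        raise ValueError("baseline PPV must be positive")
    return ppv_new / ppv_baseline


# Toy data (hypothetical): 3 presented peptides among 8 candidates.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
new_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.3, 0.4]
baseline_scores = [0.9, 0.3, 0.7, 0.8, 0.2, 0.1, 0.6, 0.4]

k = sum(labels)  # evaluate at as many top picks as there are true positives
ppv_new = ppv_at_top_k(new_scores, labels, k)
ppv_base = ppv_at_top_k(baseline_scores, labels, k)
print(f"PPV new={ppv_new:.3f} baseline={ppv_base:.3f} "
      f"fold={fold_improvement(ppv_new, ppv_base):.2f}")
```

On this reading, a 1.44-fold improvement means that among the same number of top-ranked peptides, the newer model recovers 1.44 times as many truly presented peptides as the tool it is compared with.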