Advances in high-throughput sequencing technologies now allow for large-scale characterization of B cell immunoglobulin (Ig) repertoires. The high germline and somatic diversity of the Ig repertoire presents challenges for biologically meaningful analysis, which requires specialized computational methods. We have developed a suite of utilities, Change-O, which provides tools for advanced analyses of large-scale Ig repertoire sequencing data. Change-O includes tools for determining the complete set of Ig variable region gene segment alleles carried by an individual (including novel alleles), partitioning Ig sequences into clonal populations, creating lineage trees, inferring somatic hypermutation targeting models, measuring repertoire diversity, quantifying selection pressure, and calculating sequence chemical properties. All Change-O tools use a common data format, which enables the seamless integration of multiple analyses into a single workflow.
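To illustrate what partitioning Ig sequences into clonal populations involves, here is a minimal Python sketch of one common heuristic: group sequences by V gene, J gene, and junction length, then join junctions whose length-normalized Hamming distance is at or below a threshold (single linkage). It is not Change-O's implementation; the record keys (`v_call`, `j_call`, `junction`), the function names, and the 0.15 threshold are assumptions chosen for illustration.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def define_clones(records, threshold=0.15):
    """Assign an integer clone ID to each record (hypothetical helper).

    records: list of dicts with the assumed keys 'id', 'v_call', 'j_call',
    'junction'. Sequences are grouped by V gene, J gene and junction length;
    within a group, any two junctions whose normalized Hamming distance is
    <= threshold are linked (single linkage via union-find).
    """
    groups = {}
    for rec in records:
        key = (rec["v_call"], rec["j_call"], len(rec["junction"]))
        groups.setdefault(key, []).append(rec)

    clone_of = {}
    next_clone = 0
    for key in sorted(groups):
        members = groups[key]
        parent = list(range(len(members)))

        def find(i):
            # Follow parent links to the root, compressing the path as we go
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i, j in combinations(range(len(members)), 2):
            a, b = members[i]["junction"], members[j]["junction"]
            if hamming(a, b) / max(len(a), 1) <= threshold:
                parent[find(i)] = find(j)

        # Give each connected component a new clone ID
        roots = {}
        for i, rec in enumerate(members):
            root = find(i)
            if root not in roots:
                roots[root] = next_clone
                next_clone += 1
            clone_of[rec["id"]] = roots[root]
    return clone_of
```

Single linkage means two fairly distant sequences can still share a clone if a chain of close intermediates connects them; in practice the threshold is typically tuned per data set.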
Change-O is freely available for non-commercial use and may be downloaded from http://clip.med.yale.edu/changeo.
Contact: steven.kleinstein@yale.edu
Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete, and novel polymorphisms are hard to distinguish from the somatic hypermutations that occur frequently in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, such as IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.
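A rough sketch of the kind of signal TIgGER exploits: if a position mismatches the assigned germline allele even in sequences that carry few mutations overall, a polymorphism (novel allele) is more plausible than somatic hypermutation. The sketch below bins sequences by their total mismatch count, fits a straight line to each position's mismatch fraction versus that count, and flags positions with a high intercept. The function names, the 10-mutation cap, and the 0.125 intercept cutoff are illustrative assumptions, and sequences are assumed to be gap-free and aligned to the germline; this is not TIgGER's code.

```python
def mismatch_positions(seq, germline):
    """Positions where an aligned, gap-free sequence differs from its germline."""
    return [i for i, (s, g) in enumerate(zip(seq, germline)) if s != g]


def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def candidate_polymorphisms(seqs, germline, max_mut=10, min_intercept=0.125):
    """Flag germline positions whose mismatch frequency stays high at low mutation counts."""
    # Bin sequences by their total number of mismatches to the germline
    bins = {}
    for seq in seqs:
        mm = mismatch_positions(seq, germline)
        if len(mm) <= max_mut:
            bins.setdefault(len(mm), []).append(set(mm))
    counts = sorted(bins)
    if len(counts) < 2:
        return []  # a line needs at least two mutation-count bins
    flagged = []
    for pos in range(len(germline)):
        # Fraction of sequences in each bin that mismatch at this position
        fractions = [sum(pos in mm for mm in bins[m]) / len(bins[m]) for m in counts]
        _, intercept = linear_fit(counts, fractions)
        if intercept >= min_intercept:
            flagged.append(pos)
    return flagged
```

Because every sequence that carries a novel allele also carries its defining mismatch, that position's mismatch fraction stays near 1 even in the lowest mutation-count bin, which is what produces the high intercept.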
High-throughput immunoglobulin sequencing promises new insights into the somatic hypermutation and antigen-driven selection processes that underlie B-cell affinity maturation and adaptive immunity. The ability to estimate positive and negative selection from these sequence data has broad applications for understanding the immune response to pathogens and is also critical to determining the role of somatic hypermutation in autoimmunity and B-cell cancers. Here, we develop a statistical framework for Bayesian estimation of Antigen-driven SELectIoN (BASELINe) based on the analysis of somatic mutation patterns. Our approach represents a fundamental advance over previous methods by shifting the problem from one of simply detecting selection to one of quantifying selection. Along with providing a more intuitive means to assess and visualize selection, our approach allows, for the first time, comparative analysis between groups of sequences derived from different germline V(D)J segments. Application of this approach to next-generation sequencing data demonstrates different selection pressures for memory cells of different isotypes. This framework can easily be adapted to analyze other types of DNA mutation patterns resulting from a mutator that displays hot/cold-spots, substitution preferences, or other intrinsic biases.
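The shift from detecting to quantifying selection can be illustrated with a simplified Monte Carlo sketch. It assumes a uniform Beta(1, 1) prior on the probability θ that a mutation is a replacement, a binomial likelihood for the observed replacement (R) and silent (S) counts, and an expected replacement probability b supplied by some SHM targeting model; it then samples a selection strength Σ = log[(θ/(1 - θ)) / (b/(1 - b))]. Positive Σ suggests positive selection and negative Σ suggests negative selection. The published BASELINe prior, likelihood, and derivation of b may differ; all function names here are hypothetical.

```python
import math
import random


def selection_strength_samples(n_replacement, n_silent, expected_r,
                               n_samples=10000, seed=0):
    """Monte Carlo samples of Sigma = log-odds(theta) - log-odds(b).

    theta ~ Beta(1 + R, 1 + S): the posterior under a uniform prior and a
    binomial likelihood. expected_r (b) is the replacement probability
    expected without selection, assumed to come from a targeting model.
    """
    if not 0 < expected_r < 1:
        raise ValueError("expected_r must be in (0, 1)")
    rng = random.Random(seed)
    base_log_odds = math.log(expected_r / (1 - expected_r))
    samples = []
    for _ in range(n_samples):
        theta = rng.betavariate(1 + n_replacement, 1 + n_silent)
        theta = min(max(theta, 1e-12), 1 - 1e-12)  # guard against log(0)
        samples.append(math.log(theta / (1 - theta)) - base_log_odds)
    return samples


def summarize(samples, level=0.95):
    """Posterior mean and an equal-tailed interval of the sampled Sigma values."""
    s = sorted(samples)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return sum(s) / len(s), lo, hi
```

Reporting a point estimate together with an interval is what makes comparisons between groups of sequences possible, which a single significance test does not provide.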
Driven by dramatic technological improvements, large-scale characterization of lymphocyte receptor repertoires via high-throughput sequencing is now feasible. Although promising, the high germline and somatic diversity, especially of B-cell immunoglobulin repertoires, presents analysis challenges that require the development of specialized computational pipelines. We developed the REpertoire Sequencing TOolkit (pRESTO) for processing reads from high-throughput lymphocyte receptor studies. pRESTO processes raw sequences to produce error-corrected, sorted and annotated sequence sets, along with a wealth of metrics at each step. The toolkit supports multiplexed primer pools, single- or paired-end reads, and emerging technologies that use single-molecule identifiers. pRESTO has been tested on data generated from Roche and Illumina platforms. It has a built-in capacity to parallelize work across available processors and can efficiently process the millions of sequences generated by typical high-throughput projects.
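As a rough illustration of why single-molecule identifiers (UMIs) help with error correction, the sketch below groups reads by UMI and builds a majority-vote consensus at each position, emitting N where bases tie. It assumes reads sharing a UMI are already aligned; the function name and the `min_count` parameter are hypothetical, and this is not pRESTO's own consensus algorithm.

```python
from collections import Counter, defaultdict


def build_consensus(reads, min_count=2):
    """Return {umi: consensus_sequence} from (umi, sequence) pairs.

    UMI groups with fewer than min_count reads are dropped. Sequences in a
    group are compared position by position up to the shortest read length.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_count:
            continue
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            counts = Counter(s[i] for s in seqs)
            top = max(counts.values())
            winners = [b for b, n in counts.items() if n == top]
            # A tie between bases gives no majority, so emit an ambiguous N
            bases.append(winners[0] if len(winners) == 1 else "N")
        consensus[umi] = "".join(bases)
    return consensus
```

Requiring at least two reads per UMI is a simple guard against reporting an unsupported single read as error-corrected.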
pRESTO is freely available for academic use. The software package and detailed tutorials may be downloaded from http://clip.med.yale.edu/presto.
Mutation-derived neoantigens distinguish tumor from normal cells. T cells can sense HLA-presented mutations, recognize tumor cells as non-self, and destroy them. Therapeutically, immunotherapy antibodies can strengthen the immune response by increasing T-cell cytotoxicity targeted toward neoantigens. Neoantigen vaccines act through antigen-presenting cells, such as dendritic cells, to activate patient-endogenous T cells that recognize vaccine-encoded mutations. Infusion of mutation-targeting T cells by adoptive cell therapy (ACT) directly increases the number and frequency of cytotoxic T cells recognizing and killing tumor cells. At the same time, publicly funded consortia have profiled tumor genomes across many indications, identifying mutations in each tumor. For example, we find that basal and HER2-positive tumors contain more mutated proteins and more TP53 mutations than luminal A/B breast tumors. HPV-negative head and neck tumors have more mutated proteins than HPV-positive ones and, in agreement with the hypothesis that HPV activity interferes with p53 activity, only 14% of HPV-positive tumors have TP53 mutations vs. 86% of HPV-negative tumors. Lung adenocarcinomas in smokers have over four times more mutated proteins than those in never smokers (median 248 vs. 61, respectively). With an eye toward immunotherapy applications, we review the spectrum of mutations in multiple indications, show variations in indication subtypes, and examine the intra- and inter-indication prevalence of recurring mutation neoantigens that could be used for warehouse vaccines and ACT.
Angiosarcoma is an uncommon endothelial malignancy and a highly aggressive soft tissue sarcoma. Due to its infiltrative nature, successful management of localized angiosarcoma is often challenging. Systemic chemotherapy is used in the metastatic setting and occasionally in patients with high-risk localized disease in neoadjuvant or adjuvant settings. However, responses tend to be short-lived and most patients succumb to metastatic disease. Novel therapies are needed for patients with angiosarcomas.
We performed a retrospective analysis of patients with locally advanced or metastatic angiosarcoma who were treated with checkpoint inhibitors at our institution. We collected their clinical information and outcome measurements. In one patient who achieved a complete response, we analyzed circulating and infiltrating T cells within peripheral blood and tumor tissue.
We have treated seven angiosarcoma (AS) patients with checkpoint inhibitors, either in the context of clinical trials or off-label: pembrolizumab + axitinib (NCT02636725; n = 1), AGEN1884, a CTLA-4 inhibitor (NCT02694822; n = 2), and pembrolizumab (n = 4). Five patients had cutaneous angiosarcoma, one had primary breast angiosarcoma, and one had radiation-associated breast angiosarcoma. At 12 weeks, 5/7 patients (71%) had a partial response on imaging and/or clinical exam, and two (29%) had progressive disease. Six of seven patients are alive to date and, thus far, 3/7 patients (43%) have progressed (median 3.4 months): one achieved a partial response after being switched from pembrolizumab to nivolumab/ipilimumab, which is ongoing; one died of progressive disease at 31 weeks (primary breast angiosarcoma); and one was placed on pazopanib. One patient had a complete response (CR) following extended treatment with AGEN1884 monotherapy. No patient experienced any grade ≥2 toxicity.
This case series underscores the value of targeted immunotherapy in treating angiosarcoma. It also identifies genetic heterogeneity of cutaneous angiosarcomas and discusses specific genetic findings that may explain reported benefits from immunotherapy.
Human memory B cells and marginal zone (MZ) B cells share common features such as the expression of CD27 and somatic mutations in their IGHV and BCL6 genes, but the relationship between them is controversial. Here, we show phenotypic progression within lymphoid tissues as MZ B cells emerge from the mature naïve B cell pool via a precursor population, defined by its CD27 and CD45RB phenotype, that is distant from memory cells. By imaging mass cytometry, we find that MZ B cells and memory B cells occupy different microanatomical niches in organised gut lymphoid tissues. Both populations disseminate widely between distant lymphoid tissues and blood, and both diversify their IGHV repertoire in gut germinal centres (GC), but nevertheless remain largely clonally separate. MZ B cells are therefore not developmentally contiguous with or analogous to classical memory B cells despite their shared ability to transit through GC, where somatic mutations are acquired.
The adaptive immune system confers protection by generating a diverse repertoire of antibody receptors that are rapidly expanded and contracted in response to specific targets. Next-generation DNA sequencing now provides the opportunity to survey this complex and vast repertoire. In the present work, we describe a set of tools for the analysis of antibody repertoires and their application to elucidating the dynamics of the response to viral vaccination in human volunteers. By analyzing data from 38 separate blood samples across 2 y, we found that the use of the germ-line library of V and J segments is conserved between individuals over time. Surprisingly, there appeared to be no correlation between the use level of a particular VJ combination and degree of expansion. We found the antibody RNA repertoire in each volunteer to be highly dynamic, with each individual displaying qualitatively different response dynamics. By using combinatorial phage display, we screened selected VH genes paired with their corresponding VL library for affinity against the vaccine antigens. Altogether, this work presents an additional set of tools for profiling the human antibody repertoire and demonstrates characterization of the fast repertoire dynamics through time in multiple individuals responding to an immune challenge.
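The abstract reports that V and J segment usage is conserved between individuals over time. One simple way to quantify that kind of similarity, assumed here for illustration rather than taken from the paper, is the Pearson correlation between VJ-pair frequencies in two samples. The function names and the (V gene, J gene) tuple input are hypothetical.

```python
import math
from collections import Counter


def vj_frequencies(calls):
    """Map each (V gene, J gene) pair to its relative frequency in one sample."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {vj: n / total for vj, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def vj_usage_correlation(calls_a, calls_b):
    """Correlate VJ-pair frequencies of two samples; pairs absent from one sample count as 0."""
    fa, fb = vj_frequencies(calls_a), vj_frequencies(calls_b)
    pairs = sorted(set(fa) | set(fb))
    return pearson([fa.get(p, 0.0) for p in pairs], [fb.get(p, 0.0) for p in pairs])
```

Using frequencies rather than raw counts keeps samples of different sequencing depth comparable.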
Analyses of somatic hypermutation (SHM) patterns in B cell immunoglobulin (Ig) sequences contribute to our basic understanding of adaptive immunity, and have broad applications not only for understanding the immune response to pathogens, but also for determining the role of SHM in autoimmunity and B cell cancers. Although stochastic, SHM displays intrinsic biases that can confound statistical analysis, especially when combined with the particular codon usage and base composition in Ig sequences. Analysis of B cell clonal expansion, diversification, and selection processes thus critically depends on an accurate background model for SHM micro-sequence targeting (i.e., hot/cold-spots) and nucleotide substitution. Existing models are based on small numbers of sequences/mutations, in part because they depend on data from non-coding regions or non-functional sequences to remove the confounding influences of selection. Here, we combine high-throughput Ig sequencing with new computational analysis methods to produce improved models of SHM targeting and substitution that are based only on synonymous mutations, and are thus independent of selection. The resulting "S5F" models are based on 806,860 Synonymous mutations in 5-mer motifs from 1,145,182 Functional sequences and account for dependencies on the adjacent four nucleotides (two bases upstream and two bases downstream of the mutation). The estimated profiles can explain almost half of the variance in observed mutation patterns, and clearly show that both mutation targeting and substitution are significantly influenced by neighboring bases. While mutability and substitution profiles were highly conserved across individuals, the variability across motifs was found to be much larger than previously estimated. The model and method source code are made available at http://clip.med.yale.edu/SHM.
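The core idea of estimating 5-mer targeting only from synonymous mutations can be shown with a simplified counting sketch. Here mutability is the number of synonymous mutations at the motif's center divided by the number of times the motif appears at a center position where a synonymous change is possible. This is not the published S5F estimation procedure, which adds normalization and substitution modeling; the function names are hypothetical, and the input is assumed to be gap-free A/C/G/T sequences aligned to an in-frame germline starting at codon position 1. Each mutation is classified by changing only that base in the germline codon.

```python
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_synonymous(germline, pos, new_base):
    """True if replacing germline[pos] with new_base leaves its codon's amino acid unchanged."""
    start = pos - pos % 3
    codon = germline[start:start + 3]
    offset = pos - start
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


def can_mutate_synonymously(germline, pos):
    """True if at least one alternative base at pos would be synonymous."""
    return any(is_synonymous(germline, pos, b) for b in "ACGT" if b != germline[pos])


def fivemer_mutability(pairs):
    """pairs: iterable of (germline, observed) aligned sequences of equal length."""
    mutations = defaultdict(int)
    opportunities = defaultdict(int)
    for germline, observed in pairs:
        usable = len(germline) - len(germline) % 3  # ignore a trailing partial codon
        for pos in range(2, usable - 2):  # need two neighbors on each side
            if not can_mutate_synonymously(germline, pos):
                continue
            motif = germline[pos - 2:pos + 3]
            opportunities[motif] += 1
            if observed[pos] != germline[pos] and is_synonymous(germline, pos, observed[pos]):
                mutations[motif] += 1
    return {m: mutations[m] / opportunities[m] for m in opportunities}
```

Non-synonymous mutations are ignored entirely, which mirrors the abstract's point that synonymous-only data make the estimate independent of selection.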
Genome-wide transcriptional profiling of patient blood samples offers a powerful tool for investigating underlying disease mechanisms and informing personalized treatment decisions. Most studies are based on analysis of total peripheral blood mononuclear cells (PBMCs), a mixed population. In this case, accuracy is inherently limited, since cell subset-specific differential expression of gene signatures will be diluted by RNA from other cells. While using specific PBMC subsets for transcriptional profiling would improve our ability to extract knowledge from these data, it is rarely obvious which cell subset(s) will be the most informative.
We have developed a computational method (Subset Prediction from Enrichment Correlation, SPEC) to predict the cellular source for a pre-defined list of genes (i.e. a gene signature) using only data from total PBMCs. SPEC does not rely on the occurrence of cell subset-specific genes in the signature, but rather takes advantage of correlations with subset-specific genes across a set of samples. Validation using multiple experimental datasets demonstrates that SPEC can accurately identify the source of a gene signature as myeloid or lymphoid, as well as differentiate between B cells, T cells, NK cells and monocytes. Using SPEC, we predict that myeloid cells are the source of the interferon-therapy response gene signature associated with HCV patients who are non-responsive to standard therapy.
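The key idea behind SPEC, using correlations with subset-specific genes across samples rather than the presence of those genes in the signature, can be illustrated with a toy scoring rule: score each subset by the mean Pearson correlation, across total-PBMC samples, between signature genes and that subset's marker genes, and predict the highest-scoring subset. This is a simplified stand-in, not the SPEC statistic; the function names, marker lists, and gene names in the example data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def predict_source(expression, signature, subset_markers):
    """Return (best_subset, scores).

    expression: dict gene -> list of values, one per total-PBMC sample
    (all lists in the same sample order).
    signature: genes in the signature of interest.
    subset_markers: dict subset name -> list of subset-specific genes.
    """
    scores = {}
    for subset, markers in subset_markers.items():
        rs = [pearson(expression[g], expression[m])
              for g in signature for m in markers if g != m]
        if rs:
            scores[subset] = sum(rs) / len(rs)
    if not scores:
        raise ValueError("no signature/marker gene pairs to score")
    best = max(scores, key=scores.get)
    return best, scores
```

The signature genes never need to be subset-specific themselves; they only need to co-vary with a subset's markers across samples.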
SPEC is a powerful technique for blood genomic studies. It can help identify specific cell subsets that are important for understanding disease and therapy response. SPEC is widely applicable since only gene expression profiles from total PBMCs are required, and thus it can easily be used to mine the massive amount of existing microarray or RNA-seq data.