Abstract
Accurately inferring the genome-wide landscape of recombination rates in natural populations is a central aim in genomics, as patterns of linkage influence everything from genetic mapping to understanding evolutionary history. Here, we describe recombination landscape estimation using recurrent neural networks (ReLERNN), a deep learning method for estimating a genome-wide recombination map that is accurate even with small numbers of pooled or individually sequenced genomes. Rather than use summaries of linkage disequilibrium as its input, ReLERNN takes columns from a genotype alignment, which are then modeled as a sequence across the genome using a recurrent neural network. We demonstrate that ReLERNN improves accuracy and reduces bias relative to existing methods and maintains high accuracy in the face of demographic model misspecification, missing genotype calls, and genome inaccessibility. We apply ReLERNN to natural populations of African Drosophila melanogaster and show that genome-wide recombination landscapes, although largely correlated among populations, exhibit important population-specific differences. Lastly, we connect the inferred patterns of recombination with the frequencies of major inversions segregating in natural Drosophila populations.
Macaques are a commonly used model for studying immunity to human viruses, including for studies of SARS-CoV-2 infection and vaccination. However, it is unknown whether macaque antibody responses resemble the response in humans. To answer this question, we employed a phage-based deep mutational scanning approach (Phage-DMS) to compare which linear epitopes are targeted on the SARS-CoV-2 Spike protein in convalescent humans, convalescent (re-infected) rhesus macaques, mRNA-vaccinated humans, and repRNA-vaccinated pigtail macaques. We also used Phage-DMS to determine antibody escape pathways within each epitope, enabling a granular comparison of antibody binding specificities at the locus level. Overall, we identified some common epitope targets in both macaques and humans, including in the fusion peptide (FP) and stem helix-heptad repeat 2 (SH-H) regions. Differences between groups included a response to epitopes in the N-terminal domain (NTD) and C-terminal domain (CTD) in vaccinated humans but not vaccinated macaques, as well as recognition of a CTD epitope and epitopes flanking the FP in convalescent macaques but not convalescent humans. There was also considerable variability in the escape pathways among individuals within each group. Sera from convalescent macaques showed the least variability in escape overall and converged on a common response with vaccinated humans in the SH-H epitope region, suggesting highly similar antibodies were elicited. Collectively, these findings suggest that the antibody response to SARS-CoV-2 in macaques shares many features with humans, but with substantial differences in the recognition of certain epitopes and considerable individual variability in antibody escape profiles, suggesting a diverse repertoire of antibodies that can respond to major epitopes in both humans and macaques. Differences in macaque species and exposure type may also contribute to these findings.
Control of the COVID-19 pandemic will rely on SARS-CoV-2 vaccine-elicited antibodies to protect against emerging and future variants; an understanding of the unique features of the humoral responses to infection and vaccination, including different vaccine platforms, is needed to achieve this goal.
The epitopes and pathways of escape for Spike-specific antibodies in individuals with diverse infection and vaccination history were profiled using Phage-DMS. Principal component analysis was performed to identify regions of antibody binding along the Spike protein that differentiate the samples from one another. Within these epitope regions, we determined potential sites of escape by comparing antibody binding of peptides containing wild-type residues versus peptides containing a mutant residue.
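As a rough illustration of the wild-type versus mutant comparison described above, the sketch below scores each substitution at one site by the log2 ratio of mutant-peptide enrichment to wild-type-peptide enrichment. The function name, the pseudocount, the threshold, and all enrichment values are hypothetical and are not taken from the study; the actual Phage-DMS analysis may use a differently scaled statistic.

```python
import math

def log2_fold_change(mutant, wildtype, pseudocount=0.01):
    """Log2 ratio of mutant to wild-type peptide enrichment (hypothetical score)."""
    return math.log2((mutant + pseudocount) / (wildtype + pseudocount))

# Made-up enrichment values for peptides centered on a single site.
wildtype_enrichment = 8.0
mutant_enrichment = {"A": 7.5, "K": 0.9, "P": 0.2}

scores = {
    aa: log2_fold_change(value, wildtype_enrichment)
    for aa, value in mutant_enrichment.items()
}

# Strongly negative scores mark substitutions that reduce antibody binding,
# i.e. candidate escape mutations. The -1.0 cutoff is an arbitrary assumption.
escape_candidates = sorted(aa for aa, score in scores.items() if score < -1.0)
print(escape_candidates)
```

Under these made-up numbers, a substitution that leaves binding nearly unchanged scores close to zero, while one that abolishes binding scores strongly negative.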
Individuals with mild infection had antibodies that bound to epitopes in the S2 subunit within the fusion peptide and heptad-repeat regions, whereas vaccinated individuals had antibodies that additionally bound to epitopes in the N- and C-terminal domains of the S1 subunit, a pattern that was also observed in individuals with severe disease due to infection. Epitope binding appeared to change over time after vaccination, but other covariates such as mRNA vaccine dose, mRNA vaccine type, and age did not affect antibody binding to these epitopes. Vaccination induced a relatively uniform escape profile across individuals for some epitopes, whereas there was much more variation in escape pathways in mildly infected individuals. In the case of antibodies targeting the fusion peptide region, which was a common response to both infection and vaccination, the escape profile after infection was not altered by subsequent vaccination.
The finding that SARS-CoV-2 mRNA vaccination resulted in binding to additional epitopes beyond what was seen after infection suggests that protection could vary depending on the route of exposure to Spike antigen. The relatively conserved escape pathways to vaccine-induced antibodies relative to infection-induced antibodies suggest that, if escape variants emerge, they may be readily selected for across vaccinated individuals. Given that the majority of people will be first exposed to Spike via vaccination and not infection, this work has implications for predicting the selection of immune escape variants at a population level.
This work was supported by NIH grants AI138709 (PI JMO) and AI146028 (PI FAM). JMO received support as the Endowed Chair for Graduate Education (FHCRC). The research of FAM was supported in part by a Faculty Scholar grant from the Howard Hughes Medical Institute and the Simons Foundation. Scientific Computing Infrastructure at Fred Hutch was funded by ORIP grant S10OD028685.
Lassa virus is estimated to cause thousands of human deaths per year, primarily due to spillovers from its natural host, Mastomys rodents. Efforts to create vaccines and antibody therapeutics must account for the evolutionary variability of the Lassa virus's glycoprotein complex (GPC), which mediates viral entry into cells and is the target of neutralizing antibodies. To map the evolutionary space accessible to GPC, we used pseudovirus deep mutational scanning to measure how nearly all GPC amino-acid mutations affected cell entry and antibody neutralization. Our experiments defined functional constraints throughout GPC. We quantified how GPC mutations affected neutralization with a panel of monoclonal antibodies. All antibodies tested were escaped by mutations that existed among natural Lassa virus lineages. Overall, our work describes a biosafety-level-2 method to elucidate the mutational space accessible to GPC and shows how prospective characterization of antigenic variation could aid the design of therapeutics and vaccines.
The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
Abstract
Summary
We present the phippery software suite for analyzing data from phage display methods that use immunoprecipitation and deep sequencing to capture antibody binding to peptides, often referred to as PhIP-Seq. It has three main components that can be used separately or in conjunction: (i) a Nextflow pipeline, phip-flow, which processes raw sequencing data into a compact, multidimensional dataset format and allows for end-to-end automation of reproducible workflows; (ii) a Python API, phippery, which provides interfaces for tasks such as count normalization, enrichment calculation, multidimensional scaling, and more; and (iii) a Streamlit application, phip-viz, which provides an interactive interface for flexibly visualizing the data as a heatmap.
Availability and implementation
All software packages are publicly available under the MIT License. The phip-flow pipeline: https://github.com/matsengrp/phip-flow. The phippery library: https://github.com/matsengrp/phippery. The phip-viz Streamlit application: https://github.com/matsengrp/phip-viz.
Abstract
Gradients of probabilistic model likelihoods with respect to their parameters are essential for modern computational statistics and machine learning. These calculations are readily available for arbitrary models via “automatic differentiation” implemented in general-purpose machine-learning libraries such as TensorFlow and PyTorch. Although these libraries are highly optimized, it is not clear if their general-purpose nature will limit their algorithmic complexity or implementation speed for the phylogenetic case compared to phylogenetics-specific code. In this paper, we compare six gradient implementations of the phylogenetic likelihood functions, in isolation and also as part of a variational inference procedure. We find that although automatic differentiation can scale approximately linearly in tree size, it is much slower than the carefully implemented gradient calculation for tree likelihood and ratio transformation operations. We conclude that a mixed approach combining phylogenetic libraries with machine learning libraries will provide the optimal combination of speed and model flexibility moving forward.
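To make the contrast between automatic and hand-derived gradients concrete, here is a minimal sketch on a toy problem: the Jukes-Cantor (JC69) log-likelihood of two aligned sequences as a function of the branch length t separating them. It is not one of the six implementations compared in the paper. The `Dual` class is a hypothetical forward-mode differentiator written for this illustration, whereas libraries such as TensorFlow and PyTorch typically use reverse mode, and the site counts are made up.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one input."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def _lift(x):
    """Promote a plain number to a Dual with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x))

def dual_exp(x):
    e = math.exp(x.val)
    return Dual(e, e * x.der)

def dual_log(x):
    return Dual(math.log(x.val), x.der / x.val)

def jc69_loglik(t, n_same, n_diff, exp=math.exp, log=math.log):
    """JC69 log-likelihood of two sequences at distance t (subs per site)."""
    e = exp(-4.0 / 3.0 * t)
    p_same = 0.25 + 0.75 * e   # P(same nucleotide at both tips)
    p_diff = 0.25 - 0.25 * e   # P(one specific different nucleotide)
    return n_same * log(0.25 * p_same) + n_diff * log(0.25 * p_diff)

def jc69_grad(t, n_same, n_diff):
    """Hand-derived d(log-likelihood)/dt for the same model."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e
    p_diff = 0.25 - 0.25 * e
    return n_same * (-e) / p_same + n_diff * (e / 3.0) / p_diff

# Made-up data: 90 matching sites and 10 mismatching sites.
t0 = 0.1
auto = jc69_loglik(Dual(t0, 1.0), 90, 10, exp=dual_exp, log=dual_log)
print(auto.der, jc69_grad(t0, 90, 10))
```

Both routes give the same derivative here. The hand-derived version touches each term once, while the automatic version carries derivative bookkeeping through every arithmetic operation, which hints at why specialized gradient code can be faster on large trees.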
Abstract
Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime’s many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
Since its introduction in 2016, the msprime simulator has grown in popularity and is now one of the most commonly used tools in population genetics. This article marks the 1.0 release of msprime and summarizes the many features it has accumulated through an open source community development model. Despite its generality, msprime’s performance is excellent—in many cases orders of magnitude faster and more memory efficient than more specialized methods.
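For readers unfamiliar with how simulation produces ground-truth data, the toy sketch below draws the time to the most recent common ancestor (TMRCA) of a sample under the standard coalescent without recombination, then compares the simulated mean with the known theoretical value. It does not use msprime's API or its tree sequence algorithms; the function name, sample size, seed, and replicate count are all assumptions made for illustration.

```python
import random

def simulate_tmrca(n_samples, rng):
    """Draw one TMRCA under the standard coalescent, in coalescent time units.

    While k lineages remain, the waiting time to the next coalescence is
    exponentially distributed with rate k * (k - 1) / 2.
    """
    total = 0.0
    for k in range(n_samples, 1, -1):
        total += rng.expovariate(k * (k - 1) / 2)
    return total

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 10
replicates = [simulate_tmrca(n, rng) for _ in range(20000)]
mean_tmrca = sum(replicates) / len(replicates)

# Theoretical expectation: sum over k of 2 / (k * (k - 1)) = 2 * (1 - 1 / n).
expected = 2 * (1 - 1 / n)
print(mean_tmrca, expected)
```

Because the true value is known analytically in this simple case, the simulated mean can be checked directly, which is the same logic used when simulated data serve as ground truth for evaluating inference methods.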