Class I major histocompatibility complex proteins play a critical role in the adaptive immune system by binding to peptides derived from cytosolic proteins and presenting them on the cell surface for surveillance by T cells. The varied peptide binding specificity of these highly polymorphic molecules has important consequences for vaccine design, transplantation, autoimmunity, and cancer development. Here, we describe a molecular modeling study of MHC-peptide interactions that integrates sampling techniques from protein-protein docking, loop modeling, de novo structure prediction, and protein design in order to construct atomically detailed peptide binding landscapes for a diverse set of MHC proteins. Specificity profiles derived from these landscapes recover key features of experimental binding profiles and can be used to predict peptide binding with reasonable accuracy. Family-wide comparison of the predicted binding landscapes recapitulates previously reported patterns of specificity divergence and peptide repertoire diversity while providing a structural basis for observed specificity patterns. The size and sequence diversity of these structure-based binding landscapes enable us to identify subtle patterns of covariation between peptide sequence positions; analysis of the associated structural models suggests physical interactions that may mediate these sequence correlations.
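As an illustration of how position-wise specificity profiles and inter-position covariation can be read out of such a landscape, the sketch below Boltzmann-weights a set of modeled peptides by their predicted binding energies and computes weighted amino-acid frequencies and pairwise mutual information. The temperature, input format, and toy data are assumptions for illustration, not the actual modeling pipeline used in this study.

```python
# Illustrative sketch (not the study's actual pipeline): given modeled 9-mer
# peptides and their predicted binding energies, derive a position-wise
# specificity profile and a simple covariation score between positions.
# The Boltzmann weighting temperature `kT` and the toy inputs are assumptions.
import math
from collections import defaultdict

def boltzmann_weights(energies, kT=1.0):
    """Convert predicted binding energies into normalized Boltzmann weights."""
    m = min(energies)
    w = [math.exp(-(e - m) / kT) for e in energies]
    z = sum(w)
    return [x / z for x in w]

def specificity_profile(peptides, weights):
    """Weighted amino-acid frequencies at each peptide position."""
    length = len(peptides[0])
    profile = [defaultdict(float) for _ in range(length)]
    for pep, w in zip(peptides, weights):
        for i, aa in enumerate(pep):
            profile[i][aa] += w
    return profile

def mutual_information(peptides, weights, i, j):
    """Weighted mutual information between positions i and j (covariation)."""
    pij, pi, pj = defaultdict(float), defaultdict(float), defaultdict(float)
    for pep, w in zip(peptides, weights):
        pij[(pep[i], pep[j])] += w
        pi[pep[i]] += w
        pj[pep[j]] += w
    return sum(p * math.log(p / (pi[a] * pj[b]))
               for (a, b), p in pij.items() if p > 0)

# Toy example:
peps = ["KLFDVPTAV", "KLYDVPSAV", "RLFDVPTAV"]
ws = boltzmann_weights([-9.1, -8.7, -7.9])
print(specificity_profile(peps, ws)[1])   # anchor position P2
print(mutual_information(peps, ws, 2, 6))
```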
Identifying patients at increased risk for severe COVID-19 is a high priority during the pandemic, as it could affect clinical management and shape public health guidelines. In this study we assessed whether a second PCR test conducted 2-7 days after a SARS-CoV-2 positive test could identify patients at risk for severe illness. Analysis of nationwide electronic health record data from 1,683 SARS-CoV-2 positive individuals indicated that a second negative PCR test result was associated with lower risk for severe illness compared to a positive result. This association was seen across different age groups and clinical settings. More importantly, it was not limited to recovering patients but was also observed in patients who still had evidence of COVID-19 as determined by a subsequent positive PCR test. Our study suggests that an early second PCR test may be used as a supportive risk-assessment tool to improve disease management and patient care.
As the scientific research community along with healthcare professionals and decision makers around the world fight tirelessly against the coronavirus disease 2019 (COVID-19) pandemic, the need for comparative effectiveness research (CER) on preventive and therapeutic interventions for COVID-19 is immense. Randomized controlled trials markedly under-represent the frail and complex patients seen in routine care, and they do not typically have data on long-term treatment effects. The increasing availability of electronic health records (EHRs) for clinical research offers the opportunity to generate timely real-world evidence reflective of routine care for optimal management of COVID-19. However, there are many potential threats to the validity of CER based on EHR data that are not originally generated for research purposes. To ensure unbiased and robust results, we need high-quality healthcare databases, rigorous study designs, and proper implementation of appropriate statistical methods. We aimed to describe opportunities and challenges in EHR-based CER for COVID-19-related questions and to introduce best practices in pharmacoepidemiology to minimize potential biases. We structured our discussion into the following topics: (1) study population identification based on exposure status; (2) ascertainment of outcomes; (3) common biases and potential solutions; and (4) data operational challenges specific to COVID-19 CER using EHRs. We provide structured guidance for the proper conduct and appraisal of drug and vaccine effectiveness and safety research using EHR data for the pandemic. This paper is endorsed by the International Society for Pharmacoepidemiology (ISPE).
Assessing the impact of cesarean delivery (CD) on long-term childhood outcomes is challenging, as conducting a randomized controlled trial is rarely feasible and inferring it from observational data may be confounded. Utilizing data from electronic health records of 737,904 births, we defined and emulated a target trial to estimate the effect of CD on predefined long-term pediatric outcomes. Causal effects were estimated using pooled logistic regression and standardized survival curves, leveraging data breadth to account for potential confounders. Diverse sensitivity analyses were performed, including replication of the results in an external validation set from the UK comprising 625,044 births. Children born by CD had an increased risk of developing asthma (10-year risk difference (95% CI) 0.64% (0.31, 0.98)), an average treatment effect of 0.10 (0.07–0.12) on body mass index (BMI) z-scores at age 5 years, and 0.92 (0.68–1.14) on the number of respiratory infection events up to 5 years of age. A positive 10-year risk difference was also observed for atopy (0.74% (-0.06, 1.52)) and allergy (0.47% (-0.32, 1.28)). Increased risk for these outcomes was also observed in the UK cohort. Our findings add to a growing body of evidence on the long-term effects of CD on pediatric morbidity, may assist in the decision to perform CD when not medically indicated, and pave the way for future research on the mechanisms underlying these effects and intervention strategies targeting them.
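For readers unfamiliar with the estimation strategy named above, the following is a minimal sketch of pooled logistic regression with standardization (the g-formula) to obtain a 10-year risk difference from person-period data. The column names (`id`, `cd`, `year`, `event`, the confounders) and the data layout are assumptions for illustration, not the study's actual variables or covariate set.

```python
# Minimal sketch: pooled logistic regression on person-period data, then
# standardization over the observed covariate distribution to get the
# 10-year risk difference for CD vs. vaginal delivery. Variable names and
# the (much reduced) confounder set are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def standardized_risk_difference(person_period: pd.DataFrame, years: int = 10) -> float:
    """Fit a pooled logistic model and return the standardized
    cumulative-risk difference (CD minus vaginal delivery)."""
    model = smf.logit(
        "event ~ cd + C(year) + cd:C(year) + maternal_age + parity",
        data=person_period,
    ).fit(disp=0)

    risks = {}
    for exposure in (0, 1):
        # One baseline row per child, with exposure set to the target value.
        base = person_period.drop_duplicates("id").assign(cd=exposure)
        surv = np.ones(len(base))
        cum_inc = 0.0
        for t in range(1, years + 1):
            hazard = model.predict(base.assign(year=t))
            cum_inc += float(np.mean(surv * hazard))
            surv = surv * (1 - hazard)
        risks[exposure] = cum_inc
    return risks[1] - risks[0]
```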
Purpose
Supplementing investigator‐specified variables with large numbers of empirically identified features that collectively serve as ‘proxies’ for unspecified or unmeasured factors can often improve confounding control in studies utilizing administrative healthcare databases. Consequently, there has been a recent focus on the development of data‐driven methods for high‐dimensional proxy confounder adjustment in pharmacoepidemiologic research. In this paper, we survey current approaches and recent advancements for high‐dimensional proxy confounder adjustment in healthcare database studies.
Methods
We discuss considerations underpinning three areas for high‐dimensional proxy confounder adjustment: (1) feature generation—transforming raw data into covariates (or features) to be used for proxy adjustment; (2) covariate prioritization, selection, and adjustment; and (3) diagnostic assessment. We discuss challenges and avenues of future development within each area.
Results
There is a large literature on methods for high‐dimensional confounder prioritization/selection, but relatively little has been written on best practices for feature generation and diagnostic assessment. Consequently, these areas have particular limitations and challenges.
Conclusions
There is a growing body of evidence showing that machine‐learning algorithms for high‐dimensional proxy‐confounder adjustment can supplement investigator‐specified variables to improve confounding control compared to adjustment based on investigator‐specified variables alone. However, more research is needed on best practices for feature generation and diagnostic assessment when applying methods for high‐dimensional proxy confounder adjustment in pharmacoepidemiologic studies.
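To make the three areas above concrete, the sketch below illustrates the general idea of proxy feature generation and prioritization, in the spirit of high-dimensional propensity score approaches: raw codes are expanded into binary intensity features, which are then ranked by a simple score combining their associations with exposure and outcome. The data structures, thresholds, and scoring rule here are illustrative assumptions rather than any specific published algorithm.

```python
# Illustrative sketch of proxy feature generation and prioritization from
# raw healthcare codes. Inputs, cut-offs, and the ranking score are
# assumptions for demonstration only.
from collections import Counter
import numpy as np

def generate_features(code_lists):
    """code_lists: per-patient lists of diagnosis/procedure/drug codes.
    Returns binary feature vectors keyed by (code, intensity level)."""
    counts = [Counter(codes) for codes in code_lists]
    features = {}
    all_codes = {c for codes in code_lists for c in codes}
    for code in all_codes:
        per_patient = np.array([c[code] for c in counts])
        median_nonzero = np.median(per_patient[per_patient > 0])
        features[(code, "ever")] = (per_patient >= 1).astype(int)
        features[(code, "frequent")] = (per_patient >= median_nonzero).astype(int)
    return features

def prioritize(features, exposure, outcome, top_k=200):
    """Rank features by a simple bias-potential score combining their
    associations with exposure and with the outcome."""
    scores = {}
    for name, x in features.items():
        rr_exposure = (x[exposure == 1].mean() + 1e-6) / (x[exposure == 0].mean() + 1e-6)
        rr_outcome = (outcome[x == 1].mean() + 1e-6) / (outcome[x == 0].mean() + 1e-6)
        scores[name] = abs(np.log(rr_exposure)) * abs(np.log(rr_outcome))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

The selected features would then be added to the investigator-specified covariates in, for example, a propensity score model; diagnostics (the third area above) would examine balance on both sets of covariates.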
Real-world healthcare data hold the potential to identify therapeutic solutions for progressive diseases by efficiently pinpointing safe and efficacious repurposing drug candidates. This approach circumvents key early clinical development challenges, particularly relevant for neurological diseases, concordant with the vision of the 21st Century Cures Act. However, to date, these data have been utilized mainly for confirmatory purposes rather than as drug discovery engines. Here, we demonstrate the usefulness of real-world data in identifying drug repurposing candidates for disease-modifying effects, specifically candidate marketed drugs that exhibit beneficial effects on Parkinson's disease (PD) progression. We performed an observational study in cohorts of ascertained PD patients extracted from two large medical databases, Explorys SuperMart (N = 88,867) and IBM MarketScan Research Databases (N = 106,395); and applied two conceptually different, well-established causal inference methods to estimate the effect of hundreds of drugs on delaying dementia onset as a proxy for slowing PD progression. Using this approach, we identified two drugs that manifested significant beneficial effects on PD progression in both datasets: rasagiline, narrowly indicated for PD motor symptoms; and zolpidem, a psycholeptic. Each confers its effects through distinct mechanisms, which we explored via a comparison of estimated effects within the drug classification ontology. We conclude that analysis of observational healthcare data, emulating otherwise costly, large, and lengthy clinical trials, can highlight promising repurposing candidates, to be validated in prospective registration trials, beneficial against common, late-onset progressive diseases for which disease-modifying therapeutic solutions are scarce.
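The abstract does not name the two causal inference methods used, so the sketch below shows one generic approach of this kind: inverse-probability-of-treatment weighting followed by a weighted Cox model for time to dementia onset. The column names (`treated`, `time_to_dementia`, `dementia`) and the confounder list are assumptions, not the study's actual specification.

```python
# Generic sketch of screening one candidate drug: IPW-adjusted hazard of
# dementia onset among initiators vs. non-initiators. Not the study's
# actual method; variable names and confounders are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

def estimate_drug_effect(df: pd.DataFrame, confounders: list) -> float:
    """Return the IPW-weighted log hazard ratio of drug initiation on
    time to dementia onset (a proxy for PD progression)."""
    # 1. Propensity of initiating the candidate drug given confounders.
    ps_model = LogisticRegression(max_iter=1000).fit(df[confounders], df["treated"])
    ps = ps_model.predict_proba(df[confounders])[:, 1]

    # 2. Stabilized inverse-probability-of-treatment weights.
    p_treated = df["treated"].mean()
    df = df.assign(w=df["treated"] * p_treated / ps
                     + (1 - df["treated"]) * (1 - p_treated) / (1 - ps))

    # 3. Weighted Cox model for time to dementia onset.
    cph = CoxPHFitter()
    cph.fit(df[["time_to_dementia", "dementia", "treated", "w"]],
            duration_col="time_to_dementia", event_col="dementia",
            weights_col="w", robust=True)
    return cph.params_["treated"]
```

Repeating such an estimate for hundreds of drugs, in two databases and with two different estimators, is what allows concordant, significant signals (here rasagiline and zolpidem) to stand out.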
Summary
The traditional approach to childhood obesity prevention and treatment should fit most patients, but misdiagnosis and treatment failure can occur in cases that deviate from the average, whether because of individual variation or misclassification. Here, we reflect on the contributions that high‐throughput technologies such as next‐generation sequencing, mass spectrometry–based metabolomics and microbiome analysis make towards a personalized medicine approach to childhood obesity. We hypothesize that diagnosing a child as someone with obesity captures only part of the phenotype, and that metabolomics, genomics, transcriptomics and analyses of the gut microbiome could add precision to the term “obese,” providing novel corresponding biomarkers. Identifying a cluster of –omic signatures in a given child can thus facilitate the development of personalized prognostic, diagnostic, and therapeutic approaches. It can also be applied to the monitoring of symptom/sign evolution, treatment choices and efficacy, predisposition to drug‐related side effects, and potential relapse. This article is a narrative review of the literature and a summary of the main observations, conclusions and perspectives raised during the annual meeting of the European Childhood Obesity Group. The authors discuss recent advances and future perspectives on utilizing a systems approach to understanding and managing childhood obesity in the context of existing omics data.
Motivation: The task of engineering a protein to perform a target biological function is known as protein design. A commonly used paradigm casts this functional design problem as a structural one, assuming a fixed backbone. In probabilistic protein design, positional amino acid probabilities are used to create a random library of sequences to be simultaneously screened for biological activity. Clearly, certain choices of probability distributions will be more successful in yielding functional sequences. However, since the number of sequences is exponential in protein length, computational optimization of the distribution is difficult. Results: In this paper, we develop a computational framework for probabilistic protein design following the structural paradigm. We formulate the distribution of sequences for a structure using the Boltzmann distribution over their free energies. The corresponding probabilistic graphical model is constructed, and we apply belief propagation (BP) to calculate marginal amino acid probabilities. We test this method on a large structural dataset and demonstrate the superiority of BP over previous methods. Nevertheless, since the results obtained by BP are only approximate, we thoroughly assess the paradigm using high-quality experimental data. We demonstrate that, for small-scale sub-problems, BP attains results identical to those produced by exact inference on the paradigmatic model. However, quantitative analysis shows that the predicted distributions differ significantly from the experimental data. These findings, together with the excellent performance we observed using BP on the smaller problems, suggest potential shortcomings of the paradigm. We conclude with a discussion of how it may be improved in the future. Contact: fromer@cs.huji.ac.il
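A minimal sketch of the inference step described above: loopy belief propagation on a pairwise graphical model whose potentials are Boltzmann weights of design energies, returning per-position amino-acid marginals. The toy energies, temperature, and message schedule are illustrative assumptions rather than the energy function or model used in the paper.

```python
# Sum-product belief propagation over a pairwise model p(s) ∝ exp(-E(s)/kT),
# where E(s) is a sum of per-position and pairwise design energies.
# Toy inputs; position i has an energy vector over its allowed amino acids.
import numpy as np

def bp_marginals(unary, pairwise, edges, n_iter=50, kT=1.0):
    """unary: {pos: E_i(a) array}; pairwise: {(i, j): E_ij(a, b) matrix};
    edges: list of (i, j). Returns per-position marginal distributions."""
    phi = {i: np.exp(-e / kT) for i, e in unary.items()}
    psi = {e: np.exp(-m / kT) for e, m in pairwise.items()}
    msgs = {}
    for (i, j) in edges:                      # messages in both directions
        msgs[(i, j)] = np.ones(len(phi[j]))
        msgs[(j, i)] = np.ones(len(phi[i]))
    for _ in range(n_iter):
        for (i, j) in list(msgs):
            pair = psi[(i, j)] if (i, j) in psi else psi[(j, i)].T
            incoming = phi[i].copy()          # beliefs flowing into i, except from j
            for (k, l) in msgs:
                if l == i and k != j:
                    incoming *= msgs[(k, l)]
            new = pair.T @ incoming
            msgs[(i, j)] = new / new.sum()
    marginals = {}
    for i in phi:
        b = phi[i].copy()
        for (k, l) in msgs:
            if l == i:
                b *= msgs[(k, l)]
        marginals[i] = b / b.sum()
    return marginals

# Toy 3-position chain with two "amino acids" per position:
unary = {0: np.array([0.0, 1.0]), 1: np.array([0.5, 0.5]), 2: np.array([1.0, 0.0])}
pairwise = {(0, 1): np.array([[0.0, 2.0], [2.0, 0.0]]),
            (1, 2): np.array([[0.0, 2.0], [2.0, 0.0]])}
print(bp_marginals(unary, pairwise, edges=[(0, 1), (1, 2)]))
```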
Motivation: The development of epitope-based vaccines crucially relies on the ability to classify Human Leukocyte Antigen (HLA) molecules into sets that have similar peptide binding specificities, termed supertypes. In their seminal work, Sette and Sidney defined nine HLA class I supertypes and claimed that these provide an almost perfect coverage of the entire repertoire of HLA class I molecules. HLA alleles are highly polymorphic and polygenic, and therefore experimentally classifying each of these molecules into supertypes is at present an impossible task. Recently, a number of computational methods have been proposed for this task. These methods are based on defining protein similarity measures, derived from analysis of binding peptides or from analysis of the proteins themselves. Results: In this paper we define both peptide-derived and protein-derived similarity measures, which are based on learning distance functions. The peptide-derived measure is defined using a peptide–peptide distance function, which is learned using information about known binding and non-binding peptides. The protein-derived similarity measure is defined using a protein–protein distance function, which is learned using information about alleles previously classified into supertypes by Sette and Sidney (1999). We compare the classification obtained by these two complementary methods to previously suggested classification methods. In general, our results are in excellent agreement with the classifications suggested by Sette and Sidney (1999) and with those reported by Buus et al. (2004). The key advantage of our proposed distance-based approach is that it makes use of two different and important immunological sources of information: HLA alleles and peptides that are known to bind or not bind to these alleles. Since each of our distance measures is trained using a different source of information, their combination can provide a more confident classification of alleles to supertypes. Contact: tomboy@cs.huji.ac.il; cheny@cs.huji.ac.il
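As a sketch of the final classification step implied above, the code below blends a peptide-derived and a protein-derived allele distance matrix and cuts an average-linkage dendrogram into supertype candidates. The blending weight, linkage choice, and toy data are assumptions, and the learned distance functions themselves are not reproduced here.

```python
# Combine two complementary allele-allele distance matrices and cluster
# alleles into supertype candidates. Weights, linkage, and toy data are
# assumptions for illustration only.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def supertype_clusters(d_peptide, d_protein, allele_names, n_supertypes=9, alpha=0.5):
    """Blend two symmetric allele distance matrices and cut an
    average-linkage dendrogram into `n_supertypes` clusters."""
    d = alpha * d_peptide + (1 - alpha) * d_protein
    d = (d + d.T) / 2                 # enforce symmetry
    np.fill_diagonal(d, 0.0)
    z = linkage(squareform(d), method="average")
    labels = fcluster(z, t=n_supertypes, criterion="maxclust")
    return {name: int(lab) for name, lab in zip(allele_names, labels)}

# Toy example with three alleles:
rng = np.random.default_rng(0)
d1 = rng.random((3, 3)); d1 = (d1 + d1.T) / 2; np.fill_diagonal(d1, 0)
d2 = rng.random((3, 3)); d2 = (d2 + d2.T) / 2; np.fill_diagonal(d2, 0)
print(supertype_clusters(d1, d2, ["A*0201", "A*0301", "B*0702"], n_supertypes=2))
```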