Background: The number of applications of deep learning algorithms in bioinformatics is increasing, as they usually achieve superior performance over classical approaches, especially when larger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as continuous vectors through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the iterative optimization of the model for the target prediction, a process called ‘end-to-end learning’, has led to state-of-the-art results in many fields. Although the use of embeddings is well described in the bioinformatics literature, the potential of end-to-end learning for single amino acids, as compared with more classical, manually curated encoding strategies, has not been systematically addressed. To this end, we compared classical encoding matrices, namely one-hot, VHSE8 and BLOSUM62, with end-to-end learning of amino acid embeddings for two different prediction tasks using three widely used architectures: recurrent neural networks (RNN), convolutional neural networks (CNN), and the hybrid CNN-RNN.
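The difference between a fixed encoding and an end-to-end embedding can be sketched as a lookup table from residues to vectors. This minimal Python sketch assumes the 20 standard amino acids and an arbitrary embedding dimension of 4; the function names are hypothetical, and the training step itself (updating the table by backpropagation inside a deep learning framework) is deliberately left out.

```python
import random

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_table():
    """Classical fixed encoding: each residue maps to a 20-dimensional unit vector."""
    table = {}
    for i, aa in enumerate(AMINO_ACIDS):
        vec = [0.0] * len(AMINO_ACIDS)
        vec[i] = 1.0
        table[aa] = vec
    return table

def init_embedding_table(dim, seed=0):
    """Trainable encoding: one randomly initialised dim-dimensional vector per residue.

    In end-to-end learning these values would be updated by backpropagation
    together with the rest of the network; here they are only initialised.
    """
    rng = random.Random(seed)
    return {aa: [rng.gauss(0.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}

def encode(sequence, table):
    """Look up each residue, giving a len(sequence) x dim matrix (list of rows)."""
    return [table[aa] for aa in sequence]

peptide = "SIINFEKL"
x_onehot = encode(peptide, one_hot_table())           # 8 x 20
x_learned = encode(peptide, init_embedding_table(4))  # 8 x 4
print(len(x_onehot), len(x_onehot[0]))    # 8 20
print(len(x_learned), len(x_learned[0]))  # 8 4
```

Both schemes feed the network a matrix of the same shape per residue; the only difference is whether the table is fixed in advance or learned from the prediction task.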
Results: By using different deep learning architectures, we show that end-to-end learning is on par with classical encodings for embeddings of the same dimension even when limited training data is available, and might allow for a reduction in the embedding dimension without performance loss, which is critical when deploying the models to devices with limited computational capacities. We found that the embedding dimension is a major factor in controlling the model performance. Surprisingly, we observed that deep learning models are capable of learning from random vectors of appropriate dimension.
Conclusion: Our study shows that end-to-end learning is a flexible and powerful method for amino acid encoding. Further, due to the flexibility of deep learning systems, amino acid encoding schemes should be benchmarked against random vectors of the same dimension to disentangle the information content provided by the encoding scheme from the distinguishability effect provided by the scheme.
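The recommended random-vector baseline can be sketched as follows: for each classical scheme, draw one random vector per amino acid with the same dimension as that scheme, and check that all residues remain distinguishable. This is a hypothetical Python sketch, not the study's code; it assumes VHSE8 has 8 dimensions and that BLOSUM62 is used as a 20-dimensional row per standard amino acid.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_baseline(dim, seed=0):
    """Random encoding with the same dimension as the scheme under test.

    It carries no physicochemical information, only distinguishability.
    """
    rng = random.Random(seed)
    return {aa: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}

def is_distinguishable(table):
    """True if no two residues share the same vector."""
    vectors = [tuple(v) for v in table.values()]
    return len(set(vectors)) == len(vectors)

# Assumed dimensions of the classical schemes named in the abstract.
SCHEME_DIMS = {"one-hot": 20, "VHSE8": 8, "BLOSUM62": 20}

baselines = {
    name: random_baseline(dim, seed=i)
    for i, (name, dim) in enumerate(SCHEME_DIMS.items())
}
for name, table in baselines.items():
    print(name, len(table["A"]), is_distinguishable(table))
```

A model trained on each random baseline would then be compared with the same model trained on the corresponding scheme; performance that the random baseline also reaches is attributable to distinguishability rather than to the information encoded by the scheme.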
Genotype imputation of the human leukocyte antigen (HLA) region is a cost-effective means to infer classical HLA alleles from inexpensive and dense SNP array data. In the research setting, imputation helps avoid costs for wet lab-based HLA typing and thus renders association analyses of the HLA in large cohorts feasible. Yet, most HLA imputation reference panels target Caucasian ethnicities and multi-ethnic panels are scarce. We compiled a high-quality multi-ethnic reference panel based on genotypes measured with Illumina's Immunochip genotyping array and HLA types established using a high-resolution next-generation sequencing approach. Our reference panel includes more than 1,300 samples from Germany, Malta, China, India, Iran, Japan and Korea and samples of African American ancestry for all classical HLA class I and II alleles including HLA-DRB3/4/5. Applying extensive cross-validation, we benchmarked the imputation using the HLA imputation tool HIBAG, our multi-ethnic reference and an independent, previously published data set comprising subpopulations of the 1000 Genomes project. We achieved average imputation accuracies higher than 0.924 for the commonly studied HLA-A, -B, -C, -DQB1 and -DRB1 genes across all ethnicities. We investigated allele-specific imputation challenges with regard to the geographic origin of the samples using sensitivity and specificity measurements as well as allele frequencies, and identified HLA alleles that are challenging to impute for each of the populations separately. In conclusion, our new multi-ethnic reference data set allows for high-resolution HLA imputation of genotypes at all classical HLA class I and II genes, including the HLA-DRB3/4/5 loci, based on populations of diverse ancestry.
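The accuracy, sensitivity and specificity figures above can be illustrated with a small sketch that compares imputed and typed genotypes as unordered allele pairs. This is one common way of counting allele copies, assumed here for illustration; it is not necessarily HIBAG's exact definition, and the allele names and genotypes in the example are hypothetical.

```python
from collections import Counter

def allele_concordance(true_pair, imputed_pair):
    """Number of allele copies (0, 1 or 2) shared by two unordered genotypes."""
    overlap = Counter(true_pair) & Counter(imputed_pair)
    return sum(overlap.values())

def imputation_accuracy(truth, imputed):
    """Fraction of allele copies imputed correctly across all samples."""
    matched = sum(allele_concordance(t, i) for t, i in zip(truth, imputed))
    return matched / (2 * len(truth))

def sensitivity_specificity(truth, imputed, allele):
    """Per-allele sensitivity and specificity, treating each allele copy as a call."""
    tp = fn = fp = tn = 0
    for t, i in zip(truth, imputed):
        t_count, i_count = t.count(allele), i.count(allele)
        tp += min(t_count, i_count)
        fn += max(t_count - i_count, 0)
        fp += max(i_count - t_count, 0)
        tn += 2 - max(t_count, i_count)  # copies that are neither true nor imputed as `allele`
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Hypothetical typed (truth) and imputed HLA-A genotypes for three samples.
truth = [("A*01:01", "A*02:01"), ("A*02:01", "A*02:01"), ("A*03:01", "A*24:02")]
imputed = [("A*02:01", "A*01:01"), ("A*02:01", "A*11:01"), ("A*03:01", "A*24:02")]

print(imputation_accuracy(truth, imputed))
print(sensitivity_specificity(truth, imputed, "A*02:01"))
```

Computing sensitivity and specificity per allele and per population, rather than only an overall accuracy, is what makes it possible to flag individual alleles that are hard to impute in a given ancestry group.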
The human leukocyte antigen (HLA) proteins play a fundamental role in the adaptive immune system as they present peptides to T cells. Mass-spectrometry-based immunopeptidomics is a promising and powerful tool for characterizing the immunopeptidomic landscape of HLA proteins, that is, the peptides presented on HLA proteins. Despite the growing interest in the technology and the recent rise of immunopeptidomics-specific identification pipelines, there is still a gap in data-analysis and software tools specialized in analyzing and visualizing immunopeptidomics data. We present the IPTK library, an open-source Python-based library for analyzing, visualizing, comparing, and integrating different omics layers with the identified peptides for an in-depth characterization of the immunopeptidome. Using different datasets, we illustrate the ability of the library to enrich the results of immunopeptidome identification. We also demonstrate the utility of the library for building other software and tools by developing an easy-to-use dashboard for the interactive analysis of the results. IPTK provides a modular and extendable framework for analyzing and integrating immunopeptidomes with different omics layers. The library is deployed to PyPI at https://pypi.org/project/IPTKL/ and to Bioconda at https://anaconda.org/bioconda/iptkl, while the source code of the library and the dashboard, along with the online tutorials, are available at https://github.com/ikmb/iptoolkit.
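A typical integration step in this kind of analysis is mapping identified peptides back onto their source protein and computing per-residue coverage. The sketch below is plain Python and does not use IPTK; the function names are hypothetical and do not reflect IPTK's API. It assumes exact substring matching, with no handling of modifications or isoforms, and uses a toy protein sequence.

```python
def map_peptides(protein, peptides):
    """Return (peptide, start, end) for every exact match; 0-based, end exclusive."""
    hits = []
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            hits.append((pep, start, start + len(pep)))
            start = protein.find(pep, start + 1)
    return hits

def coverage_profile(protein, peptides):
    """Number of identified peptides overlapping each residue of the protein."""
    profile = [0] * len(protein)
    for _, start, end in map_peptides(protein, peptides):
        for pos in range(start, end):
            profile[pos] += 1
    return profile

protein = "MKTAYIAKQRQISFVKSHFSRQ"        # hypothetical toy sequence
peptides = ["TAYIAKQRQ", "AKQRQISF"]      # hypothetical identified peptides
profile = coverage_profile(protein, peptides)
fraction_covered = sum(1 for c in profile if c > 0) / len(protein)
print(profile)
print(round(fraction_covered, 3))
```

A per-residue profile like this is a convenient shape for joining with other omics layers, for example by attaching expression or annotation values to the same protein positions.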
Human Leukocyte Antigen class II (HLA-II) molecules present peptides to T lymphocytes and play an important role in adaptive immune responses. Characterizing the binding specificity of single HLA-II molecules has profound implications for understanding cellular immunity, identifying the cause of autoimmune diseases, immunotherapeutics, and vaccine development. Here, a novel high-density peptide microarray technology combined with machine learning techniques was used to address this task at an unprecedented level of throughput. Microarrays with over 200,000 defined peptides were assayed with four exemplary HLA-II molecules, and machine learning was applied to mine the signals. The identified binding motifs and their power for predicting eluted ligand and CD4+ epitope datasets were compared with those obtained using NetMHCIIpan-3.2, confirming a high quality of the chip readout. These results suggest that the proposed microarray technology offers a novel and unique platform for large-scale, unbiased interrogation of the peptide binding preferences of HLA-II molecules.
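The abstract does not say which machine learning method was used to mine the array signals. As a generic illustration of what a "binding motif" is, the sketch below builds a position-specific log-odds matrix from 9-mer binding cores and uses it to score new cores. It assumes the cores have already been extracted from the longer array peptides (a step that in practice needs an alignment method), and the binder sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_odds_matrix(cores, pseudocount=1.0):
    """Position-specific log2-odds scores against a uniform amino acid background."""
    length = len(cores[0])
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for core in cores:
            counts[core[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return matrix

def score(core, matrix):
    """Sum of per-position log-odds; higher means closer to the learned motif."""
    return sum(column[aa] for column, aa in zip(matrix, core))

# Hypothetical 9-mer binding cores from high-signal microarray peptides.
binders = ["FVKLNAAAA", "YVKLSAQAA", "FIKLDATAA", "WVRLNAEAA"]
pssm = log_odds_matrix(binders)
print(score("FVKLNAAAA", pssm) > score("DDDDDDDDD", pssm))  # True
```

The pseudocount keeps unseen residues from receiving a score of minus infinity, which matters when only a few binders are available per position.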
During the peak of hospitalizations of patients with severe Covid-19 in Italy and Spain in March, a group of researchers in these and other countries obtained and analyzed samples, resulting in the identification of two chromosomal loci associated with the disorder.
Here we report a case where the manifestations of insulin-dependent diabetes occurred following SARS-CoV-2 infection in a young individual in the absence of autoantibodies typical for type 1 diabetes mellitus. Specifically, a 19-year-old white male presented at our emergency department with diabetic ketoacidosis, C-peptide level of 0.62 µg l⁻¹, blood glucose concentration of 30.6 mmol l⁻¹ (552 mg dl⁻¹) and haemoglobin A1c of 16.8%. The patient's case history revealed probable COVID-19 infection 5-7 weeks before admission, based on a positive test for antibodies against SARS-CoV-2 proteins as determined by enzyme-linked immunosorbent assay. Interestingly, the patient carried a human leukocyte antigen genotype (HLA DR1-DR3-DQ2) considered to provide only a slightly elevated risk of developing autoimmune type 1 diabetes mellitus. However, as noted, no serum autoantibodies were observed against islet cells, glutamic acid decarboxylase, tyrosine phosphatase, insulin and zinc-transporter 8. Although our report cannot fully establish causality between COVID-19 and the development of diabetes in this patient, considering that SARS-CoV-2 entry receptors, including angiotensin-converting enzyme 2, are expressed on pancreatic β-cells and, given the circumstances of this case, we suggest that SARS-CoV-2 infection, or COVID-19, might negatively affect pancreatic function, perhaps through direct cytolytic effects of the virus on β-cells.
Genetic predisposition has been identified as a cause of cancer, yet little is known about the role of adult cancer predisposition syndromes in childhood cancer. We examined the extent to which heterozygous pathogenic germline variants in BRCA1, BRCA2, PALB2, ATM, CHEK2, MSH2, MSH6, MLH1, and PMS2 contribute to cancer risk in children and adolescents.
We conducted a meta-analysis of 11 studies that incorporated comprehensive germline testing for children and adolescents with cancer. ClinVar pathogenic or likely pathogenic variants (PVs) in genes of interest were compared with 2 control groups. Results were validated in a cohort of mainly European patients and controls. We employed the Proxy External Controls Association Test to account for differences between the sequencing and variant-calling pipelines used for cases and controls.
Among 3975 children and adolescents with cancer, statistically significant associations with cancer risk were observed for PVs in BRCA1 and 2 (26 PVs vs 63 PVs among 27 501 controls, odds ratio = 2.78, 95% confidence interval = 1.69 to 4.45; P < .001) and mismatch repair genes (19 PVs vs 14 PVs among 27 501 controls, odds ratio = 7.33, 95% confidence interval = 3.64 to 14.82; P < .001). Associations were seen in brain and other solid tumors but not in hematologic neoplasms. We confirmed similar findings in 1664 pediatric cancer patients primarily of European descent.
These data suggest that heterozygous PVs in BRCA1 and 2 and mismatch repair genes contribute to cancer risk in children and adolescents, albeit with reduced penetrance. No changes to predictive genetic testing and surveillance recommendations are required.
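For orientation, a crude (unadjusted) odds ratio can be computed directly from the counts above with a Wald 95% confidence interval. This sketch assumes that each PV corresponds to a distinct carrier and ignores the study's adjustment for differing pipelines, so its estimates (roughly 2.87 for BRCA1/2 and 9.43 for mismatch repair genes) do not reproduce the reported 2.78 and 7.33; the reported values come from the study's own analysis.

```python
import math

def crude_odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Unadjusted odds ratio with a Wald 95% CI from a 2x2 carrier table."""
    a, b = case_carriers, n_cases - case_carriers            # cases: carriers, non-carriers
    c, d = control_carriers, n_controls - control_carriers   # controls: carriers, non-carriers
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper

brca = crude_odds_ratio(26, 3975, 63, 27501)  # BRCA1 and 2
mmr = crude_odds_ratio(19, 3975, 14, 27501)   # mismatch repair genes
print(brca)
print(mmr)
```

The gap between these crude values and the published estimates illustrates why a method that corrects for differences between case and external-control pipelines is needed.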
Natural killer (NK) cells are innate immune cells that contribute to host defense against virus infections. NK cells respond to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in vitro and are activated in patients with acute coronavirus disease 2019 (COVID-19). However, the mechanisms by which NK cells detect SARS-CoV-2-infected cells remain largely unknown. Here, we show that the Non-structural protein 13 of SARS-CoV-2 encodes a peptide that is presented by human leukocyte antigen E (HLA-E). In contrast with self-peptides, the viral peptide prevents binding of HLA-E to the inhibitory receptor NKG2A, thereby rendering target cells susceptible to NK cell attack. In line with these observations, NKG2A-expressing NK cells are particularly activated in patients with COVID-19 and proficiently limit SARS-CoV-2 replication in infected lung epithelial cells in vitro. Thus, these data suggest that a viral peptide presented by HLA-E abrogates inhibition of NKG2A+ NK cells, resulting in missing self-recognition.
• SARS-CoV-2 Non-structural protein 13 encodes an HLA-E-restricted peptide
• HLA-E/Nsp13₂₃₂₋₂₄₀ complexes do not bind to the inhibitory receptor NKG2A
• Nsp13₂₃₂₋₂₄₀ allows for NKG2A+ NK cell activation by missing self-recognition
• NKG2A+ NK cells proficiently restrict SARS-CoV-2 replication in vitro
Natural killer (NK) cells eliminate virus-infected cells. Hammer et al. show that SARS-CoV-2 encodes a peptide that, when presented by HLA-E, prevents binding to an inhibitory receptor of NK cells, thereby facilitating NK cell activation. This missing self-recognition could enable NK cells to detect and kill SARS-CoV-2-infected cells.
Objective: One of the current hypotheses to explain the proinflammatory immune response in IBD is a dysregulated T cell reaction to as yet unknown intestinal antigens. It may therefore be possible to identify disease-associated T cell clonotypes by analysing the peripheral and intestinal T cell receptor (TCR) repertoires of patients with IBD and controls.
Design: We performed bulk TCR repertoire profiling of both the TCR alpha and beta chains using high-throughput sequencing of peripheral blood samples from a total of 244 patients with IBD and healthy controls, as well as of matched blood and intestinal tissue from 59 patients with IBD and disease controls. We further characterised specific T cell clonotypes via single-cell RNA-seq.
Results: We identified a group of clonotypes, characterised by semi-invariant TCR alpha chains, that was significantly enriched in the blood of patients with Crohn’s disease (CD) and particularly expanded in the CD8+ T cell population. Single-cell RNA-seq data showed an innate-like phenotype of these cells, with gene expression comparable to that of unconventional T cells such as mucosal-associated invariant T (MAIT) and natural killer T (NKT) cells, but with distinct TCRs.
Conclusions: We identified and characterised a subpopulation of unconventional Crohn-associated invariant T (CAIT) cells. Multiple lines of evidence suggest that these cells are part of the type II NKT cell population. The potential implications of this population for CD, or a subset thereof, remain to be elucidated, and the immunophenotype and antigen reactivity of CAIT cells require further investigation in future studies.
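The abstract does not state which statistical test was used to call clonotypes enriched in CD. A generic way to test whether a clonotype occurs in more patients than controls is a one-sided Fisher's exact test on presence/absence counts, sketched below in plain Python. Representing repertoires as sets of clonotype identifiers is an assumption, and the sample data are hypothetical; a real screen over many clonotypes would also need multiple-testing correction.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact P for enrichment in the first row of [[a, b], [c, d]].

    a: patients carrying the clonotype, b: patients without it,
    c: controls carrying it,            d: controls without it.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    return sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    ) / denom

def clonotype_enrichment(patient_repertoires, control_repertoires, clonotype):
    """Test whether `clonotype` is present in more patient than control repertoires."""
    a = sum(clonotype in rep for rep in patient_repertoires)
    c = sum(clonotype in rep for rep in control_repertoires)
    return fisher_greater(
        a, len(patient_repertoires) - a, c, len(control_repertoires) - c
    )

# Hypothetical repertoires, each a set of clonotype identifiers.
patients = [{"cA", "cB"}, {"cA"}, {"cA", "cC"}]
controls = [{"cB"}, {"cC"}, {"cB", "cC"}]
print(clonotype_enrichment(patients, controls, "cA"))
```

Testing presence/absence per individual, rather than pooled read counts, keeps a single donor with a large clonal expansion from dominating the result.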
Reply to Li and Colleagues. Kratz, Christian P; Smirnov, Dmitrii; Autry, Robert ...
JNCI: Journal of the National Cancer Institute, 06/2023, Volume 115, Issue 6. Journal Article