Methylated DNA immunoprecipitation (MeDIP) is a popular enrichment-based method and can be combined with sequencing (termed MeDIP-seq) to interrogate the methylation status of cytosines across entire genomes. However, quality control and analysis of MeDIP-seq data have remained a challenge.
We report genome-wide DNA methylation profiles of wild type (wt) and mutant mouse cells, comprising 3 biological replicates of Thymine DNA glycosylase (Tdg) knockout (KO) embryonic stem cells (ESCs), in vitro differentiated neural precursor cells (NPCs) and embryonic fibroblasts (MEFs). The resulting 18 methylomes were analysed with MeDUSA (Methylated DNA Utility for Sequence Analysis), a novel MeDIP-seq computational analysis pipeline for the identification of differentially methylated regions (DMRs). The observed increase of hypermethylation in MEF promoter-associated CpG islands supports a previously proposed role for Tdg in the protection of regulatory regions from epigenetic silencing. Further analysis of genes and regions associated with the DMRs by gene ontology, pathway, and ChIP analyses revealed further insights into Tdg function, including an association of TDG with low-methylated distal regulatory regions.
We demonstrate that MeDUSA is able to detect both large-scale changes between cells from different stages of differentiation and also small but significant changes between the methylomes of cells that only differ in the KO of a single gene. These changes were validated utilising publicly available datasets and confirm TDG's function in the protection of regulatory regions from epigenetic silencing.
Regulatory change has long been hypothesized to drive the delineation of the human phenotype from other closely related primates. Here we provide evidence that CpG dinucleotides play a special role in this process. CpGs enable epigenome variability via DNA methylation, and this epigenetic mark functions as a regulatory mechanism. Therefore, species-specific CpGs may influence species-specific regulation. We report non-polymorphic species-specific CpG dinucleotides (termed "CpG beacons") as a distinct genomic feature associated with CpG island (CGI) evolution, human traits and disease. Using an inter-primate comparison, we identified 21 extreme CpG beacon clusters (≥ 20/kb peaks, empirical p < 1.0 × 10⁻³) in humans, which include associations with four monogenic developmental and neurological disease related genes (Benjamini-Hochberg corrected p = 6.03 × 10⁻³). We also demonstrate that beacon-mediated CpG density gain in CGIs correlates with reduced methylation in these species in orthologous CGIs over time, via human, chimpanzee and macaque MeDIP-seq. Therefore, mapping into both the genomic and epigenomic space, the identified CpG beacon clusters define points of intersection where a substantial two-way interaction between genetic sequence and epigenetic state has occurred. Taken together, our data support a model for CpG beacons to contribute to CGI evolution from genesis to tissue-specific to constitutively active CGIs.
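The Benjamini-Hochberg correction mentioned above can be illustrated with a minimal sketch. The function name and the example p-values are hypothetical and are not taken from the study; the code only shows the standard step-up adjustment, not the authors' actual analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four tests.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone.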
Stem cells have been found in most tissues/organs. These somatic stem cells produce replacements for lost and damaged cells, and it is not completely understood how this regenerative capacity becomes diminished during aging. To study the possible involvement of epigenetic changes in somatic stem cell aging, we used murine hematopoiesis as a model system. Hematopoietic stem cells (HSCs) were enriched via Hoechst exclusion activity (SP-HSC) from young, medium-aged and old mice and subjected to comprehensive, global methylome (MeDIP-seq) analysis. With age, we observed a global loss of DNA methylation of approximately 5%, but an increase in methylation at some CpG islands. Just over 100 significant (FDR < 0.2) aging-specific differentially methylated regions (aDMRs) were identified, which are surprisingly few considering the profound age-based changes that occur in HSC biology. Interestingly, the polycomb repressive complex 2 (PCRC2) target genes Kiss1r, Nav2 and Hsf4 were hypermethylated with age. The promoter for the Sdpr gene was determined to be progressively hypomethylated with age. This occurred concurrently with an increase in gene expression with age. To explore this relationship further, we cultured isolated SP-HSC in the presence of 5-aza-deoxycytidine and demonstrated a negative correlation between Sdpr promoter methylation and gene expression. We report that DNA methylation patterns are well preserved during hematopoietic stem cell aging, confirm that PCRC2 targets are increasingly methylated with age, and suggest that SDPR expression changes with age in HSCs may be regulated via age-based alterations in DNA methylation.
A substantial proportion of cancer cases present with a metastatic tumor and require further testing to determine the primary site; many of these are never fully diagnosed and remain cancer of unknown primary origin (CUP). It has been previously demonstrated that the somatic point mutations detected in a tumor can be used to identify its site of origin with limited accuracy. We hypothesized that higher accuracy could be achieved by a classification algorithm based on the following feature sets: 1) the number of nonsynonymous point mutations in a set of 232 specific cancer-associated genes, 2) frequencies of the 96 classes of single-nucleotide substitution determined by the flanking bases, and 3) copy number profiles, if available.
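The 96 substitution classes in the second feature set arise from six pyrimidine-referenced substitution types combined with the 16 possible pairs of flanking bases. The sketch below enumerates them and turns a list of mutations into class frequencies. The label format, function names and input tuple layout are assumptions for illustration; the abstract does not describe how the authors encoded these features.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_classes():
    """Enumerate the 96 classes: 6 substitution types x 16 flanking contexts."""
    classes = []
    for ref in "CT":  # by convention the reference base is a pyrimidine
        for alt in BASES:
            if alt == ref:
                continue
            for five, three in product(BASES, repeat=2):
                classes.append(f"{five}[{ref}>{alt}]{three}")
    return classes


def classify(five, ref, alt, three):
    """Map one substitution with its flanking bases to its class label."""
    if ref in "AG":
        # Purine reference: describe the event on the opposite strand,
        # which swaps and complements the flanking bases.
        five, ref, alt, three = (
            COMPLEMENT[three], COMPLEMENT[ref], COMPLEMENT[alt], COMPLEMENT[five]
        )
    return f"{five}[{ref}>{alt}]{three}"


def class_frequencies(mutations):
    """Return the frequency of each of the 96 classes for one tumor."""
    counts = {label: 0 for label in substitution_classes()}
    for five, ref, alt, three in mutations:
        counts[classify(five, ref, alt, three)] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


if __name__ == "__main__":
    # Hypothetical mutations given as (5' base, ref, alt, 3' base).
    freqs = class_frequencies([("A", "G", "A", "T"), ("A", "C", "T", "T")])
    print(freqs["A[C>T]T"])
```

A vector of this kind, one per tumor, could then be passed to a classifier alongside the gene-level mutation counts.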
We used publicly available somatic mutation data from the COSMIC database to train random forest classifiers to distinguish among those tissues of origin for which sufficient data was available. We selected feature sets using cross-validation and then derived two final classifiers (with or without copy number profiles) using 80 % of the available tumors. We evaluated the accuracy using the remaining 20 %. For further validation, we assessed accuracy of the without-copy-number classifier on three independent data sets: 1669 newly available public tumors of various types, a cohort of 91 breast metastases, and a set of 24 specimens from 9 lung cancer patients subjected to multiregion sequencing.
The cross-validation accuracy was highest when all three types of information were used. On the left-out COSMIC data not used for training, we achieved a classification accuracy of 85 % across 6 primary sites (with copy numbers), and 69 % across 10 primary sites (without copy numbers). Importantly, a derived confidence score could distinguish tumors that could be identified with 95 % accuracy (32 %/75 % of tumors with/without copy numbers) from those that were less certain. Accuracy in the independent data sets was 46 %, 53 % and 89 % respectively, similar to the accuracy expected from the training data.
Identification of primary site from point mutation and/or copy number data may be accurate enough to aid clinical diagnosis of cancers of unknown primary origin.
The use of tumour xenografts is a well-established research tool in cancer genomics but has not yet been comprehensively evaluated for cancer epigenomics.
In this study, we assessed the suitability of patient-derived tumour xenografts (PDXs) for methylome analysis using Infinium 450 K Beadchips and MeDIP-seq.
Controlled for confounding host (mouse) sequences, comparison of primary PDXs and matching patient tumours in a rare (osteosarcoma) and common (colon) cancer revealed that an average 2.7% of the assayed CpG sites undergo major (Δβ ≥ 0.51) methylation changes in a cancer-specific manner as a result of the xenografting procedure. No significant subsequent methylation changes were observed after a second round of xenografting between primary and secondary PDXs. Based on computational simulation using publicly available methylation data, we additionally show that future studies comparing two groups of PDXs should use 15 or more samples in each group to minimise the impact of xenografting-associated changes in methylation on comparison results.
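The Δβ criterion above can be sketched as a simple per-site comparison of beta values between a tumour and its matched PDX. The function name and the example values are hypothetical, and treating Δβ as an absolute difference is an assumption; the abstract does not state whether the threshold was applied to signed or absolute changes.

```python
def major_change_fraction(tumour_beta, pdx_beta, threshold=0.51):
    """Fraction of CpG sites whose beta value differs by at least `threshold`.

    Assumes both lists hold beta values (0 to 1) for the same CpG sites
    in the same order, and that the change is measured as an absolute difference.
    """
    if len(tumour_beta) != len(pdx_beta):
        raise ValueError("beta value lists must cover the same CpG sites")
    if not tumour_beta:
        raise ValueError("at least one CpG site is required")
    changed = sum(
        1 for t, p in zip(tumour_beta, pdx_beta) if abs(p - t) >= threshold
    )
    return changed / len(tumour_beta)


if __name__ == "__main__":
    # Hypothetical beta values for four CpG sites.
    tumour = [0.1, 0.9, 0.5, 0.2]
    pdx = [0.7, 0.3, 0.55, 0.2]
    print(major_change_fraction(tumour, pdx))
```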
Our results from rare and common cancers indicate that PDXs are a suitable discovery tool for cancer epigenomics and we provide guidance on how to overcome the observed limitations.
Use of circulating tumour DNA (ctDNA) as a liquid biopsy has been proposed for potential identification and monitoring of solid tumours. We investigate a next-generation sequencing approach for mutation detection in ctDNA in two related studies using a targeted panel. The first study was retrospective, using blood samples taken from melanoma patients at diverse timepoints before or after treatment, aiming to evaluate correlation between mutations identified in biopsy and ctDNA, and to acquire a first impression of influencing factors. We found good concordance between ctDNA and tumour mutations of melanoma patients when blood samples were collected within one year of biopsy or before treatment. In contrast, when ctDNA was sequenced after targeted treatment in melanoma, mutations were no longer found in 9 out of 10 patients, suggesting the method might be useful for detecting treatment response. Building on these findings, we focused the second study on ctDNA obtained before biopsy in lung patients, i.e. when a tentative diagnosis of lung cancer had been made, but no treatment had started. The main objective of this prospective study was to evaluate use of ctDNA in diagnosis, investigating the concordance of biopsy and ctDNA-derived mutation detection. Here we also found positive correlation between diagnostic lung biopsy results and pre-biopsy ctDNA sequencing, providing support for using ctDNA as a cost-effective, non-invasive solution when the tumour is inaccessible or when biopsy poses significant risk to the patient.
Background: A significant challenge within the field of personalized neoantigen therapies is the determination of which neoantigen targets will elicit durable, therapeutically relevant immune responses. T cell responses can be detected for circa 10–20% of neoepitopes selected for use in vaccines. Screening of memory responses in tumor-infiltrating T cells shows much lower rates of 1–2%. Of this small percentage of neoantigens capable of driving an immune response, only a subset will be resistant to methods of tumor immune evasion. Therefore, it is paramount that both these challenges are faced in order to obtain a durable clinical response. Across different types of neoantigens, the relationship between clonal neoantigens and response to immunotherapy has previously been demonstrated across multiple indications, supporting the key role of clonal neoantigens as substrate for T cell recognition of tumors.
Methods: Achilles Therapeutics aims to deliver precision immunotherapies specifically targeting clonal neoantigens identified through the Achilles Clonality Engine methodology within our PELEUS™ bioinformatics platform. The PELEUS™ platform incorporates a Bayesian approach allowing for the determination of the probability of each potential neoantigen being clonal. In addition to clonality, and to improve our ability to select for immunogenic neoantigens, we have developed an extensive pipeline for identification of tumor-derived memory T cell responses to clonal neoantigens.
Results: Through the use of data obtained by screening circa 10,000 neoantigens for T cell reactivity in expanded tumor-infiltrating lymphocytes, we developed and validated an AI method, NeoRanker, for predicting neoantigen immunogenicity. Using a small set of features incorporating genomic, transcriptomic and proteomic data for training purposes, NeoRanker is able to preferentially enrich our clonal neoantigen list for those capable of driving either CD8+ or CD4+ T cell responses. When benchmarked against well-known tools in the field including BigMHC and Prime, NeoRanker displayed the best performance as measured by the area under the receiver operating characteristic curve.
Conclusions: We believe this technology has broad applicability for optimising target selection across all types of personalized neoantigen vaccines and cell therapies.
Trial Registration: NCT03997474; NCT04032847; NCT03517917
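The benchmarking metric named above, the area under the receiver operating characteristic curve, can be computed as the probability that a randomly chosen reactive neoantigen is scored higher than a randomly chosen non-reactive one. The sketch below uses that pairwise definition; the function name, labels and scores are hypothetical and do not come from NeoRanker or the benchmark data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 1 for a positive (e.g. reactive) item and 0 otherwise;
    `scores` holds the predicted score for each item. Ties count as half.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical labels and predicted scores for four candidates.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

The pairwise form is quadratic in the number of items, which is acceptable for a small illustration; rank-based formulas give the same value more efficiently on large sets.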
Common human diseases are caused by the complex interplay of genetic susceptibility and environmental factors. Due to the environment's influence on the epigenome, and therefore genome function, as well as conversely the genome's facilitative effect on the epigenome, analysis of this level of regulation may increase our knowledge of disease pathogenesis.
In order to identify human-specific epigenetic influences, we have performed a novel genome-wide DNA methylation analysis comparing human, chimpanzee and rhesus macaque.
We have identified that the immunological Leukotriene B4 receptor (LTB4R, BLT1 receptor) is the most epigenetically divergent human gene in peripheral blood in comparison with other primates. This difference is due to the co-ordinated active state of human-specific hypomethylation in the promoter and human-specific increased gene body methylation. This gene is significant in innate immunity and the LTB4/LTB4R pathway is involved in the pathogenesis of the spectrum of human inflammatory diseases. This finding was confirmed by additional neutrophil-only DNA methylome and lymphoblastoid H3K4me3 chromatin comparative data. Additionally we show through functional analysis that this receptor has increased expression and a higher response to the LTB4 ligand in human versus rhesus macaque peripheral blood mononuclear cells. Genome-wide we also find human species-specific differentially methylated regions (human s-DMRs) are more prevalent in CpG island shores than within the islands themselves, and within the latter are associated with the CTCF motif.
This result further emphasises the exclusive nature of the human immunological system, its divergent adaptation even from very closely related primates, and the power of comparative epigenomics to identify and understand human uniqueness.
One in six cancers worldwide is caused by infection and human papillomavirus (HPV) is one of the main culprits. To better understand the dynamics of HPV integration and its effect on both the viral and host methylomes, we conducted whole-genome DNA methylation analysis using MeDIP-seq of HPV+ and HPV- head and neck squamous cell carcinoma (HNSCC). We determined the viral subtype to be HPV-16 in all cases and show that HPV-16 integrates into the host genome at multiple random sites and that this process predominantly involves the transcriptional repressor gene (E2) in the viral genome. Comparative analysis identified 453 (FDR ≤ 0.01) differentially methylated regions (DMRs) in the HPV+ host methylome. Bioinformatics characterization of these DMRs confirmed the previously reported cadherin genes to be affected but also revealed new targets for HPV-mediated methylation changes at regions not covered by array-based platforms, including the recently identified super-enhancers.
Lineage-specific, or taxonomically restricted genes (TRGs), especially those that are species and strain-specific, are of special interest because they are expected to play a role in defining exclusive ecological adaptations to particular niches. Despite this, they are relatively poorly studied and little understood, in large part because many are still orphans or only have homologues in very closely related isolates. This lack of homology confounds attempts to establish the likelihood that a hypothetical gene is expressed and, if so, to determine the putative function of the protein.
We have developed "QIPP" ("Quality Index for Predicted Proteins"), an index that scores the "quality" of a protein based on non-homology-based criteria. QIPP can be used to assign a value between zero and one to any protein based on comparing its features to other proteins in a given genome. We have used QIPP to rank the predicted proteins in the proteomes of Bacteria and Archaea. This ranking reveals that there is a large amount of variation in QIPP scores, and identifies many high-scoring orphans as potentially "authentic" (expressed) orphans. There are significant differences in the distributions of QIPP scores between orphan and non-orphan genes for many genomes and a trend for less well-conserved genes to have lower QIPP scores.
The implication of this work is that QIPP scores can be used to further annotate predicted proteins with information that is independent of homology. Such information can be used to prioritize candidates for further analysis. Data generated for this study can be found in the OrphanMine at http://www.genomics.ceh.ac.uk/orphan_mine.