The NHGRI-EBI GWAS Catalog has provided data from published genome-wide association studies since 2008. In 2015, the database was redesigned and relocated to EMBL-EBI. The new infrastructure includes a new graphical user interface (www.ebi.ac.uk/gwas/), ontology supported search functionality and an improved curation interface. These developments have improved the data release frequency by increasing automation of curation and providing scaling improvements. The range of available Catalog data has also been extended with structured ancestry and recruitment information added for all studies. The infrastructure improvements also support scaling for larger arrays, exome and sequencing studies, allowing the Catalog to adapt to evolving study design, genotyping technologies and user needs in the future.
The accurate description of ancestry is essential to interpret, access, and integrate human genomics data, and to ensure that these benefit individuals from all ancestral backgrounds. However, there are no established guidelines for the representation of ancestry information. Here we describe a framework for the accurate and standardized description of sample ancestry, and validate it by application to the NHGRI-EBI GWAS Catalog. We confirm known biases and gaps in diversity, and find that African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations. It is our hope that widespread adoption of this framework will lead to improved analysis, interpretation, and integration of human genomics data.
DNA methylation is a critical epigenetic modification that is established and maintained across the genome by DNA methyltransferase enzymes (Dnmts). Altered patterns of DNA methylation are a frequent occurrence in many tumor genomes, and inhibitors of Dnmts have become important epigenetic drugs. Azacitidine is a cytidine analog that is incorporated into DNA and induces the specific inhibition and proteasomal-mediated degradation of Dnmts. The downstream effects of azacitidine on CpG methylation and on gene transcription have been widely studied in many systems, but how azacitidine impacts the proteome is not well-understood. In addition, with its specific ability to induce the rapid degradation of Dnmts (in particular, the primary maintenance DNA methyltransferase, Dnmt1), it may be employed as a specific chemical knockdown for investigating the Dnmt1-associated functional or physical interactome. In this study, we use quantitative proteomics to analyze the degradation profile of proteins in the nuclear proteome of cells treated with azacitidine. We identify specific proteins as well as multiple pathways and processes that are impacted by azacitidine. The Dnmt1 interaction partner, Uhrf1, exhibits significant azacitidine-induced degradation, and this azacitidine-induced degradation is independent of the levels of Dnmt1 protein. We identify multiple other chromatin- and epigenetic-associated factors, including the bromodomain-containing transcriptional regulator, Brd2. We show that azacitidine induces highly specific perturbations of the Dnmt1-associated proteome, and while interaction partners such as Uhrf1 are sensitive to azacitidine, others such as the Dnmt1 interaction partner and stability regulator, Usp7, are not. In summary, we have conducted the first comprehensive proteomic analysis of the azacitidine-sensitive nuclear proteome, and we show how 5-azacitidine can be used as a specific probe to explore Dnmt- and chromatin-related protein networks.
DNA methyltransferase I plays the central role in maintenance of CpG DNA methylation patterns across the genome, and alteration of CpG methylation patterns is a frequent and significant occurrence across many cancers. Cancer cells carrying hypomorphic alleles of Dnmt1 have become important tools for understanding Dnmt1 function and CpG methylation. In this study, we analyse colorectal cancer cells with a homozygous deletion of exons 3 to 5 of Dnmt1, resulting in reduced Dnmt1 activity. Although this cell model has been widely used to study the epigenome, the effects of the Dnmt1 hypomorph on cell signalling pathways and the wider proteome are largely unknown. We perform the first quantitative proteomic analysis of this important cell model and identify multiple signalling pathways and processes that are significantly dysregulated in the hypomorph cells. In Dnmt1 hypomorph cells, we observed a clear and unexpected signature of increased Epithelial-to-Mesenchymal transition (EMT) markers as well as reduced expression and sub-cellular re-localization of Beta-Catenin. Expression of wild-type Dnmt1 in hypomorph cells or knock-down of wild-type Dnmt1 did not recapitulate or rescue the observed protein profiles in Dnmt1 hypomorph cells, suggesting that hypomorphic Dnmt1 causes changes not solely attributable to Dnmt1 protein levels. In summary, we present the first comprehensive proteomic analysis of the widely studied Dnmt1 hypomorph colorectal cancer cells and identify redistribution of Dnmt1 and its interaction partner Beta-Catenin.
The dysregulation of Wnt signaling is a frequent occurrence in many different cancers. Oncogenic mutations of CTNNB1/β-catenin, the key nuclear effector of canonical Wnt signaling, lead to the accumulation and stabilization of β-catenin protein with diverse effects in cancer cells. Although the transcriptional response to Wnt/β-catenin signaling activation has been widely studied, an integrated understanding of the effects of oncogenic β-catenin on molecular networks is lacking. We used affinity-purification mass spectrometry (AP-MS), label-free liquid chromatography–tandem mass spectrometry, and RNA-Seq to compare protein–protein interactions, protein expression, and gene expression in colorectal cancer cells expressing mutant (oncogenic) or wild-type β-catenin. We generate an integrated molecular network and use it to identify novel protein modules that are associated with mutant or wild-type β-catenin. We identify a DNA methyltransferase I associated subnetwork that is enriched in cells with mutant β-catenin and a subnetwork enriched in wild-type cells associated with the CDKN2A tumor suppressor, linking these processes to the transformation of colorectal cancer cells through oncogenic β-catenin signaling. In summary, multiomics analysis of a defined colorectal cancer cell model provides a significantly more comprehensive identification of functional molecular networks associated with oncogenic β-catenin signaling.
The acquisition of mutations that activate oncogenes or inactivate tumor suppressors is a primary feature of most cancers. Mutations that directly alter protein sequence and structure drive the development of tumors through aberrant expression and modification of proteins, in many cases directly impacting components of signal transduction pathways and cellular architecture. Cancer-associated mutations may have direct or indirect effects on proteins and their interactions, and while the effects of mutations on signaling pathways have been widely studied, how mutations alter underlying protein-protein interaction networks is much less well understood. Systematic mapping of oncoprotein interactions using proteomics techniques as well as computational network analyses is revealing how oncoprotein mutations perturb protein-protein interaction networks and drive the cancer phenotype.
Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge.
The article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023, attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but critical information for reproducibility is sometimes lacking. Challenges include costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include: development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources.
Summaries of survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEzbOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues to contribute publications and annotations to UniProt entries of interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users' experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.
Abstract
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO: a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations: evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs): mechanistic models of molecular “pathways” (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.
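The three components described above can be pictured as a small data model. The following Python sketch is illustrative only and is not the GO Consortium's schema or any official API: the class names (GOTerm, Annotation, CausalModel), the gene product names, the placeholder reference string, and the choice of relation label are all assumptions made for the example. Only the two GO term identifiers and the IDA evidence code are taken from the public ontology.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GOTerm:
    """Component 1: a term in the ontology (hypothetical minimal form)."""
    term_id: str
    name: str
    aspect: str  # e.g. "molecular_function" or "biological_process"


@dataclass(frozen=True)
class Annotation:
    """Component 2: an evidence-supported statement about a gene product."""
    gene_product: str   # hypothetical gene product name
    term: GOTerm
    evidence_code: str  # e.g. "IDA" (inferred from direct assay)
    reference: str      # placeholder, not a real citation


@dataclass
class CausalModel:
    """Component 3: annotations linked by named relations (GO-CAM-like)."""
    title: str
    edges: list = field(default_factory=list)

    def link(self, source: Annotation, relation: str, target: Annotation) -> None:
        # Record a directed, labelled edge between two annotations.
        self.edges.append((source, relation, target))

    def downstream_of(self, source: Annotation) -> list:
        # Return annotations directly linked from the given source.
        return [t for s, _, t in self.edges if s == source]


kinase_activity = GOTerm("GO:0004672", "protein kinase activity", "molecular_function")
tf_activity = GOTerm(
    "GO:0003700", "DNA-binding transcription factor activity", "molecular_function"
)

# Hypothetical gene products and placeholder references, for illustration only.
kinase_annotation = Annotation("KinaseA", kinase_activity, "IDA", "PLACEHOLDER_REF_1")
tf_annotation = Annotation("FactorB", tf_activity, "IDA", "PLACEHOLDER_REF_2")

model = CausalModel("Illustrative two-step pathway")
model.link(kinase_annotation, "directly positively regulates", tf_annotation)

for target in model.downstream_of(kinase_annotation):
    print(target.gene_product, target.term.term_id)
```

Keeping annotations immutable (frozen dataclasses) lets the same statement be reused as a node in several models, which mirrors the idea that GO-CAMs are built by linking existing annotations rather than redefining them.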