UniChem is a freely available compound identifier mapping service on the internet, designed to optimize the efficiency with which structure-based hyperlinks may be built and maintained between chemistry-based resources. In the past, the creation and maintenance of such links at EMBL-EBI, where several chemistry-based resources exist, has required independent efforts by each of the separate teams. These efforts were complicated by the different data models, release schedules, and differing business rules for compound normalization and identifier nomenclature that exist across the organization. UniChem, a large-scale, non-redundant database of Standard InChIs with pointers between these structures and chemical identifiers from all the separate chemistry resources, was developed as a means of efficiently sharing the maintenance overhead of creating these links. Thus, for each source represented in UniChem, all links to and from all other sources are automatically calculated and immediately available for all to use. Updated mappings are immediately available upon loading of new data releases from the sources. Web services in UniChem provide users with a single simple automatable mechanism for maintaining all links from their resource to all other sources represented in UniChem. In addition, functionality to track changes in identifier usage allows users to monitor which identifiers are current, and which are obsolete. Lastly, UniChem has been deliberately designed to allow additional resources to be included with minimal effort. Indeed, the recent inclusion of data sources external to EMBL-EBI has provided a simple means of providing users with an even wider selection of resources to link to, all at no extra cost, while at the same time providing a simple mechanism for external resources to link to all EMBL-EBI chemistry resources.
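The structure-keyed design described in the abstract above can be illustrated with a minimal sketch. It assumes each source supplies (identifier, Standard InChI) pairs. The class name `StructureIndex`, its methods, and the source and identifier names are hypothetical, chosen only for illustration; they are not UniChem's schema or web-service API.

```python
from collections import defaultdict


class StructureIndex:
    """Toy cross-reference index keyed on Standard InChI (illustrative only)."""

    def __init__(self):
        # One entry per distinct structure: InChI -> {(source, identifier)}
        self._by_structure = defaultdict(set)
        # Reverse pointer: (source, identifier) -> InChI
        self._by_identifier = {}

    def load(self, source, records):
        """Load one source's release as (identifier, inchi) pairs."""
        for identifier, inchi in records:
            self._by_structure[inchi].add((source, identifier))
            self._by_identifier[(source, identifier)] = inchi

    def links(self, source, identifier):
        """Return identifiers in all other sources that share the same structure."""
        inchi = self._by_identifier.get((source, identifier))
        if inchi is None:
            return []
        return sorted(
            pair for pair in self._by_structure[inchi] if pair[0] != source
        )


WATER = "InChI=1S/H2O/h1H2"
METHANE = "InChI=1S/CH4/h1H4"

index = StructureIndex()
index.load("source_a", [("A1", WATER), ("A2", METHANE)])  # hypothetical sources
index.load("source_b", [("B7", WATER)])
print(index.links("source_a", "A1"))
```

Because every source points only at the shared structure, loading a further source makes its links visible to all existing sources without any pairwise mapping work.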
Assignment of function to protein sequence is a task of growing importance in the life sciences, as new high-throughput DNA sequencing technologies generate ever-increasing quantities of genomic and meta-genomic data. Patterns within the sequence space, caused by the evolutionary conservation and assembly of protein domains, make possible the inference of function from sequence similarity. Clustering similar sequences is a useful technique for finding conserved sequences; the CluSTr database is a publicly available database arranging proteins in a hierarchy structured by similarity. The protein classification tool InterProScan builds on this approach by applying a range of methods to detect proteins that contain signatures indicative of the presence of particular conserved domains. The use of ontologies to describe protein function provides a flexible and abstract language to classify proteins. Together, these techniques can provide an understanding of the shape of the protein space, and can be used to explore the uncharted waters of the emerging metagenomic world.
Integr8 is a new web portal for exploring the biology of organisms with completely deciphered genomes. For over 190 species, Integr8 provides access to general information, recent publications, and a detailed statistical overview of the genome and proteome of the organism. The preparation of this analysis is supported through Genome Reviews, a new database of bacterial and archaeal DNA sequences in which annotation has been upgraded (compared to the original submission) through the integration of data from many sources, including the EMBL Nucleotide Sequence Database, the UniProt Knowledgebase, InterPro, CluSTr, GOA and HOGENOM. Integr8 also allows users to customize their own interactive analysis, and to download both customized and prepared datasets for their own use. Integr8 is available at http://www.ebi.ac.uk/integr8.
Open Science is a founding principle of ELIXIR, a pan‐European research infrastructure for life science data, with 21 Member countries plus the European Molecular Biology Laboratory. The mission of ELIXIR is to coordinate bioinformatics resources so that they form a single, integrated and pan‐European infrastructure, which can be used freely by academic and private‐sector researchers across the globe. As a recipient of public and charitable funding, ELIXIR must demonstrate its value, and the need to produce evidence in support of this is intensifying. Our practice‐led journey towards demonstrating public value is articulated around five main challenges and, for each, we present our pragmatic approach for tackling it. We begin by showing how we are working towards demystifying what research infrastructures do. We then shed light on the sort of evidence our funders and other stakeholders are asking us for, how this evidence varies in nature and scope, and our tactics to satisfy them. We follow on by providing our thoughts on possible barriers and solutions to embedding impact evaluation in our activities. Finally, we provide lessons learned, which we believe are sufficiently transferable and will be inspirational to other research infrastructures as they embark on their own journeys to demonstrate public value.
Using the Reactome Database
Rothfels, Karen; Milacic, Marija; Matthews, Lisa ...
Current Protocols, April 2023, Volume 3, Issue 4
Journal Article
The developmental and epileptic encephalopathies (DEE) are a group of rare, severe neurodevelopmental disorders, where even the most thorough sequencing studies leave 60-65% of patients without a molecular diagnosis. Here, we explore the incompleteness of transcript models used for exome and genome analysis as one potential explanation for a lack of current diagnoses. Therefore, we have updated the GENCODE gene annotation for 191 epilepsy-associated genes, using human brain-derived transcriptomic libraries and other data to build 3,550 putative transcript models. Our annotations increase the transcriptional 'footprint' of these genes by over 674 kb. Using
as a case study, due to its close phenotype/genotype correlation with Dravet syndrome, we screened 122 people with Dravet syndrome or a similar phenotype with a panel of exon sequences representing eight established genes and identified two de novo
variants that now - through improved gene annotation - are ascribed to residing among our exons. These two (from 122 screened people, 1.6%) molecular diagnoses carry significant clinical implications. Furthermore, we identified a previously classified
intronic Dravet syndrome-associated variant that now lies within a deeply conserved exon. Our findings illustrate the potential gains of thorough gene annotation in improving diagnostic yields for genetic disorders.
Cell-cell communication is essential for tissue development, regeneration and function, and its disruption can lead to diseases and developmental abnormalities. The revolution of single-cell genomics technologies offers unprecedented insights into cellular identities, opening new avenues to resolve the intricate cellular interactions present in tissue niches. CellPhoneDB is a bioinformatics toolkit designed to infer cell-cell communication by combining a curated repository of bona fide ligand-receptor interactions with a set of computational and statistical methods to integrate them with single-cell genomics data. Importantly, CellPhoneDB captures the multimeric nature of molecular complexes, thus representing cell-cell communication biology faithfully. Here we present CellPhoneDB v5, an updated version of the tool, which offers several new features. Firstly, the repository has been expanded by one-third with the addition of new interactions. These encompass interactions mediated by non-protein ligands such as endocrine hormones and GPCR ligands. Secondly, it includes a differential expression-based methodology for more tailored interaction queries. Thirdly, it incorporates novel computational methods to prioritise specific cell-cell interactions, leveraging other single-cell modalities, such as spatial information or TF activities (i.e. CellSign module). Finally, we provide CellPhoneDBViz, a module to interactively visualise and share results amongst users. Altogether, CellPhoneDB v5 elevates the precision of cell-cell communication inference, ushering in new perspectives to comprehend tissue biology in both healthy and pathological states.
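To make the point about multimeric complexes concrete, the following is a minimal sketch of one way a ligand-receptor pair could be scored when either partner has several subunits. It assumes that a complex is limited by its least-expressed subunit and that the pair is scored as the mean of the two partners; both rules, the function names, and the gene names are illustrative assumptions, not the CellPhoneDB API or the method described in the abstract.

```python
def complex_expression(subunits, mean_expr):
    """Expression of a (possibly multimeric) partner in one cell type.

    Assumption: the complex is limited by its least-expressed subunit,
    so a missing subunit gives an expression of 0.0.
    """
    values = [mean_expr.get(gene, 0.0) for gene in subunits]
    return min(values) if values else 0.0


def interaction_score(ligand_subunits, receptor_subunits, sender_expr, receiver_expr):
    """Mean of ligand and receptor expression, or 0.0 if either is absent."""
    ligand = complex_expression(ligand_subunits, sender_expr)
    receptor = complex_expression(receptor_subunits, receiver_expr)
    if ligand == 0.0 or receptor == 0.0:
        return 0.0
    return (ligand + receptor) / 2


# Hypothetical per-cell-type mean expression values.
sender = {"LIG1": 2.0}
receiver_full = {"REC_A": 4.0, "REC_B": 1.0}
receiver_partial = {"REC_A": 4.0}  # REC_B subunit not expressed

print(interaction_score(["LIG1"], ["REC_A", "REC_B"], sender, receiver_full))
print(interaction_score(["LIG1"], ["REC_A", "REC_B"], sender, receiver_partial))
```

Treating the receptor as a unit means that a cell type expressing only one subunit is not reported as a communication partner, which a gene-by-gene comparison would miss.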