Cancer cell lines are essential in cancer research, yet accurate authentication of these cell lines can be challenging, particularly for consanguineous cell lines with close genetic similarities. We introduce a new method, Cancer Cell Line Hunter (CCLHunter), to tackle this challenge. This approach uses single nucleotide polymorphisms, expression profiles, and kindred topology to accurately authenticate 1389 human cancer cell lines. CCLHunter can precisely and efficiently authenticate cell lines from consanguineous lineages and those derived from other tissues of the same individual. Our evaluation results indicate that CCLHunter achieves an overall accuracy of 93.27%, and an accuracy of 89.28% even for consanguineous cell lines, outperforming existing methods. Additionally, we provide convenient access to CCLHunter through standalone software and a web server at https://ngdc.cncb.ac.cn/cclhunter.
•CCLHunter uses SNPs, expression profiles, and cell line kindred to authenticate 1389 CCLs from RNA-seq data.
•CCLHunter showed significant advantages in authenticating post-derived CCLs.
•CCLHunter is a useful tool for CCL authentication and may inspire future research on differentiating cell lines by genetic distance.
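As a rough illustration of the SNP component only, the sketch below ranks reference cell lines by genotype concordance with a query sample. It is not CCLHunter's algorithm, which also draws on expression profiles and kindred topology; the 0/1/2 alt-allele encoding, the concordance score, and the cell-line names `LINE_A`/`LINE_B` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: alt-allele count (0, 1, 2) per SNP site; None = no call.
Genotype = Optional[int]


def concordance(query: List[Genotype], ref: List[Genotype]) -> float:
    """Fraction of sites called in both profiles whose genotypes agree."""
    pairs = [(q, r) for q, r in zip(query, ref) if q is not None and r is not None]
    if not pairs:
        return 0.0
    return sum(q == r for q, r in pairs) / len(pairs)


def rank_cell_lines(
    query: List[Genotype], references: Dict[str, List[Genotype]]
) -> List[Tuple[str, float]]:
    """Return (cell line, concordance) pairs, best match first."""
    scores = [(name, concordance(query, gt)) for name, gt in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = [0, 1, 2, None, 1, 0]
    references = {
        "LINE_A": [0, 1, 2, 2, 1, 0],  # hypothetical reference profiles
        "LINE_B": [1, 1, 0, 2, 2, 0],
    }
    print(rank_cell_lines(query, references))  # [('LINE_A', 1.0), ('LINE_B', 0.4)]
```

A plain concordance score cannot separate lines that share a donor, which is exactly the consanguineous case the abstract says needs extra signals.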
Compared with conventional comparative genomics, recent studies in pan-genomics have provided further insights into species genomic dynamics, taxonomy and identification, pathogenicity and environmental adaptation. To better understand the genome characteristics of species of interest and to fully mine key metabolic and resistance genes along with their conservation and variation, here we present ProPan (https://ngdc.cncb.ac.cn/propan), a public database covering 23 archaeal species and 1,481 bacterial species (51,882 strains in total) for comprehensively profiling prokaryotic pan-genome dynamics. By analyzing and integrating these massive datasets, ProPan covers three major aspects of the pan-genome dynamics of species of interest: 1) evaluation of each species' characteristics and composition in pan-genome dynamics; 2) visualization of map associations, functional annotation and presence/absence variation for the gene clusters of all included species; 3) typical characteristics of environmental adaptation, including prediction of resistance genes for 126 substances (biocides, antimicrobial drugs and metals) and evaluation of 31 metabolic cycle processes. In addition, ProPan provides a user-friendly interface, flexible retrieval and multi-level real-time statistical visualization. Taken together, ProPan will serve as a valuable resource for studies of prokaryotic pan-genome dynamics, taxonomy and identification as well as environmental adaptation.
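The core/accessory/unique split that underlies pan-genome profiles can be sketched from a gene-cluster presence/absence matrix. This is a generic illustration, not ProPan's pipeline; the matrix layout (one 0/1 row per cluster, one column per strain), the `core_fraction` threshold (1.0 means present in every strain), and the cluster names are assumptions.

```python
from typing import Dict, List


def classify_clusters(pav: Dict[str, List[int]], core_fraction: float = 1.0) -> Dict[str, str]:
    """Label each gene cluster from its 0/1 presence/absence row across strains."""
    labels = {}
    for cluster, row in pav.items():
        present = sum(row)
        if present >= core_fraction * len(row):
            labels[cluster] = "core"        # present in (nearly) all strains
        elif present == 1:
            labels[cluster] = "unique"      # found in a single strain
        elif present > 1:
            labels[cluster] = "accessory"   # shared by some, not all, strains
        else:
            labels[cluster] = "absent"
    return labels


def pan_genome_curve(pav: Dict[str, List[int]]) -> List[int]:
    """Cumulative count of distinct clusters as strains are added in column order."""
    if not pav:
        return []
    n_strains = len(next(iter(pav.values())))
    seen = set()
    curve = []
    for j in range(n_strains):
        seen.update(c for c, row in pav.items() if row[j])
        curve.append(len(seen))
    return curve


if __name__ == "__main__":
    pav = {"g1": [1, 1, 1], "g2": [1, 1, 0], "g3": [0, 0, 1], "g4": [1, 0, 0]}
    print(classify_clusters(pav))  # {'g1': 'core', 'g2': 'accessory', 'g3': 'unique', 'g4': 'unique'}
    print(pan_genome_curve(pav))   # [3, 3, 4]
```

The curve depends on strain order; pan-genome tools usually average it over many random orderings before judging whether a pan-genome is open or closed.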
Abstract
The National Genomics Data Center (NGDC) provides a suite of database resources to support worldwide research activities in both academia and industry. With the rapid advancements in higher-throughput and lower-cost sequencing technologies and, accordingly, the huge volume of multi-omics data generated at exponential scales and rates, NGDC is continually expanding, updating and enriching its core database resources through big data integration and value-added curation. In the past year, update efforts have mainly focused on BioProject, BioSample, GSA, GWH, GVM, NONCODE, LncBook, EWAS Atlas and IC4R. Newly released resources include three human genome databases (PGG.SNV, PGG.Han and CGVD), eLMSG, EWAS Data Hub, GWAS Atlas, iSheep and PADS Arsenal. In addition, four web services, namely eGPS Cloud, BIG Search, BIG Submission and BIG SSO, have been significantly improved and enhanced. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn.
The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support global academic and industrial communities. With the explosive accumulation of multi-omics data generated at an unprecedented rate, CNCB-NGDC constantly expands and updates core database resources through big data archiving, integrative analysis and value-added curation. In the past year, efforts have been devoted to integrating multiple omics data, synthesizing the growing knowledge, developing new resources and upgrading a set of major resources. In particular, several database resources have been newly developed for infectious diseases and microbiology (MPoxVR, KGCoV, ProPan), cancer-trait association (ASCancer Atlas, TWAS Atlas, Brain Catalog, CCAS) as well as tropical plants (TCOD). Importantly, given the global health threat posed by monkeypox virus and SARS-CoV-2, CNCB-NGDC has newly constructed the monkeypox virus resource, along with frequent updates of SARS-CoV-2 genome sequences, variants and haplotypes. All the resources and services are publicly accessible at https://ngdc.cncb.ac.cn.
Genotype imputation is a statistical method for estimating missing genotypes from a denser haplotype reference panel. Existing methods usually perform well on common variants, but they may not be ideal for low-frequency and rare variants. Previous studies showed that the population similarity between study and reference panels is one of the key factors influencing imputation accuracy. Here, we developed an imputation reference panel reconstruction method (RefRGim) using convolutional neural networks (CNNs), which generates a study-specific reference panel for each input dataset based on the genetic similarity between individuals in the current study and those in the references. The CNNs were pretrained with single nucleotide polymorphism data from the 1000 Genomes Project. Our evaluations showed that genotype imputation with RefRGim achieves higher accuracy than with the original reference panel, especially for low-frequency and rare variants. RefRGim will serve as an efficient reference panel reconstruction method for genotype imputation. RefRGim is freely available via GitHub: https://github.com/shishuo16/RefRGim.
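To make the idea of a study-specific panel concrete, the sketch below keeps, for each study sample, its `k` genetically closest reference individuals and returns their union. It deliberately swaps RefRGim's pretrained CNNs for a plain nearest-neighbour rule on allele-dosage distance; the function names, the distance measure, `k`, and the sample IDs are hypothetical.

```python
from typing import Dict, List, Set


def genetic_distance(a: List[int], b: List[int]) -> float:
    """Mean absolute difference in alt-allele dosage (0/1/2) across shared SNPs."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def build_study_panel(
    study: Dict[str, List[int]], reference: Dict[str, List[int]], k: int = 2
) -> Set[str]:
    """Union of the k nearest reference individuals for every study sample."""
    panel: Set[str] = set()
    for genotypes in study.values():
        nearest = sorted(
            reference, key=lambda ref_id: genetic_distance(genotypes, reference[ref_id])
        )
        panel.update(nearest[:k])
    return panel


if __name__ == "__main__":
    reference = {
        "R1": [0, 0, 0, 0],
        "R2": [0, 0, 1, 0],
        "R3": [2, 2, 2, 2],
        "R4": [2, 2, 1, 2],
    }
    print(sorted(build_study_panel({"S1": [0, 0, 0, 1]}, reference)))  # ['R1', 'R2']
```

A learned model can weigh SNPs unequally and capture population structure that a flat dosage distance misses, which is one reason to prefer it over this nearest-neighbour stand-in.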
Abstract
Human endogenous retroviruses (HERVs), remnants of ancient exogenous retroviruses that infected and integrated into germ cells, comprise ∼8% of the human genome. These HERVs have been implicated in numerous diseases, and extensive research has been conducted to uncover their specific roles. Despite these efforts, a comprehensive resource of HERV-disease associations is still lacking. To address this gap, we introduce the HervD Atlas (https://ngdc.cncb.ac.cn/hervd/), an integrated knowledgebase of HERV-disease associations manually curated from all related published literature. In the current version, HervD Atlas collects 60 726 HERV-disease associations from 254 publications (out of 4692 screened publications), covering 21 790 HERVs (21 049 HERV-Terms and 741 HERV-Elements) belonging to six types, 149 diseases and 610 related/affected genes. Notably, an interactive knowledge graph that systematically integrates all the HERV-disease associations and the corresponding affected genes into a comprehensive network provides a powerful tool for uncovering and inferring the complex interplay between HERVs and diseases. The HervD Atlas also features a user-friendly web interface that allows efficient browsing, searching and downloading of all association information, research metadata and annotation information. Overall, the HervD Atlas is an essential resource for comprehensive, up-to-date knowledge on HERV-disease research, potentially facilitating the development of novel HERV-associated diagnostic and therapeutic strategies.
Abstract
The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of multi-omics data generated at unprecedented scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big data integration and value-added curation. Resources with significant updates in the past year include BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Science Wikis (a catalog of biological knowledge wikis for community annotations) and IC4R (Information Commons for Rice). Newly released resources include EWAS Atlas (a knowledgebase of epigenome-wide association studies), iDog (an integrated omics data resource for dog) and RNA editing resources (for editome-disease associations and plant RNA editosome, respectively). To promote biodiversity and health big data sharing around the world, the Open Biodiversity and Health Big Data (BHBD) initiative is introduced. All of these resources are publicly accessible at http://bigd.big.ac.cn.
Abstract
The National Genomics Data Center (NGDC), which is part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support the global academic and industrial communities. With the rapid accumulation of multi-omics data at an unprecedented pace, CNCB-NGDC continuously expands and updates core database resources through big data archiving, integrative analysis and value-added curation. Importantly, NGDC collaborates closely with major international databases and initiatives to ensure seamless data exchange and interoperability. Over the past year, significant efforts have been dedicated to integrating diverse omics data, synthesizing expanding knowledge, developing new resources, and upgrading major existing resources. In particular, several database resources have been newly developed for the biodiversity of protists (P10K), bacteria (NTM-DB, MPA) and plants (PPGR, SoyOmics, PlantPan), as well as for disease/trait association (CROST, HervD Atlas, HALL, MACdb, BioKA, RePoS, PGG.SV, NAFLDkb). All the resources and services are publicly accessible at https://ngdc.cncb.ac.cn.
With the growing penetration of wind power in electricity systems, the coordinated dispatch of integrated electricity and natural gas systems is becoming a popular research topic. Distributionally robust optimization can cope with the wind uncertainty of integrated electricity and natural gas systems by providing optimal solutions for the worst-case probability distribution. However, limited historical wind data hinder the estimation of the worst-case probability distribution. As a breakthrough in artificial intelligence, generative adversarial networks can be trained to approximate a complex uncertain probability distribution from raw data and to generate realistic data that follow the same distribution. This paper proposes a data-driven optimization method for the economic dispatch of integrated electricity and natural gas systems with wind uncertainty, without assuming any specific form for its probability distribution. Based on limited historical data, the data-driven generative adversarial network generates artificial wind power data, which helps to improve the estimation of the worst-case probability distribution in distributionally robust optimization. Moreover, the robustness of optimization solutions can be adjusted cost-effectively by controlling the amount of auxiliary data. In a case study, optimization solutions of the proposed method are shown to achieve a lower probability of chance constraint violation at a nearly negligible cost increase compared with those from four typical optimization methods.
•A rough generative adversarial network generates auxiliary wind power data.
•Distributionally robust optimization is improved through generated historical data.
•No assumption is made about the historical data distribution, and optimization is completely data-driven.
•Robustness of optimization solutions can be controlled through the amount of auxiliary data.
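The evaluation criterion in the case study, the probability of chance-constraint violation, can be estimated empirically from wind scenarios. The sketch below is not the paper's distributionally robust model; it assumes a single hypothetical constraint in which reserve must cover the shortfall between scheduled and actual wind power, and the sample values (MW) are made up. The scenario list could pool historical and generated samples alike.

```python
import math
from typing import Sequence


def violation_probability(
    wind_samples: Sequence[float], scheduled_wind: float, reserve: float
) -> float:
    """Share of scenarios whose wind shortfall exceeds the available reserve."""
    if not wind_samples:
        raise ValueError("need at least one wind scenario")
    violations = sum(1 for w in wind_samples if scheduled_wind - w > reserve)
    return violations / len(wind_samples)


def reserve_for_risk(
    wind_samples: Sequence[float], scheduled_wind: float, epsilon: float
) -> float:
    """Smallest reserve whose empirical violation rate is at most epsilon."""
    if not wind_samples:
        raise ValueError("need at least one wind scenario")
    shortfalls = sorted(max(0.0, scheduled_wind - w) for w in wind_samples)
    n = len(shortfalls)
    # At most floor(epsilon * n) scenarios may exceed the reserve (clamped to n - 1).
    allowed = min(math.floor(epsilon * n), n - 1)
    return shortfalls[n - 1 - allowed]


if __name__ == "__main__":
    samples = [50, 45, 40, 38, 30]  # hypothetical wind scenarios, MW
    print(violation_probability(samples, scheduled_wind=45, reserve=6))  # 0.4
    print(reserve_for_risk(samples, scheduled_wind=45, epsilon=0.2))     # 7
```

With few samples the empirical rate is a coarse estimate of the true violation probability, which is the gap the abstract's generated wind data are meant to narrow.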