Abstract
WEGO (Web Gene Ontology Annotation Plot), created in 2006, is a simple but useful tool for visualizing, comparing and plotting GO (Gene Ontology) annotation results. Owing largely to the rapid development of high-throughput sequencing and the increasing acceptance of GO, the number of WEGO users and citations has grown markedly in recent years, which motivated us to update the tool to version 2.0. WEGO takes GO annotation results as input. Based on GO's standardized DAG (Directed Acyclic Graph) structured vocabulary system, the number of genes corresponding to each GO ID is calculated and shown in a graphical format. The WEGO 2.0 updates target four aspects, aiming to provide a more efficient and up-to-date approach for comparative genomic analyses. First, the number of input files, previously limited to three, is now unlimited, allowing WEGO to analyze multiple datasets. Second, this version adds reference datasets for nine model species that can be adopted as baselines in comparative genomic analyses. Third, each Chi-square test is now carried out across multiple datasets rather than for every pair of samples. Finally, WEGO 2.0 provides an additional output graph alongside the traditional WEGO histogram, displaying the sorted P-values of GO terms and indicating which differences are significant. WEGO 2.0 also features an entirely new user interface. WEGO is available for free at http://wego.genomics.org.cn.
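The counting and testing steps described in this abstract can be illustrated with a minimal Python sketch. It is not WEGO's implementation, which the abstract does not describe: the term IDs, the function names, and the choice to count a gene once for an annotated term and once for each of its DAG ancestors are all assumptions made for illustration. The test shown is a plain Pearson Chi-square on a 2 x k table (genes annotated to a term versus not, across k datasets), and only the statistic and degrees of freedom are computed.

```python
from collections import defaultdict


def ancestors(term, parents):
    """Return the term itself plus every ancestor reachable in the DAG.

    parents maps a term ID to a list of its direct parent IDs (hypothetical layout).
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def count_genes_per_term(annotations, parents):
    """Count distinct genes per term, propagating each annotation to its ancestors.

    annotations maps a gene name to an iterable of term IDs.
    """
    genes_by_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for t in ancestors(term, parents):
                genes_by_term[t].add(gene)
    return {t: len(genes) for t, genes in genes_by_term.items()}


def chi_square_statistic(hits, totals):
    """Pearson Chi-square statistic and degrees of freedom for a 2 x k table.

    hits[i] is the number of genes annotated to the term in dataset i,
    totals[i] is the number of genes in dataset i.
    """
    grand_hits = sum(hits)
    grand_total = sum(totals)
    stat = 0.0
    for h, n in zip(hits, totals):
        for observed, row_sum in ((h, grand_hits), (n - h, grand_total - grand_hits)):
            expected = n * row_sum / grand_total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, len(hits) - 1


# Toy DAG with made-up term IDs: T:b has two parents, T:a and T:root.
toy_parents = {"T:a": ["T:root"], "T:b": ["T:a", "T:root"]}
toy_annotations = {"g1": ["T:b"], "g2": ["T:a"], "g3": ["T:a", "T:b"]}
toy_counts = count_genes_per_term(toy_annotations, toy_parents)
```

Testing all datasets in one 2 x k table gives a single statistic per GO term, which is what makes a sorted P-value plot across terms straightforward; converting the statistic to a P-value would need a Chi-square distribution function from a statistics library.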
Quality control (QC) and preprocessing are essential steps for sequencing data analysis to ensure the accuracy of results. However, existing tools cannot provide a satisfying solution with integrated comprehensive functions, proper architectures, and highly scalable acceleration. In this article, we demonstrate SOAPnuke as a tool with abundant functions for a "QC-Preprocess-QC" workflow and a MapReduce acceleration framework. Four modules with different preprocessing functions are designed for processing datasets from genomic, small RNA, Digital Gene Expression, and metagenomic experiments, respectively. As a workflow-like tool, SOAPnuke centralizes processing functions into one executable and predefines their order, which avoids the need to reformat files when switching tools. Furthermore, the MapReduce framework provides the scalability to distribute all processing work across an entire compute cluster. We conducted a benchmark in which SOAPnuke and other tools were used to preprocess a ∼30× NA12878 dataset published by GIAB. The standalone operation of SOAPnuke struck a balance between resource occupancy and performance. When accelerated on 16 working nodes with MapReduce, SOAPnuke ran ∼5.7 times as fast as the fastest of the other tools.
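The "QC-Preprocess-QC" idea can be sketched in a few lines of Python. This is an illustration only and does not reproduce SOAPnuke's code, options, or defaults: the function names, the Phred+33 encoding, and the thresholds (at most 10% N bases, at most 50% of bases below quality 20) are assumptions chosen for the example.

```python
def qc_summary(reads, offset=33):
    """Summarize a list of (name, sequence, quality_string) reads."""
    total_bases = sum(len(seq) for _, seq, _ in reads)
    total_quality = sum(ord(c) - offset for _, _, qual in reads for c in qual)
    mean_quality = total_quality / total_bases if total_bases else 0.0
    return {"reads": len(reads), "bases": total_bases, "mean_quality": mean_quality}


def keep_read(seq, qual, max_n_frac=0.1, low_q=20, max_low_q_frac=0.5, offset=33):
    """Return True if the read passes the N-content and low-quality filters.

    All thresholds are illustrative assumptions, not tool defaults.
    """
    if not seq or len(seq) != len(qual):
        return False
    n_frac = seq.upper().count("N") / len(seq)
    low_q_frac = sum(1 for c in qual if ord(c) - offset < low_q) / len(qual)
    return n_frac <= max_n_frac and low_q_frac <= max_low_q_frac


def qc_preprocess_qc(reads):
    """Run QC, filter the reads, then run QC again on the survivors."""
    before = qc_summary(reads)
    kept = [read for read in reads if keep_read(read[1], read[2])]
    after = qc_summary(kept)
    return before, kept, after


# Three toy reads: one clean, one N-rich, one with mostly low-quality bases.
toy_reads = [
    ("r1", "ACGT", "IIII"),
    ("r2", "NNGT", "IIII"),
    ("r3", "ACGT", "!!!I"),
]
before, kept, after = qc_preprocess_qc(toy_reads)
```

Because `keep_read` looks at one read at a time, the filtering step is independent per read, which is the property that lets this kind of work be split across nodes in a map step and the per-chunk summaries be combined in a reduce step.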
Rice is a staple crop that has undergone substantial phenotypic and physiological changes during domestication. Here we resequenced the genomes of 40 cultivated accessions selected from the major groups of rice and 10 accessions of their wild progenitors (Oryza rufipogon and Oryza nivara) to >15× raw data coverage. We investigated genome-wide variation patterns in rice and obtained 6.5 million high-quality single nucleotide polymorphisms (SNPs) after excluding sites with missing data in any accession. Using these population SNP data, we identified thousands of genes with significantly lower diversity in cultivated but not wild rice, which represent candidate regions selected during domestication. Some of these variants are associated with important biological features, whereas others have yet to be functionally characterized. The molecular markers we have identified should be valuable for breeding and for identifying agronomically important genes in rice.
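One common way to compare diversity between two populations is nucleotide diversity (π), the average number of pairwise differences per site. The Python sketch below shows that calculation for biallelic SNPs; it is an assumption for illustration, since the abstract does not state which diversity statistic or significance test was used. The function names are hypothetical, and each accession is treated as one sampled chromosome, which is a simplification.

```python
def site_pi(alt_count, n_chrom):
    """Average pairwise difference at one biallelic site.

    Equals (number of ref/alt pairs) / (number of all pairs) among n_chrom samples.
    """
    if n_chrom < 2:
        return 0.0
    ref_count = n_chrom - alt_count
    total_pairs = n_chrom * (n_chrom - 1) / 2
    return alt_count * ref_count / total_pairs


def region_pi(alt_counts, n_chrom, region_length):
    """Per-base nucleotide diversity of a region, given alt-allele counts at its SNPs."""
    return sum(site_pi(a, n_chrom) for a in alt_counts) / region_length


def diversity_ratio(wild_counts, n_wild, cult_counts, n_cult, region_length):
    """Ratio of wild to cultivated diversity; large values suggest reduced diversity
    in the cultivated group for this region."""
    pi_wild = region_pi(wild_counts, n_wild, region_length)
    pi_cult = region_pi(cult_counts, n_cult, region_length)
    return pi_wild / pi_cult if pi_cult > 0 else float("inf")
```

A region would then be flagged as a candidate only if its ratio is extreme relative to the genome-wide distribution; the cutoff is a separate choice that this sketch does not make.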
The plant cell wall exhibits preferential sites for the accumulation of metals at toxic concentrations. Through modification of wall polysaccharide components, elements such as silicon (Si) and zinc (Zn) may play active roles in alleviating the toxicity of heavy metals, including cadmium (Cd). However, enhanced tolerance for Cd stress may rely on synergistic effects between nutrient elements. Here, we cultured Si-accumulating suspension cells of rice (Oryza sativa) exposed to Cd and Zn treatments, either separately or in combination, and investigated cells using noninvasive microtest technology (NMT), inductively coupled plasma mass spectroscopy (ICP-MS) and atomic force microscopy (AFM). We found that Zn alleviated Cd toxicity in the presence of Si in the cell walls by binding of Zn²⁺ to ligands through the formation of Si-hemicellulose matrix-Zn complexes and co-precipitates, which greatly inhibited Cd²⁺ uptake into cells. This, in turn, lowered the expression of Cd-related transporters. This synergistic effect could be decisive for the survival of cells under conditions of high Cd concentrations.
Oesophageal cancer is one of the most aggressive cancers and is the sixth leading cause of cancer death worldwide. Approximately 70% of global oesophageal cancer cases occur in China, with oesophageal squamous cell carcinoma (ESCC) being the histopathological form in the vast majority of cases (>90%). Currently, there are limited clinical approaches for the early diagnosis and treatment of ESCC, resulting in a 10% five-year survival rate for patients. However, the full repertoire of genomic events leading to the pathogenesis of ESCC remains unclear. Here we describe a comprehensive genomic analysis of 158 ESCC cases, as part of the International Cancer Genome Consortium research project. We conducted whole-genome sequencing in 17 ESCC cases and whole-exome sequencing in 71 cases, of which 53 cases, plus an additional 70 ESCC cases not used in the whole-genome and whole-exome sequencing, were subjected to array comparative genomic hybridization analysis. We identified eight significantly mutated genes, of which six are well known tumour-associated genes (TP53, RB1, CDKN2A, PIK3CA, NOTCH1, NFE2L2), and two have not previously been described in ESCC (ADAM29 and FAM135B). Notably, FAM135B is identified as a novel cancer-implicated gene as assayed for its ability to promote malignancy of ESCC cells. Additionally, MIR548K, a microRNA encoded in the amplified 11q13.3-13.4 region, is characterized as a novel oncogene, and functional assays demonstrate that MIR548K enhances malignant phenotypes of ESCC cells. Moreover, we have found that several important histone regulator genes (MLL2 (also called KMT2D), ASH1L, MLL3 (KMT2C), SETD1B, CREBBP and EP300) are frequently altered in ESCC. Pathway assessment reveals that somatic aberrations are mainly involved in the Wnt, cell cycle and Notch pathways. Genomic analyses suggest that ESCC and head and neck squamous cell carcinoma share some common pathogenic mechanisms, and ESCC development is associated with alcohol drinking.
This study has explored novel biological markers and tumorigenic pathways that would greatly improve therapeutic strategies for ESCC.
Global disparities in prostate cancer (PCa) incidence highlight the urgent need to identify genomic abnormalities in prostate tumors in different ethnic populations including Asian men.
To systematically explore the genomic complexity and define disease-driven genetic alterations in PCa.
The study sequenced the whole genomes and transcriptomes of tumor-benign paired tissues from 65 treatment-naive Chinese PCa patients. Subsequent targeted deep sequencing of 293 PCa-relevant genes was performed in another cohort of 145 prostate tumors.
The genomic alteration landscape in PCa was analyzed using an integrated computational pipeline. Relationships with PCa progression and survival were analyzed using nonparametric tests, log-rank tests, and multivariable Cox regression analyses.
We demonstrated an association of high frequency of CHD1 deletion with a low rate of TMPRSS2-ERG fusion and relatively high percentage of mutations in androgen receptor upstream activator genes in Chinese patients. We identified five putative clustered deleted tumor suppressor genes and provided experimental and clinical evidence that PCDH9, deleted/loss in approximately 23% of tumors, functions as a novel tumor suppressor gene with prognostic potential in PCa. Furthermore, axon guidance pathway genes were frequently deregulated, including gain/amplification of PLXNA1 gene in approximately 17% of tumors. Functional and clinical data analyses showed that increased expression of PLXNA1 promoted prostate tumor growth and independently predicted prostate tumor biochemical recurrence, metastasis, and poor survival in multi-institutional cohorts of patients with PCa. A limitation of this study is that other genetic alterations were not experimentally investigated.
There are shared and salient genetic characteristics of PCa in Chinese and Caucasian men. Novel genetic alterations in PCDH9 and PLXNA1 were associated with disease progression.
We report the first large-scale and comprehensive genomic data of prostate cancer from an Asian population. Identification of these genetic alterations may help advance prostate cancer diagnosis, prognosis, and treatment.
We present the first comprehensive genetic alteration landscape of prostate cancer in Chinese men and identify novel genes and progression pathways that may help advance prostate cancer diagnosis, prognosis, and personalized medicine.
Although apoferritin has been widely utilized as a new class of natural protein nanovehicles for encapsulation and delivery of nutraceuticals, its ability to remove heavy metal ions has yet to be explored. In this study, we demonstrated for the first time that the ferritin from kuruma prawns (Marsupenaeus japonicus), named MjF, has a markedly greater ability to resist denaturation induced by Cd²⁺ and Hg²⁺ than its analogue, human H-chain ferritin (HuHF), despite the fact that these two proteins share a high similarity in protein structure. Treatment of HuHF with Cd²⁺ or Hg²⁺ at a metal ion/protein shell ratio of 100/1 resulted in marked protein aggregation, while the MjF solution remained clear upon treatment with Cd²⁺ and Hg²⁺ at different protein shell/metal ion ratios (50/1, 100/1, 250/1, 500/1, 1000/1, and 2500/1). Structural comparison analyses in conjunction with the newly solved crystal structure of the complex of MjF plus Cd²⁺ or Hg²⁺ revealed that cysteine (Cys) is a major residue responsible for such binding, and that the large difference between MjF and HuHF in the ability to resist denaturation induced by these two heavy metal ions is mainly derived from the different positions of Cys residues in these two proteins; namely, Cys residues in HuHF are located on the outer surface, while Cys residues in MjF are buried within the protein shell. All of these findings raise the strong possibility that prawn ferritin, as a food-derived protein, could be developed into a novel bio-template to remove heavy metal ions from contaminated food systems.
Understanding the dynamics of the eukaryotic transcriptome is essential for studying the complexity of transcriptional regulation and its impact on phenotype. However, comprehensive studies of transcriptomes at single base resolution are rare, even for modern organisms, and lacking for rice. Here, we present the first transcriptome atlas for eight organs of cultivated rice. Using high-throughput paired-end RNA-seq, we unambiguously detected transcripts expressed at extremely low levels, as well as a substantial number of novel transcripts, exons, and untranslated regions. An analysis of alternative splicing in the rice transcriptome revealed that alternative cis-splicing occurred in approximately 33% of all rice genes. This is far more than previously reported. In addition, we identified 234 putative chimeric transcripts that seem to be produced by trans-splicing, indicating that transcript fusion events are more common than expected. In-depth analysis revealed a multitude of fusion transcripts that might be by-products of alternative splicing. Validation and chimeric transcript structural analysis provided evidence that some of these transcripts are likely to be functional in the cell. Taken together, our data provide extensive evidence that transcriptional regulation in rice is vastly more complex than previously believed.
Tumor heterogeneity presents a challenge for inferring clonal evolution and driver gene identification. Here, we describe a method for analyzing the cancer genome at a single-cell nucleotide level. To perform our analyses, we first devised and validated a high-throughput whole-genome single-cell sequencing method using two lymphoblastoid cell line single cells. We then carried out whole-exome single-cell sequencing of 90 cells from a JAK2-negative myeloproliferative neoplasm patient. The sequencing data from 58 cells passed our quality control criteria, and these data indicated that this neoplasm represented a monoclonal evolution. We further identified essential thrombocythemia (ET)-related candidate mutations such as SESN2 and NTRK1, which may be involved in neoplasm progression. This pilot study allowed the initial characterization of the disease-related genetic architecture at the single-cell nucleotide level. Further, we established a single-cell sequencing method that opens the way for detailed analyses of a variety of tumor types, including those with high genetic complexity between patients.
► Building a new whole-genome SCS of high genome coverage, sensitivity, and specificity ► Whole-exome SCS of a typical JAK2-negative myeloproliferative neoplasm patient ► Depicting intratumoral genetics of the neoplasm at a single-cell nucleotide level ► Provision of evidence for monoclonal evolution of the neoplasm
A new high-throughput method based on non-PCR amplification allows whole-exome sequencing of single cells at the nucleotide level. Sequencing of 90 individual tumor cells from a JAK2-negative myeloproliferative neoplasm provides evidence for monoclonal evolution of the cancer.
Biological age (BA) has been proposed to evaluate the aging status instead of chronological age (CA). Our study shows evidence that there might be multiple “clocks” within the whole-body system: systemic aging drivers/clocks overlaid with organ/tissue-specific counterparts. We utilize multi-omics data, including clinical tests, immune repertoire, targeted metabolomic molecules, gut microbiomes, physical fitness examinations, and facial skin examinations, to estimate the BA of different organs (e.g., liver, kidney) and systems (immune and metabolic system). The aging rates of organs/systems are diverse. People’s aging patterns are different. We also demonstrate several applications of organs/systems BA in two independent datasets. Mortality predictions are compared among organs' BA in the dataset of the United States National Health and Nutrition Examination Survey. Polygenic risk scores of BAs constructed in the Chinese Longitudinal Healthy Longevity Survey cohort can predict the possibility of becoming a centenarian.
• Constructing biological ages of organs/systems using multi-omics features
• Organs and systems are aging at different rates
• Specific biological age could predict disease of corresponding organs
• Biological ages of organs and systems have diverse genetic architectures
Nie et al. estimate biological ages of organs and systems using 402 multi-omics features from 4,066 individuals and demonstrate several applications. They find that organs and systems are aging at different rates, and biological ages could be utilized for population stratification, mortality prediction, and phenotypes of genetic association studies.