MicroRNAs (miRNAs) are small non-coding RNAs that are involved in post-transcriptional regulation of gene expression. In this high-throughput sequencing era, a tremendous amount of RNA-seq data is accumulating, and full utilization of publicly available miRNA data is an important challenge. These data are useful to determine expression values for each miRNA, but quantification pipelines are in a primitive stage and still evolving; there are many factors that affect expression values significantly.
We used 304 high-quality microRNA sequencing (miRNA-seq) datasets from NCBI-SRA and calculated expression profiles for different tissues and cell-lines. In each miRNA-seq dataset, we found an average of more than 500 miRNAs with higher than 5x coverage, and we explored the top five highly expressed miRNAs in each tissue and cell-line. This user-friendly miRmine database has options to retrieve expression profiles of single or multiple miRNAs for a specific tissue or cell-line, either normal or with disease information. Results can be displayed in multiple interactive, graphical and downloadable formats.
http://guanlab.ccmb.med.umich.edu/mirmine.
bharatpa@umich.edu.
Supplementary data are available at Bioinformatics online.
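The two per-dataset summaries mentioned in the miRmine abstract above (miRNAs with more than 5x coverage, and the top five most highly expressed miRNAs) can be sketched in Python as follows. This is a minimal illustration, not the miRmine pipeline: the function names, the dictionary inputs and the toy values are assumptions, and the abstract does not say how coverage or expression values are computed.

```python
from typing import Dict, List, Tuple


def covered_mirnas(coverage: Dict[str, float], min_coverage: float = 5.0) -> List[str]:
    """Return the names of miRNAs whose coverage is strictly above the threshold."""
    return sorted(name for name, cov in coverage.items() if cov > min_coverage)


def top_expressed(expression: Dict[str, float], n: int = 5) -> List[Tuple[str, float]]:
    """Return the n most highly expressed miRNAs, highest first.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(expression.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


if __name__ == "__main__":
    # Invented values for illustration only; not real miRmine data.
    toy_coverage = {"miR-A": 12.0, "miR-B": 5.0, "miR-C": 7.5}
    toy_expression = {"miR-A": 930.0, "miR-B": 15.0, "miR-C": 410.0}
    print(covered_mirnas(toy_coverage))
    print(top_expressed(toy_expression, n=2))
```

Both helpers take precomputed values per miRNA, so they stay independent of whichever quantification pipeline produced the numbers.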
We celebrate the 10th anniversary of the launch of the HUPO Human Proteome Project (HPP) and its major milestone of confident detection of at least one protein from each of 90% of the predicted protein-coding genes, based on the output of the entire proteomics community. The Human Genome Project reached a similar decadal milestone 20 years ago. The HPP has engaged proteomics teams around the world, strongly influenced data-sharing, enhanced quality assurance, and issued stringent guidelines for claims of detecting previously “missing proteins.” This invited perspective complements papers on “A High-Stringency Blueprint of the Human Proteome” and “The Human Proteome Reaches a Major Milestone” in special issues of Nature Communications and Journal of Proteome Research, respectively, released in conjunction with the October 2020 virtual HUPO Congress and its celebration of the 10th anniversary of the HUPO HPP.
• The global Human Proteome Project is the flagship activity of the HUPO.
• HPP Guidelines for MS Data have greatly enhanced confidence in proteomics data.
• The community has identified proteins from 90% of predicted protein-coding genes.
• A total of 1899 predicted proteins lack sufficient evidence of expression as of 2020.
Starting from several organ-oriented projects, HUPO in 2010 launched the Human Proteome Project to identify and characterize the protein parts list and integrate proteomics into multiomics research. Key steps were partnerships with neXtProt, PRIDE, PeptideAtlas, Human Protein Atlas, and instrument makers; global engagement of researchers; creation of ProteomeXchange; adoption of HPP Guidelines for Interpretation of MS Data and SRMAtlas for proteotypic peptides; annual metrics of finding “missing proteins” and functionally annotating proteins; and initiatives for early career scientists.
Human blood plasma provides a highly accessible window to the proteome of any individual in health and disease. Since its inception in 2002, the Human Proteome Organization’s Human Plasma Proteome Project (HPPP) has been promoting advances in the study and understanding of the full protein complement of human plasma and on determining the abundance and modifications of its components. In 2017, we review the history of the HPPP and the advances of human plasma proteomics in general, including several recent achievements. We then present the latest 2017-04 build of Human Plasma PeptideAtlas, which yields ∼43 million peptide-spectrum matches and 122,730 distinct peptide sequences from 178 individual experiments at a 1% protein-level FDR globally across all experiments. Applying the latest Human Proteome Project Data Interpretation Guidelines, we catalog 3509 proteins that have at least two non-nested uniquely mapping peptides of nine amino acids or more and >1300 additional proteins with ambiguous evidence. We apply the same two-peptide guideline to historical PeptideAtlas builds going back to 2006 and examine the progress made in the past ten years in plasma proteome coverage. We also compare the distribution of proteins in historical PeptideAtlas builds in various RNA abundance and cellular localization categories. We then discuss advances in plasma proteomics based on targeted mass spectrometry as well as affinity assays, which during early 2017 target ∼2000 proteins. Finally, we describe considerations about sample handling and study design, concluding with an outlook for future advances in deciphering the human plasma proteome.
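The two-peptide criterion described above (at least two non-nested, uniquely mapping peptides of nine or more amino acids) can be sketched as a small Python check. This is a simplified reading of the rule under stated assumptions: the function name is hypothetical, the input is assumed to contain only peptides that already map uniquely to the protein, and "non-nested" is interpreted as neither peptide sequence being a substring of the other. The full HPP guidelines may impose further conditions that are not modeled here.

```python
from itertools import combinations
from typing import Iterable


def passes_two_peptide_rule(unique_peptides: Iterable[str], min_length: int = 9) -> bool:
    """Check for two sufficiently long peptides where neither is nested in the other.

    Assumption: every peptide passed in already maps uniquely to the protein.
    """
    # Drop short peptides and exact duplicates before pairing.
    candidates = sorted({p for p in unique_peptides if len(p) >= min_length})
    # A qualifying pair exists if neither sequence contains the other.
    return any(a not in b and b not in a for a, b in combinations(candidates, 2))


if __name__ == "__main__":
    # Invented sequences for illustration only.
    print(passes_two_peptide_rule(["ACDEFGHIKL", "MNPQRSTVWY"]))
    print(passes_two_peptide_rule(["PEPTIDEKAAA", "EPTIDEKAA"]))
```

Checking pairs rather than counting peptides matters because a protein with many peptides can still fail if all of them are nested within one longest sequence.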
Depleted gut microbiome α-diversity is associated with several human diseases, but the extent to which this is reflected in the host molecular phenotype is poorly understood. We attempted to predict gut microbiome α-diversity from ~1,000 blood analytes (laboratory tests, proteomics and metabolomics) in a cohort enrolled in a consumer wellness program (N = 399). Although 77 standard clinical laboratory tests and 263 plasma proteins could not accurately predict gut α-diversity, we found that 45% of the variance in α-diversity was explained by a subset of 40 plasma metabolites (13 of the 40 of microbial origin). The prediction capacity of these 40 metabolites was confirmed in a separate validation cohort (N = 540) and across disease states, showing that our findings are robust. Several of the metabolite biomarkers that are reported here are linked with cardiovascular disease, diabetes and kidney function. Associations between host metabolites and gut microbiome α-diversity were modified in those with extreme obesity (body mass index ≥ 35), suggesting metabolic perturbation. The ability of the blood metabolome to predict gut microbiome α-diversity could pave the way to the development of clinical tests for monitoring gut microbial health.
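A figure such as "45% of the variance explained" is commonly reported as a coefficient of determination (R²) between observed and predicted values. The Python sketch below shows that calculation, on the assumption that this is the statistic behind the figure; the abstract does not state the exact metric or model used, and the function name and toy numbers are illustrative only.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values have zero variance")
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


if __name__ == "__main__":
    # Invented diversity values and predictions, for illustration only.
    observed_diversity = [3.1, 3.8, 4.2, 2.9, 3.5]
    predicted_diversity = [3.3, 3.6, 4.0, 3.1, 3.5]
    print(round(r_squared(observed_diversity, predicted_diversity), 3))
```

An R² of 0.45 would mean the predictions leave 55% of the variance in α-diversity unexplained. Values at or below zero indicate a model no better than predicting the mean.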
Despite the immense progress recently witnessed in protein structure prediction, the modeling accuracy for proteins that lack sequence and/or structure homologs remains to be improved. We developed an open-source program, DeepFold, which integrates spatial restraints predicted by multi-task deep residual neural-networks along with a knowledge-based energy function to guide its gradient-descent folding simulations. The results on large-scale benchmark tests showed that DeepFold creates full-length models with accuracy significantly beyond classical folding approaches and other leading deep learning methods. Of particular interest is the modeling performance on the most difficult targets with very few homologous sequences, where DeepFold achieved an average TM-score that was 40.3% higher than trRosetta and 44.9% higher than DMPfold. Furthermore, the folding simulations for DeepFold were 262 times faster than traditional fragment assembly simulations. These results demonstrate the power of accurately predicted deep learning potentials to improve both the accuracy and speed of ab initio protein structure prediction.
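The TM-score used above to compare methods sums a distance-dependent term over aligned residue pairs and normalizes by the target length L, with a length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. The Python sketch below evaluates that sum for one fixed superposition. It is a simplification: the real TM-score is the maximum over superpositions, which is not attempted here. The function names and the lower bound of 0.5 on d0 for short chains are assumptions added as a guard, and none of this is DeepFold code.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 of the TM-score, in angstroms."""
    if target_length <= 15:
        # Assumed guard: the cube-root formula is not meaningful for very short chains.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances holds the distance (angstroms) between each aligned residue pair;
    residues of the target that are not aligned contribute nothing to the sum.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # Invented distances for a hypothetical 100-residue target.
    toy_distances = [0.5, 1.0, 2.0, 4.0, 8.0] * 20
    print(round(tm_score(toy_distances, 100), 3))
```

Because each term is bounded by 1 and the sum is divided by the target length, the score lies between 0 and 1 and is comparable across proteins of different sizes.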
Every data-rich community research effort requires a clear plan for ensuring the quality of the data interpretation and comparability of analyses. To address this need within the Human Proteome Project (HPP) of the Human Proteome Organization (HUPO), we have developed through broad consultation a set of mass spectrometry data interpretation guidelines that should be applied to all HPP data contributions. For submission of manuscripts reporting HPP protein identification results, the guidelines are presented as a one-page checklist containing 15 essential points followed by two pages of expanded description of each. Here we present an overview of the guidelines and provide an in-depth description of each of the 15 elements to facilitate understanding of the intentions and rationale behind the guidelines, for both authors and reviewers. Broadly, these guidelines provide specific directions regarding how HPP data are to be submitted to mass spectrometry data repositories, how error analysis should be presented, and how detection of novel proteins should be supported with additional confirmatory evidence. These guidelines, developed by the HPP community, are presented to the broader scientific community for further discussion.
The Human Proteome Project is a major, comprehensive initiative of the Human Proteome Organization. This global collaborative effort aims to identify and characterize at least one protein product and many PTM, SAP, and splice variant isoforms from the 20,300 human protein-coding genes. The deliverables are an extensive parts list and an array of technology platforms, reagents, spectral libraries, and linked knowledge bases that advance the field and facilitate the use of proteomics by a much wider community of life scientists. Such enablement will help address the Grand Challenge of using proteomics to bridge major gaps between evidence of genomic variation and diverse phenotypes.
The HUPO Human Proteome Project (HPP) has made an outstanding launch, including a special issue of the Journal of Proteome Research on the Chromosome-centric HPP with a total of 48 articles.
This article is part of a Special Issue entitled: Can Proteomics Fill the Gap Between Genomics and Phenotypes?
• The global Human Proteome Project (HPP) aims to characterize the proteins from all protein-coding genes.
• The Chromosome-centric HPP component (C-HPP) has 24 chromosome teams (plus one for mitochondria).
• The Biology and Disease-driven HPP (B/D-HPP) now has 16 project teams, enabling a broad range of research.
• The HPP baseline Master Table has ~13,000 confidently-identified proteins.
• The Journal of Proteome Research published a 2013 C-HPP special issue with a total of 48 articles.
Despite considerable research progress on SARS-CoV-2, the direct zoonotic origin (intermediate host) of the virus remains ambiguous. The most definitive approach to identify the intermediate host would be the detection of SARS-CoV-2-like coronaviruses in wild animals. However, due to the high number of animal species, it is not feasible to screen all the species in the laboratory. Given that binding to ACE2 proteins is the first step for the coronaviruses to invade host cells, we propose a computational pipeline to identify potential intermediate hosts of SARS-CoV-2 by modeling the binding affinity between the Spike receptor-binding domain (RBD) and host ACE2. Using this pipeline, we systematically examined 285 ACE2 variants from mammals, birds, fish, reptiles, and amphibians, and found that the binding energies calculated for the modeled Spike-RBD/ACE2 complex structures correlated closely with the effectiveness of animal infection as determined by multiple experimental data sets. Built on the optimized binding affinity cutoff, we suggest a set of 96 mammals, including 48 experimentally investigated ones, which are permissive to SARS-CoV-2, with candidates from primates, rodents, and carnivores at the highest risk of infection. Overall, this work not only suggests a limited range of potential intermediate SARS-CoV-2 hosts for further experimental investigation, but also, more importantly, it proposes a new structure-based approach to general zoonotic origin and susceptibility analyses that are critical for human infectious disease control and wildlife protection.
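One plausible way to arrive at an "optimized binding affinity cutoff" like the one mentioned above is to pick the energy threshold that best separates species with experimentally confirmed susceptibility from those without. The Python sketch below does this by maximizing classification accuracy over candidate cutoffs. The abstract does not describe the optimization actually used, so the criterion, the function names, the convention that lower energy means stronger binding, and the toy values are all assumptions.

```python
from typing import Sequence, Tuple


def accuracy_at_cutoff(energies: Sequence[float], permissive: Sequence[bool], cutoff: float) -> float:
    """Fraction of species classified correctly when energy <= cutoff predicts 'permissive'."""
    hits = sum((energy <= cutoff) == label for energy, label in zip(energies, permissive))
    return hits / len(energies)


def best_cutoff(energies: Sequence[float], permissive: Sequence[bool]) -> Tuple[float, float]:
    """Return (cutoff, accuracy) for the observed energy that maximizes accuracy.

    Ties are resolved in favor of the lowest (most stringent) cutoff.
    """
    if len(energies) != len(permissive) or len(energies) == 0:
        raise ValueError("energies and permissive must be non-empty and of equal length")
    candidates = sorted(set(energies))
    cutoff = max(candidates, key=lambda c: accuracy_at_cutoff(energies, permissive, c))
    return cutoff, accuracy_at_cutoff(energies, permissive, cutoff)


if __name__ == "__main__":
    # Invented binding energies (arbitrary units) and labels, for illustration only.
    toy_energies = [-52.0, -50.5, -47.0, -41.0, -38.5]
    toy_labels = [True, True, True, False, False]
    print(best_cutoff(toy_energies, toy_labels))
```

Species without experimental data could then be scored against the chosen cutoff, which is how a threshold fitted on investigated species extends to uninvestigated ones.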