SARS-CoV-2, a positive-sense single-stranded RNA coronavirus, is a global threat to human health. A mechanistic understanding of its life cycle is therefore important for the design of antiviral drugs. A key aspect of viral progression is the synthesis of viral proteins by the ribosome of the human host. In coronaviruses, this process is regulated by the viral 5ʹ and 3ʹ untranslated regions (UTRs), but the precise regulatory mechanism is not yet well understood. In particular, the 5ʹ-UTR of the viral genome is most likely involved in translation initiation of viral proteins. Here, we performed inline probing and RNase V1 probing to establish a model of the secondary structure of the SARS-CoV-2 5ʹ-UTR. We found that the 5ʹ-UTR contains stable structures, including a very stable four-way junction close to the AUG start codon. Sequence alignment analysis of the 5ʹ-UTRs of SARS-CoV-2 variants revealed a highly conserved structure with few co-variations, which confirmed our secondary structure model based on probing experiments.
We present the Single-Cell Clustering Assessment Framework, a method for the automated identification of putative cell types from single-cell RNA sequencing (scRNA-seq) data. By iteratively applying a machine learning approach to a given set of cells, we simultaneously identify distinct cell groups and a weighted list of feature genes for each group. The differentially expressed feature genes discriminate the given cell group from other cells. Each such group of cells corresponds to a putative cell type or state, characterized by the feature genes as markers. Benchmarking using expert-annotated scRNA-seq datasets shows that our method automatically identifies the 'ground truth' cell assignments with high accuracy.
Computational prediction of nucleic acid binding sites in proteins is necessary to disentangle functional mechanisms in most biological processes and to explore the binding mechanisms. Several strategies have been proposed, but the state-of-the-art approaches display a great diversity in i) the definition of nucleic acid binding sites; ii) the training and test datasets; iii) the algorithmic methods for the prediction strategies; iv) the performance measures and v) the distribution and availability of the prediction programs. Here we report a large-scale assessment of 19 web servers and 3 stand-alone programs on 41 datasets including more than 5000 proteins derived from 3D structures of protein-nucleic acid complexes. Well-defined binary assessment criteria (specificity, sensitivity, precision, accuracy…) are applied. We found that i) the tools have been greatly improved over the years; ii) some of the approaches suffer from theoretical defects and there is still room for sorting out the essential mechanisms of binding; iii) RNA binding and DNA binding appear to follow similar driving forces and iv) dataset bias may exist in some methods.
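The binary assessment criteria named above can be illustrated with a minimal sketch. It assumes per-residue labels in which 1 marks a binding residue and 0 a non-binding residue; the function names `confusion_counts` and `binary_metrics` are hypothetical and are not taken from the assessed tools or from the study itself.

```python
def confusion_counts(predicted, actual):
    """Count TP, FP, TN, FN for two equal-length lists of 0/1 residue labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Standard binary metrics; a ratio with a zero denominator is reported as NaN."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # fraction of true binding residues recovered
        "specificity": ratio(tn, tn + fp),   # fraction of non-binding residues recovered
        "precision": ratio(tp, tp + fp),     # fraction of predicted binding residues that are correct
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy example: 8 residues, hypothetical prediction versus structure-derived annotation.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
actual = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = binary_metrics(*confusion_counts(predicted, actual))
```

Because binding residues are usually a minority of all residues, accuracy alone can look high for a predictor that rarely predicts binding, which is one reason to report several of these measures together.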
Single-cell transcriptomics is a versatile tool for exploring heterogeneous cell populations, but as with all genomics experiments, batch effects can hamper data integration and interpretation. The success of batch-effect correction is often evaluated by visual inspection of low-dimensional embeddings, which are inherently imprecise. Here we present a user-friendly, robust and sensitive k-nearest-neighbor batch-effect test (kBET; https://github.com/theislab/kBET) for quantification of batch effects. We used kBET to assess commonly used batch-regression and normalization approaches, and to quantify the extent to which they remove batch effects while preserving biological variability. We also demonstrate the application of kBET to data from peripheral blood mononuclear cells (PBMCs) from healthy donors to distinguish cell-type-specific inter-individual variability from changes in relative proportions of cell populations. This has important implications for future data-integration efforts, central to projects such as the Human Cell Atlas.
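The idea behind a k-nearest-neighbor batch-effect test can be sketched as follows. This is a simplified illustration, not the kBET package: it assumes exactly two batches labelled 0 and 1, Euclidean distances on a low-dimensional embedding, a neighbourhood that excludes the cell itself, and a Pearson chi-square test with one degree of freedom. The function name `knn_batch_rejection_rate` and the parameters `k` and `alpha` are hypothetical.

```python
import math


def knn_batch_rejection_rate(points, batches, k=5, alpha=0.05):
    """Fraction of cells whose local batch mix deviates from the global batch mix.

    points  : list of coordinate tuples (one per cell)
    batches : list of 0/1 batch labels (simplifying assumption: two batches only)
    """
    n = len(points)
    global_frac = [batches.count(b) / n for b in (0, 1)]
    if min(global_frac) == 0:
        raise ValueError("both batches must be present")

    rejected = 0
    for i, p in enumerate(points):
        # k nearest neighbours of cell i by Euclidean distance, excluding the cell itself
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        neighbours = others[:k]
        observed = [sum(1 for j in neighbours if batches[j] == b) for b in (0, 1)]
        expected = [k * f for f in global_frac]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # Survival function of the chi-square distribution with 1 degree of freedom
        p_value = math.erfc(math.sqrt(chi2 / 2))
        if p_value < alpha:
            rejected += 1
    return rejected / n


# Well-mixed toy data: batches alternate along a line, so neighbourhoods look like the global mix.
mixed_points = [(float(i),) for i in range(12)]
mixed_batches = [i % 2 for i in range(12)]
mixed_rate = knn_batch_rejection_rate(mixed_points, mixed_batches, k=4)

# Strong batch effect: the two batches form two distant clusters.
cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
split_points = cluster + [(x + 10.0, y + 10.0) for x, y in cluster]
split_batches = [0] * 6 + [1] * 6
split_rate = knn_batch_rejection_rate(split_points, split_batches, k=5)
```

A rejection rate near 0 suggests that batches are well mixed in local neighbourhoods, while a rate near 1 suggests a strong batch effect. A numeric summary of this kind is the alternative to visual inspection of embeddings that the abstract argues for.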
Abstract
Motivation
Increasing numbers of large-scale single-cell RNA-Seq projects are leading to a data explosion, which can only be fully exploited through data integration. A number of methods have been developed to combine diverse datasets by removing technical batch effects, but most are computationally intensive. To overcome the challenge of enormous datasets, we have developed BBKNN, an extremely fast graph-based data integration algorithm. We illustrate the power of BBKNN on large-scale mouse atlasing data, and favourably benchmark its run time against a number of competing methods.
Availability and implementation
BBKNN is available at https://github.com/Teichlab/bbknn, along with documentation and multiple example notebooks, and can be installed from pip.
Supplementary information
Supplementary data are available at Bioinformatics online.
Abstract
The growing number of available single-cell gene expression datasets from different species creates opportunities to explore evolutionary relationships between cell types across species. Cross-species integration of single-cell RNA-sequencing data has been particularly informative in this context. However, in order to do so robustly it is essential to have rigorous benchmarking and appropriate guidelines to ensure that integration results truly reflect biology. Here, we benchmark 28 combinations of gene homology mapping methods and data integration algorithms in a variety of biological settings. We examine the capability of each strategy to perform species-mixing of known homologous cell types and to preserve biological heterogeneity using 9 established metrics. We also develop a new biology conservation metric to address the maintenance of cell type distinguishability. Overall, scANVI, scVI and SeuratV4 methods achieve a balance between species-mixing and biology conservation. For evolutionarily distant species, including in-paralogs is beneficial. SAMap outperforms when integrating whole-body atlases between species with challenging gene homology annotation. We provide our freely available cross-species integration and assessment pipeline to help analyse new data and develop new algorithms.
LiNi0.8Co0.1Mn0.1O2 is considered a promising cathode material for lithium-ion batteries because of its high capacity and low cost. However, LiNi0.8Co0.1Mn0.1O2 suffers from structural instability and irreversible phase transitions during charge/discharge processes, especially under high voltage, resulting in serious capacity fading and thermal runaway. Here, we propose a simple and effective method of modifying LiNi0.8Co0.1Mn0.1O2 by Mg doping. Benefiting from the pillaring effects of inactive Mg in the crystal structure, Li(Ni0.8Co0.1Mn0.1)1-xMgxO2 materials exhibit low Li+/Ni2+ cation mixing, high structural stability, and improved cyclic stability in the voltage range of 3.0–4.5 V. The optimal Li(Ni0.8Co0.1Mn0.1)0.97Mg0.03O2 achieves a high capacity retention of 81% over 350 cycles at 0.5 C and exhibits enhanced thermal stability at 4.5 V. The promotion mechanism is explored systematically by a combination of electrochemical characterizations, which demonstrate faster Li+ diffusion kinetics, higher electronic conductivity, and a stronger structure due to the Mg doping. Moreover, the full cell of Li(Ni0.8Co0.1Mn0.1)0.97Mg0.03O2//mesocarbon microbeads delivers a promising energy density of 595.3 W h kg−1 at 0.5 C (based on the mass of the cathode). The present work demonstrates that moderate Mg doping is a facile yet effective strategy to modify high-performance LiNi0.8Co0.1Mn0.1O2 for high-voltage lithium-ion batteries.
• LiNi0.8Co0.1Mn0.1O2 cathode material is modified by Mg doping.
• The Li+/Ni2+ mixing of LiNi0.8Co0.1Mn0.1O2 material is reduced by Mg doping.
• NCMMg0.03 material exhibits improved cycling retention and thermal stability.
• The NCMMg0.03//MCMB full cell delivers an energy density of 595.3 W h kg−1 at 0.5 C.
RNA is a unique bio-macromolecule that can both record genetic information and perform biological functions in a variety of molecular processes, including transcription, splicing, translation, and even regulating protein function. RNAs adopt specific three-dimensional conformations to enable their functions. Experimental determination of high-resolution RNA structures using X-ray crystallography is laborious and demands expertise, thus hindering our comprehension of RNA structural biology. The computational modeling of RNA structure was a milestone in the birth of bioinformatics. Although computational modeling has greatly improved over the last decade, with many successful cases, its accuracy is not only length-dependent but also varies with the complexity of the structure. To increase credibility, various experimental data have been integrated into computational modeling. In this review, we summarize the experiments that can be integrated into RNA structure modeling as well as the computational methods based on these experimental data. We also demonstrate how computational modeling can help the experimental determination of RNA structure. We highlight the recent advances in computational modeling which can offer reliable structure models using high-throughput experimental data.
Protein side-chain conformations are closely related to the biological functions of proteins. Side-chain prediction is a key step in protein design, protein docking and structure optimization. However, side-chain polymorphism is widespread in proteins, occurs in various types, and has long been overlooked by side-chain prediction. Such conformational variations have not been quantitatively studied, and the correlations between these variations and residue features are vague. Here, we performed statistical analyses on large-scale data sets and found that side-chain conformational flexibility is closely related to solvent exposure, degrees of freedom and hydrophilicity. These analyses allowed us to quantify different types of side-chain variabilities in the PDB. The results underscore that protein side-chain conformation prediction is not a single-answer problem, leading us to reconsider the assessment approaches of side-chain prediction programs.
Abstract
Several single-cell RNA sequencing (scRNA-seq) studies analyzing the immune response to COVID-19 infection have recently been published. Most of these studies have small sample sizes, which limits the conclusions that can be drawn with high confidence. By re-analyzing these data in a standardized manner, we validated 8 of the 20 published results across multiple datasets. In particular, we found a consistent decrease in T-cells with increasing COVID-19 infection severity, upregulation of type I interferon signaling pathways, and the presence of expanded B-cell clones in COVID-19 patients, but no consistent trend in T-cell clonal expansion. Overall, our results show that the conclusions drawn from scRNA-seq data analysis of small cohorts of COVID-19 patients need to be treated with some caution.