Too much to know Blair, Ann
2010, 20101130, 2010-11-02, 20100101
eBook, Book
The flood of information brought to us by advancing technology is often accompanied by a distressing sense of "information overload," yet this experience is not unique to modern times. In fact, says Ann M. Blair in this intriguing book, the invention of the printing press and the ensuing abundance of books provoked sixteenth- and seventeenth-century European scholars to register complaints very similar to our own. Blair examines methods of information management in ancient and medieval Europe as well as the Islamic world and China, then focuses particular attention on the organization, composition, and reception of Latin reference books in print in early modern Europe. She explores in detail the sophisticated and sometimes idiosyncratic techniques that scholars and readers developed in an era of new technology and exploding information.
Abstract. Motivation: With single-cell DNA methylation studies yielding vast datasets, existing data formats struggle with the unique challenges of storage and efficient operations, highlighting a need for improved solutions. Results: BAllC (Binary All Cytosines) emerges as a tailored format for methylation data, addressing these challenges. BAllCools, its complementary software toolkit, enhances parsing, indexing, and querying capabilities, promising superior operational speeds and reduced storage needs. Availability and implementation: https://github.com/jksr/ballcools
Abstract. Motivation: Hi-C is gaining prominence as a method for mapping genome organization. With declining sequencing costs and a growing demand for higher-resolution data, efficient tools for processing Hi-C datasets at different resolutions are crucial. Over the past decade, the .hic and Cooler file formats have become the de facto standard for storing interaction matrices produced by Hi-C experiments in binary format. Interoperability issues make it unnecessarily difficult to convert between the two formats and to develop applications that can process each format natively. Results: We developed hictk, a toolkit that can transparently operate on .hic and .cool files with excellent performance. The toolkit is written in C++ and consists of a C++ library with Python and R bindings as well as CLI tools to perform common operations directly from the shell, including converting between .hic and .mcool formats. We benchmark the performance of hictk and compare it with other popular tools and libraries. We conclude that hictk significantly outperforms existing tools while providing the flexibility of natively working with both file formats without code duplication. Availability and implementation: The hictk library, Python bindings and CLI tools are released under the MIT license as a multi-platform application available at github.com/paulsengroup/hictk. Pre-built binaries for Linux and macOS are available on bioconda. Python bindings for hictk are available on GitHub at github.com/paulsengroup/hictkpy, while R bindings are available on GitHub at github.com/paulsengroup/hictkR.
Abstract. Motivation: Understanding the molecular evolutionary history of organisms usually requires visual comparison of genomic regions from related species or strains. Although several applications already exist to achieve this task, they are either too old, too limited, or too complex for most users' needs. Results: GenoFig is a graphical application for the visualization of prokaryotic genomic regions, intended to be as easy to use as possible and flexible enough to adapt to a variety of needs. GenoFig allows the personalized representation of annotations extracted from GenBank files in a consistent way across sequences, using regular expressions. It also provides several unique options to optimize the display of homologous regions between sequences, as well as other more classical features such as sequence GC percent or GC-skew representations. In summary, GenoFig is a simple, free, and highly configurable tool to explore the evolution of specific genomic regions in prokaryotes and to produce publication-ready figures. Availability and implementation: GenoFig is available at https://forgemia.inra.fr/public-pgba/genofig under a GPL 3.0 license.
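The "GC percent" and "GC-skew" tracks mentioned in the GenoFig abstract are standard sequence statistics. The sketch below is only an illustration of the usual definitions, GC content = (G + C) / length and GC skew = (G - C) / (G + C), computed over sliding windows. It is not GenoFig's code; the function names, window size, and step are assumptions chosen for the example.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq):
    """(G - C) / (G + C); defined as 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def gc_profile(seq, window=4, step=2):
    """Return (start, gc_content, gc_skew) for each full window along seq.

    window and step are illustrative defaults, not values from the abstract.
    """
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((start, gc_content(chunk), gc_skew(chunk)))
    return profile


if __name__ == "__main__":
    for start, content, skew in gc_profile("GGGCATATCCCG", window=4, step=4):
        print(start, round(content, 2), round(skew, 2))
```

Plotting the two columns against the window start gives the kind of per-region track a genome viewer would draw alongside the annotations.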
Abstract. Summary: Subcluster analysis is a powerful means to improve clustering and characterization of single cell RNA-Seq data. However, there are no existing tools to systematically integrate results from multiple subclusters, which creates hurdles for accurate data quantification, visualization, and interpretation in downstream analysis. To address this issue, we developed Ragas, an R package that integrates multi-level subclustering objects for streamlined analysis and visualization. A new data structure was implemented to seamlessly connect and assemble miscellaneous single cell analyses from different levels of subclustering, along with several new or enhanced visualization functions. Moreover, a re-projection algorithm was developed to integrate nearest-neighbor graphs from multiple subclusters in order to maximize their separability on the combined cell embeddings, which significantly improved the presentation of rare and homogeneous subpopulations. Availability and implementation: The Ragas package and its documentation can be accessed through https://github.com/jig4003/Ragas and its source code is also available at https://zenodo.org/records/11244921.
Abstract. Motivation: Metabolomics, as an essential tool in systems biology, is now widely accessible to researchers of all levels. Yet challenges remain in data analysis and result interpretation. To address these challenges, we introduced MetaboReport, a versatile and interactive web app that simplifies metabolomics experiment design, data preprocessing, exploration, statistical analysis, visualization, and reporting. Results: MetaboReport produces a comprehensive HTML report, including project details, an introduction, interactive plots and tables, statistical results, and an in-depth explanation and interpretation of the results. MetaboReport is particularly tailored for research labs and metabolomics core facilities that provide metabolomics services, allowing them to efficiently manage and document different metabolomics projects, and effectively report the metabolomics results to users. Availability and implementation: MetaboReport is freely accessible on https://metaboreport.com, with source code available on GitHub (https://github.com/YonghuiDong/MetReport). Alternatively, users can install MetaboReport as a standalone desktop app (https://metaboreport.sourceforge.io).
Today, the prediction of structures of large protein complexes solely from their sequence information requires prior knowledge of the stoichiometry of the complex. To address this challenge, we have enhanced the Monte Carlo Tree Search algorithms in MoLPC to enable the assembly of protein complexes while simultaneously predicting their stoichiometry.
In MoLPC2, we have improved the predictions by allowing the sampling of alternative AlphaFold predictions. Using MoLPC2, we accurately predicted the structures of 50 out of 175 non-redundant protein complexes (about 29%; TM-score ≥ 0.8) without knowing the stoichiometry. MoLPC2 provides new opportunities for predicting protein complex structures without stoichiometry information.
MoLPC2 is freely available at https://github.com/hychim/molpc2. A notebook is also available from the repository for easy use.
Supplementary data are available at Bioinformatics online.
The identification of minimal genetic interventions that modulate metabolic processes constitutes one of the most relevant applications of genome-scale metabolic models (GEMs). The concept of Minimal Cut Sets (MCSs) and its extension at the gene level, genetic Minimal Cut Sets (gMCSs), have attracted increasing interest in the field of Systems Biology to address this task. Different computational tools have been developed to calculate MCSs and gMCSs using both commercial and open-source software.
Here, we present gMCSpy, an efficient Python package to calculate gMCSs in GEMs using both commercial and non-commercial optimization solvers. We show that gMCSpy substantially outperforms our previous computational tool GMCS, which exclusively relied on commercial software. Moreover, we compared gMCSpy with recently published competing algorithms in the literature, finding significant improvements in both accuracy and computation time. All these advances make gMCSpy an attractive tool for researchers in the field of Systems Biology for different applications in health and biotechnology.
The Python package gMCSpy can be accessed at: https://github.com/PlanesLab/gMCSpy.
Supplementary data are available at Bioinformatics online.
Abstract. Motivation: Complex diseases are often caused and characterized by misregulation of multiple biological pathways. Differential network analysis aims to detect significant rewiring of biological network structures under different conditions and has become an important tool for understanding the molecular etiology of disease progression and therapeutic response. With few exceptions, most existing differential network analysis tools perform differential tests on separately learned network structures that are computationally expensive and prone to collapse when grouped samples are limited or less consistent. Results: We previously developed an accurate differential network analysis method, differential dependency networks (DDN), that enables joint learning of common and rewired network structures under different conditions. We now introduce the DDN3.0 tool, which improves this framework with three new and highly efficient algorithms: unbiased model estimation with a weighted error measure applicable to imbalanced sample groups, multiple acceleration strategies to improve learning efficiency, and data-driven determination of proper hyperparameters. The comparative experimental results obtained from both realistic simulations and case studies show that DDN3.0 can help biologists more accurately identify, in a study-specific and often unknown conserved regulatory circuitry, a network of significantly rewired molecular players potentially responsible for phenotypic transitions. Availability and implementation: The Python package of DDN3.0 is freely available at https://github.com/cbil-vt/DDN3. A user's guide and a vignette are provided at https://ddn-30.readthedocs.io/.
Abstract. Motivation: Single nucleotide polymorphism (SNP) markers are increasingly popular for population genomics and inferring ancestry for individuals of unknown origin. Because large SNP datasets are impractical for rapid and routine analysis, diagnostics rely on panels of highly informative markers. Strategies exist for selecting these markers; however, resources for efficiently evaluating their performance are limited for non-model systems. Results: snpAIMeR is a user-friendly R package that evaluates the efficacy of genomic markers for the cluster assignment of unknown individuals. It is intended to help minimize panel size and genotyping effort by determining the informativeness of candidate diagnostic markers. Given genotype data from individuals of known origin, it uses leave-one-out cross-validation to determine population assignment rates for individual markers and marker combinations. Availability and implementation: snpAIMeR is available on CRAN (https://CRAN.R-project.org/package=snpAIMeR).
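The leave-one-out cross-validation idea in the snpAIMeR abstract can be illustrated with a small sketch: hold out each individual of known origin, assign it to a population using the remaining individuals, and report the fraction assigned correctly for a given marker or marker combination. The abstract does not say which classifier the package uses, so a nearest-centroid rule stands in here as an assumption, and all function names and the toy genotypes (allele counts 0/1/2) are hypothetical. The example is written in Python rather than R and is not the package's API.

```python
from itertools import combinations


def centroid(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_assignment_rate(genotypes, labels, markers):
    """Leave-one-out assignment rate using only the given marker indices.

    Stand-in classifier (an assumption): nearest population centroid.
    """
    correct = 0
    for i, (held_out, true_label) in enumerate(zip(genotypes, labels)):
        # Build per-population training sets without individual i.
        training = {}
        for j, (geno, label) in enumerate(zip(genotypes, labels)):
            if j != i:
                training.setdefault(label, []).append([geno[m] for m in markers])
        query = [held_out[m] for m in markers]
        predicted = min(
            training, key=lambda lab: sq_dist(query, centroid(training[lab]))
        )
        correct += predicted == true_label
    return correct / len(genotypes)


def rank_marker_sets(genotypes, labels, size):
    """Score every marker combination of the given size, best first."""
    n_markers = len(genotypes[0])
    rates = {
        combo: loo_assignment_rate(genotypes, labels, combo)
        for combo in combinations(range(n_markers), size)
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data: marker 0 separates the populations, marker 1 does not.
    toy_genotypes = [[0, 1], [0, 2], [0, 0], [2, 1], [2, 0], [2, 2]]
    toy_labels = ["A", "A", "A", "B", "B", "B"]
    for combo, rate in rank_marker_sets(toy_genotypes, toy_labels, size=1):
        print(combo, round(rate, 2))
```

Scoring single markers and then small combinations in this way is how a candidate panel could be narrowed to the fewest markers that still assign individuals reliably.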