Abstract
Motivation
In ancient DNA research, the authentication of ancient samples based on specific features remains a crucial step in data analysis. Because of this central importance, researchers lacking deeper programming knowledge should be able to run a basic damage authentication analysis. Software for this purpose should be user-friendly and easy to integrate into an analysis pipeline.
Results
DamageProfiler is a Java-based, stand-alone tool for determining damage patterns in ancient DNA. The results are provided in various file formats and plots for further processing. DamageProfiler offers both an intuitive graphical interface and a command-line interface, which allows the tool to be easily embedded into an analysis pipeline.
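As an illustration of what a damage pattern is, the following Python sketch computes the per-position C-to-T misincorporation frequency at the 5' end of reads, a signal that is typically elevated in authentic ancient DNA because of cytosine deamination. It is a minimal sketch under stated assumptions, not DamageProfiler's implementation: the function name `c_to_t_frequencies`, the input format (gap-free reference/read string pairs) and the toy data are all hypothetical.

```python
from collections import Counter


def c_to_t_frequencies(pairs, max_pos=5):
    """Return the C>T frequency for each of the first `max_pos` read positions.

    `pairs` is an iterable of (reference, read) strings that are assumed to be
    gap-free alignments of equal length, written 5' to 3'.
    """
    c_total = Counter()  # reference C observed at each position
    c_to_t = Counter()   # reference C read as T at each position
    for ref, read in pairs:
        for pos, (ref_base, read_base) in enumerate(zip(ref[:max_pos], read[:max_pos])):
            if ref_base == "C":
                c_total[pos] += 1
                if read_base == "T":
                    c_to_t[pos] += 1
    # Positions without any reference C get a frequency of 0.0.
    return [c_to_t[p] / c_total[p] if c_total[p] else 0.0 for p in range(max_pos)]


# Hypothetical toy alignments (reference, read).
pairs = [
    ("CACGT", "TACGT"),
    ("CCGTA", "TCGTA"),
    ("CGCTA", "CGCTA"),
    ("ACCGT", "ACTGT"),
]
freqs = c_to_t_frequencies(pairs, max_pos=3)
```

In real data the alignments would come from a BAM file, and the complementary G-to-A signal at the 3' end would usually be profiled as well.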
Availability and implementation
All of the source code is freely available on GitHub (https://github.com/Integrative-Transcriptomics/DamageProfiler).
Supplementary information
Supplementary data are available at Bioinformatics online.
One of the major methods to identify microbial community composition, to unravel microbial population dynamics, and to explore microbial diversity in environmental samples is high-throughput DNA- or RNA-based 16S rRNA (gene) amplicon sequencing in combination with bioinformatics analyses. However, for environmental samples from contrasting habitats, it has not been systematically evaluated (i) which analysis methods provide results that reflect reality most accurately, (ii) how the interpretations of microbial community studies are biased by different analysis methods and (iii) whether the optimal analysis workflow can be implemented in an easy-to-use pipeline. Here, we compared the performance of 16S rRNA (gene) amplicon sequencing analysis tools (i.e., Mothur, QIIME1, QIIME2, and MEGAN) using three mock datasets with known microbial community composition that differed in sequencing quality, species number and abundance distribution (i.e., even or uneven), and phylogenetic diversity (i.e., closely related or well-separated amplicon sequences). Our results showed that QIIME2 outcompeted all other investigated tools in sequence recovery (>10 times fewer false positives), taxonomic assignments (>22% better F-score) and diversity estimates (>5% better assessment), suggesting that this approach is able to reflect the in situ microbial community most accurately. Further analysis of 24 environmental datasets obtained from four contrasting terrestrial and freshwater sites revealed dramatic differences in the resulting microbial community composition for all pipelines at genus level. For instance, at the investigated river water sites Sphaerotilus was only reported when using QIIME1 (8% abundance) and Agitococcus with QIIME1 or QIIME2 (2 or 3% abundance, respectively), but both genera remained undetected when analyzed with Mothur or MEGAN. Since these abundant taxa probably have implications for important biogeochemical cycles (e.g., nitrate and sulfate reduction) at these sites, their detection and semi-quantitative enumeration is crucial for valid interpretations. A high-performance computing conformant workflow was constructed to allow FAIR (Findable, Accessible, Interoperable, and Re-usable) 16S rRNA (gene) amplicon sequence analysis starting from raw sequence files, using the optimal methods identified in our study. Our presented workflow should be considered for future studies, thereby facilitating the analysis of high-throughput 16S rRNA (gene) sequencing data substantially, while maximizing reliability and confidence in microbial community data analysis.
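To make the F-score comparison concrete, the following Python sketch scores a tool's genus-level report against a mock community of known composition. It is a minimal sketch under stated assumptions: the abstract does not specify how the F-score was computed, so the standard F1 score on presence/absence of taxa is assumed here, and the function name `f_score` and both genus lists are hypothetical.

```python
def f_score(expected, observed):
    """F1 score of observed taxa against the expected (mock) taxa.

    Both arguments are collections of taxon names; abundance is ignored,
    only presence or absence is scored.
    """
    expected, observed = set(expected), set(observed)
    true_pos = len(expected & observed)   # taxa correctly reported
    false_pos = len(observed - expected)  # reported but not in the mock
    false_neg = len(expected - observed)  # in the mock but missed
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical mock community and hypothetical tool output.
mock = {"Escherichia", "Bacillus", "Pseudomonas", "Staphylococcus"}
reported = {"Escherichia", "Bacillus", "Pseudomonas", "Lactobacillus", "Clostridium"}
score = f_score(mock, reported)
```

With three true positives, two false positives and one false negative, precision is 0.6 and recall is 0.75, so the score is 2/3. A tool that reports fewer spurious genera raises precision and therefore the F-score, even when recall is unchanged.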
Egypt, located on the isthmus of Africa, is an ideal region to study historical population dynamics due to its geographic location and documented interactions with ancient civilizations in Africa, Asia and Europe. In particular, in the first millennium BCE, Egypt endured foreign domination, leading to growing numbers of foreigners living within its borders, possibly contributing genetically to the local population. Here we present 90 mitochondrial genomes as well as genome-wide data sets from three individuals obtained from Egyptian mummies. The samples recovered from Middle Egypt span around 1,300 years of ancient Egyptian history from the New Kingdom to the Roman Period. Our analyses reveal that ancient Egyptians shared more ancestry with Near Easterners than present-day Egyptians, who received additional sub-Saharan admixture in more recent times. This analysis establishes ancient Egyptian mummies as a genetic source to study ancient human history and offers the perspective of deciphering Egypt's past at a genome-wide level.
The broadening utilisation of ancient DNA to address archaeological, palaeontological, and biological questions is resulting in a rising diversity in the size of laboratories and scale of analyses being performed. In the context of this heterogeneous landscape, we present an advanced, entirely redesigned, and extended version of the EAGER pipeline for the analysis of ancient genomic data. This Nextflow pipeline aims to address three main themes: accessibility and adaptability to different computing configurations, reproducibility to ensure robust analytical standards, and updating the pipeline to the latest routine ancient genomic practices. The new version of EAGER has been developed within the nf-core initiative to ensure high-quality software development and maintenance support, contributing to a long-term life-cycle for the pipeline. nf-core/eager will assist in ensuring that a wider range of ancient DNA analyses can be applied by a diverse range of research groups and fields.
Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool to study the immune state in health and following immune challenges such as infectious diseases, (auto)immune diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow which integrates the Immcantation Framework following BCR and TCR sequencing data analysis best practices. The Immcantation Framework is a comprehensive toolset, which allows the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community-contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19-infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets.
Life scientists are increasingly turning to high-throughput sequencing technologies in their research programs, owing to the enormous potential of these methods. In parallel, the number of core facilities that provide bioinformatics support is also increasing. Notably, the generation of large, complex datasets has necessitated the development of bioinformatics support core facilities that aid laboratory scientists with cost-effective and efficient data management, analysis, and interpretation. In this article, we address the challenges (related to communication, good laboratory practice, and data handling) that may be encountered in core support facilities when providing bioinformatics support, drawing on our own experiences working as support bioinformaticians on multidisciplinary research projects. Most importantly, the article proposes a list of guidelines that outline how these challenges can be preemptively avoided and effectively managed to increase the value of outputs to the end user, covering the entire research project lifecycle, including experimental design, data analysis, and management (i.e., sharing and storage). In addition, we highlight the importance of clear and transparent communication, comprehensive preparation, appropriate handling of samples and data using monitoring systems, and the employment of appropriate tools and standard operating procedures to provide effective bioinformatics support.
Gallbladder cancer is associated with a dismal prognosis, and accurate in vivo models will be essential for improving our understanding of this deadly disease and developing better treatment options. We have generated a transplantation-based murine model for gallbladder cancer that histologically mimics the human disease, including the development of distant metastasis. Murine gallbladder-derived organoids are genetically modified by either retroviral transduction or transfection with CRISPR/Cas9-encoding plasmids, thereby allowing the rapid generation of complex cancer genotypes. We characterize the model in the presence of two of the most frequent oncogenic drivers, Kras and ERBB2, and provide evidence that the tumor histology is highly dependent on the driver oncogene. Further, we demonstrate the utility of the model for the preclinical assessment of novel therapeutic approaches by showing that liposomal Irinotecan (Nal-IRI) is retained in tumor cells and significantly prolongs the survival of gallbladder cancer-bearing mice compared to conventional irinotecan.
The automated reconstruction of genome sequences in ancient genome analysis is a multifaceted process.
Here we introduce EAGER, a time-efficient pipeline, which greatly simplifies the analysis of large-scale genomic data sets. EAGER provides features to preprocess, map, authenticate, and assess the quality of ancient DNA samples. Additionally, EAGER comprises tools to genotype samples to discover, filter, and analyze variants.
EAGER encompasses both state-of-the-art tools for each step as well as new complementary tools tailored for ancient DNA data within a single integrated solution in an easily accessible format.
The NanoString™ nCounter® technology platform is a widely used targeted quantification platform for the analysis of gene expression of up to ∼800 genes. Whereas the software tools by the manufacturer can perform the analysis in an interactive and GUI-driven approach, there is no portable and user-friendly workflow available that can be used to perform reproducible analysis of multiple samples simultaneously in a scalable fashion on different computing infrastructures.
Here, we present the nf-core/nanostring open-source pipeline to perform a comprehensive analysis including quality control and additional features such as expression visualization, annotation with additional metadata and input creation for differential gene expression analysis. The workflow features easy installation, comprehensive documentation, open-source code with the possibility for further extensions, strong portability across multiple computing environments and detailed quality metrics reporting covering all parts of the pipeline. nf-core/nanostring has been implemented in the Nextflow workflow language and supports the Docker, Singularity, and Podman container technologies as well as Conda environments, enabling easy deployment on any Nextflow-compatible system, including the most widely used cloud computing environments such as Google Cloud Platform (GCP) or Amazon Web Services (AWS).
The source code, documentation and installation instructions as well as results for continuous tests are freely available at https://github.com/nf-core/nanostring and https://nf-co.re/nanostring.