Fungi are major ecological players in both terrestrial and aquatic environments by cycling organic matter and channelling nutrients across trophic levels. High-throughput sequencing (HTS) studies of fungal communities are redrawing the map of the fungal kingdom by hinting at its enormous, and largely uncharted, taxonomic and functional diversity. However, HTS approaches come with a range of pitfalls and potential biases, cautioning against unwary application and interpretation of HTS technologies and results. In this Review, we provide an overview and practical recommendations for aspects of HTS studies ranging from sampling and laboratory practices to data processing and analysis. We also discuss upcoming trends and techniques in the field and summarize recent and noteworthy results from HTS studies targeting fungal communities and guilds. Our Review highlights the need for reproducibility and public data availability in the study of fungal communities. If the associated challenges and conceptual barriers are overcome, HTS offers immense possibilities in mycology and elsewhere.
DNA sequences are increasingly seen as one of the primary information sources for species identification in many organism groups. Such approaches, popularly known as barcoding, are underpinned by the assumption that the reference databases used for comparison are sufficiently complete and feature correctly and informatively annotated entries.
The present study uses a large set of fungal DNA sequences from the inclusive International Nucleotide Sequence Database to show that the taxon sampling of fungi is far from complete, that about 20% of the entries may be incorrectly identified to species level, and that the majority of entries lack descriptive and up-to-date annotations.
The problems with taxonomic reliability and insufficient annotations in public DNA repositories form a tangible obstacle to sequence-based species identification, and it is manifest that the greatest challenges to biological barcoding will be of taxonomical, rather than technical, nature.
The nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen genetic marker for the molecular identification of fungi in environmental sequencing and molecular ecology studies. Several analytical issues complicate such efforts, one of which is the formation of chimeric (artificially joined) DNA sequences during PCR amplification or sequence assembly. Several software tools are currently available for chimera detection, but rely to various degrees on the presence of a chimera-free reference dataset for optimal performance. However, no such dataset is available for use with the fungal ITS region. This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi. This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. The performance of the dataset on a large set of artificial chimeras was above 99.5%, and we subsequently used the dataset to remove nearly 1,000 compromised fungal ITS sequences from public circulation. The dataset is available at http://unite.ut.ee/repository.php and is subject to web-based third-party curation.
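To illustrate what benchmarking on "artificial chimeras" can involve, here is a minimal Python sketch that joins the 5' part of one parent sequence to the 3' part of another and scores a detector on the resulting known positives. The function names, the single-breakpoint model, and the toy sequences are assumptions made for illustration; the abstract does not describe how the study built or scored its chimeras.

```python
import random

def make_chimera(parent_a: str, parent_b: str, cut: int) -> str:
    """Join the 5' part of parent_a to the 3' part of parent_b at position cut."""
    if not 0 < cut < min(len(parent_a), len(parent_b)):
        raise ValueError("cut must fall inside both parent sequences")
    return parent_a[:cut] + parent_b[cut:]

def random_chimeras(sequences, n, seed=0):
    """Build n single-breakpoint chimeras from randomly chosen parent pairs.

    Hypothetical helper: assumes every sequence has at least two bases.
    """
    rng = random.Random(seed)
    chimeras = []
    for _ in range(n):
        parent_a, parent_b = rng.sample(sequences, 2)
        cut = rng.randrange(1, min(len(parent_a), len(parent_b)))
        chimeras.append(make_chimera(parent_a, parent_b, cut))
    return chimeras

def detection_rate(flags):
    """Fraction of known chimeras that a detector flagged (True = flagged)."""
    return sum(flags) / len(flags)

# Toy parents, not real ITS sequences.
parents = ["AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"]
test_set = random_chimeras(parents, n=5)
```

In practice the `flags` list would come from running a chimera-detection tool against a reference dataset; that step is deliberately left out here.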
Here, we describe the taxon hypothesis (TH) paradigm, which covers the construction, identification, and communication of taxa as datasets. Defining taxa as datasets of individuals and their traits will make taxon identification and most importantly communication of taxa precise and reproducible. This will allow datasets with standardized and atomized traits to be used digitally in identification pipelines and communicated through persistent identifiers. Such datasets are particularly useful in the context of formally undescribed or even physically undiscovered species if data such as sequences from samples of environmental DNA (eDNA) are available. Implementing the TH paradigm will to some extent remove the impediment to hastily discover and formally describe all extant species in that the TH paradigm allows discovery and communication of new species and other taxa also in the absence of formal descriptions. The TH datasets can be connected to a taxonomic backbone providing access to the vast information associated with the tree of life. In parallel to the description of the TH paradigm, we demonstrate how it is implemented in the UNITE digital taxon communication system. UNITE TH datasets include rich data on individuals and their rDNA ITS sequences. These datasets are equipped with digital object identifiers (DOI) that serve to fix their identity in our communication. All datasets are also connected to a GBIF taxonomic backbone. Researchers processing their eDNA samples using UNITE datasets will, thus, be able to publish their findings as taxon occurrences in the GBIF data portal. UNITE species hypothesis (species level THs) datasets are increasingly utilized in taxon identification pipelines and even formally undescribed species can be identified and communicated by using UNITE. The TH paradigm seeks to achieve unambiguous, unique, and traceable communication of taxa and their properties at any level of the tree of life. It offers a rapid way to discover and communicate undescribed species in identification pipelines and data portals before they are lost to the sixth mass extinction.
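The idea of a taxon as a dataset of individuals and traits, addressed through a persistent identifier, can be pictured with a small Python data structure. This is only a sketch: the class names, field names, and the placeholder DOI and accession are hypothetical and do not reflect the actual UNITE or GBIF schemas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Individual:
    accession: str                     # hypothetical specimen or sequence identifier
    its_sequence: str                  # rDNA ITS sequence of the individual
    traits: Dict[str, str] = field(default_factory=dict)  # atomized traits

@dataclass
class TaxonHypothesis:
    doi: str                           # persistent identifier (placeholder value below)
    rank: str                          # e.g. "species" for a species hypothesis
    backbone_name: str                 # link to a name in a taxonomic backbone
    individuals: List[Individual] = field(default_factory=list)

    def accessions(self) -> List[str]:
        """Identifiers of all individuals that make up this taxon dataset."""
        return [ind.accession for ind in self.individuals]

def resolve(registry: Dict[str, TaxonHypothesis], doi: str) -> Optional[TaxonHypothesis]:
    """Look up a taxon hypothesis by its identifier; None if it is unknown."""
    return registry.get(doi)

# Placeholder record: the DOI, accession, and sequence are invented for illustration.
example = TaxonHypothesis(
    doi="10.0000/EXAMPLE.TH1",
    rank="species",
    backbone_name="Fungi sp.",
    individuals=[Individual("SEQ0001", "ACGTACGT", {"source": "soil eDNA"})],
)
registry = {example.doi: example}
```

Keying the registry by identifier rather than by name mirrors the point made above: an undescribed species can still be referred to unambiguously because the identifier, not a formal name, fixes the dataset.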
Novel high-throughput sequencing methods outperform earlier approaches in terms of resolution and magnitude. They enable identification and relative quantification of community members and offer new insights into fungal community ecology. These methods are currently taking over as the primary tool to assess fungal communities of plant-associated endophytes, pathogens, and mycorrhizal symbionts, as well as free-living saprotrophs.
Taking advantage of the collective experience of six research groups, we here review the different stages involved in fungal community analysis, from field sampling via laboratory procedures to bioinformatics and data interpretation. We discuss potential pitfalls, alternatives, and solutions.
Highlighted topics are challenges involved in: obtaining representative DNA/RNA samples and replicates that encompass the targeted variation in community composition, selection of marker regions and primers, options for amplification and multiplexing, handling of sequencing errors, and taxonomic identification.
Without awareness of methodological biases, limitations of markers, and bioinformatics challenges, large-scale sequencing projects risk yielding artificial results and misleading conclusions.
Fungi strongly influence ecosystem structure and functioning, playing a key role in many ecological services as decomposers, plant mutualists and pathogens. The Mediterranean area is a biodiversity hotspot that is increasingly threatened by intense land use. Therefore, to achieve a balance between conservation and human development, a better understanding of the impact of land use on the underlying fungal communities is needed.
We used parallel pyrosequencing of the nuclear ribosomal ITS regions to characterize the fungal communities in five soils subjected to different anthropogenic impact in a typical Mediterranean landscape: a natural cork-oak forest, a pasture, a managed meadow, and two vineyards. Marked differences in the distribution of taxon assemblages among the different sites and communities were found. Data analyses consistently indicated a sharp distinction of the fungal community of the cork-oak forest soil from those described in the other soils. In each soil, features of the retrieved fungal assemblages could be readily related to the above-ground setting: ectomycorrhizal phylotypes were numerous in natural sites covered by trees, but were nearly completely missing from the anthropogenic and grass-covered sites; similarly, coprophilous fungi were common in grazed sites.
The data suggest that investigating the below-ground fungal community may provide useful information on above-ground features such as vegetation cover and agronomic practices, making it possible to assess the cost of anthropogenic land use to hidden diversity in soil. The datasets provided in this study may contribute to future searches for fungal bio-indicators as biodiversity markers of a specific site or degree of land use.
Soil fungi and oomycetes (syn. peronosporomycetes) are the most common causes of pea diseases, and these pathogens often occur in complexes involving several species. Information on the dynamics within this complex of pathogens, and also between the complex of pathogens and other fungi in the development of root disease is limited. In this study, next-generation sequencing of nuclear ribosomal internal transcribed spacer-1 was used to characterize fungal communities in agricultural soils from nine pea fields, in which pea roots showed different degrees of disease. Fungal species richness, diversity, and community composition were analyzed and compared among the different pea soils. After filtering for quality and excluding non-fungal sequences, 55,460 sequences clustering into 434 operational taxonomic units (OTUs) were obtained from the nine soil samples. These sequences were found to correspond to 145–200 OTUs in each soil. The fungal communities in the nine soils were strongly dominated by Ascomycota and Basidiomycota. Phoma, Podospora, Pseudaleuria, and Veronaea, at genus level, correlated with the disease severity index of pea roots; Phoma was most abundant in soils with diseased plants, whereas Podospora, Pseudaleuria, and Veronaea were most abundant in healthy soils. No correlation was found between the disease severity index and the abundance of some of the other fungi and oomycetes normally considered as root pathogens in pea.
► NGS revealed diverse fungal communities in soils with diseased and healthy pea.
► P. medicaginis was the most abundant species and correlated to the DSI of pea roots.
► Phoma, Podospora, Pseudaleuria, and Veronaea all correlated to the DSI of pea roots.
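The per-soil richness and diversity comparison mentioned in the pea-soil abstract can be sketched from a table of OTU read counts. The abstract does not say which diversity index was used, so the Shannon index below is an assumption, and the OTU names and counts are invented toy values rather than data from the study.

```python
import math

def richness(otu_counts):
    """Number of OTUs observed (count > 0) in one sample."""
    return sum(1 for count in otu_counts.values() if count > 0)

def shannon_index(otu_counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(otu_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for count in otu_counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h

# Toy read counts per OTU for two hypothetical soils.
soils = {
    "soil_A": {"OTU1": 50, "OTU2": 50, "OTU3": 0},
    "soil_B": {"OTU1": 98, "OTU2": 1, "OTU3": 1},
}
summary = {name: (richness(c), shannon_index(c)) for name, c in soils.items()}
```

In this toy case soil_B has more observed OTUs than soil_A but a lower Shannon value, because almost all of its reads belong to a single OTU; richness and diversity answer different questions.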
The International Space Station (ISS) is a unique built environment due to the effects of microgravity, space radiation, elevated carbon dioxide levels, and especially continuous human habitation. Understanding the composition of the ISS microbial community will facilitate further development of safety and maintenance practices. The primary goal of this study was to characterize the viable microbiome of the ISS-built environment. A second objective was to determine if the built environments of Earth-based cleanrooms associated with space exploration are an appropriate model of the ISS environment.
Samples collected from the ISS and two cleanrooms at the Jet Propulsion Laboratory (JPL, Pasadena, CA) were analyzed by traditional cultivation, adenosine triphosphate (ATP), and propidium monoazide-quantitative polymerase chain reaction (PMA-qPCR) assays to estimate viable microbial populations. The 16S rRNA gene Illumina iTag sequencing was used to elucidate microbial diversity and explore differences between ISS and cleanroom microbiomes. Statistical analyses showed that members of the phyla Actinobacteria, Firmicutes, and Proteobacteria were dominant in the samples examined but varied in abundance. Actinobacteria were predominant in the ISS samples whereas Proteobacteria, least abundant in the ISS, dominated in the cleanroom samples. The viable bacterial populations seen by PMA treatment were greatly decreased. However, the treatment did not appear to have an effect on the bacterial composition (diversity) associated with each sampling site.
The results of this study provide strong evidence that specific human skin-associated microorganisms make a substantial contribution to the ISS microbiome, which is not the case in Earth-based cleanrooms. For example, Corynebacterium and Propionibacterium (Actinobacteria) but not Staphylococcus (Firmicutes) species are dominant on the ISS in terms of viable and total bacterial community composition. The results obtained will facilitate future studies to determine how stable the ISS environment is over time. The present results also demonstrate the value of measuring viable cell diversity and population size at any sampling site. This information can be used to identify sites that can be targeted for more stringent cleaning. Finally, the results will allow comparisons with other built sites and facilitate future improvements on the ISS that will ensure astronaut health.
Summary
The nuclear ribosomal internal transcribed spacer (ITS) region is the primary choice for molecular identification of fungi. Its two highly variable spacers (ITS1 and ITS2) are usually species specific, whereas the intercalary 5.8S gene is highly conserved. For sequence clustering and blast searches, it is often advantageous to rely on either one of the variable spacers but not the conserved 5.8S gene. To identify and extract ITS1 and ITS2 from large taxonomic and environmental data sets is, however, often difficult, and many ITS sequences are incorrectly delimited in the public sequence databases.
We introduce ITSx, a Perl‐based software tool to extract ITS1, 5.8S and ITS2 – as well as full‐length ITS sequences – from both Sanger and high‐throughput sequencing data sets. ITSx uses hidden Markov models computed from large alignments of a total of 20 groups of eukaryotes, including fungi, metazoans and plants, and the sequence extraction is based on the predicted positions of the ribosomal genes in the sequences.
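Once the positions of the flanking ribosomal genes have been predicted, the extraction step reduces to slicing the sequence between them. The Python sketch below shows only that slicing; it is not ITSx code, it does not run any hidden Markov models, and the `GeneBoundaries` type, its field names, and the toy sequence are hypothetical.

```python
from typing import Dict, NamedTuple

class GeneBoundaries(NamedTuple):
    """Predicted 0-based coordinates (end positions are exclusive)."""
    ssu_end: int     # end of the 18S (SSU) gene
    s58_start: int   # start of the 5.8S gene
    s58_end: int     # end of the 5.8S gene
    lsu_start: int   # start of the 28S (LSU) gene

def extract_regions(seq: str, b: GeneBoundaries) -> Dict[str, str]:
    """Slice ITS1, 5.8S, ITS2 and the full ITS region out of seq."""
    if not 0 <= b.ssu_end <= b.s58_start <= b.s58_end <= b.lsu_start <= len(seq):
        raise ValueError("gene boundaries are out of order or outside the sequence")
    return {
        "ITS1": seq[b.ssu_end:b.s58_start],
        "5.8S": seq[b.s58_start:b.s58_end],
        "ITS2": seq[b.s58_end:b.lsu_start],
        "full_ITS": seq[b.ssu_end:b.lsu_start],
    }

# Toy sequence: letters mark the regions and are not real nucleotides.
toy = "SSSS" + "aaa" + "FFFFF" + "bb" + "LLL"
regions = extract_regions(toy, GeneBoundaries(4, 7, 12, 14))
```

A sequence for which the boundaries cannot be placed in this order would be rejected here, which is loosely analogous to how non-ITS sequences could be screened out of a data set.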
ITSx has a very high proportion of true‐positive extractions and a low proportion of false‐positive extractions. Additionally, process parallelization permits expedient analyses of very large data sets, such as a one million sequence amplicon pyrosequencing data set. ITSx is rich in features and written to be easily incorporated into automated sequence analysis pipelines.
ITSx paves the way for more sensitive blast searches and sequence clustering operations for the ITS region in eukaryotes. The software also permits elimination of non‐ITS sequences from any data set. This is particularly useful for amplicon‐based next‐generation sequencing data sets, where insidious non‐target sequences are often found among the target sequences. Such non‐target sequences are difficult to find by other means and would contribute noise to diversity estimates if left in the data set.