Species are considered the fundamental unit in many ecological and evolutionary analyses, yet accurate, complete, accessible taxonomic frameworks with which to identify them are often unavailable to researchers. In such cases DNA sequence-based species delimitation has been proposed as a means of estimating species boundaries for further analysis. Several methods have been proposed to accomplish this. Here we present a Bayesian implementation of an evolutionary model-based method, the general mixed Yule-coalescent model (GMYC). Our implementation integrates over the parameters of the model and uncertainty in phylogenetic relationships using the output of widely available phylogenetic models and Markov chain Monte Carlo (MCMC) simulation in order to produce marginal probabilities of species identities.
We conducted simulations testing the effects of species evolutionary history, levels of intraspecific sampling and number of nucleotides sequenced. We also re-analyze the dataset used to introduce the original GMYC model. We found that the model results are improved with addition of DNA sequence and increased sampling, although these improvements have limits. The most important factor in the success of the model is the underlying phylogenetic history of the species under consideration. Recent and rapid divergences result in higher amounts of uncertainty in the model and eventually cause the model to fail to accurately assess uncertainty in species limits.
Our results suggest that the GMYC model can be useful under a wide variety of circumstances, particularly in cases where divergences are deeper, or taxon sampling is incomplete, as in many studies of ecological communities, but that, in accordance with expectations from coalescent theory, rapid, recent radiations may yield inaccurate results. Our implementation differs from existing ones in two ways: it accounts for important sources of uncertainty (phylogenetic uncertainty and uncertainty in the parameters specific to the model), and it allows the specification of informative prior distributions that can increase the precision of the model. We have incorporated this model into a user-friendly R package available on the authors' websites.
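As a rough illustration of what "marginal probabilities of species identities" can mean in practice, the sketch below estimates the probability that two sequences are conspecific as the fraction of posterior samples in which they share a cluster. It is a minimal sketch, not the interface of the R package described above: the function name, the input layout (one dictionary of cluster labels per MCMC sample) and the sequence names are assumptions made for the example.

```python
from itertools import combinations


def coassignment_probabilities(samples):
    """Estimate pairwise conspecificity probabilities from posterior samples.

    samples: list of dicts, one per MCMC sample, each mapping a sequence
    name to the cluster label it received in that sample (hypothetical layout).
    """
    names = sorted(samples[0])
    counts = {pair: 0 for pair in combinations(names, 2)}
    for assignment in samples:
        for a, b in counts:
            # Count the samples in which the two sequences share a cluster.
            if assignment[a] == assignment[b]:
                counts[(a, b)] += 1
    n = len(samples)
    return {pair: c / n for pair, c in counts.items()}


# Hypothetical posterior: four sampled delimitations of three sequences.
samples = [
    {"seq1": 0, "seq2": 0, "seq3": 1},
    {"seq1": 0, "seq2": 0, "seq3": 1},
    {"seq1": 0, "seq2": 1, "seq3": 1},
    {"seq1": 0, "seq2": 0, "seq3": 0},
]
probs = coassignment_probabilities(samples)
```

Because each sample can come from a different tree and a different parameter draw, averaging over samples in this way is what folds phylogenetic and parameter uncertainty into a single per-pair probability.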
How to fail at species delimitation. Carstens, Bryan C.; Pelletier, Tara A.; Reid, Noah M. ...
Molecular Ecology, September 2013, Volume 22, Issue 17
Journal article. Peer reviewed. Open access.
Species delimitation is the act of identifying species‐level biological diversity. In recent years, the field has witnessed a dramatic increase in the number of methods available for delimiting species. However, most recent investigations only utilize a handful (i.e. 2–3) of the available methods, often for unstated reasons. Because the parameter space that is potentially relevant to species delimitation far exceeds the parameterization of any existing method, a given method necessarily makes a number of simplifying assumptions, any one of which could be violated in a particular system. We suggest that researchers should apply a wide range of species delimitation analyses to their data and place their trust in delimitations that are congruent across methods. Incongruence across the results from different methods is evidence either of a difference in the power to detect cryptic lineages across one or more of the approaches used to delimit species or of a violation of the assumptions of one or more of the methods. In either case, the inferences drawn from species delimitation studies should be conservative, for in most contexts it is better to fail to delimit species than it is to falsely delimit entities that do not represent actual evolutionary lineages.
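The recommendation to trust only delimitations that are congruent across methods can be sketched as a set intersection: keep a group of samples only if every method recovers exactly that group. This is a minimal sketch under assumed inputs; the method names, sample names and labels are hypothetical and do not come from any real analysis, and strict identity of groups is only one possible definition of congruence.

```python
def congruent_groups(delimitations):
    """Return the groups of samples that every method delimits identically.

    delimitations: dict mapping a method name to a dict that maps each
    sample to the group label assigned by that method (hypothetical layout).
    """
    per_method = []
    for assignment in delimitations.values():
        groups = {}
        for sample, label in assignment.items():
            groups.setdefault(label, set()).add(sample)
        # Labels differ between methods, so compare group membership only.
        per_method.append({frozenset(g) for g in groups.values()})
    return set.intersection(*per_method)


# Hypothetical results from three delimitation methods on four samples.
delimitations = {
    "method_a": {"s1": "A", "s2": "A", "s3": "B", "s4": "C"},
    "method_b": {"s1": "x", "s2": "x", "s3": "y", "s4": "y"},
    "method_c": {"s1": 1, "s2": 1, "s3": 2, "s4": 3},
}
supported = congruent_groups(delimitations)
```

Here only the group containing `s1` and `s2` is recovered by all three methods; the status of `s3` and `s4` is incongruent and, following the conservative stance above, would be left unresolved.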
Pleistocene glacial cycles drastically changed the distributions of taxa endemic to temperate rainforests in the Pacific Northwest, with many experiencing reduced habitat suitability during glacial periods. In this study, we investigate whether glacial cycles promoted intraspecific divergence and whether subsequent range changes led to secondary contact and gene flow. For seven invertebrate species endemic to the PNW, we estimated species distribution models (SDMs) and projected them onto current and historical climate conditions to assess how habitat suitability changed during glacial cycles. Using single nucleotide polymorphism (SNP) data from these species, we assessed population genetic structure and used a machine‐learning approach to compare models with and without gene flow between populations upon secondary contact after the last glacial maximum (LGM). Finally, we estimated divergence times and rates of gene flow between populations. SDMs suggest that there was less suitable habitat in the North Cascades and Northern Rocky Mountains during glacial compared to interglacial periods, resulting in reduced habitat suitability and increased habitat fragmentation during the LGM. Our genomic data identify population structure in all taxa, and support gene flow upon secondary contact in five of the seven taxa. Parameter estimates suggest that population divergences date to the later Pleistocene for most populations. Our results support a role of refugial dynamics in driving intraspecific divergence in the Cascades Range. In these invertebrates, population structure often does not correspond to current biogeographic or environmental barriers. Rather, population structure may reflect refugial lineages that have since expanded their ranges, often leading to secondary contact between once isolated lineages.
The gut microbiota of vertebrates are essential to host health. Most non-model vertebrates, however, lack even a basic description of natural gut microbiota biodiversity. Here, we sampled 116 intestines from 59 Neotropical bird species and used the V6 region of the 16S rRNA molecule as a microbial fingerprint (average coverage per bird ~80,000 reads). A core microbiota of Proteobacteria, Firmicutes, Bacteroidetes, and Actinobacteria was identified, as well as several gut-associated genera. We tested 18 categorical variables associated with each bird for significant correlation to the gut microbiota; host taxonomic categories were most frequently significant and explained the most variation. Ecological variables (e.g., diet, foraging stratum) were also frequently significant but explained less variation. Little evidence was found for a significant influence of geographic space. Finally, we suggest that microbial sampling during field collection of organisms would propel biological understanding of evolutionary history and ecological significance of host-associated microbiota.
Species are a fundamental unit for biological studies, yet no uniform guidelines exist for determining species limits in an objective manner. Given the large number of species concepts available, defining species can be both highly subjective and biased. Although morphology has been commonly used to determine species boundaries, the availability and prevalence of genetic data has allowed researchers to use such data to make inferences regarding species limits. Genetic data also have been used in the detection of cryptic species, where other lines of evidence (morphology in particular) may underestimate species diversity. In this study, we investigate species limits in a complex of morphologically conserved trapdoor spiders (Mygalomorphae, Antrodiaetidae, Aliatypus) from California. Multiple approaches were used to determine species boundaries in this highly genetically fragmented group, including both multilocus discovery and validation approaches (plus a chimeric approach). Additionally, we introduce a novel tree-based discovery approach using species trees. Results suggest that this complex includes multiple cryptic species, with two groupings consistently recovered across analyses. Due to incongruence across analyses for the remaining samples, we take a conservative approach and recognize a three species complex, and formally describe two new species (Aliatypus roxxiae, sp. nov. and Aliatypus starretti, sp. nov.). This study helps to clarify species limits in a genetically fragmented group and provides a framework for identifying and defining the cryptic lineage diversity that prevails in many organismal groups.
The conservation status of most plant species is currently unknown, despite the fundamental role of plants in ecosystem health. To facilitate the costly process of conservation assessment, we developed a predictive protocol using a machine-learning approach to predict conservation status of over 150,000 land plant species. Our study uses open-source geographic, environmental, and morphological trait data, making this the largest assessment of conservation risk to date and the only global assessment for plants. Our results indicate that a large number of unassessed species are likely at risk and identify several geographic regions with the highest need of conservation efforts, many of which are not currently recognized as regions of global concern. By providing conservation-relevant predictions at multiple spatial and taxonomic scales, predictive frameworks such as the one developed here fill a pressing need for biodiversity science.
Vocalizations in animals, particularly birds, are critically important behaviors that influence their reproductive fitness. While recordings of bioacoustic data have been captured and stored in collections for decades, the automated extraction of data from these recordings has only recently been facilitated by artificial intelligence methods. These have yet to be evaluated with respect to accuracy of different automation strategies and features. Here, we use a recently published machine learning framework to extract syllables from ten bird species ranging in their phylogenetic relatedness from 1 to 85 million years, to compare how phylogenetic relatedness influences accuracy. We also evaluate the utility of applying trained models to novel species. Our results indicate that model performance is best on conspecifics, with accuracy progressively decreasing as phylogenetic distance increases between taxa. However, we also find that the application of models trained on multiple distantly related species can improve the overall accuracy to levels near that of training and analyzing a model on the same species. When planning big-data bioacoustics studies, care must be taken in sample design to maximize sample size and minimize human labor without sacrificing accuracy.
Most approaches to species delimitation to date have considered divergence-only models. Although these models are appropriate for allopatric speciation, their failure to incorporate many of the population-level processes that drive speciation, such as gene flow (e.g., in sympatric speciation), places an unnecessary limit on our collective understanding of the processes that produce biodiversity. To consider these processes while inferring species boundaries, we introduce the R-package delimitR and apply it to identify species boundaries in the reticulate taildropper slug (Prophysaon andersoni). Results suggest that secondary contact is an important mechanism driving speciation in this system. By considering process, we both avoid erroneous inferences that can be made when population-level processes such as secondary contact drive speciation but only divergence is considered, and gain insight into the process of speciation in terrestrial slugs. Further, we apply delimitR to three published empirical datasets and find results corroborating previous findings. Finally, we evaluate the performance of delimitR using simulation studies, and find that error rates are near zero when comparing models that include lineage divergence and gene flow for three populations with a modest number of Single Nucleotide Polymorphisms (SNPs; 1500) and moderate divergence times (<100,000 generations). When we apply delimitR to a complex model set (i.e., including divergence, gene flow, and population size changes), error rates are moderate (~0.15; 10,000 SNPs), and, when present, misclassifications occur among highly similar models.
We describe a software package (SpedeSTEM) that allows researchers to conduct a species delimitation analysis using intraspecific genetic data. Our method operates under the assumption that a priori information regarding group membership is available, for example that samples are drawn from some number of described subspecies, races or distinct morphotypes. SpedeSTEM proceeds by calculating the maximum likelihood species tree from all hierarchical arrangements of the sampled alleles and uses information theory to quantify the model probability of each permutation. SpedeSTEM is tested here against empirical and simulated data; results indicate that evolutionary lineages that diverged as few as 0.5N generations in the past can be validated as distinct using sequence data from as few as five loci. This work enables speciation investigations to identify lineages that are evolutionarily distinct and thus have the potential to form new species before these lineages acquire secondary characteristics such as reproductive isolation or morphological differentiation that are commonly used to define species.
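One common way to turn maximum likelihood scores into model probabilities with information theory is to compute Akaike weights. The sketch below assumes AIC is the criterion; the abstract above does not say which criterion SpedeSTEM uses, so this is an illustration of the general idea rather than the package's procedure. The model names, log-likelihoods and parameter counts are hypothetical.

```python
import math


def akaike_weights(models):
    """Convert log-likelihoods into Akaike weights (relative model probabilities).

    models: dict mapping a model name to (log_likelihood, n_parameters).
    """
    # AIC = 2k - 2 ln L for each candidate model.
    aic = {name: 2 * k - 2 * lnl for name, (lnl, k) in models.items()}
    best = min(aic.values())
    # Relative likelihood of each model given its AIC difference from the best.
    rel = {name: math.exp(-0.5 * (a - best)) for name, a in aic.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Hypothetical lineage arrangements with made-up likelihoods.
models = {
    "one_lineage": (-1050.0, 1),
    "two_lineages": (-1040.0, 2),
    "three_lineages": (-1039.5, 3),
}
weights = akaike_weights(models)
best_model = max(weights, key=weights.get)
```

The weights sum to one, so each can be read as the relative support for one arrangement of lineages within the set of arrangements compared.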
► Applications of next-generation sequencing to phylogeography are few. ► We discuss methods for sample preparation and data analysis. ► Restriction enzyme digest methods are best for closely related lineages. ► Sequence capture using conserved probes is appropriate for phylogenetics. ► Data analysis is a challenging and evolving area of research.
This is a time of unprecedented transition in DNA sequencing technologies. Next-generation sequencing (NGS) clearly holds promise for fast and cost-effective generation of multilocus sequence data for phylogeography and phylogenetics. However, the focus on non-model organisms, in addition to uncertainty about which sample preparation methods and analyses are appropriate for different research questions and evolutionary timescales, has contributed to a lag in the application of NGS to these fields. Here, we outline some of the major obstacles specific to the application of NGS to phylogeography and phylogenetics, including the focus on non-model organisms, the necessity of obtaining orthologous loci in a cost-effective manner, and the predominant use of gene trees in these fields. We describe the most promising methods of sample preparation that address these challenges. Methods that reduce the genome by restriction digest and manual size selection are most appropriate for studies at the intraspecific level, whereas methods that target specific genomic regions (i.e., target enrichment or sequence capture) have wider applicability from the population level to deep-level phylogenomics. Additionally, we give an overview of how to analyze NGS data to arrive at data sets applicable to the standard toolkit of phylogeography and phylogenetics, including initial data processing to alignment and genotype calling (both SNPs and loci involving many SNPs). Even though whole-genome sequencing is likely to become affordable rather soon, because phylogeography and phylogenetics rely on analysis of hundreds of individuals in many cases, methods that reduce the genome to a subset of loci should remain more cost-effective for some time to come.