This article presents W-IQ-TREE, an intuitive and user-friendly web interface and server for IQ-TREE, an efficient phylogenetic software for maximum likelihood analysis. W-IQ-TREE supports multiple sequence types (DNA, protein, codon, binary and morphology) in common alignment formats and a wide range of evolutionary models including mixture and partition models. W-IQ-TREE performs fast model selection, partition scheme finding, efficient tree reconstruction, ultrafast bootstrapping, branch tests, and tree topology tests. All computations are conducted on a dedicated computer cluster and the users receive the results via URL or email. W-IQ-TREE is available at http://iqtree.cibiv.univie.ac.at. It is free and open to all users and there is no login requirement.
In phylogenomics, the analysis of concatenated gene alignments, the so-called supermatrix, is commonly accompanied by the assumption of partition models. Under such models each gene, or more generally partition, is allowed to evolve under its own evolutionary model. Although partition models provide a more comprehensive analysis of supermatrices, missing data may hamper the tree search algorithms due to the existence of phylogenetic (partial) terraces. Here, we introduce the phylogenetic terrace aware (PTA) data structure for the efficient analysis under partition models. In the presence of missing data, PTA exploits (partial) terraces and induced partition trees to save computation time. We show that an implementation of PTA in IQ-TREE leads to a substantial speedup of up to 4.5 and 8 times compared with the standard IQ-TREE and RAxML implementations, respectively. PTA is generally applicable to all types of partition models and common topological rearrangements and thus can be employed by all phylogenomic inference software.
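The notion of induced partition trees can be illustrated with a small sketch. It is not the PTA data structure itself, only the underlying idea: a tree is restricted to the taxa that have data in a partition, and two trees lie on the same terrace when every partition induces the same subtree. The split-based tree encoding, the function names, and the toy taxa are all assumptions made for illustration.

```python
def induced_splits(splits, all_taxa, subset):
    """Restrict a tree (given as one side of each internal split) to `subset`.

    Only non-trivial restricted splits (at least two taxa on each side)
    are kept, since these determine the induced unrooted topology.
    """
    induced = set()
    for side in splits:
        a = side & subset
        b = (all_taxa - side) & subset
        if len(a) >= 2 and len(b) >= 2:
            induced.add(frozenset([a, b]))
    return induced


def same_terrace(tree1, tree2, all_taxa, partitions):
    """True if both trees induce identical partition trees for every partition."""
    return all(
        induced_splits(tree1, all_taxa, p) == induced_splits(tree2, all_taxa, p)
        for p in partitions
    )


# Hypothetical toy data: five taxa, two partitions with missing taxa.
ALL_TAXA = frozenset("ABCDE")
PARTITIONS = [frozenset("ABCD"), frozenset("ABCE")]  # taxa with data per partition

tree1 = [frozenset("AB"), frozenset("DE")]  # ((A,B),C,(D,E))
tree2 = [frozenset("AB"), frozenset("CD")]  # ((A,B),(C,D),E)
tree3 = [frozenset("AC"), frozenset("DE")]  # ((A,C),B,(D,E))

print(same_terrace(tree1, tree2, ALL_TAXA, PARTITIONS))  # True
print(same_terrace(tree1, tree3, ALL_TAXA, PARTITIONS))  # False
```

In this toy case, rearranging tree1 into tree2 changes no induced partition tree, which is the kind of situation in which per-partition results could be reused instead of recomputed.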
Defining direct targets of transcription factors and regulatory pathways is key to understanding their roles in physiology and disease. We combined SLAM-seq (thiol(SH)-linked alkylation for the metabolic sequencing of RNA), a method for direct quantification of newly synthesized messenger RNAs (mRNAs), with pharmacological and chemical-genetic perturbation in order to define regulatory functions of two transcriptional hubs in cancer, BRD4 and MYC, and to interrogate direct responses to BET bromodomain inhibitors (BETis). We found that BRD4 acts as a general coactivator of RNA polymerase II-dependent transcription, which is broadly repressed upon high-dose BETi treatment. At doses triggering selective effects in leukemia, BETis deregulate a small set of hypersensitive targets including MYC. In contrast to BRD4, MYC primarily acts as a selective transcriptional activator controlling metabolic processes such as ribosome biogenesis and de novo purine synthesis. Our study establishes a simple and scalable strategy to identify direct transcriptional targets of any gene or pathway.
The standard bootstrap (SBS), despite being computationally intensive, is widely used in maximum likelihood phylogenetic analyses. We recently proposed the ultrafast bootstrap approximation (UFBoot) to reduce computing time while achieving more unbiased branch supports than SBS under mild model violations. UFBoot has been steadily adopted as an efficient alternative to SBS and other bootstrap approaches. Here, we present UFBoot2, which substantially accelerates UFBoot and reduces the risk of overestimating branch supports due to polytomies or severe model violations. Additionally, UFBoot2 provides suitable bootstrap resampling strategies for phylogenomic data. UFBoot2 is 778 times (median) faster than SBS and 8.4 times (median) faster than RAxML rapid bootstrap on tested data sets. UFBoot2 is implemented in the IQ-TREE software package version 1.6 and freely available at http://www.iqtree.org.
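For readers unfamiliar with bootstrap resampling of alignments, the following is a minimal sketch of the generic idea, not of UFBoot2 itself. It shows plain site resampling, as in the standard bootstrap, next to one plausible partition-aware variant that resamples sites within each partition. The abstract does not say which strategies UFBoot2 implements, so the second function, all names, and the toy alignment are assumptions.

```python
import random


def resample_sites(alignment, rng):
    """Standard bootstrap: draw alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


def resample_within_partitions(alignment, partitions, rng):
    """Assumed partition-aware variant: resample columns inside each partition.

    `partitions` is a list of half-open (start, end) column ranges, so every
    pseudo-replicate keeps the original partition lengths.
    """
    cols = []
    for start, end in partitions:
        cols.extend(rng.randrange(start, end) for _ in range(end - start))
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


# Hypothetical toy alignment with two "genes" of four sites each.
alignment = {"taxon1": "ACGTTTGA", "taxon2": "ACGATTGC", "taxon3": "ACCTTAGA"}
partitions = [(0, 4), (4, 8)]

rng = random.Random(42)  # fixed seed for reproducibility
replicate = resample_within_partitions(alignment, partitions, rng)
print(replicate)
```

Each pseudo-replicate would then be passed to tree inference, and branch support is the fraction of replicates in which a branch is recovered.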
Organoids enable in vitro modeling of complex developmental processes and disease pathologies. Like most 3D cultures, organoids lack sufficient oxygen supply and therefore experience cellular stress. These negative effects are particularly prominent in complex models, such as brain organoids, and can affect lineage commitment. Here, we analyze brain organoid and fetal single‐cell RNA sequencing (scRNAseq) data from published and new datasets, totaling about 190,000 cells. We identify a unique stress signature in the data from all organoid samples, but not in fetal samples. We demonstrate that cell stress is limited to a defined subpopulation of cells that is unique to organoids and does not affect neuronal specification or maturation. We have developed a computational algorithm, Gruffi, which uses granular functional filtering to identify and remove stressed cells from any organoid scRNAseq dataset in an unbiased manner. We validated our method using six additional datasets from different organoid protocols and early brains, and show its usefulness to other organoid systems including retinal organoids. Our data show that the adverse effects of cell stress can be corrected by bioinformatic analysis for improved delineation of developmental trajectories and resemblance to in vivo data.
Synopsis
Cellular stress in 3D organoids due to insufficient oxygen transport can affect faithful lineage commitment and disease modeling. This work identifies a subpopulation of stressed cells characterized by a distinct gene expression signature that can be removed from scRNAseq datasets for better evaluation of fetal developmental trajectories.
ER‐ and glycolytic stress are found across organoid protocols.
Stressed cells in 3D organoids form a separate cell state that is not found in vivo and that can be separated bioinformatically.
The presence of stressed cells in organoids does not affect cell‐type specification or the maturation of non‐stressed neurons.
Granular functional filtering (Gruffi) removes stressed cells from the single‐cell datasets but retains cell types found in vivo.
Stress removal leads to clearer developmental trajectories.
A unique stress signature found in brain organoid samples but not in fetal samples can be quantified and removed from scRNAseq datasets using a new algorithm.
GHOST. Crotty, Stephen M.; Minh, Bui Quang; Bean, Nigel G. ... Systematic Biology, 03/2020, Volume 69, Issue 2.
Molecular sequence data that have evolved under the influence of heterotachous evolutionary processes are known to mislead phylogenetic inference. We introduce the General Heterogeneous evolution On a Single Topology (GHOST) model of sequence evolution, implemented under a maximum-likelihood framework in the phylogenetic program IQ-TREE (http://www.iqtree.org). Simulations show that using the GHOST model, IQ-TREE can accurately recover the tree topology, branch lengths, and substitution model parameters from heterotachously evolved sequences. We investigate the performance of the GHOST model on empirical data by sampling phylogenomic alignments of varying lengths from a plastome alignment. We then carry out inference under the GHOST model on a phylogenomic data set composed of 248 genes from 16 taxa, where we find the GHOST model concurs with the currently accepted view, placing turtles as a sister lineage of archosaurs, in contrast to results obtained using traditional variable rates-across-sites models. Finally, we apply the model to a data set composed of a sodium channel gene of 11 fish taxa, finding that the GHOST model is able to elucidate a subtle component of the historical signal, linked to the previously established convergent evolution of the electric organ in two geographically distinct lineages of electric fish. We compare inference under the GHOST model to partitioning by codon position and show that, owing to the minimization of model constraints, the GHOST model offers unique biological insights when applied to empirical data.
Despite vast differences between organisms, some characteristics of their genomes are conserved, such as the nucleolus organizing region (NOR). The NOR is constituted of multiple, highly repetitive rDNA genes, encoding the catalytic ribosomal core RNAs which are transcribed from 45S rDNA units. Their precise sequence information and organization remain uncharacterized. Here, using a combination of long- and short-read sequencing technologies, we assemble contigs of the Arabidopsis NOR2 rDNA domain. We identify several expressed rRNA gene variants which are integrated into translating ribosomes in a tissue-specific manner. These findings support the concept of tissue-specific ribosome subpopulations that differ in their rRNA composition and provide insights into the higher order organization of NOR2.
EST sequencing is a versatile approach for rapidly gathering protein coding sequences. ESTs provide direct access to an organism's gene repertoire, bypassing the still error-prone procedure of gene prediction from genomic data. Therefore, ESTs are often the only source for biological sequence data from taxa outside mainstream interest. The widespread use of ESTs in evolutionary studies and particularly in molecular systematics studies is still hindered by the lack of efficient and reliable approaches for automated ortholog predictions in ESTs. Existing methods either depend on a known species tree or cannot cope with redundancy in EST data.
We present a novel approach (HaMStR) to mine EST data for the presence of orthologs to a curated set of genes. HaMStR combines a profile Hidden Markov Model search and a subsequent BLAST search to extend existing ortholog clusters with sequences from further taxa. We show that the HaMStR results are consistent with those obtained with existing orthology prediction methods that require completely sequenced genomes. A case study on the phylogeny of 35 fungal taxa illustrates that HaMStR is well suited to compile informative data sets for phylogenomic studies from ESTs and protein sequence data.
HaMStR extends, in a standardized manner, a pre-defined set of orthologs with ESTs from further taxa. In the same fashion HaMStR can be applied to protein sequence data, and thus provides a comprehensive approach to compile ortholog clusters from any protein coding data. The resulting orthology predictions serve as the data basis for a variety of evolutionary studies. Here, we have demonstrated the application of HaMStR in a molecular systematics study. However, we envision that studies tracing the evolutionary fate of individual genes or functional complexes of genes will greatly benefit from HaMStR orthology predictions as well.
SolariX is a compendium of DNA sequence tags from the nucleotide binding site (NBS) domain of disease resistance genes of the common potato, Solanum tuberosum Group Tuberosum. The sequences, which we call NBS tags, for nearly all NBS domains from 91 genomes (representing a wide range of historical and contemporary potato cultivars, 24 breeding programs and 200 years) were generated using just 16 amplification primers and high-throughput sequencing. The NBS tags were mapped to 587 NBS domains on the draft potato genome DM, where we detected an average, over all the samples, of 26 nucleotide polymorphisms on each locus. The total number of NBS domains observed differed between potato cultivars. However, both modern and old cultivars possessed comparable levels of variability, and neither the individual breeder or country nor the generation or time appeared to correlate with the NBS domain frequencies. Our attempts to detect haplotypes (i.e., sets of linked nucleotide polymorphisms) frequently yielded more than the possible 4 alleles per domain, indicating potential locus intermixing during the mapping of NBS tags to the DM reference genome. Mapping inaccuracies were likely a consequence of the differences between each cultivar and the reference genome used, coupled with high levels of NBS domain sequence similarity. We illustrate that the SolariX database is useful to search for polymorphisms linked with NBS-LRR R gene alleles conferring specific disease resistance and to develop molecular markers for selection.
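The haplotype check described above (more distinct haplotypes at a domain than the 4 possible alleles) can be sketched as a simple count. This is an illustrative sketch only, not the SolariX pipeline: the input layout, the `min_reads` support threshold, and all names are assumptions.

```python
from collections import Counter

MAX_ALLELES = 4  # per-domain limit stated in the abstract


def flag_overfull_domains(haplotypes_by_domain, min_reads=1):
    """Return domains with more supported haplotypes than MAX_ALLELES.

    `haplotypes_by_domain` maps a domain name to a list of haplotype strings,
    one per mapped read (hypothetical input format). A haplotype counts as
    supported if at least `min_reads` reads carry it.
    """
    flagged = {}
    for domain, reads in haplotypes_by_domain.items():
        counts = Counter(reads)
        supported = [h for h, n in counts.items() if n >= min_reads]
        if len(supported) > MAX_ALLELES:
            flagged[domain] = len(supported)
    return flagged


# Hypothetical toy data: haplotypes written as strings of linked variants.
example = {
    "domain_1": ["ACG", "ACT", "GCG", "GCT", "ATT", "ACG"],  # 5 distinct
    "domain_2": ["TTA", "TTA", "TCA", "CCA"],                # 3 distinct
}
print(flag_overfull_domains(example))
```

A flagged domain would then be a candidate for locus intermixing, that is, reads from more than one paralogous NBS domain mapped to the same reference locus.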