Detection of microbial DNA is an evolutionarily conserved mechanism that alerts the host immune system to mount a defense response to microbial infections. However, this detection mechanism also poses a challenge to the host as to how to distinguish foreign DNA from abundant self-DNA. Cyclic guanosine monophosphate (GMP)-adenosine monophosphate (AMP) synthase (cGAS) is a DNA sensor that triggers innate immune responses through production of the second messenger cyclic GMP-AMP (cGAMP), which binds and activates the adaptor protein STING. However, cGAS can be activated by double-stranded DNA irrespective of the sequence, including self-DNA. Although how cGAS is normally kept inactive in cells is still not well understood, recent research has provided strong evidence that genomic DNA damage leads to cGAS activation to stimulate inflammatory responses. This review summarizes recent findings on how genomic instability and DNA damage trigger cGAS activation and how cGAS serves as a link from DNA damage to inflammation, cellular senescence, and cancer.
Phylogenetics is a powerful tool for analyzing protein sequences, by inferring their evolutionary relationships to other proteins. However, phylogenetic analyses can be challenging: they are computationally expensive and must be performed carefully in order to avoid systematic errors and artifacts. Protein Analysis THrough Evolutionary Relationships (PANTHER; http://pantherdb.org) is a publicly available, user‐focused knowledgebase that stores the results of an extensive phylogenetic reconstruction pipeline that includes computational and manual processes and quality control steps. First, fully reconciled phylogenetic trees (including ancestral protein sequences) are reconstructed for a set of “reference” protein sequences obtained from fully sequenced genomes of organisms across the tree of life. Second, the resulting phylogenetic trees are manually reviewed and annotated with function evolution events: inferred gains and losses of protein function along branches of the phylogenetic tree. Here, we describe in detail the current contents of PANTHER, how those contents are generated, and how they can be used in a variety of applications. The PANTHER knowledgebase can be downloaded or accessed via an extensive API. In addition, PANTHER provides software tools to facilitate the application of the knowledgebase to common protein sequence analysis tasks: exploring an annotated genome by gene function; performing “enrichment analysis” of lists of genes; annotating a single sequence or large batch of sequences by homology; and assessing the likelihood that a genetic variant at a particular site in a protein will have deleterious effects.
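As an illustration of the kind of "enrichment analysis" mentioned above, the following is a minimal sketch of a generic over-representation test based on the hypergeometric distribution. It is not PANTHER's implementation and does not call the PANTHER API; the function name and the numbers in the usage example are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for a hypergeometric draw.

    N: number of genes in the reference set (assumed background)
    K: number of reference genes carrying the annotation of interest
    n: number of genes in the user's list
    k: number of genes in the user's list carrying the annotation
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k):
        raise ValueError("inconsistent counts")
    upper = min(n, K)  # the overlap cannot exceed either set size
    total = comb(N, n)  # number of ways to draw the gene list
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 reference genes, 200 annotated,
    # a list of 100 genes of which 5 carry the annotation.
    p = hypergeom_upper_tail(k=5, n=100, K=200, N=20000)
    print(f"over-representation p-value: {p:.3g}")
```

In practice a test like this is repeated for every annotation term, so the resulting p-values would normally be corrected for multiple testing before interpretation.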
Abstract
Analysis of anchored hybrid enrichment (AHE) data under a variety of analytical parameters for a broadly representative sample of taxa (136 species representing all extant families) recovered a well‐resolved and strongly supported tree for the higher phylogeny of Neuropterida that is highly concordant with previous estimates based on DNA sequence data. Important conclusions include: Megaloptera is sister to Neuroptera; Coniopterygidae is sister to all other lacewings; Osmylidae, Nevrorthidae and Sisyridae are recovered as a monophyletic Osmyloidea, and Rhachiberothidae and Berothidae were recovered within a paraphyletic Mantispidae. Contrary to previous studies, Chrysopidae and Hemerobiidae were not recovered as sister families and morphological similarities between larvae of both families supporting this assumption are reinterpreted as symplesiomorphies. Relationships among myrmeleontoid families are similar to recent studies except Ithonidae are placed as sister to Nymphidae. Notably, Ascalaphidae render Myrmeleontidae paraphyletic, again calling into question the status of Ascalaphidae as a separate family. Using statistical binning of partitioned loci based on a branch‐length proxy, we found that the diversity of phylogenetic signal across partitions was minimal from the slowest to the fastest evolving loci and varied little over time. Ancestral character‐state reconstruction of the sclerotization of the gular region in the larval head found that although it is present in Coleoptera, Raphidioptera and Megaloptera, it is lost early in lacewing evolution and then regained twice as a nonhomologous gula‐like sclerite in distantly related clades. Reconstruction of the ancestral larval habitat also indicates that the ancestral neuropteridan larva was aquatic, regardless of the assumed condition (i.e., aquatic or terrestrial) of the outgroup (Coleopterida).
The targeted deletion, replacement, integration or inversion of genomic sequences could be used to study or treat human genetic diseases, but existing methods typically require double-strand DNA breaks (DSBs) that lead to undesired consequences, including uncontrolled indel mixtures and chromosomal abnormalities. Here we describe twin prime editing (twinPE), a DSB-independent method that uses a prime editor protein and two prime editing guide RNAs (pegRNAs) for the programmable replacement or excision of DNA sequences at endogenous human genomic sites. The two pegRNAs template the synthesis of complementary DNA flaps on opposing strands of genomic DNA, which replace the endogenous DNA sequence between the prime-editor-induced nick sites. When combined with a site-specific serine recombinase, twinPE enabled targeted integration of gene-sized DNA plasmids (>5,000 bp) and targeted sequence inversions of 40 kb in human cells. TwinPE expands the capabilities of precision gene editing and might synergize with other tools for the correction or complementation of large or complex human pathogenic alleles.
The tufted puffin Fratercula cirrhata (Charadriiformes: Alcidae) is distributed throughout the boreal and low Arctic areas of the North Pacific, from California, USA to Hokkaido, Japan. Few studies have investigated the genetic diversity of this species. Therefore, we analyzed the genetic diversity of two captive populations using nucleotide sequences of two mitochondrial loci (COX1 and D-loop) and one nuclear locus (RHBG). We sequenced these loci for birds from Tokyo Sea Life Park (Kasai Rinkai Suizokuen), originally from Alaska, and birds from Aqua World Oarai, originally from far eastern Russia. We found five COX1 haplotypes and 17 D-loop haplotypes for the mitochondrial loci, and obtained 14 predicted haplotypes for the nuclear RHBG locus. The major haplotypes of all three loci occurred in individuals from both populations. Thus, there were no clear genetic differences between the populations with respect to these three loci. Although the breeding range of the tufted puffin covers the boreal and low Arctic from California to Hokkaido, our results suggest that the species has not genetically diverged within its breeding range.
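The haplotype comparison described above can be sketched in a few lines. This is a minimal illustration only, assuming aligned sequences of equal length are already available as strings; the toy sequences and function names are hypothetical and are not data or methods from the study. Haplotype diversity is computed with Nei's standard estimator, h = n/(n-1) * (1 - sum of squared haplotype frequencies).

```python
from collections import Counter


def haplotype_counts(seqs):
    """Collapse identical sequences into haplotypes and count each one."""
    return Counter(seqs)


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = haplotype_counts(seqs)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def shared_haplotypes(pop_a, pop_b):
    """Haplotypes observed in both populations."""
    return set(pop_a) & set(pop_b)


if __name__ == "__main__":
    # Toy, made-up sequences standing in for one locus in two populations.
    pop_a = ["ACGT", "ACGT", "ACGA", "ACGT"]
    pop_b = ["ACGT", "ACGA", "TCGT", "ACGT"]
    print("haplotypes in A:", len(haplotype_counts(pop_a)))
    print("haplotypes in B:", len(haplotype_counts(pop_b)))
    print("shared:", sorted(shared_haplotypes(pop_a, pop_b)))
    print("diversity A:", round(haplotype_diversity(pop_a), 3))
```

If the most frequent haplotypes appear in the shared set for every locus, that matches the pattern the abstract describes as showing no clear differentiation between populations.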
Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data. We further show that DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes, highlighting the benefits of using more automated and generalizable techniques for variant calling.
Clustal Omega is a widely used package for carrying out multiple sequence alignment. Here, we describe some recent additions to the package and benchmark some alternative ways of making alignments. These benchmarks are based on protein structure comparisons or predictions and include a recently described method based on secondary structure prediction. In general, Clustal Omega is fast enough to make very large alignments and the accuracy of protein alignments is high when compared to alternative packages. The package is freely available as executables or source code from www.clustal.org or can be run on‐line from a variety of sites, especially the EBI www.ebi.ac.uk.
The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes in a rapid and cost-effective way. Here we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome, de novo, from short reads alone (67× coverage). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Ae. aegypti and Culex quinquefasciatus, each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that almost all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, and accurate, and can be applied to many species.
Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. By use of convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.