It has been proposed that patterns in the usage of synonymous codons provide evidence that individual tRNA molecules are recycled through the ribosome, translating several occurrences of the same amino acid before diffusing away. The claimed evidence is based on counting the frequency with which pairs of synonymous codons are used at nearby occurrences of the same amino acid, as compared to the frequency expected if each codon were chosen independently from a single genome-wide distribution. We show that such statistics simply measure variation in codon preferences across a genome. As a negative control on the potential contribution of pressure to exploit tRNA recycling on these signals, we examine correlations in the usage of codons that encode different amino acids. We find that these controls are statistically as strong as the claimed evidence and conclude that there is no informatic evidence that tRNA recycling is a force shaping codon usage.
• Synonymous codon usage patterns are known to vary across the genome and within genes
• Mathematically, this variation implies a diagonal-positive local covariance signal
• That signal is thus not evidence for molecular tRNA reuse by the ribosome
• Rather, it reflects a complicated covariance structure across 61 codons
Hussmann and Press find that observed excesses in pairs of coding sequence occurrences of the same synonymous codon are a generic consequence of the existence of spatial variation in codon preferences and therefore cannot be interpreted as evidence for the recycling of individual tRNA molecules during translation.
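The argument above can be illustrated with a toy calculation. The sketch below is not the authors' analysis: it assumes a single amino acid with only two synonymous codons, equal weight for every gene, and independent codon choice within each gene (so no reuse mechanism exists). The function name `same_codon_pair_excess` and the preference values are hypothetical.

```python
from statistics import mean


def same_codon_pair_excess(gene_prefs):
    """Excess of same-codon pairs over the genome-wide expectation.

    gene_prefs: per-gene probability of choosing codon A rather than its
    synonym B (illustrative values, not measured data). Codons are drawn
    independently within each gene, so nothing here models tRNA recycling.
    """
    # Observed: chance that two nearby occurrences in the same gene match,
    # averaged over genes with equal weight.
    observed = mean(p * p + (1 - p) * (1 - p) for p in gene_prefs)
    # Expected: the same chance if every codon came from one
    # genome-wide distribution with preference q.
    q = mean(gene_prefs)
    expected = q * q + (1 - q) * (1 - q)
    return observed - expected


# No variation in preference across genes: no excess.
uniform = same_codon_pair_excess([0.5, 0.5, 0.5, 0.5])
# Preferences vary across genes: a positive excess appears.
varied = same_codon_pair_excess([0.2, 0.8, 0.3, 0.7])
```

In this two-codon model the excess equals twice the variance of the per-gene preference, so it is positive whenever preferences vary, with no recycling involved.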
Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed–Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine–cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
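As an illustration of the kind of sequence constraints mentioned above, the following sketch checks a strand for long single-base runs and for out-of-range windowed GC content. It is not part of HEDGES: the function name, the reading of "excess repeats" as homopolymer runs, and all thresholds (`window`, `gc_min`, `gc_max`, `max_run`) are assumptions chosen for the example.

```python
def violates_constraints(seq, window=12, gc_min=0.25, gc_max=0.75, max_run=4):
    """Return True if seq breaks either illustrative constraint.

    All parameter values are hypothetical defaults, not taken from the paper.
    """
    # Constraint 1: no run of identical bases longer than max_run.
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    # Constraint 2: GC fraction of every full window stays within bounds.
    for i in range(len(seq) - window + 1):
        gc = sum(base in "GC" for base in seq[i:i + window]) / window
        if not gc_min <= gc <= gc_max:
            return True
    return False


balanced = violates_constraints("ACGTACGTACGTACGT")  # passes both checks
long_run = violates_constraints("ACGTAAAAAGCT")      # five A's in a row
gc_rich = violates_constraints("GCGCGCGCGCGCAT")     # first window is all G/C
```

Sequences shorter than `window` are only subject to the run check in this sketch.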
Modeling, post COVID-19
Press, William H; Levin, Richard C
Science (American Association for the Advancement of Science), 2020-Nov-27, Volume 370, Issue 6520
Journal Article
Engineered SpCas9s and AsCas12a cleave fewer off-target genomic sites than wild-type (wt) Cas9. However, understanding their fidelity, mechanisms and cleavage outcomes requires systematic profiling across mispaired target DNAs. Here we describe NucleaSeq (nuclease digestion and deep sequencing), a massively parallel platform that measures the cleavage kinetics and time-resolved cleavage products for over 10,000 targets containing mismatches, insertions and deletions relative to the guide RNA. Combining cleavage rates and binding specificities on the same target libraries, we benchmarked five SpCas9 variants and AsCas12a. A biophysical model built from these data sets revealed mechanistic insights into off-target cleavage. Engineered Cas9s, especially Cas9-HF1, dramatically increased cleavage specificity but not binding specificity compared to wtCas9. Surprisingly, AsCas12a cleavage specificity differed little from that of wtCas9. Initial DNA cleavage sites and end trimming varied by nuclease, guide RNA and the positions of mispaired nucleotides. More broadly, NucleaSeq enables rapid, quantitative and systematic comparisons of specificity and cleavage outcomes across engineered and natural nucleases.
Predefined sets of short DNA sequences are commonly used as barcodes to identify individual biomolecules in pooled populations. Such use requires either sufficiently small DNA error rates, or else an error-correction methodology. Most existing DNA error-correcting codes (ECCs) correct only one or two errors per barcode in sets of typically ≲10⁴ barcodes. We here consider the use of random barcodes of sufficient length that they remain accurately decodable even with ≳6 errors and even at ∼10% or 20% nucleotide error rates. We show that length ∼34 nt is sufficient even with ≳10⁶ barcodes. The obvious objection to this scheme is that it requires comparing every read to every possible barcode by a slow Levenshtein or Needleman-Wunsch comparison. We show that several orders of magnitude speedup can be achieved by (i) a fast triage method that compares only trimer (three consecutive nucleotide) occurrence statistics, precomputed in linear time for both reads and barcodes, and (ii) the massive parallelism available on today's even commodity-grade Graphics Processing Units (GPUs). With 10⁶ barcodes of length 34 and 10% DNA errors (substitutions and indels), we achieve in simulation 99.9% precision (decode accuracy) with 98.8% recall (read acceptance rate). Similarly high precision with somewhat smaller recall is achievable even with 20% DNA errors. The amortized computation cost on a commodity workstation with two GPUs (2022 capability and price) is estimated as between US$ 0.15 and US$ 0.60 per million decoded reads.
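The two-stage idea in this abstract (cheap trimer triage, then an exact edit-distance comparison on the survivors) can be sketched on the CPU as follows. This is a minimal illustration, not the paper's implementation: the L1 distance between trimer-count tables is an assumed triage score, the `keep` parameter and the toy barcodes are hypothetical, and no GPU parallelism is shown.

```python
from collections import Counter


def trimer_counts(seq):
    """Count each three-nucleotide window in seq (one linear pass)."""
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def trimer_distance(a, b):
    """L1 distance between two trimer-count tables (assumed triage score)."""
    return sum(abs(a[k] - b[k]) for k in a.keys() | b.keys())


def levenshtein(s, t):
    """Edit distance counting substitutions, insertions, and deletions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion from s
                           cur[j - 1] + 1,             # insertion into s
                           prev[j - 1] + (cs != ct)))  # match or substitution
        prev = cur
    return prev[-1]


def decode(read, barcodes, barcode_trimers, keep=2):
    """Return the index of the best barcode among the `keep` triage survivors."""
    read_tri = trimer_counts(read)
    ranked = sorted(range(len(barcodes)),
                    key=lambda i: trimer_distance(read_tri, barcode_trimers[i]))
    return min(ranked[:keep], key=lambda i: levenshtein(read, barcodes[i]))


# Toy barcode set; trimer tables are precomputed once.
barcodes = ["ACGTACGTTGCA", "TTTTGGGGCCCC", "GATTACAGATTA"]
barcode_trimers = [trimer_counts(b) for b in barcodes]

read = "GATTACGATTA"  # third barcode with one base deleted
best = decode(read, barcodes, barcode_trimers)
```

Only the `keep` closest barcodes by trimer statistics reach the quadratic-time `levenshtein` step, which is where the speedup over an all-pairs comparison would come from.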
Historically, the evolution of bats has been analyzed using a small number of genetic loci for many species or many genetic loci for a few species. Here we present a phylogeny of 18 bat species, each of which is represented in 1,107 orthologous gene alignments used to build the tree. We generated a transcriptome sequence of Hypsignathus monstrosus, the African hammer-headed bat, and additional transcriptome sequence for Rousettus aegyptiacus, the Egyptian fruit bat. We then combined these data with existing genomic and transcriptomic data from 16 other bat species. In the analysis of such datasets, there is no clear consensus on the most reliable computational methods for the curation of quality multiple sequence alignments since these public datasets represent multiple investigators and methods, including different source materials (chromosomal DNA or expressed RNA). Here we lay out a systematic analysis of parameters and produce an advanced pipeline for curating orthologous gene alignments from combined transcriptomic and genomic data, including a software package: the Mismatching Isoform eXon Remover (MIXR). Using this method, we created alignments of 11,677 bat genes, 1,107 of which contain orthologs from all 18 species. Using the orthologous gene alignments created, we assessed bat phylogeny and also performed a holistic analysis of positive selection acting in bat genomes. We found that 181 genes have been subject to positive natural selection. This list is dominated by genes involved in immune responses and genes involved in the production of collagens.
Create a COVID-19 commission
Chyba, Christopher F; Cassel, Christine K; Graham, Susan L ...
Science (American Association for the Advancement of Science), 2021-Nov-19, Volume 374, Issue 6570
Journal Article
Peer reviewed
Open access
We need a definitive public reference for the history of events.