The MicroArray Quality Control consortium, a 16-year international effort led by the FDA and involving hundreds of scientists from academia, industry and government, helped make genomic medicine a reality.
Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.
Introduction: Over the past decade, loop-mediated isothermal amplification (LAMP) technology has played an important role in molecular diagnostics. Amongst numerous nucleic acid amplification assays, LAMP stands out in terms of sample-to-answer time, sensitivity, specificity, cost, robustness, and accessibility, making it ideal for field-deployable diagnostics in resource-limited regions.
Areas covered: In this review, we outline the front-end LAMP design practices for point-of-care (POC) applications, including sample handling and various signal readout methodologies. Next, we explore existing LAMP technologies that have been validated with clinical samples in the field. We summarize recent work that utilizes reverse transcription (RT) LAMP to rapidly detect SARS-CoV-2 as an alternative to standard PCR protocols. Finally, we describe challenges in translating LAMP from the benchtop to the field and opportunities for future LAMP assay development and performance reporting.
Expert opinion: Despite the popularity of LAMP in the academic research community and a recent surge in interest in LAMP due to the COVID-19 pandemic, the fundamental understanding of LAMP still needs improvement in numerous areas to elevate the field of LAMP assay development and characterization.
Chemoenzymatic modification of proteins is an attractive option to create highly specific conjugates for therapeutics, diagnostics, or materials under gentle biological conditions. However, these methods often suffer from expensive specialized substrates, bulky fusion tags, low yields, and extra purification steps to achieve the desired conjugate. Staphylococcus aureus sortase A and its engineered variants are used to attach oligoglycine derivatives to the C-terminus of proteins expressed with a minimal LPXTG tag. This strategy has been used extensively for bioconjugation in vitro and for protein–protein conjugation in living cells. Here we show that an enzyme variant recently engineered for higher activity on oligoglycine has promiscuous activity that allows proteins to be tagged using a diverse array of small, commercially available amines, including several bioorthogonal functional groups. This technique can also be carried out in living Escherichia coli, enabling simple, inexpensive production of chemically functionalized proteins with no additional purification steps.
Standardized benchmarking approaches are required to assess the accuracy of variants called from sequence data. Although variant-calling tools and the metrics used to assess their performance continue to improve, important challenges remain. Here, as part of the Global Alliance for Genomics and Health (GA4GH), we present a benchmarking framework for variant calling. We provide guidance on how to match variant calls with different representations, define standard performance metrics, and stratify performance by variant type and genome context. We describe limitations of high-confidence calls and regions that can be used as truth sets (for example, single-nucleotide variant concordance of two methods is 99.7% inside versus 76.5% outside high-confidence regions). Our web-based app enables comparison of variant calls against truth sets to obtain a standardized performance report. Our approach has been piloted in the PrecisionFDA variant-calling challenges to identify the best-in-class variant-calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and evaluating the results.
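As an illustration of the kind of standardized, stratified metrics described above, the following minimal Python sketch computes precision, recall and F1 per stratum from true-positive, false-positive and false-negative counts. The `Counts` class, the `summarize` function, the stratum names and all counts are hypothetical and chosen only for illustration; a real comparison must first reconcile differing variant representations and may count true positives separately in the truth and query callsets, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Counts:
    """Comparison counts for one stratum (hypothetical container)."""
    tp: int  # truth variants matched by a query call
    fp: int  # query calls in the benchmark regions with no truth match
    fn: int  # truth variants missed by the query


def summarize(c: Counts) -> Dict[str, float]:
    """Return precision, recall and F1; NaN where a metric is undefined."""
    called = c.tp + c.fp
    truth = c.tp + c.fn
    precision = c.tp / called if called else float("nan")
    recall = c.tp / truth if truth else float("nan")
    denom = precision + recall
    # A zero or NaN denominator leaves F1 undefined.
    f1 = 2 * precision * recall / denom if denom > 0 else float("nan")
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up counts, stratified by variant type and genome context.
strata = {
    "SNV / all benchmark regions": Counts(tp=90, fp=10, fn=30),
    "indel / homopolymers": Counts(tp=40, fp=20, fn=40),
}

for name, counts in strata.items():
    m = summarize(counts)
    print(f"{name}: precision={m['precision']:.3f} "
          f"recall={m['recall']:.3f} f1={m['f1']:.3f}")
```

Reporting each stratum separately, rather than one pooled number, is what lets a weakness in a difficult context stay visible next to strong overall performance.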
Thousands of experiments and studies use the human reference genome as a resource each year. This single reference genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genomes from multiple human populations to avoid potential biases.
Here, we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides as compared to 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Eleven genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes.
The Ash1 genome is presented as a reference for any genetic studies involving Ashkenazi Jewish individuals.
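The assembly and annotation figures reported for Ash1 can be put in proportion with a few lines of arithmetic. The Python sketch below uses only the numbers stated in the abstract; the variable names are ours, and the derived quantities are simple differences and ratios rather than results reported by the authors.

```python
# Figures as stated in the abstract.
ASH1_BP = 2_973_118_650      # nucleotides in Ash1
GRCH38_BP = 2_937_639_212    # nucleotides in GRCh38
GENES_TOTAL = 20_157         # protein-coding genes annotated on Ash1
GENES_NEAR_IDENTICAL = 19_563  # genes >99% identical to GRCh38 counterparts

# Derived quantities (simple arithmetic, not reported results).
extra_bp = ASH1_BP - GRCH38_BP
extra_pct = 100 * extra_bp / GRCH38_BP
genes_remaining = GENES_TOTAL - GENES_NEAR_IDENTICAL
near_identical_pct = 100 * GENES_NEAR_IDENTICAL / GENES_TOTAL

print(f"Ash1 has {extra_bp:,} more nucleotides ({extra_pct:.2f}% of GRCh38)")
print(f"{near_identical_pct:.2f}% of genes are >99% identical; "
      f"{genes_remaining} genes remain")
```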
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
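To make the ≥50 bp size criterion concrete, the Python sketch below classifies a sequence-resolved call as an insertion or deletion from the lengths of its reference and alternate alleles, and checks that the reported insertion and deletion counts add up to the stated total. The `classify_sv` function is a hypothetical helper written for illustration and is not the consortium's integration method, which combines many callsets and applies further criteria.

```python
from typing import Optional

MIN_SV_BP = 50  # size threshold stated in the abstract


def classify_sv(ref: str, alt: str) -> Optional[str]:
    """Label a sequence-resolved call by net length change.

    Returns "insertion" or "deletion" when the alleles differ in length
    by at least MIN_SV_BP, otherwise None (too small to count as an SV here).
    """
    delta = len(alt) - len(ref)
    if delta >= MIN_SV_BP:
        return "insertion"
    if -delta >= MIN_SV_BP:
        return "deletion"
    return None


# Counts as stated in the abstract.
INSERTIONS, DELETIONS, TOTAL = 7_281, 5_464, 12_745
assert INSERTIONS + DELETIONS == TOTAL

# Toy alleles with a shared anchor base, as in VCF-style representations.
print(classify_sv("A", "A" + "T" * 50))   # insertion
print(classify_sv("A" + "T" * 50, "A"))   # deletion
print(classify_sv("A" + "T" * 49, "A"))   # None
```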
High-throughput sequencing of cDNA (RNA-seq) is a widely deployed transcriptome profiling and annotation technique, but questions about the performance of different protocols and platforms remain. We used a newly developed pool of 96 synthetic RNAs with various lengths and GC content, covering a 2^20 concentration range, as spike-in controls to measure sensitivity, accuracy, and biases in RNA-seq experiments as well as to derive standard curves for quantifying the abundance of transcripts. We observed linearity between read density and RNA input over the entire detection range and excellent agreement between replicates, but we observed significantly larger imprecision than expected under pure Poisson sampling errors. We use the control RNAs to directly measure reproducible protocol-dependent biases due to GC content and transcript length as well as stereotypic heterogeneity in coverage across transcripts correlated with position relative to RNA termini and priming sequence bias. These effects lead to biased quantification for short transcripts and individual exons, which is a serious problem for measurements of isoform abundances but can be partially corrected using appropriate models of bias. By using the control RNAs, we derive limits for the discovery and detection of rare transcripts in RNA-seq experiments. By using data collected as part of the model organism and human Encyclopedia of DNA Elements projects (ENCODE and modENCODE), we demonstrate that external RNA controls are a useful resource for evaluating sensitivity and accuracy of RNA-seq experiments for transcriptome discovery and quantification. These quality metrics facilitate comparable analysis across different samples, protocols, and platforms.
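A standard curve of the kind mentioned above can be sketched as a straight-line fit between log2 spike-in concentration and log2 read density, which is then inverted to estimate the abundance of other transcripts. The Python example below is a minimal sketch under stated assumptions: the function names are hypothetical, the data are synthetic and perfectly linear, and ordinary least squares is our choice rather than the authors' documented procedure. The last function gives the coefficient of variation expected under pure Poisson sampling, the baseline against which the abstract reports larger observed imprecision.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(conc: Sequence[float],
                       reads: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log2(reads) = slope * log2(conc) + intercept."""
    xs = [math.log2(c) for c in conc]
    ys = [math.log2(r) for r in reads]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_concentration(read_density: float,
                           slope: float, intercept: float) -> float:
    """Invert the fitted curve to estimate input concentration."""
    return 2 ** ((math.log2(read_density) - intercept) / slope)


def poisson_cv(mean_count: float) -> float:
    """Coefficient of variation expected from Poisson sampling alone."""
    return 1 / math.sqrt(mean_count)


# Synthetic spike-ins spanning a 2^20 range, with reads exactly 3x input.
concentrations = [2.0 ** k for k in range(0, 21, 4)]
read_density = [3.0 * c for c in concentrations]

slope, intercept = fit_standard_curve(concentrations, read_density)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"estimated input for density 3072: "
      f"{estimate_concentration(3072.0, slope, intercept):.1f}")
print(f"Poisson CV at 100 reads: {poisson_cv(100):.2f}")
```

A slope near 1 in log-log space corresponds to the linearity between read density and RNA input that the abstract reports; real data would scatter around the line.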
Our understanding of translation underpins our capacity to engineer living systems. The canonical start codon (AUG) and a few near-cognates (GUG, UUG) are considered the 'start codons' for translation initiation in Escherichia coli. Translation is typically not thought to initiate from the 61 remaining codons. Here, we quantified translation initiation of green fluorescent protein and nanoluciferase in E. coli from all 64 triplet codons and across a range of DNA copy numbers. We detected initiation of protein synthesis above measurement background for 47 codons. Translation from non-canonical start codons ranged from 0.007 to 3% relative to translation from AUG. Translation from 17 non-AUG codons exceeded the highest reported rates of non-cognate codon recognition. Translation initiation from non-canonical start codons may contribute to the synthesis of peptides in both natural and synthetic biological systems.
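The codon bookkeeping in this abstract is easy to reproduce: four bases taken three at a time give 64 codons, and setting aside AUG, GUG and UUG leaves 61. The Python sketch below enumerates them and shows one plausible way to express a signal relative to AUG. The `relative_initiation_percent` function, its background-subtraction scheme and the example readings are assumptions made for illustration, not the authors' documented normalization.

```python
from itertools import product

BASES = "ACGU"
ALL_CODONS = ["".join(p) for p in product(BASES, repeat=3)]

CANONICAL = {"AUG"}
NEAR_COGNATE = {"GUG", "UUG"}
remaining = [c for c in ALL_CODONS if c not in CANONICAL | NEAR_COGNATE]

print(len(ALL_CODONS), len(remaining))  # 64 61


def relative_initiation_percent(signal: float, aug_signal: float,
                                background: float) -> float:
    """Background-subtracted signal as a percentage of the AUG signal.

    Assumed normalization for illustration only.
    """
    return 100 * (signal - background) / (aug_signal - background)


# Made-up reporter readings in arbitrary units.
print(relative_initiation_percent(13.0, aug_signal=1010.0, background=10.0))
```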