The Oxford Nanopore Technologies (ONT) MinION is used for sequencing a wide variety of sample types with diverse methods of sample extraction. Nanopore sequencers output FAST5 files containing signal data that is subsequently basecalled to FASTQ format. Optionally, ONT devices can collect data from all sequencing channels simultaneously in a bulk FAST5 file, enabling inspection of signal in any channel at any point. We sought to visualize this signal to inspect challenging or difficult-to-sequence samples.
The BulkVis tool loads a bulk FAST5 file, overlays MinKNOW (the software that controls ONT sequencers) classifications on the signal trace and can show mappings to a reference. Users can navigate to a channel and time or, given a FASTQ header from a read, jump to its specific position. BulkVis can export regions as reads compatible with the Nanopore basecaller. Using BulkVis, we find that MinKNOW can incorrectly divide long reads, resulting in single DNA molecules being split into two or more reads. The longest seen to date is 2 272 580 bases in length and was reported in eleven consecutive reads. We provide helper scripts that identify and reconstruct split reads given a sequencing summary file and an alignment to a reference. We note that incorrect read splitting appears to vary according to input sample type and is more common in 'ultra-long' read preparations.
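The split-read helper scripts are only described in outline above. A minimal sketch of one plausible heuristic follows, assuming each read record carries its channel, start time and duration (as a sequencing summary file does) plus a primary alignment (chromosome, strand, reference start and end). The `Read` fields, the function `find_split_groups` and the thresholds `max_time_gap` and `max_ref_gap` are hypothetical and are not the interface of the BulkVis scripts. Consecutive reads on one channel are grouped when the pore was idle only briefly between them and they align contiguously, in sequencing order, on the same chromosome and strand.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Read:
    # Hypothetical record combining sequencing-summary and alignment fields.
    read_id: str
    channel: int
    start_time: float  # seconds since run start
    duration: float    # seconds
    chrom: str
    strand: str        # "+" or "-"
    ref_start: int
    ref_end: int


def find_split_groups(reads: List[Read], max_time_gap: float = 1.0,
                      max_ref_gap: int = 1000) -> List[List[str]]:
    """Return lists of read IDs that look like fragments of one molecule."""
    by_channel: Dict[int, List[Read]] = {}
    for r in reads:
        by_channel.setdefault(r.channel, []).append(r)

    groups: List[List[str]] = []
    for channel in sorted(by_channel):
        ordered = sorted(by_channel[channel], key=lambda r: r.start_time)
        current = [ordered[0]]
        for prev, nxt in zip(ordered, ordered[1:]):
            # Idle time on the pore between the end of one read and the next.
            time_gap = nxt.start_time - (prev.start_time + prev.duration)
            same_target = prev.chrom == nxt.chrom and prev.strand == nxt.strand
            # On "+" the next fragment should map just downstream; on "-"
            # it should map just upstream of the previous fragment.
            if prev.strand == "+":
                ref_gap = nxt.ref_start - prev.ref_end
            else:
                ref_gap = prev.ref_start - nxt.ref_end
            if (same_target and 0 <= time_gap <= max_time_gap
                    and abs(ref_gap) <= max_ref_gap):
                current.append(nxt)
            else:
                if len(current) > 1:
                    groups.append([r.read_id for r in current])
                current = [nxt]
        if len(current) > 1:
            groups.append([r.read_id for r in current])
    return groups
```

Each returned group could then be merged into a single record spanning the union of its members' reference coordinates; the two thresholds trade sensitivity against the risk of joining genuinely separate molecules.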
The software is available freely under an MIT license at https://github.com/LooseLab/bulkvis.
Supplementary data are available at Bioinformatics online.
After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist. Here we present a human genome assembly that surpasses the continuity of GRCh38, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.
Abstract
Nosocomial severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections have severely affected bed capacity and patient flow. We utilized whole-genome sequencing (WGS) to identify outbreaks and focus infection control resources and intervention during the United Kingdom’s second pandemic wave in late 2020. Phylogenetic analysis of WGS and epidemiological data pinpointed an initial transmission event to an admission ward, with immediate prior community infection linkage documented. High incidence of asymptomatic staff infection with genetically identical viral sequences was also observed, which may have contributed to the propagation of the outbreak. WGS allowed timely nosocomial transmission intervention measures, including admissions ward point-of-care testing and introduction of portable HEPA14 filters. Conversely, WGS excluded nosocomial transmission in 2 instances with temporospatial linkage, conserving time and resources. In summary, WGS significantly enhanced understanding of SARS-CoV-2 clusters in a hospital setting, both identifying high-risk areas and conversely validating existing control measures in other units, maintaining clinical service overall.
SARS-CoV-2 whole-genome sequencing, combined with classical epidemiology, identified nosocomial transmission events, informing targeted infection prevention measures during the UK’s second epidemic wave. Conversely, nosocomial transmission was excluded by sequencing in 2 instances with temporospatial linkage, conserving time and resources.
Reactive oxygen species are bona fide intracellular second messengers that influence cell metabolism and aging by mechanisms that are incompletely resolved. Mitochondria generate superoxide that is dismutated to hydrogen peroxide, which in turn oxidises cysteine-based enzymes such as phosphatases, peroxiredoxins and redox-sensitive transcription factors to modulate their activity. Signal Transducer and Activator of Transcription 3 (Stat3) has been shown to participate in an oxidative relay with peroxiredoxin II but the impact of Stat3 oxidation on target gene expression and its biological consequences remain to be established. Thus, we created murine embryonic fibroblasts (MEFs) that express either WT-Stat3 or a redox-insensitive mutant of Stat3 (Stat3-C3S). The Stat3-C3S cells differed from WT-Stat3 cells in morphology, proliferation and resistance to oxidative stress; in response to cytokine stimulation, they displayed elevated Stat3 tyrosine phosphorylation and Socs3 expression, implying that Stat3-C3S is insensitive to oxidative inhibition. Comparative analysis of global gene expression in WT-Stat3 and Stat3-C3S cells revealed differential expression (DE) of genes both under basal conditions and during oxidative stress. Using differential gene regulation pattern analysis, we identified 199 genes clustered into 10 distinct patterns that were selectively responsive to Stat3 oxidation. GO term analysis identified down-regulated genes to be enriched for tissue/organ development and morphogenesis and up-regulated genes to be enriched for cell-cell adhesion, immune responses and transport related processes. Although most DE gene promoters contain consensus Stat3 inducible elements (SIEs), our chromatin immunoprecipitation (ChIP) and ChIP-seq analyses did not detect Stat3 binding at these sites in control or oxidant-stimulated cells, suggesting that oxidised Stat3 regulates these genes indirectly.
Our further computational analysis revealed enrichment of hypoxia response elements (HREs) within DE gene promoters, implying a role for Hif-1. Experimental validation revealed that efficient stabilisation of Hif-1α in response to oxidative stress or hypoxia required an oxidation-competent Stat3 and that depletion of Hif-1α suppressed the inducible expression of Kcnb1, a representative DE gene. Our data suggest that Stat3 and Hif-1α cooperate to regulate genes involved in immune functions and developmental processes in response to oxidative stress.
Ribosomal DNA (rDNA) displays substantial inter-individual genetic variation in human and mouse. A systematic analysis of how this variation impacts epigenetic states and expression of the rDNA has thus far not been performed.
Using a combination of long- and short-read sequencing, we establish that 45S rDNA units in the C57BL/6J mouse strain exist as distinct genetic haplotypes that influence the epigenetic state and transcriptional output of any given unit. DNA methylation dynamics at these haplotypes are dichotomous and life-stage specific: at one haplotype, the DNA methylation state is sensitive to the in utero environment, but refractory to post-weaning influences, whereas other haplotypes entropically gain DNA methylation during aging only. On the other hand, individual rDNA units in human show limited evidence of genetic haplotypes, and hence little discernible correlation between genetic and epigenetic states. However, in both species, adjacent units show similar epigenetic profiles, and the overall epigenetic state at rDNA is strongly positively correlated with the total rDNA copy number. Analysis of different mouse inbred strains reveals that in some strains, such as 129S1/SvImJ, the rDNA copy number is only approximately 150 copies per diploid genome and DNA methylation levels are < 5%.
Our work demonstrates that rDNA-associated genetic variation has a considerable influence on rDNA epigenetic state and consequently rRNA expression outcomes. In the future, it will be important to consider the impact of inter-individual rDNA (epi)genetic variation on mammalian phenotypes and diseases.
Pluripotency defines the unlimited potential of individual cells of vertebrate embryos, from which all adult somatic cells and germ cells are derived. Understanding how the programming of pluripotency evolved has been obscured in part by a lack of data from lower vertebrates; in model systems such as frogs and zebrafish, the functions of the pluripotency genes NANOG and POU5F1 have diverged. Here, we investigated how the axolotl ortholog of NANOG programs pluripotency during development. Axolotl NANOG is absolutely required for gastrulation and germ-layer commitment. We show that in axolotl primitive ectoderm (animal caps; ACs) NANOG and NODAL activity, as well as the epigenetic modifying enzyme DPY30, are required for the mass deposition of H3K4me3 in pluripotent chromatin. We also demonstrate that all 3 protein activities are required for ACs to establish the competency to differentiate toward mesoderm. Our results suggest the ancient function of NANOG may be establishing the competence for lineage differentiation in early cells. These observations provide insights into embryonic development in the tetrapod ancestor from which terrestrial vertebrates evolved.
Pregnancy represents a stage during which maternal physiology and homeostatic regulation undergo dramatic change and adaptation. The fundamental purpose of these adaptations is to ensure the survival of her offspring through adequate nutrient provision and an environment that is tolerant to the semi-allogenic foetus. While poor maternal diet during pregnancy is associated with perturbed maternal adaptations, the influence of paternal diet on maternal well-being is less clearly defined. We fed C57BL/6 male mice either a control diet (CD), a low protein diet (LPD), a high fat/sugar Western diet (WD), or the LPD or WD supplemented with methyl donors (MD-LPD and MD-WD, respectively) for a minimum of 8 weeks prior to mating with C57BL/6 females. Mated females were culled at day 17 of gestation for the analysis of maternal metabolic, gut, cardiac and bone health. Paternal diet had minimal influences on maternal serum and hepatic metabolite levels or gut microbiota diversity. However, analysis of the maternal hepatic transcriptome revealed distinct profiles of differential gene expression in response to the diet of the father. Paternal LPD and MD-LPD resulted in differential expression of genes associated with lipid metabolism, transcription, ubiquitin conjugation and immunity in dams, while paternal WD and MD-WD modified the expression of genes associated with ubiquitin conjugation and cardiac morphology. Finally, we observed changes in maternal femur length, volume of trabecular bone, trabecular connectivity, volume of the cortical medullar cavity and thickness of the cortical bone in response to the father’s diet. Our current study demonstrates that poor paternal diet at the time of mating can influence the patterns of maternal metabolism and gestation-associated adaptations to her physiology.
The ongoing SARS-CoV-2 pandemic demonstrates the utility of real-time sequence analysis in monitoring and surveillance of pathogens. However, cost-effective sequencing requires that samples be PCR amplified and multiplexed via barcoding onto a single flow cell, resulting in challenges with maximising and balancing coverage for each sample. To address this, we developed a real-time analysis pipeline to maximise flow cell performance and optimise sequencing time and costs for any amplicon-based sequencing. We extended our nanopore analysis platform MinoTour to incorporate ARTIC network bioinformatics analysis pipelines. MinoTour predicts which samples will reach sufficient coverage for downstream analysis and runs the ARTIC network's Medaka pipeline once sufficient coverage has been reached. We show that stopping a viral sequencing run earlier, at the point that sufficient data has become available, has no negative effect on subsequent downstream analysis. A separate tool, SwordFish, is used to automate adaptive sampling on Nanopore sequencers during the sequencing run. This enables normalisation of coverage both within (amplicons) and between samples (barcodes) on barcoded sequencing runs. We show that this process enriches under-represented samples and amplicons in a library as well as reducing the time taken to obtain complete genomes without affecting the consensus sequence.
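As a rough illustration of the two decisions described above, the sketch below assumes that per-amplicon read depth is tracked for every barcode during the run. `sample_ready`, `adaptive_decision`, the action labels and all thresholds are hypothetical stand-ins, not the MinoTour or SwordFish interfaces. A sample is treated as ready for the consensus step once a chosen fraction of its amplicons reach a minimum depth, and a read from an amplicon that has already hit its target depth is flagged for ejection so that pore time shifts towards under-represented amplicons and barcodes.

```python
from typing import Dict


def sample_ready(amplicon_depth: Dict[str, int], min_depth: int = 20,
                 min_fraction: float = 0.9) -> bool:
    """Hypothetical stopping rule: enough amplicons have reached min_depth."""
    if not amplicon_depth:
        return False
    covered = sum(1 for depth in amplicon_depth.values() if depth >= min_depth)
    return covered / len(amplicon_depth) >= min_fraction


def adaptive_decision(barcode: str, amplicon: str,
                      depth: Dict[str, Dict[str, int]],
                      target_depth: int = 50) -> str:
    """Hypothetical per-read decision for adaptive sampling.

    Returns "unblock" (eject the read) when this barcode/amplicon pair
    already has target_depth reads, otherwise "continue" sequencing it.
    """
    current = depth.get(barcode, {}).get(amplicon, 0)
    return "unblock" if current >= target_depth else "continue"
```

In practice the barcode and amplicon of a read must be inferred from its first few hundred bases before the decision is made, so any real implementation also has to tolerate misassigned or unassignable reads.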
Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways.
We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent analysis pipelines were employed, each using both read mapping onto reference genomes and de novo assembly, to control for biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants.
Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade.
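To make the distance and grouping figures concrete, one plausible reading is sketched below, assuming each isolate is summarised as a set of variant calls relative to the reference, with SNPs, indels and structural variants all encoded as strings. The encoding and the function names are illustrative assumptions, not the pipelines used in the study. The distance between two isolates is the number of variants found in one but not the other (the size of the symmetric difference), and isolates with identical variant sets fall into the same group.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical encoding: each variant is a string such as "snp:1234:A>G"
# or "del:2000:5bp"; an isolate's profile is the set of its variant calls.
Profile = FrozenSet[str]


def pairwise_distance(a: Profile, b: Profile) -> int:
    """Number of variants present in exactly one of the two isolates."""
    return len(a ^ b)


def max_distance(profiles: Dict[str, Profile]) -> int:
    """Largest pairwise distance between any two isolates in the cluster."""
    return max((pairwise_distance(profiles[x], profiles[y])
                for x, y in combinations(profiles, 2)), default=0)


def identical_groups(profiles: Dict[str, Profile]) -> List[List[str]]:
    """Group isolates that share exactly the same set of variants."""
    groups: Dict[Profile, List[str]] = {}
    for name in sorted(profiles):
        groups.setdefault(profiles[name], []).append(name)
    return sorted(groups.values())
```

Because indels and structural variants are counted alongside SNPs, adding those variant classes can only split groups further, which is consistent with a more sensitive analysis yielding more groups.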
Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster.
A sexual cycle was described in 2009 for the opportunistic fungal pathogen
, opening up for the first time the possibility of using techniques reliant on sexual crossing for genetic analysis. The present study was undertaken to evaluate whether the technique 'bulk segregant analysis' (BSA), which involves detection of differences between pools of progeny varying in a particular trait, could be applied in conjunction with next-generation sequencing to investigate the underlying basis of monogenic traits in
. Resistance to the azole antifungal itraconazole was chosen as a model, with a dedicated bioinformatic pipeline developed to allow identification of SNPs that differed between the resistant progeny pool and resistant parent compared to the sensitive progeny pool and parent. A clinical isolate exhibiting monogenic resistance to itraconazole of unknown basis was crossed to a sensitive parent, and F1 progeny were used in BSA. In addition, backcrossing and increasing the number of progeny per pool were evaluated as ways to enhance the efficiency of BSA. Use of F1 pools of 40 progeny led to the identification of 123 candidate genes with SNPs distributed over several contigs when aligned to an A1163 reference genome. Successive rounds of backcrossing enhanced the ability to identify specific genes and a genomic region, with BSA of progeny (using 40 per pool) from a third backcross identifying 46 genes with SNPs, and BSA of progeny from a sixth backcross identifying 20 genes with SNPs in a single 292 kb region of the genome. Increasing the pool size to 80 progeny also increased the resolution of BSA, with 29 genes showing SNPs between the sensitive and resistant groupings detected using progeny from just the second backcross, and with the majority of variants located in the same 292 kb region. Further bioinformatic analysis of the 292 kb region identified the presence of a
gene variant resulting in a methionine to lysine (M220K) change in the CYP51A protein, which was concluded to be the causal basis of the observed resistance to itraconazole. The future use of BSA in genetic analysis of
is discussed.
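The SNP filter at the heart of the BSA pipeline can be sketched as follows, assuming a haploid organism in which each variable site is summarised by the alleles of the two parents and by the frequency of the resistant parent's allele in each progeny pool. The `Site` fields, the function names and the frequency cut-offs are hypothetical and do not describe the published pipeline. A site is kept when the parents differ, the resistant pool is nearly fixed for the resistant parent's allele and the sensitive pool nearly lacks it, which is the pattern expected for variants linked to a monogenic resistance trait.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    # Hypothetical per-site summary; frequencies refer to the allele
    # carried by the resistant parent.
    contig: str
    pos: int
    gene: str
    res_parent: str        # allele in the resistant parent
    sens_parent: str       # allele in the sensitive parent
    res_pool_freq: float   # frequency of res_parent allele, resistant pool
    sens_pool_freq: float  # frequency of res_parent allele, sensitive pool


def candidate_sites(sites: List[Site], min_res: float = 0.9,
                    max_sens: float = 0.1) -> List[Site]:
    """Keep sites whose alleles co-segregate with the resistance phenotype."""
    return [s for s in sites
            if s.res_parent != s.sens_parent
            and s.res_pool_freq >= min_res
            and s.sens_pool_freq <= max_sens]


def candidate_genes(sites: List[Site], **cutoffs: float) -> List[str]:
    """Sorted, de-duplicated genes carrying at least one candidate site."""
    return sorted({s.gene for s in candidate_sites(sites, **cutoffs)})
```

Unlinked sites pass such a filter only by chance, so each additional backcross, or a larger pool, lowers the number of false candidates, matching the shrinking gene lists reported above.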