Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.
Azo-coupled macrocyclic chromoionophores incorporating benzene (L(1)) and pyridine (L(2)) subunits were synthesized. In a cation-induced color change experiment, both receptors showed Hg(2+) selectivity. However, L(1) gave a larger cation-induced hypsochromic shift than L(2), suggesting that the presence of the pyridine unit in L(2) may inhibit the Hg...N-azo interaction. The observed Hg(2+)-selective color changes for L(1) and L(2) were found to be controlled by anion-coordination ability. NMR titration of the proposed receptor ligand with Hg(II) salt was accomplished.
Epigenetic landscapes can shape physiologic and disease phenotypes. We used integrative, high resolution multi-omics methods to delineate the methylome landscape and characterize the oncogenic drivers of esophageal squamous cell carcinoma (ESCC). We found 98% of CpGs are hypomethylated across the ESCC genome. Hypo-methylated regions are enriched in areas with heterochromatin binding markers (H3K9me3, H3K27me3), while hyper-methylated regions are enriched in polycomb repressive complex (EZH2/SUZ12) recognizing regions. Altered methylation in promoters, enhancers, and gene bodies, as well as in polycomb repressive complex occupancy and CTCF binding sites are associated with cancer-specific gene dysregulation. Epigenetic-mediated activation of non-canonical WNT/β-catenin/MMP signaling and a YY1/lncRNA ESCCAL-1/ribosomal protein network are uncovered and validated as potential novel ESCC driver alterations. This study advances our understanding of how epigenetic landscapes shape cancer pathogenesis and provides a resource for biomarker and target discovery.
Rhododendron sobayakiense is an endemic and near-threatened species (Korean Red List, NT) found in the alpine regions of South Korea that requires conservation. This study investigated the species’ genetic variations and seed germination characteristics and predicted its potential habitat change according to climate change scenarios. The genetic diversity of R. sobayakiense at the species level (P = 88.6%; S.I. = 0.435; h = 0.282) was somewhat similar to that observed for the same genus. The inter-population genetic differentiation was 19% and revealed a relatively stable level of gene exchange at 1.22 in each population. The main cause of gene flow and genetic differentiation was presumed to be the Apis mellifera pollinator. Seed germination characteristics indicated non-deep physiological dormancy, with germination at ≥10 °C and the highest percent germination (PG) of ≥60% at 15–25 °C, while the PG was ≥50% at 30 °C. The PG was higher at constant temperatures than at variable temperatures; the mean germination time decreased as temperature increased. The climate scenarios SSP3 and SSP5 were analyzed to predict future R. sobayakiense habitat changes. The variables of the main effects were identified as follows: elevation > temperature seasonality > mean diurnal range.
Long-read and short-read sequencing technologies offer competing advantages for eukaryotic genome sequencing projects. Combinations of both may be appropriate for surveys of within-species genomic variation.
We developed a hybrid assembly pipeline called "Alpaca" that can operate on 20X long-read coverage plus about 50X short-insert and 50X long-insert short-read coverage. To preclude collapse of tandem repeats, Alpaca relies on base-call-corrected long reads for contig formation.
Compared to two other assembly protocols, Alpaca demonstrated the most reference agreement and repeat capture on the rice genome. On three accessions of the model legume Medicago truncatula, Alpaca generated the most agreement to a conspecific reference and predicted tandemly repeated genes absent from the other assemblies.
Our results suggest Alpaca is a useful tool for investigating structural and copy number variation within de novo assemblies of sampled populations.
Berberis koreana Palibin is an endemic plant native to Korea. In this study, we aimed to study the seed germination of this species using a water imbibition experiment, gibberellic acid (GA3) treatment (0, 10, 100, or 1000 mg·L−1), cold stratification (0, 2, 4, 8, or 12 weeks at 4 °C), move-along experiment, and phenology studies. In the water imbibition experiment, the weight of the seeds increased by more than 120% in 24 h. An analysis of the internal and external morphological characteristics of the seed revealed that the embryo was already fully grown from the fruit and did not grow thereafter. The final germination percentages for the cold stratification at 0, 2, 4, 8, and 12 weeks at 4 °C were 12 ± 3.65, 32 ± 9.09, 59 ± 1.00, 59 ± 9.59, and 71 ± 1.91%, respectively. In the move-along experiment and phenology studies, a longer low-temperature treatment period resulted in a higher germination percentage. However, the GA3 treatment had little effect on the seed germination. Our results indicate that B. koreana exhibits an intermediate physiological seed dormancy.
Essential tremor (ET) is the most common adult-onset movement disorder. In the present study, we performed whole exome sequencing of a large ET-affected family (10 affected and 6 unaffected family members) and identified a TUB p.V431I variant (rs75594955) segregating in a manner consistent with autosomal-dominant inheritance. Subsequent targeted re-sequencing of TUB in 820 unrelated individuals with sporadic ET and 630 controls revealed significant enrichment of rare nonsynonymous TUB variants (e.g. rs75594955: p.V431I, rs1241709665: p.Ile20Phe, rs55648406: p.Arg49Gln) in the ET cohort (SKAT-O test p-value = 6.20e-08). TUB encodes a transcription factor predominantly expressed in neuronal cells and has been previously implicated in obesity. ChIP-seq analyses of the TUB transcription factor across different regions of the mouse brain revealed that TUB regulates the pathways responsible for neurotransmitter production as well as thyroid hormone signaling. Together, these results support the association of rare variants in TUB with ET.
Genomics is expanding from a single reference per species paradigm into a more comprehensive pan-genome approach that analyzes multiple individuals together. A compressed de Bruijn graph is a sophisticated data structure for representing the genomes of entire populations. It robustly encodes shared segments, simple single-nucleotide polymorphisms and complex structural variations far beyond what can be represented in a collection of linear sequences alone.
We explore deep topological relationships between suffix trees and compressed de Bruijn graphs and introduce an algorithm, splitMEM, that directly constructs the compressed de Bruijn graph in time and space linear in the total number of genomes for a given maximum genome size. We introduce suffix skips to traverse several suffix links simultaneously and use them to efficiently decompose maximal exact matches into graph nodes. We demonstrate the utility of splitMEM by analyzing the nine-strain pan-genome of Bacillus anthracis and up to 62 strains of Escherichia coli, revealing their core-genome properties.
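To make the data structure concrete, the following Python sketch builds a compressed de Bruijn graph in the naive way: it enumerates every k-mer, links consecutive k-mers, and then merges non-branching paths into single nodes. This is not splitMEM, which works through suffix trees and suffix skips; the function name, the toy sequences and the forward-strand-only simplification are assumptions made purely for illustration.

```python
from collections import defaultdict


def compressed_de_bruijn_nodes(sequences, k):
    """Return the node labels of a naively built compressed de Bruijn graph.

    Illustrative only: k-mers are enumerated directly (forward strand only),
    then every non-branching path is merged into one node.
    """
    succ = defaultdict(set)  # k-mer -> set of k-mers that follow it
    pred = defaultdict(set)  # k-mer -> set of k-mers that precede it
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            kmers.add(kmer)
            if i + k < len(seq):
                nxt = seq[i + 1:i + k + 1]
                succ[kmer].add(nxt)
                pred[nxt].add(kmer)

    def starts_path(kmer):
        # A k-mer opens a new node unless it has exactly one predecessor
        # and that predecessor has exactly one successor.
        if len(pred[kmer]) != 1:
            return True
        (p,) = pred[kmer]
        return len(succ[p]) != 1

    def extend(start, visited):
        # Walk forward while the path stays non-branching.
        label, cur = start, start
        visited.add(start)
        while len(succ[cur]) == 1:
            (nxt,) = succ[cur]
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            label += nxt[-1]
            visited.add(nxt)
            cur = nxt
        return label

    visited = set()
    nodes = [extend(km, visited) for km in sorted(kmers) if starts_path(km)]
    # Any k-mers left over lie on isolated cycles; open each cycle once.
    for km in sorted(kmers):
        if km not in visited:
            nodes.append(extend(km, visited))
    return sorted(nodes)


if __name__ == "__main__":
    # Two toy "genomes" that differ by a single substitution (C vs. G).
    print(compressed_de_bruijn_nodes(["TAAGCTTA", "TAAGGTTA"], k=3))
```

With these two toy sequences the shared flanks collapse into one node each and the substitution appears as two parallel nodes, which is the kind of shared-segment and SNP encoding described above. The naive enumeration stores every k-mer explicitly, so it serves only to show the target structure, not an efficient way to build it.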
Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state-of-the-art in mapping uses the quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position and thus measures the overall composition of the genome itself.
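The idea of a per-position mappability measure can be illustrated with a much simpler stand-in. The Python sketch below is not the GMS: instead of a weighted probability, it assumes error-free reads of one fixed length, exact matching and the forward strand only, and reports for each position the fraction of reads covering it whose sequence occurs exactly once in the genome. The function name and the toy genome are hypothetical.

```python
from collections import Counter


def uniqueness_profile(genome, read_len):
    """Per-position fraction of covering reads that occur only once.

    Simplified stand-in for a mappability score: exact matches only,
    forward strand only, one fixed read length, no sequencing errors.
    """
    n_reads = len(genome) - read_len + 1
    if n_reads <= 0:
        raise ValueError("read_len must not exceed the genome length")

    # Count how often each read-length substring occurs in the genome.
    counts = Counter(genome[i:i + read_len] for i in range(n_reads))
    # unique_start[i] is True if the read starting at i has no other copy.
    unique_start = [counts[genome[i:i + read_len]] == 1 for i in range(n_reads)]

    scores = []
    for pos in range(len(genome)):
        first = max(0, pos - read_len + 1)  # leftmost read covering pos
        last = min(pos, n_reads - 1)        # rightmost read covering pos
        covering = unique_start[first:last + 1]
        scores.append(sum(covering) / len(covering))
    return scores


if __name__ == "__main__":
    # Toy genome: a homopolymer run followed by unique sequence.
    print(uniqueness_profile("AAAAACGT", read_len=2))
```

On the toy genome the positions inside the repeated run score 0, the position at the boundary scores 0.5, and the unique tail scores 1. A low score here depends only on the sequence of the genome, not on how many reads are sampled.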
We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5-14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the 'dark matter' of the genome, including known clinically relevant variations in these regions.
The source code and profiles of several model organisms are available at http://gma-bio.sourceforge.net