Gaps in colonoscopy skills among endoscopists, primarily due to differences in experience, have been identified, and solutions are critically needed. Hence, a robust real-time detection system for colorectal neoplasms is expected to significantly reduce the risk of missed lesions during colonoscopy. Here, we develop an artificial intelligence (AI) system that automatically detects early signs of colorectal cancer during colonoscopy. In the validation set, the AI system shows a sensitivity of 97.3% (95% confidence interval [CI] = 95.9%-98.4%) and a specificity of 99.0% (95% CI = 98.6%-99.2%), and the area under the curve is 0.975 (95% CI = 0.964-0.986). Moreover, the sensitivity is 98.0% (95% CI = 96.6%-98.8%) in the polypoid subgroup and 93.7% (95% CI = 87.6%-96.9%) in the non-polypoid subgroup. To accelerate detection, the tensor matrices in the trained model were decomposed, and the system can predict cancerous regions in 21.9 ms/image on average. These findings suggest that the system is sufficient to help endoscopists achieve high detection of non-polypoid lesions, which are frequently missed by optical colonoscopy. This AI system can alert endoscopists in real time to avoid missing abnormalities such as non-polypoid polyps during colonoscopy, improving the early detection of this disease.
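Sensitivity and specificity are simple proportions over a confusion matrix, but the abstract does not state how its 95% CIs were computed. The sketch below assumes the Wilson score interval (an assumption, not necessarily the authors' method), and the counts in the usage lines are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def proportion_with_wilson_ci(successes: int, total: int, z: float = 1.959964):
    """Return (point estimate, lower, upper) using the Wilson score interval."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP), each with a 95% CI."""
    return (proportion_with_wilson_ci(tp, tp + fn),
            proportion_with_wilson_ci(tn, tn + fp))


# Hypothetical confusion-matrix counts, for illustration only.
sens, spec = sensitivity_specificity(tp=97, fn=3, tn=990, fp=10)
print("sensitivity %.3f (%.3f-%.3f)" % sens)
print("specificity %.3f (%.3f-%.3f)" % spec)
```

The Wilson interval is used here because, unlike the simple normal approximation, it stays inside [0, 1] for proportions near 100%, which is the regime these detection metrics occupy.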
Ribonucleotides incorporated in the genome are a source of endogenous DNA damage and also serve as signals for repair. Despite recent advances in ribonucleotide detection by sequencing, the balance between incorporation and repair of ribonucleotides has not been elucidated. Here, we describe a competitive sequencing method, Ribonucleotide Scanning Quantification sequencing (RiSQ‐seq), which enables absolute quantification of misincorporated ribonucleotides throughout the genome by background normalization and standard adjustment within a single sample. RiSQ‐seq analysis of cells harboring wild‐type DNA polymerases revealed that ribonucleotides were incorporated nonuniformly in the genome, with a 3′‐shifted distribution and a preference for GC sequences. Although ribonucleotide profiles in wild‐type and repair‐deficient mutant strains showed a similar pattern, direct comparison of the distinct ribonucleotide levels of the strains by RiSQ‐seq enabled evaluation of ribonucleotide excision repair activity at base resolution and revealed a strand bias in repair. The distinct preferences of ribonucleotide incorporation and repair create vulnerable regions associated with indel hotspots, suggesting that repair at sites of ribonucleotide misincorporation serves to maintain genome integrity and that RiSQ‐seq can provide an estimate of indel risk.
Ribonucleotide Scanning Quantification sequencing (RiSQ‐seq) enables absolute quantification of misincorporated ribonucleotides (rNMPs) in the genome by competitive sequencing between rNMPs and backgrounds. The rNMP quantification revealed non‐uniform incorporation and repair activity of rNMPs. The yin‐yang pattern of rNMP repair activity creates vulnerable regions associated with hotspots of insertion‐deletion (indel) mutations. RiSQ‐seq can provide an estimate of indel risk.
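The abstracts say only that RiSQ‐seq reaches absolute quantification through background normalization and standard adjustment within one sample; they do not give the formula. The following is a purely hypothetical sketch of that general idea: a site's rNMP-to-background read ratio is rescaled by a standard of known rNMP frequency sequenced in the same sample. The function name, inputs, and the linear calibration are all assumptions for illustration.

```python
def calibrated_rnmp_frequency(site_rnmp_reads: int, site_background_reads: int,
                              std_rnmp_reads: int, std_background_reads: int,
                              std_known_frequency: float) -> float:
    """Hypothetical sketch: per-site rNMP frequency calibrated by an internal standard.

    Assumption: rNMP-derived reads and background reads compete for sequencing,
    so their ratio is proportional to rNMP frequency; the standard, whose true
    frequency is known, supplies the proportionality constant.
    """
    if site_background_reads == 0 or std_rnmp_reads == 0 or std_background_reads == 0:
        raise ValueError("background and standard read counts must be non-zero")
    site_ratio = site_rnmp_reads / site_background_reads   # background-normalized signal
    std_ratio = std_rnmp_reads / std_background_reads      # same ratio for the standard
    scale = std_known_frequency / std_ratio                # standard adjustment factor
    return site_ratio * scale


# Hypothetical read counts, for illustration only.
print(calibrated_rnmp_frequency(30, 1000, 50, 1000, 0.01))
```

Because both ratios come from the same library, sample-wide factors such as sequencing depth cancel, which is the motivation for normalizing within a single sample.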
The notion of species as reproductively isolated units related through a bifurcating tree implies that gene trees should generally agree with the species tree and that sister taxa should not share polymorphisms unless they diverged recently and should be equally closely related to outgroups. It is now possible to evaluate this model systematically. We sequenced multiple individuals from 27 described taxa representing the entire Arabidopsis genus. Cluster analysis identified seven groups, corresponding to described species that capture the structure of the genus. However, at the level of gene trees, only the separation of Arabidopsis thaliana from the remaining species was universally supported, and, overall, the amount of shared polymorphism demonstrated that reproductive isolation was considerably more recent than the estimated divergence times. We uncovered multiple cases of past gene flow that contradict a bifurcating species tree. Finally, we showed that the pattern of divergence differs between gene ontologies, suggesting a role for selection.
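The argument above rests on how much polymorphism sister taxa share relative to their fixed differences. A common way to tally this is to classify each aligned site by the alleles observed in each taxon; the sketch below does that for two taxa. It is an illustration under simplifying assumptions (equal-length aligned sequences, no missing data or outgroup), not the authors' pipeline, and the function names are hypothetical.

```python
from typing import Dict, Sequence, Set


def classify_site(alleles_a: Set[str], alleles_b: Set[str]) -> str:
    """Classify one aligned site from the sets of alleles observed in two taxa."""
    poly_a, poly_b = len(alleles_a) > 1, len(alleles_b) > 1
    if poly_a and poly_b:
        # Shared polymorphism: the same two (or more) alleles segregate in both taxa.
        return "shared" if len(alleles_a & alleles_b) > 1 else "polymorphic_in_both"
    if poly_a:
        return "private_a"
    if poly_b:
        return "private_b"
    # Both monomorphic: either a fixed difference or an invariant site.
    return "fixed_difference" if alleles_a != alleles_b else "monomorphic"


def classify_alignment(seqs_a: Sequence[str], seqs_b: Sequence[str]) -> Dict[str, int]:
    """Count site classes over an alignment of individuals from taxon A and taxon B."""
    counts: Dict[str, int] = {}
    for i in range(len(seqs_a[0])):
        label = classify_site({s[i] for s in seqs_a}, {s[i] for s in seqs_b})
        counts[label] = counts.get(label, 0) + 1
    return counts


print(classify_alignment(["ACGT", "ACTT"], ["ACGA", "ACTA"]))
```

Under a strict bifurcating model with long-isolated taxa, "shared" sites should be rare compared with "fixed_difference" sites; an excess of shared sites points to recent isolation or gene flow.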
Genome duplication with hybridization, or allopolyploidization, occurs commonly in plants and is considered a strong force for generating new species. However, genome-wide quantification of homeolog expression ratios has been technically hindered by the high homology between homeologous gene pairs. To quantify homeolog expression ratios from RNA-seq data obtained from polyploids, we developed a new method named HomeoRoq, in which the genomic origin of each sequencing read is estimated from the mismatches between the read and each parental genome. To verify this method, we first assembled the two diploid parental genomes of Arabidopsis halleri subsp. gemmifera and Arabidopsis lyrata subsp. petraea (Arabidopsis petraea subsp. umbrosa) and then generated a synthetic allotetraploid mimicking the natural allopolyploid Arabidopsis kamchatica. The quantified ratios corresponded well to those obtained by Pyrosequencing. We found that the homeolog ratios before and after cold stress treatment were highly correlated (r = 0.870). This highlights the presence of nonstochastic polyploid gene regulation despite previous research identifying stochastic variation in expression. Moreover, our new statistical test incorporating overdispersion identified 226 homeologs (1.11% of 20 369 expressed homeologs) with significant ratio changes, many of which were related to stress responses. HomeoRoq should contribute to the study of the genes responsible for polyploid-specific environmental responses.
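The core idea described for HomeoRoq, assigning each read to the parental genome it matches with fewer mismatches, can be sketched as follows. This is a simplified illustration, not the HomeoRoq implementation: it assumes each read has already been aligned, without indels, to equally long homeologous segments of both parental genomes, and it treats ties as unclassified. Function names are hypothetical.

```python
from typing import Iterable, Tuple


def count_mismatches(read: str, ref: str) -> int:
    """Hamming distance between a read and an equally long reference segment."""
    if len(read) != len(ref):
        raise ValueError("read and reference segment must have equal length")
    return sum(1 for r, g in zip(read, ref) if r != g)


def assign_read(read: str, ref_a: str, ref_b: str) -> str:
    """Assign a read to parent 'A' or 'B' by fewer mismatches; ties are 'unclassified'."""
    mm_a, mm_b = count_mismatches(read, ref_a), count_mismatches(read, ref_b)
    if mm_a < mm_b:
        return "A"
    if mm_b < mm_a:
        return "B"
    return "unclassified"


def homeolog_ratio(aligned: Iterable[Tuple[str, str, str]]) -> float:
    """Fraction of classified reads assigned to parent A for one homeolog pair."""
    n_a = n_b = 0
    for read, seg_a, seg_b in aligned:
        origin = assign_read(read, seg_a, seg_b)
        if origin == "A":
            n_a += 1
        elif origin == "B":
            n_b += 1
    if n_a + n_b == 0:
        raise ValueError("no classifiable reads")
    return n_a / (n_a + n_b)


reads = [("ACGT", "ACGT", "ACGA"), ("ACGA", "ACGT", "ACGA"),
         ("ACGT", "ACGT", "ACGA"), ("ACGC", "ACGT", "ACGA")]
print(homeolog_ratio(reads))  # the last read is a tie and is excluded
```

Excluding ties, rather than splitting them evenly, keeps reads from conserved regions from pulling every ratio toward 0.5; how the real method handles such reads should be checked against its documentation.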
When only a small number of labeled samples are available, supervised dimensionality reduction methods tend to perform poorly because of overfitting. In such cases, unlabeled samples could be useful in improving performance. In this paper, we propose a semi-supervised dimensionality reduction method that preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The proposed method, which we call SEmi-supervised Local Fisher discriminant analysis (SELF), has an analytic form of the globally optimal solution, which can be computed by eigendecomposition. We show the usefulness of SELF through experiments with benchmark and real-world document classification datasets.
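The analytic solution can be written as a generalized eigenvalue problem. The formulation below is given as we understand it from the SELF literature; it is not stated in this abstract, so the exact weighting conventions should be checked against the paper. Here $S^{(\mathrm{lb})}$ and $S^{(\mathrm{lw})}$ are the local between-class and within-class scatter matrices of local Fisher discriminant analysis (LFDA), computed from the labeled samples, $S^{(\mathrm{t})}$ is the total scatter matrix of all labeled and unlabeled samples (as in PCA), $I_d$ is the $d \times d$ identity, and $\beta$ trades off the two parts.

```latex
S^{(\mathrm{rlb})} = (1-\beta)\,S^{(\mathrm{lb})} + \beta\,S^{(\mathrm{t})}, \qquad
S^{(\mathrm{rlw})} = (1-\beta)\,S^{(\mathrm{lw})} + \beta\,I_d, \qquad 0 \le \beta \le 1,

S^{(\mathrm{rlb})}\,\boldsymbol{\varphi} = \lambda\,S^{(\mathrm{rlw})}\,\boldsymbol{\varphi},
\qquad
T_{\mathrm{SELF}} = \left[\boldsymbol{\varphi}_1 \,\middle|\, \cdots \,\middle|\, \boldsymbol{\varphi}_r\right],
```

where $\boldsymbol{\varphi}_1, \dots, \boldsymbol{\varphi}_r$ are the generalized eigenvectors with the $r$ largest eigenvalues. Setting $\beta = 0$ recovers LFDA on the labeled data alone, and $\beta = 1$ gives $S^{(\mathrm{t})}\boldsymbol{\varphi} = \lambda\boldsymbol{\varphi}$, that is, PCA on all samples.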
The habitats of polyploid species are generally distinct from those of their parental species. Stebbins described polyploids as ‘general purpose genotypes’ that can tolerate a wide range of environmental conditions. However, little is known about the molecular basis of this tolerance because of the complexity of polyploid genomes. We hypothesized that allopolyploid species might utilize the expression patterns of both parents depending on the environment (the polyploid plasticity hypothesis). We focused on hydrological niche segregation along fine‐scale soil moisture and waterlogging gradients. Two diploid species, Cardamine amara and Cardamine hirsuta, grew best in submerged and unsubmerged conditions, respectively, consistent with their natural habitats. Interestingly, the allotetraploid Cardamine flexuosa derived from them grew similarly in fluctuating as well as in submerged and unsubmerged conditions, consistent with its wide environmental tolerance. A similar pattern was found in another species trio: the allotetraploid Cardamine scutata and its parents. Exploiting the close relatedness of Cardamine and Arabidopsis, we quantified genome-wide expression patterns following dry and wet treatments using an Arabidopsis microarray. Hierarchical clustering analysis revealed that the expression pattern of C. flexuosa clustered with that of C. hirsuta in the dry condition and with that of C. amara in the wet condition, supporting our hypothesis. Furthermore, the induction levels of most genes in the allopolyploid were lower than in a specialist diploid species. This reflects a disadvantage of being allopolyploid arising from fixed heterozygosity. We propose that recurrent allopolyploid speciation along soil moisture and waterlogging gradients confers niche differentiation and reproductive isolation simultaneously and serves as a model for studying the molecular basis of ecological speciation and adaptive radiation.
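With only three expression profiles (the allopolyploid and its two parents) in a given condition, the first merge of an agglomerative hierarchical clustering simply joins the most similar pair. The sketch below asks the narrower question of which parent the allopolyploid's profile lies closest to, using correlation distance (1 - Pearson r). The distance choice is an assumption, the function names are hypothetical, and the values are toy log fold changes invented for illustration, not data from the study.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def closest_profile(query: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Name of the candidate with the smallest correlation distance (1 - r) to the query."""
    return min(candidates, key=lambda name: 1 - pearson(query, candidates[name]))


# Toy wet-condition log fold changes for four genes (hypothetical values).
AMARA_WET = [2.0, 1.5, -0.5, 0.1]
HIRSUTA_WET = [-1.0, 0.2, 1.2, 0.8]
FLEXUOSA_WET = [1.8, 1.1, -0.2, 0.3]

print(closest_profile(FLEXUOSA_WET, {"C. amara": AMARA_WET, "C. hirsuta": HIRSUTA_WET}))
```

Correlation distance compares the shape of the response across genes rather than its magnitude, which matters here because the allopolyploid's induction levels are reported to be lower overall than a specialist parent's.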
The exposure of germ cells to radiation introduces mutations into the genomes of offspring, and a previous whole-genome sequencing study indicated that the irradiation of mouse sperm induces insertions/deletions (indels) and multisite mutations (clustered single nucleotide variants and indels). However, current knowledge of the mutation spectra is limited, and the effects of radiation exposure on germ cells at stages other than the sperm stage remain unknown. Here, we performed whole-genome sequencing experiments to investigate the effects of exposing spermatogonia and mature oocytes to radiation. We compared de novo mutations in a total of 24 F1 mice conceived before and after the irradiation of their parents. The results indicated that exposure to 4 Gy of gamma rays induced 9.6 indels and 2.5 multisite mutations in spermatogonia and 4.7 indels and 3.1 multisite mutations in mature oocytes in the autosomal regions of each F1 individual. Notably, we found two types of deletions: small deletions (mainly 1 to 12 nucleotides) in non-repeat sequences, many of which showed microhomology at the breakpoint junction, and single-nucleotide deletions in mononucleotide repeat sequences. The results suggest that these deletions and multisite mutations could be a typical signature of mutations induced by parental irradiation in mammals.
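The two deletion classes above can be distinguished from the reference sequence around each deletion. The sketch below assumes left-aligned deletions, so that any microhomology appears as identical bases at the start of the deleted segment and immediately after the breakpoint, and it uses a homopolymer length of at least 5 as a hypothetical cutoff for "mononucleotide repeat". Neither convention is taken from the study, and the function names are hypothetical.

```python
def microhomology_length(deleted: str, right_flank: str) -> int:
    """Length of the shared prefix of the deleted segment and the right flank.

    Assumes a left-aligned deletion: the same bases then occur at both ends of
    the breakpoint junction, which is what 'microhomology' refers to.
    """
    k = 0
    while k < len(deleted) and k < len(right_flank) and deleted[k] == right_flank[k]:
        k += 1
    return k


def homopolymer_run(left_flank: str, deleted: str, right_flank: str) -> int:
    """Reference homopolymer length spanning a single-nucleotide deletion (0 otherwise)."""
    if len(deleted) != 1:
        return 0
    run = 1
    for b in reversed(left_flank):  # extend the run leftward
        if b != deleted:
            break
        run += 1
    for b in right_flank:  # extend the run rightward
        if b != deleted:
            break
        run += 1
    return run


def classify_deletion(left_flank: str, deleted: str, right_flank: str, min_run: int = 5) -> str:
    """Label a deletion as a mononucleotide-repeat deletion or a small deletion with its microhomology length."""
    if homopolymer_run(left_flank, deleted, right_flank) >= min_run:
        return "mononucleotide_repeat_deletion"
    return f"small_deletion_mh{microhomology_length(deleted, right_flank)}"


print(classify_deletion("GGT", "ACGTT", "ACGAA"))
print(classify_deletion("GC", "A", "AAAAT"))
```

In practice the flanks would be taken from the reference genome around each called deletion, after normalizing the variant to its leftmost representation.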
Statistical significance of combinatorial regulations. Terada, Aika; Okada-Hatakeyama, Mariko; Tsuda, Koji ...
Proceedings of the National Academy of Sciences - PNAS, 08/2013, Volume 110, Issue 32. Journal Article; peer reviewed; open access.
More than three transcription factors often work together to enable cells to respond to various signals. The detection of combinatorial regulation by multiple transcription factors, however, is not only computationally nontrivial but also extremely unlikely because of multiple testing correction. The exponential growth in the number of tests forces us to set a strict limit on the maximum arity. Here, we propose an efficient branch-and-bound algorithm called the “limitless arity multiple-testing procedure” (LAMP) to count the exact number of testable combinations and calibrate the Bonferroni factor to the smallest possible value. LAMP lists significant combinations without any limit on arity, while the family-wise error rate is rigorously controlled under the threshold. In the human breast cancer transcriptome, LAMP discovered statistically significant combinations of as many as eight binding motifs. This method may contribute to uncovering pathways regulated in a coordinated fashion and to finding hidden associations in heterogeneous data.
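The calibration LAMP performs rests, as we understand it, on Tarone's observation that a combination present in too few samples can never reach significance, so it need not count toward the Bonferroni factor. The brute-force sketch below illustrates that criterion for a one-sided Fisher's exact test over an explicit list of combination supports. It is not LAMP's branch-and-bound search, which avoids enumerating every combination; the function names are hypothetical.

```python
import math
from typing import Sequence


def min_fisher_pvalue(support: int, n_total: int, n_positive: int) -> float:
    """Smallest one-sided Fisher's exact p-value attainable by a combination that is
    present in `support` of `n_total` samples, `n_positive` of which are positive."""
    if not 0 <= support <= n_total:
        raise ValueError("support must lie in [0, n_total]")
    if support <= n_positive:
        # Best case: every sample carrying the combination is positive.
        return math.comb(n_positive, support) / math.comb(n_total, support)
    # Best case: all positives carry the combination.
    return math.comb(n_total - n_positive, support - n_positive) / math.comb(n_total, support)


def tarone_correction_factor(supports: Sequence[int], n_total: int, n_positive: int,
                             alpha: float = 0.05) -> int:
    """Smallest k such that at most k combinations have minimum p-value <= alpha / k."""
    min_ps = [min_fisher_pvalue(x, n_total, n_positive) for x in supports]
    k = 1
    while True:
        testable = sum(1 for p in min_ps if p <= alpha / k)
        if testable <= k:
            return k  # terminates: testable never exceeds len(supports)
        k += 1


print(tarone_correction_factor([5, 3, 1, 8], n_total=10, n_positive=5))
print(tarone_correction_factor([5, 5, 5], n_total=10, n_positive=5))
```

Combinations are then tested at level alpha / k; with many low-support combinations, k can be far smaller than the total number of combinations, which is what makes testing high-arity combinations feasible.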
One of the major issues in genome-wide association studies is the missing heritability problem. While considering epistatic interactions among multiple SNPs may help solve this problem, existing software cannot detect statistically significant high-order interactions. We propose software named LAMPLINK, which employs a cutting-edge method to enumerate statistically significant SNP combinations from genome-wide case-control data. LAMPLINK is implemented as a set of additional functions for PLINK, so existing PLINK procedures remain applicable. Applied to the 1000 Genomes Project data, LAMPLINK detected a combination of five SNPs that is statistically significantly enriched in the Japanese population.
LAMPLINK is available at http://a-terada.github.io/lamplink/
Contact: terada@cbms.k.u-tokyo.ac.jp or sese.jun@aist.go.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
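To treat SNP combinations as item sets, genotypes must first be binarized. The sketch below assumes a dominant model (carries at least one minor allele) or a recessive model (carries two), and then counts how many individuals, and how many cases, carry every SNP in a combination; those two counts are what a one-sided Fisher's exact test on the combination needs. The encoding options, data layout, and function names are assumptions for illustration, not LAMPLINK's documented interface.

```python
from typing import Dict, List, Sequence, Tuple


def encode_genotypes(genotypes: Sequence[int], model: str) -> List[bool]:
    """Convert minor-allele counts (0, 1, 2) into binary items under a genetic model."""
    if model == "dominant":
        return [g >= 1 for g in genotypes]
    if model == "recessive":
        return [g == 2 for g in genotypes]
    raise ValueError("model must be 'dominant' or 'recessive'")


def combination_support(items: Dict[str, List[bool]], snps: Sequence[str],
                        is_case: Sequence[bool]) -> Tuple[int, int]:
    """Return (individuals carrying all SNPs in `snps`, cases among them)."""
    carriers = [all(items[s][i] for s in snps) for i in range(len(is_case))]
    support = sum(carriers)
    case_support = sum(1 for c, case in zip(carriers, is_case) if c and case)
    return support, case_support


# Hypothetical genotypes for four individuals, for illustration only.
items = {
    "snp1": encode_genotypes([0, 1, 2, 2], "dominant"),
    "snp2": encode_genotypes([1, 1, 0, 2], "dominant"),
}
print(combination_support(items, ["snp1", "snp2"], is_case=[True, True, False, False]))
```

The dominant and recessive encodings produce different item sets from the same genotypes, so a combination found under one model need not appear under the other.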
Recently, next-generation sequencing techniques have been applied to the detection of RNA secondary structures, an approach referred to as high-throughput RNA structural (HTS) analysis, and many different protocols have been used to detect comprehensive RNA structures at single-nucleotide resolution. However, the existing computational analyses depend heavily on the experimental methodology used to generate the data, which makes it difficult to compare statistically, or to combine, the results obtained using different HTS methods.
Here, we introduce a statistical framework, reactIDR, which can be applied to experimental data obtained using multiple HTS methodologies. Using this approach, nucleotides are classified into three structural categories: loop, stem/background, and unmapped. reactIDR uses the irreproducible discovery rate (IDR) with a hidden Markov model to accurately discriminate between true and spurious signals in replicated HTS experiments, and it incorporates an expectation-maximization algorithm and supervised learning for efficient parameter optimization. Our analyses of real-life HTS data showed that reactIDR had the highest accuracy in classifying ribosomal RNA stem/loop structures when using both individual and integrated HTS datasets, and its results corresponded best to the three-dimensional structures.
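The hidden Markov model part of such a framework can be illustrated with Viterbi decoding over the three structural categories named above. This is a generic sketch, not reactIDR's model: the IDR-based emission model, its expectation-maximization and supervised training, and its actual decoding scheme are not reproduced. The observations are a hypothetical discretized replicate signal ("high", "low", "none"), and every probability below is an invented toy value.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy parameters, for illustration only (not reactIDR's model).
STATES = ("loop", "stem/background", "unmapped")
START = {s: 1 / 3 for s in STATES}
TRANS = {s: {t: (0.8 if s == t else 0.1) for t in STATES} for s in STATES}
EMIT = {
    "loop": {"high": 0.8, "low": 0.15, "none": 0.05},
    "stem/background": {"high": 0.1, "low": 0.85, "none": 0.05},
    "unmapped": {"high": 0.05, "low": 0.05, "none": 0.9},
}


def viterbi(obs: Sequence[str], states=STATES, start=START, trans=TRANS,
            emit=EMIT) -> List[str]:
    """Most probable state path, computed in log space (all probabilities must be > 0)."""
    score = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor for state s at position t.
            prev = max(states, key=lambda p: score[t - 1][p] + math.log(trans[p][s]))
            score[t][s] = score[t - 1][prev] + math.log(trans[prev][s]) + math.log(emit[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


print(viterbi(["high", "high", "low", "low", "none", "none"]))
```

The high self-transition probability encodes the expectation that loops and stems span several consecutive nucleotides, so an isolated noisy position is less likely to flip the assigned category.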
We have developed novel software, reactIDR, for predicting stem/loop regions from HTS analysis datasets. In the rRNA structure analyses, reactIDR showed robust accuracy across different datasets by using the reproducibility criterion, suggesting its potential for increasing the value of existing HTS datasets. reactIDR is publicly available at https://github.com/carushi/reactIDR.