As technology has advanced, the decision-making capabilities of machines have grown. With their strong analytical capacity, computers can readily detect patterns and relationships that may escape the human eye, and for this reason machines are now widely used in health care. For example, many machine learning techniques developed for cancer prediction have been applied successfully. Early detection of cancer is crucial to survival: when cancer is diagnosed early, the amount of drug treatment, chemotherapy, or radiotherapy the patient is exposed to is significantly reduced, and the patient gets through treatment with the least physical toll. The Gene Expression Cancer RNA-Seq dataset was used in this study. This dataset includes gene expression values for 5 cancer types (BRCA, KIRC, LUAD, LUSC, UCEC). DNA sequences in the dataset were analyzed using k-means and hierarchical clustering, both unsupervised machine learning methods. The aim of the study is to develop a usable machine learning model for early detection of cancer at the gene level. The Adjusted Rand Index (ARI), Silhouette Score, and accuracy were used to evaluate the results. The Rand Index measures the similarity between two clusterings by counting the pairs of samples that both clusterings treat the same way (placed together in both, or apart in both). The Adjusted Rand Index is the Rand Index corrected for chance. The Silhouette Score indicates how well a data point fits within its own cluster relative to the other clusters. Accuracy is the number of correctly clustered data points divided by the total number of predictions. Hierarchical clustering can use different linkage methods; those compared here are 'complete', 'ward', 'average', and 'single'. For the k-means algorithm, the accuracy was 0.990, the Adjusted Rand Index was 0.79, and the Silhouette Score was 0.14.
For hierarchical clustering, ward linkage performed best of the four linkage methods, with an ARI of 0.76 and a Silhouette Score of 0.13. The accuracy of the hierarchical clustering algorithm was 0.999.
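The pair-counting idea behind the Rand Index and its chance-corrected form can be illustrated with a minimal sketch. It uses only the Python standard library, and the toy labels at the bottom are invented for illustration; they are not taken from the study's dataset, and this is not the authors' code.

```python
from collections import Counter
from math import comb


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs that both labelings treat the same way."""
    n = len(labels_true)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_true = labels_true[i] == labels_true[j]
            same_pred = labels_pred[i] == labels_pred[j]
            agree += same_true == same_pred  # together in both, or apart in both
    return agree / comb(n, 2)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand Index corrected for chance, computed from the contingency table."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy example: six samples, two true classes, three found clusters.
truth = ["BRCA", "BRCA", "BRCA", "KIRC", "KIRC", "KIRC"]
clusters = [0, 0, 1, 1, 2, 2]
print(rand_index(truth, clusters))
print(adjusted_rand_index(truth, clusters))
```

Because the ARI depends only on which samples share a cluster, renaming the cluster labels leaves the score unchanged, which is why it suits unsupervised methods whose cluster numbers are arbitrary.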
Bayesian relaxed-clock dating has significantly influenced our understanding of the timeline of biotic evolution. This approach requires the use of priors on the branching process, yet little is known about their impact on divergence time estimates. We investigated the effect of branching priors using the iconic cycads. We conducted phylogenetic estimations for 237 cycad species using three genes and two calibration strategies incorporating up to six fossil constraints to (i) test the impact of two different branching process priors on age estimates, (ii) assess which branching prior better fits the data, (iii) investigate branching prior impacts on diversification analyses, and (iv) provide insights into the diversification history of cycads.
Using Bayes factors, we compared divergence time estimates and the inferred dynamics of diversification when using Yule versus birth-death priors. Bayes factors were calculated from marginal likelihoods estimated by stepping-stone sampling. We found striking differences in age estimates and diversification dynamics depending on prior choice. Dating with the Yule prior suggested that extant cycad genera diversified in the Paleogene, with two diversification rate shifts. In contrast, dating with the birth-death prior yielded Neogene diversifications and four rate shifts, one for each of the four richest genera. Nonetheless, dating with the two priors provided similar age estimates for the divergence of cycads from Ginkgo (Carboniferous) and for their crown age (Permian). Of the two priors, Bayes factors clearly supported the birth-death prior.
These results suggest that the choice of the branching process prior can have a drastic influence on our understanding of evolutionary radiations. Therefore, all dating analyses must involve a model selection process using Bayes factors to choose between Yule and birth-death priors, in particular for ancient clades with a potential pattern of high extinction. We also provide new insights into the history of cycad diversification because we found (i) periods of extinction along the long branches of the genera consistent with fossil data, and (ii) high diversification rates within the Miocene genus radiations.
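As a rough illustration of the model selection step described above, the sketch below turns two log marginal likelihoods into a Bayes factor and labels it on the commonly used Kass and Raftery scale. The two log marginal likelihood values are made-up placeholders, not results from this study, and the function names are hypothetical.

```python
def two_ln_bayes_factor(log_ml_a, log_ml_b):
    """Return 2 * ln(BF) for model A over model B from log marginal likelihoods."""
    return 2.0 * (log_ml_a - log_ml_b)


def kass_raftery_label(two_ln_bf):
    """Verbal strength of evidence for model A (Kass and Raftery scale)."""
    if two_ln_bf < 0:
        return "favours the other model"
    if two_ln_bf <= 2:
        return "not worth more than a bare mention"
    if two_ln_bf <= 6:
        return "positive"
    if two_ln_bf <= 10:
        return "strong"
    return "very strong"


# Placeholder values (assumed for illustration only), as would be returned
# by stepping-stone sampling for each branching prior.
log_ml_birth_death = -45198.7
log_ml_yule = -45210.3

evidence = two_ln_bayes_factor(log_ml_birth_death, log_ml_yule)
print(round(evidence, 1), kass_raftery_label(evidence))
```

Working on the log scale avoids underflow, since marginal likelihoods for phylogenetic datasets are far too small to exponentiate directly.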
Soybean (Glycine max) is a major contributor to the world oilseed production. Its seed oil content has been increased through soybean domestication and improvement. However, the genes underlying the selection are largely unknown.
The present contribution analyzed the expression patterns of genes in the seed oil quantitative trait loci with strong selective sweep signals, then used association, functional study and population genetics to reveal a sucrose efflux transporter gene, GmSWEET39, controlling soybean seed oil content and under selection.
GmSWEET39 is highly expressed in soybean seeds and encodes a plasma membrane-localized protein. Its expression level is positively correlated with soybean seed oil content. The variation in its promoter and coding sequence leads to different natural alleles of this gene. The GmSWEET39 allelic effects on total oil content were confirmed in the seeds of soybean recombinant inbred lines, transgenic Arabidopsis, and transgenic soybean hairy roots. The frequencies of its superior alleles increased from wild soybean to cultivated soybean, and are much higher in released soybean cultivars.
The findings herein suggest that the sequence variation in GmSWEET39 affects its relative expression and oil content in soybean seeds, and GmSWEET39 has been selected to increase seed oil content during soybean domestication and improvement.
Characterization of somatic mutations at single-cell resolution is essential to study cancer evolution, clonal mosaicism and cell plasticity. Here, we describe SComatic, an algorithm designed for the detection of somatic mutations in single-cell transcriptomic and ATAC-seq (assay for transposase-accessible chromatin sequence) data sets directly without requiring matched bulk or single-cell DNA sequencing data. SComatic distinguishes somatic mutations from polymorphisms, RNA-editing events and artefacts using filters and statistical tests parameterized on non-neoplastic samples. Using >2.6 million single cells from 688 single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq (scATAC-seq) data sets spanning cancer and non-neoplastic samples, we show that SComatic detects mutations in single cells accurately, even in differentiated cells from polyclonal tissues that are not amenable to mutation detection using existing methods. Validated against matched genome sequencing and scRNA-seq data, SComatic achieves F1 scores between 0.6 and 0.7 across diverse data sets, in comparison to 0.2-0.4 for the second-best performing method. In summary, SComatic permits de novo mutational signature analysis, and the study of clonal heterogeneity and mutational burdens at single-cell resolution.
Selective breeding is increasingly recognized as a key component of sustainable production of aquaculture species. The uptake of genomic technology in aquaculture breeding has traditionally lagged behind terrestrial farmed animals. However, the rapid development and application of sequencing technologies has allowed aquaculture to narrow the gap, leading to substantial genomic resources for all major aquaculture species. While high‐density single‐nucleotide polymorphism (SNP) arrays for some species have been developed recently, direct genotyping by sequencing (GBS) techniques have underpinned many of the advances in aquaculture genetics and breeding to date. In particular, restriction‐site associated DNA sequencing (RAD‐Seq) and subsequent variations have been extensively applied to generate population‐level SNP genotype data. These GBS techniques are not dependent on prior genomic information such as a reference genome assembly for the species of interest. As such, they have been widely utilized by researchers and companies focussing on nonmodel aquaculture species with relatively small research communities. Applications of RAD‐Seq techniques have included generation of genetic linkage maps, performing genome‐wide association studies, improvements of reference genome assemblies and, more recently, genomic selection for traits of interest to aquaculture like growth, sex determination or disease resistance. In this review, we briefly discuss the history of GBS, the nuances of the various GBS techniques, bioinformatics approaches and application of these techniques to various aquaculture species.
Simultaneous multiplexed analysis can provide comprehensive information for disease diagnosis. However, current multiplex methods rely on sophisticated barcode technology, which hinders their wider application. In this study, an ultrasimple size encoding method is proposed for multiplex detection using a wedge-shaped microfluidic chip. Driven by negative pressure, microparticles naturally arrange into distinct stripes within the chip according to their size. The method is highly precise: it distinguishes 3–5 sizes of microparticles with an accuracy of up to 99%, even when the sizes differ by as little as 0.5 μm. The entire size encoding process is completed in less than 5 min, making it ultrasimple, reliable, and easy to operate. To evaluate the size encoding microfluidic chip, the nucleic acid sequences of three commonly co-infecting viruses (complementary DNA sequences of HIV and HCV, and the DNA sequence of HBV) are used for multiplex detection. Results indicate that all three DNA sequences can be sensitively detected without any cross-interference. This size-encoding microfluidic chip-based multiplex detection method is simple, rapid, and high-resolution, and its successful application in serum samples makes it highly promising for clinical use.
• This work proposes a size encoding method for multiplex detection using a wedge-shaped microfluidic chip.
• This size encoding method is highly precise, accurately distinguishing 3-5 sizes of microparticles with an accuracy of up to 99%.
• The chip can separate even microparticles with a size difference as small as 0.5 μm.
• The entire size encoding process is completed in less than 5 minutes, making it simple and easy to operate.
• The nucleic acid sequences of three commonly co-infecting viruses (complementary DNA sequences of HIV and HCV, and the DNA sequence of HBV) were used to verify the efficacy of this size encoding method, and the results showed that detection is simple, rapid, specific, and reliable.
An image encryption scheme is proposed using high-dimensional chaotic systems and a cycle operation for DNA sequences. In the scheme, the pixels of the original image are encoded randomly with a DNA coding rule controlled by a key stream produced from Chen’s hyper-chaos. In addition to confusion of the DNA sequence matrix with the Lorenz system, a cycle operation for DNA sequences is designed to diffuse the pixel values of the image. To enhance the diffusion effect, a bitwise exclusive-OR operation is carried out on the decoded matrices with a binary key stream, and the cipher-image is then obtained. Simulation results demonstrate that the proposed image encryption scheme has acceptable robustness and is secure against exhaustive, statistical, and differential attacks.
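The three building blocks named in this abstract (DNA coding of pixels, a cycle operation on the DNA sequence, and a final exclusive-OR with a key stream) can be sketched as follows. This is a simplified illustration under several assumptions, not the proposed scheme: a single fixed coding rule is used, the cycle operation is taken to be a cyclic shift of each pixel's four bases, the Lorenz confusion step is omitted, and a logistic map stands in for Chen's hyper-chaotic system. All function names are hypothetical.

```python
BASES = "ACGT"  # assumed coding rule: 00->A, 01->C, 10->G, 11->T


def encode(pixel):
    """Encode an 8-bit pixel as four DNA bases, most significant bits first."""
    return "".join(BASES[(pixel >> shift) & 0b11] for shift in (6, 4, 2, 0))


def decode(bases):
    """Inverse of encode: four bases back to an 8-bit value."""
    value = 0
    for base in bases:
        value = (value << 2) | BASES.index(base)
    return value


def rotate(bases, k):
    """Cyclic left shift by k positions (negative k shifts right)."""
    k %= len(bases)
    return bases[k:] + bases[:k]


def logistic_keystream(n, x=0.3141, r=3.99):
    """Stand-in key stream from a logistic map, NOT Chen's hyper-chaos."""
    stream = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        stream.append(int(x * 256) % 256)
    return stream


def encrypt(pixels, key):
    """DNA-encode each pixel, shift by a key-dependent amount, decode, then XOR."""
    return [decode(rotate(encode(p), k % 4)) ^ k for p, k in zip(pixels, key)]


def decrypt(cipher, key):
    """Undo the XOR, then undo the cyclic shift in the DNA domain."""
    return [decode(rotate(encode(c ^ k), -(k % 4))) for c, k in zip(cipher, key)]


pixels = [0, 17, 128, 200, 255]  # toy grey-level values, assumed for illustration
key = logistic_keystream(len(pixels))
cipher = encrypt(pixels, key)
print(cipher, decrypt(cipher, key) == pixels)
```

Each step is invertible given the key stream, so decryption simply applies the inverse operations in reverse order. A real scheme would also chain the diffusion across pixels so that changing one pixel alters the whole cipher-image.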
• cDNA and genomic DNA sequences of the novel S59 allele were determined.
• S59-specific and S10-specific PCR detection methods were developed for rapid identification from genomic DNA.
• Self-incompatibility genotypes of over twenty accessions were characterized.
Apples (Malus × domestica Borkh.) are an economically important crop in many temperate growing regions around the world. Because apple is characterized by gametophytic self-incompatibility (GSI), it requires cross-pollination with compatible apple pollen during bloom to achieve sufficient fruit set for commercial production. For this reason, it is common practice in commercial orchards to plant pollinizer trees, which can be either crabapples or different apple cultivars, at a density of 5–10%. Crabapple trees, in addition to their ornamental traits, are valued as a source of genetic diversity for pollinating apple orchards. The genetics underlying cross-compatible responses among crabapples and domesticated apples, especially recently released cultivars, have been largely understudied. In this study, we characterized one novel S-RNase allele from the crabapple Malus ‘Doubloons’, named S59, and report an allele-specific PCR method for detection of this allele from genomic DNA. Further, we characterized the self-incompatibility genotypes (S-genotypes) of over twenty previously unreported Malus accessions, including some recent releases from the breeding programs of the University of Minnesota and Washington State University. The results of this work provide new information about cross-compatibility of cultivars and pollinizers and may be used to aid parent selection in apple breeding programs, as well as pollinizer selection for the commercial orchard.
Analyzing phylogenetic relationships using mathematical methods has always been important in bioinformatics. Quantitative research can interpret raw biological data in a precise way. Multiple Sequence Alignment (MSA) is frequently used to analyze biological evolution, but it is very time-consuming. When the scale of the data is large, alignment methods cannot finish the calculation in a reasonable time. Therefore, we present a new method that uses moments of the cumulative Fourier power spectrum to cluster DNA sequences. Each sequence is translated into a vector in Euclidean space. Distances between the vectors reflect the relationships between sequences. The mapping between the spectra and the moment vector is one-to-one, which means that no information in the power spectra is lost during the calculation. We cluster and classify several datasets, including Influenza A, primates, and human rhinovirus (HRV) datasets, to build phylogenetic trees. Results show that the newly proposed cumulative Fourier power spectrum method is much faster and more accurate than MSA and another alignment-free method known as k-mer. The research provides new insights into the study of phylogeny, evolution, and efficient DNA comparison algorithms for large genomes. The computer programs of the cumulative Fourier power spectrum are available at GitHub (https://github.com/YaulabTsinghua/cumulative-Fourier-power-spectrum).
• A novel method using moments of cumulative Fourier power spectrum (CPS) in clustering the DNA sequences.
• Each sequence is translated into a vector and the distances between vectors represent the relationships between sequences.
• The mapping between the spectra and moment vector is one-to-one, thus much information is kept in this way.
• CPS outperforms the traditional MSA and another alignment-free method on both the accuracy and the calculation speed.
• We upload the code for CPS on GitHub to help people apply and analyze our method in practice.
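The general idea of mapping a sequence to a moment vector of its cumulative power spectrum can be sketched as below. This is an illustrative approximation, not the authors' program (see their GitHub repository for that): the normalization by the final cumulative value and by the number of frequencies, the choice of three moment orders, and all function names are assumptions made here. A direct O(N^2) transform is used to stay within the standard library, so it is only suitable for short sequences.

```python
import cmath
import math


def power_spectrum(indicator):
    """Squared magnitudes of the discrete Fourier transform of a 0/1 sequence."""
    n = len(indicator)
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(indicator))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def cps_moment_vector(seq, orders=(1, 2, 3)):
    """Map a DNA sequence (length >= 2) to a 12-dimensional moment vector."""
    vector = []
    for base in "ACGT":
        indicator = [1 if ch == base else 0 for ch in seq]
        spectrum = power_spectrum(indicator)
        # Cumulative spectrum over frequencies 1..N-1; frequency 0 only
        # reflects how many times the base occurs.
        cumulative, total = [], 0.0
        for value in spectrum[1:]:
            total += value
            cumulative.append(total)
        scale = cumulative[-1] if cumulative[-1] > 0 else 1.0  # assumed scaling
        for j in orders:
            vector.append(sum((c / scale) ** j for c in cumulative) / len(cumulative))
    return vector


def distance(seq_a, seq_b):
    """Euclidean distance between the moment vectors of two sequences."""
    return math.dist(cps_moment_vector(seq_a), cps_moment_vector(seq_b))


# Toy sequences, invented for illustration.
print(distance("ACGTACGTAC", "ACGTACGTAC"))
print(distance("ACGTACGTAC", "AAAACCCCGG"))
```

Because every sequence maps to a vector of the same fixed length, sequences of different lengths can be compared without alignment, and the resulting pairwise distance matrix can be passed to any standard tree-building or clustering routine.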