Polyploidization plays a critical role in producing new gene functions and promoting species evolution. Effective identification of polyploid types can help in exploring the evolutionary mechanism. However, current methods for detecting polyploid types have major limitations, such as being time-consuming and highly subjective. To recognize collinearity fragments and polyploid types objectively and scientifically, we developed the PolyReco method, which can automatically label collinear regions and recognize polyploidy events based on the Ks dot plot. In combination with whole-genome collinearity analysis, PolyReco uses the DBSCAN clustering method to cluster Ks dots. Based on the distance information between clusters along the x-axis and y-axis directions, the clustering results are merged according to certain rules to obtain the collinear regions, and collinear fragments are automatically recognized and labeled. Using the information of the labeled collinear regions on the y-axis, the polyploidization recognition algorithm exhaustively enumerates combinations, computes a genetic collinearity evaluation index for each combination, and plots the resulting evaluation index graph. From the inflection point on this graph, polyploid types and the related chromosomes carrying a polyploidy signal can be detected. Validation experiments showed that the conclusions of PolyReco were consistent with previous studies, verifying the effectiveness of the method. This approach is expected to become a reference architecture for other polyploid-type classification methods.
The DBSCAN algorithm is a well-known density-based clustering method with the advantage of finding clusters of different shapes, but it also has certain shortcomings: it cannot determine its two important parameters, Eps (the neighborhood radius of a point) and MinPts (the minimum number of points), by itself, and it takes a long time to traverse all points when the dataset is large. In this paper, we propose an improved method, named K-DBSCAN, which improves running efficiency through self-adaptive determination of the parameters and changes the traversal so that only core points are processed. Experiments show that it outperforms the DBSCAN algorithm in terms of running-time efficiency.
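The abstract says K-DBSCAN determines Eps and MinPts self-adaptively but does not give the rule. A common heuristic in the same spirit, shown here purely as an illustration (the point set and k are made up, and this is not K-DBSCAN's actual procedure), is the k-distance method: sort every point's distance to its k-th nearest neighbor and take the value just before the largest jump as a candidate Eps.

```python
from math import dist

def k_distance_eps(points, k):
    """Candidate Eps: the k-distance just before the largest sorted gap."""
    kd = []
    for p in points:
        ds = sorted(dist(p, q) for q in points if q is not p)
        kd.append(ds[k - 1])  # distance to the k-th nearest neighbor
    kd.sort()
    # Knee estimate: the value preceding the largest consecutive gap.
    gaps = [(kd[i + 1] - kd[i], i) for i in range(len(kd) - 1)]
    _, knee = max(gaps)
    return kd[knee]

# A dense line of points spaced 0.1 apart, plus one distant outlier:
# the knee separates the dense k-distances (~0.2-0.3) from the outlier's.
pts = [(i / 10, 0.0) for i in range(20)] + [(5.0, 5.0)]
eps = k_distance_eps(pts, k=3)
```

With these synthetic points the returned Eps lands near 0.3, just large enough to connect the dense line while excluding the outlier.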
Density-based clustering methods are designed for clustering spatial databases with noise. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) can discover clusters of arbitrary shape and also handles outliers effectively. DBSCAN obtains clusters by finding the number of points within a specified distance of a given point, which involves computing distances from that point to all other points in the dataset. Conventional index-based methods construct a hierarchical structure over the dataset to speed up neighbor-search operations, but these hierarchical index structures fail to scale for datasets of dimensionality above 20. In this paper, we propose Groups, a novel graph-based index-structure method that accelerates neighbor-search operations and is also scalable to high-dimensional datasets. Experimental results show that the proposed method improves the speed of DBSCAN by a factor of about 1.5–2.2 on benchmark datasets. The performance of DBSCAN degrades considerably with noise because noise points introduce unnecessary distance computations, whereas the proposed method is robust to noise: it prunes out noise points early and eliminates those unnecessary computations. The cluster results produced by our method are identical to those of DBSCAN but are obtained much faster.
•A graph-based index structure is built for speeding up neighbor search operations.•No additional inputs are required to build the index structure.•Proposed method is scalable for high-dimensional datasets.•Handles noise effectively to improve the performance of DBSCAN.
With escalating flood risks due to global warming and frequent extreme rainfall events, it is crucial to highlight the importance of flood risk assessment for devising prudent mitigation strategies and promoting sustainable development. Against this backdrop, this study proposes a novel regional flood risk grading assessment method, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN)-FlowSort method, aimed at comprehensively assessing flood risks in county-level regions of Anhui Province. The innovation of this method lies in its consideration of interactions among the hazard, exposure, and vulnerability subsystems, and in determining the assessment indicator weights through the Probability Language Term Set-Decision Making Trial and Evaluation Laboratory (PLTS-DEMATEL) and the Entropy Weight Method (EWM), so that both the subjective and objective aspects of the indicator weights are taken into account in an integrated manner. Additionally, this study introduces DBSCAN to generate reference profiles, reducing the reliance on expert input in the traditional FlowSort method and enhancing the automation and objectivity of the evaluation process. The results demonstrate that the DBSCAN-FlowSort method exhibits superior classification performance in predicting flood hazards, particularly in accurately identifying and assessing high-risk areas when interactions among indicators in different subsystems are considered. This method provides a new scientific tool for flood risk assessment and management, which is crucial for devising flood resource allocation and risk mitigation measures in both theory and practice.
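The Entropy Weight Method (EWM) named in the abstract above is a standard objective weighting scheme; the sketch below is the generic textbook formulation with invented indicator values, not the paper's exact computation. Indicators whose values vary more across regions carry lower entropy and therefore receive higher weight.

```python
from math import log

def entropy_weights(matrix):
    """Rows = regions, columns = indicators; returns one weight per column."""
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / log(m)
    divergences = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        p = [v / total for v in col]
        e = -k * sum(pi * log(pi) for pi in p if pi > 0)  # entropy of column j
        divergences.append(1.0 - e)  # low entropy => informative indicator
    s = sum(divergences)
    return [d / s for d in divergences]

# Three regions, two indicators: the first is constant (uninformative),
# the second varies, so nearly all weight goes to the second column.
data = [[0.5, 0.1], [0.5, 0.8], [0.5, 0.1]]
w = entropy_weights(data)
```

In DBSCAN-FlowSort these objective weights are then blended with the subjective PLTS-DEMATEL weights; that blending step is not shown here.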
•Flood risk grading assessment in county-level regions is conducted.•PLTS-DEMATEL deals with indicator interactions to obtain more objective weights.•DBSCAN-FlowSort, proposed for flood risk assessment, yields more reasonable results.•Interactions among indicators from different subsystems markedly affect outcomes.
Traditional magnetotelluric (MT) denoising methods often encounter limitations in various scenarios. However, with its robust adaptability and high precision, deep learning has exhibited outstanding denoising performance when applied to MT exploration time series data. Recent research has mainly focused on developing advanced single deep learning models to enhance MT denoising effectiveness. This article introduces a lightweight ensemble learning approach for MT denoising, aiming to enhance denoising performance with a single deep convolutional network. Our ensemble learning strategy uses a sliding window technique to generate overlapping MT time series segments, thereby providing multiple inputs for a specialized noise-fitting network. This variety of inputs enables a more comprehensive view of the MT data, increasing the probability of identifying complex noise patterns. The outputs for these inputs are then integrated by a method that combines shifting averages and adaptive thresholding to obtain more accurate fitted noise contours. Furthermore, we apply a three-layer density-based spatial clustering of applications with noise (DBSCAN) methodology to identify the real noise contours among the fitted ones, and obtain the residual signal by subtracting those real noise contours. Subsequently, the residual signal is further processed by the pretrained denoising network to eliminate noise artifacts. The efficacy of our approach is validated through experiments on both synthetic and field data, demonstrating substantial improvements in denoising, particularly within the mid- and low-frequency ranges. Several interrelated quantities exhibit notable improvements, including apparent resistivity and phase curves and time-frequency domain curves.
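The ensemble strategy in the abstract above feeds overlapping sliding-window segments to one noise-fitting network and then merges the per-window outputs. This sketch shows only the generic segmentation and an overlap-averaging merge; the window size, step, and the identity "model" used for the round-trip are placeholders, not the paper's configuration, and the adaptive-thresholding stage is omitted.

```python
def sliding_windows(series, width, step):
    """Overlapping segments of `series`: `width` samples, advancing by `step`."""
    return [(i, series[i:i + width])
            for i in range(0, len(series) - width + 1, step)]

def merge_by_average(windows, length):
    """Average the per-window outputs wherever windows overlap."""
    acc, cnt = [0.0] * length, [0] * length
    for start, seg in windows:
        for offset, v in enumerate(seg):
            acc[start + offset] += v
            cnt[start + offset] += 1
    return [a / c for a, c in zip(acc, cnt)]

series = [float(i) for i in range(10)]
wins = sliding_windows(series, width=4, step=2)   # 4 overlapping segments
merged = merge_by_average(wins, len(series))      # identity round-trip
```

With a real noise-fitting network, each segment would be transformed before merging; averaging the overlaps smooths disagreements between adjacent windows.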
Epidemic diffusion is a space-time process, and time-series disease maps are a common way to demonstrate an epidemic's progression in time and space. Previous studies used time-series maps to animate the diffusion process; however, epidemic diffusion patterns were determined subjectively by visual inspection. There are still methodological concerns in developing effective analytical approaches for profiling the diffusion dynamics of disease clustering and epidemic propagation. The objective of this study is to develop a geocomputational algorithm, the modified space-time density-based spatial clustering of applications with noise (MST-DBSCAN), for detecting, identifying, and visualizing disease cluster evolution while taking the effect of the incubation period into account. We also map the MST-DBSCAN algorithm output to visualize the diffusion process. Dengue fever case data from 2014 were used as an illustrative case study. Our results show that, compared to kernel-smoothed mapping, the MST-DBSCAN algorithm can better identify the evolution type of any cluster at any epoch. Furthermore, using only one two-dimensional map (and graphs), our approach can demonstrate the same diffusion process that time-series maps or three-dimensional space-time kernel plots display, but in an easy-to-read manner. We conclude that our MST-DBSCAN algorithm can profile the spatial pattern of epidemic diffusion in detail by identifying disease cluster evolution.
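The key modification described in the abstract above is a space-time neighborhood that respects the incubation period. This toy sketch shows only that modified neighborhood test (a spatial radius plus a time interval between onsets); the full cluster-evolution tracking of MST-DBSCAN is omitted, and all case coordinates and parameters are invented.

```python
from math import dist

def st_neighbors(cases, i, eps_space, t_min, t_max):
    """Indices of cases within eps_space of case i whose onset lags case i's
    onset by a delay inside [t_min, t_max] (the incubation window)."""
    x, y, t = cases[i]
    out = []
    for j, (xj, yj, tj) in enumerate(cases):
        if j == i:
            continue
        if dist((x, y), (xj, yj)) <= eps_space and t_min <= tj - t <= t_max:
            out.append(j)
    return out

# Cases: (x, y, onset_day). Cases 1 and 2 are near case 0 in space, but only
# case 1 falls inside a hypothetical 3-14 day incubation window after case 0.
cases = [(0.0, 0.0, 0), (0.1, 0.0, 7), (0.1, 0.1, 40), (5.0, 5.0, 7)]
linked = st_neighbors(cases, 0, eps_space=0.5, t_min=3, t_max=14)
```

Plugging this neighborhood into a DBSCAN-style expansion is what lets the algorithm treat temporally implausible co-located cases as unrelated.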
This correspondence proposes to estimate the angle of arrival (AOA) and the source number from quantized phase-only (PO) measurements extracted via multiple one-bit analog-to-digital converters (ADCs), thereby significantly reducing power consumption. A density-based spatial clustering of applications with noise (DBSCAN)-enhanced expectation-maximization (EM) generalized approximate message passing (GAMP)-based estimator is developed. First, the AOA estimation problem is recast as detecting the supports of cluster-sparse signals and solved by a modified EM-GAMP in a single snapshot. Second, the coarse AOA estimates from multiple snapshots are clustered by DBSCAN to estimate the source number and improve the AOA estimation accuracy. Simulation results show that the quantized PO measurement scheme is more energy-efficient than the conventional complex-valued measurement scheme. Its AOA and source-number estimation performance is also superior to that of the one-bit quantized measurement scheme, owing to the extreme quantization loss of the latter. Furthermore, the DBSCAN-enhanced estimator incorporates the AOA estimates from multiple snapshots and effectively eliminates outlier AOA estimates, thereby improving AOA estimation performance, particularly at low signal-to-noise ratios.
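The second stage described in the abstract above can be pictured with a toy example: coarse per-snapshot AOA estimates are clustered so that outlier snapshots drop out, the cluster count gives the source number, and each cluster mean is a refined AOA. A simple 1D density rule stands in for DBSCAN here, and all angle values are synthetic.

```python
def cluster_aoas(estimates, eps, min_pts):
    """Greedy 1D DBSCAN-style grouping of sorted angle estimates (degrees)."""
    est = sorted(estimates)
    clusters, current = [], [est[0]]
    for a in est[1:]:
        if a - current[-1] <= eps:      # density-reachable along the line
            current.append(a)
        else:
            if len(current) >= min_pts:
                clusters.append(current)
            current = [a]
    if len(current) >= min_pts:
        clusters.append(current)
    return [sum(c) / len(c) for c in clusters]  # one refined AOA per cluster

# Two true sources near -10° and 25°, plus one outlier snapshot at 60°.
coarse = [-10.2, -9.9, -10.1, 24.8, 25.3, 25.0, 60.0]
aoas = cluster_aoas(coarse, eps=1.0, min_pts=3)
```

The lone 60° estimate never accumulates `min_pts` supporters, so it is discarded rather than being reported as a third source.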
Explainable clustering provides human-understandable reasons for the decisions of black-box learning models. In previous work, a decision tree built on the set of dimensions was used to define ranges of values for k-means clusters. For explainable graph clustering, we use expander graphs instead of dense subgraphs, since powering an expander graph is guaranteed to yield a clique after at most a logarithmic number of steps.
Consider a set of multi-dimensional points labeled with k labels. We introduce the heat map sorting problem as reordering the rows and columns of an input matrix (each point is a column and each row is a dimension) such that the labels of the entries of the matrix form connected components (clusters). A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and no two clusters are merged. In the massively parallel computation model (MPC), each machine has a sublinear memory and the total memory of the machines is linear.
We prove the problem is NP-hard. We give a fixed-parameter algorithm in MPC and an approximation algorithm based on expander decomposition. We empirically compare our algorithm with explainable k-means on several graphs of email and computer networks.
•A general method for explainable clustering of high-dimensional data.•A fixed-parameter algorithm for explainable graph clustering.•A Massively Parallel Computation (MPC) algorithm for explainable clustering.•An approximation algorithm for graph clustering on expander graphs.
•The nearest neighbor graph can indicate the samples lying within the locally dense regions of a dataset without any input parameter.•A clustering algorithm named ADBSCAN is developed based on the nearest neighbor graph properties.•Experiments on different types of datasets demonstrate ADBSCAN's superior performance and robustness to parameters.
Density-based clustering has several desirable properties, such as the ability to handle and identify noise samples, discover clusters of arbitrary shapes, and automatically determine the number of clusters. Identifying the core samples within the dense regions of a dataset is a key step of density-based clustering algorithms. Unlike many algorithms that estimate the density of each sample using some density estimator and then choose core samples based on a threshold, in this paper we present a novel approach for identifying local high-density samples that exploits the inherent properties of the nearest neighbor graph (NNG). After using the density estimator to filter noise samples, the proposed algorithm, ADBSCAN (where "A" stands for "Adaptive"), performs a DBSCAN-like clustering process. Experimental results on artificial and real-world datasets demonstrate a significant performance improvement over existing density-based clustering algorithms.
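The abstract above credits ADBSCAN's core-sample detection to properties of the NNG without giving the formula. One parameter-light property with the same flavor, shown purely for illustration and not claimed to be ADBSCAN's rule, is the in-degree of the directed k-nearest-neighbor graph: samples inside dense regions are chosen as neighbors by many others, while isolated samples are rarely chosen.

```python
from math import dist

def knn_in_degree(points, k):
    """In-degree of each point in the directed k-NN graph."""
    indeg = [0] * len(points)
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        for j in order[:k]:   # i points to its k nearest neighbors
            indeg[j] += 1
    return indeg

# A tight cluster plus one far-away sample: the isolated sample is picked
# as a neighbor far less often than the cluster members.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (9.0, 9.0)]
deg = knn_in_degree(pts, k=2)
```

Thresholding such a degree score (rather than a raw density estimate) is one way an NNG can expose locally dense samples without an explicit radius parameter.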
•Deep reinforcement learning is employed for the first time to solve the source searching problem.•The belief state is maintained by a particle filter.•Features of the belief state are extracted by the DBSCAN algorithm.•Transfer learning is employed to reuse the trained Q-network in heterogeneous tasks.
The localization of hazardous sources (e.g. poisonous gas sources) is an important task for the security of human society. To find an unknown source in time, various autonomous source-searching methods have emerged and been employed over the past decade. This paper designs a new source-searching approach, particle clustering-deep Q-network (PC-DQN), which applies deep reinforcement learning (DRL) techniques to source searching for the first time. Specifically, the search process is formulated as a partially observable Markov decision process and then converted into a Markov decision process over the belief state (represented by a particle filter). PC-DQN leverages the density-based spatial clustering of applications with noise (DBSCAN) algorithm to extract features of the belief state, and employs the deep Q-network (DQN) algorithm to find the optimal policy for the source-searching task. Comparison with two baseline methods (the RANDOM and Entrotaxis algorithms) under various experimental conditions verifies the viability of the proposed PC-DQN. Results explicitly show that the success rate of PC-DQN stays at a high level (beyond 99.6%) in all scenarios considered in this paper, and its mean search step shows an evident advantage over the baseline methods in most scenarios. Significantly, we also introduce the transfer learning concept to reuse the well-trained Q-network in new scenarios. These findings show important implications of the DRL-based approach as an alternative and more effective source-searching method.