In this paper, we apply the machine learning clustering algorithm Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to study the membership of stars in twelve open clusters (NGC 2264, NGC 2682, NGC 2244, NGC 3293, NGC 6913, NGC 7142, IC 1805, NGC 6231, NGC 2243, NGC 6451, NGC 6005 and NGC 6583) based on Gaia DR3 data. This sample of clusters spans a range of ages, metallicities, distances and extinctions, and covers a wide parameter space in proper motion and parallax. Using DBSCAN, we obtain reliable cluster members as faint as G∼20 mag, including in the outer regions of the clusters. With our revised membership lists, we plot color-magnitude diagrams, derive cluster parameters for our sample using ASteCA, and compare them with catalog values. We also validate our membership sample against spectroscopic data from APOGEE and GALAH where available. This paper demonstrates the effectiveness of DBSCAN in determining cluster membership.
Density-based spatial clustering of applications with noise (DBSCAN) is a density-based clustering algorithm that can discover clusters of any shape, effectively distinguish noise points, and naturally support spatial databases. DBSCAN has been widely used in the field of spatial data mining. This paper studies the parallel design and implementation of the DBSCAN algorithm on the Spark platform, and addresses the following problems that arise when computing on macro data: the heavy computation required by the single-node algorithm; the low resource utilization of the multi-node algorithm; the large time consumption; and the lack of real-time response. The experimental results indicate that the proposed parallel algorithm achieves a more stable speedup as the scale of the spatial data increases.
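The density-based idea described above can be illustrated with a minimal single-node sketch. This is not the parallel Spark design studied in the paper; it is a plain, brute-force version of the standard DBSCAN procedure, and the function names (`region_query`, `dbscan`), the sample points, and the parameter values `eps=1.5` and `min_pts=3` are assumptions chosen for illustration only.

```python
from math import dist

NOISE = -1  # label given to points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is itself a core point
                queue.extend(j_neighbors)
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Each neighborhood query here scans every point, so the cost grows quadratically with the number of points. That per-point cost is the kind of single-node burden that motivates spatial indexing or a distributed design for large data sets.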
The classification of radar emitters in a naval electronic-warfare (EW) context is a challenging problem. Furthermore, over the last few years, new technologies and the reduction of fabrication costs have led to an increase in the number of radars. Thus, EW systems need to improve their radar identification capabilities. The common operational methods, which consist of clustering pulses with similar parameters, are no longer able to solve the classification task efficiently. The main challenges in clustering data from unknown radar emitters using passive measurements are the high dimension of the received radar pulse samples, the small sample group size, and closely located radar pulse clusters. In this paper, we propose modeling the instantaneous frequency law of the recorded pulses using Bézier curves. The spatial distribution of the control points of the Bézier curves is used as a new feature for a classification process based on the common density-based spatial clustering of applications with noise (DBSCAN) algorithm. Real data recorded near Brest, France, are used to highlight the value of the proposed approach. We also show the potential of the approach for specific emitter identification.
Extracting “high ranking” or “prime protein targets” (PPTs) as potent MRSA drug candidates from a given set of ligands is a key challenge in efficient molecular docking. This study combines protein-versus-ligand matching molecular docking (MD) data extracted from 10 independent MD evaluations (ADFR, DOCK, Gemdock, Ledock, Plants, Psovina, Quickvina2, smina, vina, and vinaxb) to identify top MRSA drug candidates. Twenty-nine active protein targets (APT) from the enhanced DUD-E repository (http://DUD-E.decoys.org) are matched against 1040 ligands using “forward modeling” machine learning for initial “data mining and modeling” (DDM) to extract PPTs and the corresponding high affinity ligands (HALs). K-means clustering (KMC) is then performed on 400 ligands matched against 29 PTs, with each cluster accommodating HALs and the corresponding PPTs. The performance of KMC is then validated against randomly chosen head, tail, and middle active ligands (ALs). KMC outcomes are also validated against two other clustering methods, namely the Gaussian mixture model (GMM) and density-based spatial clustering of applications with noise (DBSCAN). While GMM shows results similar to those of KMC, DBSCAN fails to yield more than one cluster or to handle the noise (outliers), thus affirming the choice of KMC or GMM. Databases obtained from ADFR to mine PPTs are then ranked according to the number of the corresponding HAL-PPT combinations (HPC) inside the derived clusters, an approach called “reverse modeling” (RM). From the set of 29 PTs studied, RM predicts high fidelity of 5 PPTs (17%) that bind with 76 out of 400 ligands (19%), leading to a prediction of next-generation MRSA drug candidates: PPT2 (average HPC 41.1%) is the top choice, followed by PPT14 (average HPC 25.46%) and then PPT15 (average HPC 23.12%). This algorithm can be generically implemented irrespective of pathogenic forms and is particularly effective for sparse data.
Helium bubbles, which are typical radiation microstructures observed in metals or alloys, are usually investigated using transmission electron microscopy (TEM). However, the investigation requires human input to locate and mark the bubbles in the acquired TEM images, rendering this task laborious and prone to error. In this paper, a machine learning method capable of automatically identifying and analyzing TEM images of helium bubbles is proposed, thereby improving the efficiency and reliability of the investigation. In the proposed technique, helium bubble clusters are first determined via the density-based spatial clustering of applications with noise (DBSCAN) algorithm after removing the background and noise pixels. For each helium bubble cluster, the number of helium bubbles is determined from the cluster size, which depends on the specific image resolution. Finally, the helium bubble clusters are analyzed using a Gaussian mixture model, yielding the location and size of the helium bubbles. In contrast to other approaches that require training on numerous annotated images to establish an accurate classifier, the parameters used in the established model are determined using a small number of TEM images. The model formulated according to the proposed approach achieved a higher F1 score when validated against manually marked helium bubble images. Furthermore, the established model can identify bubble-like objects that humans cannot easily identify. This computationally efficient method achieves object recognition for material structure identification that may be advantageous to scientific work.
With the increasing popularity of electric vehicles, energy consumption has become a key performance indicator for electric vehicle drivers, automakers and policy makers. Accurate and real-time prediction of energy consumption under real-world driving conditions is critical to reducing “range anxiety” and can support optimization of battery size, energy-saving route planning and charging facility operation. In this paper, data collected over one year from 988 electric vehicles of the same model in Zhengzhou, China, are used to study the energy consumption of electric vehicles in actual driving conditions. An improved Density-Based Spatial Clustering of Applications with Noise (DBSCAN) model was established to classify the driving behaviors of the drivers. Then the key factors of energy consumption, including velocity, acceleration and temperature, are studied and modeled. With that, an improved density-based clustering multiple linear regression model (DBC-MLR) for energy prediction was established using the driving behavior classification. The DBC-MLR model has better prediction accuracy and can capture the training features in energy consumption prediction for real driving. The proposed method shows a root mean square error (RMSE) of 3.008 kWh/100 km, which is 11.3% and 18.4% lower than that of a conventional machine learning method and a multiple linear regression method, respectively.
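For readers who want to check the error metric, the following minimal sketch shows how an RMSE and a relative reduction can be computed. The helper names (`rmse`, `relative_reduction`, `implied_baseline`) and the sample consumption values are hypothetical. The baseline calculation assumes that the reported percentages are reductions relative to each baseline method's RMSE, which the abstract does not state explicitly.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


def implied_baseline(improved, reduction):
    """Baseline value implied by an improved value and a relative reduction.

    Assumption: reduction = (baseline - improved) / baseline.
    """
    return improved / (1.0 - reduction)


# Hypothetical consumption values in kWh/100 km, for illustration only.
actual = [15.0, 18.0, 21.0]
predicted = [14.0, 20.0, 21.0]
example_rmse = rmse(actual, predicted)

# Baseline RMSEs implied by the reported 3.008 kWh/100 km under the
# assumption above (derived here, not reported in the abstract).
baseline_ml = implied_baseline(3.008, 0.113)
baseline_mlr = implied_baseline(3.008, 0.184)
print(example_rmse, baseline_ml, baseline_mlr)
```

Under that assumption, the implied baseline RMSEs are roughly 3.39 and 3.69 kWh/100 km. These are back-calculated values, not figures taken from the paper.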
The aim of this paper is to provide an extended analysis of outlier detection, using probabilistic and AI techniques, applied in a demo pilot project on demand response in blocks of buildings, based on real experiments and energy data collection with detected anomalies. A numerical algorithm was created to differentiate between natural energy peaks and outliers, so that data cleaning can be applied first. Then, the impact on the energy baseline for the demand response computation was calculated, with improved precision relative to other referenced methods and to the original data processing. For the demo pilot project implemented in the Technical University of Cluj-Napoca block of buildings, without cleaning of the energy baseline data, in some cases it was impossible to compute the established key performance indicators (peak power reduction, energy savings, cost savings, CO2 emissions reduction), or the resulting values were far higher (>50%) and not realistic. Therefore, in real-case business models, it is crucial to use outlier removal. In recent years, both companies and academic communities have pooled their efforts to generate input consisting of new abstractions, interfaces, approaches for scalability, and crowdsourcing techniques. Quantitative and qualitative methods have been created with the aim of reducing errors and have been covered in multiple surveys and overviews of outlier detection.
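As a rough illustration of the data-cleaning step, the sketch below flags outliers in a load series using the modified z-score based on the median absolute deviation (MAD). This is a generic stand-in technique, not the numerical algorithm developed in the paper, and it makes no attempt to separate natural energy peaks from true anomalies. The function name `mad_outliers`, the threshold of 3.5, and the sample readings are assumptions for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Modified z-score: 0.6745 * (x - median) / MAD, where MAD is the
    median of absolute deviations from the median.
    """
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Deviations cannot be scaled when MAD is zero, so this
        # sketch flags nothing in that case.
        return []
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical power readings in kW; the last one is an obvious spike.
loads = [10, 11, 9, 10, 12, 10, 11, 45]
flagged = mad_outliers(loads)
cleaned = [v for i, v in enumerate(loads) if i not in flagged]
print(flagged, cleaned)
```

A median-based score is used here because a single extreme reading barely shifts the median, whereas it can inflate a mean and standard deviation enough to mask itself.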