We analyze the drawbacks of DBSCAN and its variants, and find that the grid technique used in Fast-DBSCAN and ρ-approximate DBSCAN is almost useless in high-dimensional data spaces, because it usually yields considerable redundant distance computations. To tame these problems, two techniques are proposed: one uses the ϵ/2-norm ball to identify Inner Core Blocks, within which all points are core points; it is more efficient than the grid technique because it finds more core points at a time. The other is a fast approximate algorithm for judging whether two Inner Core Blocks are density-reachable from each other. Besides, a cover tree is used to accelerate density computations. Based on these three techniques, an approximate approach, named BLOCK-DBSCAN, is proposed for large-scale data; it runs in about O(n log n) expected time and obtains almost the same result as DBSCAN. BLOCK-DBSCAN has two versions: the L2 version works well for relatively high-dimensional data, while the L∞ version is suitable for high-dimensional data. Experimental results show that BLOCK-DBSCAN is promising and outperforms NQ-DBSCAN, ρ-approximate DBSCAN and AnyDBC.
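The ϵ/2-ball idea rests on a simple triangle-inequality argument: any two points inside a ball of radius ϵ/2 are within ϵ of each other, so if the ball already contains at least minPts points, every point in it is certified as a core point in one batch. The following is a minimal sketch of that certificate; the function name and the brute-force scan are ours for illustration (the paper's algorithm would use an index rather than a linear scan).

```python
import math

def euclid(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def inner_core_block(points, center, eps, min_pts):
    """Collect the points inside the eps/2-ball around `center`.

    If the ball holds at least `min_pts` points, every member is a
    core point: any two members are within eps/2 + eps/2 = eps of
    each other (triangle inequality), so each already has at least
    `min_pts` eps-neighbors.  Returns (block, all_core), where
    all_core reports whether that guarantee holds.
    """
    block = [p for p in points if euclid(p, center) <= eps / 2]
    return block, len(block) >= min_pts
```

With eps = 1.0 and minPts = 4, four points packed near (0.05, 0.05) form one Inner Core Block and are all certified as core points without any pairwise distance computations.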
• The underlying idea is that point p and point q should have similar neighbors, provided p and q are close to each other; given a certain eps, the closer they are, the more similar their neighbors are.
• NQ-DBSCAN is an exact algorithm that may return the same result as DBSCAN when the parameters are the same, whereas ρ-approximate DBSCAN is an approximate algorithm.
• The best-case complexity of NQ-DBSCAN is O(n), and its average complexity is proved to be O(n log n) provided the parameters are properly chosen, whereas ρ-approximate DBSCAN runs in O(n²) in high dimensions.
• NQ-DBSCAN is suitable for clustering data with a lot of noise.
Clustering is an important technique for dealing with the large-scale data being created explosively on the Internet. Most such data are high-dimensional and noisy, which brings great challenges to retrieval, classification and understanding. No existing approach is “optimal” for large-scale data. For example, DBSCAN requires O(n²) time; Fast-DBSCAN only works well in 2 dimensions; and ρ-approximate DBSCAN runs in O(n) expected time only when the dimension D is a relatively small constant. However, we prove theoretically and experimentally that ρ-approximate DBSCAN degenerates to an O(n²) algorithm in very high dimensions, where 2^D ≫ n. In this paper, we propose a novel local neighborhood searching technique and apply it to improve DBSCAN; the resulting algorithm, named NQ-DBSCAN, effectively avoids a large number of unnecessary distance computations. Theoretical analysis and experimental results show that NQ-DBSCAN runs in O(n log n) time on average with the help of an indexing technique, and in O(n) time in the best case if proper parameters are used, which makes it suitable for many real-time applications.
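The "similar neighbors" idea above can be made concrete with the triangle inequality: if d(p, q) is known and the distances from p to its ϵ-neighbors are cached, many of q's neighborhood memberships can be decided without new distance computations. The sketch below illustrates that pruning rule; the function name and the (point, distance) cache format are our own illustration, not the paper's exact data structure.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def prune_neighbor_query(q, p, p_neighbors, eps):
    """Classify p's cached eps-neighborhood relative to a nearby q.

    `p_neighbors` is a list of (point, d(p, x)) pairs.  With
    d_pq = d(p, q), the triangle inequality gives, for each x:
      * d(p, x) <= eps - d_pq  =>  x is surely an eps-neighbor of q,
      * d(p, x) >  eps + d_pq  =>  x surely is not,
      * otherwise d(q, x) must be computed explicitly.
    Returns (sure_neighbors, must_check).
    """
    d_pq = dist(p, q)
    sure, unknown = [], []
    for x, d_px in p_neighbors:
        if d_px <= eps - d_pq:
            sure.append(x)          # no distance computation needed
        elif d_px <= eps + d_pq:
            unknown.append(x)       # must check d(q, x) directly
        # else: provably outside q's eps-neighborhood, skip
    return sure, unknown
```

Only the middle band of cached distances triggers an explicit computation, which is how a large fraction of distance evaluations can be skipped when p and q are close.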
Inverting a matrix is time-consuming, and many works focus on accelerating the inversion of a single large matrix on a GPU. However, the problem of parallelizing the inversion of a large number of small matrices has received little attention, although such workloads arise widely in computer science, e.g., in accelerating cryptographic and image-processing algorithms. In this paper, we propose a Revised In-Place Inversion algorithm for inverting a large number of small matrices on the CUDA platform; it adopts a more refined parallelization scheme and outperforms other algorithms, achieving a speedup of up to 20.9572× over the batch matrix-inverse kernel in cuBLAS. Additionally, we find that there is an upper bound on the input data size for each GPU device, beyond which performance degrades. Based on this finding, we propose the Saturation Size Curve to divide matrices into batches and improve performance. Experimental results show that this strategy increases the algorithm’s performance by 1.75× and effectively alleviates the performance degradation.
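The abstract does not spell out the paper's revised kernel, but batched small-matrix inversion kernels of this kind typically parallelize the classic in-place Gauss–Jordan elimination, assigning one thread block per matrix. Below is a sequential Python sketch of that per-matrix routine, under the simplifying assumption of nonzero pivots (no pivoting); the function name is ours.

```python
def invert_in_place(a):
    """Invert the square matrix `a` (list of row lists) in place by
    Gauss-Jordan elimination without an explicit augmented matrix.

    Assumes each pivot a[k][k] is nonzero (no pivoting), a common
    simplification for small, well-conditioned batched inputs.  On a
    GPU, the two inner loops over j are what a thread block would
    execute in parallel, one thread per column.
    """
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        a[k][k] = 1.0                 # fold the identity column in place
        for j in range(n):
            a[k][j] /= pivot          # normalize the pivot row
        for i in range(n):
            if i == k:
                continue
            factor = a[i][k]
            a[i][k] = 0.0
            for j in range(n):
                a[i][j] -= factor * a[k][j]   # eliminate column k
    return a
```

For example, [[4, 7], [2, 6]] has determinant 10 and inverse [[0.6, -0.7], [-0.2, 0.4]], which the routine produces without allocating a second n×n buffer, i.e., the property that makes it attractive for large batches of small matrices.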
Existing rail-station passenger flow prediction models are inefficient, largely because most of them rely on single-source data. In this paper, a novel method based on multi-layer LSTM is proposed that integrates multi-source traffic data with several techniques (including feature selection based on Spearman correlation and time-feature clustering) to improve passenger flow prediction. Experimental results show that the multi-source data and the integrated techniques are both helpful, and that the proposed method obtains higher prediction accuracy, greatly outperforming other methods (e.g., SARIMA, SVR and BP networks).
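Spearman-based feature selection ranks each candidate source by the rank correlation of its series with the target flow and keeps only the strongly correlated ones. A minimal sketch follows; the no-ties formula ρ = 1 − 6·Σd² / (n(n² − 1)) is standard, but the function names, the 0.5 threshold and the toy series are our own illustration, not values from the paper.

```python
def ranks(xs):
    """1-based ranks of a numeric sequence; assumes no ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(x, y):
    """Spearman rank correlation via the no-ties formula
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def select_features(target, candidates, threshold=0.5):
    """Keep candidate series whose |rho| with the target meets the threshold."""
    return [name for name, series in candidates.items()
            if abs(spearman(target, series)) >= threshold]
```

A monotonically related feed (here a hypothetical "bus_arrivals" series) passes with ρ = 1, while an unrelated series falls below the threshold and is dropped before the LSTM is trained.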
This paper investigates the pulse-profile-maintaining characteristics of a gain-switched nanosecond Raman fiber laser (RFL) and its Raman fiber amplifier (RFA). It is found that the pulse profiles can be precisely maintained after the gain-switched Raman conversion only if the pump pulse repetition frequency (PRF) is an integer multiple of, a unit fraction of, or multiple unit fractions of the fundamental cavity-defined PRF. The produced Raman pulses are then amplified in an RFA, where the pulse profiles can also be precisely maintained. However, it is further noticed that detuning-induced defects can be amplified if they lie within the pulse itself. All these observed phenomena should originate from the transient property of Raman gain, without involving the notable delaying or saturation effects typically seen with rare-earth-doped gain fibers. Our results demonstrate that, on the nanosecond scale, the Raman-converted pulse can well maintain its profile and shows strong resistance to detuning between the PRFs of the pump and Raman pulses. Benefiting from this, the pump depletion caused by the walk-off effect, typically seen in various short-pulse regimes, can reasonably be neglected, which largely simplifies the design of nanosecond pulse-pumped Raman sources.
In this paper, a nonlinear absorbing-loop mirror (NAbLM) is constructed in a polarization-maintaining (PM) version and applied in the all-normal-dispersion (ANDi) regime, for the first time to the best of our knowledge. To investigate the properties of the NAbLM in contrast with a typical nonlinear optical loop mirror (NOLM), an asymmetric optical coupler is used in the NAbLM. In experiments, both the NAbLM and the NOLM can mode-lock a Yb-doped fiber (YDF) laser and enable a type of noise-like pulse (NLP) generation. Noticeably, however, it is further found that, in comparison with the NOLM, the NAbLM evidently enables both temporal and radio-frequency (RF) noise suppression, as well as the elimination of RF variations. Further calculations show that the saturable property of the incorporated absorber plays a vital role in distinguishing the NAbLM from the typical NOLM. Our results demonstrate some new features of the NAbLM, especially with respect to temporal pulse shaping, indicating that the NAbLM can be a useful type of fiber loop mirror in nonlinear fiber optics.
Herein, an ultrafast mode-locked Tm-doped fiber laser built entirely from polarization-maintaining (PM) fiber and relying on a nonlinear amplifying loop mirror is presented for the first time. The simple all-PM mode-locked oscillator delivers an output power of ∼1.20 mW and a pulse repetition rate of ∼7.267 MHz at a central wavelength of ∼1950.2 nm. The pulse duration, measured after boosting the power to ∼20 mW with a fiber amplifier, shows a full width at half maximum (FWHM) of ∼357 fs. Moreover, the all-PM mode-locked oscillator exhibits good power stability, with a fluctuation of ∼1.08% during 36 h of monitoring. This stable fiber laser may serve as an ultrashort-pulse seed source for applications in medical treatment, precision processing and mid-infrared light sources.
Cross-modal hashing has recently gained significant popularity as a way to facilitate multimedia retrieval across different modalities. Since the acquisition of large-scale labeled training data is very labor-intensive, most supervised cross-modal hashing methods are uncompetitive in real applications. With only limited labels available, this paper presents a novel Semi-Supervised Discrete Hashing (SSDH) method for efficient cross-modal retrieval. In contrast to most semi-supervised cross-modal hashing works, which need to predict the labels of unlabeled data, our proposed approach groups the labeled and unlabeled data together and exploits the informative unlabeled data to promote hash code learning directly. Specifically, the proposed SSDH approach utilizes relaxed hash representations to characterize each modality, and learns a semi-supervised semantic-preserving regularization to correlate the semantic consistency between the heterogeneous modalities. Accordingly, an efficient objective function is proposed to learn the hash representation, together with an efficient optimization algorithm that optimizes the hash codes for both labeled and unlabeled data. Without sacrificing retrieval performance, the proposed SSDH method adapts to various kinds of retrieval tasks, i.e., unsupervised, semi-supervised and supervised. Experimental results compared with several competitive algorithms show the effectiveness of the proposed method and its superiority over the state of the art.
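The abstract does not give SSDH's objective in closed form, but the retrieval step shared by cross-modal hashing methods is simple: once both modalities are mapped to binary codes, a query is answered by ranking the database codes by Hamming distance. A minimal sketch of that final step, with codes packed into integers so the distance is a single XOR plus popcount (the function names are ours):

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary codes
    packed into Python ints: XOR, then count the set bits."""
    return bin(a ^ b).count("1")

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code.
    Returns indices from nearest to farthest (stable on ties)."""
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming(query_code, db_codes[i]))
```

Because the codes are binary, this ranking costs a few bit operations per item regardless of the original feature dimensionality, which is the efficiency that motivates hashing-based retrieval in the first place.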