Time complexity in Rejang language stemming Wibowo, Sastya Hendri; Toyib, Rozali; Muntahanah, Muntahanah ...
Jurnal infotel (Online), 08/2022, Volume 14, Issue 3
Journal Article
Peer reviewed
Open access
Stemming is the process of separating the root word from an affixed word in a sentence by splitting off the base word and its affixes, which can consist of prefixes, infixes (insertions), and suffixes. Languages differ in their stemming algorithms, particularly in the morphology the stemming process must handle. The time complexity of the Rejang algorithm is determined based on the affix group. To determine the time complexity of stemming in the Rejang language, we built a digital dictionary of Rejang words, studied and analyzed Rejang morphology, constructed a Rejang stemming algorithm based on the results of that morphological analysis, and then analyzed the algorithm's performance and calculated the time complexity of the stemming results. The result of this research is an efficient and effective Rejang-language stemming algorithm: efficiency is indicated by the algorithm's time complexity of O(log n), and effectiveness by an accuracy of 99% on a test set of 9,000 affixed words. This accuracy value indicates that over-stemming and under-stemming together account for 1%. Tests on 15 text documents show an average stemming failure rate of 1%.
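The paper does not reproduce its implementation; the sketch below only illustrates the general recipe the abstract describes: strip candidate affixes and validate each candidate root against a sorted digital dictionary, where a binary-search lookup over an n-word dictionary is one plausible source of the O(log n) cost. The affix lists and dictionary entries are hypothetical placeholders, not actual Rejang morphology.

```python
from bisect import bisect_left

# Hypothetical affix inventories; the real affix groups come from
# the paper's morphological analysis of Rejang.
PREFIXES = ["be", "se", "te"]
SUFFIXES = ["ne", "o"]

# Sorted root-word dictionary; binary search gives O(log n) lookup.
DICTIONARY = sorted(["buk", "lak", "tun"])

def in_dictionary(word):
    i = bisect_left(DICTIONARY, word)
    return i < len(DICTIONARY) and DICTIONARY[i] == word

def stem(word):
    """Strip candidate prefixes/suffixes and validate each candidate root."""
    if in_dictionary(word):            # already a root word
        return word
    candidates = [word]
    for p in PREFIXES:
        if word.startswith(p):
            candidates.append(word[len(p):])
    for s in SUFFIXES:
        for c in list(candidates):
            if c.endswith(s):
                candidates.append(c[:-len(s)])
    for c in candidates:
        if in_dictionary(c):
            return c
    return word                        # fall back to the surface form
```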
Graph Convolutional Network (GCN) has achieved extraordinary success in learning representations of nodes in graphs. However, regarding Heterogeneous Information Networks (HINs), existing HIN-oriented GCN methods still suffer from two deficiencies: (1) they cannot flexibly explore all possible meta-paths and extract the most useful ones for each target object, which hinders both effectiveness and interpretability; (2) before performing aggregation, they often require additional time-consuming pre-processing operations, which increase the computational complexity. To address these issues, we propose an interpretable and efficient Heterogeneous Graph Convolutional Network (ie-HGCN) to learn the representations of objects in HINs. It is designed as a hierarchical aggregation architecture, i.e., object-level aggregation followed by type-level aggregation. The new architecture can automatically evaluate all possible meta-paths within a length limit, and discover and exploit the most useful ones for each target object, i.e., at fine granularity. It also reduces the computational cost by avoiding additional time-consuming pre-processing operations. Theoretical analysis shows its ability to evaluate the usefulness of all possible meta-paths, its connection to spectral graph convolution on HINs, and its quasi-linear time complexity. Extensive experiments on four real network datasets demonstrate its interpretability and efficiency, as well as its superiority over thirteen baselines.
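A schematic of the two-level idea may help; the sketch below is a generic rendering of object-level aggregation (project and pool neighbors within each type) followed by type-level aggregation (attention over the per-type summaries), with shapes and the scoring function chosen for brevity rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def hierarchical_aggregate(h_self, typed_neighbors, W_proj, w_att):
    """Schematic two-level aggregation for one target object.

    h_self:          (d,) target object's own features
    typed_neighbors: dict type -> (n_t, d_t) neighbor feature matrices
    W_proj:          dict type -> (d_t, d) projection into the target space
    w_att:           (2*d,) parameters of a simple type-level scorer
    """
    # Object level: project each type's neighbors into the target
    # object's space, then mean-pool within the type.
    per_type = [h_self]
    for t, H in typed_neighbors.items():
        per_type.append((H @ W_proj[t]).mean(dim=0))
    per_type = torch.stack(per_type)                  # (T+1, d)

    # Type level: attention over the per-type summaries.
    scores = torch.stack([
        w_att @ torch.cat([h_self, z]) for z in per_type
    ])
    alpha = F.softmax(scores, dim=0)                  # type weights
    return (alpha.unsqueeze(1) * per_type).sum(dim=0)
```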
In emergency responses to natural disasters, actionable information provided by remote sensing images is crucial to help emergency managers become aware of the situation and assess the magnitude of the damage. Without an accurate prediction of time consumption, choosing an algorithm for land use/land cover (LULC) classification under these emergency circumstances can be blind and subjective. Here, we propose a full parameter time complexity (FPTC) analysis and the corresponding coefficient ω to estimate the actual running time of LULC classification without actually running the code. The FPTC of five general algorithms is derived in this article: the FPTC of k-nearest neighbors (kNN) is F(nv + n·log₂ u), that of logistic regression (LR) is F(Qm²vn), that of classification and regression tree (CART) is F((m+1)·nv·log₂ n), that of random forest (RF) is F(s(m+1)·nv·log₂ n), and that of support vector machine (SVM) is F(m²Qv(n+k)). The results show a strong linear relationship between the actual running time and FPTC across different data sizes, with R-squared values of 0.991 (kNN), 0.997 (LR), 0.999 (CART), 1.000 (RF), and 0.999 (SVM). The average root-mean-squared error between the real running time and the estimated running time is 3.34 s, which demonstrates the effectiveness of FPTC. Combining FPTC with the corresponding coefficient ω, the running time of the classification can be precisely predicted, which will help emergency managers quickly choose algorithms in response to natural disasters with available remote sensing data and limited time.
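The prediction step itself is just arithmetic: evaluate the derived FPTC expression for the chosen algorithm and scale it by the machine-specific coefficient ω. A minimal sketch with made-up workload values; the roles assigned here to n, m, v, u, Q, s, and k (samples, features, classes, training points, iterations, trees, and the SVM's extra term) are my reading of the formulas, not definitions given in the abstract.

```python
import math

# Made-up workload parameters for illustration only.
n, m, v, u, Q, s, k = 100_000, 8, 5, 50_000, 100, 200, 1_000

fptc = {
    "kNN":  n * v + n * math.log2(u),
    "LR":   Q * m**2 * v * n,
    "CART": (m + 1) * n * v * math.log2(n),
    "RF":   s * (m + 1) * n * v * math.log2(n),
    "SVM":  m**2 * Q * v * (n + k),
}

omega = 2e-9  # hypothetical coefficient, fitted per machine from calibration runs
for name, f in fptc.items():
    print(f"{name:4s} estimated running time: {omega * f:8.2f} s")
```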
Kitaev's toric code is arguably the most studied quantum code and is expected to be implemented in future generations of quantum computers. The renormalisation decoders introduced by Duclos-Cianci and Poulin exhibit one of the best trade-offs between accuracy and efficiency, with a time complexity in O(n log₂ n). One question that was left open is how they handle worst-case or adversarial errors, i.e., what is the order of magnitude of the smallest weight of an error pattern that will be wrongly decoded. We initiate such a study involving a simple hard-decision and deterministic version of a renormalisation decoder. We exhibit an uncorrectable error pattern whose weight scales like d^(1/2) and prove that the decoder corrects all error patterns of weight less than (5/6)·d^(log₂(6/5)), where d is the minimum distance of the toric code.
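To make the gap between the two scalings concrete, here is a quick numeric comparison of the guaranteed-correction bound (5/6)·d^(log₂(6/5)) and the d^(1/2) scale of the exhibited uncorrectable pattern; the specific distances are arbitrary examples, not values from the paper.

```python
import math

expo = math.log2(6 / 5)                    # ~= 0.263
for d in (25, 101, 401):
    guaranteed = (5 / 6) * d ** expo       # decoder provably corrects below this weight
    adversarial = d ** 0.5                 # some pattern of roughly this weight fails
    print(f"d={d:4d}  corrects all weight < {guaranteed:5.2f};  "
          f"uncorrectable pattern ~ weight {adversarial:5.2f}")
```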
Multi-view subspace clustering has attracted intensive attention as a way to effectively fuse multi-view information by exploring appropriate graph structures. Although existing works have made impressive progress in clustering performance, most of them suffer from cubic time complexity, which prevents them from being efficiently applied to large-scale applications. To improve efficiency, anchor sampling mechanisms have been proposed to select vital landmarks that represent the whole data. However, existing anchor selection usually follows a heuristic sampling strategy, e.g., k-means or uniform sampling. As a result, the procedures of anchor selection and subsequent subspace graph construction are separated from each other, which may adversely affect clustering performance. Moreover, the involved hyper-parameters further limit the application of traditional algorithms. To address these issues, we propose a novel subspace clustering method termed Fast Parameter-free Multi-view Subspace Clustering with Consensus Anchor Guidance (FPMVS-CAG). First, we jointly conduct anchor selection and subspace graph construction in a unified optimization formulation. In this way, the two processes can negotiate with each other to promote clustering quality. Moreover, FPMVS-CAG is proved to have linear time complexity with respect to the sample number. In addition, FPMVS-CAG can automatically learn an optimal anchor subspace graph without any extra hyper-parameters. Extensive experimental results on various benchmark datasets demonstrate the effectiveness and efficiency of the proposed method against existing state-of-the-art multi-view subspace clustering competitors. These merits make FPMVS-CAG more suitable for large-scale subspace clustering. The code of FPMVS-CAG is publicly available at https://github.com/wangsiwei2010/FPMVS-CAG .
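For contrast with the joint formulation, the decoupled heuristic baseline the abstract criticises can be sketched in a few lines: pick anchors with k-means, then fit an anchor graph by ridge-regularised least squares. This is the baseline, not FPMVS-CAG, and the helper name and parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def heuristic_anchor_graph(X, n_anchors=50):
    """Decoupled baseline (not FPMVS-CAG): k-means anchors A, then a
    least-squares anchor graph Z such that X ~= Z @ A."""
    A = KMeans(n_clusters=n_anchors, n_init=10).fit(X).cluster_centers_
    # Ridge regularisation keeps the solve well conditioned.
    G = A @ A.T + 1e-6 * np.eye(n_anchors)
    Z = np.linalg.solve(G, A @ X.T).T        # (n_samples, n_anchors)
    return A, Z
```

Because the anchors are frozen before the graph is fit, nothing in the least-squares step can repair a poor anchor set, which is exactly the coupling the paper's unified formulation restores.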
In a variety of research areas, the weighted bag of vectors and the histogram are widely used descriptors for complex objects. Both can be expressed as discrete distributions. D2-clustering pursues the minimum total within-cluster variation for a set of discrete distributions subject to the Kantorovich-Wasserstein metric. D2-clustering has a severe scalability issue, the bottleneck being the computation of a centroid distribution, called the Wasserstein barycenter, that minimizes its sum of squared distances to the cluster members. In this paper, we develop a modified Bregman ADMM approach for computing the approximate discrete Wasserstein barycenter of large clusters. In the case when the support points of the barycenters are unknown and have low cardinality, our method achieves high accuracy empirically at a much reduced computational cost. The strengths and weaknesses of our method and its alternatives are examined through experiments, and we recommend scenarios for their respective usage. Moreover, we develop both serial and parallelized versions of the algorithm. By experimenting with large-scale data, we demonstrate the computational efficiency of the new methods and investigate their convergence properties and numerical stability. The clustering results obtained on several datasets in different domains are highly competitive in comparison with some widely used methods in the corresponding areas.
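The paper's B-ADMM targets barycenters with unknown, low-cardinality support; as a self-contained point of reference, here is the much simpler fixed-support, entropically regularised barycenter computed by iterative Bregman projections (in the style of Benamou et al.). This is an alternative method for a simpler setting, not the algorithm developed in the paper.

```python
import numpy as np

def sinkhorn_barycenter(hists, M, reg=0.05, n_iter=200):
    """Entropic-regularised barycenter of histograms on a shared support.

    hists: (k, d) array, each row a probability vector
    M:     (d, d) symmetric ground-cost matrix on the support
    """
    K = np.exp(-M / reg)                       # Gibbs kernel
    k, d = hists.shape
    u = np.ones((k, d))
    for _ in range(n_iter):
        v = hists / (u @ K).clip(1e-30)        # match each input marginal
        # Geometric mean across inputs gives the current barycenter.
        bary = np.exp(np.mean(np.log((v @ K.T).clip(1e-30) * u), axis=0))
        u = bary / (v @ K.T).clip(1e-30)       # match the barycenter marginal
    return bary / bary.sum()
```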
The hypercube is a very popular topology for the interconnection networks of parallel systems, and the folded hypercube is a variant of the hypercube. The folded hypercube attains much higher performance by introducing one additional link to each processing element. Consequently, there are many research activities regarding the folded hypercube. We focus on this topology and address an unresolved problem: the set-to-set disjoint paths problem. In this paper, we show an algorithm that solves the problem in a folded hypercube in polynomial time, and we prove its correctness. Moreover, we show that the time complexity of the algorithm is O(ν³ log ν) and the maximum length of the paths is 2ν + 2 when the algorithm is applied to a ν-dimensional folded hypercube.
•This study proposes an algorithm that solves the set-to-set disjoint paths problem in a folded hypercube in polynomial time.
•(ν+1) paths are generated between two node sets, each of which has (ν+1) nodes, in a ν-dimensional folded hypercube.
•It is proved that the proposed algorithm solves the problem in a ν-dimensional folded hypercube in O(ν³ log ν) time.
•It is also proved that the lengths of the paths generated by the proposed algorithm are at most 2ν + 2.
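For readers unfamiliar with the topology, the sketch below shows the structural fact the result rests on: a ν-dimensional folded hypercube gives every node ν hypercube edges plus one complement edge, hence degree ν + 1, matching the ν + 1 disjoint paths. This is illustrative background, not the paper's path-construction algorithm.

```python
def folded_hypercube_neighbors(node, nu):
    """Neighbors of `node` (an int in [0, 2**nu)) in a nu-dimensional
    folded hypercube: flip each of the nu bits, plus the complement link."""
    mask = (1 << nu) - 1
    neighbors = [node ^ (1 << i) for i in range(nu)]   # hypercube edges
    neighbors.append(node ^ mask)                      # folding edge
    return neighbors

# Each node has degree nu + 1, matching the nu + 1 disjoint paths.
print(folded_hypercube_neighbors(0b0000, 4))   # [1, 2, 4, 8, 15]
```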
Evolutionary algorithms have been shown to be powerful for solving multiobjective optimization problems, in which nondominated sorting is a widely adopted technique in selection. This technique, however, can be computationally expensive, especially when the number of individuals in the population becomes large. This is mainly because in most existing nondominated sorting algorithms, a solution needs to be compared with all other solutions before it can be assigned to a front. In this paper we propose a novel, computationally efficient approach to nondominated sorting, termed efficient nondominated sort (ENS). In ENS, a solution to be assigned to a front needs to be compared only with those that have already been assigned to a front, thereby avoiding many unnecessary dominance comparisons. Based on this new approach, two nondominated sorting algorithms have been suggested. Both theoretical analysis and empirical results show that the ENS-based sorting algorithms are computationally more efficient than the state-of-the-art nondominated sorting methods.
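The core idea admits a compact sketch, shown below under two assumptions not fixed by the abstract: all objectives are minimised, and insertion uses a sequential front-by-front search (one of the suggested variants). Sorting by the first objective guarantees that no later solution can dominate an earlier one, so each solution only needs checking against fronts built so far.

```python
def dominates(a, b):
    """a dominates b (minimisation): no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def ens_sequential_search(population):
    """ENS-style nondominated sort: each solution is compared only with
    solutions already assigned to a front, in first-objective order."""
    order = sorted(range(len(population)), key=lambda i: population[i])
    fronts = []
    for i in order:
        for front in fronts:
            if not any(dominates(population[j], population[i]) for j in front):
                front.append(i)
                break
        else:
            fronts.append([i])          # dominated everywhere: open a new front
    return fronts

print(ens_sequential_search([(1, 5), (2, 3), (3, 1), (2, 4), (4, 4)]))
# -> [[0, 1, 2], [3], [4]]
```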
Feature reduction is an important aspect of Big Data analytics on today's ever-larger datasets. Rough sets are a classical method widely applied in attribute reduction. Most rough set algorithms use the a priori domain knowledge of a dataset to process continuous attributes through a membership function. Neighborhood rough sets (NRS) replace the membership function with the concept of neighborhoods, allowing NRS to handle scenarios where no a priori knowledge is available. However, the neighborhood radius of each object in NRS is fixed, and optimizing the radius depends on grid search. This diminishes both efficiency and effectiveness, leading to a time complexity of no lower than O(N²). To resolve these limitations, granular ball neighborhood rough sets (GBNRS), a novel NRS method with time complexity O(N), is proposed. GBNRS adaptively generates a different neighborhood for each object, resulting in greater generality and flexibility in comparison to standard NRS methods. We compare GBNRS with the current state-of-the-art NRS method, FARNeMF, and find that GBNRS obtains both higher performance and higher classification accuracy on public benchmark datasets. All code has been released in the open source GBNRS library at http://www.cquptshuyinxia.com/GBNRS.html .
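To see where the O(N²) of standard NRS comes from, the fixed-radius variant is sketched below (this is the baseline being improved on, not GBNRS): every object's neighborhood requires distances to all N objects, so the pairwise distance matrix dominates the cost. Function and variable names are illustrative.

```python
import numpy as np

def nrs_lower_approximation(X, y, radius):
    """Fixed-radius NRS (not GBNRS): an object belongs to the lower
    approximation of its class iff its whole neighborhood shares its label.
    The (N, N) distance matrix is the O(N^2) bottleneck."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    lower = []
    for i in range(len(X)):
        neigh = np.flatnonzero(D[i] <= radius)
        if np.all(y[neigh] == y[i]):
            lower.append(i)
    return lower
```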
BTM: Topic Modeling over Short Texts Cheng, Xueqi; Yan, Xiaohui; Lan, Yanyan ...
IEEE transactions on knowledge and data engineering, 2014-12-01, Volume 26, Issue 12
Journal Article
Peer reviewed
Short texts are popular on today's web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging task for many content analysis applications. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics, so their inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel approach to short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, making the inference effective through the rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM for efficient topic learning. Experiments on real-world short text collections show that BTM can discover more prominent and coherent topics, and significantly outperforms the state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms in both time efficiency and topic learning.
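The corpus representation BTM works on is simple to state: a biterm is an unordered pair of distinct words co-occurring in the same short text, collected over the whole corpus. A minimal extraction sketch (one simple variant that deduplicates repeated words within a document):

```python
from itertools import combinations

def extract_biterms(docs):
    """All unordered word pairs co-occurring within each short document;
    BTM models the corpus-level distribution of these biterms."""
    biterms = []
    for doc in docs:
        words = set(doc.lower().split())
        biterms.extend(tuple(sorted(p)) for p in combinations(words, 2))
    return biterms

print(extract_biterms(["apple banana cherry", "banana cherry"]))
```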