Social Internet of Things (SIoT), an IoT in which things can autonomously establish relationships with other smart objects related to humans, allows them to interact within a social structure based on relationships. Exploiting the social structures of smart objects in SIoT is important for the supervision and management of various services. The diversified top-k maximal clique, as a novel social structure, can be used for anomaly detection and smart community detection in SIoT. However, the scalability of existing approaches for detecting diversified top-k maximal cliques is becoming a significant challenge on big graphs. To this end, this paper proposes a novel diversified top-k maximal clique detection approach based on formal concept analysis. Specifically, we first prove the existence of an equivalence relation between maximal cliques and equiconcepts, a special class of concepts whose extent and intent are the same. Based on this equivalence relation, an efficient approach based on formal concept analysis for identifying diversified top-k maximal cliques is then presented. Finally, three real-world social network datasets are used in experiments to validate the effectiveness of our approach in SIoT.
• The diversified top-k maximal clique detection problem is formulated.
• An equivalence relation between maximal cliques and equiconcepts is proved.
• An efficient approach for identifying diversified top-k maximal cliques is presented.
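The equivalence between maximal cliques and equiconcepts can be checked by brute force on a toy graph. The sketch below is not the paper's algorithm. It assumes the formal context relates each vertex to itself and to its neighbours, and it assumes "diversified" means that the k chosen cliques should cover as many vertices as possible, approximated here by a greedy heuristic. All function names are hypothetical.

```python
from itertools import combinations

def closed_neighborhoods(n, edges):
    """Assumed formal context: vertex v is related to itself and to its neighbours."""
    nbrs = {v: {v} for v in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs

def derive(vertex_set, nbrs, n):
    """Derivation operator A -> A': vertices related to every member of A."""
    common = set(range(n))
    for v in vertex_set:
        common &= nbrs[v]
    return common

def equiconcepts(n, edges):
    """Brute force: non-empty sets A with A' == A (extent equals intent)."""
    nbrs = closed_neighborhoods(n, edges)
    return [frozenset(s)
            for size in range(1, n + 1)
            for s in combinations(range(n), size)
            if derive(s, nbrs, n) == set(s)]

def maximal_cliques(n, edges):
    """Brute force: cliques that are not strictly contained in another clique."""
    nbrs = closed_neighborhoods(n, edges)
    def is_clique(s):
        return all(u in nbrs[v] for u, v in combinations(s, 2))
    cliques = [frozenset(s)
               for size in range(1, n + 1)
               for s in combinations(range(n), size) if is_clique(s)]
    return [c for c in cliques if not any(c < d for d in cliques)]

def greedy_diversified_top_k(cliques, k):
    """Greedy coverage heuristic (an assumption): take the clique adding most new vertices."""
    chosen, covered, remaining = [], set(), list(cliques)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda c: len(c - covered))
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen, covered

# Toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
concepts = equiconcepts(6, edges)
cliques = maximal_cliques(6, edges)
top2, covered = greedy_diversified_top_k(cliques, 2)
```

The brute-force enumeration is exponential in the number of vertices and is only meant to make the correspondence visible on small inputs.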
Clustering and community structure are crucial for many network systems and the related dynamic processes. It has been shown that communities are usually overlapping and hierarchical. However, previous methods investigate these two properties of community structure separately. This paper proposes an algorithm (EAGLE) to detect both the overlapping and hierarchical properties of complex community structure together. The algorithm operates on the set of maximal cliques and adopts an agglomerative framework. The modularity quality function is extended to evaluate the goodness of a cover. Applications to real-world networks give excellent results.
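The abstract does not state how modularity is extended to covers. One form commonly attributed to this line of work weights each vertex pair by the number of communities the two vertices belong to: EQ = (1/2m) Σ_i Σ_{v,w ∈ C_i} (1/(O_v O_w)) [A_vw − k_v k_w/(2m)]. The sketch below implements that form as an assumption. The function name and the adjacency-dictionary input format are hypothetical.

```python
def extended_modularity(adj, cover):
    """Assumed extended modularity of a cover.

    adj   : dict mapping each vertex to the set of its neighbours (undirected, no loops)
    cover : list of vertex sets; a vertex may appear in several sets
    """
    m = sum(len(nb) for nb in adj.values()) / 2  # number of edges
    # O_v: number of communities that contain vertex v
    membership = {v: 0 for v in adj}
    for community in cover:
        for v in community:
            membership[v] += 1
    total = 0.0
    for community in cover:
        for v in community:
            for w in community:
                a_vw = 1.0 if w in adj[v] else 0.0
                expected = len(adj[v]) * len(adj[w]) / (2 * m)
                total += (a_vw - expected) / (membership[v] * membership[w])
    return total / (2 * m)

# Two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
eq_disjoint = extended_modularity(adj, [{0, 1, 2}, {3, 4, 5}])
eq_overlap = extended_modularity(adj, [{0, 1, 2, 3}, {2, 3, 4, 5}])
```

When every vertex belongs to exactly one community, all weights equal 1 and the value reduces to ordinary modularity, which gives a convenient sanity check.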
Detecting community structure has become an important technique for studying complex networks. Although many community detection algorithms have been proposed, most of them focus on separated communities, where each node can belong to only one community. However, in many real-world networks, communities often overlap with each other, so developing overlapping community detection algorithms becomes necessary. Along this avenue, this paper proposes a maximal-clique-based multiobjective evolutionary algorithm (MOEA) for overlapping community detection. In this algorithm, a new representation scheme based on the introduced maximal-clique graph is presented. Since the maximal-clique graph uses the maximal cliques of the original graph as its nodes, and two maximal cliques are allowed to share nodes of the original graph, overlap is an intrinsic property of the maximal-clique graph. Owing to this property, the new representation scheme allows MOEAs to handle the overlapping community detection problem in a way similar to separated community detection, which simplifies the optimization problems. As a result, the proposed algorithm can detect overlapping community structure with higher partition accuracy and lower computational cost than existing ones. Experiments on both synthetic and real-world networks validate the effectiveness and efficiency of the proposed algorithm.
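The idea that a disjoint partition of the maximal-clique graph induces overlapping communities in the original graph can be sketched as follows. The abstract does not say how edges of the maximal-clique graph are defined, so linking two cliques whenever they share at least one vertex is an assumption here. The evolutionary search is omitted and replaced by a hand-picked labelling, and all function names are hypothetical.

```python
def bron_kerbosch(adj):
    """Enumerate maximal cliques of a graph given as {node: set(neighbours)}."""
    found = []
    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return sorted(found, key=sorted)  # deterministic order

def maximal_clique_graph(cliques):
    """Assumed edge rule: two clique-nodes are linked if they share a vertex."""
    return {(i, j)
            for i in range(len(cliques))
            for j in range(i + 1, len(cliques))
            if cliques[i] & cliques[j]}

def communities_from_labels(cliques, labels):
    """Map a disjoint labelling of clique-nodes back to (possibly overlapping) vertex sets."""
    communities = {}
    for clique, label in zip(cliques, labels):
        communities.setdefault(label, set()).update(clique)
    return communities

# Two triangles sharing vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
cliques = bron_kerbosch(adj)
clique_edges = maximal_clique_graph(cliques)
communities = communities_from_labels(cliques, [0, 1])  # each clique in its own group
overlap = communities[0] & communities[1]
```

Each clique-node receives exactly one label, yet vertex 2 ends up in both communities, which is the sense in which overlap is intrinsic to the representation.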
The feature selection process plays an important role in different fields, particularly in bioinformatics and microarray gene expression data analysis, for choosing discriminative genes from high-dimensional datasets and selecting a subset of highly relevant features with low redundancy, which may lead to improved prediction models. Consequently, this study proposes a new feature selection method that integrates Preordonnances theory, in terms of new Relevance and Complementarity criteria introduced here, with connectivity in undirected Weighted Graphs (PCRWG). The method can handle high-dimensional data. PCRWG retains the relevant and complementary features to select effective features in large-scale gene datasets. The proposed algorithm operates in two phases: filtering and wrapping. The strength of the first phase is that it is preceded by a step that further reduces the number of predictors by removing those in disagreement with the target, based on the newly proposed relevance criterion. Then, the proposed heuristic uses the relevance-complementarity ratio between preordonnances to automatically update the compromise rule between relevance and complementarity. In the wrapping phase, the suggested graph-based approach uses maximal cliques and relies on a relevance-complementarity matrix to consolidate edges: two connected, interdependent features are complementary to each other and may have high discriminative power when they serve as a group. We highlight the fact that existing graph-based feature selection algorithms do not consider relevance and complementarity simultaneously. The experiments were carried out on three simulated scenarios and the thirteen most popular cancer microarray gene datasets, of which eight are binary and five are multi-class. A 10-fold cross-validation was used to evaluate the Support Vector Machine (SVM), Naive Bayes (NB) and artificial Neural Network (NN) classifiers.
The empirical results demonstrate the high performance of the proposed hybrid approach when compared to the most recently published articles.
Generally, for high-dimensional datasets, only some features are relevant, while others are irrelevant or redundant. In the machine learning field, a strategy for eliminating insignificant features from a dataset is very important for the classification task. Feature selection is the process of identifying the most informative features that help in predicting sample classes efficiently in order to achieve better classification performance. In this research paper, a new hybrid feature selection strategy for high-dimensional datasets is proposed to find the most discriminative subset of features, with the irrelevant and redundant features discarded. The proposed algorithm is called Maximal Clique based on the coefficients Ψ (MaCΨ algorithm). The MaCΨ method can handle categorical, numerical, and hybrid datasets. Furthermore, it can be applied to either binary or multi-class classification problems. The global structure of the MaCΨ algorithm can be described in three steps. In the first step, a weight is proposed to evaluate the importance of each feature in the dataset by balancing the trade-off between two novel measures of relevance and redundancy, and then the K most important features are selected to form the candidate subset, where K is taken as user input. In the second step, a wrapper method based on graph theory is applied to the subset retained from the first step to extract the optimal subset of features. In the last step, the final subset of features, with the highest classification performance and the lowest number of features, is obtained by applying the backward elimination algorithm to the optimal subset. The performance of the MaCΨ methodology is investigated on artificial as well as real-world datasets with different dimensionalities.
The statistical analysis of the experimental results clearly indicates that the MaCΨ approach achieves competitive results in terms of the classification accuracy and the number of selected features compared with some state-of-the-art approaches.
• Two novel measures are defined to evaluate the relevance and redundancy of each predictor of any type.
• A new hybrid filter–wrapper feature selection approach is proposed to select the most important features.
• The filter phase is based on a novel feature evaluation criterion (MaCΨ weight) that is related to the defined relevance and redundancy measures simultaneously.
• The wrapper phase is based on graph theory and also on sequential backward selection.
The emergence of massive noisy and incomplete data is transforming conventional graphs into uncertain graphs. In this paper, we study rough maximal cliques enumeration (RMCE) in incomplete graphs, which we define as the novel problem of enumerating all maximal cliques when some of the edges are unknown to users. RMCE is proved to be NP-complete. To tackle this problem, an efficient framework for obtaining the rough maximal cliques based on Partially-Known Concept Learning (PKCL) is proposed. With this framework, a given incomplete graph is initially represented as an incomplete formal context. Then, the partially-known SE-ISI concept lattice is generated from the constructed incomplete formal context. Based on the constructed SE-ISI concept lattice, an equivalence theorem between SE-ISI equiconcepts and rough maximal cliques is presented. The topological structure of rough maximal cliques is then analyzed in detail from the viewpoints of roughness and SE-ISI concept stability. The evaluation results demonstrate that the proposed PKCL algorithm identifies rough maximal cliques better than the existing baseline algorithms under different probability distribution models of links.
Spatial co-location pattern (SCP) mining discovers subsets of spatial feature types whose objects frequently co-locate in a geographic space. Many existing methods treat the space as homogeneous, use absolute Euclidean distance to measure the neighbor relationship between objects and use a participation index to measure the prevalence of SCPs. Several issues arise: (1) the distance between objects may not be accurately definable, since it is a relative and fuzzy concept; (2) the degree of neighborliness and the sharing relationships between objects are neglected; (3) current methods for collecting participating objects, which generate candidate table instances using combination search techniques, are computationally expensive. In this paper, we propose a method based on fuzzy grid cliques to find all prevalent SCPs. Specifically, fuzzy theory is introduced to define the proximity between objects. The fuzzy participating contribution index (FPCI) is defined to measure the prevalence of SCPs, and it considers both the neighbor degree and the sharing relationship between objects. Based on the defined proximity, a basic mining framework based on fuzzy grid cliques is proposed. We first design a naive algorithm based on the participating objects’ filtering and verification called POFV, which uses a fuzzy grid clique search technology instead of combination search to collect participating objects and avoids enumerating all table instances. To solve a dilemma within POFV, we develop a maximal fuzzy grid cliques search based algorithm called MFGC, which can effectively reuse information. Experiments on both real and synthetic data sets verify the superiority of our proposed approaches, showing that MFGC greatly outperforms the baseline algorithm and captures SCPs more efficiently.
Let $q$ be an odd prime power. Denote by $r(q)$ the value of $q$ modulo 4. In this paper, we establish a linear fractional correspondence between two types of maximal cliques of size $(q + r(q))/2$ in the Paley graph of order $q^2$.
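For small cases the objects in this statement can be built and inspected directly. The sketch below is only an illustration and does not implement the paper's correspondence. It assumes $q$ is an odd prime (not a general prime power), represents GF($q^2$) as pairs $a + b\,w$ with $w^2$ equal to a quadratic non-residue modulo $q$, and joins two elements when their difference is a nonzero square. The function names are hypothetical.

```python
def paley_graph_square_order(q):
    """Paley graph on GF(q^2) for an odd prime q; elements are pairs (a, b) = a + b*w."""
    residues = {(x * x) % q for x in range(1, q)}
    n = next(t for t in range(2, q) if t not in residues)  # w^2 = n, a non-residue mod q
    elements = [(a, b) for a in range(q) for b in range(q)]

    def mul(x, y):
        (a, b), (c, d) = x, y
        return ((a * c + n * b * d) % q, (a * d + b * c) % q)

    squares = {mul(x, x) for x in elements if x != (0, 0)}
    # x ~ y iff x - y is a nonzero square; symmetric because -1 is a square in GF(q^2)
    return {x: {y for y in elements
                if ((x[0] - y[0]) % q, (x[1] - y[1]) % q) in squares}
            for x in elements}

def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of maximal cliques (no pivoting, fine for tiny graphs)."""
    found = []
    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return found

q = 3
adj = paley_graph_square_order(q)
cliques = maximal_cliques(adj)
sizes = sorted({len(c) for c in cliques})
target_size = (q + q % 4) // 2  # (q + r(q)) / 2
```

For $q = 3$ the graph has 9 vertices and the target size $(3 + 3)/2 = 3$ coincides with $q$, so this case is degenerate. Calling the same functions with `q = 5` produces the 25-vertex graph, where the target size is $(5 + 1)/2 = 3$.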
Spatial co-location pattern (SCP) mining aims to mine the implicit relationships between different spatial features. These features often have certain connections and co-occur in close geographical proximity. Regional co-location pattern (RCP) mining is a branch of SCP mining, which is usually used to discover sets of spatial features that do not often co-occur at large spatial scales but do co-occur in local regions. Discovering RCPs is still very challenging, because different RCPs will be obtained under different region partitions. Moreover, existing region division methods ignore the influence of the density of individual feature instances, have a low recognition rate for regions with low density distribution that contain RCPs, and lack semantic information in the delineated regions. To this end, first, we propose a novel multi-density clustering method based on maximal cliques (MCs) during the partitioning phase of RCP mining. Second, we design a two-stage mining algorithm based on MCs in the mining phase, which fully exploits the advantages of MCs to improve the mining efficiency and can quickly obtain new mining results when the prevalence threshold changes. Third, regional similarity is defined based on RCPs over regions to merge similar sub-regions. Finally, the proposed method is compared with a state-of-the-art method on both synthetic and real datasets. The experimental results show that the proposed method not only addresses the issues of existing methods, so that the divided sub-regions match the real spatial distribution more closely, but also quickly obtains new mining results when the prevalence threshold changes.