Geo-Social Influence Spanning Maximization. Li, Jianxin; Sellis, Timos; Culpepper, J. Shane ...
IEEE Transactions on Knowledge and Data Engineering,
08/2017, Volume 29, Issue 8
Journal Article
Peer reviewed
Open access
Influence maximization is a recent but well-studied problem that helps identify a small set of users who are most likely to "influence" the maximum number of users in a social network. The problem has attracted a lot of attention because it provides a way to improve marketing, branding, and product adoption. However, existing studies rarely consider the physical locations of users, even though location is an important factor in targeted marketing. In this paper, we propose and investigate the problem of influence maximization in location-aware social networks, or, more generally, Geo-social Influence Spanning Maximization. Given a query q composed of a region R, a regional acceptance rate p, and an integer k as the seed selection budget, our aim is to find the maximum geographic spanning regions (MGSR). We refer to this as the MGSR problem. Our approach differs from previous work in that we focus on identifying the maximum spanning geographical regions within region R, rather than just the number of activated users in the network, as in the traditional influence maximization problem [14]. Our approach can be used effectively for online marketing campaigns that depend on the physical location of social users. To address the MGSR problem, we first prove that it is NP-hard. Next, we present a greedy algorithm with a (1 − 1/e) approximation ratio and further improve its efficiency by developing an upper-bound pruning approach. Then, we propose the OIR*-Tree index, a hybrid index that combines ordered influential node lists with an R*-tree. We show that our index-based approach is significantly more efficient than both the greedy algorithm and the upper-bound pruning algorithm, especially when k is large. Finally, we evaluate the performance of all the proposed approaches on three real datasets.
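The (1 − 1/e) guarantee is the classic bound for greedy maximization of a monotone submodular coverage objective. The sketch below illustrates that greedy step under loudly stated assumptions: each candidate seed's influence spread has already been reduced to a set of region ids inside R (the hypothetical `influence` map), and the diffusion model and the acceptance rate p are ignored. It is not the paper's MGSR algorithm.

```python
from typing import Dict, List, Set


def greedy_region_cover(influence: Dict[str, Set[int]], k: int) -> List[str]:
    """Pick up to k seeds that maximize the number of distinct regions covered.

    `influence` is a hypothetical precomputed map: seed id -> region ids that
    the seed's influence spread reaches inside the query region R.
    """
    covered: Set[int] = set()
    seeds: List[str] = []
    for _ in range(k):
        best, best_gain = None, 0
        for user, regions in influence.items():
            if user in seeds:
                continue
            gain = len(regions - covered)  # marginal number of newly spanned regions
            if gain > best_gain:
                best, best_gain = user, gain
        if best is None:  # no remaining seed spans a new region
            break
        seeds.append(best)
        covered |= influence[best]
    return seeds


influence = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
print(greedy_region_cover(influence, 2))  # ['c', 'a']
```

Each round rescans every candidate, which is what upper-bound pruning and index structures aim to avoid.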
The size of textual data continues to grow, along with the need for timely and cost-effective analysis, while growth in computation power cannot keep pace with the growth of data. The delays incurred when processing huge volumes of text can negatively impact user activity and insight. This calls for a paradigm shift from blocking to progressive processing. In this paper, we propose a sample-based progressive processing model that focuses on term frequency calculation over text. The model is built on an incremental execution engine and computes a series of approximate results for a single query progressively, providing a smooth trade-off between accuracy and latency. As part of the model, we propose a new variant of the bootstrap technique to quantify result error progressively. We implemented this method in our system, Parrot, on top of Apache Spark and used real-world data to test its performance. Experiments show that our method is 2.4×–19.7× faster at obtaining a result within 1% error, while the confidence interval reliably covers the exact results.
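To make the error-quantification step concrete, here is a minimal sketch of the standard percentile bootstrap applied to a term count estimated from a uniform document sample. It is not Parrot's bootstrap variant; the function names and the per-document count list are hypothetical. In a progressive setting one would rerun it as the sample grows and watch the interval narrow.

```python
import random
from typing import List, Tuple


def estimate_term_count(sample_counts: List[int], corpus_size: int) -> float:
    """Scale the term's count in a uniform document sample up to the whole corpus."""
    return corpus_size * sum(sample_counts) / len(sample_counts)


def bootstrap_ci(sample_counts: List[int], corpus_size: int, rounds: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for the scaled term count."""
    rng = random.Random(seed)
    n = len(sample_counts)
    estimates = sorted(
        # Resample the sampled documents with replacement and re-estimate.
        estimate_term_count([sample_counts[rng.randrange(n)] for _ in range(n)], corpus_size)
        for _ in range(rounds)
    )
    lo = estimates[int(alpha / 2 * rounds)]
    hi = estimates[int((1 - alpha / 2) * rounds) - 1]
    return lo, hi


counts = [0, 2, 1, 3, 0, 1, 2, 1]  # hypothetical per-document counts of one term
print(estimate_term_count(counts, 1000))  # 1250.0
print(bootstrap_ci(counts, 1000))
```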
Data exploration tasks are usually quite time-consuming. Analysts who want to find interesting patterns or verify their hypotheses may prefer a lower response time while tolerating a bounded error. Approximate query processing (AQP) is a convincing way to achieve this goal, leveraging pre-computed samples to speed up the process. Existing sampling-based AQP systems usually apply a single sampling strategy to the whole dataset. However, during data exploration, different potential interests may be distributed across different parts of the dataset. To explore these interests, the queries users submit are highly diverse across sub-datasets. Therefore, a single sampling strategy cannot serve all queries that access different sub-datasets well. In this paper, we propose POLYTOPE, a flexible and effective sampling system designed specifically for data exploration tasks. It is built on three key ideas: (1) split the dataset into sampling blocks according to user query patterns, (2) individually generate a set of optimized samples for each sampling block, and (3) automatically select an optimal sample at run time. We use both user query patterns and the underlying data distribution to realize these ideas. We have implemented our system on the Spark platform, and comprehensive experimental results show that it improves accuracy by up to 46% under the same time constraint for data exploration tasks.
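Ideas (1) and (2) resemble per-block (stratified) sampling. The sketch below assumes the blocks have already been formed and that each block has a hypothetical sampling rate; it draws one uniform sample per block and answers a SUM over the queried blocks by scaling each block's sample mean. Run-time sample selection, idea (3), is not shown, and none of this is POLYTOPE's actual code.

```python
import random
from typing import Dict, List, Tuple

Samples = Dict[str, Tuple[int, List[float]]]  # block -> (block size, sampled values)


def build_block_samples(blocks: Dict[str, List[float]], rates: Dict[str, float],
                        seed: int = 0) -> Samples:
    """Draw an independent uniform sample from each (non-empty) block at its own rate."""
    rng = random.Random(seed)
    samples: Samples = {}
    for name, rows in blocks.items():
        m = min(len(rows), max(1, round(len(rows) * rates[name])))  # 1..len(rows) rows
        samples[name] = (len(rows), rng.sample(rows, m))
    return samples


def approx_sum(samples: Samples, query_blocks: List[str]) -> float:
    """Estimate SUM over the queried blocks by scaling each block's sample mean."""
    total = 0.0
    for name in query_blocks:
        size, sample = samples[name]
        total += size * sum(sample) / len(sample)
    return total


blocks = {"hot": [1.0, 2.0, 3.0, 4.0], "cold": [10.0] * 100}  # hypothetical blocks
samples = build_block_samples(blocks, {"hot": 1.0, "cold": 0.1})
print(approx_sum(samples, ["hot", "cold"]))  # 1010.0
```

Giving frequently queried blocks a higher rate spends the sample budget where queries actually land.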
In this work, a novel ternary nanocomposite, PEI/RuSi-MWCNTs, was designed and synthesized for the first time, and with it an ultrasensitive, self-enhanced electrochemiluminescent (ECL) aptasensor was developed for the detection of profenofos residues in vegetables. The self-enhanced PEI-Ru(II) complex enhanced the emission and stability of the ECL signal, and the multi-walled carbon nanotubes (MWCNTs) acted as an excellent carrier and signal amplifier. The PEI/RuSi-MWCNTs were characterized by scanning electron microscopy (SEM), transmission electron microscopy (TEM), and energy-dispersive spectroscopy (EDS). The incorporation of gold nanoparticles (AuNPs) improved the performance of the sensor and provided a platform for immobilizing the aptamer. Experimental results showed that the presence of profenofos significantly suppressed the ECL intensity of the sensor. The detection range of the aptasensor was 1 × 10⁻² to 1 × 10³ ng/mL. Under optimal conditions, the limit of detection (LOD) for profenofos was 1.482 × 10⁻³ ng/mL. The sensor showed excellent stability, reproducibility, and specificity, and recoveries in real sample tests ranged from 92.29% to 106.47%.
• Ternary self-enhanced PEI/RuSi-MWCNTs nanocomposites were prepared.
• Sensors based on PEI/RuSi-MWCNTs were prepared for the detection of profenofos.
• The sensor could detect profenofos in the range of 1 × 10⁻² to 1 × 10³ ng/mL.
• Profenofos levels in three vegetables were detected using the sensors.
Geo-Social Influence Spanning Maximization. Li, Jianxin; Sellis, Timos; Culpepper, J. Shane ...
2018 IEEE 34th International Conference on Data Engineering (ICDE)
Conference Proceeding
Open access
The problem of influence maximization has attracted a lot of attention because it provides a way to improve marketing, branding, and product adoption. However, existing studies rarely consider the physical locations of social users, although location is an important factor in targeted marketing. In this paper, we investigate the problem of influence spanning maximization in location-aware social networks. Our goal is to identify the maximum spanning geographical regions within a query region, which differs markedly from existing methods that focus on the number of activated users in the query region. Since the problem is NP-hard, we develop a greedy algorithm with a (1 − 1/e) approximation ratio and further improve its efficiency with an upper-bound-based approach. Then, we propose the OIR index, which combines ordered influential node lists with an R*-tree, and design an index-based solution. The efficiency and effectiveness of the proposed solutions and index have been verified on three real datasets.
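One well-known way to use upper bounds with a submodular objective is lazy (CELF-style) greedy evaluation: a candidate's last computed marginal gain can only shrink, so it is a valid upper bound, and a candidate is re-evaluated only when it reaches the top of a priority queue. The sketch below applies this to region coverage under the same assumption as before (a hypothetical precomputed seed-to-regions map); it is a generic technique, not the paper's specific bound.

```python
import heapq
from typing import Dict, List, Set


def lazy_greedy_region_cover(influence: Dict[str, Set[int]], k: int) -> List[str]:
    """CELF-style greedy: stale marginal gains serve as upper bounds."""
    covered: Set[int] = set()
    seeds: List[str] = []
    # Max-heap (via negation) of (upper bound on marginal gain, seed id).
    heap = [(-len(regions), user) for user, regions in influence.items()]
    heapq.heapify(heap)
    while heap and len(seeds) < k:
        _, user = heapq.heappop(heap)
        gain = len(influence[user] - covered)  # refresh the stale bound
        if gain == 0:
            continue  # gains never grow again, so this seed can be dropped
        if not heap or gain >= -heap[0][0]:
            # The fresh gain beats every other candidate's upper bound.
            seeds.append(user)
            covered |= influence[user]
        else:
            heapq.heappush(heap, (-gain, user))  # reinsert with the tighter bound
    return seeds


influence = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
print(lazy_greedy_region_cover(influence, 2))  # ['c', 'a']
```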
Nowadays, many applications, such as the Internet of Things and the Industrial Internet, continuously collect data points from sensors, forming long time series. Finding the correlation between time series is a fundamental task in many time series mining problems. However, directly measuring the global correlation between two long time series is meaningless because of concept shift or noisy data. To tackle this challenge, in this paper, we formulate the novel problem of finding the maximal significant linear representation. The main idea is that, given two time series and a quality constraint, we want to find the longest gapped time interval on which one time series can be linearly represented by the other within the quality constraint. We develop both exact and approximate algorithms (the latter with approximation quality guarantees), which exploit a novel representation of the linear correlation between time series on subsequences and transform the problem into a geometric search. Moreover, we propose an online approach that finds this correlation incrementally in each sliding window for streaming data. We present a systematic empirical study to verify the efficiency and effectiveness of our approaches.
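As a baseline for what "linearly represented within a quality constraint" means, the sketch below fits y ≈ a·x + b by least squares on a candidate interval and accepts it if the maximum absolute residual is at most eps (an assumed form of the quality constraint). It brute-forces contiguous intervals only, whereas the paper handles gapped intervals with a geometric search; the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y ≈ a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # x is constant: the best constant predictor is the mean of y
        return 0.0, my
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return a, my - a * mx


def representable(x: List[float], y: List[float], eps: float) -> bool:
    """Assumed quality constraint: max absolute residual of the LS fit <= eps."""
    a, b = fit_line(x, y)
    return max(abs(yi - (a * xi + b)) for xi, yi in zip(x, y)) <= eps


def longest_linear_interval(x: List[float], y: List[float], eps: float,
                            min_len: int = 2) -> Optional[Tuple[int, int]]:
    """Brute force over contiguous [i, j); returns the longest satisfying interval."""
    best: Optional[Tuple[int, int]] = None
    for i in range(len(x)):
        for j in range(len(x), i + min_len - 1, -1):  # longest candidates first
            if best is not None and j - i <= best[1] - best[0]:
                break  # nothing starting at i can beat the current best
            if representable(x[i:j], y[i:j], eps):
                best = (i, j)
                break
    return best


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 0.0, 0.0, 0.0]  # y = 2x + 1 on the first five points
print(longest_linear_interval(x, y, 1e-9))  # (0, 5)
```

The least-squares fit minimizes squared error, so checking its maximum residual is a sufficient but not necessary test; a minimax (Chebyshev) fit would be exact for this constraint.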
Learning-enhanced data structures have inspired the development of learned range filters, which achieve a significantly lower false positive rate (FPR) than traditional non-learned range filters. Their core idea is to employ piecewise linear functions that uniformly and sequentially map the entire key space into a bitmap. Nonetheless, such uniform mapping can be space-inefficient, which hurts the FPR. This paper introduces Oasis, a novel learned range filter that divides the key space into disjoint intervals by explicitly excluding large empty ranges, and optimally maps the unpruned intervals into a compressed bitmap. The optimality of the Oasis configuration is guaranteed by a careful theoretical analysis. To enhance the versatility of Oasis, we further propose Oasis+, which integrates the design spaces of both learned and non-learned filters and delivers robust performance across a wide range of workloads. We evaluate both Oasis and Oasis+ integrated into the key-value store RocksDB, using a diverse set of real-world and synthetic datasets and workloads. In RocksDB, Oasis and Oasis+ improve performance by up to 1.4× and 6.2× compared with state-of-the-art learned and non-learned range filters.
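The interval-pruning idea can be illustrated with a deliberately simplified, non-learned toy: gaps between sorted keys wider than a threshold are dropped entirely, and each remaining interval is covered by fixed-width bitmap cells. Queries that fall only in pruned gaps get a definite "no". The class name and the `gap` and `cell` parameters are hypothetical; Oasis itself chooses intervals and bit allocation optimally and compresses the bitmap, which this sketch does not attempt.

```python
from bisect import bisect_right
from typing import List, Tuple


class GapPrunedRangeFilter:
    """Toy range filter: prune large empty gaps, then bitmap each kept interval."""

    def __init__(self, keys: List[int], gap: int, cell: int):
        ks = sorted(set(keys))
        self.cell = cell
        self.intervals: List[Tuple[int, int]] = []
        start = prev = ks[0]
        for k in ks[1:]:
            if k - prev > gap:  # empty range wider than `gap`: exclude it
                self.intervals.append((start, prev))
                start = k
            prev = k
        self.intervals.append((start, prev))
        self.starts = [lo for lo, _ in self.intervals]
        # One bit per `cell`-wide slot inside each kept interval.
        self.bits = [[False] * ((hi - lo) // cell + 1) for lo, hi in self.intervals]
        for k in ks:
            idx = bisect_right(self.starts, k) - 1
            self.bits[idx][(k - self.intervals[idx][0]) // cell] = True

    def may_contain(self, a: int, b: int) -> bool:
        """False means [a, b] surely holds no key; True may be a false positive."""
        i = max(bisect_right(self.starts, a) - 1, 0)
        while i < len(self.intervals) and self.intervals[i][0] <= b:
            lo, hi = self.intervals[i]
            if hi >= a:
                first = (max(a, lo) - lo) // self.cell
                last = (min(b, hi) - lo) // self.cell
                if any(self.bits[i][first:last + 1]):
                    return True
            i += 1
        return False


keys = [1, 2, 3, 1000, 1001, 1005]  # hypothetical key set
f = GapPrunedRangeFilter(keys, gap=50, cell=2)
print(f.may_contain(10, 900))      # False: the query lies in a pruned gap
print(f.may_contain(1004, 1004))   # True: a false positive sharing a cell with 1005
```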
In interactive data exploration, approximate query processing (AQP) can be used to return query results quickly at the cost of accuracy. For online AQP, the sampler can be treated as an operator in the query plan. During query optimization for AQP, heuristic rules are usually used to guide sampler push-down. However, because data distributions are complex and changing, heuristic rule-based optimization methods cannot meet users' query accuracy requirements. In this article, we propose a learning-based query optimization method for online AQP. We first introduce the concept of weak equivalence and propose a series of push-down rules to guide sampler push-down during query optimization. Then, to enable more queries to meet users' accuracy requirements, we propose a deep learning model that further optimizes the query plan. By applying this model at each sampler push-down step, we aim to avoid the negative effect of inappropriate push-down on query accuracy, especially when the underlying and intermediate data distributions are inconsistent. Extensive experiments show that the proposed method outperforms the state-of-the-art online sampling-based AQP method by 1.2×–7.9× in query accuracy.
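The simplest push-down case is a row-level Bernoulli sampler and a selection: if each row's coin flip depends only on the row itself, sampling before or after the filter keeps exactly the same rows. The sketch below makes the coin deterministic by hashing a hypothetical row id, checks that both plan orders agree, and scales the sampled count by 1/p. Joins and aggregations, where push-down can hurt accuracy and which the article targets, are outside this toy.

```python
import hashlib
from typing import Callable, List, Tuple

Row = Tuple[int, float]  # hypothetical schema: (row_id, value)


def keep(row_id: int, p: float, seed: int = 7) -> bool:
    """Deterministic Bernoulli(p) coin per row, independent of plan order."""
    digest = hashlib.sha256(f"{seed}:{row_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < p


def sample(rows: List[Row], p: float) -> List[Row]:
    return [r for r in rows if keep(r[0], p)]


def select(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    return [r for r in rows if pred(r)]


def approx_count(sampled: List[Row], p: float) -> float:
    """Horvitz-Thompson style estimate: each kept row stands for 1/p rows."""
    return len(sampled) / p


rows = [(i, float(i % 10)) for i in range(10_000)]
pred = lambda r: r[1] >= 5  # true for exactly 5000 rows
pushed_down = select(sample(rows, 0.1), pred)   # sampler below the filter
original = sample(select(rows, pred), 0.1)      # sampler above the filter
print(pushed_down == original)  # True
print(approx_count(pushed_down, 0.1))
```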
Today, the demands of managing and analyzing rapidly growing collections of time series are becoming more challenging. Subsequence matching, a core subroutine in time series analysis, has drawn significant research attention. Most previous work focuses only on matching subsequences of the same length as the query. However, many scenarios require efficient variable-length subsequence matching. In this paper, we propose a new representation, Uniform Piecewise Aggregate Approximation (UPAA), which can align features of variable-length time series while preserving the lower-bounding property. Based on UPAA, we present a compact index structure that groups adjacent subsequences and similar subsequences, respectively. Moreover, we propose an index pruning algorithm and a data filtering strategy to efficiently support variable-length subsequence matching without false dismissals. Experiments on both real and synthetic datasets demonstrate that our approach achieves considerably better efficiency, scalability, and effectiveness than existing approaches.
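The lower-bounding property that UPAA preserves is easiest to see in plain, fixed-length PAA: averaging each of w equal segments and scaling the Euclidean distance between the averages by √(n/w) never exceeds the true distance, so pruning with it causes no false dismissals. The sketch below shows standard PAA only, not UPAA's alignment for variable-length series, and assumes n is divisible by w.

```python
import math
from typing import List


def paa(series: List[float], w: int) -> List[float]:
    """Piecewise Aggregate Approximation: mean of each of w equal-length segments."""
    n = len(series)
    assert n % w == 0, "this sketch assumes len(series) is divisible by w"
    seg = n // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(pa: List[float], pb: List[float], n: int) -> float:
    """sqrt(n/w) * ||PAA(a) - PAA(b)|| <= ||a - b|| (no false dismissals)."""
    return math.sqrt(n / len(pa)) * euclidean(pa, pb)


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0, 2.0, 2.0, 2.0, 9.0, 9.0, 9.0, 9.0]
print(paa(a, 4))                                 # [1.5, 3.5, 5.5, 7.5]
print(paa_lower_bound(paa(a, 4), paa(b, 4), 8))  # sqrt(34), about 5.83
print(euclidean(a, b))                           # 6.0
```

The bound holds because, within each segment, the sum of squared differences is at least the segment length times the squared difference of the segment means.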
For interactive data exploration, approximate query processing (AQP) is a useful approach that typically uses samples to respond to queries quickly at the expense of some accuracy. Existing AQP systems often materialize samples in memory and reuse them to speed up query processing. Tuning the samples to the workload is one of the key problems in AQP. However, because data exploration workloads are too complex to predict accurately, existing sample tuning approaches do not adapt well to changing workloads. To address this problem, this paper proposes RL-STuner, a deep reinforcement learning-based sample tuner. When tuning samples, RL-STuner considers workload changes from a global perspective and uses a Deep Q-learning Network (DQN) model to select the sample set with the maximum utility for the current workload. In addition, this paper proposes a set of optimization mechanisms to reduce the sample tuning cost. Experimental results on both real-world and synthetic datasets show that RL-STuner outperforms existing sample tuning approaches, achieving 1.6×–5.2× improvements in query accuracy at a low tuning cost.