On Group Nearest Group Query Processing Ke Deng; Sadiq, S.; Xiaofang Zhou ...
IEEE Transactions on Knowledge and Data Engineering, 02/2012, Volume 24, Issue 2
Journal Article
Peer reviewed
Given a data point set D, a query point set Q, and an integer k, the Group Nearest Group (GNG) query finds a subset ω (|ω| ≤ k) of points from D such that the total distance from the points in Q to their nearest points in ω is no greater than that for any other subset ω' (|ω'| ≤ k) of points in D. The GNG query is a partition-based clustering problem that arises in many real applications and is NP-hard. In this paper, the Exhaustive Hierarchical Combination (EHC) algorithm and the Subset Hierarchical Refinement (SHR) algorithm are developed for GNG query processing. While EHC is capable of providing the optimal solution for k = 2, SHR is an efficient approximate approach that combines database techniques with a local search heuristic. Our approaches focus on minimizing the access and evaluation of subsets of cardinality k in D, since the number of such subsets is exponentially greater than |D|. To do so, hierarchical blocks of data points at a high level are used to find an intermediate solution, which is then refined by following the guided search direction at a low level so as to prune irrelevant subsets. Comprehensive experiments on both real and synthetic data sets demonstrate the superiority of SHR in terms of efficiency and quality.
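The GNG objective defined in the abstract above can be made concrete with a small brute-force sketch. This is not the EHC or SHR algorithm from the paper; it only evaluates the stated cost function over every candidate subset, which is feasible for tiny inputs alone. The function names `gng_cost` and `gng_brute_force` are hypothetical, and Euclidean distance is an assumption.

```python
from itertools import combinations
from math import dist  # Euclidean distance (assumed metric), Python 3.8+


def gng_cost(subset, queries):
    """Total distance from each query point to its nearest point in subset."""
    return sum(min(dist(q, p) for p in subset) for q in queries)


def gng_brute_force(data, queries, k):
    """Exhaustive search over subsets of D (illustration only, not EHC/SHR).

    Adding a point to a subset can never increase the cost, so checking
    subsets of size min(k, |D|) covers every subset with at most k points.
    """
    size = min(k, len(data))
    return min(combinations(data, size), key=lambda s: gng_cost(s, queries))


data = [(0, 0), (10, 0), (5, 5)]
queries = [(0, 1), (10, 1)]
best = gng_brute_force(data, queries, k=2)
print(best, gng_cost(best, queries))
```

The number of subsets examined grows combinatorially with |D| and k, which is the cost the abstract says the proposed algorithms aim to avoid.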
As travel takes on a more significant part in our lives, route recommendation services have become a big business and attract many major players in the IT industry. Given a user-specified origin and destination, a route recommendation service aims to provide users with the routes offering the best travelling experience according to criteria such as travelling distance, travelling time, and traffic conditions. However, previous research shows that even the routes recommended by major service providers can deviate significantly from the routes travelled by experienced drivers. This means that travellers' preferences in route selection are influenced by many latent and dynamic factors that are hard to model exactly with pre-defined formulas. In this work we approach this challenging problem from a very different perspective: leveraging the crowd's knowledge to improve recommendation quality. In this light, CrowdPlanner, a novel crowd-based route recommendation system, has been developed. It requests human workers to evaluate candidate routes recommended by different sources and methods, and determines the best route based on their feedback. In this paper, we focus on two important issues that significantly affect system performance: (1) how to efficiently generate tasks that are simple to answer but carry sufficient information to derive user-preferred routes; and (2) how to quickly identify a set of appropriate domain experts to answer the questions promptly and accurately. Specifically, the task generation component in our system generates a series of informative and concise questions with optimized ordering for a given candidate route set, so that workers find them comfortable and easy to answer. In addition, the worker selection component uses a set of selection criteria and an efficient algorithm to find the most eligible workers to answer the questions with high accuracy. A prototype system has been deployed to many voluntary mobile clients, and extensive tests on real-scenario queries have shown the superiority of CrowdPlanner in comparison with the results given by map services and popular route mining algorithms.
Compared with traditional Fast Fourier Transform (FFT) algorithms, FFT pruning is more computationally efficient in those cases where some of the input values are zero and/or some of the output components are not needed. In this letter, a novel pruning scheme is developed for mixed-radix and high-radix FFT pruning. The proposed approach is applicable over a wide range of FFT lengths and input/output pruning patterns. In addition, it can effectively exploit the benefits of high-radix FFT algorithms that have lower computational complexity.
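The saving that pruning targets can be seen even on a direct DFT. The sketch below is not the mixed-radix scheme proposed in the letter and is not an FFT at all; it only shows that terms with zero inputs and rows for unwanted outputs can be skipped. The name `pruned_dft` and its parameters `n_nonzero` and `wanted` are hypothetical, and the input is assumed to be zero from index `n_nonzero` onward.

```python
import cmath


def pruned_dft(x, n_nonzero, wanted):
    """Direct DFT restricted to nonzero inputs and requested outputs.

    Assumes x[n] == 0 for all n >= n_nonzero (input pruning) and computes
    X[k] only for k in wanted (output pruning). Cost is
    len(wanted) * n_nonzero multiply-adds instead of N * N for a full
    direct DFT.
    """
    N = len(x)
    return {
        k: sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
               for n in range(n_nonzero))
        for k in wanted
    }


# Length-8 signal with only two nonzero samples; two output bins requested.
x = [1, 2, 0, 0, 0, 0, 0, 0]
print(pruned_dft(x, n_nonzero=2, wanted=[0, 4]))
```

A pruned FFT applies the same two observations inside the butterfly structure, removing operations whose operands are known to be zero or whose results feed only unused outputs.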
Many studies of spatiotemporal pattern discovery partition the data space into disjoint cells for effective processing. However, the discovery accuracy of space-partitioning schemes depends heavily on space granularity. Moreover, such schemes cannot describe data statistics well when the data spreads over many cells rather than one. In this study, we introduce a novel approach that takes advantage of the effectiveness of space-partitioning methods but overcomes those problems. Specifically, we uncover from an object's trajectories the frequent regions that it visits often. This process is unaffected by the space-partitioning problems. We then explain the relationships between the frequent regions and the partitioned cells using trajectory pattern models based on a hidden Markov process. Under this approach, an object's movements are still described by the partitioned cells, but its patterns are explained by the frequent regions, which are more precise. Our experiments show that the proposed method is more effective and accurate than existing space-partitioning methods.
30K proteins are a major group of nutrient storage proteins in the silkworm hemolymph. Previous studies have shown that 30K proteins are involved in anti-fungal immunity; however, the molecular mechanism involved in this immunity remains unclear.
We investigated the transcriptional expression of five 30K proteins, including BmLP1, BmLP2, BmLP3, BmLP4, and BmLP7. The five recombinant 30K proteins were expressed in an expression system, and used for binding assays with fungal cells and hemocytes.
The transcriptional expression showed that the five 30K proteins were significantly upregulated after injection of pathogen-associated molecular patterns to the fifth instar larvae, indicating the possibility of their involvement in immune response. The binding assay showed that only BmLP1 and BmLP4 can bind to both fungal cells and silkworm hemocytes. Furthermore, we found that BmLP1-coated and BmLP4-coated agarose beads promote encapsulation of hemocytes in vitro. The hemocyte encapsulation was blocked when the BmLP1-coated beads were preincubated with BmLP1 specific polyclonal antibodies.
These results demonstrate that 30K proteins are involved in the cellular immunity of silkworms by acting as pattern recognition molecules to directly recruit hemocytes to the fungal surface.
Graph Pattern Matching (GPM) plays a significant role in social network analysis and has been widely used in, for example, expert finding, social community mining, and social position detection. Given a pattern graph G_Q and a data graph G_D, a GPM algorithm finds those subgraphs G_M that match G_Q in G_D. However, existing GPM methods do not consider the multiple constraints on edges in G_Q, which commonly exist in applications such as crowdsourced travel, social-network-based e-commerce, and study group selection. In this paper, we first conceptually extend Bounded Simulation to Multi-Constrained Simulation (MCS), and propose a novel NP-Complete Multi-Constrained Graph Pattern Matching (MC-GPM) problem. Then, to address the efficiency issue in large-scale MC-GPM, we propose a new concept called Strong Social Component (SSC), consisting of participants with strong social connections. We also propose an approach to identify SSCs, together with a novel index method and a graph compression method for SSCs. Moreover, we devise a heuristic algorithm to identify MC-GPM results effectively and efficiently without decompressing graphs. An extensive empirical study on five real-world large-scale social graphs has demonstrated the effectiveness, efficiency, and scalability of our approach.
Abstract
Background
Diffusion and perfusion MRI can noninvasively define physical properties and angiogenic features of tumors, and guide individual treatment. The purpose of this study was to investigate whether the diffusion and perfusion MRI parameters of primary central nervous system lymphomas (PCNSLs) are related to tumor location.
Methods
We retrospectively reviewed the diffusion, perfusion, and conventional MRI of 68 patients with PCNSLs at different locations (group 1: cortical gray matter, group 2: white matter, group 3: deep gray matter). Relative maximum cerebral blood volume (rCBVmax) from perfusion MRI and minimum apparent diffusion coefficients (ADCmin) from DWI of each group were calculated and compared by one-way ANOVA test. In addition, we compared the mean apparent diffusion coefficients (ADCmean) in three different regions of the control group.
Results
The rCBVmax of PCNSLs yielded the lowest value in the white matter group, and the highest value in the cortical gray matter group (P < 0.001). However, the ADCmin of each subgroup was not statistically different. The ADCmean of each subgroup in the control group was not statistically different.
Conclusion
Our study confirms that the rCBVmax of PCNSLs is related to the tumor location, and provides simple but effective information for guiding the clinical practice of PCNSLs.
Through quantitative and qualitative analysis of native English speakers' narratives in the interlanguage database, the following results were obtained. In third-person narratives, the usage ratios of the three types of anaphora among English speakers at different proficiency levels differ from those of native Chinese speakers, as manifested, for instance, in overuse of nouns and pronouns, short referential distance, and scarce use of zero anaphora. In first-person narratives, pronouns are used less as Chinese proficiency improves, but they are still used more frequently than by native Chinese speakers. Zero anaphora is employed more frequently as Chinese proficiency improves, but still less than by native Chinese speakers.
Due to the prevalence of GPS-enabled devices and wireless communication technology, spatial trajectories that describe the movement history of moving objects are being generated and accumulated at an unprecedented pace. However, a raw trajectory in the form of a sequence of timestamped locations does not make much sense to humans without a semantic representation. In this work we aim to facilitate human understanding of a raw trajectory by automatically generating a short text to describe it. By formulating this task as the problem of adaptive trajectory segmentation and feature selection, we propose a partition-and-summarization framework. In the partition phase, we first define a set of features for each trajectory segment and then derive an optimal partition with the aim of making the segments within each partition as homogeneous as possible in terms of their features. In the summarization phase, for each partition we select the most interesting features by comparing against the common behaviours of historical trajectories on the same route, and generate a short text description for these features. For the empirical study, we apply our solution to a real trajectory dataset and find that the generated text can effectively reflect the important parts of a trajectory.
Online video content is surging to an unprecedented level. Massive video publishing and sharing impose heavy demands on online near-duplicate detection for many novel video applications. This paper presents an accurate and practical system for online near-duplicate subsequence detection over continuous video streams. We propose to transform a video stream into a one-dimensional video distance trajectory (VDT) that monitors the continuous changes of consecutive frames with respect to a reference point. The VDT is further segmented and represented by a sequence of compact signatures called linear smoothing functions (LSFs). LSFs of each subsequence of the incoming video stream are continuously generated and temporarily stored in a buffer for comparison with query LSFs. LSF adopts compound probability to combine three independent video factors for an effective segment similarity measure, which is then used to compute sequence similarity for near-duplicate detection. To avoid unnecessary sequence similarity computations, an efficient sequence skipping strategy is also embedded. Experimental results on detecting diverse near-duplicates of TV commercials in real video streams show the superior performance of our system in both effectiveness and efficiency over existing methods.
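The first step described in the abstract above, mapping a stream of frames to a one-dimensional distance trajectory, can be sketched as follows. This is only an illustration under assumptions: each frame is taken to be already reduced to a numeric feature vector, Euclidean distance is assumed, and the function name `video_distance_trajectory` and the choice of reference point are hypothetical. The segmentation into LSFs and the similarity measure are not reproduced here.

```python
from math import dist  # Euclidean distance (assumed metric), Python 3.8+


def video_distance_trajectory(frames, reference):
    """Map each frame feature vector to its distance from a reference point.

    The result is a one-dimensional sequence whose rises and falls track
    how consecutive frames change relative to the reference.
    """
    return [dist(frame, reference) for frame in frames]


# Toy 2-D frame features; the origin is used as a hypothetical reference.
frames = [(3.0, 4.0), (6.0, 8.0), (0.0, 0.0)]
print(video_distance_trajectory(frames, reference=(0.0, 0.0)))
```

Reducing each frame to a single number makes the later segment comparison cheap, at the price of mapping different frames at equal distance from the reference to the same value.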