In the private information retrieval (PIR) problem, a user wishes to retrieve, as efficiently as possible, one out of K messages from N non-communicating databases (each holds all K messages) while ...revealing nothing about the identity of the desired message index to any individual database. The information theoretic capacity of PIR is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. For K messages and N databases, we show that the PIR capacity is (1+1/N+1/N 2 +· · ·+1/N K-1 ) -1 . A remarkable feature of the capacity achieving scheme is that if we eliminate any subset of messages (by setting the message symbols to zero), the resulting scheme also achieves the PIR capacity for the remaining subset of messages.
We consider the problem of private information retrieval (PIR) over a distributed storage system. The storage system consists of N non-colluding databases, each storing an MDS-coded version of M ...messages. In the PIR problem, the user wishes to retrieve one of the available messages without revealing the message identity to any individual database. We derive the information-theoretic capacity of this problem, which is defined as the maximum number of bits of the desired message that can be privately retrieved per one bit of downloaded information. We show that the PIR capacity in this case is C = (1 + K/N + K2/N2 + ⋯ + KM-1/NM-1)-1 = (1 + Rc + Rc2 + ⋯ + RcM-1)-1 = (1 - Rc)/(1 - RcM), where Rc is the rate of the (N, K) MDS code used. The capacity is a function of the code rate and the number of messages only regardless of the explicit structure of the storage code. The result implies a fundamental tradeoff between the optimal retrieval cost and the storage cost when the storage code is restricted to the class of MDS codes. The result generalizes the achievability and converse results for the classical PIR with replicated databases to the case of MDS-coded databases.
Private Information Retrieval With Side Information Kadhe, Swanand; Garcia, Brenden; Heidarzadeh, Anoosheh ...
IEEE transactions on information theory,
2020-April, 2020-4-00, Letnik:
66, Številka:
4
Journal Article
Recenzirano
Odprti dostop
We study the problem of Private Information Retrieval (PIR) in the presence of prior side information. The problem setup includes a database of <inline-formula> <tex-math notation="LaTeX">K ...</tex-math></inline-formula> independent messages possibly replicated on several servers, and a user that needs to retrieve one of these messages. In addition, the user has some prior side information in the form of a subset of <inline-formula> <tex-math notation="LaTeX">M </tex-math></inline-formula> messages, not containing the desired message and unknown to the servers. This problem is motivated by practical settings in which the user can obtain side information opportunistically from other users or has previously downloaded some messages using classical PIR schemes. The objective of the user is to retrieve the required message with downloading minimum amount of data from the servers while achieving information-theoretic privacy in one of the following two scenarios: (i) the user wants to protect jointly the identities of the demand and the side information; (ii) the user wants to protect only the identity of the demand, but not necessarily the side information. To highlight the role of side information, we focus first on the case of a single server (single database). In the first scenario, we prove that the minimum download cost is <inline-formula> <tex-math notation="LaTeX">K-M </tex-math></inline-formula> messages, and in the second scenario it is <inline-formula> <tex-math notation="LaTeX">\lceil K/(M+1)\rceil </tex-math></inline-formula> messages, which should be compared to <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula> messages-the minimum download cost in the case of no side information. Then, we extend some of our results to the case of the database replicated on multiple servers. Our proof techniques relate PIR with side information to the index coding problem. We leverage this connection to prove converse results, as well as to design achievability schemes.
Private information retrieval (PIR) is the problem of retrieving as efficiently as possible, one out of <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula> messages from ...<inline-formula> <tex-math notation="LaTeX">N </tex-math></inline-formula> non-communicating replicated databases (each holds all <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula> messages) while keeping the identity of the desired message index a secret from each individual database. The information theoretic capacity of PIR (equivalently, the reciprocal of minimum download cost) is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. <inline-formula> <tex-math notation="LaTeX">T </tex-math></inline-formula>-private PIR is a generalization of PIR to include the requirement that even if any <inline-formula> <tex-math notation="LaTeX">T </tex-math></inline-formula> of the <inline-formula> <tex-math notation="LaTeX">N </tex-math></inline-formula> databases collude, the identity of the retrieved message remains completely unknown to them. Robust PIR is another generalization that refers to the scenario where we have <inline-formula> <tex-math notation="LaTeX">M \geq N </tex-math></inline-formula> databases, out of which any <inline-formula> <tex-math notation="LaTeX">M - N </tex-math></inline-formula> may fail to respond. For <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula> messages and <inline-formula> <tex-math notation="LaTeX">M\geq N </tex-math></inline-formula> databases out of which at least some <inline-formula> <tex-math notation="LaTeX">N </tex-math></inline-formula> must respond, we show that the capacity of <inline-formula> <tex-math notation="LaTeX">T </tex-math></inline-formula>-private and Robust PIR is <inline-formula> <tex-math notation="LaTeX">(1+T/N+T^{2}/N^{2}+\cdots +T^{K-1}/N^{K-1})^{-1} </tex-math></inline-formula>. The result includes as special cases the capacity of PIR without robustness (<inline-formula> <tex-math notation="LaTeX">M=N </tex-math></inline-formula>) or <inline-formula> <tex-math notation="LaTeX">T </tex-math></inline-formula>-privacy constraints (<inline-formula> <tex-math notation="LaTeX">T=1 </tex-math></inline-formula>).
Web-Scale Responsive Visual Search at Bing Hu, Houdong; Wang, Yan; Yang, Linjun ...
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining,
07/2018
Conference Proceeding
Odprti dostop
In this paper, we introduce a web-scale general visual search system deployed in Microsoft Bing. The system accommodates tens of billions of images in the index, with thousands of features for each ...image, and can respond in less than 200 ms. In order to overcome the challenges in relevance, latency, and scalability in such large scale of data, we employ a cascaded learning-to-rank framework based on various latest deep learning visual features, and deploy in a distributed heterogeneous computing platform. Quantitative and qualitative experiments show that our system is able to support various applications on Bing website and apps.
BLINKS He, Hao; Wang, Haixun; Yang, Jun ...
International Conference on Management of Data: Proceedings of the 2007 ACM SIGMOD international conference on Management of data; 11-14 June 2007,
06/2007
Conference Proceeding
Query processing over graph-structured data is enjoying a growing number of applications. A top-k keyword search query on a graph finds the top k answers according to some ranking criteria, where ...each answer is a substructure of the graph containing all query keywords. Current techniques for supporting such queries on general graphs suffer from several drawbacks, e.g., poor worst-case performance, not taking full advantage of indexes, and high memory requirements. To address these problems, we propose BLINKS, a bi-level indexing and query processing scheme for top-k keyword search on graphs. BLINKS follows a search strategy with provable performance bounds, while additionally exploiting a bi-level index for pruning and accelerating the search. To reduce the index space, BLINKS partitions a data graph into blocks: The bi-level index stores summary information at the block level to initiate and guide search among blocks, and more detailed information for each block to accelerate search within blocks. Our experiments show that BLINKS offers orders-of-magnitude performance improvement over existing approaches.
Turboiso Han, Wook-Shin; Lee, Jinsoo; Lee, Jeong-Hoon
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data,
06/2013
Conference Proceeding
Given a query graph q and a data graph g, the subgraph isomorphism search finds all occurrences of q in g and is considered one of the most fundamental query types for many real applications. While ...this problem belongs to NP-hard, many algorithms have been proposed to solve it in a reasonable time for real datasets. However, a recent study has shown, through an extensive benchmark with various real datasets, that all existing algorithms have serious problems in their matching order selection. Furthermore, all algorithms blindly permutate all possible mappings for query vertices, often leading to useless computations. In this paper, we present an efficient and robust subgraph search solution, called TurboISO, which is turbo-charged with two novel concepts, candidate region exploration and the combine and permute strategy (in short, Comb/Perm). The candidate region exploration identifies on-the-fly candidate subgraphs (i.e, candidate regions), which contain embeddings, and computes a robust matching order for each candidate region explored. The Comb/Perm strategy exploits the novel concept of the neighborhood equivalence class (NEC). Each query vertex in the same NEC has identically matching data vertices. During subgraph isomorphism search, Comb/Perm generates only combinations for each NEC instead of permutating all possible enumerations. Thus, if a chosen combination is determined to not contribute to a complete solution, all possible permutations for that combination will be safely pruned. Extensive experiments with many real datasets show that TurboISO consistently and significantly outperforms all competitors by up to several orders of magnitude.
Deep Learning for Audio Signal Processing Purwins, Hendrik; Li, Bo; Virtanen, Tuomas ...
IEEE journal of selected topics in signal processing,
05/2019, Letnik:
13, Številka:
2
Journal Article
Recenzirano
Odprti dostop
Given the recent surge in developments of deep learning, this paper provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental ...sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
Query-based moment retrieval aims to localize the most relevant moment in an untrimmed video according to the given natural language query. Existing works often only focus on one aspect of this ...emerging task, such as the query representation learning, video context modeling or multi-modal fusion, thus fail to develop a comprehensive system for further performance improvement. In this paper, we introduce a novel Cross-Modal Interaction Network (CMIN) to consider multiple crucial factors for this challenging task, including (1) the syntactic structure of natural language queries; (2) long-range semantic dependencies in video context and (3) the sufficient cross-modal interaction. Specifically, we devise a syntactic GCN to leverage the syntactic structure of queries for fine-grained representation learning, propose a multi-head self-attention to capture long-range semantic dependencies from video context, and next employ a multi-stage cross-modal interaction to explore the potential relations of video and query contents. The extensive experiments demonstrate the effectiveness of our proposed method.
PSGAN Lu, Shuqi; Dou, Zhicheng; Jun, Xu ...
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval,
07/2019
Conference Proceeding
Personalized search aims to adapt document ranking to user's personal interests. Traditionally, this is done by extracting click and topical features from historical data in order to construct a user ...profile. In recent years, deep learning has been successfully used in personalized search due to its ability of automatic feature learning. However, the small amount of noisy personal data poses challenges to deep learning models to learn the personalized classification boundary between relevant and irrelevant results. In this paper, we propose PSGAN, a Generative Adversarial Network (GAN) framework for personalized search. By means of adversarial training, we enforce the model to pay more attention to training data that are difficult to distinguish. We use the discriminator to evaluate personalized relevance of documents and use the generator to learn the distribution of relevant documents. Two alternative ways to construct the generator in the framework are tested: based on the current query or based on a set of generated queries. Experiments on data from a commercial search engine show that our models can yield significant improvements over state-of-the-art models.