Users often fail to formulate their complex information needs in a single query. As a consequence, they may need to scan multiple result pages or reformulate their queries, which may be a frustrating ...experience. Alternatively, systems can improve user satisfaction by proactively asking questions of the users to clarify their information needs. Asking clarifying questions is especially important in conversational systems since they can only return a limited number of (often only one) result(s).
In this paper, we formulate the task of asking clarifying questions in open-domain information-seeking conversational systems. To this end, we propose an offline evaluation methodology for the task and collect a dataset, called Qulac, through crowdsourcing. Our dataset is built on top of the TREC Web Track 2009-2012 data and consists of over 10K question-answer pairs for 198 TREC topics with 762 facets. Our experiments on an oracle model demonstrate that asking only one good question leads to over 170% retrieval performance improvement in terms of P@1, which clearly demonstrates the potential impact of the task. We further propose a retrieval framework consisting of three components: question retrieval, question selection, and document retrieval. In particular, our question selection model takes into account the original query and previous question-answer interactions while selecting the next question. Our model significantly outperforms competitive baselines. To foster research in this area, we have made Qulac publicly available.
In this paper, we present a new multimedia retrieval paradigm to innovate large-scale search of heterogenous multimedia data. It is able to return results of different media types from heterogeneous ...data sources, e.g., using a query image to retrieve relevant text documents or images from different data sources. This utilizes the widely available data from different sources and caters for the current users' demand of receiving a result list simultaneously containing multiple types of data to obtain a comprehensive understanding of the query's results. To enable large-scale inter-media retrieval, we propose a novel inter-media hashing (IMH) model to explore the correlations among multiple media types from different data sources and tackle the scalability issue. To this end, multimedia data from heterogeneous data sources are transformed into a common Hamming space, in which fast search can be easily implemented by XOR and bit-count operations. Furthermore, we integrate a linear regression model to learn hashing functions so that the hash codes for new data points can be efficiently generated. Experiments conducted on real-world large-scale multimedia datasets demonstrate the superiority of our proposed method compared with state-of-the-art techniques.
Private information retrieval (PIR) is the problem of retrieving, as efficiently as possible, one out of <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula> messages from ...<inline-formula> <tex-math notation="LaTeX">N </tex-math></inline-formula> non-communicating replicated databases (each holds all <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula> messages) while keeping the identity of the desired message index a secret from each individual database. Symmetric PIR (SPIR) is a generalization of PIR to include the requirement that beyond the desired message, the user learns nothing about the other <inline-formula> <tex-math notation="LaTeX">K-1 </tex-math></inline-formula> messages. The information theoretic capacity of SPIR (equivalently, the reciprocal of minimum download cost) is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. We show that the capacity of SPIR is <inline-formula> <tex-math notation="LaTeX">1-1/N </tex-math></inline-formula> regardless of the number of messages <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>, if the databases have access to common randomness (not available to the user) that is independent of the messages, in the amount that is at least <inline-formula> <tex-math notation="LaTeX">1/(N-1) </tex-math></inline-formula> bits per desired message bit. Otherwise, if the amount of common randomness is less than <inline-formula> <tex-math notation="LaTeX">1/(N-1) </tex-math></inline-formula> bits per message bit, then the capacity of SPIR is zero. Extensions to the capacity region of SPIR and the capacity of finite length SPIR are provided.
The milestone improvements brought about by deep representation learning and pre-training techniques have led to large performance gains across downstream NLP, IR and Vision tasks. Multimodal ...modeling techniques aim to leverage large high-quality visio-linguistic datasets for learning complementary information across image and text modalities. In this paper, we introduce the Wikipedia-based Image Text (WIT) Dataset to better facilitate multimodal, multilingual learning. WIT is composed of a curated set of 37.5 million entity rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its size enables WIT to be used as a pretraining dataset for multimodal models, as we show when applied to downstream tasks such as image-text retrieval. WIT has four main and unique advantages. First, WIT is the largest multimodal dataset by the number of image-text examples by 3x (at the time of writing). Second, WIT is massively multilingual (first of its kind) with coverage over 100+ languages (each of which has at least 12K examples) and provides cross-lingual texts for many images. Third, WIT represents a more diverse set of concepts and real world entities relative to what previous datasets cover. Lastly, WIT provides a very challenging real-world test set, as we empirically illustrate using an image-text retrieval task as an example. WIT Dataset is available for download and use via a Creative Commons license here: https://github.com/google-research-datasets/wit.
User behavior data, such as ratings and clicks, has been widely used to build personalizing models for recommender systems. However, many unflattering factors (e.g., popularity, ranking position, ...users’ selection) significantly affect the performance of the learned recommendation model. Most existing work on unbiased recommendation addressed these biases from sample granularity (e.g., sample reweighting, data augmentation) or from the perspective of representation learning (e.g., bias-modeling). However, these methods are usually designed for a specific bias, lacking the universal capability to handle complex situations where multiple biases co-exist. Besides, rare work frees itself from laborious and sophisticated debiasing configurations (e.g., propensity scores, imputed values, or user behavior-generating process). Towards this research gap, in this article, we propose a universal G enerative framework for B ias D isentanglement termed as GBD , constantly generating calibration perturbations for the intermediate representations during training to keep them from being affected by the bias. Specifically, a bias-identifier that tries to retrieve the bias-related information from the representations is first introduced. Subsequently, the calibration perturbations are generated to significantly deteriorate the bias-identifier’s performance, making the bias gradually disentangled from the calibrated representations. Therefore, without relying on notorious debiasing configurations, a bias-agnostic model is obtained under the guidance of the bias identifier. We further present its universality by subsuming the representative biases and their mixture under the proposed framework. Finally, extensive experiments on the real-world, synthetic, and semi-synthetic datasets have demonstrated the superiority of the proposed approach against a wide range of recommendation debiasing methods. The code is available at https://github.com/Zhidan-Wang/GBD .
The first-stage retrieval aims to retrieve a subset of candidate documents from a huge collection both effectively and efficiently. Since various matching patterns can exist between queries and ...relevant documents, previous work tries to combine multiple retrieval models to find as many relevant results as possible. The constructed ensembles, whether learned independently or jointly, do not care which component model is more suitable to an instance during training. Thus, they cannot fully exploit the capabilities of different types of retrieval models in identifying diverse relevance patterns. Motivated by this observation, in this paper, we propose a Mixture-of-Experts (MoE) model consisting of representative matching experts and a novel competitive learning mechanism to let the experts develop and enhance their expertise during training. Specifically, our MoE model shares the bottom layers to learn common semantic representations and uses differently structured upper layers to represent various types of retrieval experts. Our competitive learning mechanism has two stages: (1) a standardized learning stage to train the experts equally to develop their capabilities to conduct relevance matching; (2) a specialized learning stage where the experts compete with each other on every training instance and get rewards and updates according to their performance to enhance their expertise on certain types of samples. Experimental results on retrieval benchmark datasets show that our method significantly outperforms the state-of-the-art baselines in the in-domain and out-of-domain settings.
Intelligent assistants are increasingly being used on smart speaker devices, such as Amazon Echo, Google Home, Apple Homepod, and Harmon Kardon Invoke with Cortana. Typically, user satisfaction ...measurement relies on user interaction signals, such as clicks and scroll movements, in order to determine if a user was satisfied. However, these signals do not exist for smart speakers, which creates a challenge for user satisfaction evaluation on these devices. In this paper, we propose a new signal, user intent, as a means to measure user satisfaction. We propose to use this signal to model user satisfaction in two ways: 1) by developing intent sensitive word embeddings and then using sequences of these intent sensitive query representations to measure user satisfaction; 2) by representing a user's interactions with a smart speaker as a sequence of user intents and thus using this sequence to identify user satisfaction. Our experimental results indicate that our proposed user satisfaction models based on the intent-sensitive query representations have statistically significant improvements over several baselines in terms of common classification evaluation metrics. In particular, our proposed task satisfaction prediction model based on intent-sensitive word embeddings has a 11.81% improvement over a generative model baseline and 6.63% improvement over a user satisfaction prediction model based on Skip-gram word embeddings in terms of the F1 metric. Our findings have implications for the evaluation of Intelligent Assistant systems.
This paper studies the problem of \em answering Why-questions for graph pattern queries. Given a query Q, its answers $Q(G)$ in a graph G, and an exemplar $\E$ that describes desired answers, it aims ...to compute a query rewrite $Q'$, such that $Q'(G)$ incorporates relevant entities and excludes irrelevant ones wrt $\E$ under a closeness measure. (1) We characterize the problem by \em Q-Chase. It rewrites Q by applying a sequence of applicable operators guided by $\E$, and backtracks to derive optimal query rewrite. (2) We develop feasible Q-Chase-based algorithms, from anytime solutions to fixed-parameter approximations to compute query rewrites. These algorithms implement Q-Chase by detecting picky operators at run time, which discriminately enforce $\E$ to retain answers that are closer to exemplars, and effectively prune both operators and irrelevant matches, by consulting a cache of star patterns (called \em star views ). Using real-world graphs, we experimentally verify the efficiency and effectiveness of \qchase techniques and their applications.
We propose a new capacity-achieving code for the private information retrieval (PIR) problem, and show that it has the minimum message size (being one less than the number of servers) and the minimum ...upload cost (being roughly linear in the number of messages) among a general class of capacity-achieving codes, and in particular, among all capacity-achieving linear codes. Different from existing code constructions, the proposed code is asymmetric, and this asymmetry appears to be the key factor leading to the optimal message size and the optimal upload cost. The converse results on the message size and the upload cost are obtained by an analysis of the information theoretic proof of the PIR capacity, from which a set of critical properties of any capacity-achieving code in the code class of interest is extracted. The symmetry structure of the PIR problem is then analyzed, which allows us to construct symmetric codes from asymmetric ones, yielding a meaningful bridge between the proposed code and existing ones in the literature.
In the private information retrieval (PIR) problem, a user wishes to retrieve, as efficiently as possible, one out of K messages from N non-communicating databases (each holds all K messages) while ...revealing nothing about the identity of the desired message index to any individual database. The information theoretic capacity of PIR is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. For K messages and N databases, we show that the PIR capacity is (1+1/N+1/N 2 +· · ·+1/N K-1 ) -1 . A remarkable feature of the capacity achieving scheme is that if we eliminate any subset of messages (by setting the message symbols to zero), the resulting scheme also achieves the PIR capacity for the remaining subset of messages.