Relevance feedback (RF) has been studied under laboratory conditions using test collections and either test persons or simple simulation. These studies have given mixed results. Automatic (or pseudo) RF and intellectual RF, both leading to query reformulation, are the main approaches to explicit RF. In the present study we perform RF with the help of classification of search results. We conduct our experiments on a comprehensive test set, namely various TREC ad-hoc collections with 250 topics. We also study various term space reduction techniques for the classification process. The research questions are: given RF on the top results of a pseudo RF (PRF) query, is it possible to learn effective classifiers for the remaining results? What is the effectiveness of various classification methods? Our findings indicate that this approach to applying RF is significantly more effective than PRF with both short (title) queries and long (title and description) queries.
In enterprise information systems (EISs) it is necessary to model, integrate and compute very diverse data. In advanced EISs the stored data are often based on both structured (e.g. relational) and semi-structured (e.g. XML) data models. In addition, the ad hoc information needs of end-users may require the manipulation of data-oriented (structural), behavioural and deductive aspects of data. Contemporary languages capable of treating this kind of diversity suit only persons with good programming skills. In this paper we present a concept-oriented query language approach to manipulating this diversity so that the programming skill requirements are considerably reduced. In our query language, the features which need technical knowledge are hidden in application-specific concepts and structures. Therefore, users need not be aware of the underlying technology. Application-specific concepts and structures are represented by the modelling primitives of the extended RDOOM (relational deductive object-oriented modelling), which contains primitives for all crucial real world relationships (is-a relationship, part-of relationship, association), XML documents and views. Our query language also supports intensional and extensional-intensional queries, in addition to conventional extensional queries. In formulating a query, the end-user combines available application-specific concepts and structures through shared variables.
In an earlier study, we presented a query key goodness scheme, which can be used to distinguish between good and bad query keys. The scheme is based on the relative average term frequency (RATF) values of query keys. In the present paper, we tested the effectiveness of the scheme in Finnish to English cross-language retrieval in several experiments. Query keys were weighted and queries were reduced based on the RATF values of keys. The tests were carried out in TREC and CLEF document collections using the InQuery retrieval system. The TREC tests indicated that the best RATF-based queries delivered substantial and statistically significant performance improvements, and performed as well as syn-structured queries, which have been shown to be effective in many CLIR studies. The CLEF tests indicated the limitations of the use of RATF in CLIR. However, the best RATF-based queries performed better than baseline queries in the CLEF collection as well.
The paper analyzes the citation impact of Library and Information Science (LIS) research articles published in 31 leading international LIS journals in 2015. The main research question is: to what degree do authors’ disciplinary composition in association with topic, methodology, and type of contribution affect their citation impact? The impact is analyzed in terms of the number of citations received and their authority, using outlier normalization and subfield normalization. Quantitative content analysis is used to analyze article characteristics including topic, methodology, type of contribution, and the disciplinary composition of their author teams. The citations received by the articles are traced from 2015 to May 2021. Citing document authority is measured by the citations they had received up to May 2021. The overall finding was that authors’ disciplinary composition is significantly associated with citation scores. The differences in citation scores between disciplinary compositions appeared typically within information retrieval and scientific communication. In both topics LIS and computer science jointly received significantly higher citation scores than many disciplines like LIS alone or humanities in information retrieval; or natural sciences, medicine, or social sciences alone in scientific communication. The paper is original in reporting a joint analysis of content characteristics, authorship composition, and impact.
It is nowadays generally agreed that a person's information seeking depends on his or her tasks and the problems encountered in performing them. The relationships of broad job types and information-seeking characteristics have been analyzed both conceptually and empirically, mostly through questionnaires after task performance rather than during task performance. In this article, the relationships of task complexity, necessary information types, information channels, and sources are analyzed at the task level on the basis of a qualitative investigation. Tasks were categorized in five complexity classes and information into problem information, domain information, and problem-solving information. Moreover, several classifications of information channels and sources were utilized. The data were collected in a public administration setting through diaries, which were written during task performance, and questionnaires. The findings were structured into work charts for each task and summarized in qualitative process description tables for each task complexity category. Quantitative indices further summarizing the results were also computed. The findings indicate systematic and logical relationships among task complexity, types of information, information channels, and sources.
n-grams have been used widely and successfully for approximate string matching in many areas. s-grams have been introduced recently as an n-gram based matching technique, where di-grams are formed of both adjacent and non-adjacent characters. s-grams have proved successful in approximate string matching across language boundaries in Information Retrieval (IR). s-grams, however, lack precise definitions, as does their similarity comparison. In this paper, we give precise definitions for both. Our definitions are developed in a bottom-up manner, only assuming character strings and elementary mathematical concepts. Extending established practices, we provide novel definitions of s-gram profiles and the L1 distance metric for them. This is a stronger string proximity measure than the popular Jaccard similarity measure because Jaccard is insensitive to the counts of each n-gram in the strings to be compared. However, due to the popularity of Jaccard in IR experiments, we define the reduction of s-gram profiles to binary profiles in order to precisely define the (extended) Jaccard similarity function for s-grams. We also show that n-gram similarity/distance computations are special cases of our generalized definitions.
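The ideas in this abstract can be illustrated with a minimal Python sketch. It is not the paper's formal definition: the function names, the `skips` parameter, and the choice to pool di-grams from different skip lengths into one profile are all assumptions made for illustration. A profile counts character pairs separated by k skipped characters, the L1 distance sums the absolute count differences, and the Jaccard similarity is computed on the binary (presence-only) reduction of the profiles.

```python
from collections import Counter


def s_gram_profile(text, skips=(0,)):
    """Count di-grams whose two characters are separated by k skipped
    characters, for every k in `skips` (k = 0 gives ordinary di-grams).
    Pooling all skip lengths into one profile is an assumption."""
    profile = Counter()
    for k in skips:
        for i in range(len(text) - k - 1):
            profile[text[i] + text[i + k + 1]] += 1
    return profile


def l1_distance(p, q):
    """Sum of absolute count differences over all grams in either profile."""
    return sum(abs(p[g] - q[g]) for g in set(p) | set(q))


def jaccard_similarity(p, q):
    """Jaccard similarity on binary profiles: counts are ignored,
    only the presence of each gram matters."""
    a, b = set(p), set(q)
    if not a and not b:
        return 1.0  # convention chosen here for two empty profiles
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    p = s_gram_profile("abab")
    q = s_gram_profile("ab")
    print(l1_distance(p, q))         # sensitive to repeated grams
    print(jaccard_similarity(p, q))  # insensitive to repeated grams
```

With `skips=(0,)` the functions reduce to ordinary di-gram comparison, which mirrors the abstract's remark that n-gram computations are special cases of the generalized definitions.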
There are several kinds of conceptual models for information seeking and retrieval (IS&R). The paper suggests that some models are of a summary type and others more analytic. Such models serve different research purposes. The purpose of this paper is to discuss the functions of conceptual models in scientific research, and in IS&R research in particular. What kinds of models are there, and in what ways may they help investigators? What kinds of models are needed for various purposes? In particular, we are looking for models that provide guidance in setting research questions and formulating hypotheses. As an example, the paper discusses at length one analytical model of task-based information seeking and its contribution to the development of the research area.
The article presents the search situation transition (SST) method for analysing Web information search (WIS) processes. The idea of the method is to analyse searching behaviour, the process, in detail and to connect both the searcher's actions (captured in a log) and his/her intentions and goals, which log analysis never captures. On the other hand, ex post facto surveys, while popular in WIS research, cannot capture the actual search processes. The method is presented through three facets: its domain, its procedure, and its justification. The method's domain is presented in the form of a conceptual framework which maps five central categories that influence WIS processes: the searcher, the social/organisational environment, the work task, the search task, and the process itself. The method's procedure includes various techniques for data collection and analysis. The article presents examples from real WIS processes and shows how the method can be used to identify the interplay of the categories during the processes. It is shown that the method presents a new approach in information seeking and retrieval by focusing on the search process as a phenomenon and by explicating how different information seeking factors directly affect the search process.
The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non-relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept-based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions was used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best match retrieval in the search for highly relevant documents. The results revealed that expanded queries with concept-based structures performed better than unexpanded queries or "natural language" queries. Further, it was shown that highly relevant documents benefit essentially more from the concept-based QE in ranking than marginally relevant documents.