A recent “third wave” of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs typically comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed explosive growth in research on NN-based approaches to information retrieval (IR), and a significant body of work now exists. In this paper, we survey the current landscape of Neural IR research, paying special attention to the use of learned distributed representations of textual units. We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research.
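As a minimal illustration of what "learned distributed representations of textual units" means for retrieval, the sketch below represents a query and each document as the average of their word vectors and ranks documents by cosine similarity. The toy embedding table `EMB` and the averaging scheme are assumptions made for illustration; neural IR models learn such vectors from data and often combine them in more elaborate ways.

```python
import math

# Hypothetical toy word embeddings; real systems learn these from a corpus.
EMB = {
    "neural":    [0.9, 0.1, 0.0],
    "network":   [0.8, 0.2, 0.1],
    "retrieval": [0.1, 0.9, 0.2],
    "search":    [0.2, 0.8, 0.1],
    "cooking":   [0.0, 0.1, 0.9],
}
DIM = 3

def embed(text):
    """Average the vectors of known words; zero vector if no word is known."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, docs):
    """Return documents sorted by descending similarity to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)

print(rank("neural search", ["cooking", "retrieval search", "neural network"]))
```

Because the scores come from dense vectors rather than exact term overlap, a document can rank highly for a query even when the two share few or no words.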
When APIs evolve, clients make corresponding changes to their applications to use new or updated APIs. Despite the benefits of new or updated APIs, developers are often slow to adopt them. As a first step toward understanding the impact of API evolution on software ecosystems, we conduct an in-depth case study of the co-evolution behavior of the Android API and dependent applications, using version history data from GitHub. Our study confirms that Android evolves quickly, at an average rate of 115 API updates per month. Client adoption, however, is not keeping pace with API evolution. About 28% of API references in client applications are outdated, with a median lagging time of 16 months. 22% of outdated API usages eventually upgrade to newer API versions, but the propagation time is about 14 months, much slower than the average API release interval (3 months). Fast-evolving APIs are used more by clients than slow-evolving APIs, but the average time taken to adopt new versions is longer for fast-evolving APIs. Further, code that adapts to API usage changes is more defect-prone than code without such adaptation. This may indicate that developers avoid API instability.
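The lag statistics above can be computed along the following lines. This is a sketch under assumptions: the `ApiReference` record, the rule that a reference is outdated when it targets an older API level than the newest available one, and the sample data are all hypothetical, since the abstract does not define the study's exact schema or measurement.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class ApiReference:
    """One API reference in a client app (hypothetical schema)."""
    used_level: int       # API level the client code targets
    latest_level: int     # newest API level available when observed
    months_behind: float  # months since the newer API level was released

def lag_summary(refs):
    """Return (fraction of outdated references, median lag in months among them)."""
    outdated = [r for r in refs if r.used_level < r.latest_level]
    if not refs or not outdated:
        return 0.0, 0.0
    return len(outdated) / len(refs), median(r.months_behind for r in outdated)

def releases_missed(propagation_months, release_interval_months):
    """Number of release intervals that elapse while an update propagates."""
    return propagation_months / release_interval_months

refs = [ApiReference(21, 23, 10), ApiReference(23, 23, 0),
        ApiReference(19, 23, 22), ApiReference(23, 23, 0)]
print(lag_summary(refs))       # share outdated, median lag
print(releases_missed(14, 3))  # about 4.7 release intervals
```

With the figures reported in the abstract, a 14-month propagation time against a 3-month release interval means roughly four to five API releases ship before an upgrade reaches clients.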
When collecting item ratings from human judges, it can be difficult to measure and enforce data quality due to task subjectivity and lack of transparency into how judges make each rating decision. To address this, we investigate asking judges to provide a specific form of rationale supporting each rating decision. We evaluate this approach on an information retrieval task in which human judges rate the relevance of Web pages for different search topics. Cost-benefit analysis over 10,000 judgments collected on Amazon’s Mechanical Turk suggests a win-win. First, rationales yield a multitude of benefits: more reliable judgments, greater transparency for evaluating both human raters and their judgments, reduced need for expert gold-standard labels, the opportunity for dual supervision from ratings and rationales, and added value from the rationales themselves. Second, once experienced in the task, crowd workers provide rationales with almost no increase in task completion time. Consequently, we can realize the above benefits with minimal additional cost.
Surrogate-assisted neuroevolution
Greenwood, Bryson; McDonnell, Tyler
Proceedings of the Genetic and Evolutionary Computation Conference
07/2022
Conference Proceeding
Though Neuroevolution (NE) and Neural Architecture Search (NAS) have emerged as techniques for automating the design of neural networks, they are expensive and time-consuming: they require training many neural networks, and they have largely resisted the benefits of surrogate-based optimization, because it is difficult to model the performance of variable network architectures. We propose a novel and general framework for surrogate-assisted search of neural architectures consisting of two components: (1) an algorithm that leverages grammars to generate tensor representations of variable neural network topologies; and (2) an evolutionary algorithm that employs a surrogate model to expedite architecture search using active learning. We demonstrate that our model can produce accurate performance predictions for unseen architectures, realizing a 5x reduction in the total compute required for search while improving asymptotic performance. We also illustrate that the surrogate models are transferable to new domains via a real-world transfer learning case study using industrial time series data.
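The general shape of surrogate-assisted evolutionary search can be sketched as follows. This is a generic pre-screening loop, not the paper's framework: bit-string genomes stand in for the grammar-generated tensor representations, a OneMax score stands in for training a network, and the k-nearest-neighbour surrogate over Hamming distance is an assumed choice. The archive of true evaluations grows only with candidates the surrogate selects, which is loosely analogous to the active-learning step described above.

```python
import random

def expensive_fitness(genome):
    """Stand-in for training and validating a network (here: count of ones)."""
    return sum(genome)

def surrogate_predict(genome, archive, k=3):
    """k-NN surrogate: mean true fitness of the k closest evaluated genomes."""
    if not archive:
        return 0.0
    nearest = sorted(archive,
                     key=lambda item: sum(a != b for a, b in zip(genome, item[0])))[:k]
    return sum(f for _, f in nearest) / len(nearest)

def mutate(genome, rng, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]

def surrogate_assisted_search(n_bits=20, pop_size=8, offspring=32,
                              evals_per_gen=4, generations=15, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    archive = [(g, expensive_fitness(g)) for g in population]  # true evaluations
    for _ in range(generations):
        parents = sorted(archive, key=lambda item: item[1], reverse=True)[:pop_size]
        children = [mutate(rng.choice(parents)[0], rng) for _ in range(offspring)]
        # Cheap screening: rank all children by surrogate prediction.
        children.sort(key=lambda c: surrogate_predict(c, archive), reverse=True)
        # Only the most promising children receive an expensive evaluation.
        for child in children[:evals_per_gen]:
            archive.append((child, expensive_fitness(child)))
    best = max(archive, key=lambda item: item[1])
    return best, len(archive)

best, n_evals = surrogate_assisted_search()
print(best[1], n_evals)
```

With these hypothetical settings the loop spends 8 + 15 × 4 = 68 expensive evaluations, whereas a plain evolutionary algorithm that evaluated every offspring would spend 8 + 15 × 32 = 488; the savings in any real setting depend on how well the surrogate ranks candidates.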
Numerous approaches have been explored for graph clustering, including those that optimize a global criterion such as modularity. More recently, Graph Neural Networks (GNNs), which have produced state-of-the-art results in graph analysis tasks such as node classification and link prediction, have been applied to unsupervised graph clustering using these modularity-based metrics. Modularity, though robust for many practical applications, suffers from the resolution limit problem, in which optimization may fail to identify clusters smaller than a scale that depends on properties of the network. In this paper, we propose a new GNN framework that draws on the Potts model from physics to overcome this limitation. Experiments on a variety of real-world datasets show that this model achieves state-of-the-art clustering results.
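One well-known bridge between modularity and the Potts model is the generalized modularity of Reichardt and Bornholdt, whose Potts-model Hamiltonian scales the null-model term by a resolution parameter \(\gamma\); \(\gamma = 1\) recovers standard modularity, and larger \(\gamma\) favors smaller clusters. The sketch below computes \(Q_\gamma = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \gamma \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)\) for an unweighted, undirected graph without self-loops. It illustrates the objective family only; the abstract does not give the paper's exact GNN loss, which may differ.

```python
def generalized_modularity(edges, labels, gamma=1.0):
    """Q_gamma for an undirected, unweighted graph given as an edge list.

    edges:  list of (u, v) pairs, no self-loops assumed
    labels: dict mapping node -> cluster id
    gamma:  resolution parameter (1.0 gives standard modularity)
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # sum_ij A_ij delta(c_i, c_j): each intra-cluster edge counts twice (i,j and j,i).
    intra = 2 * sum(1 for u, v in edges if labels[u] == labels[v])
    # sum_ij k_i k_j delta(c_i, c_j) equals the sum over clusters of (total degree)^2.
    cluster_degree = {}
    for node, k in degree.items():
        cluster_degree[labels[node]] = cluster_degree.get(labels[node], 0) + k
    expected = sum(d * d for d in cluster_degree.values()) / (2 * m)
    return (intra - gamma * expected) / (2 * m)

# Two triangles joined by a single edge, split into their natural clusters.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(generalized_modularity(edges, labels))  # 5/14, about 0.357
```

Raising `gamma` penalizes large clusters more heavily, which is how Potts-style objectives can expose communities that standard modularity merges because of the resolution limit.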
Crowd vs. Expert
Kutlu, Mucahid; McDonnell, Tyler; Barkallah, Yassmine ...
The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
06/2018
Conference Proceeding
While crowdsourcing offers a low-cost, scalable way to collect relevance judgments, lack of transparency with remote crowd work has limited understanding of the quality of collected judgments. In prior work, we showed a variety of benefits from asking crowd workers to provide rationales for each relevance judgment \cite{mcdonnell2016relevant}. In this work, we scale up our rationale-based judging design to assess its reliability on the 2014 TREC Web Track, collecting roughly 25K crowd judgments for 5K document-topic pairs. We also study having crowd judges perform topic-focused judging rather than judging across topics, and find that this improves quality. Overall, we show that crowd judgments can be used to reliably rank IR systems for evaluation. We further explore the potential of rationales to shed new light on the reasons for judging disagreement between experts and crowd workers. Our qualitative and quantitative analysis distinguishes subjective vs. objective forms of disagreement, as well as the relative importance of each disagreement cause, and we present a new taxonomy for organizing the different types of disagreement we observe. We show that many crowd disagreements seem valid and plausible, with disagreement in many cases due to judging errors by the original TREC assessors. We also share our WebCrowd25k dataset, including: (1) crowd judgments with rationales, and (2) taxonomy category labels for each judging disagreement analyzed.
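Whether crowd judgments "reliably rank IR systems" is commonly checked by scoring the same runs under both sets of judgments and comparing the two system orderings with a rank correlation such as Kendall's tau. The sketch below computes tau-a between two score assignments; the system names and scores are invented for illustration, and the abstract does not state which correlation measure or effectiveness metric the paper used.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system: score} dicts over the same systems."""
    systems = list(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        prod = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if prod > 0:
            concordant += 1   # both judgment sets order the pair the same way
        elif prod < 0:
            discordant += 1   # the two judgment sets disagree on this pair
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical effectiveness scores under expert (TREC) and crowd judgments.
expert = {"A": 0.30, "B": 0.25, "C": 0.20, "D": 0.10}
crowd = {"A": 0.28, "B": 0.21, "C": 0.22, "D": 0.09}
print(kendall_tau(expert, crowd))  # 5 concordant, 1 discordant pair
```

Here only systems B and C swap places, giving tau = (5 - 1) / 6, about 0.67; tau-a does not correct for tied scores, so a tie-aware variant such as tau-b may be preferable when ties are common.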
Results on the effects of ionizing radiation on the signal produced by plastic scintillating rods manufactured by Eljen Technology are presented for various matrix materials, dopant concentrations, fluors (EJ-200 and EJ-260), antioxidant concentrations, scintillator thicknesses, doses, and dose rates. The light output before and after irradiation is measured using an alpha source and a photomultiplier tube, and the light transmission with a spectrophotometer. Assuming an exponential decrease in the light output with dose, the change in light output is quantified using the exponential dose constant \(D\). The \(D\) values are similar for primary and secondary dopant concentrations of 1 and 2 times the manufacturer's default concentration, and for antioxidant concentrations of 0, 1, and 2 times the default. The \(D\) value depends approximately linearly on the logarithm of the dose rate for dose rates between 2.2 Gy/hr and 70 Gy/hr for all materials. For EJ-200 polyvinyltoluene-based (PVT) scintillator, the dose constant is approximately linear in the logarithm of the dose rate up to 3400 Gy/hr, while for polystyrene-based (PS) scintillator, or for both materials with EJ-260 fluors, it remains constant or decreases (depending on doping concentration) above about 100 Gy/hr. The results from rods of varying thickness and from the different fluors suggest that damage to the initial light output is a larger effect than color center formation for scintillator thicknesses \(\leq 1\) cm. For the blue scintillator (EJ-200), the transmission measurements indicate damage to the fluors. We also find that while PVT is more resistant to radiation damage than PS at dose rates above about 100 Gy/hr for EJ-200 fluors, the two materials show similar damage at lower dose rates and for EJ-260 fluors.
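The exponential model above, \(L(d) = L_0 e^{-d/D}\), gives \(D = -d / \ln(L/L_0)\) from a single before/after light-output measurement, and the reported dependence on dose rate \(R\) can be summarized by a least-squares fit of \(D = a + b \log_{10} R\). The sketch below does both; the function names, the choice of base-10 logarithm (any base only rescales \(b\)), and the numbers in the example are assumptions for illustration, not values from the paper.

```python
import math

def dose_constant(light_before, light_after, dose_gy):
    """Solve L_after = L_before * exp(-dose / D) for D (same units as dose, e.g. Gy)."""
    ratio = light_after / light_before
    if not 0 < ratio < 1:
        raise ValueError("expected 0 < L_after / L_before < 1")
    return -dose_gy / math.log(ratio)

def fit_d_vs_log_rate(rates_gy_per_hr, d_values):
    """Ordinary least-squares fit of D = a + b * log10(rate); returns (a, b)."""
    xs = [math.log10(r) for r in rates_gy_per_hr]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(d_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, d_values))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical example: light output falls to 1/e of its initial value after 10 kGy.
print(dose_constant(100.0, 100.0 * math.exp(-1.0), 10_000.0))  # about 10000 Gy
```

A larger \(D\) means the material tolerates more dose before its light output drops by a factor of e, so the positive slope reported for PVT with EJ-200 fluors indicates less damage per unit dose at higher dose rates.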
Crowdsourcing offers an affordable and scalable means to collect relevance judgments for IR test collections. However, crowd assessors may show higher variance in judgment quality than trusted assessors. In this paper, we investigate how to effectively use both groups of assessors in partnership. Specifically, we investigate how judging agreement correlates with three factors: relevance category, document rankings, and topical variance. Based on this analysis, we propose two collaborative judging methods in which a portion of the document-topic pairs is assessed by in-house judges while the rest is assessed by crowd workers. Experiments conducted on two TREC collections show encouraging results when we distribute work intelligently between our two groups of assessors.
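The abstract does not specify how the two collaborative methods decide which pairs go to in-house judges. A minimal sketch, assuming a hypothetical per-pair risk score (here the variance of a few pilot crowd labels) and a fixed in-house budget: route the highest-risk document-topic pairs to trusted assessors and the rest to crowd workers. Both `estimate_risk` and `split_judging` are illustrative names, not the paper's methods.

```python
from statistics import pvariance

def estimate_risk(pilot_labels):
    """Hypothetical risk score: population variance of pilot crowd labels per pair."""
    return {pair: pvariance(labels) for pair, labels in pilot_labels.items()}

def split_judging(risk, in_house_budget):
    """Send the `in_house_budget` riskiest pairs in-house; the rest go to the crowd.

    Ties are broken by the (doc, topic) key so the split is deterministic.
    """
    ranked = sorted(risk, key=lambda pair: (-risk[pair], pair))
    return ranked[:in_house_budget], ranked[in_house_budget:]

# Hypothetical binary pilot labels (1 = relevant) from a few crowd workers.
pilot = {
    ("d1", "t1"): [0, 0, 0],
    ("d2", "t1"): [0, 1, 1],
    ("d3", "t2"): [0, 1, 0, 1],
    ("d4", "t2"): [1, 1, 1],
}
in_house, crowd = split_judging(estimate_risk(pilot), in_house_budget=1)
print(in_house, crowd)
```

Any signal the paper found correlated with agreement (relevance category, document rank, topical variance) could replace the variance score here; the budgeted top-k split stays the same.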