The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up to date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to surface connections between concepts in isolated literature domains that would otherwise go undiscovered. This usually takes the form of an A-B-C relationship, in which A and C terms are linked through a B-term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B-term intermediate(s). The development of SKiM is motivated by the observation that few LBD tools provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific to a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues.
We demonstrate SKiM's ability to discover useful A-B-C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. Finally, we provide a simple and intuitive open-source web interface (https://skim.morgridge.org) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches.
SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C-term concepts, and moves beyond simply identifying that a relationship exists: many relationships are given relationship-type labels from our knowledge graph.
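At its core, SKiM's two-stage search can be summarized in a few lines. The sketch below is illustrative rather than the authors' implementation: it assumes precomputed co-occurrence counts over PubMed and uses Fisher's exact test on a 2x2 contingency table as the significance test; the `counts` lookup and the `alpha` threshold are hypothetical.

```python
# Illustrative sketch of a serial A-B-C search (not SKiM's actual code).
# Assumes `counts` maps a term to its article count and a term pair to
# its co-occurrence count; the test statistic is an assumption.
from scipy.stats import fisher_exact

def cooccur_pvalue(n_ab, n_a, n_b, n_total):
    """P-value that two terms co-occur in more articles than chance."""
    table = [[n_ab, n_a - n_ab],
             [n_b - n_ab, n_total - n_a - n_b + n_ab]]
    _, p = fisher_exact(table, alternative="greater")
    return p

def skim_like_search(a, b_terms, c_terms, counts, n_total, alpha=1e-5):
    """Stage 1: significant A-B links; stage 2: B-C links through them."""
    linked_bs = [b for b in b_terms
                 if cooccur_pvalue(counts[(a, b)], counts[a],
                                   counts[b], n_total) < alpha]
    return [(a, b, c)
            for b in linked_bs for c in c_terms
            if cooccur_pvalue(counts[(b, c)], counts[b],
                              counts[c], n_total) < alpha]
```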
IMPORTANCE: Giant cell arteritis (GCA) is the most common systemic vasculitis in elderly individuals. Diagnosis is confirmed by temporal artery (TA) biopsy, although biopsy results are often negative. Despite the use of corticosteroids, disease may progress. Identification of causal agents will improve outcomes. Biopsy-positive GCA is associated with TA infection by varicella-zoster virus (VZV). OBJECTIVE: To analyze VZV infection in TAs of patients with clinically suspected GCA whose TAs were histopathologically negative and in normal TAs removed post mortem from age-matched individuals. DESIGN, SETTING, AND PARTICIPANTS: A cross-sectional study for VZV antigen was performed from January 2013 to March 2015 using archived, deidentified, formalin-fixed, paraffin-embedded GCA-negative, GCA-positive, and normal TAs (50 sections/TA) collected during the past 30 years. Regions adjacent to those containing VZV were examined by hematoxylin-eosin staining. Immunohistochemistry identified inflammatory cells and cell types around nerve bundles containing VZV. A combination of 17 tertiary referral centers and private practices worldwide contributed archived TAs from individuals older than 50 years. MAIN OUTCOMES AND MEASURES: Presence and distribution of VZV antigen in TAs and histopathological changes in sections adjacent to those containing VZV were confirmed by 2 independent readers. RESULTS: Varicella-zoster virus antigen was found in 45 of 70 GCA-negative TAs (64%), compared with 11 of 49 normal TAs (22%) (relative risk [RR] = 2.86; 95% CI, 1.75-5.31; P < .001). Extension of our earlier study revealed VZV antigen in 68 of 93 GCA-positive TAs (73%), compared with 11 of 49 normal TAs (22%) (RR = 3.26; 95% CI, 2.03-5.98; P < .001). Compared with normal TAs, VZV antigen was more likely to be present in the adventitia of both GCA-negative TAs (RR = 2.43; 95% CI, 1.82-3.41; P < .001) and GCA-positive TAs (RR = 2.03; 95% CI, 1.52-2.86; P < .001). Varicella-zoster virus antigen was frequently found in perineurial cells expressing claudin-1 around nerve bundles. Of 45 GCA-negative participants whose TAs contained VZV antigen, 1 had histopathological features characteristic of GCA, and 16 (36%) showed adventitial inflammation adjacent to viral antigen; no inflammation was seen in normal TAs. CONCLUSIONS AND RELEVANCE: In patients with clinically suspected GCA, the prevalence of VZV in TAs is similar whether biopsy results are pathologically negative or positive. Antiviral treatment may confer additional benefit to patients with biopsy-negative GCA treated with corticosteroids, although the optimal antiviral regimen remains to be determined.
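The reported relative risks can be reproduced directly from the counts given above. The snippet below is an illustrative calculation, not the study's analysis code; the 95% CI uses the standard log-RR normal approximation, which may differ slightly from the estimator used in the paper.

```python
# Illustrative RR check (not the study's code): 45/70 GCA-negative TAs
# vs. 11/49 normal TAs with VZV antigen.
import math

def relative_risk(a, n1, b, n2):
    rr = (a / n1) / (b / n2)
    se = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)   # SE of log(RR)
    lo, hi = (math.exp(math.log(rr) + z * se) for z in (-1.96, 1.96))
    return rr, lo, hi

print(relative_risk(45, 70, 11, 49))  # RR ~ 2.86 (CI method may differ)
```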
The large data volumes expected from the High Luminosity LHC (HL-LHC) present challenges to existing paradigms and facilities for end-user data analysis. Modern cyberinfrastructure tools provide a diverse set of services that can be composed into a system that gives physicists straightforward access to large computing resources with a low barrier to entry. The Coffea-Casa analysis facility (AF) provides an environment for end users that enables the execution of increasingly complex analyses, such as those demonstrated by the Analysis Grand Challenge (AGC), and captures the features that physicists will need for the HL-LHC. We describe the development progress of the Coffea-Casa facility, featuring its modularity, and demonstrate the ability to port and customize the facility software stack to other locations. The facility also supports batch systems while staying Kubernetes-native. We present the evolved architecture of the facility, including the integration of advanced data delivery services (e.g., ServiceX) and the availability of data caching services (e.g., XCache) to end users of the facility. We also highlight the composability of modern cyberinfrastructure tools. To enable machine learning pipelines at Coffea-Casa analysis facilities, a set of industry ML solutions adapted for HEP columnar analysis was integrated on top of existing facility services. These services also give user workflows transparent access to GPUs available at a facility via inference servers, with Kubernetes as the enabling technology.
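As an illustration of the low-barrier access pattern such a facility exposes, a user attaches a Dask client to a facility-provided scheduler from a notebook and scales out transparently. The endpoint below is a placeholder, and the array computation stands in for a real coffea columnar analysis.

```python
# Minimal sketch of the user-facing pattern (placeholder endpoint; a real
# coffea-casa session provides the scheduler address and TLS credentials).
from dask.distributed import Client
import dask.array as da

client = Client("tls://scheduler.example.org:8786")  # hypothetical endpoint

x = da.random.random((100_000, 1_000), chunks=(10_000, 1_000))
print(x.mean().compute())  # executes on the facility's scaled-out workers
```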
The IceCube Neutrino Observatory is a cubic-kilometer neutrino telescope located at the geographic South Pole. To accurately and promptly reconstruct the arrival direction of candidate neutrino events for Multi-Messenger Astrophysics use cases, IceCube employs Skymap Scanner workflows managed by the SkyDriver service. The Skymap Scanner performs maximum-likelihood tests on individual pixels generated from the Hierarchical Equal Area isoLatitude Pixelation (HEALPix) algorithm. Each test is computationally independent, which allows for massive parallelization. This workload is distributed using the Event Workflow Management System (EWMS), a message-based workflow management system designed to scale to trillions of pixels per day. SkyDriver orchestrates multiple distinct Skymap Scanner workflows behind a REST interface, providing an easy-to-use reconstruction service for real-time candidate, cataloged, and simulated events. Here, we outline the design of the SkyDriver service and the initial development of EWMS.
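The parallelism follows directly from the pixelation: each HEALPix pixel defines an independent direction hypothesis, so per-pixel tests can be farmed out freely. The sketch below is illustrative; `nside` and the dummy test stand in for the real maximum-likelihood reconstruction.

```python
# Illustrative per-pixel scan over a HEALPix grid (dummy likelihood; the
# real test runs the event's maximum-likelihood reconstruction per pixel).
import healpy as hp
import numpy as np

nside = 64                       # resolution parameter (placeholder)
npix = hp.nside2npix(nside)      # 12 * nside**2 equal-area pixels

def scan_pixel(ipix):
    theta, phi = hp.pix2ang(nside, ipix)    # pixel-center direction
    return -0.5 * (theta - np.pi / 2) ** 2  # stand-in log-likelihood

skymap = np.array([scan_pixel(i) for i in range(npix)])  # embarrassingly parallel
print(int(np.argmax(skymap)))    # best-fit pixel
```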
Creating new materials, discovering new drugs, and simulating systems are essential processes for research and innovation, and they require substantial computational power. While many applications can be split into many smaller independent tasks, some cannot and may take hours or weeks to run to completion. To better manage those longer-running jobs, it would be desirable to stop them at an arbitrary point in time and later continue the computation on another compute resource; this is usually referred to as checkpointing. While some applications can manage checkpointing programmatically, it would be preferable if the batch scheduling system could do it independently. This paper evaluates the feasibility of using CRIU (Checkpoint Restore in Userspace), an open-source tool for GNU/Linux environments, with emphasis on the OSG's OSPool HTCondor setup. CRIU can checkpoint the process state into a disk image and deals seamlessly with both open files and established network connections. Furthermore, it can checkpoint both traditional Linux processes and containerized workloads. The functionality appears adequate for many scenarios supported in the OSPool, although some limitations prevent it from being usable in all circumstances.
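The dump/restore cycle CRIU provides looks like the sketch below. It is illustrative only: in the OSPool the cycle is driven by HTCondor rather than a user script, and the PID and image directory are placeholders.

```python
# Illustrative CRIU checkpoint/restore wrapper (placeholder pid/paths;
# production use is mediated by the batch system, not a script like this).
import subprocess

def checkpoint(pid: int, image_dir: str) -> None:
    # Dump full process state (memory, open files, sockets) to disk images.
    subprocess.run(["criu", "dump", "-t", str(pid),
                    "-D", image_dir, "--shell-job"], check=True)

def restore(image_dir: str) -> None:
    # Recreate the process from the images, possibly on another machine.
    subprocess.run(["criu", "restore", "-D", image_dir, "--shell-job"],
                   check=True)
```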
LHC data is constantly being moved between computing and storage sites to support analysis, processing, and simulation; this is done at a scale that is currently unique within the science community. For example, the CMS experiment at the LHC manages approximately 200 PB of storage across 100 sites and moves roughly 1 PB between sites daily, with GridFTP as the primary protocol. This paper describes the performance results we have achieved by exploring alternatives to the GridFTP protocol for these data movements, in particular HTTPS third-party copy over XRootD data servers as a possible replacement for GridFTP for LHC big-data movements.
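For context, an HTTP third-party copy is a single WebDAV COPY request that asks one storage endpoint to exchange the file directly with the other, so the data never passes through the client. The sketch below shows pull mode with placeholder URLs and token; production transfers carry additional authorization headers beyond this minimal form.

```python
# Illustrative HTTP-TPC pull request (placeholder hosts and token): the
# destination server fetches the file directly from the source.
import requests

resp = requests.request(
    "COPY",
    "https://dest.example.org/store/user/file.root",       # destination
    headers={
        "Source": "https://source.example.org/store/user/file.root",
        "Authorization": "Bearer PLACEHOLDER_TOKEN",
    },
)
print(resp.status_code)
```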
The processing needs of the High Luminosity (HL) upgrade of the LHC require the CMS collaboration to harness the computational power available on non-CMS resources, such as High-Performance Computing (HPC) centers. These sites often limit the external network connectivity of their computational nodes. In this paper we describe a strategy in which all network connections of CMS jobs inside a facility are routed to a single point of external network connectivity using a Virtual Private Network (VPN) server, by creating virtual network interfaces on the computational nodes. We show that when the computational nodes and the host running the VPN server have the namespaces capability enabled, the setup can run entirely in user space with no other root permissions required. The VPN server host may be a privileged node inside the facility configured for outside network access, or an external service that the nodes are allowed to contact. When namespaces are not enabled on the client side, the setup falls back to using a SOCKS server instead of virtual network interfaces. We demonstrate the strategy by executing CMS Monte Carlo production requests on opportunistic non-CMS resources at the University of Notre Dame. For these jobs, CVMFS support is tested both via fusermount (cvmfsexec) and via the native FUSE module.
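A quick way to see whether a node offers the required capability is to try creating an unprivileged user-plus-network namespace. The illustrative check below shells out to util-linux's unshare and mirrors the fallback decision described above.

```python
# Illustrative capability probe: if an ordinary user can run
# `unshare --user --net`, virtual interfaces can be set up in user space;
# otherwise the strategy falls back to a SOCKS server.
import subprocess

def namespaces_available() -> bool:
    try:
        r = subprocess.run(["unshare", "--user", "--net", "true"],
                           capture_output=True, timeout=10)
        return r.returncode == 0
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False

print("user+net namespaces:", namespaces_available())
```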
LHC experiments make extensive use of web proxy caches, especially for software distribution via the CernVM File System and for conditions data via the Frontier Distributed Database Caching system. Since many jobs read the same data, cache hit rates are high and most of the traffic flows efficiently over local area networks. However, it is not always possible to have local web caches, particularly in opportunistic cases where experiments have little control over site services. The Open High Throughput Computing (HTC) Content Delivery Network (CDN), openhtc.io, aims to address this by using web proxy caches from a commercial CDN provider. Cloudflare provides a simple interface for registering DNS aliases of any web server and performs reverse-proxy web caching on those aliases. The openhtc.io domain is hosted on Cloudflare's free-tier CDN, which has no bandwidth limit and uses data centers throughout the world, so average client performance is much improved compared to reading directly from CERN or a Tier 1 site. The load on WLCG servers is also significantly reduced. WLCG Web Proxy Auto Discovery is used to select local web caches when they are available and to fall back to openhtc.io caching otherwise. This paper describes the Open HTC CDN in detail and provides initial results from its use for LHC@Home and USCMS opportunistic computing.
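The cache-selection logic amounts to a simple preference order, sketched below with hypothetical host names: use a site-local web cache when one is advertised, otherwise rewrite the origin host to its CDN alias. This illustrates the idea only; the real mechanism is WLCG Web Proxy Auto Discovery.

```python
# Illustrative fallback (hypothetical hosts; real selection uses WPAD):
# prefer a site-local squid, else read through an openhtc.io CDN alias.
import requests

def fetch(path: str, local_proxy: str | None = None) -> bytes:
    if local_proxy:
        # Site-local cache: traffic stays on the LAN.
        url = "http://origin.example.cern.ch" + path
        return requests.get(url, proxies={"http": local_proxy}, timeout=30).content
    # No local cache: hit the CDN-backed alias instead of the origin.
    url = "http://origin-example.openhtc.io" + path
    return requests.get(url, timeout=30).content
```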
In this paper, we propose an application-aware intelligent load-balancing system for high-throughput, distributed, data-intensive science workflows. We leverage emerging deep learning techniques for time-series modeling to develop an application-aware predictive analytics system for accurately forecasting GridFTP connection loads. Our solution integrates with a major U.S. CMS Tier-2 site; we use a real dataset representing 670 million GridFTP transfer connections measured over 18 months to drive our predictive analytics solution. First, we perform extensive analysis of this dataset and use the connection loads as an example to study the temporal dependencies between various user roles and workflow memberships. We use the analysis to motivate the design of a gated recurrent unit (GRU) based deep recurrent neural network (RNN) for modeling long-term temporal dependencies and predicting connection loads. We then develop a novel application-aware, predictive, and intelligent load balancer, APRIL, that effectively integrates application metadata and load forecasts to maximize server utilization. We conduct extensive experiments to evaluate the performance of our deep RNN predictive analytics system and compare it with other approaches, such as ARIMA and multi-layer perceptron (MLP) predictors. The results show that our forecasting model, depending on the user role, performs between 5.88% and 92.6% better than the alternatives. We also demonstrate the effectiveness of APRIL by comparing it with the load-balancing capabilities of an existing production Linux Virtual Server (LVS) cluster. Our approach improves server utilization, on average, by a factor of 0.5 to 11 compared with its LVS counterpart.
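For readers unfamiliar with the model class, a GRU-based load forecaster can be expressed compactly. The sketch below uses PyTorch with illustrative layer sizes and window length; it is not APRIL's actual architecture.

```python
# Illustrative GRU forecaster (layer sizes, window, and features are
# assumptions, not the paper's configuration).
import torch
import torch.nn as nn

class LoadForecaster(nn.Module):
    def __init__(self, n_features: int = 1, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):             # x: (batch, window, n_features)
        out, _ = self.gru(x)          # out: (batch, window, hidden)
        return self.head(out[:, -1])  # next-step connection load

model = LoadForecaster()
window = torch.randn(8, 48, 1)        # 8 sequences of 48 past load readings
print(model(window).shape)            # torch.Size([8, 1])
```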