Sheep pox virus causes one of the most serious diseases of livestock. The virus produces skin lesions, spreads rapidly within a herd, and often leads to severe economic losses. The set of sanitary and epidemiological measures taken during outbreaks of the infection largely depends on identification of the virus strain and unambiguous differentiation of the infecting strain from vaccine strains. The most modern and precise approach to identifying virus strains is whole-metagenome (shotgun) sequencing of samples followed by assembly and analysis of the sample's virome. We evaluate the applicability of shotgun metagenomic sequencing to accurately confirm the virus species and to identify it to the strain level using de novo assembly of the virome. We show that this approach not only accurately determines the strain affiliation of an infectious agent, but also reveals and identifies possible coinfections of both viral and bacterial origin.
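The core idea of strain-level identification from sequencing data can be illustrated with a toy k-mer classifier. This is a minimal sketch with hypothetical reference sequences and a made-up read; real pipelines use full de novo assembly and alignment against curated genome databases, not this shortcut.

```python
# Toy sketch of k-mer-based strain identification. The reference
# sequences and the read below are invented for illustration only.
K = 4

REFS = {
    "vaccine_strain": "ATCGGCTAGCTAAGCT",
    "field_strain":   "ATCGGCTTTCTAAGCT",
}

def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Precompute k-mer sets for each reference genome.
REF_KMERS = {name: kmers(seq) for name, seq in REFS.items()}

def classify(read):
    """Assign a read to the reference sharing the most k-mers with it."""
    scores = {name: len(kmers(read) & ks) for name, ks in REF_KMERS.items()}
    return max(scores, key=scores.get)

hit = classify("GGCTTTCTAA")
```

Because the two reference strains differ in only a short region, a read overlapping that region shares far more k-mers with one strain than the other, which is what lets the classifier distinguish field from vaccine strains.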
The rapid increase of data volume from the experiments running at the Large Hadron Collider (LHC) has prompted the physics computing community to evaluate new data handling and processing solutions. Russian grid sites and university clusters, scattered over a large area, aim to unite their resources for future productive work while supporting large physics collaborations. In our project we address the fundamental problem of designing a computing architecture that integrates distributed storage resources for the LHC experiments and other data-intensive science applications and provides access to data from heterogeneous computing facilities. The studies include the development and implementation of a federated data storage prototype for Worldwide LHC Computing Grid (WLCG) centres of different levels and university clusters within one national cloud. The prototype is based on computing resources located in Moscow, Dubna, Saint Petersburg, Gatchina and Geneva. The project intends to implement a federated distributed storage for all kinds of operations, such as read, write and transfer, with access via WAN from grid centres, university clusters, supercomputers, and academic and commercial clouds. The efficiency and performance of the system are demonstrated using synthetic and experiment-specific tests, including real data processing and analysis workflows from the ATLAS and ALICE experiments, as well as compute-intensive bioinformatics applications (PALEOMIX) running on supercomputers. We present the topology and architecture of the designed system, report performance and statistics for different access patterns, and show how federated data storage can be used efficiently by physicists and biologists. We also describe how sharing data on a widely distributed storage system can lead to a new computing model and changes in computing style, for instance how a bioinformatics program running on a supercomputer can read and write data from the federated storage.
Large-scale scientific experiments produce vast volumes of data. These data are stored, processed and analyzed in a distributed computing environment. The life cycle of an experiment is managed by specialized software such as Distributed Data Management and Workload Management Systems. In order to be interpreted and mined, experimental data must be accompanied by auxiliary metadata, which are recorded at each data processing step. Metadata describe scientific data and represent scientific objects or results of scientific experiments, allowing them to be shared by various applications, recorded in databases, or published on the Web. Processing and analysis of the constantly growing volume of auxiliary metadata is a challenging task, no simpler than the management and processing of the experimental data themselves. Furthermore, metadata sources are often loosely coupled and may lead to inconsistencies in combined information queries presented to end users. To aggregate and synthesize a range of primary metadata sources, and to enhance them with flexible, schema-less addition of aggregated data, we are developing the Data Knowledge Base architecture serving as the intelligence behind GUIs and APIs.
In recent years the concept of Big Data has become well established in IT. Systems managing large data volumes produce metadata that describe the data and workflows. These metadata are used to obtain information about the current system state and for statistical and trend analysis of the processes these systems drive. Over time the amount of stored metadata can grow dramatically. In this article we present our studies demonstrating how metadata storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.
The Large Hadron Collider (LHC), operating at the international CERN laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. ATLAS, one of the largest collaborations ever assembled in the history of science, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment relies on a heterogeneous distributed computational infrastructure. To manage the workflow for all data processing on hundreds of data centers, the PanDA (Production and Distributed Analysis) Workload Management System is used. An ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF), is being realized within the BigPanDA and megaPanDA projects. These projects are now exploring how PanDA might be used for managing computing jobs that run on supercomputers, including OLCF's Titan and NRC-KI HPC2. The main idea is to reuse, as much as possible, existing components of the PanDA system that are already deployed on the LHC Grid for analysis of physics data. The next generation of PanDA will allow many data-intensive sciences employing a variety of computing platforms to benefit from ATLAS experience and proven tools in highly scalable processing.
The LHC experiments are preparing for the precision measurements and further discoveries that will be made possible by higher LHC energies from April 2015 (LHC Run 2). The need for simulation, data processing and analysis would overwhelm the expected capacity of the grid infrastructure computing facilities deployed by the Worldwide LHC Computing Grid (WLCG). To meet this challenge, the integration of opportunistic resources into the LHC computing model is highly important. The Tier-1 facility at the Kurchatov Institute (NRC-KI) in Moscow is a part of WLCG and will process, simulate and store up to 10% of the total data obtained from the ALICE, ATLAS and LHCb experiments. In addition, the Kurchatov Institute has supercomputers with a peak performance of 0.12 PFLOPS. Delegating even a fraction of these supercomputing resources to LHC computing will notably increase the total capacity. In 2014 the development of a portal combining the Tier-1 site and a supercomputer at the Kurchatov Institute was started to provide common interfaces and storage. The portal will be used not only for HENP experiments, but also by other data- and compute-intensive sciences such as biology (genome sequencing analysis) and astrophysics (cosmic ray analysis, antimatter and dark matter searches, etc.).
One of the most important tasks in ATLAS physics analysis is the reconstruction of proton-proton events with a large number of interactions in the Transition Radiation Tracker. This paper includes Transition Radiation Tracker performance results obtained using the ATLAS Grid and the Kurchatov Institute's Data Processing Center, including its Tier-1 grid site and supercomputer, as well as an analysis of CPU efficiency during these studies.
A review of the distributed grid computing infrastructure for the LHC experiments in Russia is given. The emphasis is placed on the Tier-1 site construction at the National Research Centre "Kurchatov Institute" (Moscow) and the Joint Institute for Nuclear Research (Dubna).
This Letter presents the first experimental observation of the attractive strong interaction between a proton and a multistrange baryon (hyperon), the Ξ−. The result is extracted from two-particle correlations of combined p−Ξ− ⊕ p¯−Ξ¯+ pairs measured in p−Pb collisions at √sNN = 5.02 TeV at the LHC with ALICE. The measured correlation function is compared with the prediction obtained assuming only an attractive Coulomb interaction, and a deviation in the range [3.6, 5.3] standard deviations is found. Since the measured p−Ξ− ⊕ p¯−Ξ¯+ correlation is significantly enhanced with respect to the Coulomb prediction, the presence of an additional, strong, attractive interaction is evident. The data are compatible with recent lattice calculations by the HAL QCD Collaboration, with a deviation in the range [1.8, 3.7] standard deviations. The lattice potential predicts a shallow repulsive Ξ− interaction within pure neutron matter, and this implies stiffer equations of state for neutron-rich matter including hyperons. Implications of the strong interaction for the modeling of neutron stars are discussed.
Direct photon production at mid-rapidity in Pb–Pb collisions at √sNN = 2.76 TeV was studied in the transverse momentum range 0.9 < pT < 14 GeV/c. Photons were detected with the highly segmented electromagnetic calorimeter PHOS and via conversions in the ALICE detector material, with the e+e− pair reconstructed in the central tracking system. The results of the two methods were combined, and direct photon spectra were measured for the 0–20%, 20–40%, and 40–80% centrality classes. For all three classes, agreement was found with perturbative QCD calculations for pT ≳ 5 GeV/c. Direct photon spectra down to pT ≈ 1 GeV/c could be extracted for the 20–40% and 0–20% centrality classes. The significance of the direct photon signal for 0.9 < pT < 2.1 GeV/c is 2.6σ for the 0–20% class. The spectrum in this pT range and centrality class can be described by an exponential with an inverse slope parameter of (297 ± 12(stat) ± 41(syst)) MeV. State-of-the-art models for photon production in heavy-ion collisions agree with the data within uncertainties.
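The "inverse slope parameter" quoted above comes from describing the low-pT spectrum as dN/dpT ∝ exp(−pT/T) and fitting for T. The sketch below is a stand-alone illustration on synthetic, noise-free points with T = 0.297 GeV (the central value from the abstract); it is not the experiment's fit procedure, which accounts for statistical and systematic uncertainties.

```python
import math

# Illustrative log-linear fit of an exponential spectrum
# dN/dpT = A * exp(-pT / T), recovering the inverse slope T.
T_TRUE = 0.297  # GeV, central value quoted in the abstract

# Synthetic, noise-free yields on a pT grid covering 0.9-2.1 GeV/c.
pts = [0.9 + 0.1 * i for i in range(13)]
yields = [math.exp(-pt / T_TRUE) for pt in pts]

# Taking logs turns the exponential into a line: ln(yield) = ln(A) - pT / T,
# so an ordinary least-squares slope gives -1 / T.
xs, ys = pts, [math.log(y) for y in yields]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
T_fit = -1.0 / slope  # recovered inverse slope parameter, in GeV
```

On noise-free input the fit returns T exactly; with real data the scatter of the points propagates into the quoted statistical uncertainty on T.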