Dynfarm: A Dynamic Site Extension
Ciaschini, V.; De Girolamo, D.
Journal of Physics: Conference Series, 10/2017, Volume 898, Issue 8
Journal Article
Peer-reviewed
Open access
Requests for computing resources from LHC experiments are constantly mounting, and so is their peak usage. Since dimensioning a site to handle peak usage is impractical due to constraints on resources that many publicly-owned computing centres have, opportunistic usage of resources from external, even commercial, cloud providers is becoming more and more interesting, and is even the subject of an upcoming initiative from the EU Commission, named HelixNebula. While extra resources are always a good thing, to fully take advantage of them they must be integrated into the site's own infrastructure and made available to users as if they were local resources. At the CNAF INFN Tier-1 we have developed a framework, called dynfarm, capable of taking external resources and, placing only minimal and easily satisfied requirements upon them, fully integrating them into a pre-existing infrastructure and treating them as if they were local, fully-owned resources. In this article we give, for the first time, a complete description of the framework's architecture and capabilities, describing exactly what is possible with it and what its requirements are.
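The abstract does not spell out the joining protocol itself; purely as an illustrative sketch (the endpoint, payload and helper names below are hypothetical, not part of dynfarm), the core idea of an external node announcing itself to the site and receiving the configuration it needs to behave like a local worker could look like this:

```python
# Hypothetical illustration only: an external (cloud) node contacts a site
# service, authenticates, and receives the configuration needed to join the
# local batch system as if it were a locally owned worker node.
import json
import urllib.request

SITE_REGISTRATION_URL = "https://tier1.example.org/dynfarm/register"  # placeholder endpoint


def announce_node(hostname: str, cores: int, token: str) -> dict:
    """Announce an external node to the site and return its worker configuration."""
    payload = json.dumps({"hostname": hostname, "cores": cores}).encode()
    request = urllib.request.Request(
        SITE_REGISTRATION_URL,
        data=payload,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # Expected reply (illustrative): batch-master address, tunnel/VPN
        # parameters, shared filesystem mounts, and so on.
        return json.load(response)


if __name__ == "__main__":
    config = announce_node("vm-0042.cloud.example.org", cores=8, token="...")
    print(config)
```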
Grids are potentially composed of several thousands of users from different institutions sharing their computing resources (or using resources provided by third parties). Controlling access to these resources is a difficult problem, as it depends on the policies of the organizations the users belong to and of the resource owners. Moreover, a simple authorization implementation, based on a direct user registration on the resources, is not applicable to a large-scale environment. In this paper, we describe the solution to this problem developed in the framework of the European DataGrid (M. Draoli, G. Mascari, R. Piccinelli, Project Presentation, DataGrid-11-NOT-0103-_1) and DataTAG (http://www.datatag.org/) projects: the Virtual Organization Membership Service (VOMS) (R. Alfieri et al., Managing Dynamic User Communities in a Grid of Autonomous Resources, TUBT005, in: Proceedings of CHEP 2003, 2003). VOMS allows fine-grained control of the use of the resources, both to the users' organizations and to the resource owners.
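To make the "fine-grained control" concrete, the sketch below shows how a resource owner's policy might map VOMS-style attributes (FQANs, i.e. VO/group/role strings carried in the user's proxy) onto local authorization decisions. The VO name, groups, roles and actions are invented for the example; this is not VOMS code.

```python
# Illustrative only: map VOMS FQANs (VO / group / role attributes) onto
# local access rights, the way a resource owner's policy might do it.
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    fqan_prefix: str      # e.g. "/myvo/analysis" (example VO and group)
    allowed_action: str   # e.g. "submit-job"


POLICY = [
    PolicyRule("/myvo/production/Role=manager", "manage-queue"),
    PolicyRule("/myvo/analysis", "submit-job"),
]


def is_authorized(user_fqans: list[str], action: str) -> bool:
    """Grant the action if any of the user's FQANs matches a policy rule."""
    return any(
        fqan.startswith(rule.fqan_prefix) and rule.allowed_action == action
        for rule in POLICY
        for fqan in user_fqans
    )


# A proxy carrying two example VOMS attributes:
print(is_authorized(["/myvo/analysis", "/myvo/analysis/Role=NULL"], "submit-job"))  # True
print(is_authorized(["/myvo/analysis"], "manage-queue"))                            # False
```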
Applications from the High Energy Physics scientific community are constantly growing and are implemented by a large number of developers. This implies a strong churn on the code and an associated risk of faults, which is unavoidable as long as the software undergoes active evolution. However, the necessities of production systems run counter to this: stability and predictability are of paramount importance; in addition, a short turn-around time for the defect discovery-correction-deployment cycle is required. A way to reconcile these opposite foci is to use a software quality model to obtain an approximation of the risk before releasing a program, so as to only deliver software with a risk lower than an agreed threshold. In this article we evaluated two quality predictive models to identify the operational risk and the quality of some software products. We applied these models to the development history of several EMI packages with the intent of discovering the risk factor of each product and comparing it with its real history. We attempted to determine whether the models reasonably map reality for the applications under evaluation, and we conclude by suggesting directions for further studies.
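The abstract does not name the specific predictive models; purely as an illustration of the release-gating idea (the metrics, weights and threshold below are invented, not taken from the paper), a risk score derived from change metrics could gate a release like this:

```python
# Illustration of the release-gating idea: compute an approximate risk score
# from simple change metrics and only release below an agreed threshold.
# Metrics, weights and the threshold are invented for this example.
RISK_THRESHOLD = 0.5


def risk_score(changed_lines: int, changed_files: int, past_defects: int) -> float:
    """Toy weighted-sum model; real quality models are fitted to project history."""
    return (0.4 * min(changed_lines / 1000.0, 1.0)
            + 0.3 * min(changed_files / 50.0, 1.0)
            + 0.3 * min(past_defects / 10.0, 1.0))


def may_release(changed_lines: int, changed_files: int, past_defects: int) -> bool:
    return risk_score(changed_lines, changed_files, past_defects) < RISK_THRESHOLD


print(may_release(changed_lines=120, changed_files=4, past_defects=1))    # True: low churn
print(may_release(changed_lines=5000, changed_files=80, past_defects=9))  # False: high risk
```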
The WNoDeS architecture, providing distributed, integrated access to both Cloud and Grid resources through virtualization technologies, makes use of an Authentication Gateway to support diverse authentication mechanisms. Three main use cases are foreseen, covering access via X.509 digital certificates, federated services like Shibboleth or Kerberos, and credit-based access. In this paper, we describe the structure of the WNoDeS authentication gateway.
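As a rough sketch of the dispatching idea only (the handler names and credential fields are assumptions of mine, not WNoDeS code), the three foreseen access modes could be routed by a gateway like this:

```python
# Rough sketch of the dispatching idea behind an authentication gateway:
# route a request to one of three back ends depending on the credential type.
# Handler names and credential fields are invented for illustration.
def authenticate(credential: dict) -> str:
    kind = credential.get("type")
    if kind == "x509":
        return verify_x509_proxy(credential["pem"])                   # grid certificate / proxy
    if kind == "federated":
        return verify_federated_assertion(credential["assertion"])    # e.g. Shibboleth, Kerberos
    if kind == "credit":
        return verify_credit_account(credential["account_id"])        # credit-based access
    raise ValueError(f"unsupported credential type: {kind!r}")


# Placeholder back ends: each would return a local identity on success.
def verify_x509_proxy(pem: str) -> str:
    return "mapped-grid-user"


def verify_federated_assertion(assertion: str) -> str:
    return "mapped-federated-user"


def verify_credit_account(account_id: str) -> str:
    return "mapped-credit-user"


print(authenticate({"type": "x509", "pem": "-----BEGIN CERTIFICATE-----..."}))
```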
After the successful LHC data taking in Run-I and in view of the future runs, the LHC experiments are facing new challenges in the design and operation of the computing facilities. The computing ...infrastructure for Run-II is dimensioned to cope at most with the average amount of data recorded. The usage peaks, as already observed in Run-I, may however originate large backlogs, thus delaying the completion of the data reconstruction and ultimately the data availability for physics analysis. In order to cope with the production peaks, CMS - along the lines followed by other LHC experiments - is exploring the opportunity to access Cloud resources provided by external partners or commercial providers. Specific use cases have already been explored and successfully exploited during Long Shutdown 1 (LS1) and the first part of Run 2. In this work we present the proof of concept of the elastic extension of a CMS site, specifically the Bologna Tier-3, on an external OpenStack infrastructure. We focus on the "Cloud Bursting" of a CMS Grid site using a newly designed LSF configuration that allows the dynamic registration of new worker nodes to LSF. In this approach, the dynamically added worker nodes instantiated on the OpenStack infrastructure are transparently accessed by the LHC Grid tools and at the same time they serve as an extension of the farm for the local usage. The amount of resources allocated thus can be elastically modeled to cope up with the needs of CMS experiment and local users. Moreover, a direct access integration of OpenStack resources to the CMS workload management system is explored. In this paper we present this approach, we report on the performances of the on-demand allocated resources, and we discuss the lessons learned and the next steps.
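As a hedged sketch of the provisioning side only (not the actual CMS/LSF setup: the cloud name, image, flavor and VM names are placeholders, and the LSF registration step is left abstract because it is site-specific), spawning additional worker-node VMs on an OpenStack tenant with the openstacksdk library could look like this:

```python
# Hedged sketch: start extra worker-node VMs on an OpenStack tenant using the
# openstacksdk library. Cloud name, image and flavor are placeholders; the
# actual registration of each node to LSF is site-specific and left abstract.
import openstack


def spawn_worker_nodes(count: int):
    conn = openstack.connect(cloud="my-cloud")      # entry in clouds.yaml (placeholder)
    image = conn.compute.find_image("wn-image")     # pre-built worker-node image (placeholder)
    flavor = conn.compute.find_flavor("m1.large")   # placeholder flavor
    servers = []
    for i in range(count):
        server = conn.compute.create_server(
            name=f"dyn-wn-{i:03d}",
            image_id=image.id,
            flavor_id=flavor.id,
        )
        servers.append(conn.compute.wait_for_server(server))
    # On boot, each VM would contact the site and join the batch system
    # (the dynamic LSF host registration described in the abstract).
    return servers


if __name__ == "__main__":
    for s in spawn_worker_nodes(2):
        print(s.name, s.status)
```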
The Tier-1 at CNAF is the main INFN computing facility, offering computing and storage resources to more than 30 different scientific collaborations, including the 4 experiments at the LHC. A huge increase in computing needs is also foreseen in the following years, mainly driven by the experiments at the LHC (especially starting with Run 3 from 2021) but also by other upcoming experiments such as CTA. While we are considering the upgrade of the infrastructure of our data centre, we are also evaluating the possibility of using CPU resources available in other data centres or even leased from commercial cloud providers. Hence, at the INFN Tier-1, besides participating in the EU project HNSciCloud, we have also pledged a small amount of computing resources (∼2000 cores) located at the Bari ReCaS data centre for the WLCG experiments for 2016, and we are testing the use of resources provided by a commercial cloud provider. While the Bari ReCaS data centre is directly connected to the GARR network, with the obvious advantage of a low-latency and high-bandwidth connection, in the case of the commercial provider we rely only on the General Purpose Network. In this paper we describe the set-up phase and the first results of these installations, started in the last quarter of 2015, focusing on the issues that we have had to cope with and discussing the measured results in terms of efficiency.
The computing infrastructures serving the LHC experiments have been designed to cope at most with the average amount of data recorded. The usage peaks, as already observed in Run-I, may however originate large backlogs, thus delaying the completion of the data reconstruction and ultimately the data availability for physics analysis. In order to cope with the production peaks, the LHC experiments are exploring the opportunity to access Cloud resources provided by external partners or commercial providers. In this work we present the proof of concept of the elastic extension of a local analysis facility, specifically the Bologna Tier-3 Grid site, for the LHC experiments hosted at the site, on an external OpenStack infrastructure. We focus on the Cloud Bursting of the Grid site using DynFarm, a newly designed tool that allows the dynamic registration of new worker nodes to LSF. In this approach, the dynamically added worker nodes instantiated on an OpenStack infrastructure are transparently accessed by the LHC Grid tools and at the same time they serve as an extension of the farm for local usage.
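Again as an illustration only (the thresholds and helper function are assumptions, not DynFarm internals), the elastic part, i.e. deciding how many cloud worker nodes to add or release based on the state of the batch queue, can be sketched as:

```python
# Illustration of the elastic-sizing idea: compare pending work with the
# capacity already available and decide how many cloud workers to add or drop.
# Thresholds and the helper function are invented; they are not DynFarm internals.
CORES_PER_NODE = 8
MAX_CLOUD_NODES = 50


def plan_scaling(pending_jobs: int, idle_cloud_nodes: int, active_cloud_nodes: int) -> int:
    """Return a positive number of nodes to start, negative to retire, 0 to do nothing."""
    if pending_jobs > 0:
        needed = -(-pending_jobs // CORES_PER_NODE)            # ceiling division
        return min(needed, MAX_CLOUD_NODES - active_cloud_nodes)
    if idle_cloud_nodes > 0:
        return -idle_cloud_nodes                               # release unused capacity
    return 0


print(plan_scaling(pending_jobs=100, idle_cloud_nodes=0, active_cloud_nodes=5))  # start 13 nodes
print(plan_scaling(pending_jobs=0, idle_cloud_nodes=3, active_cloud_nodes=10))   # retire 3 nodes
```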
The gLite workload management system
Andreetto, P.; Andreozzi, S.; Avellino, G.; ...
Journal of Physics: Conference Series, 07/2008, Volume 119, Issue 6
Journal Article
Peer-reviewed
Open access
The gLite Workload Management System (WMS) is a collection of components that provide the service responsible for distributing and managing tasks across the computing and storage resources available on a Grid. The WMS basically receives requests of job execution from a client, finds the required appropriate resources, then dispatches and follows the jobs until completion, handling failure whenever possible. Other than single batch-like jobs, the compound job types handled by the WMS are Directed Acyclic Graphs (a set of jobs where the input/output/execution of one or more jobs may depend on one or more other jobs), Parametric Jobs (multiple jobs with one parametrized description), and Collections (multiple jobs with a common description). Jobs are described via a flexible, high-level Job Definition Language (JDL). New functionality was recently added to the system (use of Service Discovery for obtaining new service endpoints to be contacted, automatic sandbox file archival/compression and sharing, support for bulk submission and bulk matchmaking). Intensive testing and troubleshooting allowed both the job submission rate and the service stability to be dramatically increased. Future developments of the gLite WMS will focus on reducing external software dependencies and improving portability, robustness and usability.
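For readers unfamiliar with JDL, the sketch below assembles a minimal single-job description as a string. The attribute names used (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox) are standard JDL, but the job content and the helper function are an invented example, not taken from the paper.

```python
# Minimal illustration: assemble a simple gLite-style JDL job description.
# Attribute names are standard JDL; the job content is an invented example.
def build_jdl(executable: str, arguments: str, sandbox_files: list[str]) -> str:
    inputs = ", ".join(f'"{f}"' for f in sandbox_files)
    return "\n".join([
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f'InputSandbox = {{{inputs}}};',
        'OutputSandbox = {"std.out", "std.err"};',
    ])


print(build_jdl("/bin/hostname", "-f", ["run.sh"]))
```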
The Grid community uses two well-established registration services, which allow users to be authenticated under the auspices of Virtual Organizations (VOs). The Virtual Organization Membership Service (VOMS), developed in the context of the Enabling Grids for E-sciencE (EGEE) project, is an Attribute Authority service that issues attributes expressing membership information of a subject within a VO. VOMS allows users to be partitioned into groups and assigned roles and free-form attributes, which are then used to drive authorization decisions. The VOMS administrative application, VOMS-Admin, manages and populates the VOMS database with membership information. The Virtual Organization Management Registration Service (VOMRS), developed at Fermilab, extends the basic registration and management functionalities present in VOMS-Admin. It implements a registration workflow that requires VO usage policy acceptance and membership approval by administrators. VOMRS supports the management of multiple grid certificates, the handling of users' requests for group and role assignments, and membership status. VOMRS is capable of interfacing to local systems with personnel information (e.g. the CERN Human Resource Database) and of pulling relevant member information from them. VOMRS synchronizes the relevant subset of information with VOMS. The recent development of new features in VOMS-Admin raises the possibility of rationalizing the support and converging on a single solution by continuing and extending existing collaborations between EGEE and OSG. Such a strategy is supported by WLCG, OSG, US CMS, US ATLAS, and other stakeholders worldwide. In this paper, we analyze the features in use by major experiments and the use cases for registration addressed by the mature single solution.
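The registration workflow described above (policy acceptance, administrator approval, then synchronization to VOMS) can be summarised as a small state machine. The states and transitions below are a simplified assumption for illustration, not the actual VOMRS schema.

```python
# Simplified sketch of a VO membership registration workflow: a candidate must
# accept the VO usage policy and be approved by an administrator before the
# membership is synchronized to VOMS. States are assumptions for illustration.
TRANSITIONS = {
    ("applied", "accept_policy"): "policy_accepted",
    ("policy_accepted", "admin_approve"): "approved",
    ("policy_accepted", "admin_reject"): "rejected",
    ("approved", "sync_to_voms"): "active",
}


def advance(state: str, event: str) -> str:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}") from None


state = "applied"
for event in ("accept_policy", "admin_approve", "sync_to_voms"):
    state = advance(state, event)
    print(event, "->", state)   # ends in "active"
```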
The SuperB asymmetric-energy e+e− collider and detector, to be built at the newly founded Nicola Cabibbo Lab, will provide a uniquely sensitive probe of New Physics in the flavour sector of the Standard Model. The SuperB distributed computing group performed a detailed evaluation of the DIRAC Distributed Infrastructure for use in the SuperB experiment, based on two use cases: End User Analysis and Monte Carlo Production. The test aims to evaluate DIRAC's capabilities to manage both gLite and OSG sites, File Catalog management, and job and data management features in realistic SuperB use cases.