Dynfarm: A Dynamic Site Extension
Ciaschini, V.; De Girolamo, D.
Journal of Physics: Conference Series, 10/2017, Volume 898, Issue 8
Journal Article
Peer reviewed
Open access
Requests for computing resources from LHC experiments are constantly mounting, and so is their peak usage. Since dimensioning a site to handle peak usage is impractical due to the resource constraints that many publicly-owned computing centres face, opportunistic usage of resources from external, even commercial, cloud providers is becoming more and more interesting, and is even the subject of an upcoming initiative from the EU Commission, named HelixNebula. While extra resources are always welcome, to take full advantage of them they must be integrated into the site's own infrastructure and made available to users as if they were local resources. At the CNAF INFN Tier-1 we have developed a framework, called dynfarm, capable of taking external resources and, while placing minimal and easily satisfied requirements upon them, fully integrating them into a pre-existing infrastructure and treating them as if they were local, fully-owned resources. In this article we give, for the first time, a complete description of the framework's architecture and capabilities, describing exactly what is possible with it and what its requirements are.
The WLCG infrastructure moved from a very rigid network topology, based on the MONARC model, to a more relaxed system, where data movement between regions or countries does not necessarily need to involve T1 centres. While this evolution brought obvious advantages, especially in terms of flexibility for the LHC experiments' data management systems, it also raised the question of how to monitor the increasing number of possible network paths in order to provide a global, reliable network service. The perfSONAR network monitoring system has been evaluated and agreed upon as a suitable solution to cover the WLCG network monitoring use cases: it allows WLCG to plan and execute latency and bandwidth tests between any instrumented endpoints through a central scheduling configuration; it allows archiving of the metrics in a local database; it provides a programmatic and a web-based interface exposing the test results; and it provides a graphical interface for remote management operations. In this contribution we present our activity to deploy a perfSONAR-based network monitoring infrastructure within the scope of the WLCG Operations Coordination initiative: we motivate the main choices we agreed on in terms of configuration and management, describe the additional tools we developed to complement the standard packages, and present the status of the deployment together with its possible future evolution.
The computing infrastructures serving the LHC experiments have been designed to cope at most with the average amount of data recorded. The usage peaks, as already observed in Run-I, may however generate large backlogs, thus delaying the completion of the data reconstruction and ultimately the availability of data for physics analysis. In order to cope with the production peaks, the LHC experiments are exploring the opportunity to access Cloud resources provided by external partners or commercial providers. In this work we present a proof of concept of the elastic extension of a local analysis facility, specifically the Bologna Tier-3 Grid site, for the LHC experiments hosted at the site, onto an external OpenStack infrastructure. We focus on the Cloud Bursting of the Grid site using DynFarm, a newly designed tool that allows the dynamic registration of new worker nodes to LSF. In this approach, the dynamically added worker nodes instantiated on an OpenStack infrastructure are transparently accessed by the LHC Grid tools and at the same time serve as an extension of the farm for local usage.
The monitoring and alert system is fundamental for the management and operation of the network in a large data center such as an LHC Tier-1. The network of the INFN Tier-1 at CNAF is a multi-vendor environment: for its management and monitoring several tools have been adopted and different sensors have been developed. In this paper, after an overview of the different aspects to be monitored and the tools used for them (e.g. MRTG, Nagios, Arpwatch, NetFlow, Syslog), we describe the “NetBoard”, a monitoring toolkit developed at the INFN Tier-1. NetBoard, developed for a multi-vendor network, is able to install and auto-configure all the tools needed for its monitoring, via a network device discovery mechanism, a configuration file, or a wizard. In this way, we are also able to activate different types of sensors and Nagios checks according to the equipment vendor specifications. Moreover, when a new device is connected to the LAN, NetBoard can detect where it is plugged in. Finally, the NetBoard web interface shows the overall status of the entire network “at a glance”: both the local and the geographical (including the LHCOPN and the LHCONE) link utilization, the health status of network devices (with active alerts), and flow analysis.
Long-term preservation of experimental data (intended as both raw and derived formats) is one of the emerging requirements coming from scientific collaborations. Within the High Energy Physics community the Data Preservation in High Energy Physics (DPHEP) group coordinates this effort. CNAF is not only one of the Tier-1s for the LHC experiments, it is also a computing center providing computing and storage resources to many other HEP and non-HEP scientific collaborations, including the CDF experiment. After the end of data taking in 2011, CDF is now facing the challenge of both preserving the large amount of data produced during several years of data taking and retaining the ability to access and reuse it in the future. CNAF is heavily involved in the CDF Data Preservation activities, in collaboration with the Fermilab National Laboratory (FNAL) computing sector. At the moment about 4 PB of data (raw data and analysis-level ntuples) are starting to be copied from FNAL to the CNAF tape library, and the framework to subsequently access the data is being set up. In parallel to the data access system, a data analysis framework is being developed which makes it possible to run the complete CDF analysis chain in the long-term future, from raw data reprocessing to analysis-level ntuple production. In this contribution we illustrate the technical solutions we put in place to address the issues encountered as we proceeded in this activity.
The Tier-1 at CNAF is the main INFN computing facility, offering computing and storage resources to more than 30 different scientific collaborations, including the 4 experiments at the LHC. A huge increase in computing needs is also foreseen in the coming years, mainly driven by the experiments at the LHC (especially starting with Run 3 from 2021) but also by other upcoming experiments such as CTA. While we are considering the upgrade of the infrastructure of our data center, we are also evaluating the possibility of using CPU resources available in other data centres or even leased from commercial cloud providers. Hence, at the INFN Tier-1, besides participating in the EU project HNSciCloud, we have also pledged a small amount of computing resources (∼2000 cores) located at the Bari ReCaS for the WLCG experiments for 2016, and we are testing the use of resources provided by a commercial cloud provider. While the Bari ReCaS data center is directly connected to the GARR network, with the obvious advantage of a low-latency and high-bandwidth connection, in the case of the commercial provider we rely only on the General Purpose Network. In this paper we describe the set-up phase and the first results of these installations, started in the last quarter of 2015, focusing on the issues that we have had to cope with and discussing the measured results in terms of efficiency.
This paper describes the design and the current state of implementation of an infrastructure made available to software developers within the Italian National Institute for Nuclear Physics (INFN) to support and facilitate their daily activity. The infrastructure integrates several tools, each providing a well-identified function: project management, version control system, continuous integration, dynamic provisioning of virtual machines, efficiency improvement, knowledge base. When applicable, access to the services is based on the INFN-wide Authentication and Authorization Infrastructure. The system is being installed and progressively made available to INFN users belonging to tens of sites and laboratories and will represent a solid foundation for the software development efforts of the many experiments and projects that see the involvement of the Institute. The infrastructure will be beneficial especially for small- and medium-size collaborations, which often cannot afford the resources, in particular in terms of know-how, needed to set up such services.
The INFN Tier-1
Bortolotti, G; Cavalli, A; Chiarelli, L ...
Journal of Physics: Conference Series, 12/2012, Volume 396, Issue 4
Journal Article
Peer reviewed
Open access
INFN-CNAF is the central computing facility of INFN: it is the Italian Tier-1 for the experiments at LHC, but also one of the main Italian computing facilities for several other experiments such as BABAR, CDF, SuperB, Virgo, Argo, AMS, Pamela, MAGIC, Auger etc. Currently there is an installed CPU capacity of 100,000 HS06, a net disk capacity of 9 PB and an equivalent amount of tape storage (these figures are going to be increased in the first half of 2012 to 125,000 HS06, 12 PB and 18 PB respectively). More than 80,000 computing jobs are executed daily on the farm, managed by LSF, accessing the storage, managed by GPFS, with an aggregate bandwidth of up to several GB/s. Access to the storage system from the farm is direct, through the file protocol. The interconnection of the computing resources and the data storage is based on 10 Gbps technology. The disk-servers and the storage systems are connected through a Storage Area Network, allowing complete flexibility and ease of management; dedicated disk-servers are connected, also via the SAN, to the tape library. The INFN Tier-1 is connected to the other centers via 3×10 Gbps links (to be upgraded at the end of 2012), including the LHCOPN and the LHCONE. In this paper we show the main results of our center after 2 full years of LHC running.
To examine barriers to initiation and continuation of mental health treatment among individuals with common mental disorders.
Data were from the World Health Organization (WHO) World Mental Health (WMH) surveys. Representative household samples were interviewed face to face in 24 countries. Reasons to initiate and continue treatment were examined in a subsample (n = 63,678) and analyzed at different levels of clinical severity.
Among those with a DSM-IV disorder in the past 12 months, low perceived need was the most common reason for not initiating treatment and more common among moderate and mild than severe cases. Women and younger people with disorders were more likely to recognize a need for treatment. A desire to handle the problem on one's own was the most common barrier among respondents with a disorder who perceived a need for treatment (63.8%). Attitudinal barriers were much more important than structural barriers to both initiating and continuing treatment. However, attitudinal barriers dominated for mild-moderate cases and structural barriers for severe cases. Perceived ineffectiveness of treatment was the most commonly reported reason for treatment drop-out (39.3%), followed by negative experiences with treatment providers (26.9% of respondents with severe disorders).
Low perceived need and attitudinal barriers are the major barriers to seeking and staying in treatment among individuals with common mental disorders worldwide. Apart from targeting structural barriers, mainly in countries with poor resources, increasing population mental health literacy is an important endeavor worldwide.
Performance, reliability and scalability in data access are key issues in the context of the computing Grid and High Energy Physics data processing and analysis applications, in particular considering the large data size and I/O load that a Large Hadron Collider data centre has to support. In this paper we present the technical details and the results of a large-scale validation and performance measurement employing different data-access platforms, namely CASTOR, dCache, GPFS and Scalla/Xrootd. The tests have been performed at the CNAF Tier-1, the central computing facility of the Italian National Institute for Nuclear Physics (INFN). Our storage back-end was based on Fibre Channel disk-servers organized in a Storage Area Network, with the disk-servers connected to the computing farm via Gigabit LAN. We used 24 disk-servers, 260 TB of raw disk space and 280 worker nodes as computing clients, able to run up to about 1100 jobs concurrently. The aim of the test was to perform sequential and random read/write accesses to the data, as well as more realistic access patterns, in order to evaluate the efficiency, availability, robustness and performance of the various data-access solutions.