The Open Computing Cluster for Advanced data Manipulation (OCCAM) is a multipurpose, flexible HPC cluster designed and operated by a collaboration between the University of Torino and the Sezione di Torino of the Istituto Nazionale di Fisica Nucleare. It aims to provide a flexible, reconfigurable and extensible infrastructure catering to a wide range of scientific computing use cases, including solid-state chemistry, high-energy physics, computer science, big data analytics, computational biology, genomics and many others. Furthermore, it will serve as a platform for R&D activities on computational technologies themselves, with topics ranging from GPU acceleration to Cloud Computing technologies. A heterogeneous and reconfigurable system like this poses a number of challenges related to the frequency at which heterogeneous hardware resources may change their availability and shareability status, which in turn affects the methods and means to allocate, manage, optimize, bill and monitor VMs, containers, virtual farms, jobs, interactive bare-metal sessions, etc. This work describes some of the use cases that prompted the design and construction of the HPC cluster, its architecture and resource provisioning model, along with a first characterization of its performance using synthetic benchmark tools and a few realistic use-case tests.
The private Cloud at the Torino INFN computing centre offers IaaS services to different scientific computing applications. The infrastructure is managed with the OpenNebula cloud controller. The main stakeholders of the facility are a grid Tier-2 site for the ALICE collaboration at the LHC, an interactive analysis facility for the same experiment and a grid Tier-2 site for the BES-III collaboration, plus a growing number of other small tenants. Besides keeping track of usage, automating the dynamic allocation of resources to tenants requires detailed monitoring and accounting of resource usage. As a first step in this direction, we set up a monitoring system to inspect the site activities both at the IaaS level and at the level of the applications running on the hosted virtual instances. For this purpose we used the Elasticsearch, Logstash and Kibana stack. In the current implementation, the heterogeneous accounting information is fed to different MySQL databases and sent to Elasticsearch via a custom Logstash plugin. For the IaaS metering, we developed sensors for the OpenNebula API. The IaaS-level information gathered through the API is sent to the MySQL database through a RESTful web service developed ad hoc, which is also used for other accounting purposes. At the application level, we used the ROOT plugin TProofMonSenderSQL to collect accounting data from the interactive analysis facility. The BES-III virtual instances used to be monitored with Zabbix; as a proof of concept, we also retrieve the information contained in the Zabbix database. Each of these three cases is indexed separately in Elasticsearch. We are now considering dropping the intermediate SQL layer and evaluating a NoSQL option as a single central database for all the monitoring information. We set up a number of Kibana dashboards with pre-defined queries in order to monitor the relevant information in each case. In this way we have achieved a uniform monitoring interface for both the IaaS and the scientific applications, mostly leveraging off-the-shelf tools.
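A minimal sketch of what such an IaaS metering sensor could look like, assuming a reachable OpenNebula XML-RPC endpoint and an Elasticsearch index (the URLs, credentials and field names below are illustrative placeholders, not the production ones); in this sketch the intermediate SQL/REST step is skipped and documents are indexed directly:

    import datetime
    import xml.etree.ElementTree as ET
    import xmlrpc.client
    import requests

    ONE_ENDPOINT = "http://cloud.example.org:2633/RPC2"           # assumed OpenNebula endpoint
    ONE_SESSION = "monitor_user:secret"                           # assumed credentials
    ES_DOC_URL = "http://es.example.org:9200/iaas-metering/_doc"  # assumed Elasticsearch index

    def poll_vm_pool():
        srv = xmlrpc.client.ServerProxy(ONE_ENDPOINT)
        # one.vmpool.info: all users (-2), full ID range (-1, -1), any state but DONE (-1)
        resp = srv.one.vmpool.info(ONE_SESSION, -2, -1, -1, -1)
        ok, body = resp[0], resp[1]
        if not ok:
            raise RuntimeError(body)
        now = datetime.datetime.utcnow().isoformat()
        for vm in ET.fromstring(body).findall("VM"):
            doc = {
                "timestamp": now,
                "vm_id": vm.findtext("ID"),
                "owner": vm.findtext("UNAME"),
                "cpu": vm.findtext("TEMPLATE/CPU"),
                "memory_mb": vm.findtext("TEMPLATE/MEMORY"),
                "state": vm.findtext("STATE"),
            }
            # one document per VM per polling cycle
            requests.post(ES_DOC_URL, json=doc, timeout=10)

    if __name__ == "__main__":
        poll_vm_pool()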
A dashboard devoted to the computing activities of the Italian sites of the ALICE experiment at the LHC has been deployed. A combination of different complementary monitoring tools is typically used in most of the Tier-2 sites: this makes it somewhat difficult to assess the status of a site at a glance and to compare information extracted from different sources for debugging purposes. To overcome these limitations, a dedicated ALICE dashboard has been designed and implemented at each of the ALICE Tier-2 sites in Italy: in particular, it provides a single, interactive and easily customizable graphical interface where heterogeneous data are presented. The dashboard is based on two main ingredients: an open source time-series database and a dashboard builder tool for visualizing time-series metrics. Various sensors, able to collect data from the multiple data sources, have also been written. A first version of a national computing dashboard has been implemented using a specific instance of the builder to gather data from all the local databases.
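As an illustration of the sensor-to-database flow, the sketch below assumes an InfluxDB-like time-series database written through its HTTP line protocol; the endpoint, database name, metric and collector are hypothetical placeholders, not the actual components deployed at the sites:

    import time
    import requests

    TSDB_WRITE_URL = "http://tsdb.example.org:8086/write?db=alice_t2"  # assumed endpoint and database

    def read_running_jobs(site):
        # Placeholder for a real collector (batch system query, experiment monitoring API, ...)
        return 1234

    def push_sample(site):
        value = read_running_jobs(site)
        ts_ns = int(time.time() * 1e9)
        # line protocol: measurement,tag=value field=value timestamp
        line = "running_jobs,site=%s value=%di %d" % (site, value, ts_ns)
        requests.post(TSDB_WRITE_URL, data=line, timeout=10)

    if __name__ == "__main__":
        push_sample("INFN-Torino")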
Micro Pattern Gas Detectors (MPGD) are the new frontier in gas trackers. Among these devices, Gas Electron Multiplier (GEM) chambers are widely used. The experimental signals acquired with the detector must then be reconstructed and analysed. In this contribution, a new offline software package to perform reconstruction, alignment and analysis on the data collected with the APV-25 and TIGER ASICs will be presented. GRAAL (Gem Reconstruction And Analysis Library) is able to measure the performance of an MPGD detector, presently one with a strip-segmented anode. The code is divided into three parts: reconstruction, where the hits are digitized and clusterized; tracking, where a procedure fits the points from the tracking system and uses that information to align the chamber through rotations and shifts; and analysis, where the performance is evaluated (e.g. efficiency, spatial resolution, etc.). The user has to set the geometry of the setup, and the program then automatically returns the analysis results, taking care of different conditions of gas mixture, electric field, magnetic field, geometry, strip orientation, dead strips, misalignment and many others.
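A minimal sketch of the clustering and charge-centroid step for a strip-segmented anode (the data model, threshold handling and strip pitch are illustrative assumptions, not the actual GRAAL interfaces):

    def clusterize(hits, pitch_mm=0.65):
        """hits: list of (strip_index, charge) for strips above threshold (assumed units)."""
        clusters, current = [], []
        for strip, charge in sorted(hits):
            if current and strip != current[-1][0] + 1:  # gap between strips -> close cluster
                clusters.append(current)
                current = []
            current.append((strip, charge))
        if current:
            clusters.append(current)
        # charge centroid: charge-weighted mean strip index converted to a position in mm
        return [
            sum(s * q for s, q in c) / sum(q for _, q in c) * pitch_mm
            for c in clusters
        ]

    # Example: two groups of contiguous firing strips give two cluster positions
    print(clusterize([(10, 80), (11, 150), (12, 60), (40, 200), (41, 190)]))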
Elastic cloud computing applications, i.e. applications that automatically scale according to computing needs, work on the ideal assumption of infinite resources. While large public cloud infrastructures may be a reasonable approximation of this condition, scientific computing centres like WLCG Grid sites usually work in a saturated regime, in which applications compete for scarce resources through queues, priorities and scheduling policies, and keeping a fraction of the computing cores idle to provide headroom is usually not an option. In our particular environment, one of the applications (a WLCG Tier-2 Grid site) is much larger than all the others and cannot autoscale easily. Nevertheless, the other, smaller applications can benefit from automatic elasticity; the implementation of this property in our infrastructure, based on the OpenNebula cloud stack, will be described, and the first operational experience with a small number of strategies for the timely allocation and release of resources will be discussed.
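A minimal sketch of one possible allocation/release strategy for such a small elastic tenant (the interfaces, thresholds and grace period are assumptions for illustration, not the policy actually deployed):

    import time

    GRACE_SECONDS = 600  # assumed idle grace period before a node is released

    def scaling_step(pending_jobs, free_cores, idle_nodes, now=None):
        """One scheduling cycle: return (nodes_to_start, nodes_to_release).

        pending_jobs: jobs waiting in the tenant's queue
        free_cores:   cores currently unused in the shared pool
        idle_nodes:   {node_name: idle_since_timestamp} for the tenant's idle nodes
        """
        now = now or time.time()
        # grow only while the shared pool has headroom and there is pending work
        to_start = min(pending_jobs, free_cores)
        # release nodes that have been idle longer than the grace period
        to_release = [n for n, idle_since in idle_nodes.items()
                      if now - idle_since > GRACE_SECONDS]
        return to_start, to_release

    # Example cycle: 3 pending jobs, 2 free cores, one node idle for 20 minutes
    print(scaling_step(3, 2, {"vm-42": time.time() - 1200}))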
The BESIII experiment, running at the BEPCII accelerator in Beijing (P.R.C.), will be upgraded by replacing the Inner Drift Chamber with a Cylindrical triple-GEM Inner Tracker (CGEM-IT). During the R&D stage, two standalone C++ codes were implemented: GTS (Garfield-based Triple-GEM Simulator), for the digitization and tuning of simulated data to the experimental ones, and GRAAL (GEM Reconstruction And Analysis Library), for the reconstruction and analysis of the experimental events collected in test beams. GTS simulates the triple-GEM response to the passage of a particle, treating each stage separately: ionization, GEM properties, gas mixture, magnetic field and, finally, the induction of the signal on the anode. The necessary information was extracted from GARFIELD++ simulations, parametrized and used as input in GTS. This speeds up the simulation, since GTS performs only samplings instead of the full digitization chain. The simulated events were reconstructed with the same procedure used for experimental data, and tuning factors were evaluated to obtain a satisfactory match. GRAAL is used in the analysis of the test beam experimental data. It provides several levels of reconstruction: from cluster formation, gathering contiguous firing strips, to the reconstruction of the spatial position and the signal time. Two algorithms are used, the charge centroid and the micro-TPC, which exploit the charge deposition on the strips and the time information. A merging of the two algorithms is also available, to weight the two outcomes efficiently and obtain the best estimate of the spatial coordinate. Moreover, GRAAL performs tracking and alignment. Both codes will also be made available for the simulation and reconstruction of other MPGDs.
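A minimal sketch of the micro-TPC position estimate and of a weighted merge with the charge-centroid result (the interfaces, units and weighting scheme are illustrative assumptions, not the GRAAL implementation):

    import statistics

    def micro_tpc_position(strips, drift_velocity, gap):
        """strips: list of (x_mm, t_ns). Reconstruct a drift coordinate z = v*t for each
        strip, fit x(z) with a least-squares line and evaluate it at mid-gap."""
        xs = [x for x, _ in strips]
        zs = [drift_velocity * t for _, t in strips]
        xbar, zbar = statistics.fmean(xs), statistics.fmean(zs)
        slope = (sum((z - zbar) * (x - xbar) for x, z in zip(xs, zs))
                 / sum((z - zbar) ** 2 for z in zs))
        return xbar + slope * (gap / 2.0 - zbar)

    def merge_estimates(x_centroid, x_tpc, w_centroid, w_tpc):
        # In practice the weights would depend on track angle and magnetic field.
        return (w_centroid * x_centroid + w_tpc * x_tpc) / (w_centroid + w_tpc)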
An auto-installing tool on a USB drive can allow for the quick and easy automatic deployment of OpenNebula-based cloud infrastructures remotely managed by a central VMDIRAC instance. A single team, at the main site of an HEP Collaboration or elsewhere, can manage and run a relatively large network of federated (micro-)cloud infrastructures, making a highly dynamic and elastic use of computing resources. Exploiting such an approach can lead to modular systems of cloud-bursting infrastructures addressing complex real-life scenarios.