Conditions data is the subset of non-event data that is necessary to process event data. It poses a unique set of challenges, namely a heterogeneous structure and high access rates by distributed computing. The HSF Conditions Databases activity is a forum for cross-experiment discussions, inviting as broad a participation as possible. It grew out of the HSF Community White Paper work to study conditions data access, where experts from ATLAS, Belle II, and CMS converged on a common language and proposed a schema that represents best practice. Following discussions with a broader community, including NP as well as HEP experiments, a core set of use cases, functionality and behaviour was defined with the aim of describing a core conditions database API. This paper will describe the reference implementation of both the conditions database service and the client, which together encapsulate HSF best practice for conditions data handling. Django was chosen for the service implementation, which uses an ORM instead of direct SQL for all but one method. The simple relational database schema that organises the conditions data is implemented in PostgreSQL. The task of storing the conditions data payloads themselves is outsourced to any POSIX-compliant filesystem, allowing for transparent relocation and redundancy. Crucially, this design provides a clear separation between retrieving the metadata describing which conditions data are needed for a data processing job and retrieving the actual payloads from storage. The service deployment using Helm on OKD will be described, together with scaling tests and operations experience from the sPHENIX experiment running more than 25k cores at BNL.
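For illustration, a minimal sketch of how such a schema might look as Django ORM models, with payloads referenced by URL rather than stored in the database; the model and field names here are invented for the example and are not the actual service's schema:

```python
# Hypothetical Django models illustrating the kind of schema described
# above; names are illustrative, not the actual service's schema.
from django.db import models

class GlobalTag(models.Model):
    """A named, consistent set of conditions for a processing campaign."""
    name = models.CharField(max_length=255, unique=True)

class PayloadType(models.Model):
    """A category of conditions data, e.g. one subdetector's calibration."""
    name = models.CharField(max_length=255, unique=True)
    global_tag = models.ForeignKey(GlobalTag, on_delete=models.CASCADE)

class PayloadIOV(models.Model):
    """Maps an interval of validity (IOV) to a payload file.
    Only the URL is stored here; the payload itself lives on any
    POSIX-compliant filesystem, so storage can be relocated without
    touching the event-processing metadata."""
    payload_type = models.ForeignKey(PayloadType, on_delete=models.CASCADE)
    major_iov = models.BigIntegerField()   # e.g. run number
    minor_iov = models.BigIntegerField()   # e.g. luminosity block
    payload_url = models.CharField(max_length=1024)
```

This separation means a job first asks the service which payload URLs cover its IOVs, then fetches the payloads directly from storage.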
The conditions data infrastructures of both ATLAS and CMS have to deal with the management of several terabytes of data [1, 2]. Distributed computing access to this data requires particular care and attention to manage request rates of up to several tens of kHz. Thanks to the large overlap in use cases and requirements, ATLAS and CMS have worked towards a common solution for conditions data management, with the aim of using this design for data-taking in Run 3. In the meantime other experiments, including NA62, have expressed an interest in this cross-experiment initiative. For experiments with a smaller payload volume and complexity, there is particular interest in having simple payload storage. The conditions data management model is implemented in a small set of relational database tables. A prototype access toolkit consisting of an intermediate web server has been implemented, using standard technologies available in the Java community. Access is provided through a set of REST services whose API has been described in a generic way using the standard OpenAPI specification, implemented in Swagger. Such a solution allows the automatic generation of client code and server stubs, and further allows the backend technology to be changed transparently. An important advantage of using a REST API for conditions access is the possibility of caching identical URLs: direct DB access can be avoided by means of standard web proxy solutions, addressing one of the biggest challenges that large distributed computing solutions impose on conditions data access.
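A brief sketch of the caching idea from the client side; the endpoint layout, host name, and parameter names are assumptions for illustration, not the prototype's actual API:

```python
# Illustrative client-side conditions lookup over REST.
import requests

BASE = "https://conditions.example.org/api"  # hypothetical service URL

def get_iov_payload_meta(global_tag: str, run: int, lumi: int) -> dict:
    # Encoding all lookup parameters in the URL path (rather than a POST
    # body) makes identical requests cache-friendly: any standard web
    # proxy between the worker node and the service can serve repeated
    # lookups for the same IOV without touching the database.
    url = f"{BASE}/globaltag/{global_tag}/run/{run}/lumi/{lumi}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()

meta = get_iov_payload_meta("data_reco_run3", run=12345, lumi=7)
print(meta)
```

Since thousands of jobs processing the same run issue byte-identical GETs, cache hit rates at the proxy layer can absorb most of the tens-of-kHz request load.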
CHEP 2023: Preface to the Proceedings. Sawatzky, Brad; Boehnlein, Amber; Heyes, Graham et al. EPJ Web of Conferences, 2024, Volume 295. Conference proceeding, peer reviewed, open access.
The 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP), organized by Jefferson Lab, took place in Norfolk, Virginia from 5–11 May 2023. The conference attracted 581 registered participants from 28 different countries. Scientific presentations were made over the 5 days of the conference, divided among 20 long talks and 2 keynotes, which were presented in plenary sessions; 450+ short talks, which were presented in parallel sessions; and 140+ posters, split over two dedicated sessions.
For many scientific projects, data management is an increasingly complicated challenge. The number of data-intensive instruments generating unprecedented volumes of data is growing, and their accompanying workflows are becoming more complex. Their storage and computing resources are heterogeneous and are distributed at numerous geographical locations belonging to different administrative domains and organisations. These locations do not necessarily coincide with the places where data is produced, stored, analysed by researchers, or archived for safe long-term storage. To fulfil these needs, the data management system Rucio was developed to allow the high-energy physics experiment ATLAS at the LHC to manage its large volumes of data in an efficient and scalable way. But ATLAS is not alone, and several diverse scientific projects have started evaluating, adopting, and adapting the Rucio system for their own needs. As the Rucio community has grown, many improvements have been introduced, customisations have been added, and many bugs have been fixed. Additionally, new dataflows have been investigated and operational experiences have been documented. In this article we collect and compare the common successes, pitfalls, and oddities that arose in the evaluation efforts of multiple diverse experiments, and compare them with the ATLAS experience. This includes the high-energy physics experiments Belle II and CMS, the neutrino experiment DUNE, the scattering radar experiment EISCAT3D, the gravitational wave observatories LIGO and VIRGO, the SKA radio telescope, and the dark matter search experiment XENON.
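To give a flavour of the policy-driven model these projects adopt, here is a minimal sketch using Rucio's Python client, assuming a deployed Rucio server and a configured account; the scope, dataset name, and RSE expression are invented for the example:

```python
# Minimal sketch of declarative data placement with the Rucio client.
from rucio.client import Client

client = Client()

# Declare that two replicas of a dataset must exist on tape endpoints.
# Rucio's rule engine then creates and maintains whatever transfers are
# needed to satisfy this policy across the distributed storage, and
# re-establishes it if a replica is lost.
client.add_replication_rule(
    dids=[{"scope": "user.alice", "name": "dataset_2024_reco"}],
    copies=2,
    rse_expression="tier=1&type=TAPE",
)
```

The appeal for smaller communities is that placement is expressed once as a rule, rather than as imperative per-file transfer bookkeeping.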
Integration of Rucio Metadata in Belle II. Serfon, Cédric; Panta, Anil; Ito, Hironori et al. EPJ Web of Conferences, 2024, Volume 295. Conference proceeding, peer reviewed, open access.
Rucio is Data Management software that has become a de-facto standard in the HEP community and beyond. It allows the management of large volumes of data over their full lifecycle. The Belle II experiment located at KEK (Japan) recently moved to Rucio to manage its data over the coming decade (O(10) PB/year). In addition to its Data Management functionalities, Rucio also provides support for storing generic metadata. Rucio metadata already provides accurate accounting of the data stored across the sites serving Belle II, and annotating files with generic metadata opens up possibilities for finer-grained metadata query support. We will first introduce some of the new developments made to cover Belle II use cases with good performance, such as bulk insert methods and metadata inheritance. We will then describe the various tests performed to validate Rucio generic metadata at Belle II scale (O(100M) files), detailing the import and performance tests that were made.
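As an illustration of the generic-metadata interface discussed above, a sketch using the Rucio Python client; the scope, file name, and keys are invented, and the bulk call is shown on the assumption that it corresponds to the bulk-insert developments the text refers to:

```python
# Sketch of Rucio generic metadata usage; names and keys are invented.
from rucio.client import Client

client = Client()

# Attach several key/value pairs to one file in a single server round
# trip rather than one call per key -- the kind of bulk operation that
# matters at the O(100M)-file scale described above.
client.set_metadata_bulk(
    scope="belle",
    name="mdst_000001.root",
    meta={"data_type": "mdst", "experiment_number": 26, "run_low": 1000},
)

# Later, read back all metadata (built-in and generic) for the file.
print(client.get_metadata(scope="belle", name="mdst_000001.root",
                          plugin="ALL"))
```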
The discovery of gravitational waves, first observed in September 2015 following the merger of a binary black hole system, has already revolutionised our understanding of the Universe. This was further enhanced in August 2017, when the coalescence of a binary neutron star system was observed both with gravitational waves and a variety of electromagnetic counterparts; this joint observation marked the beginning of gravitational multimessenger astronomy. The Einstein Telescope, a proposed next-generation ground-based gravitational-wave observatory, will dramatically increase the sensitivity to sources: the number of observations of gravitational waves is expected to increase from roughly 100 per year to roughly 100,000 per year, and signals may be visible for hours at a time, given the low frequency cutoff of the planned instrument. This increase in the number of observed events, and the duration with which they are observed, is hugely beneficial to the scientific goals of the community but poses a number of significant computing challenges. Moreover, the currently used computing algorithms do not scale to this new environment, both in terms of the amount of resources required and the speed with which each signal must be characterised. This contribution will discuss the Einstein Telescope's computing challenges and the activities that are underway to prepare for them. Available computing resources and technologies will greatly evolve in the years ahead, and those working to develop the Einstein Telescope data analysis algorithms will need to take this into account. It will also be important to factor the availability of huge parallel HPC systems and ubiquitous Cloud computing into the initial development of the experiment's computing model; the design of the model will also, for the first time, include the environmental impact as one of the optimisation metrics.
Distributed data management on Belle II. Padolski, Siarhei; Ito, Hironori; Laycock, Paul et al. EPJ Web of Conferences, 2020, Volume 245. Conference proceeding, peer reviewed, open access.
The Belle II experiment started taking physics data in April 2018, with an estimated total volume of 340 petabytes for all files, including raw events, Monte Carlo and skims, expected by the end of operations in the late 2020s. Originally designed as a fully integrated component of the BelleDIRAC production system, the Belle II distributed data management (DDM) software needs to manage data across about 29 storage elements worldwide for a collaboration of nearly 1000 physicists. By late 2018, this software required significant performance improvements to meet the requirements of physics data taking and was seriously lacking in automation. Rucio, the DDM solution created by ATLAS, was an obvious alternative, but required tight integration with BelleDIRAC and a seamless yet non-trivial migration. This contribution describes the work done on both DDM options, the current status of the software running successfully in production, and the problems associated with trying to balance long-term operations cost against short-term risk.
DIRAC and Rucio are two standard pieces of software widely used in the HEP domain. DIRAC provides Workload and Data Management functionalities, among other things, while Rucio is a dedicated, ...advanced Distributed Data Management system. Many communities that already use DIRAC have expressed their interest in using DIRAC for Workload Management in combination with Rucio for Data Management. In this paper, we describe the integration of the Rucio File Catalog into DIRAC that was initially developed for the Belle II collaboration.
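As a rough sketch of what this integration looks like from user code, assuming a DIRAC installation in which a RucioFileCatalog plugin is configured (the import paths follow DIRAC conventions; the catalog name and LFN are invented for illustration):

```python
# Hypothetical usage sketch: DIRAC code keeps talking to its generic
# FileCatalog abstraction, while the configuration routes catalog
# operations to Rucio behind the scenes.
from DIRAC.Core.Base import Script
Script.parseCommandLine()  # initialise the DIRAC environment

from DIRAC.Resources.Catalog.FileCatalog import FileCatalog

# Select the Rucio-backed catalog explicitly; normally the DIRAC
# configuration decides which catalog plugin serves these calls.
fc = FileCatalog(catalogs=["RucioFileCatalog"])

# Look up the replicas of a (made-up) logical file name.
result = fc.getReplicas("/belle/data/sample/mdst_000001.root")
if result["OK"]:
    print(result["Value"]["Successful"])
else:
    print("Lookup failed:", result["Message"])
```

The design point is that workflows written against DIRAC's catalog interface need no changes when the backing catalog becomes Rucio.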
Integration of Rucio in Belle II. Serfon, Cédric; Mashinistov, Ruslan; De Stefano, John Steven et al. EPJ Web of Conferences, 2021, Volume 251. Conference proceeding, peer reviewed, open access.
The Belle II experiment, which started taking physics data in April 2019, will multiply the volume of data currently stored on its nearly 30 storage elements worldwide by one order of magnitude, to reach about 340 PB of data (raw and Monte Carlo simulation data) by the end of operations. To tackle this massive increase, and to manage the data even after the end of data taking, it was decided to move the Distributed Data Management software from a homegrown piece of software to a Data Management solution widely used in HEP and beyond: Rucio. This contribution describes the work done to integrate Rucio with the Belle II distributed computing infrastructure, as well as the migration strategy that was successfully performed to ensure a smooth transition.
Over a decade ago, the H1 Collaboration decided to embrace the object-oriented paradigm and completely redesign its data analysis model and data storage format. The event data model, based on the ROOT framework, consists of three layers (tracks and calorimeter clusters, identified particles, and finally event summary data) with a singleton class providing unified access. This original solution was then augmented with a fourth layer containing user-defined objects. This contribution will summarise the history of the solutions used, from modifications to the original design to the evolution of the high-level end-user analysis object framework which is used by H1 today. Several important issues are addressed: the portability of expert knowledge to increase the efficiency of data analysis, the flexibility of the framework to incorporate new analyses, the performance and ease of use, and lessons learned for future projects.