LHCb is planning major changes to its data processing and analysis workflows for LHC Run 3. With the hardware trigger removed, a software-only trigger running at 30 MHz will reconstruct events using the final alignment and calibration information provided during the triggering phase. These changes place a major strain on the online software framework, which needs to improve significantly. The foreseen changes in the area of the core framework include a redesign of the event scheduling, the introduction of concurrent processing, optimizations of processor cache accesses and code vectorization. Furthermore, changes in the areas of the event model, conditions data and detector description are foreseen. The changes in the data processing workflow will allow an unprecedented number of signal events to be selected and therefore increase the load on the experiment's simulation needs. Several areas of improvement for fast simulation are currently being investigated, together with improvements needed in the area of distributed computing. Finally, the amount of data stored needs to be reflected in the analysis computing model, where individual user analysis on distributed computing resources will become inefficient. This contribution gives an overview of the status of those activities and future plans in the different areas from the perspective of the LHCb computing project.
The LHCb detector will be upgraded for LHC Run 3 and will be read out at 30 MHz, corresponding to the full inelastic collision rate, with major implications for the full software trigger and offline computing. If the current computing model and software framework are kept, the data storage capacity and computing power required to process data at this rate, and to generate and reconstruct equivalent samples of simulated events, will exceed the current capacity by at least one order of magnitude. A redesign of the software framework, including scheduling, the event model, the detector description and the conditions database, is needed to fully exploit the computing power of multi- and many-core architectures and coprocessors. Data processing and the analysis model will also change towards an early streaming of different data types, in order to limit storage resources, with further implications for the data analysis workflows. Fast simulation options will make it possible to obtain a reasonable parameterization of the detector response in considerably less computing time. Finally, the upgrade of LHCb will be a good opportunity to review and implement changes in the domains of software design, testing and review, and analysis workflow and preservation. In this contribution, activities and recent results in all the above areas are presented.
LHC experiments depend on a rich palette of software components to build their specific applications. These underlying components include the ROOT analysis framework, the Geant4 simulation toolkit, Monte Carlo generators, Grid middleware, graphics libraries, scripting languages, databases, tools, etc., which are provided centrally in up-to-date versions on multiple platforms (Linux, Mac, Windows). Until recently this set of packages was tested and released in a tree-like structure as a consistent set of versions across operating systems, architectures and compilers for LHC experiments only. Because of the tree-like deployment, these releases were only usable in connection with a configuration management tool that provided the proper build and run-time environments, which hindered parties outside the LHC from easily using this palette of packages. In a new approach the releases will be grouped in a "flat structure" such that interested parties can start using them without configuration management, while retaining all the above-mentioned advantages. In addition to increased usability, the software shall also be distributed via system-provided package deployment systems (rpm, apt, etc.). This approach to software deployment follows the idea of providing a wide range of HEP-specific software packages and tools in a coherent, up-to-date and modular way on multiple platforms. The target audience for such software deployments is individual developers or smaller development groups / experiments who don't have the resources to maintain this kind of infrastructure. This new software deployment strategy has already been successfully implemented for groups at CERN.
The LHCb Grid Simulation: Proof of Concept. Hushchyn, M; Ustyuzhanin, A; Arzymatov, K et al.
Journal of Physics: Conference Series, 10/2017, Volume 898, Issue 5
Journal Article, Peer reviewed, Open access
The Worldwide LHC Computing Grid provides researchers in different geographical locations with access to data and the computational resources to analyze it. The grid has a hierarchical topology with multiple sites distributed over the world, with varying numbers of CPUs, amounts of disk storage and connection bandwidths. Job scheduling and the data distribution strategy are key elements of grid performance. Optimizing the algorithms for these tasks requires testing them on the real grid, which is hard to achieve. Having a grid simulator can simplify this task and therefore lead to more optimal scheduling and data placement algorithms. In this paper we demonstrate a grid simulator for the LHCb distributed computing software.
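The kind of scheduling question such a simulator answers can be illustrated with a minimal toy model; this sketch is an assumption for illustration only, not the LHCb simulator itself. Sites have a fixed number of CPU slots, and a greedy scheduler sends each job to the slot that frees up first:

```python
class Site:
    def __init__(self, name, cpus):
        self.name = name
        self.cpus = cpus
        self.busy_until = [0.0] * cpus  # finish time of each CPU slot

    def earliest_slot(self):
        # index of the slot that becomes free soonest
        return min(range(self.cpus), key=lambda i: self.busy_until[i])

def schedule(sites, jobs):
    """Greedy scheduler: each job goes to the site/slot that frees up first."""
    assignments = []
    for job_id, duration in jobs:
        site = min(sites, key=lambda s: s.busy_until[s.earliest_slot()])
        slot = site.earliest_slot()
        start = site.busy_until[slot]
        site.busy_until[slot] = start + duration
        assignments.append((job_id, site.name, start, start + duration))
    return assignments

# Hypothetical sites and job durations (arbitrary time units).
sites = [Site("CERN", 2), Site("GRIDKA", 1)]
jobs = [(0, 5.0), (1, 3.0), (2, 4.0), (3, 2.0)]
plan = schedule(sites, jobs)
makespan = max(end for (_, _, _, end) in plan)
```

A real simulator would add the elements the abstract mentions, notably storage placement and inter-site bandwidth, but the payoff is the same: candidate scheduling policies can be compared by makespan or throughput without touching the production grid.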
The machine/job features mechanism. Alef, M; Cass, T; Keijser, J J et al.
Journal of Physics: Conference Series, 10/2017, Volume 898, Issue 9
Journal Article, Peer reviewed, Open access
Within the HEPiX virtualization group and the Worldwide LHC Computing Grid's Machine/Job Features Task Force, a mechanism has been developed which provides access to detailed information about the current host and the current job to the job itself. This allows user payloads to access meta-information, independent of the current batch system or virtual machine model. The information can be accessed either locally via the filesystem on a worker node, or remotely via HTTP(S) from a webserver. This paper describes the final version of the specification from 2016, which was published as an HEP Software Foundation technical note, and the design of the implementations of this version for batch and virtual machine platforms. We discuss early experiences with these implementations and how they can be exploited by experiment frameworks.
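From the payload's side, the local access path described above amounts to reading one small file per feature key from a directory advertised in an environment variable. A minimal sketch of such a reader follows; the specific key name `wall_limit_secs` is taken as an assumption from the 2016 specification and may differ at a given site:

```python
import os

def read_feature(kind, key):
    """Read one machine/job feature value from the local filesystem.

    `kind` is "MACHINEFEATURES" or "JOBFEATURES"; the batch system exports
    an environment variable of that name pointing at a directory holding
    one file per feature key. Returns None if the feature is unavailable,
    so payloads degrade gracefully on sites without the mechanism.
    """
    base = os.environ.get(kind)
    if base is None:
        return None
    try:
        with open(os.path.join(base, key)) as f:
            return f.read().strip()
    except OSError:
        return None

# Example: remaining wall-clock budget for this job, if advertised.
limit = read_feature("JOBFEATURES", "wall_limit_secs")
remaining = int(limit) if limit is not None else None
```

The HTTP(S) variant replaces the file read with a GET of the same key relative to the advertised base URL; the batch-system independence comes precisely from the payload only ever touching this key/value interface.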
The Worldwide LHC Computing Grid relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion, traffic routing, etc. The WLCG Network and Transfer Metrics project aims to integrate and combine all network-related monitoring data collected by the WLCG infrastructure. This includes FTS monitoring information and monitoring data from the XRootD federation, as well as results of the perfSONAR tests. The main challenge consists in further integrating and analyzing this information in order to allow the optimization of the data transfer and workload management systems of the LHC experiments. In this contribution, we present our activity in commissioning the WLCG perfSONAR network and integrating network and transfer metrics: we motivate the need for network performance monitoring, describe the main use cases of the LHC experiments, and report on the status and evolution in the areas of configuration and capacity management, datastore and analytics (including the integration of transfer and network metrics), and operations and support.
The LHCb experiment [1] took data between December 2009 and February 2013. The data taking conditions and trigger rate were adjusted several times during this period to make optimal use of the luminosity delivered by the LHC and to extend the physics potential of the experiment. By 2012, LHCb was taking data at twice the instantaneous luminosity and 2.5 times the high-level trigger rate originally foreseen. This represents a considerable increase in the amount of data that had to be handled compared to the original Computing Model from 2005, both in terms of compute power and in terms of storage. In this paper we describe the changes that took place in the LHCb computing model during the last two years of data taking in order to process and analyse the increased data rates within limited computing resources. In particular, a quite original change was introduced at the end of 2011, when LHCb started to use compute power that was not co-located with the RAW data for reprocessing, namely Tier2 sites and private resources. The flexibility of the LHCbDirac Grid interware allowed easy inclusion of these additional resources, which in 2012 provided 45% of the compute power for the end-of-year reprocessing. Several changes were also implemented in the Data Management model in order to limit the need for accessing data from tape, as well as in the data placement policy in order to cope with a large imbalance in storage resources at Tier1 sites. We also discuss changes that are being implemented during the LHC Long Shutdown 1 (LS1) to prepare for a further doubling of the data rate when the LHC restarts at a higher energy in 2015.
In the Grid world, there are many tools for monitoring both activities and infrastructure. The huge amount of information available needs to be well organized, especially considering the pressing need for prompt reaction in case of problems impacting the activities of a large Virtual Organization. Such activities include data taking, data reconstruction, data reprocessing and user analysis. The monitoring system for LHCb Grid Computing relies on many heterogeneous and independent sources of information. These offer different views for a better understanding of problems, while an operations team follows defined procedures that have been put in place to handle them. This work summarizes the state of the art of LHCb Grid operations, emphasizing the reasons that led to the various choices and the tools currently in use to run our daily activities. We highlight the most common problems experienced across years of activities on the WLCG infrastructure, the services with their criticality, the procedures in place, the relevant metrics, the tools available and the ones still missing.
FTS3: Quantitative Monitoring. Riahi, H; Salichos, M; Keeble, O et al.
Journal of Physics: Conference Series, 01/2015, Volume 664, Issue 6
Journal Article, Peer reviewed, Open access
The overall success of LHC data processing depends heavily on stable, reliable and fast data distribution. The Worldwide LHC Computing Grid (WLCG) relies on the File Transfer Service (FTS) as the data movement middleware for moving sets of files from one site to another. This paper describes the components of the FTS3 monitoring infrastructure and how they are built to satisfy the common and particular requirements of the LHC experiments. We show how the system provides a complete and detailed cross-virtual-organization (VO) picture of transfers for sites, operators and VOs. This information has proven critical due to the shared nature of the infrastructure, allowing a complete view of all transfers on shared network links between the various workflows and VOs using the same FTS transfer manager. We also report on the performance of the FTS service itself, using data generated by the aforementioned monitoring infrastructure both during commissioning and during the first phase of production. We also explain how this monitoring information and the network metrics produced can be used both as a starting point for troubleshooting data transfer issues and as a mechanism for collecting information such as transfer efficiency between sites, achieved throughput and its evolution over time, the most common errors, etc., and for taking decisions based on it to further optimize transfer workflows. The service setup is subject to site policies to control network resource usage, as well as to the requirements of all the VOs making use of the Grid resources at the site. FTS3 is the new version of FTS and was deployed in production in August 2014.
Self managing experiment resources. Stagni, F; Ubeda, M; Tsaregorodtsev, A et al.
Journal of Physics: Conference Series, 01/2014, Volume 513, Issue 3
Journal Article, Peer reviewed, Open access
In this paper we present an autonomic computing resources management system, used by LHCb for assessing the status of its Grid resources. Virtual Organizations' Grids include heterogeneous resources. For example, LHC experiments very often use resources not provided by WLCG, and Cloud Computing resources will soon provide a non-negligible fraction of their computing power. The lack of standards and procedures across experiments and sites has led to the appearance of multiple information systems, monitoring tools, ticket portals, etc., which nowadays coexist and represent a very precious source of information for the computing systems of running HEP experiments as well as for sites. These two facts have led to many particular solutions for a general problem: managing the experiment resources. In this paper we present how LHCb, via the DIRAC interware, addressed these issues. With a renewed Central Information Schema hosting all resource metadata and a Status System (Resource Status System) delivering real-time information, the system controls the resources topology, independently of the resource types. The Resource Status System applies data mining techniques to all available information sources and assesses status changes, which are then propagated to the topology description. Obviously, giving full control to such an automated system is not risk-free. Therefore, in order to minimise the probability of misbehaviour, a battery of tests has been developed to certify the correctness of its assessments. We demonstrate the performance and efficiency of such a system in terms of cost reduction and reliability.
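The core idea of combining independent information sources into a single resource status can be sketched as a small policy engine. This is an illustrative toy, loosely inspired by the Resource Status System described above; the policy names, metric keys and thresholds are all assumptions, not DIRAC's actual API:

```python
# Ordered from most to least pessimistic; the worst vote wins.
STATUSES = ["Banned", "Probing", "Degraded", "Active"]

def assess(metrics):
    """Combine independent policy results into a single resource status.

    Each policy inspects one metric and votes for a status; the most
    pessimistic vote wins, mirroring a conservative assessment strategy.
    """
    votes = []
    # Policy 1: recent job success rate at the site.
    if metrics["job_success_rate"] < 0.30:
        votes.append("Banned")
    elif metrics["job_success_rate"] < 0.80:
        votes.append("Degraded")
    else:
        votes.append("Active")
    # Policy 2: downtime declared in the information system.
    votes.append("Banned" if metrics["in_downtime"] else "Active")
    # Policy 3: freshness of the monitoring data itself.
    votes.append("Probing" if metrics["last_heartbeat_age_s"] > 3600 else "Active")
    return min(votes, key=STATUSES.index)

status = assess({"job_success_rate": 0.92, "in_downtime": False,
                 "last_heartbeat_age_s": 120})
```

The "battery of tests" mentioned in the abstract maps naturally onto such a design: each policy can be exercised in isolation with synthetic metrics before its verdicts are allowed to alter the resource topology automatically.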