Today's analyses for high-energy physics (HEP) experiments involve processing a large amount of data with highly specialized algorithms. The contemporary workflow from recorded data to final results is based on the execution of small scripts, often written in Python or as ROOT macros that call complex compiled algorithms in the background, to perform fitting procedures and generate plots. In recent years, interactive programming environments such as Jupyter have become popular. Jupyter allows the development of Python-based applications, so-called notebooks, which bundle code, documentation, and results such as plots. Advantages over classical script-based approaches are the ability to recompute only parts of the analysis code, which allows for fast and iterative development, and a web-based user frontend, which can be hosted centrally and only requires a browser on the user side. In our novel approach, Python and Jupyter are tightly integrated into the Belle II Analysis Software Framework (basf2), currently being developed for the Belle II experiment in Japan. This allows code for every aspect of the event simulation, reconstruction, and analysis chain to be developed in Jupyter notebooks. These interactive notebooks can be hosted as a centralized web service via JupyterHub with Docker and used by all scientists of the Belle II Collaboration. Because of its generality and encapsulation, the setup can easily be scaled to large installations.
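As a rough illustration of what such notebook-driven steering can look like, the following minimal Python sketch builds and runs a basf2 processing path; the chosen module and parameters are illustrative placeholders rather than an analysis prescription from the abstract.

# Minimal sketch of steering basf2 from a Jupyter notebook cell.
# The module below and the number of events are illustrative; an actual
# analysis would append generator, simulation, and reconstruction modules.
import basf2

path = basf2.create_path()

# Provide a handful of empty events so the chain has something to process.
path.add_module("EventInfoSetter", evtNumList=[10])

# Further modules would be appended here with path.add_module(...).

basf2.process(path)          # run the path; in a notebook the log appears inline
print(basf2.statistics)      # per-module timing summary, useful for iterative work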
The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is a general-purpose particle detector and comprises the largest silicon-based tracking system built to date, with 75 million individual readout channels. The precise reconstruction of particle tracks from this tremendous number of input channels is a compute-intensive task. The foreseen LHC beam parameters for the next data-taking period, starting in 2015, will result in an increase in the number of simultaneous proton-proton interactions and hence in the number of particle tracks per event. Due to the stagnating clock frequencies of individual CPU cores, new approaches to particle track reconstruction need to be evaluated in order to cope with this computational challenge. Track finding methods based on cellular automata (CA) offer a fast and parallelizable alternative to the well-established Kalman-filter-based algorithms. We present a new cellular-automaton-based track reconstruction which copes with the complex detector geometry of CMS. We detail the specific design choices made to allow for high-performance computation on GPU and CPU devices using the OpenCL framework. We conclude by evaluating the physics performance as well as the computational properties of our implementation on various hardware platforms, and show that a significant speedup can be attained by using GPU architectures while achieving reasonable physics performance at the same time.
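The abstract does not spell out the algorithm, but the general cellular-automaton idea can be sketched in a few lines of Python: cells (hit segments between neighbouring layers) repeatedly raise their state when a compatible inner neighbour has the same state, so that long chains of compatible segments stand out. The data structures and the update rule below are invented for illustration and are not the OpenCL implementation described above.

# Simplified cellular-automaton track-finding sketch (pure Python, not the
# OpenCL code from the paper). Each "cell" is a segment connecting hits on
# neighbouring detector layers; the CA iteratively raises the state of a cell
# when a compatible inner neighbour currently has the same state.
from dataclasses import dataclass, field

@dataclass
class Cell:
    layer: int                       # index of the segment's inner layer
    state: int = 0
    inner_neighbors: list = field(default_factory=list)  # compatible inner cells

def evolve(cells, iterations):
    """One CA sweep per iteration: a cell's state grows if a compatible
    inner neighbour currently has the same state."""
    for _ in range(iterations):
        to_update = [
            cell for cell in cells
            if any(n.state == cell.state for n in cell.inner_neighbors)
        ]
        for cell in to_update:       # synchronous update, as on a GPU
            cell.state += 1
    return cells

# Track candidates are then collected by following chains of decreasing state,
# starting from the outermost cells with the highest state.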
Software Quality Control at Belle II
Ritter, M.; Kuhr, T.; Hauth, T. ...
Journal of Physics: Conference Series, 10/2017, Volume 898, Issue 7
Journal Article, Peer reviewed, Open access
Over the last seven years the software stack of the next-generation B factory experiment Belle II has grown to over one million lines of C++ and Python code, counting only the part included in offline software releases. There are several thousand commits to the central repository by about 100 individual developers per year. Keeping the software stack coherent and of such high quality that it can be sustained and used efficiently for data acquisition, simulation, reconstruction, and analysis over the lifetime of the Belle II experiment is a challenge. A set of tools is employed to monitor the quality of the software and provide fast feedback to the developers. They are integrated into a machinery that is controlled by a buildbot master and automates the quality checks. The tools include different compilers, cppcheck, the clang static analyzer, valgrind memcheck, doxygen, a geometry overlap checker, a check for missing or extra library links, unit tests, steering-file-level tests, a sophisticated high-level validation suite, and an issue tracker. The technological development infrastructure is complemented by organizational means to coordinate the development.
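A heavily simplified, hypothetical stand-in for one automated quality run might look as follows; the actual Belle II machinery is driven by a buildbot master, and the commands and paths here are placeholders.

# Hypothetical sketch: execute a few of the checks named in the abstract and
# collect their results. Commands, arguments, and paths are placeholders.
import subprocess

CHECKS = {
    "cppcheck":   ["cppcheck", "--error-exitcode=1", "--quiet", "src/"],
    "unit tests": ["ctest", "--output-on-failure"],
    "doxygen":    ["doxygen", "Doxyfile"],
}

def run_checks():
    failures = []
    for name, cmd in CHECKS.items():
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((name, result.stdout[-500:]))  # keep the tail of the log
    return failures

if __name__ == "__main__":
    failed = run_checks()
    for name, log in failed:
        print(f"[FAILED] {name}\n{log}")
    raise SystemExit(1 if failed else 0)   # non-zero exit signals the build master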
Modern high-energy physics experiments rely on the extensive usage of computing resources, both for the reconstruction of measured events and for Monte Carlo simulation. The Institut für Experimentelle Kernphysik (EKP) at KIT participates in both the CMS and Belle experiments with computing and storage resources. In the upcoming years, these requirements are expected to increase due to the growing amount of recorded data and the rising complexity of the simulated events. It is therefore essential to increase the available computing capabilities by tapping into all resource pools. At the EKP institute, powerful desktop machines are available to users. Due to the multi-core nature of modern CPUs, vast amounts of CPU time are left unused by common desktop usage patterns. Other important providers of compute capabilities are classical HPC data centers at universities or national research centers. Due to the shared nature of these installations, the standardized software stack required by HEP applications cannot be installed there. A viable way to overcome this constraint and offer a standardized software environment in a transparent manner is the usage of virtualization technologies. The OpenStack project has become a widely adopted solution to virtualize hardware and offer additional services like storage and virtual machine management. This contribution will report on the incorporation of the institute's desktop machines into a private OpenStack cloud. The additional compute resources provisioned via the virtual machines have been used for Monte Carlo simulation and data analysis. Furthermore, a concept to integrate shared, remote HPC centers into regular HEP job workflows will be presented. In this approach, local and remote resources are merged to form a uniform, virtual compute cluster with a single point of entry for the user. Evaluations of the performance and stability of this setup and operational experiences will be discussed.
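As a hedged illustration of such provisioning, the following Python sketch boots a virtual worker node with the openstacksdk client; the cloud name, image, flavor, and network are placeholders and not the EKP configuration.

# Illustrative use of the openstacksdk Python client to boot a virtual worker
# node in a private OpenStack cloud. All names below are placeholders.
import openstack

conn = openstack.connect(cloud="ekp-private-cloud")    # credentials from clouds.yaml

image = conn.compute.find_image("hep-worker-node")     # pre-built image with the HEP stack
flavor = conn.compute.find_flavor("m1.large")
network = conn.network.find_network("internal")

server = conn.compute.create_server(
    name="mc-worker-01",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.status)   # ACTIVE once the worker is ready to join the batch system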
This contribution reports on solutions, experiences, and recent developments with the dynamic, on-demand provisioning of remote computing resources for analysis and simulation workflows. Local resources of a physics institute are extended by private and commercial cloud sites, ranging from the inclusion of desktop clusters over institute clusters to HPC centers. Rather than relying on dedicated HEP computing centers, it is nowadays more reasonable and flexible to utilize remote computing capacity via virtualization techniques or container concepts. We report on recent experience from incorporating a remote HPC center (NEMO Cluster, Freiburg University) and resources dynamically requested from the commercial provider 1&1 Internet SE into our institute's computing infrastructure. The Freiburg HPC resources are requested via the standard batch system, allowing HPC and HEP applications to be executed simultaneously, such that regular batch jobs run side by side with virtual machines managed via OpenStack. For the inclusion of the 1&1 commercial resources, a Python API and SDK as well as the possibility to upload images were available. Large-scale tests prove the capability to serve the scientific use case in the European 1&1 data centers. The described environment at the Institute of Experimental Nuclear Physics (IEKP) at KIT serves the needs of researchers participating in the CMS and Belle II experiments. In total, resources exceeding half a million CPU hours have been provided by remote sites.
The Belle II Core Software
Kuhr, T.; Pulvermacher, C.; Ritter, M. ...
Computing and Software for Big Science, 12/2019, Volume 3, Issue 1
Journal Article, Open access
Modern high-energy physics (HEP) enterprises, such as the Belle II experiment (Abe et al., Belle II Technical Design Report, KEK Report 2010-1, 2010; Kou et al., The Belle II Physics Book, KEK Preprint 2018-27, 2018) at the KEK laboratory in Japan, create huge amounts of data. Sophisticated algorithms for simulation, reconstruction, visualization, and analysis are required to fully exploit the potential of these data. We describe the core components of the Belle II software that provide the foundation for the development of complex algorithms and their efficient application on large data sets.
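To give a flavour of the module-and-path design of the core software, here is a minimal, illustrative Python module plugged into a basf2 path; it only counts events and is not taken from the paper.

# Illustrative Python module for the basf2 framework: modules are the units of
# work that the framework calls once per event while processing a path.
# This minimal counter only demonstrates the Module interface; real modules
# would read and write DataStore objects.
import basf2

class EventCounter(basf2.Module):
    """Count the events passing through the path and report at the end."""

    def initialize(self):
        self.count = 0

    def event(self):
        self.count += 1        # called once per event by the framework

    def terminate(self):
        basf2.B2INFO(f"processed {self.count} events")

path = basf2.create_path()
path.add_module("EventInfoSetter", evtNumList=[100])  # provide 100 empty events
path.add_module(EventCounter())
basf2.process(path)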
The Belle II collaboration decided in 2016 to migrate its collaborative services and tools into the existing IT infrastructure at DESY. The goal was to reduce the maintenance effort for solutions operated by Belle II members as well as to deploy state-of-the-art technologies. In addition, some new services and tools were or will be introduced. Planning and migration work was carried out by small teams consisting of experts from Belle II and the involved IT divisions. The migration was successfully accomplished before the KEK computer centre replacement in August 2016.
With the second run period of the LHC, high-energy physics collaborations will have to face increasing computing infrastructure needs. Opportunistic resources are expected to absorb many computationally expensive tasks, such as Monte Carlo event simulation. This leaves dedicated HEP infrastructure with an increased load of analysis tasks that in turn will need to process an increased volume of data. In addition to storage capacities, a key factor for future computing infrastructure is therefore the input bandwidth available per core. Modern data analysis infrastructure relies on one of two paradigms: data is either kept on dedicated storage and accessed via the network, or distributed over all compute nodes and accessed locally. Dedicated storage allows the data volume to grow independently of processing capacities, whereas local access allows processing capacities to scale linearly. However, with the growing data volume and processing requirements, HEP will require both of these features. For enabling adequate user analyses in the future, the KIT CMS group is merging both paradigms: popular data is spread over a local disk layer on the compute nodes, while any data is available from an arbitrarily sized background storage. This concept is implemented as a pool of distributed caches, which are loosely coordinated by a central service. A Tier 3 prototype cluster is currently being set up for performant user analyses of both local and remote data.
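The following toy Python sketch illustrates the coordinated-cache idea described above: a central coordinator tracks which node holds which file and stages popular files onto node-local disks, with reads falling back to the background storage otherwise. The data structures and the popularity policy are invented for illustration, not taken from the prototype.

# Toy sketch of a pool of distributed caches loosely coordinated by a central
# service. Everything here is simplified for illustration.
class CacheCoordinator:
    def __init__(self, nodes):
        self.nodes = nodes                  # node name -> set of cached file paths
        self.access_count = {}              # popularity bookkeeping

    def locate(self, path):
        """Return the node holding a cached copy of `path`, or None (remote read)."""
        self.access_count[path] = self.access_count.get(path, 0) + 1
        for node, cached in self.nodes.items():
            if path in cached:
                return node
        return None

    def maybe_cache(self, path, threshold=3):
        """Stage popular files onto the least loaded node's local disk layer."""
        if self.access_count.get(path, 0) >= threshold and self.locate(path) is None:
            target = min(self.nodes, key=lambda n: len(self.nodes[n]))
            self.nodes[target].add(path)    # in reality: trigger an asynchronous copy
            return target
        return None

coordinator = CacheCoordinator({"node1": set(), "node2": set()})
for _ in range(3):
    coordinator.locate("/store/data/run1/file.root")   # cache misses, read remotely
coordinator.maybe_cache("/store/data/run1/file.root")  # now popular, cached locally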
The processing of data acquired by the CMS detector at the LHC is carried out with an object-oriented C++ software framework: CMSSW. With the increasing luminosity delivered by the LHC, the treatment of recorded data requires extraordinarily large computing resources, also in terms of CPU usage. A possible solution to cope with this task is the exploitation of the features offered by the latest microprocessor architectures. Modern CPUs contain several vector units, the capacity of which is growing steadily with the introduction of new processor generations. Moreover, an increasing number of cores per die is offered by the main vendors, even on consumer hardware. Most recent C++ compilers provide facilities to take advantage of such innovations, either through explicit statements in the program sources or by automatically adapting the generated machine instructions to the available hardware, without the need to modify the existing code base. Programming techniques to implement reconstruction algorithms and optimised data structures are presented that aim at scalable vectorization and parallelization of the calculations. One of their features is the usage of new language features of the C++11 standard. Portions of the CMSSW framework are illustrated which have been found to be especially profitable for the application of vectorization and multi-threading techniques. Specific utility components have been developed to help vectorization and parallelization; they could easily become part of a larger common library. To conclude, careful measurements are described which show the execution speedups achieved via vectorised and multi-threaded code in the context of CMSSW.
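The abstract concerns C++ and CMSSW, but the data-layout aspect of vectorization can be conveyed with a short NumPy analogy: a structure-of-arrays layout maps a whole computation onto vector instructions, whereas an array-of-structures forces scalar loops. This is only a Python analogy, not the CMSSW code.

# Analogy only: the structure-of-arrays layout that favours vectorization,
# demonstrated with NumPy instead of the C++/CMSSW code from the abstract.
import numpy as np

n = 100_000

# Array-of-structures: one Python object per hit forces an interpreted, scalar loop.
hits_aos = [{"x": 1.0, "y": 2.0, "z": 3.0} for _ in range(n)]
r_aos = [(h["x"] ** 2 + h["y"] ** 2) ** 0.5 for h in hits_aos]

# Structure-of-arrays: one contiguous array per coordinate, so the whole
# transverse-radius computation maps onto hardware vector instructions.
x = np.full(n, 1.0)
y = np.full(n, 2.0)
r_soa = np.sqrt(x ** 2 + y ** 2)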
Compute clusters use batch systems such as the Portable Batch System (PBS) to distribute workload among individual cluster machines. To extend standard batch systems to cloud infrastructures, a new service monitors the number of queued jobs and keeps track of the price of available resources. This meta-scheduler dynamically adapts the number of cloud worker nodes according to the requirement profile. Two different worker node topologies are presented and tested on the Amazon EC2 cloud service.
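A schematic version of such a meta-scheduler cycle is sketched below in Python; all helper functions are placeholder stubs for the batch-system query and the cloud API calls, and the scaling policy is invented for illustration.

# Schematic meta-scheduler cycle: monitor the batch queue and adapt the number
# of cloud worker nodes to the requirement profile. All helpers are stubs.

JOBS_PER_WORKER = 4          # assumed target of queued jobs per worker node
MAX_WORKERS = 50

def queued_jobs():
    """Stub: in practice, parse the batch system's queue (e.g. qstat output)."""
    return 12

def running_workers():
    """Stub: in practice, count the cloud instances currently serving as workers."""
    return 1

def start_workers(n):
    print(f"booting {n} cloud worker nodes")      # stub for the cloud API call

def stop_workers(n):
    print(f"terminating {n} idle worker nodes")   # stub for the cloud API call

def scheduling_cycle():
    demand = min(MAX_WORKERS, -(-queued_jobs() // JOBS_PER_WORKER))  # ceiling division
    current = running_workers()
    if demand > current:
        start_workers(demand - current)
    elif demand < current:
        stop_workers(current - demand)

scheduling_cycle()   # the real service would run this periodically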