The basic premise of pilot systems is to create an overlay scheduling system on top of leased resources. By definition, leases have a limited lifetime, so any job scheduled on such resources must finish before the lease expires, or it will be killed and all its computation wasted. To schedule jobs onto resources effectively, the pilot system therefore needs the expected runtime of the users' jobs. Past studies have shown that relying on user-provided estimates is not a valid strategy, so the system should produce an estimate itself. This paper presents a study of historical data from the Compact Muon Solenoid (CMS) experiment's Analysis Operations submission system. Clear patterns are observed, suggesting that predicting an expected job lifetime range is achievable with a high confidence level in this environment.
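A minimal sketch of how such a prediction could be derived from historical records, using per-user percentiles of past wall times; the record fields, grouping key, and percentile choices are illustrative assumptions, not the paper's actual method:

```python
# Sketch: derive an expected runtime range per (user, task) from historical
# job records using percentiles. Field names and percentile choices are
# illustrative assumptions, not taken from the paper.
from collections import defaultdict
import statistics

def runtime_ranges(history, lo_pct=10, hi_pct=95):
    """history: iterable of (user, task, wall_seconds) tuples.
    Returns {(user, task): (low_s, high_s)} expected runtime ranges."""
    samples = defaultdict(list)
    for user, task, wall in history:
        samples[(user, task)].append(wall)
    ranges = {}
    for key, walls in samples.items():
        if len(walls) < 2:
            ranges[key] = (walls[0], walls[0])    # too little history: degenerate range
            continue
        qs = statistics.quantiles(walls, n=100)   # 99 percentile cut points
        ranges[key] = (qs[lo_pct - 1], qs[hi_pct - 1])
    return ranges

# Example: two users with different historical runtimes.
hist = [("alice", "skim", w) for w in (1200, 1500, 1800, 2100)] + \
       [("bob", "fit", w) for w in (300, 320, 340)]
print(runtime_ranges(hist))
```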
• Intel Omni-Path outperforms Cray Aries interconnect.
• 8-year-old Cray Gemini interconnect is the poorest performer.
• Nonlinear and collisional kernels compute bound on CPU and memory bound on GPU systems.
• Streaming and field kernels are compute bound.
• Most expensive kernel (nonlinear) well-optimized for both CPU and GPU.
We describe the mathematical formulation, outline the numerical discretization, and present performance analysis results for the CGYRO plasma turbulence code. The performance data was collected on 5 current leadership systems (2 KNL-based, 2 hybrid CPU-GPU and 1 Skylake-based). CGYRO is a relatively new gyrokinetic turbulence code, based on the well-known GYRO code, but redesigned from the ground up to operate efficiently on multicore and GPU-accelerated systems. The gyrokinetic equations specify a 5-dimensional distribution function for each species, with species coupled through both the Maxwell equations and collision operator. For the cross-machine performance analysis, we report and compare timings for 8 separate computational kernels. This kernel-based breakdown illustrates the strengths and weaknesses of the floating-point and communication architectures of the respective systems. We conclude with a preview of new multiscale turbulence results that are shown to accurately recover experimentally-observed electron turbulence levels in an ITER-baseline plasma regime that cannot be described using traditional long-wavelength simulation.
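As a rough illustration of the compute-bound versus memory-bound classification used in the highlights above, a kernel can be compared against a machine's balance point (peak FLOP rate divided by peak memory bandwidth); the kernel and machine numbers below are placeholders, not CGYRO measurements:

```python
# Roofline-style classification sketch: a kernel is memory bound when its
# arithmetic intensity (FLOPs per byte moved) falls below the machine balance
# (peak FLOP/s divided by peak bytes/s). All numbers here are placeholders.
def classify(kernel_flops, kernel_bytes, peak_flops, peak_bw):
    intensity = kernel_flops / kernel_bytes        # FLOP per byte
    balance = peak_flops / peak_bw                 # FLOP per byte at the ridge point
    return "compute bound" if intensity >= balance else "memory bound"

# Example: a hypothetical GPU with 7e12 FLOP/s and 9e11 B/s of memory bandwidth.
print(classify(kernel_flops=1e12, kernel_bytes=5e11,
               peak_flops=7e12, peak_bw=9e11))     # -> "memory bound"
```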
The HTCondor-based glideinWMS has become the product of choice for exploiting Grid resources for many communities. Unfortunately, its default operational model expects users to log into a machine running an HTCondor schedd before they can submit their jobs. Many users would instead prefer to use their local workstation for everything. A product that addresses this problem is RCondor, a module delivered with the HTCondor package. RCondor provides command line tools that simulate the behavior of a local HTCondor installation, while using ssh under the hood to execute commands on the remote node instead. RCondor also interfaces with sshfs, virtualizing access to remote files and thus giving the user the impression of a truly local HTCondor installation. This paper presents a detailed description of RCondor and compares it to the other methods currently available for accessing remote HTCondor schedds.
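A minimal sketch of the general idea behind such command wrappers, forwarding the local command line to the remote schedd host over ssh; the host name and wrapped commands are assumptions for illustration, not RCondor's actual implementation:

```python
# Sketch of an ssh-forwarding wrapper in the spirit of RCondor: the local
# command is re-executed on the remote submit host over ssh.
# SUBMIT_HOST and the wrapped command set are illustrative assumptions.
import shlex
import subprocess
import sys

SUBMIT_HOST = "schedd.example.edu"

def remote_condor(argv):
    """Run e.g. ['condor_q', '-nobatch'] on the remote schedd over ssh."""
    remote_cmd = " ".join(shlex.quote(a) for a in argv)
    return subprocess.call(["ssh", SUBMIT_HOST, remote_cmd])

if __name__ == "__main__":
    sys.exit(remote_condor(sys.argv[1:]))
```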
The machine/job features mechanism. Alef, M; Cass, T; Keijser, J J ... Journal of Physics: Conference Series, vol. 898, no. 9, 10/2017. Journal article, peer reviewed, open access.
Within the HEPiX virtualization group and the Worldwide LHC Computing Grid's Machine/Job Features Task Force, a mechanism has been developed which provides access to detailed information about the current host and the current job to the job itself. This allows user payloads to access meta information, independent of the current batch system or virtual machine model. The information can be accessed either locally via the filesystem on a worker node, or remotely via HTTP(S) from a webserver. This paper describes the final version of the specification from 2016 which was published as an HEP Software Foundation technical note, and the design of the implementations of this version for batch and virtual machine platforms. We discuss early experiences with these implementations and how they can be exploited by experiment frameworks.
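A hedged sketch of how a payload might consume this mechanism, reading one key either from the filesystem or over HTTP(S) depending on what the environment points at; the key names at the bottom are examples only, and the authoritative list is in the HSF technical note:

```python
# Sketch: read one machine/job features key, whether $MACHINEFEATURES /
# $JOBFEATURES point at a local directory or at an http(s) URL.
# Key names used below are examples; verify against the published specification.
import os
import urllib.request

def read_feature(base_env, key):
    base = os.environ.get(base_env)              # e.g. "JOBFEATURES"
    if not base:
        return None
    try:
        if base.startswith(("http://", "https://")):
            with urllib.request.urlopen(f"{base}/{key}") as resp:
                return resp.read().decode().strip()
        with open(os.path.join(base, key)) as f:
            return f.read().strip()
    except OSError:
        return None                              # key not published on this host

wall_limit = read_feature("JOBFEATURES", "wall_limit_secs")
total_cpu = read_feature("MACHINEFEATURES", "total_cpu")
print(wall_limit, total_cpu)
```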
Bosco is a software project developed by the Open Science Grid to help scientists better utilize their on-campus computing resources. Instead of submitting jobs through a dedicated gatekeeper, as most remote submission mechanisms do, it uses the built-in SSH protocol to gain access to the cluster. By using a common access method, SSH, we are able to simplify the interaction with the cluster, making the submission process more user-friendly. Additionally, it does not add any extra software to be installed on the cluster, making Bosco an attractive option for the cluster administrator. In this paper, we will describe Bosco, the personal supercomputing assistant, and how Bosco is used by researchers across the U.S. to manage their computing workflows. In addition, we will also talk about how researchers are using it, including a unique use of Bosco to submit CMS reconstruction jobs to an opportunistic XSEDE resource.
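A sketch of the submission pattern this enables, building a grid-universe submit description that routes a job to a remote batch cluster over SSH; the cluster address, scheduler type, and file names are placeholders, and the exact grid_resource syntax should be checked against the Bosco/HTCondor documentation:

```python
# Sketch: write and submit a Bosco-style grid-universe submit description that
# reaches a remote cluster over SSH. The "batch <scheduler> user@host" line,
# the host, and the file names are illustrative placeholders.
import subprocess

submit_description = """\
universe      = grid
grid_resource = batch slurm user@cluster.campus.edu
executable    = analysis.sh
output        = job.out
error         = job.err
log           = job.log
queue
"""

with open("bosco_job.sub", "w") as f:
    f.write(submit_description)

subprocess.run(["condor_submit", "bosco_job.sub"], check=True)
```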
Databases are used in many software components of HEP computing, from monitoring and job scheduling to data storage and processing. It is not always clear at the beginning of a project if a problem can be handled by a single server, or if one needs to plan for a multi-server solution. Before a scalable solution is adopted, it helps to know how well it performs in a single server case to avoid situations when a multi-server solution is adopted mostly due to sub-optimal performance per node. This paper presents comparison benchmarks of popular open source database management systems. As a test application we use a user job monitoring system based on the Glidein workflow management system used in the CMS Collaboration.
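A minimal sketch of the kind of single-server measurement such a benchmark involves, timing bulk inserts and point lookups of job-monitoring rows; SQLite stands in here only so the example is self-contained, and the schema and record counts are invented for illustration:

```python
# Sketch: time inserts and point queries of job-monitoring rows against a
# single-server database. SQLite is used to keep the example self-contained;
# the schema and sizes are illustrative, not from the paper.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, user TEXT, status TEXT, wall REAL)")

rows = [(i, f"user{i % 50}", "running", 0.0) for i in range(100_000)]

t0 = time.perf_counter()
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", rows)
conn.commit()
insert_s = time.perf_counter() - t0

t0 = time.perf_counter()
for i in range(0, 100_000, 100):                       # 1000 point lookups
    conn.execute("SELECT status FROM jobs WHERE id = ?", (i,)).fetchone()
query_s = time.perf_counter() - t0

print(f"insert: {len(rows)/insert_s:.0f} rows/s, lookup: {1000/query_s:.0f} queries/s")
```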
Following the success of the XRootd-based US CMS data federation, the AAA project investigated extensions of the federation architecture by developing two sample implementations of an XRootd disk-based caching proxy. The first one simply starts fetching a whole file as soon as a file open request is received and is suitable when completely random file access is expected or it is already known that the whole file will be read. The second implementation supports on-demand downloading of partial files. Extensions to the Hadoop Distributed File System have been developed to allow for an immediate fallback to network access when local HDFS storage fails to provide the requested block. Both cache implementations are in pre-production testing at UCSD.
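A toy sketch contrasting the two caching strategies described above, whole-file prefetch at open time versus fetching byte ranges on demand at read time; the class and method names are invented for illustration and are not the XRootd proxy API:

```python
# Sketch of the two caching strategies: prefetch the entire file at open time,
# or pull only the requested byte ranges on demand. Names are illustrative.
class Origin:
    """Stand-in for the remote data source."""
    def __init__(self, files): self.files = files
    def read_all(self, path): return self.files[path]
    def read_range(self, path, off, length): return self.files[path][off:off + length]

class WholeFilePrefetchCache:
    def __init__(self, origin): self.origin, self.cache = origin, {}
    def open(self, path):                      # pull the entire file immediately
        self.cache[path] = self.origin.read_all(path)
    def read(self, path, off, length):
        return self.cache[path][off:off + length]

class OnDemandBlockCache:
    def __init__(self, origin): self.origin, self.cache = origin, {}
    def read(self, path, off, length):         # fetch only the missing range
        key = (path, off, length)
        if key not in self.cache:
            self.cache[key] = self.origin.read_range(path, off, length)
        return self.cache[key]

origin = Origin({"/store/file.root": b"x" * 1024})
print(OnDemandBlockCache(origin).read("/store/file.root", 100, 16))
```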
Using Xrootd to Federate Regional Storage. Bauerdick, L; Benjamin, D; Bloom, K ... Journal of Physics: Conference Series, vol. 396, no. 4, 01/2012. Journal article, peer reviewed, open access.
While the LHC data movement systems have demonstrated the ability to move data at the necessary throughput, we have identified two weaknesses: the latency for physicists to access data and the complexity of the tools involved. To address these, both ATLAS and CMS have begun to federate regional storage systems using Xrootd. Xrootd, referring to a protocol and implementation, allows us to provide data access to all disk-resident data from a single virtual endpoint. This “redirector” discovers the actual location of the data and redirects the client to the appropriate site. The approach is particularly advantageous since typically the redirection requires much less than 500 milliseconds and the Xrootd client is conveniently built into LHC physicists’ analysis tools. Currently, there are three regional storage federations - a US ATLAS region, a European CMS region, and a US CMS region. The US ATLAS and US CMS regions include their respective Tier 1, Tier 2 and some Tier 3 facilities; a large percentage of experimental data is available via the federation. Additionally, US ATLAS has begun studying low-latency regional federations of close-by sites. From the base idea of federating storage behind an endpoint, the implementations and use cases diverge. The CMS software framework is capable of efficiently processing data over high-latency links, so using the remote site directly is comparable to accessing local data. The ATLAS processing model allows a broad spectrum of user applications with varying degrees of performance with regard to latency; a particular focus has been optimizing n-tuple analysis. Both VOs use GSI security. ATLAS has developed a mapping of VOMS roles to specific file system authorizations, while CMS has developed callouts to the site's mapping service. Each federation presents a global namespace to users. For ATLAS, the global-to-local mapping is based on a heuristic-based lookup from the site's local file catalog, while CMS does the mapping based on translations given in a configuration file. We will also cover the latest usage statistics and interesting use cases that have developed over the previous 18 months.
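A sketch of the configuration-file-style global-to-local name translation mentioned above for CMS, as a list of prefix rewrite rules applied to the federation namespace; the rule syntax and paths are invented for illustration:

```python
# Sketch: map a global federation namespace path to a local storage path by
# applying the first matching prefix-rewrite rule, in the spirit of the
# configuration-file translation described above. Rules and paths are invented.
RULES = [
    ("/store/data/", "/hadoop/cms/store/data/"),
    ("/store/user/", "/hadoop/cms/store/user/"),
]

def to_local(global_path):
    for prefix, local_prefix in RULES:
        if global_path.startswith(prefix):
            return local_prefix + global_path[len(prefix):]
    return None   # not served locally; the redirector would try another site

print(to_local("/store/data/Run2012A/MET/AOD/file.root"))
```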
CMS will require access to more than 125k processor cores for the beginning of Run 2 in 2015 to carry out its ambitious physics program with more events of higher complexity. During Run 1 these resources were predominantly provided by a mix of grid sites and local batch resources. During the long shutdown, cloud infrastructures, diverse opportunistic resources and HPC supercomputing centers were made available to CMS, which further complicated the operations of the submission infrastructure. In this presentation we will discuss the CMS effort to adopt and deploy the glideinWMS system as a common resource provisioning layer to grid, cloud, local batch, and opportunistic resources and sites. We will address the challenges associated with integrating the various types of resources, the efficiency gains and simplifications associated with using a common resource provisioning layer, and discuss the solutions found. We will finish with an outlook of future plans for how CMS is moving forward on resource provisioning for more heterogeneous architectures and services.