Improvements in web browser performance and web standards compliance, as well as the availability of comprehensive JavaScript libraries, provide an opportunity to develop functionally rich yet intuitive web applications that allow users to access, render and analyse data in novel ways. However, the development of such large-scale JavaScript web applications presents new challenges, in particular with regard to code sustainability and team-based work. We present an approach that meets the challenges of large-scale JavaScript web application design and development, including client-side model-view-controller architecture, design patterns, and JavaScript libraries. Furthermore, we show how the approach leads naturally to the encapsulation of the data source as a web API, allowing applications to be easily ported to new data sources. The Experiment Dashboard framework is used for the development of applications for monitoring the distributed computing activities of virtual organisations on the Worldwide LHC Computing Grid. We demonstrate the benefits of the approach for large-scale JavaScript web applications in this context by examining the design of several Experiment Dashboard applications for data processing, data transfer and site status monitoring, and by showing how they have been ported for different virtual organisations and technologies.
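As a rough illustration of the idea of encapsulating the data source behind a web API so that the client-side application can be repointed at a new back end, consider the following minimal Python sketch. All names here (the `DataSource` class, the `/api/jobs` endpoint) are illustrative assumptions, not part of the Experiment Dashboard code.

```python
# Minimal sketch: a data source hidden behind an HTTP/JSON API, so the
# client-side (MVC) application depends only on the API contract and can be
# ported to a different back end by swapping the DataSource implementation.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class DataSource:
    """Hypothetical back end; a real deployment would query a database or
    another monitoring service instead of returning static records."""

    def jobs(self):
        return [{"id": 1, "site": "CERN-PROD", "status": "Running"},
                {"id": 2, "site": "FNAL", "status": "Done"}]


class ApiHandler(BaseHTTPRequestHandler):
    source = DataSource()

    def do_GET(self):
        if self.path == "/api/jobs":
            body = json.dumps(self.source.jobs()).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


if __name__ == "__main__":
    # The browser-side application renders whatever this endpoint returns;
    # porting it to a new data source only touches the DataSource class.
    HTTPServer(("localhost", 8080), ApiHandler).serve_forever()
```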
A Grid job monitoring system. Dumitrescu, Catalin; Nowack, Andreas; Padhi, Sanjay; et al. Journal of Physics: Conference Series, 04/2010, Volume 219, Issue 7. Journal article, peer reviewed, open access.
This paper presents a web-based Job Monitoring framework for individual Grid sites that allows users to follow their jobs in detail in quasi-real time. The framework consists of several independent components: (a) a set of sensors that run on the site CE and worker nodes and update a database, (b) a simple yet extensible web services framework and (c) an Ajax-powered web interface with a look and feel and controls similar to a desktop application. The monitoring framework supports LSF, Condor and PBS-like batch systems. This is one of the first monitoring systems where an X.509-authenticated web interface can be seamlessly accessed by both end users and site administrators. While a site administrator has access to all the available information, a user can only view the jobs of the Virtual Organizations (VOs) he/she is a member of. The monitoring framework design supports several possible deployment scenarios. For a site running a supported batch system, the system may be deployed as a whole, or existing site sensors can be adapted and reused with the web services components. A site may even prefer to build the web server independently and choose to use only the Ajax-powered web interface. Finally, the system is being used to monitor a glideinWMS instance. This broadens the scope significantly, allowing it to monitor jobs over multiple sites.
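The VO-based visibility rule described in this abstract can be sketched in a few lines of Python; the record fields and the `is_admin` flag are assumptions made for illustration, not the framework's actual schema.

```python
# Sketch of VO-based filtering: a site administrator sees every job, while an
# ordinary user only sees jobs belonging to the VOs he/she is a member of.

def visible_jobs(jobs, user_vos, is_admin=False):
    """Return the subset of job records the caller may see."""
    if is_admin:
        return list(jobs)
    allowed = set(user_vos)
    return [job for job in jobs if job.get("vo") in allowed]


jobs = [
    {"id": 101, "vo": "cms", "status": "Running"},
    {"id": 102, "vo": "atlas", "status": "Pending"},
]
print(visible_jobs(jobs, user_vos=["cms"]))   # only the CMS job
print(visible_jobs(jobs, [], is_admin=True))  # all jobs, admin view
```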
Grids enable uniform access to resources by implementing standard interfaces to resource gateways. In the Open Science Grid (OSG), privileges are granted on the basis of the user's membership in a Virtual Organization (VO). However, Grid sites are solely responsible for determining and controlling access privileges to resources, using users' identities and personal attributes, which are available through Grid credentials. While this guarantees the sites full control over access rights, it makes VO privileges heterogeneous throughout the Grid and fits poorly with the Grid paradigm of uniform access to resources. To address these challenges, we are developing the Scalable Virtual Organization Privileges Management Environment (SVOPME), which provides tools for VOs to define and publish desired privileges and assists sites in providing the appropriate access policies. Moreover, SVOPME provides tools for Grid sites to analyze site access policies for various resources, verify compliance with preferred VO policies, and generate directives for site administrators on how the local access policies can be amended to achieve such compliance without taking control of local configurations away from site administrators. This paper discusses which access policies are of interest to the OSG community and how SVOPME implements privilege management for OSG.
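The compliance check and directive generation described above can be illustrated with a toy Python sketch; the dictionary layout of VO privileges and site policies is an assumption for readability, not SVOPME's actual policy format.

```python
# Toy illustration: compare the privileges a VO has published against what a
# site actually grants, and emit human-readable directives for the gaps,
# leaving the decision to amend local policies to the site administrator.

def compliance_directives(vo_privileges, site_policy):
    """Yield a directive for every desired privilege the site does not grant."""
    for resource, desired in vo_privileges.items():
        granted = set(site_policy.get(resource, []))
        for privilege in sorted(set(desired) - granted):
            yield f"Grant '{privilege}' on '{resource}' to comply with the VO policy."


vo_privileges = {"batch-queue": ["submit", "priority-share"], "storage": ["write"]}
site_policy = {"batch-queue": ["submit"], "storage": []}
for directive in compliance_directives(vo_privileges, site_policy):
    print(directive)
```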
The H1 Virtual Organization (VO), as one of the small VOs, employs most components of the EMI or gLite middleware. In this framework, a monitoring system is designed for the H1 Experiment to identify and recognize within the GRID the best-suited resources for executing CPU-time-consuming Monte Carlo (MC) simulation tasks (jobs). Monitored resources are Computing Elements (CEs), Storage Elements (SEs), WMS servers (WMSs), the CernVM File System (CVMFS) available to the VO HONE, and local GRID User Interfaces (UIs). The general principle of monitoring GRID elements is based on the execution of short test jobs on different CE queues, submitted through various WMSs as well as directly to the CREAM-CEs. Real H1 MC production jobs with a small number of events are used to perform the tests. Test jobs are periodically submitted into GRID queues, the status of these jobs is checked, output files of completed jobs are retrieved, the result of each job is analyzed, and the waiting time and run time are derived. Using this information, the status of the GRID elements is estimated and the most suitable ones are included in the automatically generated configuration files for use in the H1 MC production. The monitoring system allows problems at the GRID sites to be identified and reacted to promptly (for example by sending GGUS (Global Grid User Support) trouble tickets). The system can easily be adapted to identify the optimal resources for tasks other than MC production, simply by changing to the relevant test jobs. The monitoring system is written mostly in Python and Perl, with a few shell scripts. In addition to the test monitoring system, we use information from real production jobs to monitor the availability and quality of the GRID resources. The monitoring tools register the number of job resubmissions and the percentage of failed and finished jobs relative to all jobs on the CEs, and determine the average waiting and running times for the involved GRID queues. CEs which do not meet the set criteria can be removed from the production chain by including them in an exception table. All of these monitoring actions lead to a more reliable and faster execution of MC requests.
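A minimal Python sketch of the bookkeeping step described above might look as follows; the record fields and the failure-rate threshold are assumptions for illustration, not the values used by the H1 monitoring system.

```python
# Sketch: reduce per-CE job records to average waiting/running times and a
# failure rate, and place CEs that do not meet the criteria on an exception
# table so they are excluded from the generated production configuration.
from statistics import mean

FAILURE_THRESHOLD = 0.30  # assumed maximum tolerated failure fraction


def summarise(jobs_per_ce):
    """Return per-CE averages and the list of CEs to exclude from production."""
    summary, exceptions = {}, []
    for ce, jobs in jobs_per_ce.items():
        failed = sum(1 for j in jobs if j["status"] == "failed")
        summary[ce] = {
            "avg_wait": mean(j["wait_time"] for j in jobs),
            "avg_run": mean(j["run_time"] for j in jobs),
            "failure_rate": failed / len(jobs),
        }
        if summary[ce]["failure_rate"] > FAILURE_THRESHOLD:
            exceptions.append(ce)
    return summary, exceptions


jobs_per_ce = {
    "ce1.example.org": [{"status": "done", "wait_time": 120, "run_time": 900},
                        {"status": "failed", "wait_time": 150, "run_time": 0}],
}
print(summarise(jobs_per_ce))  # ce1 ends up on the exception list
```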
All major experiments at the Large Hadron Collider (LHC) need to measure real storage usage at the Grid sites. This information is equally important for resource management, planning, and operations. To verify the consistency of central catalogs, experiments are asking sites to provide a full list of the files they have on storage, including size, checksum, and other file attributes. Such storage dumps, provided at regular intervals, give a realistic view of the storage resource usage by the experiments. Regular monitoring of the space usage and data verification serve as additional internal checks of the system integrity and performance. Both the importance and the complexity of these tasks increase with the constant growth of the total data volumes during the active data taking period at the LHC. The use of common solutions helps to reduce the maintenance costs, both at the large Tier1 facilities supporting multiple virtual organizations and at the small sites that often lack manpower. We discuss requirements and solutions to the common tasks of data storage accounting and verification, and present experiment-specific strategies and implementations used within the LHC experiments according to their computing models.
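The kind of consistency check enabled by such storage dumps can be sketched in Python as below; the record layout (logical file name mapped to size and checksum) is an illustrative assumption rather than any experiment's actual dump format.

```python
# Sketch: compare a site storage dump against the central catalogue to find
# files missing from storage ("lost"), files unknown to the catalogue
# ("dark data"), and size/checksum mismatches ("corrupted").

def compare(catalogue, storage_dump):
    """Both arguments map logical file name -> (size, checksum)."""
    lost = sorted(set(catalogue) - set(storage_dump))
    dark = sorted(set(storage_dump) - set(catalogue))
    corrupted = sorted(
        lfn for lfn in set(catalogue) & set(storage_dump)
        if catalogue[lfn] != storage_dump[lfn]
    )
    return {"lost": lost, "dark": dark, "corrupted": corrupted}


catalogue = {"/data/run1/f1": (1024, "ad3f"), "/data/run1/f2": (2048, "9b21")}
storage_dump = {"/data/run1/f1": (1024, "ad3f"), "/data/run1/f3": (4096, "77aa")}
print(compare(catalogue, storage_dump))
```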
FermiGrid—experience and future plans. Chadwick, K; Berman, E; Canal, P; et al. Journal of Physics: Conference Series, 07/2008, Volume 119, Issue 5. Journal article, peer reviewed, open access.
Fermilab supports a scientific program that includes experiments and scientists located across the globe. In order to better serve this community, Fermilab has placed its production computing resources in a Campus Grid infrastructure called FermiGrid. The FermiGrid infrastructure allows the large experiments at Fermilab to have priority access to their own resources, enables sharing of these resources in an opportunistic fashion, and supports movement of work (jobs, data) between the Campus Grid and national Grids such as the Open Science Grid (OSG) and the Worldwide LHC Computing Grid Collaboration (WLCG). FermiGrid resources support multiple Virtual Organizations (VOs), including VOs from the OSG, EGEE, and the WLCG. Fermilab also makes leading contributions to the Open Science Grid in the areas of accounting, batch computing, grid security, job management, resource selection, site infrastructure, storage management, and VO services. Through the FermiGrid interfaces, authenticated and authorized VOs and individuals may access our core grid services, the 10,000+ Fermilab-resident CPUs, near-petabyte (including CMS) online disk pools and the multi-petabyte Fermilab Mass Storage System. These core grid services include a site-wide Globus gatekeeper, VO management services for several VOs, Fermilab site authorization services, grid user mapping services, as well as job accounting and monitoring, resource selection and data movement services. Access to these services is via standard and well-supported grid interfaces. We will report on the user experience of using the FermiGrid campus infrastructure interfaced to a national cyberinfrastructure, covering both the successes and the problems.
CMS computing needs reliable, stable and fast connections among multi-tiered computing infrastructures. For data distribution, the CMS experiment relies on a data placement and transfer system, PhEDEx, which manages replication operations at each site in the distribution network. PhEDEx uses the File Transfer Service (FTS), a low-level data movement service responsible for moving sets of files from one site to another while allowing participating sites to control the network resource usage. FTS servers are provided by the Tier-0 and Tier-1 centres and are used by all computing sites in CMS, according to the established policy. FTS needs to be set up according to the Grid site's policies and properly configured to satisfy the requirements of all Virtual Organizations making use of the Grid resources at the site. Managing the service efficiently requires good knowledge of the CMS needs for all kinds of transfer workflows. This contribution describes a revision of the FTS servers used by CMS, collecting statistics on their usage, customizing the topologies and improving their setup in order to keep CMS transferring data at the desired levels in a reliable and robust way.
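The per-link usage statistics mentioned above could, in a much simplified form, be derived along the lines of the following Python sketch; the record fields are assumptions, and real FTS monitoring data is considerably richer.

```python
# Sketch: group transfer attempts per source-destination link and reduce them
# to a success rate and an average throughput, figures that can then guide
# topology and configuration changes.
from collections import defaultdict


def link_statistics(transfers):
    per_link = defaultdict(list)
    for t in transfers:
        per_link[(t["source"], t["dest"])].append(t)
    stats = {}
    for link, records in per_link.items():
        ok = [r for r in records if r["status"] == "done"]
        stats[link] = {
            "success_rate": len(ok) / len(records),
            "avg_MB_per_s": sum(r["mb_per_s"] for r in ok) / len(ok) if ok else 0.0,
        }
    return stats


transfers = [
    {"source": "T1_US_FNAL", "dest": "T2_CH_CERN", "status": "done", "mb_per_s": 85.0},
    {"source": "T1_US_FNAL", "dest": "T2_CH_CERN", "status": "failed", "mb_per_s": 0.0},
]
print(link_statistics(transfers))
```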
The ATLAS Tier-3 farm at the University of Geneva provides storage and processing power for the analysis of ATLAS data. In addition, the facility is used for development, validation and commissioning of the High Level Trigger of ATLAS [1]. The latter purpose leads to additional requirements on the availability of the latest software and data, which will be presented. The farm is also part of the WLCG [2] and is available to all members of the ATLAS Virtual Organization. The farm currently provides 268 CPU cores and 177 TB of storage space. A grid Storage Element, implemented with the Disk Pool Manager software [3], is available and integrated with the ATLAS Distributed Data Management system [4]. The batch system can be used directly by local users, or with a grid interface provided by the NorduGrid ARC middleware [5]. In this article we will present the use cases that we support, as well as our experience with the software and the hardware we are using. Results of I/O benchmarking tests, which were performed for our DPM Storage Element and for the NFS servers we are using, will also be presented.
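A simple sequential-read throughput measurement of the kind used in such I/O benchmarks could be sketched as follows; the block size and the file path are placeholders, and the actual benchmarks used on the farm are not specified in this abstract.

```python
# Sketch: read a file in fixed-size chunks from a storage element or NFS
# mount and report the sequential-read throughput in MB/s.
import time

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB reads, an assumed value


def read_throughput(path):
    start, total = time.monotonic(), 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK_SIZE)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total / (1024 * 1024) / elapsed if elapsed else float("inf")


# Example usage (placeholder path on an NFS mount):
# print(read_throughput("/nfs/scratch/testfile"))
```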