The CMS detector project loads copies of conditions data to over 100,000 computer cores worldwide using a software subsystem called Frontier. This subsystem translates database queries into HTTP, looks up the results in a central database at CERN, and caches the results in an industry-standard HTTP proxy/caching server called Squid. One of the most challenging aspects of any cache system is coherency: ensuring that changes made to the underlying data are propagated to all clients in a timely manner. Recently, the Frontier system was enhanced to drastically reduce the time for changes to be propagated everywhere without heavily loading servers. The propagation time is now as low as 15 minutes for some kinds of data and no more than 60 minutes for the rest. This was accomplished by taking advantage of an HTTP and Squid feature called If-Modified-Since. To use this feature, the Frontier server sends a Last-Modified timestamp, but since modification times are not normally tracked by Oracle databases, a PL/SQL program was developed to track the modification times of database tables. We discuss the details of this caching scheme and the obstacles overcome, including database and Squid bugs.
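The mechanism described is standard HTTP conditional fetching. Below is a minimal Python sketch of the client side, assuming a hypothetical Frontier endpoint URL; the real Frontier query encoding and server paths differ.

```python
# Minimal sketch of the HTTP conditional-request mechanism described above.
# The URL is illustrative, not an actual Frontier endpoint.
import requests

url = "http://frontier.example.org/Frontier/query"  # hypothetical server URL
last_modified = None  # Last-Modified value remembered from a previous response

headers = {}
if last_modified:
    # Ask the server (or an intermediate Squid cache) to return the payload
    # only if it has changed since the timestamp we already hold.
    headers["If-Modified-Since"] = last_modified

resp = requests.get(url, headers=headers)
if resp.status_code == 304:
    print("Not Modified: the cached copy is still valid")
else:
    last_modified = resp.headers.get("Last-Modified")
    print("Fresh payload received, Last-Modified =", last_modified)
```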
CMS conditions data access using FroNTier. Blumenfeld, B; Dykstra, D; Lueking, L; et al. Journal of Physics: Conference Series, 07/2008, Volume 119, Issue 7. Journal article; peer reviewed; open access.
The CMS experiment at the LHC has established an infrastructure using the FroNTier framework to deliver conditions (i.e. calibration, alignment, etc.) data to processing clients worldwide. FroNTier is a simple web service approach providing client HTTP access to a central database service. The system for CMS has been developed to work with POOL, which provides object relational mapping between the C++ clients and various database technologies. Because of the read-only nature of the data, Squid proxy caching servers are maintained near clients and these caches provide high-performance data access. Several features have been developed to make the system meet the needs of CMS, including careful attention to cache coherency with the central database, and the low-latency loading required for the operation of the online High Level Trigger. The ease of deployment, stability of operation, and high performance make the FroNTier approach well suited to the GRID environment being used for CMS offline, as well as for the online environment used by the CMS High Level Trigger. The use of standard software, such as Squid and various monitoring tools, makes the system reliable, highly configurable, and easily maintained. We describe the architecture, software, deployment, performance, monitoring and overall operational experience for the system.
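As a rough illustration of this access pattern, the sketch below sends a read-only query through a site-local Squid proxy to a FroNTier-style server. The host names, proxy port, and query-encoding parameters are assumptions for illustration, not the actual CMS configuration.

```python
# Illustrative sketch: a client issues an HTTP GET through a nearby Squid
# proxy to a central FroNTier-style server. Carrying the read-only query in
# the URL is what makes the response cacheable by a standard HTTP proxy.
import base64
import requests

squid_proxy = {"http": "http://localhost:3128"}            # site-local Squid cache
frontier_url = "http://frontier.example.cern.ch/Frontier"  # hypothetical server

sql = "SELECT payload FROM conditions WHERE iov = 42"      # toy conditions query
params = {
    "encoding": "BLOB",                                    # assumed parameter names
    "p1": base64.urlsafe_b64encode(sql.encode()).decode(),
}

resp = requests.get(frontier_url, params=params, proxies=squid_proxy)
print(resp.status_code, len(resp.content), "bytes")
```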
The CMS dataset bookkeeping service. Afaq, A; Dolgert, A; Guo, Y; et al. Journal of Physics: Conference Series, 07/2008, Volume 119, Issue 7. Journal article; peer reviewed; open access.
The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels, and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via Python API, command-line, and Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS, with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.
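As a hedged sketch of the client/service interaction described (not the actual DBS API), the following shows a Python client calling a DBS-like catalog over HTTPS with a GRID certificate. The endpoint, API parameters, and response format are hypothetical.

```python
# Hypothetical sketch of an HTTPS catalog query authenticated with a GRID
# certificate, mirroring the client/service shape described in the abstract.
import requests

dbs_url = "https://cmsdbs.example.cern.ch/dbs/servlet/DBSServlet"  # hypothetical
cert = ("/path/to/usercert.pem", "/path/to/userkey.pem")           # GRID credentials

# Ask the catalog for datasets matching a pattern (illustrative API shape).
resp = requests.get(
    dbs_url,
    params={"api": "listDatasets", "pattern": "/Cosmics/*/RECO"},
    cert=cert,
    verify=True,
)
for line in resp.text.splitlines():
    print(line)
```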
The CMS DBS query language. Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; et al. Journal of Physics: Conference Series, 04/2010, Volume 219, Issue 4. Journal article; peer reviewed; open access.
The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use, without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We describe the design of the query system, provide details of the language components, and give an overview of how this component fits into the overall data discovery system architecture.
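The join-discovery idea can be illustrated with a toy schema graph: tables are nodes, foreign-key relations are edges, and a breadth-first search returns the join conditions linking the tables a query touches. The table names and join conditions below are invented for illustration and are not the DBS schema.

```python
# Toy illustration of join-condition discovery over a schema graph.
from collections import deque

# (table_a, table_b, join_condition) -- an invented slice of a schema
EDGES = [
    ("dataset", "block", "dataset.id = block.dataset_id"),
    ("block",   "file",  "block.id = file.block_id"),
    ("file",    "run",   "file.run_id = run.id"),
]

GRAPH = {}
for a, b, cond in EDGES:
    GRAPH.setdefault(a, []).append((b, cond))
    GRAPH.setdefault(b, []).append((a, cond))

def join_path(src, dst):
    """Breadth-first search for the join conditions linking two tables."""
    queue, seen = deque([(src, [])]), {src}
    while queue:
        table, conds = queue.popleft()
        if table == dst:
            return conds
        for nxt, cond in GRAPH[table]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, conds + [cond]))
    return None

# A query touching "dataset" and "run" needs every join condition on the path:
print(" AND ".join(join_path("dataset", "run")))
```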
Distributed Analysis in CMS. Fanfani, Alessandra; Afaq, Anzar; Sanches, Jose Afonso; et al. Journal of Grid Computing, 06/2010, Volume 8, Issue 2. Journal article; open access.
The CMS experiment expects to manage several petabytes of data each year during the LHC programme, distributing them over many computing sites around the world and enabling data access at those centers for analysis. CMS has identified the distributed sites as the primary location for physics analysis to support a wide community with thousands of potential users. This represents an unprecedented experimental challenge in terms of the scale of distributed computing resources and the number of users. An overview of the computing architecture, the software tools, and the distributed infrastructure is reported. Summaries of the experience in establishing efficient and scalable operations in preparation for CMS distributed analysis are presented, followed by the user experience in their current analysis activities.
The D0 experiment at Fermilab is accumulating data from the electronic detection of collisions between protons and anti-protons. The presentation describes the data structure, data cataloging, and serving of the multi-terabyte data set to a user community. The current data set consists of over 85 terabytes stored in a hierarchy of data sets with various latencies and frequencies of use. The primary data storage is on some 40,000 8-mm tapes, while the most frequently used data is on nearly 300 gigabytes of SCSI disks. Data is served to VMS and UNIX analysis clusters over an FDDI network from a centralized file server. We also describe plans for handling a future data set anticipated to be an order of magnitude larger. Some of the ideas being considered are alternative data structures, parallel disk access, automated tape libraries, and centralized analysis servers.
We present a grid system, which is in development, employing an architecture comprising the primary functional components of job handling, data handling, and monitoring and information services. Each component is built using existing Grid middleware. The job handling component utilizes Condor Match Making Services to broker job submissions, Condor-G to schedule jobs, and GRAM to submit and execute them on remote compute resources. The information services provide strategic information about the system, including a file replica catalogue, compute availability, and network data-throughput rate predictions, which are made available to the other components. Data handling services are provided by SAM, the data management system built for the Dzero experiment at Fermilab, to optimize data delivery, and to cache and replicate data as needed at the processing nodes. The SAM-Grid system is being built to provide experiments in progress at Fermilab the ability to utilize worldwide computing resources to process enormous quantities of data for complex physics analyses.
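As a toy illustration of the brokering decision described above (not the actual Condor Match Making logic), the sketch below ranks candidate sites using the kind of availability and throughput-prediction data the information services publish. The scoring rule and the site entries are invented.

```python
# Invented brokering sketch: pick a compute site from information-service
# data (free slots and predicted data-delivery rate). Illustration only.
sites = [
    {"name": "fnal",   "free_slots": 120, "mb_per_s": 40.0},
    {"name": "in2p3",  "free_slots": 300, "mb_per_s": 12.5},
    {"name": "gridka", "free_slots": 80,  "mb_per_s": 55.0},
]

def broker(sites, slots_needed):
    """Among sites that can hold the job, prefer the best predicted throughput."""
    viable = [s for s in sites if s["free_slots"] >= slots_needed]
    return max(viable, key=lambda s: s["mb_per_s"]) if viable else None

print(broker(sites, 100))  # -> the "fnal" entry under this toy data
```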
eConf C0303241:THKT003, 2003. The D0 experiment at Fermilab relies on a central Oracle database for storing all detector calibration information. Access to this data is needed by hundreds of physics applications distributed worldwide. In order to meet the demands of these applications from scarce resources, we have created a distributed system that isolates the user applications from the database facilities. This system, known as the Database Application Network (DAN), operates as the middle tier in a three-tier architecture. A DAN server employs a hierarchical caching scheme and a database connection management facility that limits access to the database resource. The modular design allows caching strategies and database access components to be determined by runtime configuration. To solve scalability problems, a proxy database component allows DAN servers to be arranged in a hierarchy. Also included is an event-based monitoring system that is currently being used to collect statistics for performance analysis and problem diagnosis. DAN servers are currently implemented as a multithreaded Python program using CORBA for network communications and interface specification. The requirement details, design, and implementation of DAN are discussed along with operational experience and future plans.
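A minimal sketch of the hierarchical caching and proxying idea, with invented class and method names: a DAN-like server answers from its own cache, falls back to an upstream server if one is configured, and touches the database only as a last resort.

```python
# Invented sketch of a DAN-style middle tier: each server is either a leaf
# (proxying to a parent server) or a root (with direct database access).
class DANServer:
    def __init__(self, database=None, parent=None):
        self.cache = {}           # this server's tier of the caching hierarchy
        self.parent = parent      # optional upstream DAN server (proxy mode)
        self.database = database  # direct database access, if configured

    def lookup(self, key):
        if key in self.cache:            # 1. local cache hit
            return self.cache[key]
        if self.parent is not None:      # 2. delegate up the hierarchy
            value = self.parent.lookup(key)
        else:                            # 3. last resort: the database itself
            value = self.database[key]
        self.cache[key] = value          # populate this tier on the way back
        return value

db = {"calib/42": "payload"}             # stand-in for the Oracle database
root = DANServer(database=db)
leaf = DANServer(parent=root)
print(leaf.lookup("calib/42"))           # both tiers now hold the entry
```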