The Belle II experiment at KEK is preparing for first collisions in 2017. Processing the large amounts of data that will be produced will require conditions data to be readily available to systems worldwide in a fast and efficient manner that is straightforward for both the user and maintainer. The Belle II conditions database was designed with a straightforward goal: make it as easily maintainable as possible. To this end, HEP-specific software tools were avoided as much as possible and industry-standard tools were used instead. HTTP REST services were selected as the application interface, providing a high-level interface to users through standard libraries such as curl. The application interface itself is written in Java and runs in an embedded Payara-Micro Java EE application server. Scalability at the application interface is provided by Hazelcast, an open-source In-Memory Data Grid (IMDG) providing distributed in-memory computing and supporting the creation and clustering of new application interface instances as demand increases. The IMDG provides fast and efficient access to conditions data via in-memory caching.
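The read-through caching pattern that an IMDG such as Hazelcast provides can be sketched as follows. This is a minimal, illustrative model only — the class and the key layout are assumptions for the example, not the actual Belle II or Hazelcast API; in Hazelcast the map would be distributed across cluster members.

```python
# Illustrative sketch of read-through caching: look up a conditions payload
# in an in-memory map first, and fall back to the backing store (e.g. the
# database behind the REST service) only on a cache miss.

class ConditionsCache:
    def __init__(self, backing_store):
        self._store = backing_store   # callable standing in for the database
        self._map = {}                # stands in for the distributed map

    def get(self, global_tag, payload_name):
        key = (global_tag, payload_name)
        if key not in self._map:      # miss: fetch once from the store
            self._map[key] = self._store(global_tag, payload_name)
        return self._map[key]         # hit: served from memory

# The backing store is consulted only once per key:
calls = []
def fake_store(tag, name):
    calls.append((tag, name))
    return f"payload for {name} in {tag}"

cache = ConditionsCache(fake_store)
cache.get("release-01", "BeamParameters")
cache.get("release-01", "BeamParameters")   # second call is a cache hit
```

Repeated reads of the same conditions object never touch the backing store again, which is what makes in-memory caching effective for conditions data that many worldwide clients request concurrently.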
Transparency and data integrity are crucial to any scientific study wanting to garner impact and credibility in the scientific community. The purpose of this paper is to discuss how this can be achieved using what we define as the Semantic Catalog. The catalog exploits community vocabularies as well as linked open data best practices to seamlessly describe and link things, data, and off-the-shelf (OTS) services to support scientific offshore wind energy research for the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy (EERE) Wind and Water Power Program. This is largely made possible by leveraging collaborative advances in the Internet of Things (IoT), Semantic Web, Linked Services, Linked Open Data (LOD), and Resource Description Framework (RDF) vocabulary communities, which provide the foundation for our design. By adapting these linked community best practices, we designed a wind characterization Data Management Facility (DMF) capable of continuous data collection, processing, and preservation of in situ and remote sensing instrument measurements. The design incorporates the aforementioned Semantic Catalog, which provides a transparent and ubiquitous interface for its user community to the things, data, and services of which the DMF is composed.
Belle II Conditions Database — Ritter, M; Wood, L; Kuhr, T
Journal of Physics. Conference Series, 09/2018, Volume 1085, Issue 3
Journal Article · Peer reviewed · Open access
The Belle II experiment at KEK is preparing for taking first collision data in early 2018. For the success of the experiment it is essential to have information about varying conditions available to systems worldwide in a fast and efficient manner that is straightforward for both the user and maintainer. The Belle II Conditions Database was designed to make maintenance as easy as possible. To this end, an HTTP REST service was developed with industry-standard tools such as Swagger for the API interface development, Payara for the Java EE application server, and the Hazelcast in-memory data grid for support of scalable caching as well as transparent distribution of the service across multiple sites. On the client side, the online and offline software has to be able to obtain conditions data from the Belle II Conditions Database in a robust and reliable way under very different situations. As such, the client-side interface to the Belle II Conditions Database has been designed with a variety of access mechanisms which allow the software to be used with and without an internet connection. Different methods to access the payload information are implemented to allow for a high level of customization per site and to simplify testing of new payloads locally. Changes to the conditions data are usually handled transparently, but users can actively check whether an object has changed or register callback functions to be called whenever a conditions data object is updated. In addition, a command line user interface has been developed to simplify inspection and modification of the database contents.
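The two update-notification mechanisms described above — an active "has it changed?" check and registered callbacks — can be sketched as follows. All names here are illustrative stand-ins, not the actual Belle II client API.

```python
# Sketch of client-side change handling for a conditions object:
# clients may either poll has_changed() or register a callback that
# fires whenever the payload is updated.

class ConditionsObject:
    """Illustrative stand-in for a conditions-data object."""

    def __init__(self, name, payload):
        self.name = name
        self.payload = payload
        self._callbacks = []
        self._changed = False

    def register_callback(self, fn):
        # fn is invoked with the object whenever its payload is updated
        self._callbacks.append(fn)

    def update(self, new_payload):
        # e.g. triggered when a new interval of validity begins
        self.payload = new_payload
        self._changed = True
        for fn in self._callbacks:
            fn(self)

    def has_changed(self):
        # active check: report and reset the changed flag
        changed, self._changed = self._changed, False
        return changed

seen = []
obj = ConditionsObject("BeamSpot", {"x": 0.0})
obj.register_callback(lambda o: seen.append(o.payload["x"]))
obj.update({"x": 0.12})   # callback fires with the new payload
```

The callback route suits reconstruction code that must react immediately to new calibrations, while the polling route suits code that only needs to invalidate derived quantities at well-defined points.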
Global cloud resolving models at resolutions of 4 km or less create significant challenges for simulation output, data storage, data management, and post-simulation analysis and visualization. To support efficient model output as well as data analysis, new methods for IO and data organization must be evaluated. The model we are supporting, the Global Cloud Resolving Model being developed at Colorado State University, uses a geodesic grid. The non-monotonic nature of the grid's coordinate variables requires enhancements to existing data processing tools and community standards for describing and manipulating grids. The resolution, size and extent of the data suggest the need for parallel analysis tools and allow for the possibility of new techniques in data mining, filtering and comparison to observations. We describe the challenges posed by various aspects of data generation, management, and analysis, our work exploring IO strategies for the model, and a preliminary architecture, web portal, and tool enhancements which, when complete, will enable broad community access to the data sets in ways familiar to the community.
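The non-monotonicity issue mentioned above is concrete: many analysis tools assume sorted latitude/longitude axes, an assumption a geodesic grid violates. A minimal sketch of the check such a tool might perform (stdlib only; the coordinate values are made up for illustration):

```python
# Check whether a coordinate variable is monotonic, i.e. entirely
# non-decreasing or entirely non-increasing along its axis.

def is_monotonic(values):
    nondec = all(a <= b for a, b in zip(values, values[1:]))
    noninc = all(a >= b for a, b in zip(values, values[1:]))
    return nondec or noninc

# A regular latitude axis passes the check; cell-center longitudes of a
# geodesic grid, stored in unstructured order, do not.
regular_lat = [-90.0, -45.0, 0.0, 45.0, 90.0]
geodesic_lon = [12.3, 250.1, 87.4, 301.9, 45.0]   # illustrative values
```

Tools built on the sorted-axis assumption (slicing, interpolation, regridding) fail on the second case, which is why the abstract calls for enhancements to both the tools and the community grid-description standards.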
Applying subsurface simulation codes to understand heterogeneous flow and transport problems is a complex process potentially involving multiple models, multiple scales, and spanning multiple scientific disciplines. A typical end-to-end process involves many tools, scripts and data sources usually shared only through informal channels. Additionally, the process contains many sub-processes that are repeated frequently and could be automated and shared. Finally, keeping records of the models, processes, and correlation between inputs and outputs is currently manual, time consuming and error prone. We are developing a software framework that integrates a workflow execution environment, shared data repository, and analysis and visualization tools to support development and use of new hybrid subsurface simulation codes. We are taking advantage of recent advances in scientific process automation using the Kepler system and advances in data services based on content management. Extensibility and flexibility are key underlying design considerations to support the constantly changing set of tools, scripts, and models available. We describe the architecture and components of this system with early examples of applying it to a continuum subsurface model.
The purpose of this paper is to illustrate the use of semantic technologies and approaches to seamlessly link things, services, and data in the proposed design of a scientific offshore wind energy research facility for the U.S. Department of Energy Wind and Water Technology Office of the Office of Energy Efficiency and Renewable Energy (EERE). By adapting linked community best practices, we were able to design a collaborative facility supporting both operational staff and end users that incorporates off-the-shelf components and overcomes traditional barriers between devices, resulting data, and processing services. This was made largely possible through complementary advances in the Internet of Things (IoT), semantic web, Linked Services, and Linked Data communities, which provide the foundation for our design.
Basis sets are some of the most important input data for computational models in the chemistry, materials, biology, and other science domains that utilize computational quantum mechanics methods. Providing a shared, Web-accessible environment where researchers can not only download basis sets in their required format but browse the data, contribute new basis sets, and ultimately curate and manage the data as a community will facilitate growth of this resource and encourage sharing of both data and knowledge. We describe the Basis Set Exchange (BSE), a Web portal that provides advanced browsing and download capabilities, facilities for contributing basis set data, and an environment that incorporates tools to foster development and interaction of communities. The BSE leverages and enables continued development of the basis set library originally assembled at the Environmental Molecular Sciences Laboratory.
We propose an approach for improved reproducibility that includes capturing and relating provenance characteristics and performance metrics. We discuss two use cases: scientific reproducibility of results in the Energy Exascale Earth System Model (E3SM, previously ACME) and performance reproducibility in molecular dynamics workflows on HPC platforms. To capture and persist the provenance and performance data of these workflows, we have designed and developed the Chimbuko and ProvEn frameworks. Chimbuko captures provenance and enables detailed single-workflow performance analysis. ProvEn is a hybrid, queryable system for storing and analyzing the provenance and performance metrics of multiple runs in workflow performance analysis campaigns. Workflow provenance and performance data output from Chimbuko can be visualized in a dynamic, multilevel visualization providing overview and zoom-in capabilities for areas of interest. Provenance and related performance data ingested into ProvEn are queryable and can be used to reproduce runs. Our provenance-based approach highlights challenges in extracting information and gaps in the information collected. It is agnostic to the type of provenance data it captures, so that both the reproducibility of scientific results and that of performance can be explored with our tools.
• A hybrid pore-continuum multiscale flow and reactive transport model is proposed.
• The multiscale model approach is demonstrated for a mixing-controlled reaction.
• The new model provides improved predictive capability over single-scale models.
Continuum-scale models, which employ a porous medium conceptualization to represent properties and processes averaged over a large number of solid grains and pore spaces, are widely used to study subsurface flow and reactive transport. Recently, pore-scale models, which explicitly resolve individual soil grains and pores, have been developed to more accurately model and study pore-scale phenomena, such as mineral precipitation and dissolution reactions, microbially-mediated surface reactions, and other complex processes. However, these highly-resolved models are prohibitively expensive for modeling domains of sizes relevant to practical problems. To broaden the utility of pore-scale models for larger domains, we developed a hybrid multiscale model that initially simulates the full domain at the continuum scale and applies a pore-scale model only to areas of high reactivity. Since the location and number of pore-scale model regions in the model varies as the reactions proceed, an adaptive script defines the number and location of pore regions within each continuum iteration and initializes pore-scale simulations from macroscale information. Another script communicates information from the pore-scale simulation results back to the continuum scale. These components provide loose coupling between the pore- and continuum-scale codes into a single hybrid multiscale model implemented within the SWIFT workflow environment. In this paper, we consider an irreversible homogeneous bimolecular reaction (two solutes reacting to form a third solute) in a 2D test problem. This paper is focused on the approach used for multiscale coupling between pore- and continuum-scale models, application to a realistic test problem, and implications of the results for predictive simulation of mixing-controlled reactions in porous media. 
Our results and analysis demonstrate that the hybrid multiscale method provides a feasible approach for increasing the accuracy of subsurface reactive transport simulations.
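The adaptive step described above — selecting, within each continuum iteration, the regions reactive enough to warrant a pore-scale simulation — can be sketched as follows. The grid, field values, and threshold are illustrative assumptions for the example, not taken from the SWIFT implementation.

```python
# Sketch of adaptive pore-scale region selection: mark the continuum
# grid cells whose local reaction rate exceeds a threshold, so that only
# those regions are handed to the expensive pore-scale model.

def select_pore_regions(reaction_rates, threshold):
    """Return indices of cells where the reaction rate is high enough
    to warrant spawning a pore-scale simulation."""
    return [i for i, rate in enumerate(reaction_rates) if rate > threshold]

# Continuum-scale reaction rates along a 1D transect (arbitrary units):
rates = [0.01, 0.02, 0.90, 0.85, 0.03, 0.70, 0.02]
regions = select_pore_regions(rates, threshold=0.5)
```

Because the set of high-reactivity cells changes as the reaction front moves, this selection is rerun every continuum iteration, and each selected region is then initialized from the macroscale state — the loose coupling the abstract describes.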
The Belle II experiment at KEK observed its first collisions in the summer of 2018. Processing the large amounts of data that will be produced requires conditions data to be readily available to systems worldwide in a fast and efficient manner that is straightforward for both the user and maintainer. This was accomplished by relying on industry-standard tools and methods: the conditions database is built as an HTTP REST service using tools such as Swagger for the API interface development, Payara for the Java EE application server, and Squid for the caching proxy. This article presents the design of the Belle II conditions database production environment as well as details about the capabilities and performance during both Monte Carlo campaigns and data reprocessing.