The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment. PanDA-managed resources are distributed worldwide across hundreds of computing sites, with thousands of physicists accessing hundreds of petabytes of data, and the rate of data processing already exceeds an exabyte per year. While PanDA currently uses more than 200,000 cores at well over 100 Grid sites, future LHC data-taking runs will require more resources than Grid computing can possibly provide, so additional computing and storage resources are required. ATLAS is therefore engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. In this paper we describe a project aimed at integrating the ATLAS Production System with the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA Pilot framework for job submission to Titan's batch queues and for local data management, with lightweight MPI wrappers to run single-node workloads in parallel on Titan's multi-core worker nodes. This enables standard ATLAS production jobs to run on unused (backfill) resources on Titan. The system has already allowed ATLAS to collect millions of core-hours per month on Titan and to execute hundreds of thousands of jobs, while simultaneously improving Titan's utilization efficiency. We discuss the details of the implementation, current experience with running the system, and future plans aimed at improving scalability and efficiency. Notice: This manuscript has been authored by employees of Brookhaven Science Associates, LLC under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy.
The publisher by accepting the manuscript for publication acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
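The backfill idea described in the abstract above can be sketched in a few lines: given the currently unused node/walltime windows on the machine, pick the largest window that fits the job limits and size the pilot submission to it. This is a minimal illustrative sketch; the function name, limits and slot format are assumptions, and the real system queries Titan's Moab scheduler for live backfill availability.

```python
# Minimal sketch of backfill job shaping (illustrative, not the actual
# PanDA/Titan code). `slots` stands in for the scheduler's report of
# currently idle (free_nodes, free_minutes) windows.

def shape_backfill_job(slots, min_nodes=2, max_nodes=300, max_walltime=120):
    """Pick the (nodes, walltime) pair yielding the most node-minutes
    while fitting inside an idle backfill window, so the MPI wrapper can
    launch that many single-node payloads without delaying scheduled jobs."""
    best = None
    for free_nodes, free_minutes in slots:
        nodes = min(free_nodes, max_nodes)
        walltime = min(free_minutes, max_walltime)
        if nodes < min_nodes or walltime <= 0:
            continue
        score = nodes * walltime  # node-minutes this window could yield
        if best is None or score > best[0]:
            best = (score, nodes, walltime)
    if best is None:
        return None
    return {"nodes": best[1], "walltime_min": best[2]}

# Example: three idle windows; the 500-node window wins even capped at 300.
print(shape_backfill_job([(16, 45), (500, 30), (4, 200)]))
```

The cap on nodes and walltime reflects the constraint that a backfill job must finish before the next scheduled job needs the nodes.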
ATLAS WORLD-cloud and networking in PanDA
Megino, F Barreiro; De, K; Di Girolamo, A ...
Journal of Physics: Conference Series, 10/2017, Volume 898, Issue 5
Journal Article
Peer reviewed
Open access
The ATLAS computing model was originally designed as static clouds (usually national or geographical groupings of sites) around the Tier 1 centres, which confined tasks and most of the data traffic. Since those early days, the sites' network bandwidth has increased by O(1000) and the difference in functionality between Tier 1s and Tier 2s has narrowed. After years of manual, intermediate solutions, we have now ramped up to full usage of World-cloud, the latest step in the PanDA Workload Management System to increase resource utilization on the ATLAS Grid, for all workflows (MC production, data (re)processing, etc.). We have based the development on two new site concepts: nuclei sites are the Tier 1s and large Tier 2s, where tasks are assigned and the output aggregated, and satellites are the sites that execute the jobs and send the output to their nucleus. PanDA dynamically pairs nucleus and satellite sites for each task based on input data availability, capability matching, site load and network connectivity. This contribution introduces the conceptual changes for World-cloud, the development necessary in PanDA, an insight into the network model, and the first half-year of operational experience.
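The nucleus/satellite pairing described above can be illustrated as a scoring problem over candidate pairs. The weights, field names and sites below are hypothetical, chosen only to show the shape of the decision; PanDA's actual brokerage logic is more involved.

```python
# Illustrative sketch of nucleus/satellite pairing for a task, scored on
# data locality, capability matching, site load and network connectivity.
# All weights and record fields are assumptions for illustration.

def rank_pairs(task, nuclei, satellites, link_quality):
    """Return (score, nucleus, satellite) triples, best first."""
    pairs = []
    for n in nuclei:
        # Prefer nuclei that already host the task's input data.
        data_bonus = 1.0 if task["input_data"] in n["datasets"] else 0.0
        for s in satellites:
            if task["min_memory_mb"] > s["memory_mb"]:
                continue  # capability mismatch: satellite cannot run the jobs
            load_penalty = s["queued"] / max(s["running"], 1)
            net = link_quality.get((s["name"], n["name"]), 0.1)
            score = 2.0 * data_bonus + net - 0.5 * load_penalty
            pairs.append((score, n["name"], s["name"]))
    return sorted(pairs, reverse=True)

task = {"input_data": "data17", "min_memory_mb": 2000}
nuclei = [{"name": "CERN", "datasets": {"data17"}}]
satellites = [
    {"name": "SiteA", "memory_mb": 4000, "queued": 10, "running": 100},
    {"name": "SiteB", "memory_mb": 1000, "queued": 0, "running": 1},
]
print(rank_pairs(task, nuclei, satellites, {("SiteA", "CERN"): 0.9}))
```

Note how SiteB is filtered out on capability grounds before any scoring, mirroring the "capability matching" step in the text.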
Information such as an estimate of the processing time, or of the possibility of a system outage (abnormal behaviour), helps to monitor system performance and to predict the system's next state. The current cyber-infrastructure of the ATLAS Production System presents computing conditions in which contention for resources among high-priority data analyses happens routinely, and this contention can lead to significant workload and data handling interruptions. The inability to monitor and predict the behaviour of the analysis process (its duration) and the state of the system itself motivates the design of built-in situational awareness analytic tools.
The paper describes the implementation of a high-performance system for the processing and analysis of log files for the PanDA infrastructure of the ATLAS experiment at the Large Hadron Collider (LHC), responsible for the workload management of on the order of 2M daily jobs across the Worldwide LHC Computing Grid. The solution is based on the ELK technology stack, which includes several components: Filebeat, Logstash, Elasticsearch (ES), and Kibana. Filebeat collects data from logs; Logstash processes the data and exports it to Elasticsearch; ES provides centralized data storage; and the data accumulated in ES can be viewed with Kibana. These components were integrated with the PanDA infrastructure and replaced previous log processing systems for increased scalability and usability. The authors describe all the components and the tuning of their configuration for the current tasks, report the scale of the actual system, and give several real-life examples of how this centralized log processing and storage service is used, showcasing the advantages for daily operations.
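The Logstash stage of the pipeline above essentially turns free-text log lines into structured documents for Elasticsearch. The sketch below shows that transformation in plain Python; the log-line layout and field names are invented for illustration and do not reflect PanDA's actual log format.

```python
import json
import re

# Sketch of the Logstash parsing step: turn a raw pilot log line into a
# structured record suitable for indexing in Elasticsearch. The line
# format below is a hypothetical example, not PanDA's real layout.
LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \| "
    r"(?P<level>\w+) \| PandaID=(?P<panda_id>\d+) \| (?P<msg>.*)"
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if unparseable."""
    m = LINE.match(line)
    if not m:
        return None
    doc = m.groupdict()
    doc["panda_id"] = int(doc["panda_id"])  # numeric field for ES queries
    return doc

rec = parse_log_line(
    "2017-06-01 12:00:03 | INFO | PandaID=3141592653 | stage-out finished"
)
print(json.dumps(rec))
```

In the real stack this role is played by Logstash filters (e.g. grok patterns); unparseable lines would typically be tagged rather than dropped.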
The second generation of the ATLAS Production System, called ProdSys2, is a distributed workload manager that runs hundreds of thousands of jobs daily, from dozens of different ATLAS-specific workflows, across more than a hundred heterogeneous sites. It achieves high utilization by combining dynamic job definition based on many criteria, such as input and output size, memory requirements and CPU consumption, with manageable scheduling policies, and by supporting different kinds of computational resources, such as Grid, clouds, supercomputers and volunteer computers. The system dynamically assigns a group of jobs (a task) to a group of geographically distributed computing resources. Dynamic assignment and resource utilization is one of the major features of the system; it did not exist in the earliest versions of the production system, where the Grid resource topology was predefined along national and/or geographical patterns. The Production System has a sophisticated job fault-recovery mechanism, which allows multi-terabyte tasks to run efficiently without human intervention. We have implemented a "train" model and open-ended production, which allow tasks to be submitted automatically as soon as a new set of data is available, and allow physics groups' data processing and analysis to be chained with the experiment's central production. We present an overview of the ATLAS Production System and the features and architecture of its major components: task definition, web user interface and monitoring. We describe the important design decisions and lessons learned from operational experience during the first year of LHC Run 2. We also report the performance of the designed system and how various workflows, such as data (re)processing, Monte Carlo and physics group production, and user analysis, are scheduled and executed within one production system on heterogeneous computing resources.
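The dynamic job definition mentioned above (splitting a task by input size, memory and CPU criteria) can be sketched as a simple packing step. The budget and file format here are hypothetical placeholders, not ProdSys2's actual parameters.

```python
# Illustrative sketch of dynamic job definition: greedily pack a task's
# input files into jobs so no job exceeds an input-size budget. The
# 2000 MB budget and (name, size) tuples are assumptions for illustration.

def define_jobs(input_files, max_job_input_mb=2000):
    """Split (filename, size_mb) tuples into job-sized groups,
    preserving file order."""
    jobs, current, size = [], [], 0
    for name, mb in input_files:
        if current and size + mb > max_job_input_mb:
            jobs.append(current)   # close the current job, start a new one
            current, size = [], 0
        current.append(name)
        size += mb
    if current:
        jobs.append(current)
    return jobs

files = [("f1", 900), ("f2", 900), ("f3", 900), ("f4", 300)]
print(define_jobs(files))
```

A real system would apply several such criteria at once (memory, CPU time, output size) and adjust the budgets per resource type, which is what makes the assignment dynamic.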
The Production and Distributed Analysis system (PanDA) has been in use in the ATLAS Experiment since 2005. It uses a sophisticated pilot system to execute submitted jobs on the worker nodes. While originally designed for ATLAS, the PanDA Pilot has recently been refactored to facilitate use outside of ATLAS. Experiments are now handled as plug-ins, such that a new PanDA Pilot user only has to implement a set of prototyped methods in the plug-in classes and provide a script that configures and runs the experiment-specific payload. We give an overview of the Next Generation PanDA Pilot system and present major features and recent improvements, including live user payload debugging, data access via the federated XRootD system, stage-out to alternative storage elements, support for the new ATLAS DDM system (Rucio), and improved integration with glExec, as well as a description of the experiment-specific plug-in classes. The performance of the pilot system in processing LHC data on the OSG, LCG and NorduGrid infrastructures used by ATLAS is also presented. We describe plans for future development on the time scale of the next few years.
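The plug-in pattern described above can be sketched as a base class of prototyped methods that the pilot core calls, with each experiment supplying its own subclass. The class and method names below are illustrative assumptions, not the pilot's actual API.

```python
# Sketch of the experiment plug-in pattern (names are hypothetical):
# the pilot core is experiment-agnostic and delegates payload specifics
# to whichever plug-in class it is configured with.

class Experiment:
    """Prototype interface every experiment plug-in must implement."""
    def get_payload_command(self, job):
        raise NotImplementedError
    def validate_output(self, job):
        raise NotImplementedError

class ATLASExperiment(Experiment):
    """Example plug-in: builds an ATLAS-style payload command."""
    def get_payload_command(self, job):
        return f"athena.py {job['jobPars']}"
    def validate_output(self, job):
        return bool(job.get("output_files"))

def run_payload(plugin, job):
    # Pilot core: ask the plug-in for the command, then execute it on the
    # worker node (execution elided in this sketch).
    cmd = plugin.get_payload_command(job)
    return cmd

job = {"jobPars": "--maxEvents=10", "output_files": ["AOD.pool.root"]}
print(run_payload(ATLASExperiment(), job))
```

Adding support for a new experiment then means writing one subclass plus the payload wrapper script, with no change to the pilot core.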
Abstract Background The Nephrology Unit at São Lucas Hospital, a University Hospital in Southern Brazil, recently marked 35 years since its first kidney transplant. Few centers in the area have made a longitudinal analysis of changes in processes, problems, graft survival, and patient survival over this time. Methods A single-center, retrospective study was performed. Data were separated into different eras, based on the nature of immunosuppression used: pre-cyclosporine (1978–1986), cyclosporine (1987–1997), mycophenolate introduction (1998–2002), new immunosuppressant drugs (2003–2007), and the current period (2008–2013). Results Between April 27, 1978, and April 30, 2013, 1231 transplants were performed, and significant differences were detected among the eras. The number of transplants has progressively increased, to include significantly older recipients (and donors), with longer waiting-list times, receiving organs that underwent longer cold ischemia times (P < .001). Yet fewer acute rejection episodes and a lower incidence of myocardial infarction and post-transplant diabetes mellitus (P < .001) were detected. In the present era, patient survival at 1, 3, and 5 years is 98.3%, 94.6%, and 90.5%, respectively, for living donors, and 92.4%, 87.2%, and 80.7%, respectively, for deceased donors. Living donor graft survival is 92.2%, 88.7%, and 82.4%, respectively, whereas deceased donor graft survival is 80.4%, 71.1%, and 63.7%, respectively. Conclusions This retrospective analysis has significant historical value: it assembles and depicts a long follow-up period of a transplant series at a single Brazilian center. Throughout the eras, organ and patient survival increased, with fewer rejection episodes and complications, yet with overall decreased graft function.
Electromagnetic fields (EMFs) can act as inducers or mediators of stress response through the production of heat shock proteins (HSPs) that modulate immune response and thymus functions. In this study, we analyzed cellular stress levels in rat thymus after exposing the rats to a 2.45 GHz radio frequency (RF) field using an experimental diathermic model in a Gigahertz Transverse Electromagnetic (GTEM) chamber.
In this experiment, we used H&E staining, an ELISA test and immunohistochemistry to examine Hsp70 and Hsp90 expression in the thymus and glucocorticoid receptors (GR) of 64 female Sprague–Dawley rats exposed individually to 2.45 GHz (at 0, 1.5, 3.0 or 12.0 W power). The 1 g averaged peak and mean SAR values in the thymus and whole body of each rat were determined to verify that only sub-thermal levels of radiation were being reached.
The thymus tissue presented several morphological changes, including increased distribution of blood vessels along with the appearance of red blood cells and hemorrhagic reticuloepithelial cells. Levels of Hsp90 decreased in the thymus when animals were exposed to the highest power level (12 W), but only one group did not show recovery after 24 h. Hsp70 presented no significant modifications in any of the groups. The glucocorticoid receptors presented greater immunomarking on the thymic cortex in exposed animals.
Our results indicate that non-ionizing sub-thermal radiation causes changes in the endothelial permeability and vascularization of the thymus, and is a tissue-modulating agent for Hsp90 and GR.
Abstract Background Solid organ transplant recipients are susceptible to antibiotic-resistant infections, and carbapenem-resistant Acinetobacter baumannii (CRAB) has recently been recognized as a serious complication in solid organ recipients, with high mortality rates described. Methods We retrospectively analyzed 807 transplantations and detected 10 patients who died within 24 hours of the diagnosis of septicemia, all with CRAB-positive blood cultures. Recipients were followed up for at least 1 year and were stratified into the following groups: Group 1, patients alive; Group 2, patients who died of causes other than Acinetobacter infection; and Group 3, patients who died within 24 hours of CRAB diagnosis. Results CRAB-positive patients died a median of 3.17 (range, 1.81–18.7) months after transplantation. In these patients, expanded criteria donors (ECDs) were more frequent (P < .001), as were the use of anti-thymocyte globulin (ATG) induction (P = .02) and delayed graft function (P = .01). For ECD recipients, the death rate from any cause, with or without ATG induction, was 25% and 20.6%, respectively (odds ratio [OR], 1.28; 95% confidence interval [CI], 0.56–2.91; P = .68). The death rate from CRAB-related sepsis was 10.3% and 0%, respectively, with or without ATG (OR, 15.49; 95% CI, 0.87–277.16; P = .014). There was a 25.75-fold increase in the death rate in ECD kidney recipients induced with thymoglobulin and with CRAB-related sepsis. Conclusion Recipients of ECD transplants induced with thymoglobulin may be at increased risk of CRAB-related death within 24 hours compared with recipients of standard-donor transplants induced with thymoglobulin.
The aim of this manuscript is to present an integrated process that includes reaction and separation steps for producing vanillin and lignin-based polyurethanes from Kraft lignin. It provides details about lignin oxidation and subsequent vanillin recovery, as well as the synthesis of lignin-based polyurethanes. The oxidation of Kraft lignin in alkaline medium has been carried out in a batch reactor and the optimum operational conditions for vanillin production obtained. The feasibility of a continuous process for vanillin production has been analyzed using a structured bubble column reactor. The generated reaction stream (degraded lignin and sodium vanillate) was then subjected to an ultrafiltration process to recover the vanillate. An ion-exchange process allows the vanillin to be recovered by passing the vanillate solution through a column packed with an ion-exchange resin in H+ form. The remaining lignin can act as a raw material to produce polyurethanes and/or biofuels; in this work the first approach was explored.