Through virtualization and resource integration, cloud computing has expanded its service area and offers a better user experience than traditional platforms, while its business operation model brings huge economic and social benefits. However, a large amount of evidence shows that cloud computing faces a serious security and trust crisis, and building a trust-enabled transaction environment has become a key factor in its success. The traditional cloud trust model usually adopts a centralized architecture, which causes large management overhead, network congestion and even single points of failure. Furthermore, due to a lack of transparency and traceability, trust evaluation results cannot be fully recognized by all participants. Blockchain is a new and promising decentralized framework and distributed computing paradigm. Its unique features in operating rules and traceability of records ensure the integrity, undeniability and security of transaction data. Therefore, blockchain is well suited to constructing a distributed and decentralized trust architecture. This paper carries out a comprehensive survey of blockchain-based trust approaches in cloud computing systems. Based on a novel cloud-edge trust management framework and a double-blockchain-based cloud transaction model, it identifies the open challenges and gives directions for future research in this field.
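As a rough illustration of why an append-only chain provides the traceability this survey relies on, here is a minimal sketch of a hash-linked ledger of trust-evaluation records. The class and field names are invented for illustration and do not reflect the paper's double-blockchain design.

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    """Deterministic SHA-256 over the block's canonical JSON form."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class TrustLedger:
    """Append-only chain of trust-evaluation records (illustrative only)."""

    def __init__(self):
        self.chain = [{"index": 0, "prev": "0" * 64, "record": "genesis", "ts": 0}]

    def append(self, record: dict) -> None:
        prev = self.chain[-1]
        self.chain.append({"index": prev["index"] + 1,
                           "prev": block_hash(prev),
                           "record": record,
                           "ts": time.time()})

    def verify(self) -> bool:
        """Tampering with any record breaks every later prev-hash link."""
        return all(self.chain[i]["prev"] == block_hash(self.chain[i - 1])
                   for i in range(1, len(self.chain)))

ledger = TrustLedger()
ledger.append({"provider": "cloud-A", "trust_score": 0.92, "rater": "user-17"})
assert ledger.verify()
```

Because each block embeds the hash of its predecessor, no participant can silently revise an old trust score, which is the property that makes evaluation results verifiable by all parties.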
The CMS monitoring applications for LHC Run 3 Jashal, Brij Kishor; Kuznetsov, Valentin; Legger, Federica ...
EPJ Web of Conferences, 2024, Volume: 295
Journal Article, Conference Proceeding
Peer-reviewed
Open access
Data taking at the Large Hadron Collider (LHC) at CERN restarted in 2022. The CMS experiment relies on a distributed computing infrastructure based on the WLCG (Worldwide LHC Computing Grid) to support the LHC Run 3 physics program. The CMS computing infrastructure is highly heterogeneous and relies on a set of centrally provided services, such as distributed workload management and data management, and on computing resources hosted at almost 150 sites worldwide. Smooth data taking and processing require all computing subsystems to be fully operational, and available computing and storage resources need to be continuously monitored. During the long shutdown between LHC Run 2 and Run 3, the CMS monitoring infrastructure underwent major changes to increase the coverage of monitored applications and services, while becoming more sustainable and easier to operate and maintain. The technologies used are based on open-source solutions, either provided by the CERN IT department through the MONIT infrastructure or managed by the CMS monitoring team. Monitoring applications for distributed workload management, the HTCondor-based submission infrastructure, distributed data management, and facilities have been ported from mostly custom-built applications to common data-flow and visualization services. Data are mostly stored in non-SQL databases and storage technologies such as ElasticSearch, VictoriaMetrics, Prometheus, InfluxDB and HDFS, accessed via programmatic APIs, Apache Spark or Sqoop jobs, and visualized preferentially using Grafana. Most CMS monitoring applications are deployed on Kubernetes clusters to minimize maintenance operations. In this contribution we present the full stack of CMS monitoring services and show how we leveraged common technologies to cover a variety of monitoring applications and cope with the computing challenges of LHC Run 3.
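As a flavor of how such monitoring data can be accessed programmatically, the sketch below issues a PromQL instant query over the standard Prometheus HTTP API, which VictoriaMetrics also implements. The endpoint URL and metric name are placeholders, not actual CMS services.

```python
import requests

# Hypothetical endpoint; the /api/v1/query path is the standard Prometheus HTTP API.
PROM_URL = "http://monitoring.example.org/api/v1/query"

def instant_query(expr: str) -> list:
    """Run a PromQL instant query and return the result vector."""
    resp = requests.get(PROM_URL, params={"query": expr}, timeout=10)
    resp.raise_for_status()
    payload = resp.json()
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    return payload["data"]["result"]

# Example: running jobs per site (the metric name is illustrative only).
for series in instant_query("sum by (site) (jobs_running)"):
    print(series["metric"].get("site"), series["value"][1])
```

The same query expression can be pointed at a Grafana panel, which is one reason a common time-series API across backends simplifies operating many monitoring applications.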
A Ceph S3 Object Data Store for HEP Smith, Nick; Jayatilaka, Bo; Mason, David ...
EPJ Web of Conferences, 2024, Volume: 295
Journal Article, Conference Proceeding
Peer-reviewed
Open access
In CMS, data access and management are organized around the data tier model: a static definition of what subset of event information is available in a particular dataset, realized as a collection of files. We present a novel data management model that obviates the need for data tiers by exploding files into individual event data product objects. The objects are stored and retrieved through Ceph S3 technology, with a layout designed to minimize data and metadata volume while maximizing data processing parallelism. We demonstrate that this object data format shows promise in reducing total storage requirements while allowing more flexible data access patterns. Performance benchmarks of a prototype data processing framework using this object data format and a test Ceph cluster are presented, showing good scaling behavior in a distributed processing task.
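A hedged sketch of the access pattern this model enables: since the Ceph RADOS Gateway speaks the standard S3 API, a single event data product can be fetched without touching the rest of a dataset. The endpoint, bucket name, and key layout below are illustrative assumptions, not the layout described in the paper.

```python
import os
import boto3

# Hypothetical endpoint and credentials; Ceph RGW implements the standard S3 API.
s3 = boto3.client(
    "s3",
    endpoint_url="https://ceph-rgw.example.org",
    aws_access_key_id=os.environ["S3_ACCESS_KEY"],
    aws_secret_access_key=os.environ["S3_SECRET_KEY"],
)

BUCKET = "event-products"  # illustrative bucket name

def put_product(dataset: str, product: str, event_range: str, payload: bytes) -> None:
    """Store one event-data-product object (one object per product per event range)."""
    key = f"{dataset}/{product}/{event_range}"  # key layout is an assumption
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)

def get_product(dataset: str, product: str, event_range: str) -> bytes:
    """Fetch a single product directly, without reading the rest of the dataset."""
    key = f"{dataset}/{product}/{event_range}"
    return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
```

Keying objects by product rather than by file is what lets a job read only the event information it needs, in parallel, instead of pulling whole data-tier files.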
The ATLAS experiment at CERN is one of the largest scientific machines built to date and will have ever-growing computing needs as the Large Hadron Collider collects an increasingly large volume of data over the next 20 years. ATLAS is conducting R&D projects on Amazon Web Services and Google Cloud as complementary resources for distributed computing, focusing on some of the key features of commercial clouds: lightweight operation, elasticity and availability of multiple chip architectures. The proof-of-concept phases have concluded with the cloud-native, vendor-agnostic integration with the experiment's data and workload management frameworks. Google Cloud has been used to evaluate elastic batch computing, ramping up ephemeral clusters of up to O(100k) cores to process tasks requiring quick turnaround. Amazon Web Services has been exploited for the successful physics validation of the Athena simulation software on ARM processors. We have also set up an interactive facility for physics analysis allowing end-users to spin up private, on-demand clusters for parallel computing with up to 4,000 cores, or run GPU-enabled notebooks and jobs for machine learning applications. The success of the proof-of-concept phases has led to the extension of the Google Cloud project, where ATLAS will study the total cost of ownership of a production cloud site over 15 months with 10k cores on average, fully integrated with distributed grid computing resources, and continue the R&D projects.
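For a sense of scale, a quick back-of-the-envelope on the quoted figures for the extended Google Cloud project (10k cores on average over 15 months); the hours-per-month value is an approximation:

```python
avg_cores = 10_000        # average core count quoted for the extended project
months = 15
hours_per_month = 730     # ~24 * 365 / 12, an approximation

core_hours = avg_cores * months * hours_per_month
print(f"{core_hours:,.0f} core-hours")  # ≈ 109,500,000 core-hours
```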
Hyper-Kamiokande is a next-generation multi-purpose neutrino experiment with a primary focus on constraining CP violation in the lepton sector. It features a diverse science programme that includes neutrino oscillation studies, astrophysics, neutrino cross-section measurements, and searches for physics beyond the standard model, such as proton decay. Building on its predecessor, Super-Kamiokande, the Hyper-Kamiokande far detector has a total volume approximately 5 times larger and is estimated to collect nearly 2 PB of data per year. The experiment will also include both on- and off-axis near detectors, including an Intermediate Water Cherenkov Detector. To manage the significant demands relating to the data from these detectors, and the associated Monte Carlo simulations for a range of physics studies, an efficient and scalable distributed computing model is essential. This model leverages the Worldwide LHC Computing Grid infrastructure and utilises the GridPP DIRAC instance for both workload management and file cataloguing. In this report we forecast the computing requirements for the Hyper-K experiment, estimated to reach around 35 PB (per replica) and 8,700 CPU cores (~100,000 HS06) by 2036. We outline the resources, tools, and workflow in place to satisfy this demand.
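A small worked computation from the figures quoted above; the HS06-per-core value is derived here, not stated in the report:

```python
total_hs06 = 100_000          # quoted benchmark requirement by 2036
cpu_cores = 8_700             # quoted core requirement by 2036
storage_pb_per_replica = 35   # quoted storage requirement per replica
far_detector_pb_per_year = 2  # quoted far-detector data rate

print(f"Implied benchmark power: {total_hs06 / cpu_cores:.1f} HS06/core")  # ~11.5
print(f"Far-detector data alone: {far_detector_pb_per_year} PB/year; the "
      f"{storage_pb_per_replica} PB/replica also covers near detectors and MC")
```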
As a distributed computing paradigm, edge computing has become a key technology for providing timely services to mobile devices by connecting Internet of Things (IoT) devices, cloud centers, and other facilities. By offloading compute-intensive tasks from IoT devices to edge/cloud servers, the communication and computation pressure caused by the massive data in the Industrial IoT can be effectively reduced. In the process of computation offloading in edge computing, it is critical to dynamically make optimal offloading decisions to minimize the delay and energy consumption incurred on the devices. Although there are a large number of task offloading-decision models, how to measure and evaluate the quality of different models and configurations is crucial. In this article, we propose a novel simulation platform named ChainFL, which can build an edge computing environment among IoT devices while being compatible with federated learning and blockchain technologies to better support the embedding of security-focused offloading algorithms. ChainFL is lightweight and compatible, and it can quickly build complex network environments by connecting devices of different architectures. Moreover, due to its distributed nature, ChainFL can also be deployed as a federated learning platform across multiple devices, enabling federated learning with high security thanks to its embedded blockchain. Finally, we validate the versatility and effectiveness of ChainFL by embedding a complex offloading-decision model in the platform and deploying it in an Industrial IoT environment with security risks.
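To make the delay/energy trade-off concrete, here is a minimal sketch of a textbook offloading decision rule (local execution versus offloading over a wireless link). All parameters, weights, and the cost model are illustrative assumptions, not ChainFL's calibrated model.

```python
from dataclasses import dataclass

@dataclass
class Task:
    cycles: float     # CPU cycles required to process the task
    data_bits: float  # input size to upload if offloaded

# Illustrative parameters (units in comments); normalization is omitted for brevity.
F_LOCAL = 1e9    # device CPU speed (cycles/s)
F_EDGE = 10e9    # edge server CPU speed (cycles/s)
RATE = 20e6      # uplink rate (bits/s)
KAPPA = 1e-27    # effective switched-capacitance coefficient (energy model)
P_TX = 0.5       # transmit power (W)
W_TIME, W_ENERGY = 0.5, 0.5  # delay/energy trade-off weights

def cost_local(t: Task) -> float:
    delay = t.cycles / F_LOCAL
    energy = KAPPA * F_LOCAL**2 * t.cycles  # classic dynamic-power model
    return W_TIME * delay + W_ENERGY * energy

def cost_offload(t: Task) -> float:
    delay = t.data_bits / RATE + t.cycles / F_EDGE  # upload + remote compute
    energy = P_TX * t.data_bits / RATE              # device only pays to transmit
    return W_TIME * delay + W_ENERGY * energy

task = Task(cycles=2e9, data_bits=5e6)
decision = "offload" if cost_offload(task) < cost_local(task) else "local"
print(decision, cost_local(task), cost_offload(task))
```

An offloading-decision model embedded in a platform like ChainFL would evaluate a rule of this shape dynamically, per task, as link rates and server loads change.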
The Internet of Things (IoT) is defined as interconnected digital and mechanical devices with intelligent and interactive data transmission features over a defined network. The ability of the IoT to collect, analyze and mine data into information and knowledge motivates the integration of the IoT with grid and cloud computing. New job scheduling techniques are crucial for the effective integration and management of the IoT with grid computing, as they provide optimal computational solutions. The computational grid is a modern technology that enables distributed computing to take advantage of an organization's resources in order to handle complex computational problems. However, the scheduling process is considered an NP-hard problem due to the heterogeneity of resources and management systems in the IoT grid. This paper proposes a Greedy Firefly Algorithm (GFA) for job scheduling in the grid environment. In the proposed algorithm, a greedy method is utilized as a local search mechanism to enhance the rate of convergence and the efficiency of schedules produced by the standard firefly algorithm. Several experiments were conducted using the GridSim toolkit to evaluate the proposed algorithm's performance. The study measured several sizes of real grid computing workload traces, starting with lightweight traces with only 500 jobs, then typical traces with 3,000 to 7,000 jobs, and finally heavy loads containing 8,000 to 10,000 jobs. The experimental results revealed that the greedy firefly algorithm significantly reduces the makespan and execution times of the IoT grid scheduling process compared to other evaluated scheduling methods. Furthermore, the proposed algorithm converges faster on large search spaces, making it suitable for large-scale IoT grid environments.
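A compact sketch of the idea, assuming a random-key encoding of job-to-machine assignments and a simple move-based greedy local search; the paper's exact operators and GridSim setup are not reproduced here.

```python
import math
import random

# Toy instance: processing time of each job on identical machines.
JOBS = [random.randint(1, 50) for _ in range(60)]
MACHINES = 8
POP, ITERS = 20, 100
BETA0, GAMMA, ALPHA = 1.0, 0.1, 0.2  # standard firefly parameters

def decode(x):
    """Random-key decoding: each job's key in [0,1] picks a machine."""
    return [int(v * MACHINES) % MACHINES for v in x]

def makespan(x):
    load = [0] * MACHINES
    for job, m in zip(JOBS, decode(x)):
        load[m] += job
    return max(load)

def greedy_improve(x):
    """Greedy local search: move a job off the most loaded machine if it helps."""
    assign = decode(x)
    load = [0] * MACHINES
    for job, m in zip(JOBS, assign):
        load[m] += job
    src, dst = load.index(max(load)), load.index(min(load))
    for j, m in enumerate(assign):
        if m == src:
            y = list(x)
            y[j] = (dst + random.random()) / MACHINES  # re-key job j onto dst
            if makespan(y) < makespan(x):
                return y
    return x

swarm = [[random.random() for _ in JOBS] for _ in range(POP)]
for _ in range(ITERS):
    fitness = [makespan(x) for x in swarm]  # brightness snapshot per generation
    for i in range(POP):
        for j in range(POP):
            if fitness[j] < fitness[i]:  # firefly i moves toward brighter j
                r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                beta = BETA0 * math.exp(-GAMMA * r2)
                swarm[i] = [min(max(a + beta * (b - a)
                                    + ALPHA * (random.random() - 0.5), 0.0), 1.0)
                            for a, b in zip(swarm[i], swarm[j])]
        swarm[i] = greedy_improve(swarm[i])  # the "greedy" step of GFA

print("best makespan:", min(makespan(x) for x in swarm))
```

The greedy move after each firefly update is what accelerates convergence: the stochastic swarm explores the search space while the local search exploits each candidate schedule.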
Abstract: The article presents an intelligent algorithm for task scheduling in cloud data centers based on the Cuckoo intelligent methodology. The authors analyze in detail various optimization methods, such as genetic algorithms, greedy algorithms, the Antlion optimizer, and ant colony optimization. The proposed use of the Cuckoo-based algorithm is expected to improve scheduling time and resource optimization in dynamic environments, contributing to the greater efficiency of cloud services.
Cloud computing is an emerging distributed computing paradigm that has become one of the most popular computing paradigms today. One of the reasons for its popularity is its elasticity feature. Elasticity is a unique feature that enables cloud platforms to add and remove resources "on the fly" to handle changes in workload demands. On the other hand, if the elasticity feature is not correctly managed, cloud platforms may face over-provisioning or under-provisioning problems, because the arrival rate of users to cloud applications varies over time. Resource elasticity management is therefore one of the challenging problems that must be taken into account in the cloud computing environment. In this paper, we propose an elastic controller based on Colored Petri Nets to manage cloud infrastructures automatically. Finally, we evaluate the efficiency of the proposed elastic controller under three real workloads. The simulation results indicate that the proposed elastic controller reduces the response time by up to 4.8%, and increases resource utilization and elasticity by up to 9.3% and 6.7% respectively, compared with other approaches.
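For intuition, a minimal threshold-based elasticity loop is sketched below. It stands in for the paper's Colored-Petri-Net controller, and all thresholds, capacities, and the synthetic workload are invented for illustration.

```python
import random

# Illustrative constants, not the paper's calibrated values.
VM_CAPACITY = 100        # requests/s one VM can serve
UPPER, LOWER = 0.8, 0.3  # utilization thresholds for scaling out/in
MIN_VMS = 1

vms = 2
for step in range(30):
    arrival_rate = random.randint(50, 600)  # fluctuating synthetic workload
    utilization = arrival_rate / (vms * VM_CAPACITY)
    if utilization > UPPER:                      # under-provisioned: scale out
        vms += 1
    elif utilization < LOWER and vms > MIN_VMS:  # over-provisioned: scale in
        vms -= 1
    print(f"t={step:2d} load={arrival_rate:3d} req/s "
          f"vms={vms} util={min(utilization, 1.0):.0%}")
```

A Petri-net formulation replaces these ad hoc thresholds with places, colored tokens, and transition guards, which makes the controller's state space amenable to formal analysis; the loop above only conveys the scale-out/scale-in behavior being controlled.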