Resource-constrained IoT devices, such as sensors and actuators, have become ubiquitous in recent years. This has led to the generation of large quantities of data in real-time, which is an appealing target for AI systems. However, deploying machine learning models on such end-devices is nearly impossible. A typical solution involves offloading data to external computing systems (such as cloud servers) for further processing, but this worsens latency, increases communication costs, and adds to privacy concerns. To address this issue, efforts have been made to place additional computing devices at the edge of the network, i.e., close to the IoT devices where the data is generated. Deploying machine learning systems on such edge computing devices alleviates the above issues by allowing computations to be performed close to the data sources. This survey describes major research efforts where machine learning systems have been deployed at the edge of computer networks, focusing on operational aspects including the compression techniques, tools, frameworks, and hardware used in successful applications of intelligent edge systems.
FCPP to aggregate them all
Audrito, Giorgio; Torta, Gianluca
Science of Computer Programming, Volume 231, January 2024
Peer-reviewed journal article
Aggregate computing is a promising approach to the self-organisation of distributed devices, allowing complex distributed algorithms with robust behaviour guarantees to be expressed at a high level of abstraction. This approach has been argued to apply fruitfully to many different contexts: wireless sensor networks, the Internet of Things, self-organising edge, fog or cloud computing scenarios, and simulations thereof. However, older implementations of this language rely on the Java Virtual Machine and have a high performance overhead, impairing their usability in contexts where performance is critical (cloud) or computational resources are tightly bounded (WSN/IoT).
FieldCalc++ (FCPP, implementing the field calculus in C++) overcomes these limitations, being the first aggregate computing implementation able to effectively target all these different contexts. Leveraging C++ compile-time optimisations, fine-grained parallelism and an optimised design, the library stands out for its efficiency, portability and extensibility, and supports aggregate programs in contexts ranging from WSN deployments to graph data processing on the cloud.
•FCPP is a C++ library implementing the Aggregate Programming paradigm.
•FCPP effectively targets contexts where performance is critical (cloud) or resources are tightly bound (WSN/IoT).
•The library stands out for its efficiency, portability and extensibility w.r.t. existing AP implementations.
•FCPP comes with a powerful simulator able to run batch scenarios, and a 3D GUI for exploring the system behaviour.
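The canonical building block of the aggregate programming paradigm that FCPP implements is the self-stabilising "gradient" field: each device repeatedly takes the minimum of its neighbours' estimates plus the link distance, converging to the hop distance from a source. The following is a toy Python sketch of that idea only; the topology, function names, and synchronous-round model are illustrative, not FCPP's actual C++ API.

```python
# Toy aggregate-computing "gradient": every node repeatedly recomputes
# its distance estimate from neighbours' values (+1 hop per link).
# Topology and names below are illustrative.
INF = float("inf")

def gradient_round(values, neighbours, sources):
    """One synchronous round: each node recomputes its estimate."""
    new = {}
    for node, nbrs in neighbours.items():
        if node in sources:
            new[node] = 0
        else:
            new[node] = min((values[n] + 1 for n in nbrs), default=INF)
    return new

# A small line topology a - b - c - d, with 'a' as the source.
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
values = {n: INF for n in neighbours}
for _ in range(5):  # iterate until the field stabilises
    values = gradient_round(values, neighbours, {"a"})
print(values)  # hop distances from the source
```

The same update rule self-heals after topology changes, which is the "robust behaviour guarantees" property the abstract refers to.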
Coded distributed computing (CDC) is a new technique proposed with the purpose of decreasing the intense data exchange required for parallelizing distributed computing systems. Under the well-known MapReduce paradigm, this coded approach has been shown to decrease this communication overhead by a factor that is linearly proportional to the overall computation load during the mapping phase. In this paper, we propose multi-access distributed computing (MADC) as a generalization of the original CDC model, where now mappers (nodes in charge of the map functions) and reducers (nodes in charge of the reduce functions) are distinct computing nodes connected through a multi-access network topology. Focusing on the MADC setting with combinatorial topology, which implies Λ mappers and K reducers such that there is a unique reducer connected to any α mappers, we propose a coded scheme and an information-theoretic converse, which jointly identify the optimal inter-reducer communication load, as a function of the computation load, to within a constant gap of 1.5. Additionally, a modified coded scheme and converse identify the optimal max-link communication load across all existing links to within a gap of 4.
Distributed computing has become a common approach for large-scale computation tasks due to benefits such as high reliability, scalability, computation speed, and cost-effectiveness. However, distributed computing faces critical issues related to communication load and straggler effects. In particular, computing nodes need to exchange intermediate results with each other in order to calculate the final result, and this significantly increases communication overheads. Furthermore, a distributed computing network may include straggling nodes that run intermittently slower. This results in a longer overall time needed to execute the computation tasks, thereby limiting the performance of distributed computing. To address these issues, coded distributed computing (CDC), i.e., a combination of coding-theoretic techniques and distributed computing, has recently been proposed as a promising solution. Coding-theoretic techniques have proved effective in WiFi and cellular systems for dealing with channel noise, and CDC may likewise significantly reduce the communication load, alleviate the effects of stragglers, and provide fault tolerance, privacy, and security. In this survey, we first introduce the fundamentals of CDC, followed by basic CDC schemes. Then, we review and analyze a number of CDC approaches proposed to reduce the communication costs, mitigate the straggler effects, and guarantee privacy and security. Furthermore, we present and discuss applications of CDC in modern computer networks. Finally, we highlight important challenges and promising research directions related to CDC.
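The multicast gain at the heart of CDC can be seen in a minimal sketch of the textbook coded MapReduce example: 3 nodes, 3 files, computation load r = 2, so every intermediate value is mapped (and therefore known) at two nodes, and one XOR broadcast can serve two nodes' missing values at once. All variable names and the byte-valued intermediates below are illustrative.

```python
# Toy coded-shuffle example: node k maps files {k, k+1 mod 3}, and node k
# must receive the intermediate v[k, n] for the one file n it did not map.
# Node 1 mapped files 1 and 2, so it knows v[2, 1] and v[3, 2] and can
# broadcast their XOR: a single packet that serves nodes 2 and 3 at once.

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# v[k, n]: intermediate value for reduce function k computed from file n.
v = {(k, n): bytes([10 * k + n] * 4) for k in (1, 2, 3) for n in (1, 2, 3)}

maps = {1: {1, 2}, 2: {2, 3}, 3: {3, 1}}   # files mapped by each node
need = {1: (1, 3), 2: (2, 1), 3: (3, 2)}   # (function, file) each node misses

packet = xor(v[2, 1], v[3, 2])             # node 1's coded broadcast

decoded_at_2 = xor(packet, v[3, 2])  # node 2 mapped file 2, knows v[3, 2]
decoded_at_3 = xor(packet, v[2, 1])  # node 3 mapped file 1, knows v[2, 1]

assert decoded_at_2 == v[2, 1] and decoded_at_3 == v[3, 2]
print("one coded broadcast replaced two uncoded unicasts")
```

This is the mechanism behind the "factor linearly proportional to the computation load" reduction: with replication factor r, each coded packet can serve r nodes simultaneously.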
We consider a wireless distributed computing system, in which multiple mobile users, connected wirelessly through an access point, collaborate to perform a computation task. In particular, users communicate with each other via the access point to exchange their locally computed intermediate computation results, which is known as data shuffling. We propose a scalable framework for this system, in which the required communication bandwidth for data shuffling does not increase with the number of users in the network. The key idea is to utilize a particular repetitive pattern of placing the data set (and thus a particular repetitive pattern of intermediate computations), in order to provide coding opportunities at both the users and the access point, which reduce the required uplink communication bandwidth from users to the access point and the downlink communication bandwidth from the access point to users by factors that grow linearly with the number of users. We also demonstrate that the proposed data set placement and coded shuffling schemes are optimal (i.e., achieve the minimum required shuffling load) for both a centralized setting and a decentralized setting, by developing tight information-theoretic lower bounds.
Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge, and techniques range from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. We present trends in DNN architectures and the resulting implications for parallelization strategies. We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning. We discuss asynchronous stochastic optimization, distributed system architectures, communication schemes, and neural architecture search. Based on those approaches, we extrapolate potential directions for parallelism in deep learning.
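The most common form of training concurrency surveyed here is data parallelism: each worker computes a gradient on its own data shard, and the shards' gradients are averaged (an all-reduce) before every parameter update. A minimal sketch, using a 1-D least-squares model in place of a DNN; the model, shard split, and learning rate are all illustrative.

```python
# Data-parallel SGD sketch: workers hold disjoint shards, compute local
# gradients, and average them (simulating an all-reduce) each step.

def grad(w, shard):
    """Gradient of the mean squared error 0.5*(w*x - y)**2 over one shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(grads):
    """Stand-in for an all-reduce: average the workers' gradients."""
    return sum(grads) / len(grads)

# Data following y = 2x, split across two "workers".
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w, lr = 0.0, 0.05
for _ in range(200):
    g = allreduce_mean([grad(w, s) for s in shards])
    w -= lr * g
print(round(w, 3))  # converges towards 2.0
```

Because the averaged gradient equals the gradient over the union of shards (for equal-sized shards), this synchronous scheme is statistically equivalent to large-batch sequential SGD; the asynchronous variants discussed in the survey relax exactly this equivalence for speed.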
Demonstrating the effect that climate change is having on regional weather is a subject which occupies climate scientists, government policy makers and the media. After an extreme weather event occurs, the question is often posed, ‘Was the event caused by anthropogenic climate change?’ Recently, a new branch of climate science (known as attribution) has sought to quantify how much the risk of extreme events occurring has increased or decreased due to climate change. One method of attribution uses very large ensembles of climate models computed via volunteer distributed computing. A recent advancement is the ability to run both a global climate model and a higher resolution regional climate model on a volunteer's home computer. Such a set‐up allows the simulation of weather on a scale that is of most use to studies of the attribution of extreme events. This article introduces a global climate model that has been developed to simulate the climatology of all major land regions with reasonable accuracy. This then provides the boundary conditions to a regional climate model (which uses the same formulation but at higher resolution) to ensure that it can produce realistic climate and weather over any region of choice. The development process is documented and a comparison to previous coupled climate models and atmosphere‐only climate models is made. The system (known as weather@home) by which the global model is coupled to a regional climate model and run on volunteers' home computers is then detailed. Finally, a validation of the whole system is performed, with a particular emphasis on how accurately the distributions of daily mean temperature and daily mean precipitation are modelled in a particular application over Europe. This builds confidence in the applicability of the weather@home system for event attribution studies.
In distributed computing systems, computation redundancy is used to mitigate the adverse effect of stragglers on the computation time. The redundancy can be added proactively at the beginning, or reactively after some time based on the delay pattern of the workers. While most existing work with a reactive mitigation strategy only considered task replication, we propose a coded reactive straggler mitigation with an uncoded and a coded phase for distributed matrix-matrix multiplications. Specifically, in the uncoded phase of the proposed strategy, the master distributes the computational job without redundancy among the workers. After a predetermined waiting time, the master cancels the remaining tasks. It then encodes the remaining tasks and distributes them among the workers. In the uncoded phase, in addition to the conventional erasure model, where workers can communicate only once, we consider a multi-message communication (MMC) model to exploit the partial work done by workers. The optimum waiting time for the uncoded phase and the optimum code rate for the coded phase are also obtained. Our simulation results demonstrate that the proposed coded reactive mitigation significantly decreases the execution time in comparison with both the proactive mitigation strategy and the existing reactive mitigation strategy.
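The core idea of the coded phase, adding an MDS-style parity block so that any sufficiently large subset of workers suffices and a straggler can simply be ignored, can be sketched for a matrix-vector product. The rate-2/3 code, block sizes, and names below are illustrative, not the paper's exact scheme (which targets matrix-matrix products).

```python
# Coded straggler mitigation sketch for A @ x: split A row-wise into
# A1, A2 and add a parity block A3 = A1 + A2 (a rate-2/3 MDS-style code).
# Any 2 of the 3 workers' results recover the full product.

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def add(u, w):
    return [a + b for a, b in zip(u, w)]

def sub(u, w):
    return [a - b for a, b in zip(u, w)]

A1 = [[1, 2], [3, 4]]
A2 = [[5, 6], [7, 8]]
A3 = [add(r1, r2) for r1, r2 in zip(A1, A2)]  # parity block A1 + A2
x = [1, 1]

# Each worker computes its block's product; worker 2 straggles, so the
# master uses workers 1 and 3 and decodes worker 2's block.
y1, y3 = matvec(A1, x), matvec(A3, x)
y2 = sub(y3, y1)            # (A1 + A2) @ x - A1 @ x  ==  A2 @ x
result = y1 + y2            # full A @ x, stacked row-wise
print(result)
```

The paper's reactive twist is that this encoding is applied only to the tasks still unfinished after the waiting time, rather than to the whole job up front.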
There is a growing demand for software developers who have experience writing parallel programs rather than just "parallelizing" sequential systems, as computer hardware becomes more and more parallel. In order to develop the skills of future software engineers, it is crucial to teach students parallelism in elementary computer science courses. We searched the Scopus database for articles on "teaching parallel and distributed computing" and "parallel programming" published in English between 2008 and 2019. After quality review, 26 papers were included in the study. The survey shows that a lab course using the C++ programming language and the MPI library serves as the primary teaching tool for parallel and distributed computing.
Existing research has concentrated on improving the reliability of a distributed computing system by optimizing task allocation, providing software redundancy, and providing hardware redundancy. None of these works considered a performance-sharing mechanism in a distributed computing system. Unlike other performance-sharing systems, whose reliability can be calculated directly, the reliability evaluation of a distributed computing system with performance sharing is more challenging, since the reliability depends on the task execution time of each processor after performance sharing. This research considers a distributed computing system with a performance-sharing mechanism such that the computing power can be redistributed among different processors in the system. A reliability model is proposed to evaluate the distributed computing system with performance sharing. An optimization model is formulated to derive the optimal performance-sharing policy such that the system reliability is maximized. Both analytic and numerical examples are presented to illustrate the proposed model and algorithm.