Peer-reviewed · Open Access
  • Improving Data-Analytics Performance...
    Lee, Gil Jae; Fortes, José A. B.

    ACM Transactions on Autonomous and Adaptive Systems, 03/2019, Volume: 13, Issue: 3
    Journal Article

    Many big-data processing jobs use data-analytics frameworks such as Apache Hadoop (whose resource-management layer is now known as YARN). Such frameworks have tunable configuration parameters set by experienced system administrators and/or job developers. However, tuning parameters manually can be hard and time-consuming because it requires domain-specific knowledge and an understanding of complex inter-dependencies among parameters. Most frameworks seek efficient resource management by assigning resource units to jobs, with the maximum number of units allowed in a system being part of its static configuration. This static resource management has limited effectiveness in coping with job diversity and workload dynamics, even in the case of a single job. The work reported in this article seeks to improve performance (e.g., multi-job makespan and job completion time) without modifying either the framework or the applications, while avoiding the problems of previous self-tuning approaches based on performance models or resource usage. These problems include (1) the need for time-consuming training, typically offline, and (2) unsuitability for multi-job/multi-tenant environments. This article proposes a hierarchical self-tuning approach using (1) a fuzzy-logic controller to dynamically adjust the maximum number of concurrent jobs and (2) additional controllers (one for each cluster node) to adjust the maximum number of resource units assigned to jobs on each node. The fuzzy-logic controller uses fuzzy rules based on a concave-downward relationship between aggregate CPU usage and the number of concurrent jobs. The other controllers use a heuristic algorithm to adjust the number of resource units on the basis of both CPU and disk I/O usage by jobs. To manage the maximum number of available resource units in each node, the controllers also take resource usage by other processes (e.g., system processes) into account. A prototype of the approach was implemented for Apache Hadoop on a cluster running on CloudLab. The approach was demonstrated and evaluated with workloads composed of jobs with similar resource-usage patterns, as well as with realistic mixed-pattern workloads synthesized by SWIM, a statistical workload injector for MapReduce. The evaluation shows that the proposed approach yields up to a 48% reduction in job makespan relative to Hadoop's default settings.
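
    To make the first control level concrete, the following is a minimal Python sketch of a fuzzy-logic controller that maps aggregate CPU usage to a bounded adjustment of the maximum number of concurrent jobs. The class name, membership-function breakpoints, and rule consequents are illustrative assumptions; the abstract states only that the rules exploit a concave-downward relationship between aggregate CPU usage and the number of concurrent jobs.

    ```python
    def triangular(x, a, b, c):
        """Triangular membership function rising from a, peaking at b, falling to c."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    class FuzzyConcurrencyController:
        """Maps aggregate CPU usage (0..1) to an adjustment of the maximum
        number of concurrent jobs. The rules follow the abstract's
        concave-downward intuition: under-utilization admits more jobs,
        near-optimal usage holds, and saturation backs off."""

        def __init__(self, min_jobs=1, max_jobs=64):
            self.min_jobs = min_jobs
            self.max_jobs = max_jobs

        def step(self, cpu_usage, current_limit):
            # Fuzzify: membership degrees for LOW / OPTIMAL / HIGH usage
            # (breakpoints are assumed, not taken from the paper).
            low = triangular(cpu_usage, -0.4, 0.0, 0.7)
            optimal = triangular(cpu_usage, 0.5, 0.75, 1.0)
            high = triangular(cpu_usage, 0.8, 1.0, 1.4)
            total = low + optimal + high
            if total == 0.0:
                return current_limit
            # Assumed rule consequents: LOW -> +2 jobs, OPTIMAL -> 0, HIGH -> -2.
            # Defuzzify with a weighted average of the consequents.
            delta = (low * 2.0 + optimal * 0.0 + high * -2.0) / total
            new_limit = round(current_limit + delta)
            return max(self.min_jobs, min(self.max_jobs, new_limit))
    ```

    In a real deployment the step would run periodically against the resource manager's metrics; the sketch is kept pure so each control step is easy to test in isolation.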
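
    The per-node controllers are described only at a high level (a heuristic over CPU and disk I/O usage by jobs that also accounts for non-framework processes), so the sketch below is one plausible reading; the thresholds, step size, and function name are assumptions.

    ```python
    def adjust_node_units(current_units, job_cpu, other_cpu, disk_util,
                          step=1, cpu_target=0.85, disk_target=0.90,
                          min_units=1, max_units=32):
        """Return a new resource-unit cap for one node (all usage inputs in 0..1).

        job_cpu   -- CPU fraction consumed by framework jobs
        other_cpu -- CPU fraction consumed by non-framework processes,
                     so capacity used by system daemons is respected
        disk_util -- disk I/O utilization
        """
        total_cpu = job_cpu + other_cpu
        if total_cpu > cpu_target or disk_util > disk_target:
            # Either resource is contended: release a unit.
            return max(min_units, current_units - step)
        if total_cpu < cpu_target - 0.15 and disk_util < disk_target - 0.15:
            # Clear headroom on both CPU and disk: grant a unit.
            return min(max_units, current_units + step)
        return current_units  # within the target band: hold steady
    ```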