Debugging large-scale data science pipelines using dagger

E-viri

PDF

Recenzirano Odprti dostop

Debugging large-scale data science pipelines using dagger

Rezig, El Kindi; Brahmaroutu, Ashrita; Tatbul, Nesime; Ouzzani, Mourad; Tang, Nan; Mattson, Timothy; Madden, Samuel; Stonebraker, Michael

Proceedings of the VLDB Endowment, 08/2020, Letnik: 13, Številka: 12

Journal Article

Data pipelines are the new code. Consequently, data scientists need new tools to support the often time-consuming process of debugging their pipelines. We introduce Dagger , an end-to-end system to debug and mitigate data-centric errors in data pipelines, such as a data transformation gone wrong or a classifier underperforming due to noisy training data. Dagger supports inter-module debugging, where the pipeline blocks are treated as black boxes, as well as intra-module debugging, where users can debug data objects in Python scripts (e.g., DataFrames). In this demo, we will walk the audience through a rich, real-world business intelligence use case from our industrial collaborators at Intel, to highlight how Dagger enables data scientists to productively identify and mitigate data-centric problems at different stages of pipeline development.

Išči dalje

Avtor

Dostop do baze podatkov JCR je dovoljen samo uporabnikom iz Slovenije. Vaš trenutni IP-naslov ni na seznamu dovoljenih za dostop, zato je potrebna avtentikacija z ustreznim računom AAI.

Leto	Faktor vpliva		Izdaja		Kategorija		Razvrstitev
Leto	JCR	SNIP	JCR	SNIP	JCR	SNIP	JCR	SNIP

Povezave do osebnih bibliografij avtorjev	Povezave do podatkov o raziskovalcih v sistemu SICRIS

Vir: Osebne bibliografije in: SICRIS

Naloži sliko

Vnos na polico

Dodajanje gradiva na polico je uspelo.

Dodajanje gradiva na polico je spodletelo.

Dodajanje gradiva na polico ni bilo potrebno.

Trajna povezava

E-pošta

Faktor vpliva

Izberite knjižnično izkaznico:

Baze podatkov, v katerih je revija indeksirana

Citiranje

Tema