Result: GradeML: Towards holistic performance analysis for machine learning workflows

Title:

GradeML: Towards holistic performance analysis for machine learning workflows

Authors:

Hegeman, Tim, Jansen, Matthijs, Iosup, Alexandru, Trivedi, Animesh

Source:

Hegeman, T, Jansen, M, Iosup, A & Trivedi, A 2021, GradeML: Towards holistic performance analysis for machine learning workflows. in ICPE 2021: Companion of the ACM/SPEC International Conference on Performance Engineering. Association for Computing Machinery, Inc, pp. 57-63, 2021 ACM/SPEC International Conference on Performance Engineering, ICPE 2021, Virtual, Online, France, 19/04/21. https://doi.org/10.1145/3447545.3451185

Publisher Information:

Association for Computing Machinery, Inc

Publication Year:

2021

Subject Terms:

Data gathering, GradeML, Machine learning workflow, MLDevOps, Modeling, Performance analysis, SDG 12 - Responsible Consumption and Production

Document Type:

Academic journal; article in journal/newspaper

Language:

English

ISBN:

978-1-4503-8331-8
1-4503-8331-9

Relation:

info:eu-repo/semantics/altIdentifier/hdl/https://hdl.handle.net/1871.1/e2a2a21e-f458-4f86-b6bd-2ca530784ac5; info:eu-repo/semantics/altIdentifier/isbn/9781450383318; urn:ISBN:9781450383318

DOI:

10.1145/3447545.3451185

Availability:

https://research.vu.nl/en/publications/e2a2a21e-f458-4f86-b6bd-2ca530784ac5
https://doi.org/10.1145/3447545.3451185
https://hdl.handle.net/1871.1/e2a2a21e-f458-4f86-b6bd-2ca530784ac5
https://www.scopus.com/pages/publications/85104932960
https://www.scopus.com/inward/citedby.url?scp=85104932960&partnerID=8YFLogxK

Rights:

info:eu-repo/semantics/openAccess

Accession Number:

edsbas.321A4F59

Database:

BASE

Further information

Today, machine learning (ML) workloads are nearly ubiquitous. Over the past decade, much effort has been put into making ML model-training fast and efficient, e.g., by proposing new ML frameworks (such as TensorFlow, PyTorch), leveraging hardware support (TPUs, GPUs, FPGAs), and implementing new execution models (pipelines, distributed training). Matching this trend, considerable effort has also been put into performance analysis tools focusing on ML model-training. However, as we identify in this work, ML model training rarely happens in isolation and is instead one step in a larger ML workflow. Therefore, it is surprising that there exists no performance analysis tool that covers the entire life-cycle of ML workflows. Addressing this large conceptual gap, we envision in this work a holistic performance analysis tool for ML workflows. We analyze the state-of-practice and the state-of-the-art, presenting quantitative evidence about the performance of existing performance tools. We formulate our vision for holistic performance analysis of ML workflows along four design pillars: a unified execution model, lightweight collection of performance data, efficient data aggregation and presentation, and close integration in ML systems. Finally, we propose first steps towards implementing our vision as GradeML, a holistic performance analysis tool for ML workflows. Our preliminary work and experiments are open source at https://github.com/atlarge-research/grademl.
