Treffer: Scanflow: an end-to-end agent-based autonomic ML workflow manager for clusters
Weitere Informationen
Machine Learning (ML) is more than just training models, the whole life-cycle must be considered. Once deployed, a ML model needs to be constantly managed, supervised and debugged to guarantee its availability, validity and robustness in dynamic contexts. This demonstration presents an agent-based ML workflow manager so-called Scanflow1, which enables autonomic management and supervision of the end-to-end life-cycle of ML workflows on distributed clusters. The case study on a MNIST project2 shows that different teams can collaborate using Scanflow within a ML project at different phases, and the effectiveness of agents to maintain the model accuracy and throughput of the model serving while running in production. ; This work was partially supported by Lenovo as part of LenovoBSC 2020 collaboration agreement, by the Spanish Government under contract PID2019-107255GB-C22, and by the Generalitat de Catalunya under contract 2017-SGR-1414 and under grant 2020 FI-B 00257. ; Postprint (published version)