Result: Batched sparse direct solver design and evaluation in SuperLU_DIST

Title:
Batched sparse direct solver design and evaluation in SuperLU_DIST
Source:
The International Journal of High Performance Computing Applications, vol 38, iss 6
Publisher Information:
eScholarship, University of California
Publication Year:
2024
Collection:
University of California: eScholarship
Document Type:
Academic journal: article in journal/newspaper
File Description:
application/pdf
Language:
unknown
DOI:
10.1177/10943420241268200
Rights:
CC-BY
Accession Number:
edsbas.887FB0FC
Database:
BASE

Further Information

Over the course of interactions with various application teams, the need for batched sparse linear algebra functions has emerged in order to make more efficient use of GPUs for many small, sparse linear algebra problems. In this paper, we present our recent work on a batched sparse direct solver for GPUs. The sparse LU factorization is computed level by level over the elimination tree, leveraging batched dense operations at each level and a new batched Scatter GPU kernel. The sparse triangular solve is computed by the level sets of the directed acyclic graph (DAG) of the triangular matrix. Batched operations overcome the large overhead associated with launching many small kernels. For medium-sized matrix batches with not-so-small bandwidth, using an NVIDIA A100 GPU, our new batched sparse direct solver is orders of magnitude faster than a batched banded solver and uses less than one-tenth of the memory.
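The level-set idea behind the triangular solve can be illustrated with a small serial sketch. This is not the SuperLU_DIST GPU implementation; the function names `level_sets` and `solve_lower_by_levels`, the dict-of-rows storage, and the 4x4 example matrix are all assumptions made for illustration. Rows that share a level have no dependencies on one another, so a GPU code could, in principle, process each level with a single batched launch instead of one small kernel per row.

```python
def level_sets(rows):
    """Group the rows of a lower-triangular matrix into dependency levels.

    rows[i] is a dict {j: L_ij} holding the nonzeros of row i, including
    the diagonal entry j == i (hypothetical storage chosen for brevity).
    """
    n = len(rows)
    level = [0] * n
    for i in range(n):
        # Row i depends on every earlier row j with a nonzero L_ij.
        deps = [level[j] for j in rows[i] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    sets = [[] for _ in range(max(level) + 1)] if n else []
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def solve_lower_by_levels(rows, b):
    """Solve L x = b by sweeping the level sets in order."""
    x = [0.0] * len(rows)
    for rows_in_level in level_sets(rows):
        # Rows within one level are mutually independent; this inner loop
        # is the part a batched GPU kernel would execute in parallel.
        for i in rows_in_level:
            s = sum(v * x[j] for j, v in rows[i].items() if j < i)
            x[i] = (b[i] - s) / rows[i][i]
    return x


# Illustrative 4x4 lower-triangular system (made-up values).
rows = [
    {0: 2.0},
    {1: 4.0},
    {0: 1.0, 2: 1.0},
    {1: 2.0, 2: 3.0, 3: 5.0},
]
b = [2.0, 8.0, 4.0, 18.0]

print(level_sets(rows))                # [[0, 1], [2], [3]]
print(solve_lower_by_levels(rows, b))  # [1.0, 2.0, 3.0, 1.0]
```

In this example rows 0 and 1 form the first level, row 2 the second, and row 3 the third, so three sweeps replace four sequential row solves. The saving grows with the width of the levels, and for a batch of matrices sharing one sparsity pattern the same level schedule could be reused across the whole batch.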