UNI-MB - logo
UMNIK - logo
 
E-viri
Celotno besedilo
  • A, Ullas; Nasre, Rupesh; Govindarajan, R

    2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC), 2023-Dec.-18
    Conference Proceeding

    Designing parallel graph algorithms on GPUs has been challenging. We observe three limitations with the existing work. First, algorithms often rely on only one of the strategies to propagate information: push or pull. We observe that neither is an optimal choice in many cases. Second, the cost of updating the underlying data structures per iteration is high. This results in a significant performance overhead. Third, considering the in-herent irregularity of graph processing, one-size-fits-all approach is too rigid for different types of graphs. In this work, we address these shortcomings by improving the processing of an existing graph framework, Subway. In particular, we propose a novel technique in terms of amalgamating the two propagation strategies (push and pull) into a hybrid traversal strategy. In this, the vertices of the graph propagate their information by pulling the information from the neighbours, performing a local computation, and subsequently pushing the result to all the neighbours, all within an iteration. We propose to reuse the SubCSR structure in Subway across a few iterations to significantly reduce the computational overhead, but without compromising the correctness or efficiency of the algorithm. Furthermore, we explore heuristics on when to use push, pull, or hybrid traversal strategies. We illustrate the effectiveness of our three-pronged approach by applying it to four popular graph algorithms: Connected Components (CC), Single-Source Shortest Path (SSSP), Breadth First Search (BFS) and Page Rank (PR) on an NVIDIA GeForce RTX 3060 GPU. Our extensive experimental evaluation on GeForce RTX 3060 GPU reveals that the proposed hybrid approach with adaptive heuristics and approximate subCSR computation is effective in reducing the execution time of CC, SSSP, and PR by 31 %, 7.56%, and 6.43% respectively, compared to the minimum of push or pull algorithm that uses subCSR structure.