Akademska digitalna zbirka SLovenije - logo
E-viri
Celotno besedilo
Recenzirano
  • SHREG: Mitigating register ...
    Jin, Seunghyun; Lee, Hyunwuk; Lee, Jonghyun; Kim, Junsung; Ro, Won Woo

    Journal of systems architecture, July 2024, 2024-07-00, Letnik: 152
    Journal Article

    Graphics Processing Units (GPUs) have become dominant accelerators for Machine Learning (ML) and High-Performance Computing (HPC) applications due to their massive parallelism capabilities, through the utilization of general matrix-to-matrix multiplication (GEMM) kernels. However, GEMM kernels often suffer from duplicated memory requests, mainly caused by matrix tiling used for handling large matrices. While GPUs have adopted programmable shared memory to mitigate this issue by preserving frequently reused data in shared memory, GEMM still introduces duplication in register files. Our observations show that the matrix tiling issues memory requests to the same shared memory address for neighboring threads, and this results in a substantial increase in the number of duplicated data in the register files. Such duplication degrades GPU performance by limiting warp-level parallelism due to the register shortage and redundant memory requests to shared memory. We find that the data duplication can be categorized into two types that occur with fixed patterns during the matrix tiling. Based on these observations, we introduce SHREG, an architecture design that enables different threads to share registers for overlapped data from shared memory, effectively reducing duplicated data within the register files. By leveraging the duplication patterns, SHREG utilizes register sharing and improves performance with minimal hardware overhead. Our evaluation shows that SHREG improves performance by 31.4% on various ML applications over the baseline GPU.