Peer reviewed, Open access
  • CARLA: A Convolution Accele...
    Ahmadi, Mehdi; Vakili, Shervin; Langlois, J. M. Pierre

    IEEE transactions on circuits and systems. I, Regular papers, 08/2021, Volume: 68, Issue: 8
    Journal Article

    Convolutional Neural Networks (CNNs) have proven to be extremely accurate for image recognition, even outperforming human recognition capability. When deployed on battery-powered mobile devices, efficient computer architectures are required to enable fast and energy-efficient computation of costly convolution operations. Despite recent advances in hardware accelerator design for CNNs, two major problems have not yet been addressed effectively, particularly when the convolution layers have highly diverse structures: (1) minimizing energy-hungry off-chip DRAM data movements; (2) maximizing the utilization factor of processing resources to perform convolutions. This work thus proposes an energy-efficient architecture equipped with several optimized dataflows to support the structural diversity of modern CNNs. The proposed approach is evaluated on convolutional layers of VGGNet-16 and ResNet-50. Results show that the architecture achieves a Processing Element (PE) utilization factor of 98% for the majority of 3×3 and 1×1 convolutional layers, while limiting latency to 396.9 ms and 92.7 ms when performing convolutional layers of VGGNet-16 and ResNet-50, respectively. In addition, the proposed architecture benefits from the structured sparsity in ResNet-50 to reduce the latency to 42.5 ms when half of the channels are pruned.