Author : Ajay Kumar Gautam
Date of Publication : 5th June 2023
Abstract: In recent years, FPGA-based convolutional neural network (CNN) accelerators have attracted considerable attention, primarily because they offer greater energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to outperform their GPU counterparts in terms of throughput. In this paper, we show that FPGA-based acceleration of a CNN trained with binarized weights and activations can be preferable in terms of both throughput and energy efficiency. An efficient, fully mapped FPGA accelerator architecture with deep pipeline stages and layer normalization is presented for operation on small batch sizes. Whereas the performance of GPU acceleration depends considerably on the size of the data batch being processed, the performance of the FPGA accelerator is not considerably affected by batch size. According to test results, the proposed binarized CNN (BCNN) architecture running on a Virtex-7 FPGA processes individual requests in small batch sizes 8.3 times faster and 75 times more energy-efficiently than a Titan X GPU.
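The abstract does not say how the binarized weights and activations are evaluated in hardware. As an illustration only, the sketch below assumes the XNOR-and-popcount formulation commonly used for binarized networks, in which each +1/-1 value is stored as a single bit and a dot product reduces to counting matching bits. The function names (binarize, pack_bits, xnor_popcount_dot) are hypothetical and are not taken from the paper.

```python
def binarize(values):
    """Map real values to +1/-1 by sign (assumption: 0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into an int: bit i is 1 for +1 and 0 for -1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, w_word, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the two signs agree. Each agreement
    contributes +1 and each disagreement -1, so the result is
    matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_word ^ w_word) & mask
    matches = bin(agree).count("1")
    return 2 * matches - n


# Illustrative values only, not data from the paper.
activations = binarize([0.7, -1.2, 0.1, -0.3])
weights = binarize([-0.5, -0.9, 0.4, 0.2])
result = xnor_popcount_dot(pack_bits(activations), pack_bits(weights), 4)
reference = sum(a * w for a, w in zip(activations, weights))
```

Here result and reference agree, which shows that the bitwise form computes the same value as the multiply-accumulate form. Replacing multipliers with single-bit logic in this way is one plausible reason a binarized network maps well onto FPGA fabric, although the architecture in the paper may differ in its details.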
Reference :