A Transfer Learning Approach to Object Detection Acceleration for Embedded Applications

Files

A_Transfer_Learning_Approach_to_Object_Detection_Acceleration_for_Embedded_Applications.pdf (1.28 MB)

Date

2021-08

Authors

Vance, Lauren M.

Language

American English

Committee Chair

Christopher, Lauren

Committee Members

King, Brian
Rizkalla, Maher

Degree

M.S.E.C.E.

Degree Year

2021

Department

Electrical & Computer Engineering

Grantor

Purdue University

Abstract

Deep learning solutions to computer vision tasks have revolutionized many industries in recent years, but embedded systems are too constrained to take advantage of current state-of-the-art configurations. Typical embedded processor hardware must meet very low power and memory budgets to keep packaging small and lightweight, and the architectures of the current best deep learning models are too computationally intensive for such hardware. Current research shows that convolutional neural networks (CNNs) can be deployed on Field-Programmable Gate Arrays (FPGAs) with a few architectural modifications, resulting in minimal loss of accuracy, similar or shorter processing times, and lower power consumption compared to general-purpose Central Processing Units (CPUs) and Graphics Processing Units (GPUs). This research extends these findings with the FPGA implementation of a YOLOv4 object detection model developed using transfer learning. The transfer-learned model starts from the weights of a model pre-trained on the MS-COCO dataset, then fine-tunes only the output layers to detect five more specific object classes. The model architecture was then modified slightly for compatibility with the FPGA hardware, using techniques such as weight quantization and replacement of unsupported activation layer types. The model was deployed on three hardware setups (CPU, GPU, and FPGA) for inference on a test set of 100 images. The FPGA achieved real-time inference at 33.77 frames per second, 7.74 frames per second faster than GPU deployment. It also consumed 96% less power than the GPU configuration, with only an approximately 4% average loss in accuracy across all five classes. The results are even more striking when compared to CPU deployment, over which the FPGA achieved a 131.7-times speedup in inference throughput.
CPUs have long been outperformed by GPUs for deep learning applications but are still used in most embedded systems. These results further illustrate the advantages of FPGAs for deep learning inference on embedded systems, even when transfer learning is used for an efficient end-to-end deployment process. This work advances the current state of the art with the implementation of a YOLOv4 object detection model developed with transfer learning for FPGA deployment.
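
The throughput figures quoted in the abstract can be related to one another with a short calculation. The sketch below is illustrative only and is not taken from the thesis. It assumes that the 7.74 frames-per-second figure is an absolute difference between FPGA and GPU throughput, and that the 131.7-times figure is the ratio of FPGA to CPU throughput. The function and variable names are hypothetical. Under those assumptions, the implied baselines are roughly 26 frames per second for the GPU and about a quarter of a frame per second for the CPU.

```python
# Figures quoted in the abstract.
FPGA_FPS = 33.77               # FPGA inference throughput, frames per second
FPGA_GAIN_OVER_GPU_FPS = 7.74  # assumed: absolute FPS difference, FPGA minus GPU
FPGA_SPEEDUP_OVER_CPU = 131.7  # assumed: throughput ratio, FPGA divided by CPU


def implied_baselines(fpga_fps, gain_over_gpu_fps, speedup_over_cpu):
    """Return the (GPU, CPU) throughputs implied by the quoted FPGA figures."""
    gpu_fps = fpga_fps - gain_over_gpu_fps  # difference interpretation
    cpu_fps = fpga_fps / speedup_over_cpu   # ratio interpretation
    return gpu_fps, cpu_fps


def latency_ms(fps):
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps


gpu_fps, cpu_fps = implied_baselines(
    FPGA_FPS, FPGA_GAIN_OVER_GPU_FPS, FPGA_SPEEDUP_OVER_CPU
)

for name, fps in (("FPGA", FPGA_FPS), ("GPU", gpu_fps), ("CPU", cpu_fps)):
    print(f"{name}: {fps:.2f} FPS, {latency_ms(fps):.1f} ms per frame")
```

These derived values are estimates that follow from the stated assumptions, not measurements reported in the record. The measured per-device figures are in the thesis itself.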

Description

Indiana University-Purdue University Indianapolis (IUPUI)

Keywords

deep learning, computer vision, object detection, embedded systems, FPGA, transfer learning

Rights

Attribution-NoDerivatives 4.0 International

Type

Thesis

Permanent Link

https://hdl.handle.net/1805/26436
http://dx.doi.org/10.7912/C2/62

Collections

Electrical & Computer Engineering Department Theses and Dissertations
