University of Twente Student Theses
Applying compress techniques to lower latency on real time object detection application in Android device.
Tran, Duc Duc (2023) Applying compress techniques to lower latency on real time object detection application in Android device.
Abstract: | Deep neural networks have achieved promising results in object detection tasks. However, state-of-the-art networks are computationally expensive because of their large number of parameters, which makes them inefficient to deploy on hardware-constrained systems such as mobile phones or edge devices. To address this, model compression approaches such as pruning and quantization have been shown to reduce model complexity at a low cost in performance. This work investigates how these compression techniques can be applied to object detection models so that they can run on edge devices. We first explore the state-of-the-art object detection model MobileNetv2-SSD, then use low-magnitude pruning to remove redundant parameters from the model. We then convert the pruned model to TensorFlow Lite format with post-training quantization and deploy it on Android devices to evaluate its latency and accuracy on object detection tasks. The final model runs on Google Glass Enterprise Edition 2 at more than 10 FPS with 72% of its parameters pruned and integer quantization applied, without significant loss in accuracy. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/95943 |
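The two compression steps named in the abstract, low-magnitude pruning and integer quantization, can be illustrated with a minimal sketch. This is not the thesis's implementation, which according to the abstract relies on TensorFlow Lite tooling. It is a dependency-free illustration of the underlying arithmetic on a flat list of weights. The function names, the toy weight values, and the per-tensor affine int8 scheme are assumptions made for the example; only the 72% sparsity target is taken from the abstract.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(values):
    """Affine (scale and zero-point) quantization of floats to the int8 range."""
    # The range is widened to include 0 so that pruned weights stay exactly zero.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all values are zero
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [scale * (x - zero_point) for x in q]


# Toy example (hypothetical weights; 0.72 mirrors the sparsity in the abstract).
weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.02, 0.6, 0.04, -0.1]
pruned = magnitude_prune(weights, 0.72)
q, scale, zero_point = quantize_int8(pruned)
restored = dequantize(q, scale, zero_point)
```

In this sketch the pruned weights remain exactly zero after a quantize and dequantize round trip, because zero maps onto the integer zero point. The surviving weights are recovered to within about one quantization step (`scale`).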