| Abstract: | Computer vision has made massive progress in recent years, thanks to advances in hardware and algorithms. However, most methods are performance-driven and give little consideration to energy efficiency. This dissertation proposes methods that improve computational efficiency for three vision tasks: ultra-high-resolution image segmentation, optical character recognition (OCR) for videos captured by Unmanned Aerial Vehicles (UAVs), and multiple object detection for UAV-based videos. The distribution of patterns in ultra-high-resolution images is usually unbalanced: while some parts of an image contain complex, fine-grained patterns such as boundaries, most areas are composed of simple, repeated patterns. In the first chapter, we propose learning a skip map that guides a segmentation network to skip simple patterns and thereby reduce computational complexity. Specifically, the skip map highlights simple-pattern areas that can be down-sampled and processed at a lower resolution, while the remaining complex parts are still segmented at the original resolution. Applied to GLNet, a state-of-the-art ultra-high-resolution image segmentation network, the proposed skip map saves more than 30% of the computation while maintaining comparable segmentation performance. In the second chapter, we propose an end-to-end OCR framework for UAV-based videos. We first revisit RCNN's crop-and-resize training strategy and empirically find that it outperforms aligned RoI sampling on a real-world video text dataset captured by UAVs. We further propose a multi-stage image processor that takes the redundancy, continuity, and mixed degradation of videos into account to reduce energy consumption. Lastly, the model is pruned and quantized before being deployed on a Raspberry Pi. Our proposed energy-efficient video text spotting solution, dubbed E²VTS, outperforms all previous methods, achieving a competitive tradeoff between energy efficiency and performance. 
In the last chapter, we propose an energy-efficient solution for multiple object detection in videos. Besides designing a fast multiple object detector, we propose a data synthesis method and a knowledge-transfer-based annotation method to overcome class imbalance and domain gap issues. This solution was entered in the LPCVC 2021 UAV challenge and won first place. The electronic version of this dissertation is accessible from https://hdl.handle.net/1969.1/197206 |
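To make the skip-map idea from the first chapter concrete, the sketch below routes image patches to either a low-resolution or a full-resolution branch and estimates the resulting compute relative to full-resolution processing. It is a minimal illustration, not the dissertation's implementation: the per-patch binary map format, the names `route_patches` and `relative_cost`, the down-sampling factor `scale`, and the assumption that cost scales with the number of pixels processed are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical per-patch skip map: 1 = simple pattern (down-sample), 0 = complex (keep full resolution).
SkipMap = List[List[int]]
Coords = List[Tuple[int, int]]


def route_patches(skip_map: SkipMap) -> Tuple[Coords, Coords]:
    """Split patch coordinates into a low-resolution branch and a full-resolution branch."""
    low_res: Coords = []
    full_res: Coords = []
    for i, row in enumerate(skip_map):
        for j, flag in enumerate(row):
            # Simple-pattern patches go to the cheap branch; complex ones stay at full resolution.
            (low_res if flag else full_res).append((i, j))
    return low_res, full_res


def relative_cost(skip_map: SkipMap, scale: int = 2) -> float:
    """Estimate compute relative to segmenting every patch at full resolution.

    Assumes cost is proportional to pixels processed, so a patch down-sampled
    by `scale` along each side costs 1 / scale**2 of a full-resolution patch.
    """
    if scale < 1:
        raise ValueError("scale must be >= 1")
    low_res, full_res = route_patches(skip_map)
    total = len(low_res) + len(full_res)
    if total == 0:
        raise ValueError("skip_map is empty")
    return (len(full_res) + len(low_res) / scale**2) / total


if __name__ == "__main__":
    # Toy map: most patches are simple background, a few contain boundaries.
    example = [
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
    ]
    cost = relative_cost(example, scale=2)
    print(f"relative cost: {cost:.3f}, saving: {1 - cost:.1%}")
```

The saving printed for this toy map depends entirely on the made-up map and `scale`; it is not a result from the dissertation. The design choice it illustrates is that the more patches the skip map marks as simple, the more of the image is handled by the cheaper low-resolution path, while boundary-rich patches keep their full-resolution treatment.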