Efficient Inference using Deep Convolutional Neural Networks on Resource-Constrained Platforms
نام عام مواد
[Thesis]
نام نخستين پديدآور
Motamedi, Mohammad
نام ساير پديدآوران
Ghiasi, Soheil
وضعیت نشر و پخش و غیره
نام ناشر، پخش کننده و غيره
University of California, Davis
تاریخ نشرو بخش و غیره
2019
مشخصات ظاهری
نام خاص و کميت اثر
123
یادداشتهای مربوط به پایان نامه ها
جزئيات پايان نامه و نوع درجه آن
Ph.D.
کسي که مدرک را اعطا کرده
University of California, Davis
امتياز متن
2019
یادداشتهای مربوط به خلاصه یا چکیده
متن يادداشت
Deep Convolutional Neural Networks (CNNs) exhibit remarkable performance in many pattern recognition, segmentation, classification, and comprehension tasks that were widely considered open problems for most of the computing history. For example, CNNs are shown to outperform humans in certain visual object recognition tasks. Given the significant potential of CNNs in advancing autonomy and intelligence in systems, the Internet of Things (IoT) research community has witnessed a surge in demand for CNN-enabled data processing, technically referred to as inference, for critical tasks, such as visual, voice and language comprehension. Inference using modern CNNs involves billions of operations on millions of parameters, and thus their deployment requires significant compute, storage, and energy resources. However, such resources are scarce in many resource-constrained IoT applications. Designing an efficient CNN architecture is the first step in alleviating this problem. Use of asymmetric kernels, breadth control techniques, and reduce-expand structures are among the most important approaches that can effectively decrease CNNs parameter budget and their computational intensity. The architectural efficiency can be further improved by eliminating ineffective neurons using pruning algorithms, and quantizing the parameters to decrease the model size. Hardware-driven optimization is the subsequent step in addressing the computational demands of deep neural networks. Mobile System on Chips (SoCs), which usually include a mobile GPU, a DSP, and a number of CPU cores, are great candidates for CNN inference on embedded platforms. Depending on the application, it is also possible to develop customized FPGA-based and ASIC-based accelerators. ASIC-based acceleration drastically outperforms other approaches in terms of both power consumption and execution time. However, using this approach is reasonable only if designing a new chip is economically justifiable for the target application. This dissertation aims to bridge the gap between computational demands of CNNs and computational capabilities of embedded platforms. We contend that one has to strike a judicious balance between functional requirements of a CNN, and its resource requirements, for an IoT application to be able to utilize the CNN. We investigate several concrete formulations of this broad concept, and propose effective approaches for addressing the identified challenges. First, we target platforms that are equipped with reconfigurable fabric, such as Field Programmable Gate Arrays (FPGA), and offer a framework for generation of optimized FPGA-based CNN accelerators. Our solution leverages an analytical approach to characterization and exploration of the accelerator design space through which, it synthesizes an efficient accelerator for a given CNN on a specific FPGA. Second, we investigate the problem of CNN inference on mobile SoCs, propose effective approaches for CNN parallelization targeting such platforms, and explore the underlying tradeoffs. Finally, in the last part of this dissertation, we investigate utilization of an existing optimized CNN model to automatically generate a competitive CNN for an IoT application whose objects of interest are a fraction of categories that the original CNN was designed to classify, such that the resource requirement of inference using the synthesized CNN is proportionally scaled down. We use the term resource scalability to refer to this concept and propose solutions for automated synthesis of context-aware, resource-scalable CNNs that meet the functional requirements of the target IoT application at fraction of the resource requirements of the original CNN.
موضوع (اسم عام یاعبارت اسمی عام)
موضوع مستند نشده
Computer engineering
موضوع مستند نشده
Electrical engineering
نام شخص به منزله سر شناسه - (مسئولیت معنوی درجه اول )