Efficient Processing of Deep Neural Networks. Vivienne Sze

Series: Synthesis Lectures on Computer Architecture
ISBN: 9781681738338




partial sums after they have gone through a nonlinear function (i.e., the output activations).

       PART II

       Design of Hardware for Processing DNNs

      CHAPTER 3

       Key Metrics and Design Objectives

      Over the past few years, there has been a significant amount of research on efficient processing of DNNs. Accordingly, it is important to discuss the key metrics that one should consider when comparing the strengths and weaknesses of different designs and proposed techniques, and that should be incorporated into design considerations. While efficiency is often associated only with the number of operations per second per Watt (e.g., floating-point operations per second per Watt, FLOPS/W, or tera-operations per second per Watt, TOPS/W), it actually comprises many more metrics, including accuracy, throughput, latency, energy consumption, power consumption, cost, flexibility, and scalability. Reporting a comprehensive set of these metrics is important in order to provide a complete picture of the trade-offs made by a proposed design or technique.
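
      As a minimal sketch of the operations-per-second-per-Watt figure mentioned above, the following Python snippet derives it from one measured run. The function name and the measurement values are hypothetical and chosen only for illustration. Because a Watt is a joule per second, the figure reduces to operations per joule.

```python
def ops_per_second_per_watt(num_ops: float, runtime_s: float, energy_j: float) -> float:
    """Return operations per second per Watt for one measured run."""
    if runtime_s <= 0 or energy_j <= 0:
        raise ValueError("runtime and energy must be positive")
    throughput = num_ops / runtime_s   # operations per second
    avg_power = energy_j / runtime_s   # Watts (joules per second)
    return throughput / avg_power      # same value as num_ops / energy_j


# Hypothetical measurement: 2e12 operations in 0.5 s using 4 J of energy.
efficiency = ops_per_second_per_watt(2e12, 0.5, 4.0)
tops_per_watt = efficiency / 1e12      # 0.5 TOPS/W for these made-up numbers
```

      A single number of this kind says nothing about accuracy, latency, cost, or flexibility, which is why the remaining metrics need to be reported alongside it.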

      In this chapter, we will

      • discuss the importance of each of these metrics;

      • break down the factors that affect each metric and, when feasible, present equations that describe the relationship between the factors and the metrics;

      • describe how these metrics can be incorporated into design considerations for both the DNN hardware and the DNN model (i.e., workload); and

      • specify what should be reported for a given metric to enable proper evaluation.

      Finally, we will provide a case study on how one might bring all these metrics together for a holistic evaluation of a given approach. But first, we will discuss each of the metrics.

      Accuracy is used to indicate the quality of the result for a given task. The fact that DNNs can achieve state-of-the-art accuracy on a wide range of tasks is one of the key reasons driving the popularity and wide use of DNNs today. The units used to measure accuracy depend on the task. For instance, for image classification, accuracy is reported as the percentage of correctly classified images, while for object detection, accuracy is reported as the mean average precision (mAP), which is related to the trade-off between the true positive rate and false positive rate.
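
      For the image classification case, the percentage of correctly classified images can be computed as in the sketch below. The function name and the sample predictions and labels are hypothetical; the example assumes one predicted class index and one ground-truth class index per image.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[int], labels: Sequence[int]) -> float:
    """Return the percentage of images whose predicted class matches the label."""
    if len(predicted) != len(labels):
        raise ValueError("predicted and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one image is required")
    correct = sum(p == y for p, y in zip(predicted, labels))
    return 100.0 * correct / len(labels)


# Hypothetical results for five images: four of the five predictions are correct.
predicted = [3, 1, 2, 2, 0]
labels = [3, 1, 0, 2, 0]
accuracy = top1_accuracy(predicted, labels)  # 80.0 (percent)
```

      Object detection metrics such as mAP require ranking detections by confidence and matching them to ground-truth boxes, so they do not reduce to a simple count like this one.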

      Achieving high accuracy on difficult tasks or datasets typically requires more complex DNN models (e.g., a larger number