Customizable Computing. Yu-Ting Chen

Title Customizable Computing
Author Yu-Ting Chen
Genre Programs
Series Synthesis Lectures on Computer Architecture
Publisher Programs
Year of publication 0
ISBN 9781627059640




SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE

      Lecture #33

      Series Editor: Margaret Martonosi, Princeton University

      Series ISSN

      Print 1935-3235 Electronic 1935-3243

       Customizable Computing

      Yu-Ting Chen, Jason Cong, Michael Gill, Glenn Reinman, and Bingjun Xiao

      University of California, Los Angeles


       ABSTRACT

      Since the end of Dennard scaling in the early 2000s, improving the energy efficiency of computation has been the main concern of the research community and industry. The large energy efficiency gap between general-purpose processors and application-specific integrated circuits (ASICs) motivates the exploration of customizable architectures, where one can adapt the architecture to the workload. In this Synthesis lecture, we present an overview of, and introduction to, recent developments in energy-efficient customizable architectures, including customizable cores and accelerators, on-chip memory customization, and interconnect optimization. In addition to discussing the general techniques and classifying the different approaches used in each area, we highlight and illustrate some of the most successful design examples in each category and discuss their impact on performance and energy efficiency. We hope that this work captures the state of the art in research and development on customizable architectures and serves as a useful reference for further research, design, and implementation for large-scale deployment in future computing systems.

       KEYWORDS

      accelerator architectures, memory architecture, multiprocessor interconnection, parallel architectures, reconfigurable architectures, memory, green computing

       Contents

       Acknowledgments

       1 Introduction

       2 Road Map

       2.1 Customizable System-On-Chip Design

       2.1.1 Compute Resources

       2.1.2 On-Chip Memory Hierarchy

       2.1.3 Network-On-Chip

       2.2 Software Layer

       3 Customization of Cores

       3.1 Introduction

       3.2 Dynamic Core Scaling and Defeaturing

       3.3 Core Fusion

       3.4 Customized Instruction Set Extensions

       3.4.1 Vector Instructions

       3.4.2 Custom Compute Engines

       3.4.3 Reconfigurable Instruction Sets

       3.4.4 Compiler Support for Custom Instructions

       4 Loosely Coupled Compute Engines

       4.1 Introduction

       4.2 Loosely Coupled Accelerators

       4.2.1 Wire-Speed Processor

       4.2.2 Comparing Hardware and Software LCA Management

       4.2.3 Utilizing LCAs

       4.3 Accelerators using Field Programmable Gate Arrays

       4.4 Coarse-Grain Reconfigurable Arrays

       4.4.1 Static Mapping

       4.4.2 Run-Time Mapping

       4.4.3 CHARM

       4.4.4 Using Composable Accelerators

       5 On-Chip Memory Customization

       5.1 Introduction

       5.1.1 Caches and Buffers (Scratchpads)

       5.1.2 On-Chip Memory System Customizations

       5.2 CPU Cache Customizations

       5.2.1 Coarse-Grain Customization Strategies

       5.2.2 Fine-Grain Customization Strategies

       5.3 Buffers for Accelerator-Rich Architectures

       5.3.1 Shared Buffer System Design for Accelerators

       5.3.2 Customization of Buffers Inside an Accelerator

       5.4 Providing Buffers in Caches for CPUs and Accelerators

       5.4.1 Providing Software-Managed Scratchpads for CPUs

       5.4.2 Providing Buffers for Accelerators

       5.5 Caches with Disparate Memory Technologies

       5.5.1 Coarse-Grain Customization Strategies

       5.5.2 Fine-Grain Customization Strategies

       6 Interconnect Customization

       6.1 Introduction

       6.2 Topology Customization

       6.2.1 Application-Specific Topology Synthesis

       6.2.2 Reconfigurable Shortcut Insertion

       6.2.3 Partial Crossbar Synthesis and Reconfiguration

       6.3 Routing Customization

       6.3.1 Application-Aware Deadlock-Free Routing

       6.3.2 Data Flow Synthesis

       6.4 Customization Enabled by New Device/Circuit Technologies

       6.4.1 Optical Interconnects

       6.4.2 Radio-Frequency Interconnects

       6.4.3 RRAM-Based Interconnects