Table of Contents:
  • 1. Introduction
  • 1.1 CMOS scaling and the rise of specialization
  • 1.2 What will we build now?
  • 1.2.1 Performance, power, and area
  • 1.2.2 Flexibility
  • 1.3 The cost of specialization
  • 1.4 Good applications for acceleration
  • 2. Computations and compilers
  • 2.1 Direct specification
  • 2.2 Compilers
  • 2.3 High-level synthesis
  • 2.4 Domain-specific languages
  • 3. Image processing with stencil pipelines
  • 3.1 Image signal processors
  • 3.2 Example applications
  • 4. Darkroom: a stencil language for image processing
  • 4.1 Language description
  • 4.2 A simple pipeline in Darkroom
  • 4.3 Optimal synthesis of line-buffered pipelines
  • 4.3.1 Generating line-buffered pipelines
  • 4.3.2 Shift operator
  • 4.3.3 Finding optimal shifts
  • 4.4 Implementation
  • 4.4.1 ASIC and FPGA synthesis
  • 4.4.2 CPU compilation
  • 4.5 Evaluation
  • 4.5.1 Scheduling for hardware synthesis
  • 4.5.2 Scheduling for general-purpose processors
  • 4.6 Summary
  • 5. Programming CPU/FPGA systems from Halide
  • 5.1 The Halide language
  • 5.2 Mapping Halide to hardware
  • 5.3 Compiler implementation
  • 5.3.1 Architecture parameter extraction
  • 5.3.2 IR transformation
  • 5.3.3 Loop perfection optimization
  • 5.3.4 Code generation
  • 5.4 Implementation and evaluation
  • 5.4.1 Programmability and efficiency
  • 5.4.2 Quality of hardware generation
  • 5.5 Conclusion
  • 6. Interfacing with specialized hardware
  • 6.1 Common interfaces
  • 6.2 The challenge of interfaces
  • 6.3 Solutions to the interface problem
  • 6.3.1 Compiler support
  • 6.3.2 Library interface
  • 6.3.3 API plus DSL
  • 6.4 Drivers for Darkroom and Halide on FPGA
  • 6.4.1 Memory and coherency
  • 6.4.2 Running the hardware
  • 6.4.3 Generating systems and drivers
  • 6.4.4 Generating the whole stack with Halide
  • 6.4.5 Heterogeneous system performance
  • 7. Conclusions and future directions
  • Bibliography
  • Authors' biographies