General-purpose graphics processor architectures
| Main Authors: | , , |
|---|---|
| Corporate Author: | |
| Format: | eBook |
| Language: | English |
| Published: | [San Rafael, California] : Morgan & Claypool, 2018. |
| Series: | Synthesis digital library of engineering and computer science. Synthesis lectures in computer architecture ; # 44. |
| Subjects: | |
Table of Contents:
- 1. Introduction
- 1.1 The landscape of computation accelerators
- 1.2 GPU hardware basics
- 1.3 A brief history of GPUs
- 1.4 Book outline
- 2. Programming model
- 2.1 Execution model
- 2.2 GPU instruction set architectures
- 2.2.1 NVIDIA GPU instruction set architectures
- 2.2.2 AMD graphics core next instruction set architecture
- 3. The SIMT core: instruction and register data flow
- 3.1 One-loop approximation
- 3.1.1 SIMT execution masking
- 3.1.2 SIMT deadlock and stackless SIMT architectures
- 3.1.3 Warp scheduling
- 3.2 Two-loop approximation
- 3.3 Three-loop approximation
- 3.3.1 Operand collector
- 3.3.2 Instruction replay: handling structural hazards
- 3.4 Research directions on branch divergence
- 3.4.1 Warp compaction
- 3.4.2 Intra-warp divergent path management
- 3.4.3 Adding MIMD capability
- 3.4.4 Complexity-effective divergence management
- 3.5 Research directions on scalarization and affine execution
- 3.5.1 Detection of uniform or affine variables
- 3.5.2 Exploiting uniform or affine variables in GPU
- 3.6 Research directions on register file architecture
- 3.6.1 Hierarchical register file
- 3.6.2 Drowsy state register file
- 3.6.3 Register file virtualization
- 3.6.4 Partitioned register file
- 3.6.5 RegLess
- 4. Memory system
- 4.1 First-level memory structures
- 4.1.1 Scratchpad memory and L1 data cache
- 4.1.2 L1 texture cache
- 4.1.3 Unified texture and data cache
- 4.2 On-chip interconnection network
- 4.3 Memory partition unit
- 4.3.1 L2 cache
- 4.3.2 Atomic operations
- 4.3.3 Memory access scheduler
- 4.4 Research directions for GPU memory systems
- 4.4.1 Memory access scheduling and interconnection network design
- 4.4.2 Caching effectiveness
- 4.4.3 Memory request prioritization and cache bypassing
- 4.4.4 Exploiting inter-warp heterogeneity
- 4.4.5 Coordinated cache bypassing
- 4.4.6 Adaptive cache management
- 4.4.7 Cache prioritization
- 4.4.8 Virtual memory page placement
- 4.4.9 Data placement
- 4.4.10 Multi-chip-module GPUs
- 5. Crosscutting research on GPU computing architectures
- 5.1 Thread scheduling
- 5.1.1 Research on assignment of threadblocks to cores
- 5.1.2 Research on cycle-by-cycle scheduling decisions
- 5.1.3 Research on scheduling multiple kernels
- 5.1.4 Fine-grain synchronization aware scheduling
- 5.2 Alternative ways of expressing parallelism
- 5.3 Support for transactional memory
- 5.3.1 Kilo TM
- 5.3.2 Warp TM and temporal conflict detection
- 5.4 Heterogeneous systems
- Bibliography
- Authors' biographies.