General-purpose graphics processor architectures

Bibliographic Details
Main Authors:	Aamodt, Tor M. (Author), Fung, Wilson Wai Lun (Author), Rogers, Timothy G. (Author)
Corporate Author:	Morgan & Claypool Publishers
Format:	eBook
Language:	English
Published:	[San Rafael, California] : Morgan & Claypool, 2018.
Series:	Synthesis digital library of engineering and computer science. Synthesis lectures in computer architecture; #44.
Subjects:	Computer architecture. Graphics processing units. GPGPU. Electronic books.

Table of Contents:

1. Introduction
1.1 The landscape of computation accelerators
1.2 GPU hardware basics
1.3 A brief history of GPUs
1.4 Book outline
2. Programming model
2.1 Execution model
2.2 GPU instruction set architectures
2.2.1 NVIDIA GPU instruction set architectures
2.2.2 AMD Graphics Core Next instruction set architecture
3. The SIMT core: instruction and register data flow
3.1 One-loop approximation
3.1.1 SIMT execution masking
3.1.2 SIMT deadlock and stackless SIMT architectures
3.1.3 Warp scheduling
3.2 Two-loop approximation
3.3 Three-loop approximation
3.3.1 Operand collector
3.3.2 Instruction replay: handling structural hazards
3.4 Research directions on branch divergence
3.4.1 Warp compaction
3.4.2 Intra-warp divergent path management
3.4.3 Adding MIMD capability
3.4.4 Complexity-effective divergence management
3.5 Research directions on scalarization and affine execution
3.5.1 Detection of uniform or affine variables
3.5.2 Exploiting uniform or affine variables in GPU
3.6 Research directions on register file architecture
3.6.1 Hierarchical register file
3.6.2 Drowsy state register file
3.6.3 Register file virtualization
3.6.4 Partitioned register file
3.6.5 RegLess
4. Memory system
4.1 First-level memory structures
4.1.1 Scratchpad memory and L1 data cache
4.1.2 L1 texture cache
4.1.3 Unified texture and data cache
4.2 On-chip interconnection network
4.3 Memory partition unit
4.3.1 L2 cache
4.3.2 Atomic operations
4.3.3 Memory access scheduler
4.4 Research directions for GPU memory systems
4.4.1 Memory access scheduling and interconnection network design
4.4.2 Caching effectiveness
4.4.3 Memory request prioritization and cache bypassing
4.4.4 Exploiting inter-warp heterogeneity
4.4.5 Coordinated cache bypassing
4.4.6 Adaptive cache management
4.4.7 Cache prioritization
4.4.8 Virtual memory page placement
4.4.9 Data placement
4.4.10 Multi-chip-module GPUs
5. Crosscutting research on GPU computing architectures
5.1 Thread scheduling
5.1.1 Research on assignment of threadblocks to cores
5.1.2 Research on cycle-by-cycle scheduling decisions
5.1.3 Research on scheduling multiple kernels
5.1.4 Fine-grain synchronization aware scheduling
5.2 Alternative ways of expressing parallelism
5.3 Support for transactional memory
5.3.1 Kilo TM
5.3.2 Warp TM and temporal conflict detection
5.4 Heterogeneous systems
Bibliography
Authors' biographies