ChatGPT: Principles and Architecture
"ChatGPT: Principles and Architecture bridges the knowledge gap between theoretical AI concepts and their practical applications. It equips industry professionals and researchers with a deeper understanding of large language models, enabling them to effectively leverage these technologies in th...
| Main Author: | |
|---|---|
| Corporate Author: | |
| Format: | eBook |
| Language: | English |
| Published: | Amsterdam, Netherlands: Elsevier Inc., [2025] |
| Edition: | First edition. |
| Subjects: | |
Table of Contents:
- Front Cover
- ChatGPT
- Copyright Page
- Contents
- Preface
- Main Content of the Book
- Target Audience for This Book
- Contact the Author
- Acknowledgments
- 1 A new milestone in artificial intelligence: ChatGPT
- 1.1 The development history of ChatGPT
- 1.2 The capability level of ChatGPT
- 1.3 The technical evolution of large language models
- 1.3.1 Symbolism versus connectionism
- 1.3.2 Transformer
- 1.3.3 Unsupervised pretraining
- 1.3.4 Supervised fine-tuning
- 1.3.5 Human feedback reinforcement learning
- 1.4 The technology stack of large language model
- 1.5 The impact of large language models
- 1.6 The challenges of training or deploying large models
- 1.6.1 Computational power
- 1.6.2 Data
- 1.6.3 Engineering
- 1.7 The limitations of large language models
- 1.8 Summary
- 2 In-depth understanding of the transformer model
- 2.1 Introduction to the transformer model
- 2.2 Self-attention mechanism
- 2.2.1 The calculation process of self-attention
- 2.2.2 The essence of the self-attention mechanism
- 2.2.3 The advantages and limitations of the self-attention mechanism
- 2.3 Multihead attention mechanism
- 2.3.1 Implementation of multihead attention
- 2.3.2 The role of multihead attention
- 2.3.3 Optimization of multihead attention
- 2.4 Feedforward neural network
- 2.5 Residual connection
- 2.6 Layer normalization
- 2.7 Position encoding
- 2.7.1 Design and implementation of positional encoding
- 2.7.2 Variants of positional encoding
- 2.7.3 The advantages and limitations of positional encoding
- 2.8 Training and optimization
- 2.8.1 Loss function
- 2.8.2 Optimizer
- 2.8.3 Learning rate adjustment strategy
- 2.8.4 Regularization
- 2.8.5 Other training and optimization techniques
- 2.9 Summary
- 3 Generative pretraining
- 3.1 Introduction to generative pretraining
- 3.2 Generative pretraining model
- 3.3 The generative pretraining process
- 3.3.1 The objectives of generative pretraining
- 3.3.2 The error backpropagation process in generative pretraining
- 3.4 Supervised fine-tuning
- 3.4.1 The principles of supervised fine-tuning
- 3.4.2 Supervised fine-tuning for specific tasks
- 3.4.3 Fine-tuning steps
- 3.5 Summary
- 4 Unsupervised multitask and zero-shot learning
- 4.1 Encoder and decoder
- 4.2 GPT-2
- 4.2.1 Layer normalization
- 4.2.2 Orthogonal initialization
- 4.2.3 Reversible tokenization
- 4.2.4 Learnable relative positional encoding
- 4.3 Unsupervised multitask learning
- 4.4 The relationship between multitask and zero-shot learning
- 4.5 The autoregressive generation process of GPT-2
- 4.5.1 Subword unit embeddings
- 4.5.2 Autoregressive process
- 4.6 Summary
- 5 Sparse attention and content-based learning
- 5.1 GPT-3
- 5.2 The sparse transformer
- 5.2.1 Characteristics of the sparse transformer
- 5.2.1.1 Sparse attention patterns
- 5.2.1.2 Alternating dense and sparse attention patterns
- 5.2.1.3 Learnable relative positional encodings
- 5.2.2 Local banded attention
- 5.2.3 Cross-layer sparse connections
- 5.3 Meta-learning and in-context learning
- 5.3.1 Meta-learning
- 5.3.2 In-context learning
- 5.4 Bayesian inference of concept distributions
- 5.4.1 Implicit fine-tuning
- 5.4.2 Bayesian inference
- 5.5 Thought chains
- 5.6 Summary
- 6 Pretraining strategies for large language models
- 6.1 Pretraining datasets
- 6.2 Processing of pretraining data
- 6.3 Distributed training patterns
- 6.3.1 Data parallelism
- 6.3.2 Model parallelism
- 6.4 Technical approaches to distributed training
- 6.4.1 Pathways
- 6.4.2 Megatron-LM
- 6.4.3 ZeRO
- 6.5 Examples of training strategies
- 6.5.1 Training framework
- 6.5.2 Parameter stability
- 6.5.3 Optimizing training settings
- 6.5.4 BF16
- 6.5.5 Other factors
- 6.6 Summary
- 7 Proximal policy optimization
- 7.1 Traditional policy gradient methods
- 7.1.1 The principles of policy gradient methods
- 7.1.2 Importance sampling
- 7.1.3 Advantage function
- 7.2 Actor-Critic
- 7.2.1 Algorithm steps
- 7.2.2 Value function and policy update
- 7.2.3 Issues and challenges
- 7.3 Trust region policy optimization
- 7.3.1 Optimization objectives
- 7.3.2 Limitations
- 7.4 Principles of the proximal policy optimization algorithm
- 7.5 Summary
- 8 Human feedback reinforcement learning
- 8.1 Reinforcement learning in ChatGPT
- 8.2 InstructGPT training dataset
- 8.2.1 Sources of fine-tuning datasets
- 8.2.2 Annotation standards
- 8.2.3 Data analysis
- 8.3 Training stages of human feedback reinforcement learning
- 8.3.1 Supervised fine-tuning
- 8.3.2 Reward modeling
- 8.3.3 Reinforcement learning
- 8.4 Reward modeling algorithms
- 8.4.1 Reward scores
- 8.4.2 Loss function
- 8.5 PPO in InstructGPT
- 8.6 Multiturn dialogue capability
- 8.7 The necessity of human feedback reinforcement learning
- 8.8 Summary
- 9 Low-resource domain transfer of large language models
- 9.1 Self-instruct
- 9.1.1 Instruction generation
- 9.1.2 Task classification identification
- 9.1.3 Instance generation
- 9.1.4 Filtering
- 9.2 Constitutional artificial intelligence
- 9.3 Low-rank adaptation
- 9.3.1 Model training and deployment
- 9.3.2 Choice of rank
- 9.4 Quantization
- 9.5 SparseGPT
- 9.6 Case studies
- 9.6.1 Base model
- 9.6.2 Instruction-following model
- 9.6.3 Medical field
- 9.6.4 Judicial field
- 9.7 Summary
- 10 Middleware
- 10.1 LangChain
- 10.2 AutoGPT
- 10.3 Competitors in middleware frameworks
- 10.4 Summary
- 11 The future path of large language models
- 11.1 The path to strong artificial intelligence
- 11.2 Data resource depletion
- 11.3 Limitations of autoregressive models
- 11.4 Embodied intelligence
- 11.4.1 Challenges of embodied intelligence
- 11.4.2 PaLM-E
- 11.4.3 ChatGPT for robotics
- 11.5 Summary
- Index
- Back Cover