ChatGPT: principles and architecture

"ChatGPT: Principles and Architecture bridges the knowledge gap between theoretical AI concepts and their practical applications. It equips industry professionals and researchers with a deeper understanding of large language models, enabling them to effectively leverage these technologies in th...

Bibliographic Details
Main Author: Cheng, Ge (Author)
Corporate Author: ScienceDirect (Online service)
Format: eBook
Language: English
Published: Amsterdam, Netherlands : Elsevier Inc., [2025]
Edition: First edition.
Table of Contents:
  • Front Cover
  • ChatGPT
  • Copyright Page
  • Contents
  • Preface
  • Main Content of the Book
  • Target Audience for This Book
  • Contact the Author
  • Acknowledgments
  • 1 A new milestone in artificial intelligence: ChatGPT
  • 1.1 The development history of ChatGPT
  • 1.2 The capability level of ChatGPT
  • 1.3 The technical evolution of large language models
  • 1.3.1 Symbolism versus connectionism
  • 1.3.2 Transformer
  • 1.3.3 Unsupervised pretraining
  • 1.3.4 Supervised fine-tuning
  • 1.3.5 Human feedback reinforcement learning
  • 1.4 The technology stack of large language model
  • 1.5 The impact of large language models
  • 1.6 The challenges of training or deploying large models
  • 1.6.1 Computational power
  • 1.6.2 Data
  • 1.6.3 Engineering
  • 1.7 The limitations of large language models
  • 1.8 Summary
  • 2 In-depth understanding of the transformer model
  • 2.1 Introduction to the transformer model
  • 2.2 Self-attention mechanism
  • 2.2.1 The calculation process of self-attention
  • 2.2.2 The essence of the self-attention mechanism
  • 2.2.3 The advantages and limitations of the self-attention mechanism
  • 2.3 Multihead attention mechanism
  • 2.3.1 Implementation of multihead attention
  • 2.3.2 The role of multihead attention
  • 2.3.3 Optimization of multihead attention
  • 2.4 Feedforward neural network
  • 2.5 Residual connection
  • 2.6 Layer normalization
  • 2.7 Position encoding
  • 2.7.1 Design and implementation of positional encoding
  • 2.7.2 Variants of positional encoding
  • 2.7.3 The advantages and limitations of positional encoding
  • 2.8 Training and optimization
  • 2.8.1 Loss function
  • 2.8.2 Optimizer
  • 2.8.3 Learning rate adjustment strategy
  • 2.8.4 Regularization
  • 2.8.5 Other training and optimization techniques
  • 2.9 Summary
  • 3 Generative pretraining
  • 3.1 Introduction to generative pretraining
  • 3.2 Generative pretraining model
  • 3.3 The generative pretraining process
  • 3.3.1 The objectives of generative pretraining
  • 3.3.2 The error backpropagation process in generative pretraining
  • 3.4 Supervised fine-tuning
  • 3.4.1 The principles of supervised fine-tuning
  • 3.4.2 Supervised fine-tuning for specific tasks
  • 3.4.3 Fine-tuning steps
  • 3.5 Summary
  • 4 Unsupervised multitask and zero-shot learning
  • 4.1 Encoder and decoder
  • 4.2 GPT-2
  • 4.2.1 Layer normalization
  • 4.2.2 Orthogonal initialization
  • 4.2.3 Reversible tokenization
  • 4.2.4 Learnable relative positional encoding
  • 4.3 Unsupervised multitask learning
  • 4.4 The relationship between multitask and zero-shot learning
  • 4.5 The autoregressive generation process of GPT-2
  • 4.5.1 Subword unit embeddings
  • 4.5.2 Autoregressive process
  • 4.6 Summary
  • 5 Sparse attention and content-based learning
  • 5.1 GPT-3
  • 5.2 The sparse transformer
  • 5.2.1 Characteristics of the sparse transformer
  • 5.2.1.1 Sparse attention patterns
  • 5.2.1.2 Alternating dense and sparse attention patterns
  • 5.2.1.3 Learnable relative positional encodings
  • 5.2.2 Local banded attention
  • 5.2.3 Cross-layer sparse connections
  • 5.3 Meta-learning and in-context learning
  • 5.3.1 Meta-learning
  • 5.3.2 In-context learning
  • 5.4 Bayesian inference of concept distributions
  • 5.4.1 Implicit fine-tuning
  • 5.4.2 Bayesian inference
  • 5.5 Thought chains
  • 5.6 Summary
  • 6 Pretraining strategies for large language models
  • 6.1 Pre-training datasets
  • 6.2 Processing of pretraining data
  • 6.3 Distributed training patterns
  • 6.3.1 Data parallelism
  • 6.3.2 Model parallelism
  • 6.4 Technical approaches to distributed training
  • 6.4.1 Pathways
  • 6.4.2 Megatron-LM
  • 6.4.3 ZeRO
  • 6.5 Examples of training strategies
  • 6.5.1 Training framework
  • 6.5.2 Parameter stability
  • 6.5.3 Optimizing training settings
  • 6.5.4 BF16
  • 6.5.5 Other factors
  • 6.6 Summary
  • 7 Proximal policy optimization
  • 7.1 Traditional policy gradient methods
  • 7.1.1 The principles of policy gradient methods
  • 7.1.2 Importance sampling
  • 7.1.3 Advantage function
  • 7.2 Actor-Critic
  • 7.2.1 Algorithm steps
  • 7.2.2 Value function and policy update
  • 7.2.3 Issues and challenges
  • 7.3 Trust region policy optimization
  • 7.3.1 Optimization objectives
  • 7.3.2 Limitations
  • 7.4 Principles of the proximal policy optimization algorithm
  • 7.5 Summary
  • 8 Human feedback reinforcement learning
  • 8.1 Reinforcement learning in ChatGPT
  • 8.2 InstructGPT training dataset
  • 8.2.1 Sources of fine-tuning datasets
  • 8.2.2 Annotation standards
  • 8.2.3 Data analysis
  • 8.3 Training stages of human feedback reinforcement learning
  • 8.3.1 Supervised fine-tuning
  • 8.3.2 Reward modeling
  • 8.3.3 Reinforcement learning
  • 8.4 Reward modeling algorithms
  • 8.4.1 Reward scores
  • 8.4.2 Loss function
  • 8.5 PPO in InstructGPT
  • 8.6 Multiturn dialogue capability
  • 8.7 The necessity of human feedback reinforcement learning
  • 8.8 Summary
  • 9 Low-resource domain transfer of large language models
  • 9.1 Self-instruct
  • 9.1.1 Instruction generation
  • 9.1.2 Task classification identification
  • 9.1.3 Instance generation
  • 9.1.4 Filtering
  • 9.2 Constitutional artificial intelligence
  • 9.3 Low-rank adaptation
  • 9.3.1 Model training and deployment
  • 9.3.2 Choice of rank
  • 9.4 Quantization
  • 9.5 SparseGPT
  • 9.6 Case studies
  • 9.6.1 Base model
  • 9.6.2 Instruction-following model
  • 9.6.3 Medical field
  • 9.6.4 Judicial field
  • 9.7 Summary
  • 10 Middleware
  • 10.1 LangChain
  • 10.2 AutoGPT
  • 10.3 Competitors in middleware frameworks
  • 10.4 Summary
  • 11 The future path of large language models
  • 11.1 The path to strong artificial intelligence
  • 11.2 Data resource depletion
  • 11.3 Limitations of autoregressive models
  • 11.4 Embodied intelligence
  • 11.4.1 Challenges of embodied intelligence
  • 11.4.2 PaLM-E
  • 11.4.3 ChatGPT for robotics
  • 11.5 Summary
  • Index
  • Back Cover