Analysis of microarray gene expression data

After genomic sequencing, microarray technology has emerged as a widely used platform for genomic studies in the life sciences. Microarray technology provides a systematic way to survey DNA and RNA variation. With the abundance of data produced from microarray studies, however, the ultimate impact o...

Bibliographic Details
Main Author: Lee, Mei-Ling Ting
Corporate Author: SpringerLink (Online service)
Format: eBook
Language: English
Published: Boston : Kluwer Academic, [2004]
Online Access: Connect to the full text of this electronic book
Table of Contents:
  • Part I Genome Probing Using Microarrays
  • 2. DNA, RNA, Proteins, and Gene Expression
  • 2.1 The Molecules of Life
  • 2.2 Genes
  • 2.3 DNA
  • 2.4 RNA
  • 2.5 The Genetic Code
  • 2.6 Proteins
  • 2.7 Gene Expression and Microarrays
  • 2.8 Complementary DNA (cDNA)
  • 2.9 Nucleic Acid Hybridization
  • 3. Microarray Technology
  • 3.1 Transcriptional Profiling
  • 3.1.1 Sequencing-based Transcriptional Profiling
  • 3.1.2 Hybridization-based Transcriptional Profiling
  • 3.2 Microarray Technological Platforms
  • 3.3 Probe Selection and Synthesis
  • 3.4 Array Manufacturing
  • 3.5 Target Labeling
  • 3.6 Hybridization
  • 3.7 Scanning and Image Analysis
  • 3.8 Microarray Data
  • 3.8.1 Spotted Array Data
  • 3.8.2 In-situ Oligonucleotide Array Data
  • 3.9 So I Have My Microarray Data - What's Next?
  • 3.9.1 Confirming Microarray Results
  • 3.9.2 Northern Blot Analysis
  • 3.9.3 Reverse-transcription PCR and Quantitative Real-time RT-PCR
  • 4. Inherent Variability in Array Data
  • 4.1 Genetic Populations
  • 4.2 Variability in Gene Expression Levels
  • 4.2.1 Variability Due to Specimen Sampling
  • 4.2.2 Variability Due to Cell Cycle Regulation
  • 4.2.3 Experimental Variability
  • 4.3 Test the Variability by Replication
  • 4.3.1 Duplicated Spots
  • 4.3.2 Multiple Arrays and Biological Replications
  • 5. Background Noise
  • 5.1 Pixel-by-pixel Analysis of Individual Spots
  • 5.2 General Models for Background Noise
  • 5.2.1 Additive Background Noise
  • 5.2.2 Correction for Background Noise
  • 5.2.3 Example: Replication Test Data Set
  • 5.2.4 Noise Models for GeneChip Arrays
  • 5.2.5 Elusive Nature of Background Noise
  • 6. Transformation and Normalization
  • 6.1 Data Transformations
  • 6.1.1 Logarithmic Transformation
  • 6.1.2 Square Root Transformation
  • 6.1.3 Box-Cox Transformation Family
  • 6.1.4 Affine Transformation
  • 6.1.5 The Generalized-log Transformation
  • 6.2 Data Normalization
  • 6.2.1 Normalization Across G Genes
  • 6.2.2 Example: Mouse Juvenile Cystic Kidney Data Set
  • 6.2.3 Normalization Across G Genes and N Samples
  • 6.2.4 Color Effects and MA Plots
  • 6.2.5 Normalization Based on LOWESS Function
  • 6.2.6 Normalization Based on Rank-invariant Genes
  • 6.2.7 Normalization Based on a Sample Pool
  • 6.2.8 Global Normalization Using ANOVA Models
  • 6.2.9 Other Normalization Issues
  • 7. Missing Values in Array Data
  • 7.1 Missing Values in Array Data
  • 7.1.1 Sources of Problem
  • 7.2 Statistical Classification of Missing Data
  • 7.3 Missing Values in Replicated Designs
  • 7.4 Imputation of Missing Values
  • 8. Saturated Intensity Readings
  • 8.1 Saturated Intensity Readings
  • 8.2 Multiple Power-levels for Spotted Arrays
  • 8.2.1 Imputing Saturated Intensity Readings
  • 8.3 High Intensities in Oligonucleotide Arrays
  • Part II Statistical Models and Analysis
  • 9. Experimental Design
  • 9.1 Factors Involved in Experiments
  • 9.2 Types of Design Structures
  • 9.3 Common Practice in Microarray Studies
  • 9.3.1 Reference Design
  • 9.3.2 Time-course Experiment
  • 9.3.3 Color Reversal
  • 9.3.4 Loop Design
  • 9.3.5 Example: Time-course Loop Design
  • 10. ANOVA Models for Microarray Data
  • 10.1 A Basic Log-linear Model
  • 10.2 ANOVA With Multiple Factors
  • 10.2.1 Main Effects
  • 10.2.2 Interaction Effects
  • 10.3 A Generic Fixed-Effects ANOVA Model
  • 10.3.1 Estimation for Interaction Effects
  • 10.4 Two-stage Estimation Procedures
  • 10.5 Identifying Differentially Expressed Genes
  • 10.5.1 Standard MSE-based Approach
  • 10.5.2 Other Approaches
  • 10.5.3 Modified MSE-based Approach
  • 10.6 Mixed-effects Models
  • 10.7 ANOVA for Split-plot Design
  • 10.8 Log Intensity Versus Log Ratio
  • 11. Multiple Testing in Microarray Studies
  • 11.1 Hypothesis Testing for Any Individual Gene
  • 11.2 Multiple Testing for the Entire Gene Set
  • 11.2.1 Framework for Multiple Testing
  • 11.2.2 Test Statistic for Each Gene
  • 11.2.3 Two Error Control Criteria in Multiple Testing
  • 11.2.4 Implementation Algorithms
  • 11.2.5 Example of Multiple Testing Algorithms
  • 12. Permutation Tests in Microarray Data
  • 12.2 Permutation Tests in Microarray Studies
  • 12.2.1 Exchangeability in Microarray Designs
  • 12.2.2 Limitation of Having Few Permutations
  • 12.2.3 Pooling Test Results Across Genes
  • 12.3 Lipopolysaccharide-E. coli Data Set
  • 12.3.1 Statistical Model
  • 12.3.2 Permutation Testing and Results
  • 13. Bayesian Methods for Microarray Data
  • 13.1 Mixture Model for Gene Expression
  • 13.1.1 Variations on the Mixture Model
  • 13.1.2 Example of Gamma Models
  • 13.2 Mixture Model for Differential Expression
  • 13.2.1 Mixture Model for Color Ratio Data
  • 13.2.2 Relation of Mixture Model to ANOVA Model
  • 13.2.3 Bayes Interpretation of Mixture Model
  • 13.3 Empirical Bayes Methods
  • 13.3.1 Example of Empirical Bayes Fitting
  • 13.4 Hierarchical Bayes Models
  • 13.4.1 Example of Hierarchical Modeling
  • 14. Power and Sample Size Considerations
  • 14.1 Test Hypotheses in Microarray Studies
  • 14.2 Distributions of Estimated Differential Expression
  • 14.3 Summary Measures of Estimated Differential Expression
  • 14.4 Multiple Testing Framework
  • 14.5 Dependencies of Estimation Errors
  • 14.6 Familywise Type I Error Control
  • 14.6.1 Type I Error Control: the Sidak Approach
  • 14.6.2 Type I Error Control: the Bonferroni Approach
  • 14.7 Familywise Type II Error Control
  • 14.7.1 Type II Error Control: the Sidak Approach
  • 14.7.2 Type II Error Control: the Bonferroni Approach
  • 14.8 Contrast of Planning and Implementation in Multiple Testing
  • 14.9 Power Calculations for Different Summary Measures
  • 14.9.1 Designs with Linear Summary Measure
  • 14.9.2 Numerical Example for Linear Summary
  • 14.9.3 Designs with Quadratic Summary Measure
  • 14.9.4 Numerical Example for Quadratic Summary
  • 14.10 A Bayesian Perspective on Power and Sample Size
  • 14.10.1 Connection to Local Discovery Rates
  • 14.10.2 Representative Local True Discovery Rate
  • 14.10.3 Numerical Example for TDR and FDR
  • 14.11 Applications to Standard Designs
  • 14.11.1 Treatment-control Designs
  • 14.11.2 Sample Size for a Treatment-control Design
  • 14.11.3 Multiple-treatment Designs
  • 14.11.4 Power Table for a Multiple-treatment Design
  • 14.11.5 Time-course and Similar Multiple-treatment Designs
  • 14.12 Relation Between Power, Replication and Design
  • 14.12.1 Effects of Replication
  • 14.12.2 Controlling Sources of Variability
  • 14.13 Assessing Power from Microarray Pilot Studies
  • 14.13.1 Example 1: Juvenile Cystic Kidney Disease
  • 14.13.2 Example 2: Opioid Dependence
  • Part III Unsupervised Exploratory Analysis
  • 15. Cluster Analysis
  • 15.1 Distance and Similarity Measures
  • 15.2 Distance Measures
  • 15.2.1 Properties of Distance Measures
  • 15.2.2 Minkowski Distance Measures
  • 15.2.3 Mahalanobis Distance
  • 15.3 Similarity Measures
  • 15.3.1 Inner Product
  • 15.3.2 Pearson Correlation Coefficient
  • 15.3.3 Spearman Rank Correlation Coefficient
  • 15.4 Inter-cluster Distance
  • 15.4.1 Mahalanobis Inter-cluster Distance
  • 15.4.2 Neighbor-based Inter-cluster Distance
  • 15.5 Hierarchical Clustering
  • 15.5.1 Single Linkage Method
  • 15.5.2 Complete Linkage Method
  • 15.5.3 Average Linkage Clustering
  • 15.5.4 Centroid Linkage Method
  • 15.5.5 Median Linkage Clustering
  • 15.5.6 Ward's Clustering Method
  • 15.5.7 Applications
  • 15.5.8 Comparisons of Clustering Algorithms
  • 15.6 K-means Clustering
  • 15.7 Bayesian Cluster Analysis
  • 15.8 Two-way Clustering Methods
  • 15.9 Reliability of Clustering Patterns for Microarray Data
  • 16. Principal Components and Singular Value Decomposition
  • 16.1 Principal Component Analysis
  • 16.1.1 Applications of Dominant Principal Components
  • 16.2 Singular-value Decomposition
  • 16.3 Computational Procedures for SVD
  • 16.4 Eigengenes and Eigenarrays
  • 16.5 Fraction of Eigenexpression
  • 16.6 Generalized Singular Value Decomposition
  • 16.7 Robust Singular Value Decomposition
  • 17. Self-Organizing Maps
  • 17.1 The Basic Logic of a SOM
  • 17.2 The SOM Updating Algorithm
  • 17.3 Program GENECLUSTER
  • 17.4 Supervised SOM
  • 17.5 Applications
  • 17.5.1 Using SOM to Cluster Genes
  • 17.5.2 Using SOM to Cluster Tumors
  • 17.5.3 Multiclass Cancer Diagnosis
  • Part IV Supervised Learning Methods
  • 18. Discrimination and Classification
  • 18.1 Fisher's Linear Discriminant Analysis
  • 18.2 Maximum Likelihood Discriminant Rules
  • 18.3 Bayesian Classification
  • 18.4 k-Nearest Neighbor Classifier
  • 18.5 Neighborhood Analysis
  • 18.6 A Gene-casting Weighted Voting Scheme
  • 18.7 Example: Classification of Leukemia Samples