Analysis of microarray gene expression data

After genomic sequencing, microarray technology has emerged as a widely used platform for genomic studies in the life sciences. Microarray technology provides a systematic way to survey DNA and RNA variation. With the abundance of data produced from microarray studies, however, the ultimate impact o...

Bibliographic Details
Main Author:	Lee, Mei-Ling Ting
Corporate Author:	SpringerLink (Online service)
Format:	eBook
Language:	English
Published:	Boston : Kluwer Academic, [2004]
Subjects:	DNA microarrays > Statistical methods. Gene expression > Statistical methods. Statistics, general. Cancer Research. Human Genetics. Mathematical Biology in General. Evolutionary Biology. Puces à ADN > Méthodes statistiques. Expression génique > Méthodes statistiques. SCIENCE > Life Sciences > Genetics & Genomics. Oligonucleotide Array Sequence Analysis > methods. Gene Expression. Oligonucleotide Array Sequence Analysis > statistics & numerical data. Electronic books.
Online Access:	Connect to the full text of this electronic book

Table of Contents:

Part I Genome Probing Using Microarrays
2. DNA, RNA, Proteins, and Gene Expression 7
2.1 The Molecules of Life 7
2.2 Genes 8
2.3 DNA 9
2.4 RNA 12
2.5 The Genetic Code 13
2.6 Proteins 14
2.7 Gene Expression and Microarrays 15
2.8 Complementary DNA (cDNA) 16
2.9 Nucleic Acid Hybridization 16
3. Microarray Technology 19
3.1 Transcriptional Profiling 20
3.1.1 Sequencing-based Transcriptional Profiling 20
3.1.2 Hybridization-based Transcriptional Profiling 22
3.2 Microarray Technological Platforms 23
3.3 Probe Selection and Synthesis 24
3.4 Array Manufacturing 30
3.5 Target Labeling 31
3.6 Hybridization 34
3.7 Scanning and Image Analysis 35
3.8 Microarray Data 36
3.8.1 Spotted Array Data 36
3.8.2 In-situ Oligonucleotide Array Data 37
3.9 So I Have My Microarray Data - What's Next? 39
3.9.1 Confirming Microarray Results 39
3.9.2 Northern Blot Analysis 40
3.9.3 Reverse-transcription PCR and Quantitative Real-time RT-PCR 40
4. Inherent Variability in Array Data 45
4.1 Genetic Populations 45
4.2 Variability in Gene Expression Levels 47
4.2.1 Variability Due to Specimen Sampling 47
4.2.2 Variability Due to Cell Cycle Regulation 48
4.2.3 Experimental Variability 48
4.3 Test the Variability by Replication 50
4.3.1 Duplicated Spots 50
4.3.2 Multiple Arrays and Biological Replications 51
5. Background Noise 53
5.1 Pixel-by-pixel Analysis of Individual Spots 53
5.2 General Models for Background Noise 56
5.2.1 Additive Background Noise 57
5.2.2 Correction for Background Noise 58
5.2.3 Example: Replication Test Data Set 59
5.2.4 Noise Models for GeneChip Arrays 62
5.2.5 Elusive Nature of Background Noise 63
6. Transformation and Normalization 67
6.1 Data Transformations 67
6.1.1 Logarithmic Transformation 67
6.1.2 Square Root Transformation 68
6.1.3 Box-Cox Transformation Family 69
6.1.4 Affine Transformation 69
6.1.5 The Generalized-log Transformation 71
6.2 Data Normalization 72
6.2.1 Normalization Across G Genes 74
6.2.2 Example: Mouse Juvenile Cystic Kidney Data Set 75
6.2.3 Normalization Across G Genes and N Samples 77
6.2.4 Color Effects and MA Plots 78
6.2.5 Normalization Based on LOWESS Function 80
6.2.6 Normalization Based on Rank-invariant Genes 82
6.2.7 Normalization Based on a Sample Pool 82
6.2.8 Global Normalization Using ANOVA Models 82
6.2.9 Other Normalization Issues 83
7. Missing Values in Array Data 85
7.1 Missing Values in Array Data 85
7.1.1 Sources of Problem 85
7.2 Statistical Classification of Missing Data 86
7.3 Missing Values in Replicated Designs 88
7.4 Imputation of Missing Values 89
8. Saturated Intensity Readings 93
8.1 Saturated Intensity Readings 93
8.2 Multiple Power-levels for Spotted Arrays 93
8.2.1 Imputing Saturated Intensity Readings 95
8.3 High Intensities in Oligonucleotide Arrays 97
Part II Statistical Models and Analysis
9. Experimental Design 103
9.1 Factors Involved in Experiments 103
9.2 Types of Design Structures 106
9.3 Common Practice in Microarray Studies 112
9.3.1 Reference Design 112
9.3.2 Time-course Experiment 114
9.3.3 Color Reversal 115
9.3.4 Loop Design 116
9.3.5 Example: Time-course Loop Design 117
10. ANOVA Models for Microarray Data 121
10.1 A Basic Log-linear Model 121
10.2 ANOVA With Multiple Factors 123
10.2.1 Main Effects 123
10.2.2 Interaction Effects 123
10.3 A Generic Fixed-Effects ANOVA Model 124
10.3.1 Estimation for Interaction Effects 126
10.4 Two-stage Estimation Procedures 126
10.5 Identifying Differentially Expressed Genes 130
10.5.1 Standard MSE-based Approach 130
10.5.2 Other Approaches 132
10.5.3 Modified MSE-based Approach 132
10.6 Mixed-effects Models 135
10.7 ANOVA for Split-plot Design 136
10.8 Log Intensity Versus Log Ratio 138
11. Multiple Testing in Microarray Studies 143
11.1 Hypothesis Testing for Any Individual Gene 143
11.2 Multiple Testing for the Entire Gene Set 144
11.2.1 Framework for Multiple Testing 144
11.2.2 Test Statistic for Each Gene 145
11.2.3 Two Error Control Criteria in Multiple Testing 146
11.2.4 Implementation Algorithms 147
11.2.5 Example of Multiple Testing Algorithms 152
12. Permutation Tests in Microarray Data 157
12.2 Permutation Tests in Microarray Studies 160
12.2.1 Exchangeability in Microarray Designs 160
12.2.2 Limitation of Having Few Permutations 162
12.2.3 Pooling Test Results Across Genes 162
12.3 Lipopolysaccharide-E. coli Data Set 163
12.3.1 Statistical Model 164
12.3.2 Permutation Testing and Results 166
13. Bayesian Methods for Microarray Data 171
13.1 Mixture Model for Gene Expression 171
13.1.1 Variations on the Mixture Model 173
13.1.2 Example of Gamma Models 175
13.2 Mixture Model for Differential Expression 176
13.2.1 Mixture Model for Color Ratio Data 176
13.2.2 Relation of Mixture Model to ANOVA Model 180
13.2.3 Bayes Interpretation of Mixture Model 182
13.3 Empirical Bayes Methods 183
13.3.1 Example of Empirical Bayes Fitting 184
13.4 Hierarchical Bayes Models 187
13.4.1 Example of Hierarchical Modeling 189
14. Power and Sample Size Considerations 193
14.1 Test Hypotheses in Microarray Studies 194
14.2 Distributions of Estimated Differential Expression 196
14.3 Summary Measures of Estimated Differential Expression 196
14.4 Multiple Testing Framework 197
14.5 Dependencies of Estimation Errors 199
14.6 Familywise Type I Error Control 200
14.6.1 Type I Error Control: the Sidak Approach 201
14.6.2 Type I Error Control: the Bonferroni Approach 203
14.7 Familywise Type II Error Control 204
14.7.1 Type II Error Control: the Sidak Approach 206
14.7.2 Type II Error Control: the Bonferroni Approach 206
14.8 Contrast of Planning and Implementation in Multiple Testing 207
14.9 Power Calculations for Different Summary Measures 208
14.9.1 Designs with Linear Summary Measure 208
14.9.2 Numerical Example for Linear Summary 210
14.9.3 Designs with Quadratic Summary Measure 211
14.9.4 Numerical Example for Quadratic Summary 213
14.10 A Bayesian Perspective on Power and Sample Size 214
14.10.1 Connection to Local Discovery Rates 215
14.10.2 Representative Local True Discovery Rate 215
14.10.3 Numerical Example for TDR and FDR 216
14.11 Applications to Standard Designs 216
14.11.1 Treatment-control Designs 217
14.11.2 Sample Size for a Treatment-control Design 218
14.11.3 Multiple-treatment Designs 221
14.11.4 Power Table for a Multiple-treatment Design 224
14.11.5 Time-course and Similar Multiple-treatment Designs 227
14.12 Relation Between Power, Replication and Design 228
14.12.1 Effects of Replication 228
14.12.2 Controlling Sources of Variability 229
14.13 Assessing Power from Microarray Pilot Studies 230
14.13.1 Example 1: Juvenile Cystic Kidney Disease 230
14.13.2 Example 2: Opioid Dependence 231
Part III Unsupervised Exploratory Analysis
15. Cluster Analysis 237
15.1 Distance and Similarity Measures 238
15.2 Distance Measures 239
15.2.1 Properties of Distance Measures 239
15.2.2 Minkowski Distance Measures 240
15.2.3 Mahalanobis Distance 241
15.3 Similarity Measures 241
15.3.1 Inner Product 241
15.3.2 Pearson Correlation Coefficient 242
15.3.3 Spearman Rank Correlation Coefficient 243
15.4 Inter-cluster Distance 243
15.4.1 Mahalanobis Inter-cluster Distance 244
15.4.2 Neighbor-based Inter-cluster Distance 244
15.5 Hierarchical Clustering 244
15.5.1 Single Linkage Method 245
15.5.2 Complete Linkage Method 245
15.5.3 Average Linkage Clustering 245
15.5.4 Centroid Linkage Method 246
15.5.5 Median Linkage Clustering 246
15.5.6 Ward's Clustering Method 246
15.5.7 Applications 246
15.5.8 Comparisons of Clustering Algorithms 247
15.6 K-means Clustering 247
15.7 Bayesian Cluster Analysis 248
15.8 Two-way Clustering Methods 248
15.9 Reliability of Clustering Patterns for Microarray Data 249
16. Principal Components and Singular Value Decomposition 251
16.1 Principal Component Analysis 251
16.1.1 Applications of Dominant Principal Components 253
16.2 Singular-value Decomposition 254
16.3 Computational Procedures for SVD 255
16.4 Eigengenes and Eigenarrays 256
16.5 Fraction of Eigenexpression 256
16.6 Generalized Singular Value Decomposition 257
16.7 Robust Singular Value Decomposition 257
17. Self-Organizing Maps 261
17.1 The Basic Logic of a SOM 261
17.2 The SOM Updating Algorithm 265
17.3 Program GENECLUSTER 267
17.4 Supervised SOM 268
17.5 Applications 268
17.5.1 Using SOM to Cluster Genes 268
17.5.2 Using SOM to Cluster Tumors 269
17.5.3 Multiclass Cancer Diagnosis 270
Part IV Supervised Learning Methods
18. Discrimination and Classification 277
18.1 Fisher's Linear Discriminant Analysis 278
18.2 Maximum Likelihood Discriminant Rules 279
18.3 Bayesian Classification 280
18.4 k-Nearest Neighbor Classifier 281
18.5 Neighborhood Analysis 282
18.6 A Gene-casting Weighted Voting Scheme 283
18.7 Example: Classification of Leukemia Samples 284