Analysis of microarray gene expression data
After genomic sequencing, microarray technology has emerged as a widely used platform for genomic studies in the life sciences. Microarray technology provides a systematic way to survey DNA and RNA variation. With the abundance of data produced from microarray studies, however, the ultimate impact o...
| Main Author: | |
|---|---|
| Corporate Author: | |
| Format: | eBook |
| Language: | English |
| Published: | Boston : Kluwer Academic, [2004] |
| Subjects: | |
| Online Access: | Connect to the full text of this electronic book |
Table of Contents:
- Part I Genome Probing Using Microarrays
- 2. DNA, RNA, Proteins, and Gene Expression 7
- 2.1 The Molecules of Life 7
- 2.2 Genes 8
- 2.3 DNA 9
- 2.4 RNA 12
- 2.5 The Genetic Code 13
- 2.6 Proteins 14
- 2.7 Gene Expression and Microarrays 15
- 2.8 Complementary DNA (cDNA) 16
- 2.9 Nucleic Acid Hybridization 16
- 3. Microarray Technology 19
- 3.1 Transcriptional Profiling 20
- 3.1.1 Sequencing-based Transcriptional Profiling 20
- 3.1.2 Hybridization-based Transcriptional Profiling 22
- 3.2 Microarray Technological Platforms 23
- 3.3 Probe Selection and Synthesis 24
- 3.4 Array Manufacturing 30
- 3.5 Target Labeling 31
- 3.6 Hybridization 34
- 3.7 Scanning and Image Analysis 35
- 3.8 Microarray Data 36
- 3.8.1 Spotted Array Data 36
- 3.8.2 In-situ Oligonucleotide Array Data 37
- 3.9 So I Have My Microarray Data - What's Next? 39
- 3.9.1 Confirming Microarray Results 39
- 3.9.2 Northern Blot Analysis 40
- 3.9.3 Reverse-transcription PCR and Quantitative Real-time RT-PCR 40
- 4. Inherent Variability in Array Data 45
- 4.1 Genetic Populations 45
- 4.2 Variability in Gene Expression Levels 47
- 4.2.1 Variability Due to Specimen Sampling 47
- 4.2.2 Variability Due to Cell Cycle Regulation 48
- 4.2.3 Experimental Variability 48
- 4.3 Test the Variability by Replication 50
- 4.3.1 Duplicated Spots 50
- 4.3.2 Multiple Arrays and Biological Replications 51
- 5. Background Noise 53
- 5.1 Pixel-by-pixel Analysis of Individual Spots 53
- 5.2 General Models for Background Noise 56
- 5.2.1 Additive Background Noise 57
- 5.2.2 Correction for Background Noise 58
- 5.2.3 Example: Replication Test Data Set 59
- 5.2.4 Noise Models for GeneChip Arrays 62
- 5.2.5 Elusive Nature of Background Noise 63
- 6. Transformation and Normalization 67
- 6.1 Data Transformations 67
- 6.1.1 Logarithmic Transformation 67
- 6.1.2 Square Root Transformation 68
- 6.1.3 Box-Cox Transformation Family 69
- 6.1.4 Affine Transformation 69
- 6.1.5 The Generalized-log Transformation 71
- 6.2 Data Normalization 72
- 6.2.1 Normalization Across G Genes 74
- 6.2.2 Example: Mouse Juvenile Cystic Kidney Data Set 75
- 6.2.3 Normalization Across G Genes and N Samples 77
- 6.2.4 Color Effects and MA Plots 78
- 6.2.5 Normalization Based on LOWESS Function 80
- 6.2.6 Normalization Based on Rank-invariant Genes 82
- 6.2.7 Normalization Based on a Sample Pool 82
- 6.2.8 Global Normalization Using ANOVA Models 82
- 6.2.9 Other Normalization Issues 83
- 7. Missing Values in Array Data 85
- 7.1 Missing Values in Array Data 85
- 7.1.1 Sources of Problem 85
- 7.2 Statistical Classification of Missing Data 86
- 7.3 Missing Values in Replicated Designs 88
- 7.4 Imputation of Missing Values 89
- 8. Saturated Intensity Readings 93
- 8.1 Saturated Intensity Readings 93
- 8.2 Multiple Power-levels for Spotted Arrays 93
- 8.2.1 Imputing Saturated Intensity Readings 95
- 8.3 High Intensities in Oligonucleotide Arrays 97
- Part II Statistical Models and Analysis
- 9. Experimental Design 103
- 9.1 Factors Involved in Experiments 103
- 9.2 Types of Design Structures 106
- 9.3 Common Practice in Microarray Studies 112
- 9.3.1 Reference Design 112
- 9.3.2 Time-course Experiment 114
- 9.3.3 Color Reversal 115
- 9.3.4 Loop Design 116
- 9.3.5 Example: Time-course Loop Design 117
- 10. ANOVA Models for Microarray Data 121
- 10.1 A Basic Log-linear Model 121
- 10.2 ANOVA With Multiple Factors 123
- 10.2.1 Main Effects 123
- 10.2.2 Interaction Effects 123
- 10.3 A Generic Fixed-Effects ANOVA Model 124
- 10.3.1 Estimation for Interaction Effects 126
- 10.4 Two-stage Estimation Procedures 126
- 10.5 Identifying Differentially Expressed Genes 130
- 10.5.1 Standard MSE-based Approach 130
- 10.5.2 Other Approaches 132
- 10.5.3 Modified MSE-based Approach 132
- 10.6 Mixed-effects Models 135
- 10.7 ANOVA for Split-plot Design 136
- 10.8 Log Intensity Versus Log Ratio 138
- 11. Multiple Testing in Microarray Studies 143
- 11.1 Hypothesis Testing for Any Individual Gene 143
- 11.2 Multiple Testing for the Entire Gene Set 144
- 11.2.1 Framework for Multiple Testing 144
- 11.2.2 Test Statistic for Each Gene 145
- 11.2.3 Two Error Control Criteria in Multiple Testing 146
- 11.2.4 Implementation Algorithms 147
- 11.2.5 Example of Multiple Testing Algorithms 152
- 12. Permutation Tests in Microarray Data 157
- 12.2 Permutation Tests in Microarray Studies 160
- 12.2.1 Exchangeability in Microarray Designs 160
- 12.2.2 Limitation of Having Few Permutations 162
- 12.2.3 Pooling Test Results Across Genes 162
- 12.3 Lipopolysaccharide-E. coli Data Set 163
- 12.3.1 Statistical Model 164
- 12.3.2 Permutation Testing and Results 166
- 13. Bayesian Methods for Microarray Data 171
- 13.1 Mixture Model for Gene Expression 171
- 13.1.1 Variations on the Mixture Model 173
- 13.1.2 Example of Gamma Models 175
- 13.2 Mixture Model for Differential Expression 176
- 13.2.1 Mixture Model for Color Ratio Data 176
- 13.2.2 Relation of Mixture Model to ANOVA Model 180
- 13.2.3 Bayes Interpretation of Mixture Model 182
- 13.3 Empirical Bayes Methods 183
- 13.3.1 Example of Empirical Bayes Fitting 184
- 13.4 Hierarchical Bayes Models 187
- 13.4.1 Example of Hierarchical Modeling 189
- 14. Power and Sample Size Considerations 193
- 14.1 Test Hypotheses in Microarray Studies 194
- 14.2 Distributions of Estimated Differential Expression 196
- 14.3 Summary Measures of Estimated Differential Expression 196
- 14.4 Multiple Testing Framework 197
- 14.5 Dependencies of Estimation Errors 199
- 14.6 Familywise Type I Error Control 200
- 14.6.1 Type I Error Control: the Sidak Approach 201
- 14.6.2 Type I Error Control: the Bonferroni Approach 203
- 14.7 Familywise Type II Error Control 204
- 14.7.1 Type II Error Control: the Sidak Approach 206
- 14.7.2 Type II Error Control: the Bonferroni Approach 206
- 14.8 Contrast of Planning and Implementation in Multiple Testing 207
- 14.9 Power Calculations for Different Summary Measures 208
- 14.9.1 Designs with Linear Summary Measure 208
- 14.9.2 Numerical Example for Linear Summary 210
- 14.9.3 Designs with Quadratic Summary Measure 211
- 14.9.4 Numerical Example for Quadratic Summary 213
- 14.10 A Bayesian Perspective on Power and Sample Size 214
- 14.10.1 Connection to Local Discovery Rates 215
- 14.10.2 Representative Local True Discovery Rate 215
- 14.10.3 Numerical Example for TDR and FDR 216
- 14.11 Applications to Standard Designs 216
- 14.11.1 Treatment-control Designs 217
- 14.11.2 Sample Size for a Treatment-control Design 218
- 14.11.3 Multiple-treatment Designs 221
- 14.11.4 Power Table for a Multiple-treatment Design 224
- 14.11.5 Time-course and Similar Multiple-treatment Designs 227
- 14.12 Relation Between Power, Replication and Design 228
- 14.12.1 Effects of Replication 228
- 14.12.2 Controlling Sources of Variability 229
- 14.13 Assessing Power from Microarray Pilot Studies 230
- 14.13.1 Example 1: Juvenile Cystic Kidney Disease 230
- 14.13.2 Example 2: Opioid Dependence 231
- Part III Unsupervised Exploratory Analysis
- 15. Cluster Analysis 237
- 15.1 Distance and Similarity Measures 238
- 15.2 Distance Measures 239
- 15.2.1 Properties of Distance Measures 239
- 15.2.2 Minkowski Distance Measures 240
- 15.2.3 Mahalanobis Distance 241
- 15.3 Similarity Measures 241
- 15.3.1 Inner Product 241
- 15.3.2 Pearson Correlation Coefficient 242
- 15.3.3 Spearman Rank Correlation Coefficient 243
- 15.4 Inter-cluster Distance 243
- 15.4.1 Mahalanobis Inter-cluster Distance 244
- 15.4.2 Neighbor-based Inter-cluster Distance 244
- 15.5 Hierarchical Clustering 244
- 15.5.1 Single Linkage Method 245
- 15.5.2 Complete Linkage Method 245
- 15.5.3 Average Linkage Clustering 245
- 15.5.4 Centroid Linkage Method 246
- 15.5.5 Median Linkage Clustering 246
- 15.5.6 Ward's Clustering Method 246
- 15.5.7 Applications 246
- 15.5.8 Comparisons of Clustering Algorithms 247
- 15.6 K-means Clustering 247
- 15.7 Bayesian Cluster Analysis 248
- 15.8 Two-way Clustering Methods 248
- 15.9 Reliability of Clustering Patterns for Microarray Data 249
- 16. Principal Components and Singular Value Decomposition 251
- 16.1 Principal Component Analysis 251
- 16.1.1 Applications of Dominant Principal Components 253
- 16.2 Singular-value Decomposition 254
- 16.3 Computational Procedures for SVD 255
- 16.4 Eigengenes and Eigenarrays 256
- 16.5 Fraction of Eigenexpression 256
- 16.6 Generalized Singular Value Decomposition 257
- 16.7 Robust Singular Value Decomposition 257
- 17. Self-Organizing Maps 261
- 17.1 The Basic Logic of a SOM 261
- 17.2 The SOM Updating Algorithm 265
- 17.3 Program GENECLUSTER 267
- 17.4 Supervised SOM 268
- 17.5 Applications 268
- 17.5.1 Using SOM to Cluster Genes 268
- 17.5.2 Using SOM to Cluster Tumors 269
- 17.5.3 Multiclass Cancer Diagnosis 270
- Part IV Supervised Learning Methods
- 18. Discrimination and Classification 277
- 18.1 Fisher's Linear Discriminant Analysis 278
- 18.2 Maximum Likelihood Discriminant Rules 279
- 18.3 Bayesian Classification 280
- 18.4 k-Nearest Neighbor Classifier 281
- 18.5 Neighborhood Analysis 282
- 18.6 A Gene-casting Weighted Voting Scheme 283
- 18.7 Example: Classification of Leukemia Samples 284