Bayesian models for DNA microarray data analysis

Bibliographic Details
Main Author:	Lee, Kyeong Eun, 1971-
Other Authors:	Mallick, Bani K. (Thesis advisor), Calvin, James A. (Thesis advisor)
Format:	Thesis eBook
Language:	English
Published:	[College Station, Tex.] : [Texas A&M University], [2005]
Subjects:	Major statistics; Cox's Proportional Hazard Model; Probit Regression Model; Weibull Regression Model; Wavelet; DNA microarray; Mixture of Dirichlet Processes; Curve Clustering; Bayesian Variable Selection; Survival Analysis
Online Access:	Link to OAK Trust copy

Description
Abstract:	Selection of significant genes via expression patterns is important in a microarray problem. Owing to the small sample size and the large number of variables (genes), the selection process can be unstable. This research proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables in a regression setting and use a Bayesian mixture prior to perform the variable selection. Due to the binary nature of the data, the posterior distributions of the parameters are not in explicit form, and we need to use a combination of truncated sampling and Markov Chain Monte Carlo (MCMC) based computation techniques to simulate the posterior distributions. The Bayesian model is flexible enough to identify the significant genes as well as to perform future predictions. The method is applied to cancer classification via cDNA microarrays. In particular, the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify the set of significant genes to classify BRCA1 and others. Microarray data can also be applied to survival models. We address the issue of how to reduce the dimension in building the model by selecting significant genes as well as assessing the estimated survival curves. Additionally, we consider the well-known Weibull regression and semiparametric proportional hazards (PH) models for survival analysis. With microarray data, we need to consider the case where the number of covariates p exceeds the number of samples n. Specifically, for a given vector of response values, which are times to event (death or censored times), and p gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the responsible genes, which control the survival time. This approach enables us to estimate the survival curve when n << p. In our approach, rather than fixing the number of selected genes, we assign a prior distribution to this number.
The approach creates additional flexibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in effect works as a penalty. To implement our methodology, we use a Markov Chain Monte Carlo (MCMC) method. We demonstrate the use of the methodology with (a) diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and (b) breast carcinoma data. Lastly, we propose a mixture of Dirichlet process models using the discrete wavelet transform for curve clustering. In order to characterize these time-course gene expressions, we consider them as trajectory functions of time and gene-specific parameters and obtain their wavelet coefficients by a discrete wavelet transform. We then build cluster curves using a mixture of Dirichlet process priors.
Item Description:	"Major Subject: Statistics" Title from author supplied metadata (automated record created on Sep. 21, 2005.) Vita. Abstract. Electronic resource.
Format:	Mode of access: World Wide Web. System requirements: World Wide Web access and Adobe Acrobat Reader.
Bibliography:	Includes bibliographical references.
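Illustrative note:	The abstract mentions latent variables in a regression setting combined with truncated sampling for binary data. The sketch below shows one common way such a step is carried out, assuming a probit-style data augmentation in which each latent value is drawn from a unit-variance normal truncated by the observed class label. This is an assumption made for illustration, not the thesis's actual implementation; the function name draw_latent and the linear predictor eta are hypothetical.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for cdf / inverse cdf


def draw_latent(eta, y, rng):
    """Draw z ~ N(eta, 1) truncated to z > 0 when y == 1 and z <= 0 when y == 0.

    eta: hypothetical linear predictor for one sample (assumed, not from the thesis)
    y:   observed binary class label (0 or 1)
    rng: a random.Random instance
    """
    # Probability mass of N(eta, 1) lying at or below zero.
    p0 = _STD.cdf(-eta)
    # Inverse-CDF sampling restricted to the allowed side of zero.
    u = rng.uniform(p0, 1.0) if y == 1 else rng.uniform(0.0, p0)
    # Keep u strictly inside (0, 1) so inv_cdf is defined.
    eps = 1e-12
    u = min(max(u, eps), 1.0 - eps)
    return eta + _STD.inv_cdf(u)


rng = random.Random(0)
z_pos = draw_latent(0.3, 1, rng)  # latent draw for a sample labelled 1
z_neg = draw_latent(0.3, 0, rng)  # latent draw for a sample labelled 0
```

In a full sampler, a draw like this would alternate with updates of the regression coefficients and the gene-inclusion indicators; those steps are omitted here because the record does not describe them in enough detail.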