Education

PhD

Abstract: Motivated by issues raised by the analysis of gene expression data, this thesis focuses on the impact of dependence on the properties of multiple testing procedures for high-dimensional data. We propose a methodology based on a Factor Analysis model for the correlation structure. Model parameters are estimated with an EM algorithm, and an ad hoc methodology is defined to determine the model that best fits the covariance structure.
Moreover, the factor structure provides a general framework to deal with dependence in multiple testing. Two issues receive particular attention: the estimation of π0, the proportion of true null hypotheses, and the control of error rates. The proposed framework reduces the variability of the estimates of both π0 and the number of false positives. Consequently, it substantially improves the power and stability of simultaneous inference relative to existing multiple testing procedures.
These results are illustrated with real data from microarray experiments, and the proposed methodology is implemented in an R package called FAMT.
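
For readers unfamiliar with the two quantities named above, the following is a minimal sketch of the classical baseline that assumes independent p-values: a Storey-type estimate of π0 followed by a Benjamini-Hochberg step-up rule run at the adjusted level. It is not the factor-adjusted procedure of the thesis and not the FAMT API; the function names, the tuning value `lam`, and the example p-values are hypothetical and chosen only for illustration.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true nulls.

    Uses the conservative "+1" variant, (1 + #{p > lam}) / (m * (1 - lam)),
    capped at 1. The value of lam is an assumed tuning parameter.
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, (1 + above) / (m * (1.0 - lam)))


def adaptive_bh(pvalues, alpha=0.05, lam=0.5):
    """Indices rejected by a Benjamini-Hochberg step-up rule at level alpha / pi0_hat.

    This is the independent-case baseline, not a dependence-adjusted procedure.
    """
    m = len(pvalues)
    pi0 = estimate_pi0(pvalues, lam)
    # Sort hypothesis indices by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k * alpha / (m * pi0).
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / (m * pi0):
            k = rank
    return sorted(order[:k])


# Hypothetical p-values, for illustration only.
example_pvalues = [0.001, 0.004, 0.008, 0.012, 0.039, 0.041, 0.27, 0.6, 0.74, 0.9]
pi0_hat = estimate_pi0(example_pvalues)
rejected = adaptive_bh(example_pvalues)
print(pi0_hat, rejected)
```

With these ten p-values the estimate of π0 is 0.8 and the first four hypotheses are rejected. Under dependence, both the π0 estimate and the number of false positives produced by such a baseline can vary considerably from one dataset to another, which is the instability the factor-based framework is designed to reduce.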

Keywords: Multiple testing, Dependence, Factor Analysis, Proportion of null hypotheses, FDR, FAMT (R package)



Jury: