Abstract.
The rapid development of microarray technologies has raised many challenging problems in experiment design and data analysis. Although many numerical algorithms have been successfully applied to analyze gene expression data, the effects of variations and uncertainties in measured gene expression levels across samples and experiments have been largely ignored in the literature. In this article, in the context of hierarchical clustering algorithms, we introduce a statistical resampling method to assess the reliability of gene clusters identified by any hierarchical clustering method. Using the clustering trees constructed from the resampled data, we can evaluate a confidence value for each node in the observed clustering tree. A majority-rule consensus tree can also be obtained, showing only those clusters that occur in a majority of the resampled trees. We illustrate our proposed methods with applications to two published data sets. Although the methods are discussed in the context of hierarchical clustering, they can be applied with other cluster-identification methods for gene expression data to assess the reliability of any gene cluster of interest.
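As an illustration of the idea of node confidence and majority-rule consensus, the following is a minimal sketch and not the authors' implementation. The abstract does not say how the data are resampled, so the sketch assumes a nonparametric bootstrap over samples (columns); average linkage with Euclidean distance is likewise an assumption, and the function names `tree_clusters` and `node_confidence` are hypothetical. A cluster is represented by the set of gene (row) indices under a node, and its confidence is the fraction of resampled trees that contain the same set.

```python
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def tree_clusters(data, method="average"):
    """Return the non-trivial clusters of a hierarchical tree.

    Each cluster is a frozenset of row (gene) indices under one internal
    node; the root cluster containing every gene is excluded.
    """
    n_genes = data.shape[0]
    root = to_tree(linkage(data, method=method, metric="euclidean"))
    clusters = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_leaf():
            continue
        members = frozenset(node.pre_order())  # leaf ids under this node
        if len(members) < n_genes:
            clusters.add(members)
        stack.extend([node.left, node.right])
    return clusters


def node_confidence(data, n_resamples=200, seed=0, method="average"):
    """Estimate node confidence values and a majority-rule cluster set.

    Assumption: resampling is a bootstrap of the sample columns.
    Returns (confidence, consensus), where confidence maps each observed
    cluster to the fraction of resampled trees containing it, and
    consensus holds every cluster seen in more than half of those trees.
    """
    rng = np.random.default_rng(seed)
    observed = tree_clusters(data, method)
    n_samples = data.shape[1]
    counts = Counter()
    for _ in range(n_resamples):
        cols = rng.integers(0, n_samples, size=n_samples)  # with replacement
        counts.update(tree_clusters(data[:, cols], method))
    confidence = {c: counts[c] / n_resamples for c in observed}
    consensus = {c for c, k in counts.items() if k / n_resamples > 0.5}
    return confidence, consensus
```

Calling `node_confidence(expr)` on a genes-by-samples array `expr` (a placeholder name) returns the confidence of each observed cluster together with the clusters that would appear in a majority-rule consensus tree. Clusters that occur in more than half of the trees are always mutually compatible, so the consensus set can be arranged as a tree.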
Cite this article
Zhang, K., Zhao, H. Assessing reliability of gene clusters from gene expression data. Funct Integr Genomics 1, 156–173 (2000). https://doi.org/10.1007/s101420000019
DOI: https://doi.org/10.1007/s101420000019