Parallel Evaluation of Large Scale Hierarchical Clustering Results
Abstract ⎯ Data clustering refers to the automatic grouping of objects based on their similarity, i.e., similar objects should be in the same group and dissimilar objects should be in different groups. Hierarchical clustering algorithms add the notion of a hierarchy into which the objects and the clusters fit. Clustering is a fundamental task in data mining, machine learning, information retrieval, bioinformatics, and image analysis, among others, so it is important to evaluate the results of clustering algorithms. However, most evaluation approaches are geared towards nonhierarchical clustering; this research explores how to use traditional validity measures to evaluate and assess hierarchical clustering results.
Key Terms ⎯ Clustering, Data Clustering, Hierarchical Clustering, Validity Measures.