Parallel Evaluation of Large Scale Hierarchical Clustering Results

Cruz Rodríguez, David

Date

2012

Abstract

Abstract ⎯ Data clustering refers to the automatic grouping of objects based on their similarity: similar objects should be placed in the same group and dissimilar objects in different groups. Hierarchical clustering algorithms additionally arrange the objects and clusters in a hierarchy. Clustering is a fundamental task in data mining, machine learning, information retrieval, bioinformatics, and image analysis, among others, so it is important to evaluate the results of clustering algorithms. However, most evaluation approaches are geared towards nonhierarchical clustering; this research explores how to use traditional validity measures to evaluate and assess hierarchical clustering results.

Key Terms ⎯ Clustering, Data Clustering, Hierarchical Clustering, Validity Measures.
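
As a rough illustration of the idea in the abstract, the sketch below applies a flat (nonhierarchical) validity measure to each level of a hierarchy and compares the scores. The abstract does not say which clustering algorithm or validity measures the article uses, so the single-linkage merging, the silhouette coefficient, the function names, and the toy data here are all assumptions chosen for illustration, not the author's method. The sketch is also sequential and makes no attempt at the parallel evaluation named in the title.

```python
from itertools import combinations
from math import dist


def single_linkage_levels(points):
    """Naive agglomerative single-linkage clustering (illustrative assumption).

    Returns the partition at every level of the hierarchy, from all
    singletons down to one cluster. Clusters are lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Pick the two clusters whose closest members are nearest to each other.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
    return levels


def silhouette(points, clusters):
    """Mean silhouette coefficient of a flat partition with at least 2 clusters.

    Points in singleton clusters score 0, following the usual convention.
    """
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for ci, cluster in enumerate(clusters):
        for i in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(dist(points[i], points[j]) for j in cluster if j != i)
            a /= len(cluster) - 1
            # b: smallest mean distance to the members of any other cluster.
            b = min(
                sum(dist(points[i], points[j]) for j in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

levels = single_linkage_levels(points)

# Score every level that has between 2 and n-1 clusters.
scores = {
    len(partition): silhouette(points, partition)
    for partition in levels
    if 2 <= len(partition) < len(points)
}
best_k = max(scores, key=scores.get)
print(f"best number of clusters by silhouette: {best_k}")
```

Scoring each level separately treats the hierarchy as a sequence of flat partitions, which is only one possible way to reuse a traditional measure; it ignores the nesting relationship between levels.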

URI

http://hdl.handle.net/20.500.12475/426

Collections

Computer Engineering