A comparison in cluster validation techniques

González-Toledo, Marggie D.

dc.contributor.advisor	Acuña-Fernández, Edgar
dc.contributor.author	González-Toledo, Marggie D.
dc.contributor.college	College of Arts and Sciences - Sciences	en_US
dc.contributor.committee	Lorenzo, Edgardo
dc.contributor.committee	Saito, Tokuji
dc.contributor.department	Department of Mathematics	en_US
dc.contributor.representative	Ortiz, Jorge L.
dc.date.accessioned	2019-04-15T15:50:44Z
dc.date.available	2019-04-15T15:50:44Z
dc.date.issued	2004
dc.description.abstract	Clustering may be defined as a process that aims to find partitions of similar objects. It is an unsupervised recognition procedure, since there are no predefined classes that indicate grouping properties in the data set. Researchers have studied clustering extensively because it arises in many application domains in engineering, social science, and biology. The basic problem in clustering is to decide the optimal number of clusters, or partitions, that fits a data set. Sometimes the clusters obtained after applying a clustering algorithm do not represent the structure that the data set really has. For this reason we need quantitative measures to evaluate the results of a clustering algorithm. This task is called cluster validity. This thesis includes a description of clustering algorithms and their validation techniques. Our main goal is to identify which cluster validation technique is most efficient at dividing a given data set. To this end, seven cluster validation techniques were applied along with three clustering algorithms on ten different data sets. The results were obtained using the R programming language and environment for statistical computing, which can be downloaded from http://www.r-project.org/ [1].	en_US
dc.description.abstract	Cluster analysis may be defined as the process that attempts to find partitions of similar objects. It is an unsupervised recognition procedure because there are no predefined classes that indicate grouping properties in the data set. Deciding the number of partitions into which a data set should be divided is a problem that must be faced when working with cluster analysis. Sometimes the groups obtained after applying a clustering algorithm do not represent the real structure of the data set. For this reason, quantitative measures are needed to evaluate the result of the clustering algorithm. This task is called cluster validation. This thesis includes a description of clustering algorithms as well as of the validation techniques. Our main goal is to identify which cluster validation technique is most effective at determining whether a data set is well divided. In this research, seven validation techniques were applied along with three clustering algorithms on ten different data sets. The results were obtained using the R programming language and environment for statistical computing, which can be obtained from the web page http://www.r-project.org/ [1].	en_US
dc.description.graduationYear	2004	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.11801/2007
dc.language.iso	English	en_US
dc.rights.holder	(c) 2004 Marggie D. González-Toledo	en_US
dc.rights.license	All rights reserved	en_US
dc.subject	Cluster validation techniques	en_US
dc.title	A comparison in cluster validation techniques	en_US
dc.type	Thesis	en_US
dspace.entity.type	Publication
thesis.degree.discipline	Mathematical Statistics	en_US
thesis.degree.level	M.S.	en_US

Files

Original bundle

Name: MATE_GonzalezToledoM_2005.pdf
Size: 510.54 KB
Format: Adobe Portable Document Format

Collections

Theses & Dissertations
