Publication:
Comparación de algoritmos para clustering de "streams" de series de tiempo

dc.contributor.advisor Acuña-Fernández, Edgar
dc.contributor.author Aparicio-Carrasco, Ana M.
dc.contributor.college College of Arts and Sciences - Sciences en_US
dc.contributor.committee Urintsev, Alexander
dc.contributor.committee González, Ana Carmen
dc.contributor.department Department of Mathematics en_US
dc.contributor.representative Valentín Rullán, Ricky
dc.date.accessioned 2018-09-14T19:48:09Z
dc.date.available 2018-09-14T19:48:09Z
dc.date.issued 2012-05
dc.description.abstract In recent years, technological advances have led to a huge increase in data production and to the evolution of methods that facilitate data collection. Data that arrive continuously, massively, and without a foreseeable end are known as data streams. Sources of such data include sensors, personal bank transactions, and automated measuring tools, among others. Algorithms for processing this kind of data must provide rapid, real-time responses, which implies that they must maintain a decision model at all times. Clustering of data streams by variables finds groups of variables (data streams) with similar behavior over time. In this work we compare two algorithmic approaches to clustering data streams by variables: ODAC, a divisive hierarchical algorithm, and CORREL, which operates over the Sliding Windows model and performs clustering by partitioning. Based on the experimental study, it is concluded that ODAC outperforms CORREL because of its performance and its independence from the distribution of the data streams. However, ODAC requires a large number of data points (“examples”) to discover the inherent clustering structure.
dc.description.abstract En años recientes, los avances tecnológicos han dado lugar a un enorme incremento en la producción de datos y han facilitado su recolección. Los datos que llegan continuamente, de forma masiva y con tendencia infinita se conocen como data streams. Sus fuentes son por ejemplo sensores, transacciones bancarias y mediciones automatizadas. Los algoritmos destinados a procesar estos datos deben brindar respuestas rápidas y en tiempo real, lo que implica mantener un modelo de decisión en todo momento. El clustering de data streams por variables busca encontrar grupos de data streams (variables) con un comportamiento similar a lo largo del tiempo. En el presente trabajo se comparan dos enfoques de algoritmos de clustering de data streams por variables: ODAC, un algoritmo jerárquico divisivo y CORREL que utiliza el modelo Sliding Windows y realiza el clustering por particionamiento. Basado en el estudio experimental se concluye que ODAC supera a CORREL, debido a su rendimiento e independencia de la distribución de las variables. Sin embargo, este requiere que el conjunto de datos posea una gran cantidad de observaciones (ejemplos) para descubrir la estructura de clusters subyacente.
dc.description.graduationSemester Spring en_US
dc.description.graduationYear 2012 en_US
dc.identifier.uri https://hdl.handle.net/20.500.11801/894
dc.language.iso es en_US
dc.rights.holder (c)2012 Ana M. Aparicio Carrasco en_US
dc.rights.license All rights reserved en_US
dc.subject Algorithms en_US
dc.subject Clustering en_US
dc.subject Variables en_US
dc.subject Streams en_US
dc.subject.lcsh Algorithms en_US
dc.subject.lcsh Variables (Mathematics) en_US
dc.subject.lcsh Data mining en_US
dc.subject.lcsh Cluster analysis -- Data processing en_US
dc.title Comparación de algoritmos para clustering de "streams" de series de tiempo en_US
dc.title.alternative Comparison of algorithms for clustering time series data streams en_US
dc.type Thesis en_US
dspace.entity.type Publication
thesis.degree.discipline Computer Science en_US
thesis.degree.level M.S. en_US
Files
Original bundle
Name:
MATE_AparicioCarrascoAM_2012.pdf
Size:
932.9 KB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.64 KB
Format:
Item-specific license agreed upon to submission