Contributions to parallel and distributed computing in knowledge discovery and data mining

Lozano-Inca, Elio

dc.contributor.advisor	Acuña-Fernández, Edgar
dc.contributor.author	Lozano-Inca, Elio
dc.contributor.college	College of Engineering	en_US
dc.contributor.committee	Bollman, Dorothy
dc.contributor.committee	Acar, Robert
dc.contributor.committee	Vega, Jose Fernando
dc.contributor.department	Department of Electrical and Computer Engineering	en_US
dc.contributor.representative	Castellanos, Dorial
dc.date.accessioned	2019-02-12T15:30:47Z
dc.date.available	2019-02-12T15:30:47Z
dc.date.issued	2006
dc.description.abstract	Databases keep growing without bound as new data acquisition technologies emerge. One challenge is how to extract knowledge from these large data sets. In this thesis, we analyze and improve the algorithmic solutions of four problems related to knowledge discovery and data mining by means of parallel computing, and we compare our results with related work. We design two parallel algorithms for outlier detection. The first finds distance-based outliers using nested loops together with randomization and a pruning rule. The second detects density-based local outliers. Both use data parallelism. The star coordinates plot is a useful visualization technique, but it has some drawbacks. We enhance the traditional star coordinates plot by introducing new parameters that allow us to visualize the data points as polygons in two dimensions and as polyhedra in three dimensions. To visualize large data sets and reduce the computation time, we also design a parallel algorithm. We design a new meta-classifier algorithm and compare its performance with base classifier algorithms and bagging-based meta-classifier algorithms. Our meta-classifier algorithm gives better results than the other meta-classifier algorithms. To speed up its computation and make it suitable for large data sets, we develop a parallel algorithm. We develop a meta-clustering algorithm and compare its performance with two bagging-based meta-clustering algorithms and a hypergraph partitioning meta-clustering algorithm. Our proposed meta-clustering algorithm gives results close to those of the best clustering algorithm and is more robust to the data dependency problem. A parallel algorithm to compute the four meta-clustering algorithms is also designed.
Our collection of sequential and parallel programs is tested on two different clusters of Linux-based workstations using real-world databases available in the Machine Learning Repository of the University of California at Irvine.	en_US
dc.description.graduationSemester	Fall	en_US
dc.description.graduationYear	2006	en_US
dc.description.sponsorship	Supported by the Office of Naval Research (ONR) under grant number N00014-03-1-0359.	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.11801/1798
dc.language.iso	English	en_US
dc.rights.holder	(c) 2006 Elio Lozano Inca	en_US
dc.rights.license	All rights reserved	en_US
dc.subject	Parallel and distributed computing	en_US
dc.subject	Data mining	en_US
dc.title	Contributions to parallel and distributed computing in knowledge discovery and data mining	en_US
dc.type	Dissertation	en_US
dspace.entity.type	Publication
thesis.degree.discipline	Computing and Information Sciences and Engineering	en_US
thesis.degree.level	Ph.D.	en_US
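
The abstract describes the first outlier detection algorithm as nested loops combined with randomization and a pruning rule. A minimal sequential sketch of that general idea follows. It is not the thesis's implementation: the outlier score (mean distance to the k nearest neighbours), the function name `top_n_outliers`, and the parameters `k`, `n`, and `seed` are all assumptions chosen for illustration.

```python
import heapq
import math
import random


def top_n_outliers(points, k=2, n=2, seed=0):
    """Return indices of the n points with the largest mean k-NN distance.

    Illustrative only: the scoring function and parameters are assumptions.
    """
    rng = random.Random(seed)
    scan_order = list(range(len(points)))
    rng.shuffle(scan_order)  # randomization: neighbours are met in random order

    top = []       # min-heap of (score, index) holding the best n outliers so far
    cutoff = 0.0   # score of the weakest outlier in `top` once it is full

    for i, p in enumerate(points):
        nearest = []   # negated distances: a max-heap of the k smallest so far
        pruned = False
        for j in scan_order:
            if j == i:
                continue
            d = math.dist(p, points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # Pruning rule: the running score is an upper bound on the final
            # score, so once it drops below the cutoff, p cannot be a top-n
            # outlier and the inner loop can stop early.
            if len(nearest) == k and len(top) == n:
                if -sum(nearest) / k < cutoff:
                    pruned = True
                    break
        if pruned or len(nearest) < k:
            continue
        score = -sum(nearest) / k
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]

    # Strongest outlier first.
    return [i for _, i in sorted(top, reverse=True)]
```

For example, calling `top_n_outliers(pts, k=2, n=2)` on a small cluster of points plus two distant ones returns the indices of the two distant points. One plausible way to apply data parallelism to such a scheme, which the abstract does not spell out, is to split the outer loop across processes and share the cutoff between them.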

Files

Original bundle

Name: CIIC_LozanoIncaE_2006.pdf
Size: 1.01 MB
Format: Adobe Portable Document Format

Collections

Theses & Dissertations
