Publication:
A computational environment for data preprocessing in supervised classification

dc.contributor.advisor Acuña-Fernández, Edgar
dc.contributor.author Rodríguez, Caroline K.
dc.contributor.college College of Arts and Sciences - Sciences en_US
dc.contributor.committee Bollman, Dorothy
dc.contributor.committee Vásquez, Pedro
dc.contributor.department Department of Mathematics en_US
dc.contributor.representative Rullán, Agustin
dc.date.accessioned 2019-04-15T15:50:44Z
dc.date.available 2019-04-15T15:50:44Z
dc.date.issued 2004
dc.description.abstract In this thesis, a data preprocessing environment for supervised classification was created on the Windows platform of R, the programming language and environment for statistical computing and graphics. The functions that compose the environment were selected based on the results of empirical studies of how the investigated data preprocessing techniques affect the misclassification error of well-known classifiers on real datasets. Visualization techniques were also included in the environment to support data exploration and data preprocessing decisions. The techniques considered in this thesis were applied to twelve real datasets from the Machine Learning Database Repository at the University of California, Irvine. The datasets varied in the number of dimensions from 4 to 60, in the number of observations from 150 to 4435, and in the number of classes from 3 to 7. Other existing studies on data preprocessing examine the effects of applying these techniques to the whole dataset, but not by class. The functions that form the data preprocessing environment were placed in a package that can be downloaded to the R directory R_HOME/library and then loaded into the user’s workspace to create a data preprocessing environment for supervised classification applications. Future investigations may explore the use of these functions for data mining projects that involve very high-dimensional and very large datasets. en_US
dc.description.abstract In this thesis, a data preprocessing environment has been created for use in supervised classification applications on the Windows platform of R, the programming language and statistical and graphical environment. The functions that compose the environment were selected based on the results of empirical studies of the effect of data preprocessing on the misclassification error of three well-known classifiers. The twelve datasets used, whose dimensionality ranges from 4 to 60, number of observations from 150 to 4435, and number of classes from 3 to 7, were taken from the Machine Learning Database Repository at the University of California, Irvine. Other studies exist in the area of data preprocessing, but they apply the techniques mentioned to the complete data and not to the data grouped by class. The coded functions have been packaged, and the package can be downloaded to the R directory “R_HOME/library”. Once there, the user can load the package into their workspace, thereby creating an environment suited to data preprocessing for supervised classification applications. Future investigations may explore the use of these functions for data mining projects. en_US
dc.description.graduationYear 2004 en_US
dc.description.sponsorship The Office of Naval Research (ONR), grant number N00014-03-1-0359; PRECISE group of the Engineering School at the University of Puerto Rico at Mayagüez, funded by NSF grant EIA 99-77071 en_US
dc.identifier.uri https://hdl.handle.net/20.500.11801/2010
dc.language.iso English en_US
dc.rights.holder (c) 2004 Caroline K. Rodríguez en_US
dc.rights.license All rights reserved en_US
dc.subject Supervised classification en_US
dc.title A computational environment for data preprocessing in supervised classification en_US
dc.type Thesis en_US
dspace.entity.type Publication
thesis.degree.discipline Scientific Computing en_US
thesis.degree.level M.S. en_US
Files
Original bundle
Name: MATE_RodriguezC_2004.pdf
Size: 1.33 MB
Format: Adobe Portable Document Format