dc.contributor.advisor: Acuña-Fernández, Edgar
dc.contributor.author: Rodríguez, Caroline K.
dc.description.abstract: In this thesis, a data preprocessing environment has been created for use in a supervised classification context, on the Windows platform of R, the programming language and environment for statistical computing and graphics. The functions that compose the environment were selected based on the results of empirical studies of how the data preprocessing techniques investigated affect the misclassification error of well-known classifiers applied to real datasets. Visualization techniques were also included in the environment to support data exploration and data preprocessing decisions. The techniques considered in this thesis were applied to twelve real datasets from the Machine Learning Database Repository at the University of California, Irvine. The datasets varied in the number of dimensions from 4 to 60, in the number of observations from 150 to 4435, and in the number of classes from 3 to 7. Existing studies on data preprocessing examine the effects of applying these techniques to the whole dataset, but not by class. The functions that form the data preprocessing environment were placed in a package that can be downloaded to the R directory R_HOME/library and then loaded into the user's workspace to create a data preprocessing environment for supervised classification applications. Future investigations may explore the use of these functions for data mining projects that involve very high-dimensional and very large datasets. [en_US]
dc.description.abstract: In this thesis, a data preprocessing environment has been created for use in supervised classification applications on the Windows platform of R, the programming language and environment for statistical computing and graphics. The functions that make up the environment were selected based on the results of empirical studies of the effect of data preprocessing on the misclassification error of three well-known classifiers. The twelve datasets used, whose dimensionality ranges from 4 to 60, number of observations from 150 to 4435, and number of classes from 3 to 7, were taken from the Machine Learning Database Repository at the University of California, Irvine. Other studies exist in the area of data preprocessing, but they apply the techniques mentioned to the complete data and not to the data grouped by class. The coded functions have been packaged, and the package can be downloaded to the R directory "R_HOME/library". Once there, the user can load the package into the workspace, creating an environment suited to data preprocessing for supervised classification applications. Future investigations may explore the use of these functions for data mining projects. [en_US]
dc.description.sponsorship: The Office of Naval Research (ONR), grant number N00014-03-1-0359; the PRECISE group of the Engineering School at the University of Puerto Rico at Mayagüez, funded by NSF grant EIA 99-77071 [en_US]
dc.subject: Supervised classification [en_US]
dc.title: A computational environment for data preprocessing in supervised classification [en_US]
dc.rights.license: All rights reserved [en_US]
dc.rights.holder: (c) 2004 Caroline K. Rodríguez [en_US]
dc.contributor.committee: Bollman, Dorothy
dc.contributor.committee: Vásquez, Pedro
dc.contributor.representative: Rullán, Agustin Computing [en_US]
dc.contributor.college: College of Arts and Sciences - Sciences [en_US]
dc.contributor.department: Department of Mathematics [en_US]

This item appears in the following Collection(s)

  • Theses & Dissertations
    Items included under this collection are theses, dissertations, and project reports submitted as a requirement for completing a degree at UPR-Mayagüez.

Except where otherwise noted, this item's license is described as All Rights Reserved