Rodríguez, Caroline K.
Publication Search Results
Publication: A computational environment for data preprocessing in supervised classification (2004)
Rodríguez, Caroline K.; Acuña-Fernández, Edgar; College of Arts and Sciences - Sciences; Bollman, Dorothy; Vásquez, Pedro; Department of Mathematics; Rullán, Agustin

In this thesis, a data preprocessing environment for supervised classification has been created on the Windows platform of R, the programming language and environment for statistical computing and graphics. The functions that compose the environment were selected based on empirical studies of how the data preprocessing techniques investigated affect the misclassification error of well-known classifiers on real datasets. Visualization techniques were also included in the environment to support data exploration and data preprocessing decisions. The techniques considered in this thesis were applied to twelve real datasets from the Machine Learning Database Repository at the University of California, Irvine. The datasets varied in the number of dimensions from 4 to 60, in the number of observations from 150 to 4435, and in the number of classes from 3 to 7. Other existing studies on data preprocessing examine the effects of applying these techniques to the whole dataset, but not by class. The functions that form the data preprocessing environment were placed in a package that can be downloaded to the R directory R_HOME/library and then loaded into the user's workspace to create a data preprocessing environment for supervised classification applications. Future investigations may explore the use of these functions for data mining projects that involve very high-dimensional and very large datasets.
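
The distinction the abstract draws between preprocessing the whole dataset and preprocessing by class can be illustrated with a small sketch. The abstract does not name the specific techniques, so z-score outlier flagging is used here purely as an assumed example, and the code is written in Python rather than the R of the thesis. The function names, the threshold of 2.0, and the toy data are all hypothetical and are not taken from the thesis package.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the indices of values whose absolute z-score exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def outliers_whole(values, threshold=2.0):
    """Flag outliers using statistics computed over the whole dataset."""
    return zscore_outliers(values, threshold)


def outliers_by_class(values, labels, threshold=2.0):
    """Flag outliers using statistics computed separately within each class."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    flagged = []
    for idx in groups.values():
        local = zscore_outliers([values[i] for i in idx], threshold)
        flagged.extend(idx[j] for j in local)  # map back to dataset indices
    return sorted(flagged)


# Hypothetical one-feature toy data: class "a" clusters near 1, class "b" near 5.
# The value 3.0 belongs to class "a" but sits exactly at the overall mean.
values = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 3.0, 5.0, 5.1, 4.9, 5.0, 5.05, 4.95]
labels = ["a"] * 7 + ["b"] * 6

print(outliers_whole(values))             # -> []
print(outliers_by_class(values, labels))  # -> [6]
```

In this toy case the observation at index 6 looks unremarkable against the pooled data but is far from the rest of its own class, which is the kind of effect a by-class analysis can expose and a whole-dataset analysis can miss.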