On applications of rough sets theory to knowledge discovery
Coaquira-Nina, Frida R.
Knowledge Discovery in Databases (KDD) is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Data preprocessing is a step of the KDD process that reduces the complexity of the data and provides better conditions for subsequent analysis. Rough set theory, in which a set is approximated by means of elementary sets, provides another approach for developing methods for the KDD process. In this doctoral thesis, we propose new algorithms based on rough set theory for three data preprocessing steps: discretization, feature selection, and instance selection. In discretization, continuous features are transformed into new categorical features, a step required by KDD algorithms that work strictly with categorical features. In feature selection, a subset of the features is chosen, yielding a new dataset of lower dimension on which a KDD task is easier to perform. When a dataset is very large, instance selection is required to decrease the computational complexity of the KDD process. In addition, we combine a partitioning clustering algorithm with the rough set approach and obtain results comparable to those of a hierarchical clustering algorithm used together with rough sets. The new methods proposed in this thesis have been tested on datasets taken from the Machine Learning Database Repository at the University of California at Irvine.
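To illustrate what approximating a set by elementary sets means, the following Python sketch computes the standard Pawlak lower and upper approximations and the dependency degree (gamma), a quantity commonly used to score feature subsets in rough-set feature selection. It is a minimal sketch, not the algorithms proposed in the thesis; the toy table, its attribute names, and the function names are hypothetical and assume features that are already categorical (for example, after discretization).

```python
from collections import defaultdict

def elementary_sets(rows, attrs):
    """Partition row indices into elementary sets: rows with equal values on attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def approximations(rows, attrs, target):
    """Return (lower, upper) approximations of the index set `target` w.r.t. attrs."""
    lower, upper = set(), set()
    for block in elementary_sets(rows, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper

def dependency_degree(rows, attrs, decision):
    """Fraction of objects whose decision class is fully determined by attrs."""
    positive = set()
    for concept in elementary_sets(rows, [decision]):
        lower, _ = approximations(rows, attrs, concept)
        positive |= lower
    return len(positive) / len(rows)

# Hypothetical toy table with categorical features and a decision attribute "play".
table = [
    {"temp": "high", "humidity": "high", "play": "no"},
    {"temp": "high", "humidity": "high", "play": "yes"},
    {"temp": "low", "humidity": "normal", "play": "yes"},
    {"temp": "low", "humidity": "high", "play": "no"},
    {"temp": "high", "humidity": "normal", "play": "yes"},
]

lower, upper = approximations(table, ["temp", "humidity"], {1, 2, 4})
print(sorted(lower), sorted(upper))                          # [2, 4] [0, 1, 2, 4]
print(dependency_degree(table, ["temp", "humidity"], "play"))  # 0.6
print(dependency_degree(table, ["humidity"], "play"))          # 0.4
print(dependency_degree(table, ["temp"], "play"))              # 0.0
```

Rows 0 and 1 share the same feature values but different decisions, so they fall into the boundary region and no feature subset here reaches a dependency of 1.0. Comparing gamma across subsets is one simple way a rough-set feature selection method can rank candidate features; the thesis's own criteria may differ.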