Salcedo Villanueva, Milena R.

Publication Search Results

  • Publication
    Comparación de técnicas basadas en muestreo para solucionar el problema de dos clases desbalanceadas [Comparison of sampling-based techniques for solving the problem of two unbalanced classes]
    (2011-05) Salcedo Villanueva, Milena R.; Quintana Díaz, Julio C.; College of Arts and Sciences - Sciences; Santana Morant, Dámaris; Lorenzo González, Edgardo; Department of Mathematics; Wessel Beaver, Linda
    The estimation and evaluation of classification models has grown as a research area within pattern recognition in databases. One of the main problems that degrades the performance of classification methods arises when the classes in a data set are unbalanced, that is, when one or several classes are significantly larger than the others. This thesis gives particular attention to the case where the data are distributed in two unbalanced classes, and presents a comparative analysis of the effect of sampling techniques used to address that problem. The techniques analyzed were: random oversampling; “SMOTE” (Synthetic Minority Oversampling Technique) oversampling; and combinations of “SMOTE” oversampling with the cleaning methods “ENN” (Edited Nearest Neighbor) and “Tomek-Link” (these last techniques also act as undersampling procedures). We evaluated the effect of applying them with the following classification methods: logistic regression; linear discriminant analysis; k-nearest neighbors; and decision trees. The purpose was to establish which of these methods performed better according to the following evaluation metrics: misclassification rate, and the measures “Noise”, “Silence”, “G” (based on the geometric mean) and “F”. The data sets used were “CRX” and “GERMAN”, available on the webpage of Dr. Edgar Acuña, and the data sets named “EST1” and “EST2”. The combination of “SMOTE” oversampling with the ENN cleaning method, applied to these data sets, was the most efficient in those cases where the size imbalance between the two classes was significant.
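
As a rough illustration of two of the sampling techniques and two of the metrics named in the abstract, the following Python sketch shows random oversampling, a simplified SMOTE-style interpolation, and the usual textbook forms of the G and F measures. It is not the thesis's code: the function names, the parameter `k`, the toy points, and the confusion-matrix counts are all hypothetical, and the exact metric definitions used in the thesis are assumed rather than confirmed. The ENN and Tomek-Link cleaning steps are left out.

```python
import math
import random


def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority points until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return list(minority) + extra


def smote(minority, n_synthetic, k, rng):
    """Simplified SMOTE: interpolate between a minority point and one of
    its k nearest minority neighbours (needs at least two minority points)."""
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def g_mean_and_f(tp, fn, fp, tn):
    """G = sqrt(recall * specificity); F = harmonic mean of precision and recall.
    Assumed definitions; the minority class is treated as the positive class."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    g = math.sqrt(recall * specificity)
    f = 2 * precision * recall / (precision + recall)
    return g, f


if __name__ == "__main__":
    rng = random.Random(0)
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy data
    print(len(random_oversample(minority, 9, rng)))
    print(smote(minority, n_synthetic=3, k=2, rng=rng))
    print(g_mean_and_f(tp=8, fn=2, fp=10, tn=80))
```

Random oversampling only repeats existing minority points, while the SMOTE-style step places new points on segments between neighbouring minority points, which is why the two can affect a classifier differently. The metric function raises a division error if a class has no examples or no positive predictions, so a fuller version would need to handle those cases.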