Publication:
A decision tree-based approach for missing value imputation of mixed-type data

dc.contributor.advisor Dávila, Saylisse
dc.contributor.author Rosado Galindo, Heizel M.
dc.contributor.college College of Engineering en_US
dc.contributor.committee Torres, Wandaliz
dc.contributor.committee Artiles, Noel
dc.contributor.department Department of Industrial Engineering en_US
dc.contributor.representative De Hoyos, Moraima
dc.date.accessioned 2018-09-19T19:34:17Z
dc.date.available 2018-09-19T19:34:17Z
dc.date.issued 2017-05
dc.description.abstract Researchers and practitioners in many areas of knowledge frequently struggle with missing data. Missing data is a problem because almost all standard statistical methods assume that the information is complete. Missing value imputation offers a solution to this problem. The main contribution of this work lies in the development of a random forest-based imputation method that can handle any type of data, including high-dimensional data with complex non-linear interactions. The premise behind the proposed scheme is that a variable can be imputed taking into account only those variables that are related to it, as identified through feature selection. This work compares the performance of the proposed scheme with two other imputation methods commonly used in the literature: KNN and missForest. The results suggest that the proposed method can be useful in complex categorical scenarios with a high volume of missing values. The proposed method is an approximation of missForest that significantly reduces the number of variables used in the imputation. en_US
dc.description.graduationSemester Spring en_US
dc.description.graduationYear 2017 en_US
dc.identifier.uri https://hdl.handle.net/20.500.11801/932
dc.language.iso en en_US
dc.rights.holder (c) 2017 Heizel M. Rosado Galindo en_US
dc.rights.license All rights reserved en_US
dc.subject Random forest en_US
dc.subject missForest en_US
dc.subject.lcsh Missing observations (Statistics) en_US
dc.subject.lcsh Multiple imputation (Statistics) en_US
dc.title A decision tree-based approach for missing value imputation of mixed-type data en_US
dc.type Thesis en_US
dspace.entity.type Publication
thesis.degree.discipline Industrial Engineering en_US
thesis.degree.level M.S. en_US
Files
Original bundle
Name: ININ_RosadoGalindoHM_2017.pdf
Size: 3.84 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.64 KB
Format: Item-specific license agreed upon to submission