Publication:
A decision tree-based approach for missing value imputation of mixed-type data

Authors
Rosado Galindo, Heizel M.
Advisor
Dávila, Saylisse
College
College of Engineering
Department
Department of Industrial Engineering
Degree Level
M.S.
Date
2017-05
Abstract
Researchers and practitioners in many areas of knowledge frequently struggle with missing data. Missing data is a problem because almost all standard statistical methods assume that the information is complete. Missing value imputation offers a solution to this problem. The main contribution of this work lies in the development of a random forest-based imputation method that can handle any type of data, including high-dimensional data with complex non-linear interactions. The premise behind the proposed scheme is that a variable can be imputed using only those variables that feature selection identifies as related to it. This work compares the performance of the proposed scheme with two other imputation methods commonly used in the literature: KNN and missForest. The results suggest that the proposed method can be useful in complex categorical scenarios with a high volume of missing values. The proposed method is an approximation of missForest that significantly reduces the number of variables used in the imputation.
Keywords
Random forest
missForest
Cite
Rosado Galindo, H. M. (2017). A decision tree-based approach for missing value imputation of mixed-type data [Thesis]. Retrieved from https://hdl.handle.net/20.500.11801/932