Semi-supervised document classification using ontologies

dc.contributor.advisor Acuña-Fernández, Edgar
dc.contributor.author Aparicio-Carrasco, Roxana K.
dc.contributor.college College of Engineering en_US
dc.contributor.committee Urintsev, Alexander
dc.contributor.committee Lozano, Elio
dc.contributor.committee Manian, Vidya
dc.contributor.department Department of Electrical and Computer Engineering en_US
dc.contributor.representative Calderón, Andrés
dc.date.accessioned 2019-02-12T15:30:45Z
dc.date.available 2019-02-12T15:30:45Z
dc.date.issued 2011
dc.description.abstract Many modern applications of automatic document classification require learning accurately from little training data. Semi-supervised classification has been proposed to reduce the need for manual labeling. This technique uses both labeled and unlabeled data for training and has been shown to be effective in many cases. However, using unlabeled data for training is not always beneficial, and it is difficult to know a priori whether it will work for a particular document collection. Meanwhile, the emergence of web technologies has given rise to the collaborative development of ontologies. Ontologies are formal, explicit, detailed structures of concepts. In this thesis, we propose using ontologies to improve automatic document classification when little training data is available. We propose that using ontologies to assist semi-supervised document classification can substantially improve the accuracy and efficiency of the semi-supervised technique. Many learning algorithms have been studied for text. One of the most effective is the Support Vector Machine, which is the basis of this work. Our algorithm enhances the performance of Transductive Support Vector Machines through the use of ontologies. We report experimental results from applying our algorithm to three different real-world text classification datasets. These results show an accuracy increase of 4% on average, and of up to 20% for some datasets, compared with the traditional semi-supervised model. en_US
dc.description.graduationSemester Fall en_US
dc.description.graduationYear 2011 en_US
dc.description.sponsorship NSF en_US
dc.language.iso English en_US
dc.rights.holder (c) 2011 Roxana K. Aparicio Carrasco en_US
dc.rights.license All rights reserved en_US
dc.subject Semi-supervised document classification en_US
dc.subject Ontologies en_US
dc.title Semi-supervised document classification using ontologies en_US
dc.type Dissertation en_US
dspace.entity.type Publication
thesis.degree.discipline Computing and Information Sciences and Engineering en_US
thesis.degree.level Ph.D. en_US