Semantic metadata extraction from open domain texts in natural language

Córdoba-Rodas, Angie P.


dc.contributor.advisor	Vega-Riveros, José F.
dc.contributor.author	Córdoba-Rodas, Angie P.
dc.contributor.college	College of Engineering	en_US
dc.contributor.committee	Rivera-Gallego, Wilson
dc.contributor.committee	Rodríguez-Martínez, Manuel
dc.contributor.department	Department of Electrical and Computer Engineering	en_US
dc.contributor.representative	Carroll, Kevin S.
dc.date.accessioned	2019-05-14T18:22:45Z
dc.date.available	2019-05-14T18:22:45Z
dc.date.issued	2013
dc.description.abstract	The information available on the Web is growing immensely, which poses a great challenge to users searching for information and documents about a specific topic. Current search engines, though quite effective, often fall short in the relevance and accuracy of their results. Natural Language Processing (NLP) is a natural step towards understanding the searcher’s intent and the meaning of terms in context. In this research, a supervised learning algorithm was built to extract semantic metadata from the sentences of documents written in natural language. The training set for the system was a corpus built from semantic annotations of sentences taken from a paper on a specific subject. The semantic metadata describe the constituents of a sentence in terms of thematic roles. The constituents were obtained from the grammatical structure of the sentence using the Stanford University Natural Language Parser.	en_US
dc.description.graduationSemester	Spring (2nd Semester)	en_US
dc.description.graduationYear	2013	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.11801/2189
dc.language.iso	English	en_US
dc.rights.holder	(c) 2013 Angie Paola Córdoba-Rodas	en_US
dc.rights.license	All rights reserved	en_US
dc.title	Semantic metadata extraction from open domain texts in natural language	en_US
dc.type	Thesis	en_US
dspace.entity.type	Publication
thesis.degree.discipline	Computer Engineering	en_US
thesis.degree.level	M.S.	en_US

Files

Original bundle

Name: ICOM_CordobaRodasA_2013.pdf
Size: 7.55 MB
Format: Adobe Portable Document Format
Collections

Theses & Dissertations