Semantic metadata extraction from open domain texts in natural language

Córdoba-Rodas, Angie P.

Files

ICOM_CordobaRodasA_2013.pdf (7.55 MB)

Authors

Córdoba-Rodas, Angie P.

Advisor

Vega-Riveros, José F.

College

College of Engineering

Department

Department of Electrical and Computer Engineering

Degree Level

M.S.

Date

2013

Abstract

The amount of information on the Web is growing immensely, which poses a great challenge to users searching for information and documents on a specific topic. Current search engines, though quite effective, often fall short in the relevance and accuracy of their results. Natural Language Processing (NLP) is a natural step towards understanding the searcher’s intent and the meaning of terms in context. In this research, a supervised learning algorithm was built to extract semantic metadata from the sentences of documents written in natural language. The training set for the system was a corpus built from semantically annotated sentences taken from a paper on a specific subject. The semantic metadata describe the constituents of a sentence in terms of thematic roles. The constituents were obtained from the grammatical structure of each sentence using the Stanford University Natural Language Parser.
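The abstract describes the pipeline only at a high level: parse a sentence, take its constituents, and let a model trained on annotated sentences assign each constituent a thematic role. The sketch below illustrates that cycle under stated assumptions. It reads a bracketed parse of the kind the Stanford parser prints (pasted in as a string, so no parser is invoked), treats the subject NP and the NP/PP arguments of the verb phrase as the constituents, and uses a deliberately simple most-frequent-role classifier with backoff. The feature set, role labels, helper names (`parse_tree`, `constituents`, `RoleClassifier`, `train`, `annotate`) and the three training sentences are hypothetical illustrations, not the thesis's corpus, features, or learning algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: Penn Treebank-style bracketed parses (the format
# the Stanford parser prints), each paired with one thematic role per
# constituent. Sentences and labels are invented for illustration.
TRAIN = [
    ("(ROOT (S (NP (NNP John)) (VP (VBD opened) (NP (DT the) (NN door))"
     " (PP (IN with) (NP (DT a) (NN key)))) (. .)))", ["AGENT", "THEME", "INSTRUMENT"]),
    ("(ROOT (S (NP (NNP Mary)) (VP (VBD cut) (NP (DT the) (NN bread))"
     " (PP (IN with) (NP (DT a) (NN knife)))) (. .)))", ["AGENT", "THEME", "INSTRUMENT"]),
    ("(ROOT (S (NP (DT The) (NN student)) (VP (VBD wrote) (NP (DT a) (NN report))"
     " (PP (IN in) (NP (DT the) (NN library)))) (. .)))", ["AGENT", "THEME", "LOCATION"]),
]

def parse_tree(text):
    """Turn a bracketed parse into nested (label, children) tuples; leaves are words."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0
    def node():
        nonlocal pos
        label, pos = tokens[pos + 1], pos + 2  # skip "(" and read the label
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return (label, children)
    return node()

def leaves(tree):
    """Return the words under a subtree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]

def vp_arguments(vp):
    """Yield NP/PP children of a verb phrase, descending into nested VPs."""
    for child in vp[1]:
        if isinstance(child, str):
            continue
        if child[0] == "VP":
            yield from vp_arguments(child)
        elif child[0] in ("NP", "PP"):
            yield child

def constituents(tree):
    """Return (features, text) pairs: the subject NP, then the verb's NP/PP arguments.
    Features are (phrase label, position relative to the verb, preposition or None)."""
    clause = tree[1][0] if tree[0] == "ROOT" else tree
    found = []
    for child in clause[1]:
        if isinstance(child, str):
            continue
        if child[0] == "NP":
            found.append((("NP", "before", None), " ".join(leaves(child))))
        elif child[0] == "VP":
            for arg in vp_arguments(child):
                prep = leaves(arg)[0].lower() if arg[0] == "PP" else None
                found.append(((arg[0], "after", prep), " ".join(leaves(arg))))
    return found

class RoleClassifier:
    """Most-frequent role per feature tuple, backing off to (label, position)."""
    def __init__(self):
        self.full = defaultdict(Counter)
        self.coarse = defaultdict(Counter)

    def fit(self, examples):
        for feats, role in examples:
            self.full[feats][role] += 1
            self.coarse[feats[:2]][role] += 1
        return self

    def predict(self, feats):
        for table, key in ((self.full, feats), (self.coarse, feats[:2])):
            counts = table.get(key)
            if counts:
                return counts.most_common(1)[0][0]
        return "UNKNOWN"

def train(data):
    """Pair each extracted constituent with its gold role and fit the classifier."""
    examples = []
    for tree_text, roles in data:
        cons = constituents(parse_tree(tree_text))
        assert len(cons) == len(roles), "expected one role per constituent"
        examples.extend((feats, role) for (feats, _text), role in zip(cons, roles))
    return RoleClassifier().fit(examples)

def annotate(model, tree_text):
    """Label each constituent of a new parse with a predicted thematic role."""
    return [(text, model.predict(feats)) for feats, text in constituents(parse_tree(tree_text))]

model = train(TRAIN)
print(annotate(model, "(ROOT (S (NP (DT The) (NN parser)) (VP (VBD tagged)"
                      " (NP (DT the) (NN sentence)) (PP (IN in) (NP (DT the) (NN lab)))) (. .)))"))
# [('The parser', 'AGENT'), ('the sentence', 'THEME'), ('in the lab', 'LOCATION')]
```

With only three invented sentences behind it, the lookup table cannot generalize much; it is there to make the annotate-train-label cycle concrete. The backoff from (label, position, preposition) to (label, position) shows one simple way to label a constituent whose exact feature combination never appeared in training.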

Persistent URL

https://hdl.handle.net/20.500.11801/2189

Cite

Córdoba-Rodas, A. P. (2013). Semantic metadata extraction from open domain texts in natural language [Thesis]. Retrieved from https://hdl.handle.net/20.500.11801/2189

Collections

Theses & Dissertations
