Garzon-Alfonso, Cristian C.
Publication Search Results
Classifying disease-related tweets in the Twitter Health Surveillance System (2018-12-05)
Garzon-Alfonso, Cristian C.; Rodríguez-Martínez, Manuel; College of Engineering; Rivera Gallego, Wilson; Rivera Vega, Pedro; Department of Electrical and Computer Engineering; Hernandez, William
Public health officials, hospital directors, and other health professionals must track and report disease outbreaks that affect populations around the world. Often, the data arrives as reports and Comma Separated Values (CSV) files from hospitals and private doctors' offices. Typically, these reports are generated manually, which increases the risk of human error in the transcription, analysis, charts, and indicators used by professional organizations such as the United States (US) Centers for Disease Control and Prevention (CDC), the World Health Organization (WHO), or the US Department of Health & Human Services (HHS). Processing and understanding all these data might take weeks, and official warnings to a population could arrive too late. Poor and underserved communities are usually hit hardest, since limited access to medical services often means that medical care reaches an outbreak only after most of the community is already affected. In this research we present the Twitter Health Surveillance (THS) application framework. THS is designed as an integrated platform to help health officials collect tweets, determine whether they are related to a medical condition, extract metadata from them, and create a big data warehouse that can be used to further analyze the data. THS is built atop open source tools and provides the following value-added services: Data Acquisition, Tweet Classification, and Big Data Warehousing. To validate THS, we have created a collection of roughly twelve thousand labelled tweets. These tweets contain one or more target medical terms, and the labels indicate whether the tweet is related to a medical condition.
We used this collection to test various machine learning models based on Recurrent and Convolutional Neural Networks. Our experiments show that we can classify tweets with 96% precision, 91% recall, and 86% F1 score. These results compare favorably with recent research in this area and show the promise of our THS system.
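
For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how precision, recall, and F1 are conventionally computed for a binary "related / not related" classifier. The function name and the counts are hypothetical, chosen only for illustration; they are not taken from the THS experiments, and the abstract does not say how its reported figures were aggregated.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from binary confusion counts.

    tp: tweets correctly labelled as related to a medical condition
    fp: unrelated tweets wrongly labelled as related
    fn: related tweets wrongly labelled as unrelated
    """
    # Precision: of the tweets flagged as related, how many truly are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly related tweets, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only (not THS results).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The zero checks simply avoid division by zero when a classifier flags no tweets or the test set contains no related tweets.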