Estudio comparativo entre INLA y MCMC en modelos lineales generalizados mixtos

Angarita-Valderrama, Andrea Katherine

dc.contributor.advisor	Santana-Morant, Dámaris
dc.contributor.author	Angarita-Valderrama, Andrea Katherine
dc.contributor.college	College of Arts and Sciences - Sciences	en_US
dc.contributor.committee	Torres-Saavedra, Pedro
dc.contributor.committee	Macchiavelli, Raúl
dc.contributor.committee	Acuña-Fernández, Edgar
dc.contributor.department	Department of Mathematics	en_US
dc.contributor.representative	Ierkic, Henrick M.
dc.date.accessioned	2020-02-01T15:46:42Z
dc.date.available	2020-02-01T15:46:42Z
dc.date.issued	2019-12-09
dc.description.abstract	This work presents a comparative study of Integrated Nested Laplace Approximation (INLA) and Markov Chain Monte Carlo (MCMC) for estimation in Generalized Linear Mixed Models (GLMMs) for count data and binary data. The comparison is made through simulation studies and the analysis of two real data sets. Both methods are used to approximate the posterior distributions that arise in Bayesian inference. INLA uses Laplace approximations to approximate the posterior distributions, while MCMC generates samples from the posterior distributions using Gibbs sampling. INLA is motivated by the computational challenges of MCMC, in particular Gibbs sampling, when working with large data sets. Besides the high computational demand of MCMC, some practitioners may find it difficult to define Bayesian models in programs such as JAGS. INLA offers some computational advantages over MCMC without sacrificing efficiency. Although the commands to fit models with INLA are not straightforward for a non-expert user, the syntax resembles that of well-known R packages and may therefore be easier for practitioners to learn. One disadvantage of INLA is that it is restricted to Latent Gaussian Models (LGMs). In the simulation studies, several factors were considered: the sample size, the number of repeated measures, and the precision of the random effect. The two methods are compared using the following performance measures: bias, Mean Square Error (MSE), Normalized Root Mean Square Error (NRMSE), and the time each method takes to produce estimates. The simulation studies suggest that, in general, INLA does not differ from MCMC in terms of bias, MSE, and NRMSE, but INLA is computationally faster for every sample size, number of repeated measures, and random-effect precision considered. 
In some scenarios, INLA completes the estimation in a few seconds, whereas MCMC may take hours. The good performance of INLA is mainly due to the accuracy of the nested approximations. Finally, the two Bayesian inference methods are compared with maximum likelihood using two real data sets to study the factors that influence the response variable. One data set corresponds to the first cycle of the Mathematical Olympiads of Puerto Rico 2016 - 2017. The other data set comprises data on births in Puerto Rico in 2011. Estimates from the three methods led to similar conclusions for both data sets, but INLA is computationally faster than MCMC.	en_US
dc.description.abstract	En este trabajo se realiza un estudio comparativo entre Integrated Nested Laplace Approximation (INLA) y Markov Chain Monte Carlo (MCMC) para la estimación en Modelos Lineales Generalizados Mixtos (GLMMs, por sus siglas en inglés) para datos de conteo y datos binarios. La comparación es realizada mediante estudios de simulación y el análisis de dos conjuntos de datos reales. Ambos métodos se utilizan para obtener una aproximación de las distribuciones posteriores que surgen de la inferencia Bayesiana. INLA utiliza aproximaciones de Laplace para aproximar distribuciones posteriores mientras MCMC genera muestras de las distribuciones posteriores usando muestreo de Gibbs. INLA está motivado por los desafíos computacionales de MCMC, en particular el muestreo de Gibbs, cuando se trabaja con conjuntos grandes de datos. Además de la alta demanda computacional de MCMC, algunos profesionales pueden encontrar la formulación de modelos Bayesianos en programas como JAGS una tarea difícil de lograr. INLA ofrece algunas ventajas computacionales sobre MCMC sin sacrificar eficiencia. Aunque los comandos para ajustar los modelos con INLA no son sencillos para un usuario no experto, la sintaxis se asemeja a paquetes bien conocidos de R y, por lo tanto, los profesionales podrían asimilarla mejor. Una de las desventajas de INLA es que está restringida a los Modelos Gaussianos Latentes (LGMs). En los estudios de simulación se consideraron varios factores, como el tamaño de la muestra, el número de medidas repetidas y la precisión del efecto aleatorio. La comparación de los dos métodos se realiza utilizando las siguientes medidas de rendimiento: sesgo, el Error Cuadrático Medio (ECM) y la Raíz Normalizada del Error Cuadrático Medio (NRECM), y una medida del tiempo que tardan los métodos en producir las estimaciones. 
Las simulaciones mostraron que en general INLA no presenta diferencias en términos de sesgo, ECM y NRECM en comparación con MCMC, pero INLA es computacionalmente más rápido para cualquier tamaño de muestra, medidas repetidas y precisión considerada. En algunos escenarios, INLA realiza la estimación en unos pocos segundos, mientras que MCMC puede tardar horas en completar esa tarea. La razón principal del buen desempeño de INLA se basa en la precisión de las aproximaciones anidadas. Finalmente, los dos métodos Bayesianos para hacer inferencia se compararon con el de máxima verosimilitud usando dos conjuntos de datos reales para estudiar los factores que influyen en la variable respuesta. Un conjunto de datos corresponde al primer ciclo de Olimpiadas Matemáticas de Puerto Rico 2016 - 2017. El otro conjunto de datos comprende los datos de nacimientos en Puerto Rico en 2011. Las estimaciones de los tres métodos llevaron a conclusiones similares en ambos conjuntos de datos, pero INLA es más rápido computacionalmente que MCMC.	en_US
dc.description.graduationSemester	Fall	en_US
dc.description.graduationYear	2019	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.11801/2557
dc.language.iso	es	en_US
dc.rights.holder	(c) 2019 Andrea Angarita Valderrama	en_US
dc.subject	MCMC	en_US
dc.subject	GLMM	en_US
dc.subject	INLA	en_US
dc.subject	Binary regression	en_US
dc.subject	Poisson regression	en_US
dc.subject.lcsh	Laplace transformation	en_US
dc.subject.lcsh	Markov processes	en_US
dc.subject.lcsh	Monte Carlo method	en_US
dc.subject.lcsh	Linear models (Statistics)	en_US
dc.subject.lcsh	Bayesian statistical decision theory	en_US
dc.title	Estudio comparativo entre INLA y MCMC en modelos lineales generalizados mixtos	en_US
dc.type	Thesis	en_US
dspace.entity.type	Publication
thesis.degree.discipline	Mathematical Statistics	en_US
thesis.degree.level	M.S.	en_US

Files

Original bundle

Name: MATE_AngaritaValderramaA_2019.pdf
Size: 1.9 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 1.88 KB
Format: Item-specific license agreed upon to submission

Collections

Theses & Dissertations
