## Publication: Analysis of Puerto Rico crime data (2012-2016) using small area estimation

##### Authors
Galán Rivera, Jean
Macchiavelli, Raúl E.
##### College
College of Arts and Sciences - Sciences
##### Department
Department of Mathematics
M.S.
2022-12-13
##### Abstract
According to the United Nations Office on Drugs and Crime, in 2012 Puerto Rico was the United States territory with the highest murder rate. Methods exist for estimating mean crime counts and crime concentrations. However, past studies indicate that some crimes go unreported to the police for a variety of complex reasons. When data are sparse, these methods yield less reliable conclusions for detailed areas because of small sample sizes. It is therefore more reasonable to study the data using small area estimation, which allows modeling with auxiliary information, such as census records or geographic information, in order to obtain more accurate estimates. One of the most common models for small areas is the Nested Error Regression Model, which is typically used when information on the response variable is available at the unit level. This model can be used for simple nested models and can be extended to the generalized linear mixed model. Count data, however, commonly follow either a Poisson or a negative binomial distribution, and generalized linear mixed models for count data typically use log-linear models. The expected count is often proportional to an exposure variable; in these cases it is recommended to model the rate and to estimate the expected count through the expected rate. It is important to use the most complete information possible in order to obtain more accurate estimates. The main objective of this research is to use Puerto Rico's crime data from 2012 to 2016 to study and understand Generalized Linear Mixed Models for counts with random effects in small areas. Preliminary results showed that the crime counts in the data set have an overall mean of 25.92 crimes every 8 hours.
Furthermore, the analysis showed that the crime counts followed a negative binomial distribution. Multiple comparisons showed, in summary, that (1) the mean crime count was higher for property crimes than for personal crimes, (2) the mean personal crime count was higher at night and the mean property crime count was higher in the afternoon, and (3) the mean crime count was lower during autumn.
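
The rate-based modeling described in the abstract can be illustrated with a minimal sketch. It assumes a log link with an exposure offset, so that log(expected count) = log(exposure) + linear predictor + area random effect, and it assumes the common quadratic ("NB2") form of the negative binomial variance. The function names, coefficients, and exposure values below are hypothetical, chosen only for illustration, and are not taken from the thesis.

```python
import math


def expected_count(exposure, covariates, coefficients, area_effect=0.0):
    """Expected count under a log-linear rate model with an exposure offset."""
    # Linear predictor for the log rate: x'beta plus an area-level random effect.
    log_rate = sum(x * b for x, b in zip(covariates, coefficients)) + area_effect
    # log(mu) = log(exposure) + log(rate), so mu = exposure * rate.
    return exposure * math.exp(log_rate)


def negative_binomial_variance(mean, dispersion):
    """Assumed NB2 variance: mean + dispersion * mean^2 (Poisson if dispersion is 0)."""
    return mean + dispersion * mean ** 2


# Hypothetical values for illustration only (not estimates from the thesis).
coefficients = [-6.0, 0.4]  # intercept and one auxiliary covariate
covariates = [1.0, 1.0]     # the leading 1.0 pairs with the intercept
mu_small = expected_count(10_000, covariates, coefficients, area_effect=0.1)
mu_large = expected_count(20_000, covariates, coefficients, area_effect=0.1)

print(mu_small, mu_large)
print(negative_binomial_variance(mu_small, dispersion=0.5))
```

Because the exposure enters only through the offset, doubling the exposure doubles the expected count while the modeled rate stays the same, which is the sense in which the expected count is estimated through the expected rate.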