Yazdani Lopez, Ramineh

Publication Search Results

  • Publication
    Predictive analytics tools to improve on-time graduation rate for undergraduate students
    (2022-05-20) Yazdani Lopez, Ramineh; Rivera-Santiago, Roberto; College of Arts and Sciences - Sciences; Ríos-Soto, Karen R.; Santana-Morant, Dámaris; Department of Mathematics; Bartolomei-Suárez, Sonia M.
    The on-time graduation rate among private and state universities in Puerto Rico is significantly lower than in the mainland United States. This problem is noteworthy because it has substantial negative social and economic consequences for the student, as well as for the educational institution and the local economy. This project aims to develop a predictive model that accurately identifies, early in their academic careers, students at risk of not graduating on time. To this end, several predictive models are developed, and the best-performing one is selected. The models fall into four categories: decision tree (a classification tree, whose target takes discrete values), ensemble (Random Forest and Boosting), probabilistic (Naïve Bayes and Logistic Regression), and neural network. This project uses a dataset containing information on 24,432 undergraduate students at the University of Puerto Rico at Mayagüez, provided by the Office of Planning, Research, and Institutional Improvement. The predictive performance of the models is evaluated in two scenarios: the first (Group I) includes both first-year college factors and pre-college factors, and the second (Group II) considers only pre-college factors. Three modified datasets are created from the raw dataset by removing rows with missing values, imputing missing values, and oversampling the minority class, respectively. The classification evaluation metrics used in this study are Recall, F1-score, and misclassification error. Overall, in both scenarios, the boosting model is the most successful at predicting who will not graduate on time, performing equally well when trained on the dataset with incomplete rows removed and when trained on the oversampled dataset. This is demonstrated by a high classification Recall score and a low prediction error. The imputation of missing values results in a slight improvement in classification evaluation metrics across all models.
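
    The three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name, the label encoding (1 = did not graduate on time, 0 = graduated on time), and the toy labels are all assumptions made for illustration, and the numbers are not results from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Recall, F1-score and misclassification error for a binary classifier.

    `positive` marks the class of interest. Here it is assumed (hypothetically)
    that 1 means "did not graduate on time".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    errors = sum(1 for t, p in pairs if t != p)

    # Recall: share of truly at-risk students that the model flags.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision is needed only as an ingredient of the F1-score.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1-score: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Misclassification error: share of all predictions that are wrong.
    error = errors / len(pairs) if pairs else 0.0
    return {"recall": recall, "f1": f1, "misclassification_error": error}


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

    Recall is the natural headline metric when the goal is to miss as few at-risk students as possible, while the misclassification error summarizes overall accuracy across both classes.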