Application of Logistic Regression Model in Prediction of Early Diabetes Across United States

I. Olufemi, C. Obunadike, A. Adefabi & D. Abimbola
Department of Computer Science, Austin Peay State University, Clarksville, USA
DOI: http://doi.org/10.37502/IJSMR.2023.6502

Abstract

This study presents a case study on predicting early diabetes in the United States with a Logistic Regression Model and examines its impact. After assessing the predictive ability of a machine learning algorithm (a Binomial Logistic Model) for diabetes, we also studied the features most strongly associated with diabetes. We predicted the test data from the important variables and evaluated prediction accuracy using the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC). The correlation coefficient analysis shows that, of the 16 PIE variables, only “Itching” and “Delayed healing” were not statistically significantly correlated with the target variable (class), with values of 83% and 33%, respectively, while “Alopecia” and “Gender/Sex” were negatively correlated with the target variable. In addition, we penalized the logistic regression model with Lasso regularization and observed that the predictor “sudden_weight_loss” does not appear to be statistically significant in the model, while the predictors “Polyuria” and “Polydipsia” contributed most to the prediction of Class “Positive”, based on their parameter values and odds ratios. Since the 95% confidence interval for the model's AUC lies between 93% and 99%, the fitted model appears able to predict diabetes status correctly.

Keywords: Machine Learning, Supervised Learning, Binomial Logistic Model, Early Diabetes Prediction.
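The pipeline summarized in the abstract (a Lasso-penalized binomial logistic model, odds ratios, and a test-set AUC with a 95% confidence interval) can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: it uses plain Python with no third-party libraries, fits the L1 penalty by proximal gradient descent (ISTA) rather than whatever software the authors used, estimates the AUC interval by a percentile bootstrap (the abstract does not say how its interval was computed), and runs on synthetic Yes/No "symptom" data with hypothetical feature names. It does not reproduce the paper's data or results.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    """Proximal operator of t*|w|: shrink w toward 0, returning exactly 0 if |w| <= t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_lasso_logistic(X, y, lam=0.02, lr=1.0, n_iter=1000):
    """L1-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum_j |beta_j|; the intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi  # residual p_i - y_i
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n  # plain gradient step for the unpenalized intercept
        beta = [soft_threshold(beta[j] - lr * g[j] / n, lr * lam) for j in range(p)]
    return b0, beta


def predict_proba(b0, beta, X):
    """Estimated P(class = Positive | x) for each row of X."""
    return [sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) for xi in X]


def auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=500, alpha=0.05, seed=1):
    """Percentile-bootstrap (1 - alpha) confidence interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n, stats = len(y_true), []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # the AUC needs both classes in the resample
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


def make_toy_data(n=300, seed=0):
    """HYPOTHETICAL Yes/No symptom data (NOT the UCI early-stage diabetes dataset)."""
    rng = random.Random(seed)
    true_b0, true_beta = -2.0, [2.5, 2.0, 0.0]  # the third feature is pure noise
    X, y = [], []
    for _ in range(n):
        xi = [float(rng.random() < 0.5) for _ in true_beta]
        p = sigmoid(true_b0 + sum(b * x for b, x in zip(true_beta, xi)))
        X.append(xi)
        y.append(1 if rng.random() < p else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
    b0, beta = fit_lasso_logistic(X_train, y_train)
    for name, b in zip(["polyuria", "polydipsia", "noise"], beta):  # illustrative labels only
        print(f"{name}: beta = {b:.3f}, odds ratio = exp(beta) = {math.exp(b):.3f}")
    scores = predict_proba(b0, beta, X_test)
    lo, hi = bootstrap_auc_ci(y_test, scores)
    print(f"test AUC = {auc(y_test, scores):.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```

Running the script prints each coefficient with its odds ratio, exp(beta), and the test AUC with its bootstrap interval; these numbers describe only the toy data. Because the soft-thresholding step sets small coefficients exactly to zero, a weak or pure-noise predictor can drop out of the penalized model, which mirrors how a variable such as “sudden_weight_loss” can stop contributing under Lasso.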
