Research Article | Peer-Reviewed

Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients

Received: 14 April 2026     Accepted: 30 April 2026     Published: 10 June 2026
Abstract

Diabetes mellitus is a global health challenge associated with complications such as cardiovascular disease, vision impairment, and kidney failure. Early detection and accurate prediction of diabetes risk therefore play a significant role in improving disease management and minimising long-term health complications. Individual machine learning models applied to this problem have limitations such as overfitting: their high variance makes them sensitive to idiosyncrasies of the training data and reduces their ability to generalise. This study addresses the issue by applying a stacked ensemble learning technique to improve diabetes classification on the Pima Indian Diabetes dataset. The ensemble combines five base learners, namely Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbours (KNN), and Gradient Boosting Machine (GBM), with Logistic Regression as the meta-learner. The base models were trained using 10-fold cross-validation to improve robustness and reduce overfitting. The stacked ensemble achieved a mean AUC of 0.84 with a standard deviation of 0.05 across the folds, indicating stable predictive performance. To improve interpretability, SHapley Additive exPlanations (SHAP) were used to analyse the contribution of individual features; Glucose and Body Mass Index (BMI) were influential in predicting diabetes risk. SHAP was also used to analyse the contribution of each base learner to the meta-learner's predictions, showing that Gradient Boosting and Random Forest exerted a stronger influence on the stacked ensemble than the other base learners. Overall, the stacked ensemble provided a robust and reliable approach to improving diabetes classification performance.
Integrating explainable artificial intelligence methods such as SHAP further improves the model's transparency and interpretability for healthcare professionals.
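The pipeline summarised above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are a synthetic stand-in generated with `make_classification` and shaped like the Pima Indian Diabetes dataset (768 rows, 8 features, roughly 35% positives); SVM and KNN are wrapped with `StandardScaler`; all hyperparameters are scikit-learn defaults; and the helper names `build_stack`, `fold_aucs` and `meta_shap` are hypothetical. `StackingClassifier(cv=10)` trains the Logistic Regression meta-learner on out-of-fold predictions of the five base learners, and an outer stratified 10-fold split reports one AUC per fold, from which a mean and standard deviation like those in the abstract can be computed. For base-learner contributions, the sketch uses the closed-form Shapley values of a linear model, phi_j = w_j * (z_j - mean(z_j)) in log-odds units, which holds when the meta-features z_j are treated as independent; the paper may have used a different SHAP explainer.

```python
# Sketch: stacked ensemble, per-fold AUC, and linear SHAP for the meta-learner.
# ASSUMPTION: synthetic stand-in data shaped like the Pima dataset (768 rows,
# 8 features, ~35% positives) and default hyperparameters, not the authors' setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stack(inner_cv=10, seed=0):
    """Five base learners combined by a logistic-regression meta-learner."""
    base_learners = [
        ("svm", make_pipeline(StandardScaler(), SVC())),  # stacked via decision_function
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("gbm", GradientBoostingClassifier(random_state=seed)),
    ]
    # The meta-learner is fitted on out-of-fold base-learner predictions
    # produced by `inner_cv`-fold cross-validation.
    return StackingClassifier(estimators=base_learners,
                              final_estimator=LogisticRegression(),
                              cv=inner_cv)


def fold_aucs(X, y, n_splits=10, inner_cv=10, seed=0):
    """AUC of the whole stack on each held-out fold of stratified k-fold CV."""
    outer = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(build_stack(inner_cv, seed), X, y,
                           cv=outer, scoring="roc_auc")


def meta_shap(stack, X):
    """Exact SHAP values of the linear meta-learner (log-odds units), treating
    meta-features as independent: phi_j = w_j * (z_j - mean(z_j))."""
    Z = stack.transform(X)                  # binary task: one column per base learner
    w = stack.final_estimator_.coef_.ravel()
    return w * (Z - Z.mean(axis=0))


if __name__ == "__main__":
    X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                               weights=[0.65], random_state=0)
    aucs = fold_aucs(X, y)
    print(f"AUC over {len(aucs)} folds: mean={aucs.mean():.3f}, "
          f"sd={aucs.std(ddof=1):.3f}")

    # Explain the meta-learner on rows the stack did not see while fitting.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=0)
    stack = build_stack().fit(X_tr, y_tr)
    phi = meta_shap(stack, X_te)
    names = [name for name, _ in stack.estimators]
    ranking = sorted(zip(names, np.abs(phi).mean(axis=0)), key=lambda t: -t[1])
    for name, score in ranking:
        print(f"{name:>4}: mean |SHAP| = {score:.3f}")
```

Because the meta-learner is linear, each row's SHAP values plus the baseline w·mean(z) + b reproduce the stack's log-odds exactly, and the mean absolute SHAP value per column ranks the base learners. The explanation is computed on held-out rows because the fitted base learners are refit on the full training split, so their in-sample predictions (particularly from the random forest) would be overconfident.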

Published in International Journal of Data Science and Analysis (Volume 12, Issue 1)
DOI 10.11648/j.ijdsa.20261201.12
Page(s) 10-16
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2026. Published by Science Publishing Group

Keywords

Diabetes, Stacking Ensemble, Cross-validation, SHAP, Model Interpretability, Machine Learning, Meta-classifier

Cite This Article
    APA Style

    Macharia, S. M., Imboga, H., Kamami, W., & Mwelu, S. (2026). Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients. International Journal of Data Science and Analysis, 12(1), 10-16. https://doi.org/10.11648/j.ijdsa.20261201.12

    ACS Style

    Macharia, S. M.; Imboga, H.; Kamami, W.; Mwelu, S. Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients. Int. J. Data Sci. Anal. 2026, 12(1), 10-16. doi: 10.11648/j.ijdsa.20261201.12

    AMA Style

    Macharia SM, Imboga H, Kamami W, Mwelu S. Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients. Int J Data Sci Anal. 2026;12(1):10-16. doi: 10.11648/j.ijdsa.20261201.12

    @article{10.11648/j.ijdsa.20261201.12,
      author = {Samuel Mwangi Macharia and Herbert Imboga and Wilson Kamami and Susan Mwelu},
      title = {Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients},
      journal = {International Journal of Data Science and Analysis},
      volume = {12},
      number = {1},
      pages = {10-16},
      doi = {10.11648/j.ijdsa.20261201.12},
      url = {https://doi.org/10.11648/j.ijdsa.20261201.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20261201.12},
      year = {2026}
    }
    

    TY  - JOUR
    T1  - Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients
    AU  - Samuel Mwangi Macharia
    AU  - Herbert Imboga
    AU  - Wilson Kamami
    AU  - Susan Mwelu
    Y1  - 2026/06/10
    PY  - 2026
    N1  - https://doi.org/10.11648/j.ijdsa.20261201.12
    DO  - 10.11648/j.ijdsa.20261201.12
    T2  - International Journal of Data Science and Analysis
    JF  - International Journal of Data Science and Analysis
    JO  - International Journal of Data Science and Analysis
    SP  - 10
    EP  - 16
    PB  - Science Publishing Group
    SN  - 2575-1891
    UR  - https://doi.org/10.11648/j.ijdsa.20261201.12
    VL  - 12
    IS  - 1
    ER  - 

Author Information