Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients

Samuel Mwangi Macharia; Herbert Imboga; Wilson Kamami; Susan Mwelu

doi: 10.11648/j.ijdsa.20261201.12

Research Article | Peer-Reviewed

Published in International Journal of Data Science and Analysis (Volume 12, Issue 1)

Received: 14 April 2026 | Accepted: 30 April 2026 | Published: 10 June 2026

Abstract

Diabetes mellitus is a global health challenge associated with complications such as cardiovascular disease, vision impairment, and kidney failure. Early detection and accurate prediction of diabetes risk therefore play a significant role in improving disease management and minimising long-term health complications. Individual machine learning models applied to this task have limitations such as overfitting: high variance makes a model overly sensitive to specific features of the training data, which reduces its ability to generalise and degrades performance. This study addressed the problem by applying a stacked ensemble learning technique to improve the classification of diabetes using the Pima Indians Diabetes dataset. Five base learners were used, namely Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbours (KNN), and Gradient Boosting Machine (GBM), with logistic regression as the meta-learner. The base models were trained using 10-fold cross-validation to improve robustness and reduce overfitting. The stacked ensemble achieved a mean AUC of 0.84 with a standard deviation of 0.05 across the folds, indicating stable predictive performance. To improve interpretability, SHapley Additive exPlanations (SHAP) were used to quantify the contribution of individual features, identifying features such as Glucose and Body Mass Index (BMI) as influential in predicting diabetes risk. SHAP was also applied to the meta-learner to assess the contribution of each base learner, showing that Gradient Boosting and Random Forest exerted a stronger influence on the stacked ensemble than the other base learners. Overall, the stacked ensemble provided a robust and reliable approach to improving diabetes classification performance. Furthermore, integrating explainable artificial intelligence techniques such as SHAP improves model transparency and interpretability for healthcare professionals.
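
The stacking procedure summarised above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two base learners (a nearest-centroid scorer and a single-feature scorer) are hypothetical stand-ins for SVM, RF, DT, KNN and GBM, the data are synthetic rather than the Pima dataset, and the helper names (`fit_stack`, `cross_validated_auc`, `meta_shap`) are invented for illustration. The sketch shows three ideas: the logistic-regression meta-learner is trained on out-of-fold base-learner scores, the whole stack is evaluated by an outer 10-fold loop that reports the mean and standard deviation of the fold AUCs, and, because the meta-learner is linear in log-odds, the SHAP value of base learner j reduces to `phi_j = w_j * (z_j - mean(z_j))` when its inputs are treated as independent.

```python
import math
import random
import statistics

def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def auc(labels, scores):
    """ROC AUC as the Mann-Whitney statistic (ties count one half)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(y, k, seed=0):
    """Split row indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y) if t == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

# Hypothetical toy base learners standing in for SVM, RF, DT, KNN and GBM.
def fit_centroid(X, y):
    """Nearest-centroid scorer: larger when x lies closer to the class-1 mean."""
    c1 = [statistics.mean(col) for col in zip(*[x for x, t in zip(X, y) if t == 1])]
    c0 = [statistics.mean(col) for col in zip(*[x for x, t in zip(X, y) if t == 0])]
    return lambda x: math.dist(x, c0) - math.dist(x, c1)

def fit_first_feature(X, y):
    """Scores cases by feature 0 alone (think of a glucose-like column)."""
    return lambda x: x[0]

BASE_LEARNERS = [fit_centroid, fit_first_feature]

def fit_logistic(Z, y, lr=0.1, epochs=300):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wj * zj for wj, zj in zip(w, z))) - t
            gw = [g + err * zj / n for g, zj in zip(gw, z)]
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def fit_stack(X, y, k=10):
    """Level-1 rows come from base models that never saw those rows (out-of-fold)."""
    Z = [None] * len(X)
    for fold in stratified_folds(y, k):
        held = set(fold)
        tr = [i for i in range(len(X)) if i not in held]
        models = [fit([X[i] for i in tr], [y[i] for i in tr]) for fit in BASE_LEARNERS]
        for i in fold:
            Z[i] = [m(X[i]) for m in models]
    w, b = fit_logistic(Z, y)
    models = [fit(X, y) for fit in BASE_LEARNERS]  # refit base learners on all rows
    return models, w, b, Z

def cross_validated_auc(X, y, k=10):
    """Outer k-fold evaluation of the whole stack; returns (mean, sd) of fold AUCs."""
    aucs = []
    for fold in stratified_folds(y, k, seed=1):
        held = set(fold)
        tr = [i for i in range(len(X)) if i not in held]
        models, w, b, _ = fit_stack([X[i] for i in tr], [y[i] for i in tr], k)
        logits = [b + sum(wj * m(X[i]) for wj, m in zip(w, models)) for i in fold]
        aucs.append(auc([y[i] for i in fold], logits))
    return statistics.mean(aucs), statistics.stdev(aucs)

def meta_shap(w, Z, z):
    """Linear SHAP for the meta-learner in log-odds space, assuming independent
    inputs: phi_j = w_j * (z_j - mean_j), so sum(phi) = f(z) - mean f."""
    means = [statistics.mean(col) for col in zip(*Z)]
    return [wj * (zj - mj) for wj, zj, mj in zip(w, z, means)]

# Synthetic stand-in data (not the Pima data): two informative features, one noise.
rng = random.Random(42)
y = [i % 2 for i in range(150)]
X = [[rng.gauss(1.2 * t, 1.0), rng.gauss(0.8 * t, 1.0), rng.gauss(0.0, 1.0)] for t in y]

mean_auc, sd_auc = cross_validated_auc(X, y)
models, w, b, Z = fit_stack(X, y)
importance = [statistics.mean(abs(p) for p in col)
              for col in zip(*[meta_shap(w, Z, z) for z in Z])]
print(f"AUC {mean_auc:.2f} +/- {sd_auc:.2f}; mean |SHAP| per base learner: {importance}")
```

The mean absolute SHAP value per base learner is the usual global-importance summary, which is the kind of comparison behind the finding that Gradient Boosting and Random Forest dominate the meta-learner. In practice a library implementation such as scikit-learn's `StackingClassifier` (with `cv=10` and a `LogisticRegression` as `final_estimator`) performs the same out-of-fold construction; the plain-Python version only makes the data flow explicit.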

Pages: 10-16
License: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright © The Author(s), 2026. Published by Science Publishing Group.

Keywords

Diabetes, Stacking Ensemble, Cross-Validation, SHAP, Model Interpretability, Machine Learning, Meta-Classifier

Cite This Article

APA Style

Macharia, S. M., Imboga, H., Kamami, W., & Mwelu, S. (2026). Improved stacked ensemble technique in enhancing the classification of diabetes mellitus patients. International Journal of Data Science and Analysis, 12(1), 10-16. https://doi.org/10.11648/j.ijdsa.20261201.12

ACS Style

Macharia, S. M.; Imboga, H.; Kamami, W.; Mwelu, S. Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients. Int. J. Data Sci. Anal. 2026, 12(1), 10-16. doi: 10.11648/j.ijdsa.20261201.12

AMA Style

Macharia SM, Imboga H, Kamami W, Mwelu S. Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients. Int J Data Sci Anal. 2026;12(1):10-16. doi: 10.11648/j.ijdsa.20261201.12

@article{10.11648/j.ijdsa.20261201.12,
  author = {Samuel Mwangi Macharia and Herbert Imboga and Wilson Kamami and Susan Mwelu},
  title = {Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients},
  journal = {International Journal of Data Science and Analysis},
  volume = {12},
  number = {1},
  pages = {10-16},
  doi = {10.11648/j.ijdsa.20261201.12},
  url = {https://doi.org/10.11648/j.ijdsa.20261201.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20261201.12},
  year = {2026}
}

TY  - JOUR
T1  - Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients
AU  - Samuel Mwangi Macharia
AU  - Herbert Imboga
AU  - Wilson Kamami
AU  - Susan Mwelu
Y1  - 2026/06/10
PY  - 2026
N1  - https://doi.org/10.11648/j.ijdsa.20261201.12
DO  - 10.11648/j.ijdsa.20261201.12
T2  - International Journal of Data Science and Analysis
JF  - International Journal of Data Science and Analysis
JO  - International Journal of Data Science and Analysis
SP  - 10
EP  - 16
PB  - Science Publishing Group
SN  - 2575-1891
UR  - https://doi.org/10.11648/j.ijdsa.20261201.12
VL  - 12
IS  - 1
ER  -

Author Information

Samuel Mwangi Macharia

Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya

ORCID: http://orcid.org/0009-0003-1135-6256

Herbert Imboga

Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya

ORCID: http://orcid.org/0009-0003-9963-4977

Wilson Kamami

Department of Data Science and Artificial Intelligence, International Business Science and Technology University (ISBAT), Kampala, Uganda

ORCID: http://orcid.org/0009-0000-2828-8438

Susan Mwelu

Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya

ORCID: http://orcid.org/0009-0005-9570-9112