Research Article | Peer-Reviewed

Leveraging Machine Learning Models to Predict HIV/AIDS Treatment Interruption in Patients in Machakos County, Kenya

Received: 2 October 2025     Accepted: 17 October 2025     Published: 7 November 2025
Abstract

HIV/AIDS remains a major global health challenge, with Sub-Saharan Africa carrying the highest burden. In Kenya, where adult prevalence is 4.3%, interruption in treatment (IIT) continues to undermine antiretroviral therapy (ART) outcomes. This study applied machine learning (ML) to identify predictors of IIT and guide interventions in Machakos County, where prevalence is 3.3% and patient follow-up relies on manual appointment management, physical tracing, and phone tracing. A retrospective cross-sectional study used secondary data from KenyaEMR covering 14,339 adults on ART between 2020 and 2024. Data preprocessing included cleaning, anonymization, imputation, encoding, LASSO feature selection, and SMOTE oversampling. Descriptive statistics and chi-square tests assessed associations, while Random Forest (RF), XGBoost, and Support Vector Machine (SVM) models were trained and validated to predict IIT. Overall, 910 patients (6%) experienced IIT. Risk was highest among adolescents and young adults (15-24 years), single individuals, urban residents, patients with a viral load ≥1000 copies/mL, those on ART for <12 months, TB co-infected patients, and users of non-dolutegravir (non-DTG) regimens. Poor adherence, unstable clinical status, lack of phone ownership, and shorter refill durations also predicted IIT. Sex, CD4 count, counseling, and clinic workload were not significantly associated with IIT. Among the models, RF achieved the best performance (recall 0.97, precision 0.87, F1 0.92, AUROC 0.96, accuracy 0.91), outperforming XGBoost and SVM. IIT in Machakos County is shaped by demographic, clinical, socioeconomic, and health system factors, and the RF results highlight the value of ML for early identification of at-risk patients. Recommended strategies include DTG scale-up, early retention support, multi-month dispensing, and digital health interventions. Integrating predictive analytics into electronic medical records (EMRs) can strengthen HIV program outcomes.
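The summary metrics in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' evaluation code: the helper name `f1_score` and the variable names are assumptions introduced here, and only the numeric inputs (precision, recall, cohort size, IIT count) are taken from the abstract. It recomputes F1 as the harmonic mean of the reported precision and recall, and shows how class imbalance affects plain accuracy on the raw cohort.

```python
# Illustrative consistency check of figures quoted in the abstract.
# Helper and variable names are hypothetical; only the numeric inputs
# marked "from the abstract" come from the source.


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# From the abstract: Random Forest precision and recall.
rf_precision = 0.87
rf_recall = 0.97
rf_f1 = f1_score(rf_precision, rf_recall)

# From the abstract: cohort size and number of patients with IIT.
n_patients = 14_339
n_iit = 910
iit_rate = n_iit / n_patients

# Accuracy of a trivial rule that never predicts IIT, evaluated on the
# raw (un-resampled) cohort. This is a baseline, not a reported result.
majority_baseline_accuracy = 1 - iit_rate

print(f"F1 from reported precision/recall: {rf_f1:.2f}")
print(f"IIT rate in cohort: {iit_rate:.1%}")
print(f"Never-predict-IIT accuracy: {majority_baseline_accuracy:.1%}")
```

The recomputed F1 rounds to 0.92, which agrees with the reported value, and 910 of 14,339 is roughly 6.3%. Because so few patients experience IIT, a rule that never flags anyone would already be about 94% accurate on the raw cohort, which is one reason recall and precision are more informative than accuracy for this kind of problem. The abstract does not state whether the reported accuracy was measured on original or SMOTE-resampled data, so no direct comparison is implied.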

Published in International Journal of Data Science and Analysis (Volume 11, Issue 6)
DOI 10.11648/j.ijdsa.20251106.11
Page(s) 158-170
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

HIV/AIDS Treatment Interruption, Antiretroviral Therapy Adherence, Machine Learning in Healthcare, XGBoost, Random Forest, Support Vector Machine, Electronic Medical Records, Machakos County

Cite This Article
  • APA Style

    Odundo, C., Katila, C., Njuki, S., Onyango, L., Makori, F. (2025). Leveraging Machine Learning Models to Predict HIV/AIDS Treatment Interruption in Patients in Machakos County, Kenya. International Journal of Data Science and Analysis, 11(6), 158-170. https://doi.org/10.11648/j.ijdsa.20251106.11


    ACS Style

    Odundo, C.; Katila, C.; Njuki, S.; Onyango, L.; Makori, F. Leveraging Machine Learning Models to Predict HIV/AIDS Treatment Interruption in Patients in Machakos County, Kenya. Int. J. Data Sci. Anal. 2025, 11(6), 158-170. doi: 10.11648/j.ijdsa.20251106.11


    AMA Style

    Odundo C, Katila C, Njuki S, Onyango L, Makori F. Leveraging Machine Learning Models to Predict HIV/AIDS Treatment Interruption in Patients in Machakos County, Kenya. Int J Data Sci Anal. 2025;11(6):158-170. doi: 10.11648/j.ijdsa.20251106.11


  • @article{10.11648/j.ijdsa.20251106.11,
      author = {Clifford Odundo and Charles Katila and Sam Njuki and Lena Onyango and Felix Makori},
      title = {Leveraging Machine Learning Models to Predict HIV/AIDS Treatment Interruption in Patients in Machakos County, Kenya},
      journal = {International Journal of Data Science and Analysis},
      volume = {11},
      number = {6},
      pages = {158-170},
      doi = {10.11648/j.ijdsa.20251106.11},
      url = {https://doi.org/10.11648/j.ijdsa.20251106.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20251106.11},
     year = {2025}
    }
    


  • TY  - JOUR
    T1  - Leveraging Machine Learning Models to Predict HIV/AIDS Treatment Interruption in Patients in Machakos County, Kenya
    
    AU  - Clifford Odundo
    AU  - Charles Katila
    AU  - Sam Njuki
    AU  - Lena Onyango
    AU  - Felix Makori
    Y1  - 2025/11/07
    PY  - 2025
    N1  - https://doi.org/10.11648/j.ijdsa.20251106.11
    DO  - 10.11648/j.ijdsa.20251106.11
    T2  - International Journal of Data Science and Analysis
    JF  - International Journal of Data Science and Analysis
    JO  - International Journal of Data Science and Analysis
    SP  - 158
    EP  - 170
    PB  - Science Publishing Group
    SN  - 2575-1891
    UR  - https://doi.org/10.11648/j.ijdsa.20251106.11
    VL  - 11
    IS  - 6
    ER  - 

