Explaining Cholesterol-Related Coronary Artery Disease Risk Using Machine Learning and SHAP

Keywords: Coronary Artery Disease, Dyslipidemia, Logistic Regression, Random Forest, SHAP Explainability

Abstract

Coronary Artery Disease (CAD) remains a leading cause of global mortality, with dyslipidemia recognized as a major modifiable risk factor. This study investigates the relationship between serum lipid parameters and CAD using the Z-Alizadeh Sani clinical dataset comprising 303 patients with 55 clinical, biochemical, and electrocardiographic attributes. Logistic Regression (LR) and Random Forest (RF) models were developed to predict CAD status, supported by a standardized preprocessing pipeline, multi-split train–test evaluation (70/30, 80/20, 90/10), and performance assessment using Accuracy, Precision, Recall, F1-Score, and AUC-ROC. SHapley Additive exPlanations (SHAP) were employed to enhance model interpretability and quantify the contribution of lipid-related and clinical features to individual predictions. The RF model consistently outperformed LR across all split configurations, achieving a maximum AUC of 0.96, while LR attained an AUC of 0.90. SHAP analysis revealed that total cholesterol (CHOL) and low-density lipoprotein (LDL) were strong positive predictors of CAD, whereas high-density lipoprotein (HDL) exhibited a protective effect, in line with established cardiovascular pathophysiology. These findings demonstrate that integrating explainable machine learning with routine clinical lipid profiles can provide accurate and transparent decision support for early CAD risk stratification.
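
As an illustration of how SHAP assigns a contribution to each feature for a single prediction, the sketch below uses the closed form for a linear model with independent features, where the attribution of feature i is its coefficient times the deviation of its value from the background mean, measured in log-odds. It is a minimal sketch and not the authors' implementation: the coefficients, intercept, background means, and patient values are hypothetical, chosen only so that the signs match the direction reported above (CHOL and LDL raise predicted risk, HDL lowers it). Tree-based models such as RF require a different SHAP algorithm that is not shown here.

```python
from math import exp

# Hypothetical values for illustration only; none are taken from the study.
# Coefficients are log-odds per standardized unit of each lipid feature.
COEF = {"CHOL": 0.6, "LDL": 0.8, "HDL": -0.7}
INTERCEPT = 0.2
# Assumed background (training-set) means of the standardized features.
BACKGROUND_MEAN = {"CHOL": 0.0, "LDL": 0.0, "HDL": 0.0}


def log_odds(x):
    """Linear predictor of a logistic regression model."""
    return INTERCEPT + sum(COEF[name] * x[name] for name in COEF)


def sigmoid(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + exp(-z))


def linear_shap(x):
    """SHAP values of a linear model, assuming independent features:
    phi_i = w_i * (x_i - E[x_i]), expressed in log-odds."""
    return {name: COEF[name] * (x[name] - BACKGROUND_MEAN[name]) for name in COEF}


# Hypothetical patient with above-average CHOL, LDL, and HDL.
patient = {"CHOL": 1.0, "LDL": 1.5, "HDL": 1.0}

phi = linear_shap(patient)
base_value = log_odds(BACKGROUND_MEAN)  # model output for an "average" patient

# Additivity: base value plus all attributions recovers the model output.
reconstructed = base_value + sum(phi.values())
assert abs(reconstructed - log_odds(patient)) < 1e-9

for name, value in phi.items():
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {value:+.2f} log-odds ({direction} predicted CAD risk)")
print(f"predicted probability: {sigmoid(log_odds(patient)):.3f}")
```

The additivity check is the property that makes the attributions readable per patient: the base value and the per-feature contributions sum exactly to the model's log-odds for that patient.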

Author Biographies

Eka Pandu Cynthia

Department of Artificial Intelligence, Faculty of Computing and Meta Technology, Sultan Idris Education University. Perak, Malaysia.

Suzani Mohamad Samuri

Department of Artificial Intelligence, Faculty of Computing and Meta Technology, Sultan Idris Education University. Perak, Malaysia.

Wang Shir Li

Department of Artificial Intelligence, Faculty of Computing and Meta Technology, Sultan Idris Education University. Perak, Malaysia.

Alabbas Hussein Saeed

Department of General Practitioners, Faculty of Medicine, Hasanuddin University. Makassar, Indonesia.

Inggih Permana

Department of Informatics Engineering, Faculty of Science and Technology, State Islamic University of Sultan Syarif Kasim Riau. Pekanbaru, Indonesia.

Febi Yanto

Department of Informatics Engineering, Faculty of Science and Technology, State Islamic University of Sultan Syarif Kasim Riau. Pekanbaru, Indonesia.

This is an open access article, licensed under CC-BY-SA

Published
2026-03-19
How to Cite
[1]
E. P. Cynthia, S. Mohamad Samuri, W. Shir Li, A. H. Saeed, I. Permana, and F. Yanto, “Explaining Cholesterol-Related Coronary Artery Disease Risk Using Machine Learning and SHAP”, International Journal of Recent Technology and Applied Science, vol. 8, no. 1, pp. 13-24, Mar. 2026.
Section
Articles
