Result: Fine tuned CatBoost machine learning approach for early detection of cardiovascular disease through predictive modeling.

Title:
Fine tuned CatBoost machine learning approach for early detection of cardiovascular disease through predictive modeling.
Authors:
Hamid M; Department of Computer Science, Government College Women University Sialkot, Sialkot, 51310, Pakistan.
Hajjej F; Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, 11671, Saudi Arabia. fshajjej@pnu.edu.sa
Alluhaidan AS; Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, 11671, Saudi Arabia.
Bin Mannie NW; Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, 11671, Saudi Arabia.
Source:
Scientific reports [Sci Rep] 2025 Aug 25; Vol. 15 (1), pp. 31199. Date of Electronic Publication: 2025 Aug 25.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Nature Publishing Group
Country of Publication: England
NLM ID: 101563288
Publication Model: Electronic
Cited Medium: Internet
ISSN: 2045-2322 (Electronic)
Linking ISSN: 20452322
NLM ISO Abbreviation: Sci Rep
Subsets: MEDLINE
Imprint Name(s):
Original Publication: London : Nature Publishing Group, copyright 2011-
Grant Information:
NP- 45-090 Deanship of Scientific Research and Libraries at Princess Nourah bint Abdulrahman University, through the "Nafea" Program
Contributed Indexing:
Keywords: Cardiovascular disease; Disease prevention; Feature selection; Predictive modeling; Quality of life; Risk assessment
Entry Date(s):
Date Created: 20250825
Date Completed: 20250827
Latest Revision: 20251114
Update Code:
20251114
PubMed Central ID:
PMC12378338
DOI:
10.1038/s41598-025-13790-x
PMID:
40854918
Database:
MEDLINE

More Information

Cardiovascular disease (CVD) remains one of the leading causes of morbidity and mortality worldwide, which makes early-stage diagnosis important for improving clinical outcomes. Machine learning (ML) approaches have shown substantial potential for predictive modeling in CVD risk assessment. In this study, we propose a predictive model based on the CatBoost algorithm that classifies stages of CVD from hospital records. The dataset, sourced from a publicly available repository, comprises 12 key predictor variables. The methodology combines feature selection, rigorous validation, and data augmentation to improve predictive performance and to address the challenges of high-dimensional medical data. Among the several ML algorithms evaluated, the fine-tuned CatBoost model performed best, automating feature selection and facilitating the detection of early-stage heart disease. The model attained an F1-score of 99% and an overall accuracy of 99.02%, outperforming existing ML-based approaches. These findings underscore the potential of the CatBoost algorithm for rapid and accurate CVD diagnosis in support of clinical decision-making. Future work will focus on external validation on independent datasets to further assess the model's generalizability and clinical applicability.
(© 2025. The Author(s).)
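
The abstract reports accuracy and an F1-score for a multi-class stage classifier but does not say how the F1-score was averaged across classes. The following is a minimal sketch of how the two metrics could be computed, assuming a macro-averaged F1 (an assumption, not something the record confirms). The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed here)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted but wrong
            fn[t] += 1   # class t was missed
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for three disease stages (not data from the study).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```

With macro averaging, every class contributes equally regardless of its size, so a rare disease stage that is classified poorly lowers the score more than it would under a support-weighted average. When accuracy and F1 are both close to 99%, as reported, the choice of averaging matters less, but it is still worth stating when comparing against other work.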

Declarations. Competing interests: The authors declare no competing interests.