An AI-Driven Evaluation of Resampling Strategies for Imbalanced Brain Stroke Prediction
Abstract
Class imbalance poses a major challenge in brain stroke prediction: stroke cases are far fewer than non-stroke cases, yet they are the clinically critical class. This study presents a comparative evaluation of multiple machine learning classifiers on an imbalanced brain stroke dataset under different resampling strategies: random oversampling, random undersampling, SMOTE, and SMOTE-ENN. Accuracy, precision, recall, and F1-score are used to assess K-Nearest Neighbors, Gaussian Naïve Bayes, Decision Tree, and Random Forest classifiers. Experimental results show that models trained on the original imbalanced dataset achieve high overall accuracy but fail to adequately identify stroke cases, leading to weak minority-class performance. Random oversampling improves minority-class recall but noticeably reduces precision, while random undersampling discards information and degrades model effectiveness. In contrast, SMOTE and SMOTE-ENN significantly improve class balance, boosting both precision and recall for the minority class. Among the evaluated approaches, the Random Forest model combined with SMOTE-ENN performs best, achieving the highest overall accuracy of 99% and well-balanced F1-scores of 99% across both classes.
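As an illustration of the ideas summarized in the abstract, the following is a minimal sketch using only the Python standard library. It is not the study's implementation: the toy dataset (95 non-stroke and 5 stroke points), the function names, and the neighbor count `k=3` are all assumptions made for the example. It shows why accuracy alone can mislead on imbalanced data, and how random oversampling, random undersampling, and a simplified SMOTE-style interpolation change the class counts. The ENN cleaning step of SMOTE-ENN is omitted.

```python
import random
from collections import Counter


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class, returning 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def random_oversample(X, y, minority, rng):
    """Duplicate random minority rows until the classes are balanced (binary case)."""
    counts = Counter(y)
    rows = [x for x, label in zip(X, y) if label == minority]
    extra = [rng.choice(rows) for _ in range(max(counts.values()) - counts[minority])]
    return X + extra, y + [minority] * len(extra)


def random_undersample(X, y, minority, rng):
    """Keep every minority row plus an equally sized random subset of the rest."""
    rows = [x for x, label in zip(X, y) if label == minority]
    majority = [(x, label) for x, label in zip(X, y) if label != minority]
    kept = rng.sample(majority, len(rows))
    X_res = rows + [x for x, _ in kept]
    y_res = [minority] * len(rows) + [label for _, label in kept]
    return X_res, y_res


def smote_like(X, y, minority, rng, k=3):
    """Simplified SMOTE: interpolate between a minority row and one of its
    k nearest minority neighbours (hypothetical helper, not a library API)."""
    counts = Counter(y)
    rows = [x for x, label in zip(X, y) if label == minority]
    synthetic = []
    for _ in range(max(counts.values()) - counts[minority]):
        base = rng.choice(rows)
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to target
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, target)))
    return X + synthetic, y + [minority] * len(synthetic)


# Toy data (assumed): label 0 = non-stroke, label 1 = stroke.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(95)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(5)]
y = [0] * 95 + [1] * 5

# A model that always predicts "non-stroke" looks accurate but finds no strokes.
always_majority = [0] * len(y)
accuracy = sum(t == p for t, p in zip(y, always_majority)) / len(y)
print("accuracy:", accuracy)
print("stroke precision/recall/F1:", precision_recall_f1(y, always_majority, positive=1))

for name, resample in [
    ("oversample", random_oversample),
    ("undersample", random_undersample),
    ("smote-like", smote_like),
]:
    _, y_res = resample(X, y, 1, rng)
    print(name, dict(Counter(y_res)))
```

In practice, maintained implementations such as `RandomOverSampler`, `RandomUnderSampler`, `SMOTE`, and `SMOTEENN` from the imbalanced-learn package would normally be used instead of hand-written helpers, and resampling is usually applied to the training split only so that test metrics reflect the original class distribution.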