AUTOMATED MACHINE LEARNING PIPELINE WITH MULTI-ALGORITHM
Abstract
The rapid expansion of data-driven applications demands efficient, accessible tools for machine learning model development. This paper presents an Automated Machine Learning (AutoML) Pipeline System, a Flask-based web application that automates the classification workflow end to end. The system accepts a CSV dataset uploaded by the user, automatically preprocesses it (including label encoding), splits the data into training and test sets, and trains four widely used classification algorithms in a single run: Logistic Regression, Decision Tree, Random Forest, and K-Nearest Neighbours (KNN). An accuracy score is computed for each model, and the best-performing algorithm is identified and reported. Results are visualised as a comparative bar chart rendered within the web interface. The system is evaluated on two benchmark datasets: the Pima Indians Diabetes dataset and the UCI Heart Disease dataset. The proposed platform lowers the barrier for non-expert users to apply machine learning and provides a reproducible, extensible baseline for automated model selection research.
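The workflow summarised in the abstract (label encoding, train/test split, training four classifiers, and picking the most accurate one) can be sketched with scikit-learn as follows. This is a minimal illustration, not the paper's implementation: the function name `run_pipeline`, the 80/20 split, the fixed seed, the `max_iter` setting, and the choice to drop rows with missing values are all assumptions. The Flask upload handling and the bar chart are left out.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier


def run_pipeline(df, target, test_size=0.2, seed=42):
    """Label-encode, split, train four classifiers, and rank them by accuracy.

    Hypothetical helper: name, defaults, and missing-value handling are
    assumptions made for this sketch.
    """
    df = df.dropna().copy()  # assumption: rows with missing values are dropped

    # Label-encode every non-numeric column (features and target alike).
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = LabelEncoder().fit_transform(df[col].astype(str))

    X, y = df.drop(columns=[target]), df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(),
    }

    # Fit each model on the training split and score it on the held-out split.
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))

    best = max(scores, key=scores.get)  # model name with the highest accuracy
    return scores, best
```

With a local copy of one of the benchmark files, a call might look like `scores, best = run_pipeline(pd.read_csv("diabetes.csv"), target="Outcome")`, where both the file name and the target column name are placeholders. The returned `scores` dictionary holds the values that a comparative bar chart would plot.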