

Optimizing Soybean Production: A Data-Driven Machine Learning Framework
Abstract
Early and precise identification of soybean diseases [11] is crucial for minimizing crop loss and improving yield. This study evaluates three machine learning techniques, namely Random Forest (RF), Support Vector Machine (SVM), and XGBoost, on the Soybean (Large) dataset from the UCI repository. The dataset comprises 307 samples with 35 categorical attributes, each representing a distinct plant condition. The models are assessed on accuracy and F1-score using a structured pipeline of preprocessing, feature encoding, dimensionality reduction with PCA, and hyperparameter tuning with 5-fold cross-validation [2]. The findings highlight the comparative strengths of each algorithm under consistent evaluation conditions.
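As an illustration of the evaluation protocol described above, the following is a minimal sketch of one-hot feature encoding, 5-fold splitting, and accuracy and F1 scoring, written with the Python standard library only. Several things here are assumptions rather than details from the study: the toy rows and attribute values are synthetic, the F1-score is macro-averaged (the abstract does not state the averaging), and a 1-nearest-neighbour rule under Hamming distance stands in as a placeholder for RF, SVM, and XGBoost. PCA and hyperparameter tuning are omitted.

```python
import random


def one_hot_encode(rows):
    """One-hot encode rows of categorical values, one vocabulary per column."""
    vocab = [sorted({row[j] for row in rows}) for j in range(len(rows[0]))]
    return [
        [1 if value == level else 0
         for j, value in enumerate(row)
         for level in vocab[j]]
        for row in rows
    ]


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 once and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def hamming_1nn_predict(X_train, y_train, X_test):
    """Placeholder classifier (NOT one of the study's models): 1-NN, Hamming distance."""
    preds = []
    for x in X_test:
        nearest = min(
            range(len(X_train)),
            key=lambda i: sum(a != b for a, b in zip(x, X_train[i])),
        )
        preds.append(y_train[nearest])
    return preds


def cross_validate(X, y, predict_fn, k=5, seed=0):
    """Return mean accuracy and mean macro F1 over k folds.

    Reusing the same seed for every model keeps the folds identical,
    which is one way to obtain consistent evaluation conditions.
    """
    folds = k_fold_indices(len(X), k, seed)
    accs, f1s = [], []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        preds = predict_fn(
            [X[j] for j in train_idx],
            [y[j] for j in train_idx],
            [X[j] for j in test_idx],
        )
        y_test = [y[j] for j in test_idx]
        accs.append(accuracy(y_test, preds))
        f1s.append(macro_f1(y_test, preds))
    return sum(accs) / k, sum(f1s) / k


# Synthetic toy data (hypothetical attribute values, not the UCI dataset):
# the first two attributes follow the class, the third is noise.
noise_levels = ["n0", "n1", "n2"]
rows, labels = [], []
for i in range(40):
    if i % 2 == 0:
        rows.append(("absent", "normal", noise_levels[i % 3]))
        labels.append("class_a")
    else:
        rows.append(("present", "abnormal", noise_levels[i % 3]))
        labels.append("class_b")

X = one_hot_encode(rows)
mean_acc, mean_f1 = cross_validate(X, labels, hamming_1nn_predict, k=5, seed=0)
print(f"mean accuracy: {mean_acc:.3f}, mean macro F1: {mean_f1:.3f}")
```

To compare several models in this style, each one would be passed as `predict_fn` with the same `k` and `seed`, so that every model is scored on the same train and test partitions.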
References
R.S. Michalski and R.L. Chilausky, “Knowledge acquisition through learning by example and explanation: A case study in soybean disease diagnosis,” J. Policy Anal. Inf. Syst., vol. 4, no. 2, pp. 101–130, April 1980.
R. Kohavi, “Cross-validation and bootstrap methods for model selection and accuracy estimation,” Proc. Int. Joint Conf. Artif. Intell., vol. 14, no. 2, pp. 1125–1131, August 1995.
T.W. Morgan, J.K. Lee, and S.P. Carter, “Machine learning approaches for soybean disease classification: A comparative study,” J. Agric. Computer Sci., vol. 10, no. 4, pp. 55–70, March 2021.
T.W. Morgan and L.M. Brown, “Application of neural networks and KNN in crop disease prediction,” Proc. IEEE Int. Conf. Agric. Innov., pp. 101–107, June 2021.
J.R. Hartman, “Identification and management of soybean diseases,” J. Crop Sci., vol. 22, no. 3, pp. 185–200, September 2018.
D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz, “UC Irvine repository of machine learning databases: Soybean (Large) dataset,” Univ. California, Irvine, 1988.
K.P. Patel and M.S. Gupta, “Preprocessing techniques for imbalanced agricultural datasets,” J. Data Anal. Agric., vol. 8, no. 1, pp. 25–35, February 2020.
S. Bhatia, R. Kumar, and P. Sharma, “Enhanced decision tree models for soybean disease prediction with feature selection,” J. Comput. Agric., vol. 175, pp. 105–115, January 2021.
R. Panigrahi, S. Mishra, and A.K. Singh, “Plant disease detection using machine learning: A review,” J. Agric. Eng., vol. 185, pp. 95–110, May 2021.
M. Rahman, T. Islam, and F. Ahmed, “Shallow learning models for soybean disease classification,” J. Plant Prot., vol. 12, no. 2, pp. 15–25, January 2024.
T.W. Morgan, S.P. Carter, and J.K. Lee, “Preprocessing effects on machine learning for soybean disease diagnosis,” J. Data Sci. Appl., vol. 17, no. 4, pp. 50–65, April 2022.