
Training Data Alchemy: Balancing Quality and Quantity in Machine Learning Training

T. Aditya Sai Srinivas, B. Thulasi Thanmai, A. David Donald, G. Thippanna, I. V. Dwaraka Srihith, I. Venkat Sai

Abstract


Determining how much training data a machine learning algorithm needs is critical to building successful, accurate models. This abstract surveys the research on this question and identifies the factors that govern the quantity of training data required for effective learning. It examines the trade-off between data quality and quantity, the problem of overfitting, and the importance of representative, diverse datasets. It also reviews techniques for estimating the minimum amount of training data needed to reach a desired level of performance. By understanding how training-set size affects model performance, researchers and practitioners can make informed choices of training datasets, maximizing the efficiency and effectiveness of machine learning algorithms.
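A common way to estimate the minimum training-set size the abstract alludes to is to plot a learning curve: train on progressively larger subsets, measure held-out error, and stop once more data no longer helps. The sketch below is purely illustrative, not the authors' method; the function name `learning_curve_min_n`, the ordinary-least-squares model, the synthetic data, and the 10% tolerance are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def learning_curve_min_n(X, y, sizes, tol=1.10, seed=0):
    """Return the smallest training size whose held-out error is within
    `tol` times the best error on the curve (illustrative OLS model)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    split = int(0.8 * len(X))          # 80/20 train/validation split
    train, val = idx[:split], idx[split:]
    errs = []
    for n in sizes:
        sub = train[:n]
        # Fit ordinary least squares on the first n shuffled points
        w, *_ = np.linalg.lstsq(X[sub], y[sub], rcond=None)
        errs.append(float(np.mean((X[val] @ w - y[val]) ** 2)))
    errs = np.array(errs)
    # Smallest size whose validation error is close to the best seen
    ok = [n for n, e in zip(sizes, errs) if e <= tol * errs.min()]
    return ok[0], errs

# Synthetic data: y = 3*x1 - 2*x2 + small noise
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = X @ np.array([3.0, -2.0]) + 0.1 * rng.normal(size=500)
n_min, errs = learning_curve_min_n(X, y, sizes=[10, 25, 50, 100, 200, 400])
```

On this easy synthetic task the curve flattens quickly, so `n_min` is small; on noisier or higher-dimensional data the same procedure would report a larger requirement, which is exactly the quality/quantity balance the article discusses.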



