

Benchmarking Deep Learning Models for Telugu Document Image Classification
Abstract
This study evaluates a hybrid machine learning and deep learning framework for classifying digitized Telugu documents as either handwritten or machine-printed. Five models are compared: three deep learning architectures (AlexNet, ResNet50, and VGG16) and two traditional SVM classifiers built on HOG and SURF feature descriptors. All models are tested on a balanced dataset of 500 handwritten and 500 machine-printed Telugu document images. Preprocessing steps, namely median filtering and histogram equalization, enhance image quality before feature extraction. VGG16 delivers the best performance, achieving a classification accuracy of 99.5% and surpassing ResNet50 (93.5%), AlexNet (92.5%), and the SVM variants (90–92%). Evaluation with precision, recall, and F1-score further confirms the robustness of VGG16. The results underscore the effectiveness of deep learning for script-specific document analysis and set a strong benchmark for future research in Telugu text classification.
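The two preprocessing steps named in the abstract can be sketched as follows. This is a minimal illustrative implementation in plain NumPy; the paper's actual code, kernel sizes, and library choices are not specified here, so these parameter defaults are assumptions.

```python
import numpy as np

def median_filter(img, k=3):
    """Median filter with edge padding; suppresses salt-and-pepper
    scan noise. Kernel size k=3 is an assumed default."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

def hist_equalize(img):
    """Global histogram equalization for an 8-bit grayscale image:
    remap intensities through the normalized cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first nonzero CDF value
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    return lut.astype(np.uint8)[img]
```

In practice a library routine (e.g. OpenCV's `medianBlur` and `equalizeHist`) would replace these loops; the point is only to show the order of operations: denoise first, then stretch contrast before features are extracted.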