

Improving the Precision of Tesseract-OCR Engine
Abstract
Document images that are scanned, or captured with the digital cameras of mobile devices, suffer from various limitations such as geometric distortions, loss of focus, uneven lighting conditions, and low resolution. Because of these limitations, the quality of document images is often degraded, and consequently the recognition accuracy of OCR engines suffers. This work focuses on improving the recognition accuracy of the Tesseract OCR engine for Nepali document images through preprocessing. To this end, we developed an image preprocessing pipeline consisting of 8 stages and tested it on a set of Nepali text images collected from various sources such as a Nepali news corpus, books, and printed documents. Our experimental results showed that recognition accuracy improved from 90.69%, 54.34%, and 38.45% to 94.84%, 71.15%, and 51.21% for high-, medium-, and low-quality images, respectively.
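The abstract does not list the eight preprocessing stages. As a hedged illustration only, here is a minimal sketch of two stages commonly found in OCR preprocessing pipelines: grayscale conversion and Otsu binarization. It works in plain Python over a flattened list of pixels. The function names (`to_grayscale`, `otsu_threshold`, `binarize`, `preprocess`) are hypothetical and are not taken from the authors' pipeline.

```python
def to_grayscale(rgb_pixels):
    """Convert (R, G, B) tuples to 0-255 intensities using ITU-R BT.601 luma weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]


def otsu_threshold(gray):
    """Return the threshold t that maximizes Otsu's between-class variance.

    Pixels <= t form one class (dark ink), pixels > t the other (light background).
    """
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0     # number of pixels in the dark class so far
    sum0 = 0   # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty; variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:  # strict '>' keeps the lowest maximizing t
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    """Map pixels above t to white (255) and the rest to black (0)."""
    return [255 if v > t else 0 for v in gray]


def preprocess(rgb_pixels):
    """Hypothetical two-stage pipeline: grayscale, then Otsu binarization."""
    gray = to_grayscale(rgb_pixels)
    return binarize(gray, otsu_threshold(gray))


print(preprocess([(10, 10, 10), (12, 12, 12), (200, 200, 200), (210, 210, 210)]))
# prints [0, 0, 255, 255]
```

In practice the same binarization step is usually done with OpenCV, e.g. `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`, before the image is passed to Tesseract (for example through pytesseract with `lang="nep"` for Nepali). Whether a given stage helps depends on how the image is degraded.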