
Document Forgery Detection using Machine Learning

Jyoti Sarawade, Shravani Jadhav, Vedika Jadhav, Arati Naigade, Utkarsha Nangare

Abstract


Institutions such as banks, educational establishments, and government offices now rely heavily on digital documents, so counterfeit paperwork has become an increasingly significant problem. Forged identification cards, degrees, and official documents are often produced with nothing more than simple photo-editing software. Verifying each document manually is time-consuming, which allows many scams to go unnoticed, and identifying forgeries remains harder than it should be.

This research presents a technique for detecting fraudulent documents that combines machine learning with image processing. It examines scanned documents and pinpoints regions that appear modified, such as altered text, swapped images, and forged signatures. Before analysis, each image passes through a series of cleanup (preprocessing) steps, after which essential information is extracted automatically. Optical character recognition (OCR) reads the letters and numbers in the image, and a neural network designed to recognize visual patterns helps determine which regions look out of place. Experiments on a range of samples showed high success rates in detecting altered files. Institutions that manage official documents could use this method to verify authenticity.
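
The stages named in the abstract (cleanup, feature extraction, flagging regions that look out of place) can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the preprocessing steps or the network, so a hand-written statistical heuristic stands in for the trained neural network here, and the text-recognition stage is omitted because it needs an external engine. All function names, the block size, the threshold, and the synthetic test page are assumptions made for illustration.

```python
from statistics import median, pstdev


def to_grayscale(rgb_image):
    """Cleanup step (assumed): rows of (r, g, b) tuples -> 0-255 luminance."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def block_noise(gray, block=4):
    """Feature step (assumed): pixel standard deviation of each block."""
    height, width = len(gray), len(gray[0])
    scores = {}
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            pixels = [
                gray[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            ]
            scores[(top // block, left // block)] = pstdev(pixels)
    return scores


def flag_suspicious(scores, threshold=3.0):
    """Stand-in for the classifier: flag blocks whose noise level is an
    outlier relative to the page median (median absolute deviation test)."""
    values = list(scores.values())
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    scale = mad if mad > 0 else 1.0  # avoid dividing by zero on uniform pages
    return sorted(
        key for key, value in scores.items()
        if abs(value - centre) / scale > threshold
    )


def make_demo_page(size=16, block=4, pasted=(1, 2)):
    """Synthetic page: mild scanner-like texture everywhere except one
    perfectly flat block, imitating a pasted-in patch."""
    page = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = (y // block, x // block) == pasted
            level = 200 if inside else 200 + (7 * x + 13 * y) % 5
            row.append((level, level, level))
        page.append(row)
    return page


page = make_demo_page()
print(flag_suspicious(block_noise(to_grayscale(page))))  # prints [(1, 2)]
```

The heuristic relies on the observation that a pasted or digitally filled region often lacks the scan noise of its surroundings. A learned model, as the abstract describes, would replace `flag_suspicious` and could pick up far subtler inconsistencies.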



