
TA-RAG: Confidence-Calibrated Statement-Level Verification for Hallucination-Resilient Regulatory Knowledge Systems

Sakshi More, Dnyaneshwari Pawar, Vedika Sardeshmukh, Vedanti Ghongade, Prof. A. A. Mohite

Abstract


Despite the widespread deployment of large language models (LLMs) integrated with Retrieval-Augmented Generation (RAG) in enterprise regulatory systems, they still generate hallucinated responses unsupported by the retrieved information. In high-stakes domains such as banking and legal compliance, this poses real liability and operational risks. In this paper, we propose a Trust-Aware RAG (TA-RAG) framework and demonstrate that it transparently incorporates multistage statement-level verification and confidence scoring into the standard RAG pipeline. Generated answers are decomposed into atomic claims, each of which is checked against the retrieved documents and assigned a support-based confidence score, enabling calibrated rejection of answers with poor evidence coverage. Evaluated on a real-world RBI regulatory document corpus with both factual and out-of-scope queries, the system shows a significant reduction in hallucinations and improved reliability over RAG baselines. The proposed methodology offers a practical path toward more secure and reliable enterprise knowledge systems.
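To make the pipeline concrete, the Python sketch below shows one plausible shape of the statement-level verification loop described above: an answer is decomposed into atomic claims, each claim receives a support score against the retrieved documents, and the answer is rejected when aggregate evidence coverage falls below a threshold. Every name here (decompose_into_claims, support_score, REJECT_THRESHOLD), the lexical-overlap scorer, and the threshold value are illustrative assumptions; the paper's actual decomposition and entailment components are not specified in this abstract.

from dataclasses import dataclass

@dataclass
class VerifiedClaim:
    text: str
    support: float  # support-based confidence in [0, 1]

def decompose_into_claims(answer: str) -> list[str]:
    # Placeholder decomposition: treat each sentence as one atomic claim.
    # A production system would likely use an LLM or parser for this step.
    return [s.strip() for s in answer.split(".") if s.strip()]

def support_score(claim: str, documents: list[str]) -> float:
    # Crude lexical-overlap proxy for "is this claim supported by the
    # retrieved evidence?"; an NLI or LLM entailment model is the more
    # realistic choice, but this keeps the sketch self-contained.
    claim_tokens = set(claim.lower().split())
    if not claim_tokens:
        return 0.0
    return max(
        (len(claim_tokens & set(doc.lower().split())) / len(claim_tokens)
         for doc in documents),
        default=0.0,
    )

REJECT_THRESHOLD = 0.6  # assumed calibration point, not a value from the paper

def verify_answer(answer: str, documents: list[str]):
    claims = [VerifiedClaim(c, support_score(c, documents))
              for c in decompose_into_claims(answer)]
    # Answer-level confidence is the mean per-claim support; answers with
    # poor evidence coverage are rejected rather than returned.
    confidence = sum(c.support for c in claims) / max(len(claims), 1)
    if confidence < REJECT_THRESHOLD:
        return None, confidence, claims  # calibrated rejection
    return answer, confidence, claims

Averaging per-claim support into a single answer-level confidence, and refusing to answer below the threshold, mirrors the calibrated rejection behaviour the abstract describes; a deployed system would tune the threshold on held-out factual and out-of-scope queries.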

 



