AI-Based Live Translation and Dubbing Systems for Multilingual Video Content
Abstract
India's linguistic diversity poses a major challenge to the accessibility of digital video content across regions. News, educational, and other multimedia content is usually available in only one language, which limits its reach to a wider audience. Recent advances in Artificial Intelligence have enabled automatic speech recognition, machine translation, text-to-speech synthesis, and lip synchronization, making multilingual video translation feasible. This review paper discusses the AI-based systems and techniques currently used for live translation and dubbing of video content. It evaluates speech-to-text solutions, neural machine translation systems, voice synthesis systems, subtitle generation systems, and lip-sync systems. A comparative analysis of current solutions reveals their advantages and limitations, especially with regard to Indian languages. The review identifies the major research gaps: the absence of integrated systems, limited offline operation, and difficulties with real-time processing. Based on these observations, the paper argues for an interdisciplinary AI-based framework to improve the accessibility, efficiency, and scalability of multilingual video translation systems.
References
Y. Wu et al., “VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing,” arXiv, 2022.
https://arxiv.org/abs/2211.16934
D. Bigioi et al., “Multilingual Video Dubbing – A Technology Review,” Frontiers in Signal Processing, 2023.
https://www.frontiersin.org/articles/10.3389/frsip.2023.1230755
A. S. Subramanian et al., “Length-Aware Speech Translation for Video Dubbing,” Interspeech, 2025.
https://www.isca-archive.org/interspeech_2025/subramanian25_interspeech.pdf
J. Choi et al., “Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing,” arXiv, 2025.
https://arxiv.org/abs/2505.20899
C. Cui et al., “Fine-grained Video Dubbing Duration Alignment,” arXiv, 2025.
https://arxiv.org/abs/2508.08550
K. Wang et al., “SyncVoice: Vision-Augmented TTS for Video Dubbing,” arXiv, 2025.
https://arxiv.org/abs/2512.05126
H.-S. Won et al., “End-to-End Multilingual Automatic Dubbing via Duration-based Translation with LLMs,” EMNLP, 2025.
https://aclanthology.org/2025.emnlp-demos.37
R. Kannojia, “Gen AI Driven Multilingual Audio Dubbing,” ScienceDirect, 2025.
https://www.sciencedirect.com/science/article/pii/S2590123025023138