
Caption flow - A Subtitle Generator

Ch. Koushik Chandra, J. Abhiram Rudra, Dr. K. Sreekala, Mr. P Satya Shekar Verma, Dr. A. Nagesh

Abstract


This project aims to improve accessibility for people with hearing impairments by developing a deep-learning lip-reading model, implemented in Python and TensorFlow. Training data is prepared from video files in the GRID dataset by extracting the mouth region, normalizing the frames, and aligning the data, which prepares it for more efficient learning. The model architecture combines 3D convolutions with LSTM networks to process video sequences efficiently. A custom model trained with a CTC loss function addresses the problem of unaligned sequences, which also allows it to accommodate variation in speech rate. The model is evaluated by visualizing its predictions against the actual transcriptions, both to assess accuracy and to identify areas for improvement. The project also explores real-world applications of lip-reading technology, particularly in enhancing communication accessibility. Future plans include the development of a standalone application to bring this technology to practical use.
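To illustrate why a CTC-based model tolerates unaligned sequences and differing speech rates, the following minimal sketch shows greedy CTC decoding: the best class is chosen for each frame, consecutive repeats are collapsed, and blank symbols are dropped. It is not taken from the project's code. The tiny vocabulary, the helper `scores_from_path`, and the choice of greedy decoding at inference time are all assumptions made for illustration; the abstract only states that a CTC loss function is used.

```python
from itertools import groupby

# Hypothetical vocabulary for illustration only; index 0 is the CTC blank.
BLANK = 0
VOCAB = ["-", "b", "i", "n"]


def greedy_ctc_decode(frame_scores):
    """Decode per-frame class scores into text using greedy CTC rules.

    frame_scores: list of lists, one score per vocabulary entry per frame.
    """
    # Step 1: pick the highest-scoring class index for every frame.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # Step 2: collapse runs of identical consecutive indices.
    collapsed = [idx for idx, _ in groupby(best)]
    # Step 3: drop blanks and map the remaining indices to characters.
    return "".join(VOCAB[idx] for idx in collapsed if idx != BLANK)


def scores_from_path(path):
    """Hypothetical helper: build one-hot frame scores from an index path."""
    return [[1.0 if k == idx else 0.0 for k in range(len(VOCAB))] for idx in path]


# The same word spoken quickly (3 frames) and slowly (9 frames).
fast = scores_from_path([1, 2, 3])
slow = scores_from_path([1, 1, 0, 2, 2, 2, 0, 3, 3])

print(greedy_ctc_decode(fast))
print(greedy_ctc_decode(slow))
```

Both clips decode to the same string even though they span different numbers of frames, which is the property that lets the model be trained without frame-level alignment. A blank between two identical characters keeps them distinct, so doubled letters can still be produced.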



