LipNET: END-to-ENDLEVEL of Visual and Video Lip reading using CNN network: A Survey

Deepak N.R, Simran Pal, Abhishek DY, Chandrashekar ., D Vishal, Jayasurya K

Abstract


Lip reading has grown in importance, driven by advances in deep learning. This shift promises improved communication access, particularly for individuals with hearing impairments, and opens up a wide range of other applications. Our model, which combines careful pre-processing with state-of-the-art feature extraction, attains accuracy beyond that of traditional approaches. Lip reading, the comprehension of spoken language from visual cues in lip and facial movements, has evolved substantially with the integration of Convolutional Neural Networks (CNNs).

