
Feelings in Pixels: Exploring Multimodal Sentiments

I.V. Shashikala, K. Dora Babu, K. Swathi, M. Ramu

Abstract


Traditional sentiment analysis has largely focused on text, often missing the deeper emotional context found in audio, images, and video. This paper introduces a multimodal sentiment analysis framework that brings together text, speech, images, and video to capture a fuller picture of human emotions. For text, the system uses VADER (Valence Aware Dictionary and Sentiment Reasoner); for images, it employs Convolutional Neural Networks (CNNs) to recognize facial expressions; and for audio and video, it applies real-time processing techniques to detect dynamic emotional shifts. The proposed approach achieved an overall accuracy of 80%, with strong precision and recall across the different input types. Still, challenges such as class imbalance (e.g., far more "Happy" samples than "Disgust") and overfitting (with validation accuracy peaking at 60%) point to areas needing refinement. The potential applications of this framework are wide-ranging, from improving customer engagement to supporting mental health monitoring and enhancing social media insights. Looking ahead, future research will investigate transformer-based models and address the ethical implications of real-time emotion analysis.
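
To make the text branch concrete, the sketch below shows how VADER's compound score could be mapped to a coarse sentiment label. It is a minimal illustration, not the authors' exact pipeline: it assumes the vaderSentiment Python package, and the 0.05 decision thresholds follow VADER's commonly documented convention; the function name, label set, and example sentence are illustrative only.

# Minimal sketch of a VADER-based text branch (assumes the vaderSentiment package).
# The thresholds follow VADER's conventional cutoffs; the rest is illustrative.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def text_sentiment(text: str) -> str:
    """Map VADER's compound score (range -1 to 1) to a coarse sentiment label."""
    compound = analyzer.polarity_scores(text)["compound"]
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    return "Neutral"

print(text_sentiment("The support team was incredibly helpful!"))  # likely "Positive"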



References


Baccianella, S., Esuli, A., & Sebastiani, F. (2010). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. Proceedings of LREC, 10, 2200–2204.

Baevski, A., Zhou, Y., Mohamed, A., & Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 33, 12449–12460.

Chen, T., Borth, D., Darrell, T., & Chang, S. F. (2014). DeepSentiBank: Visual sentiment concept classification with deep convolutional neural networks. arXiv preprint arXiv:1410.8586.

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Goodfellow, I. J., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., ... & Zhou, Y. (2015). Challenges in representation learning: A report on three machine learning contests. Neural Networks, 64, 59–63.

Hutto, C., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of ICWSM, 8(1), 216–225.

Mollahosseini, A., Hasani, B., & Mahoor, M. H. (2017). AffectNet: A database for facial expression, valence, and arousal computing in the wild. IEEE Transactions on Affective Computing, 10(1), 18–31.

Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. Proceedings of EMNLP, 79–86.

Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep face recognition. Proceedings of BMVC, 1(3), 6.

Poria, S., Cambria, E., Hazarika, D., Majumder, N., Zadeh, A., & Morency, L. P. (2017). Context-dependent sentiment analysis in user-generated videos. Proceedings of ACL, 873–883.

