Sentix AI: A Production-Grade Platform for Real-Time Facial Emotion Detection and Temporal Analytics from Video
Abstract
This paper describes Sentix AI, a browser-native platform for real-time emotion detection and temporal analytics from video. The system uses Google's Gemini 1.5 Pro multimodal model to extract per-frame emotional data, detect laugh events, track multiple people, and produce time-coded reports. It runs entirely in the browser with React 19, TypeScript, and the Gemini 1.5 Pro vision API — no GPU, no server, just a web browser and a video file. We tested it across a range of content types, lighting conditions, and face counts. The system introduces an event-based emotion schema that ties together temporal span, peak intensity, and per-person tracking in a single JSON structure. This paper covers the architecture, the trade-offs we made, what breaks in practice, and where we think this can go next — including edge inference, multilingual affect analysis, and clinical use.
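The event-based schema described above could be sketched in TypeScript as follows. The field names and helper below are illustrative assumptions for exposition, not the paper's actual schema:

```typescript
// Hypothetical sketch of an event-based emotion schema combining
// temporal span, peak intensity, and per-person tracking.
// Field names are assumptions, not Sentix AI's actual JSON structure.
interface EmotionEvent {
  personId: string;      // stable per-person tracking label
  emotion: string;       // e.g. "joy", "surprise", "laughter"
  startSec: number;      // temporal span start, in seconds
  endSec: number;        // temporal span end, in seconds
  peakIntensity: number; // maximum intensity within the span, 0..1
}

// A time-coded report is then just an array of such events.
const report: EmotionEvent[] = [
  { personId: "p1", emotion: "laughter", startSec: 12.4, endSec: 14.1, peakIntensity: 0.92 },
  { personId: "p2", emotion: "surprise", startSec: 13.0, endSec: 13.6, peakIntensity: 0.71 },
];

// Example aggregate: total duration of detected events for one person.
function totalDuration(events: EmotionEvent[], personId: string): number {
  return events
    .filter(e => e.personId === personId)
    .reduce((sum, e) => sum + (e.endSec - e.startSec), 0);
}
```

Keeping the schema flat like this makes per-person and per-emotion analytics simple array filters, which suits an entirely browser-side pipeline with no database.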