
ScreamSense: Enhancing Emergency Response with Deep Learning

I.V. Shashikala, T. Aditya Sai Srinivas

Abstract


With cities growing rapidly, there is an increasing need for smart surveillance systems that help ensure public safety. Traditional CCTV setups depend heavily on human operators, which often delays the detection of emergencies. This paper presents a real-time scream detection system that uses machine learning and deep learning to recognize distress sounds in audio recordings. The system combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) model, trained on a labeled dataset of human screams and background noises. Using Mel-frequency cepstral coefficients (MFCCs) and spectrograms as input features, it detects screams with around 92% classification accuracy. For real-time operation, the system integrates PyAudio for audio capture with a user-friendly interface built in Kivy/KivyMD, making it suitable for deployment in public areas, hospitals, and smart surveillance setups. The model performs well even in noisy environments, with a precision of 0.91 and a recall of 0.94. This work provides a scalable and responsive solution that balances safety with privacy concerns, aiming to improve emergency response systems.
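The abstract leaves the implementation unspecified, so the following is a minimal sketch of how such a pipeline might be assembled, assuming librosa for MFCC extraction and TensorFlow/Keras for the CNN+LSTM classifier; the sampling rate, frame count, and layer sizes below are illustrative assumptions, not the authors' exact configuration.

import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

SAMPLE_RATE = 16000   # assumed sampling rate
N_MFCC = 40           # assumed number of MFCC coefficients
FRAMES = 128          # assumed fixed number of time frames per clip

def extract_mfcc(signal):
    """Return a fixed-size (FRAMES, N_MFCC, 1) MFCC map for one audio clip."""
    mfcc = librosa.feature.mfcc(y=signal, sr=SAMPLE_RATE, n_mfcc=N_MFCC).T
    if mfcc.shape[0] < FRAMES:                   # zero-pad short clips
        mfcc = np.pad(mfcc, ((0, FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:FRAMES, :, np.newaxis]          # truncate long clips

def build_model():
    """CNN front end over the time-frequency map, LSTM over the time axis."""
    inputs = layers.Input(shape=(FRAMES, N_MFCC, 1))
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Flatten the frequency and channel axes so each time step is one vector.
    x = layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)
    x = layers.LSTM(64)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # scream vs. background
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

On the real-time side, the abstract names PyAudio for capture; a capture-and-classify loop might look like the sketch below, where the chunk size, one-second window, and 0.5 decision threshold are likewise assumptions. In the described deployment, the final print call would instead raise an alert through the Kivy/KivyMD interface.

import pyaudio

CHUNK = 1024                                   # assumed frames per buffer
model = build_model()                          # in practice, load trained weights
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1,
                 rate=SAMPLE_RATE, input=True, frames_per_buffer=CHUNK)
buffer = []
while True:
    data = stream.read(CHUNK, exception_on_overflow=False)
    buffer.append(np.frombuffer(data, dtype=np.int16))
    if len(buffer) * CHUNK >= SAMPLE_RATE:     # ~1 s of audio (assumption)
        clip = np.concatenate(buffer).astype(np.float32) / 32768.0
        buffer.clear()
        prob = model.predict(extract_mfcc(clip)[np.newaxis], verbose=0)[0, 0]
        if prob > 0.5:                         # assumed decision threshold
            print("Possible scream detected")  # hook for the Kivy/KivyMD alert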



