
Echoes in the Metaverse: Optimized Speech-NLP Models for Immersive Digital Twin Interaction

M. Bhavya Sree, V. Poojitha, S. Rakshitha, K. Shirishsa, R. Sai Priyamshu, K. Dora Babu

Abstract


Imagine navigating a virtual world with just your voice. Sounds futuristic, right? But as the Metaverse and Digital Twins (DTs) evolve, this vision is fast becoming a reality. Achieving it requires AI that reliably understands human speech. In this paper, we address this challenge by designing an optimized Natural Language Processing (NLP) model that processes voice commands in real time, enabling seamless interaction with virtual environments. Our approach combines Convolutional Neural Networks (CNNs) with Mel Frequency Cepstral Coefficients (MFCCs) to efficiently extract and analyze speech features. Starting from a baseline model, we refined its architecture by adding layers, fine-tuning parameters, and mitigating overfitting, ultimately achieving 95.6% accuracy in recognizing spoken commands. Beyond these performance gains, we developed a practical Keyword Spotting System (KWS) integrated with VR platforms such as Unity, bridging the gap from voice input to virtual action. Whether in gaming, training simulations, or virtual assistant applications, our work brings us closer to a Metaverse that listens and responds like the real world.
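As a minimal sketch of the MFCC front end mentioned in the abstract, the dependency-free Python below turns a mono waveform into a frames × coefficients matrix, the kind of 2D input a CNN keyword classifier can treat like a small image. It follows the textbook MFCC pipeline (pre-emphasis, Hamming-windowed framing, power spectrum, triangular mel filterbank, log, DCT-II). The function names and every parameter value (16 kHz audio, 25 ms frames with a 10 ms hop, a 512-point transform, 26 mel filters, 13 coefficients, pre-emphasis 0.97) are illustrative assumptions, not the authors' reported configuration.

```python
import math


def hz_to_mel(f):
    # Common mel-scale formula (assumed; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    fbank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):  # rising edge
            row[k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling edge, peak of 1.0 at centre
            row[k] = (right - k) / max(right - centre, 1)
        fbank.append(row)
    return fbank


def power_spectrum(frame, n_fft):
    # Naive O(n_fft^2) DFT to stay dependency-free; use an FFT in practice.
    padded = frame + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        im = -sum(x * math.sin(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        spec.append((re * re + im * im) / n_fft)
    return spec


def dct_ii(values, n_coeffs):
    # Unnormalized DCT-II; keeping the first n_coeffs decorrelates the log energies.
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10, n_fft=512,
         n_filters=26, n_coeffs=13, preemph=0.97):
    """Return a list of frames, each a list of n_coeffs MFCCs (hypothetical helper)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    if frame_len > n_fft:
        raise ValueError("frame length must not exceed n_fft")
    # Pre-emphasis boosts high frequencies that speech energy tends to attenuate.
    emphasized = [signal[0]] + [signal[i] - preemph * signal[i - 1] for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = [x * w for x, w in zip(emphasized[start:start + frame_len], window)]
        spec = power_spectrum(frame, n_fft)
        energies = [sum(f * p for f, p in zip(row, spec)) for row in fbank]
        log_energies = [math.log(e + 1e-10) for e in energies]  # epsilon avoids log(0)
        features.append(dct_ii(log_energies, n_coeffs))
    return features
```

For example, running `mfcc` on a 0.1 s (1,600-sample) tone at 16 kHz yields 8 frames of 13 coefficients each. The naive DFT only keeps the sketch self-contained; a real-time KWS pipeline would rely on an FFT-based library implementation and would typically normalize the feature matrix before passing it to the CNN.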




