Assessing the performance of machine learning techniques in achieving keyword coverage through semantic data analysis
Abstract
This article provides an in-depth comparative analysis of two advanced hybrid machine learning models designed for keyword extraction. The first combines Bidirectional Encoder Representations from Transformers (BERT) with an Autoencoder (AE), while the second merges the traditional Term Frequency–Inverse Document Frequency (TF-IDF) method with the same autoencoding framework. The study focuses on semantic analysis within text corpora, aiming to assess how effectively these methods identify meaningful and comprehensive keyword sets across various datasets. A thorough examination of each model’s architecture and mechanisms is performed, emphasising the crucial role of autoencoders in maintaining semantic consistency and contextual relevance of the extracted keywords. The experimental analysis covers multiple datasets, illustrating how differences in textual structure and semantic richness impact the overall extraction performance. Performance metrics such as precision, recall, and the F1-score are utilised to evaluate and compare accuracy. Additionally, the discussion outlines the respective strengths, limitations, and ideal scenarios for applying both hybrid approaches. Ultimately, the findings offer valuable insights for researchers and practitioners, providing practical guidance on selecting the most suitable method for tasks requiring deep semantic understanding and high precision in information retrieval.
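To make the compared pipeline concrete, the sketch below shows a minimal pure-Python version of the TF-IDF scoring stage and the precision/recall/F1 evaluation the abstract mentions. It is an illustrative baseline only, not the paper's implementation: the hybrid models additionally pass such representations through an autoencoder (and, in the first model, use BERT embeddings), which is omitted here, and the function names are our own.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Score terms by TF-IDF and return the top-k keywords per document.

    Baseline sketch only: the article's hybrid models feed such
    representations into an autoencoder, which is not modelled here.
    """
    n_docs = len(docs)
    tokenised = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenised:
        df.update(set(tokens))
    results = []
    for tokens in tokenised:
        tf = Counter(tokens)
        scores = {t: (tf[t] / len(tokens)) * math.log(n_docs / df[t])
                  for t in tf}
        results.append(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return results

def precision_recall_f1(predicted, gold):
    """Evaluate an extracted keyword set against a gold-standard set."""
    tp = len(set(predicted) & set(gold))
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, extracting keywords from a three-document corpus and scoring them against annotator-provided keywords yields the per-document precision, recall, and F1 values that the study aggregates across datasets.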