
Log classification system

Gopika P, Sabarinath R

Abstract


The rapid growth of system logs in modern IT environments calls for efficient, automated methods of analyzing and classifying log data. This project presents a Log Classification System, built with Python, NLP techniques, and machine learning models, that categorizes logs into predefined classes such as errors, warnings, and informational messages. The pipeline comprises data exploration, clustering with DBSCAN, feature extraction using BERT embeddings, and classification with Logistic Regression. Development was carried out in PyCharm Professional, and a FastAPI backend enables real-time log processing. A README and a requirements file document the project to support reproducibility and deployment. By combining preprocessing, BERT embeddings, and machine learning, the system automates log management, reduces manual effort, and provides actionable insights for system monitoring and anomaly detection. The implementation demonstrates a practical application of modern AI techniques to improving operational efficiency in enterprise IT systems.
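The embedding and classification stages of the pipeline can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. As labeled assumptions, a hashed bag-of-words vector stands in for the BERT embedding, a hand-written softmax (multinomial logistic) regression stands in for scikit-learn's Logistic Regression, and the DBSCAN clustering step is omitted. The sample log lines, the vector size `DIM`, and all function names are hypothetical.

```python
import math
import re
import zlib

DIM = 256                              # hashed vector size (arbitrary assumption)
CLASSES = ["error", "warning", "info"]

# Hypothetical training data: (raw log line, label).
TRAIN = [
    ("ERROR disk write failed on /dev/sda1", "error"),
    ("ERROR database connection refused after 3 retries", "error"),
    ("ERROR unhandled exception in worker 12", "error"),
    ("WARNING memory usage at 91 percent", "warning"),
    ("WARNING certificate expires in 5 days", "warning"),
    ("WARNING slow query took 2300 ms", "warning"),
    ("INFO service started on port 8080", "info"),
    ("INFO user 42 logged in", "info"),
    ("INFO backup completed in 35 seconds", "info"),
]

def normalize(line):
    """Lowercase and mask digit runs so variable values do not create new tokens."""
    return re.sub(r"\d+", "<num>", line.lower())

def embed(line):
    """Unit-length hashed bag-of-words vector (stand-in for a BERT embedding)."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z<>]+", normalize(line)):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0   # stable across runs
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def score(w, x):
    """Linear score: dot product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(data, epochs=200, lr=0.5):
    """Fit softmax regression with per-sample gradient descent."""
    weights = {c: [0.0] * (DIM + 1) for c in CLASSES}   # last slot is the bias
    feats = [(embed(line) + [1.0], label) for line, label in data]
    for _ in range(epochs):
        for x, label in feats:
            probs = softmax([score(weights[c], x) for c in CLASSES])
            for c, p in zip(CLASSES, probs):
                grad = p - (1.0 if c == label else 0.0)  # cross-entropy gradient
                weights[c] = [w - lr * grad * xi for w, xi in zip(weights[c], x)]
    return weights

def classify(weights, line):
    """Return the class with the highest linear score for a raw log line."""
    x = embed(line) + [1.0]
    return max(CLASSES, key=lambda c: score(weights[c], x))

model = train(TRAIN)
print(classify(model, "ERROR cache node unreachable"))
```

In a setup closer to the one described, `embed` would return a BERT sentence embedding and `train`/`classify` would be replaced by a fitted scikit-learn classifier, with `classify` exposed through a FastAPI route. Those substitutions are assumptions about how the pieces could fit together, not details taken from the paper.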




