
Content based Detection and Blocking of Spam/Phishing Emails using Machine Learning

Akalya Devi C, Karthika Renuka D, Sarvesh S

Abstract


Use of the web is increasing day by day as more people rely on it, especially for communication. E-mail remains one of the most efficient and effective communication tools for purposes ranging from social to business, owing to its low cost and minimal time consumption. Through e-mail, however, one can flood the internet by sending multiple copies of the same message to a large number of users. One important issue to be addressed is that inboxes are commonly affected by attacks, chiefly spam. Currently, spam e-mails are identified by detecting stop words in them; however, if a new spam, fake or irrelevant e-mail is sent without those stop words, it is not properly identified. Therefore, a system should learn words and their meanings to detect spam e-mails efficiently. To overcome this issue of blocking new and unrecognised spam e-mails, a Machine Learning based approach using the ‘Phishing Websites’ dataset from the UCI repository is proposed. The proposed methodology uses Morphological Analysis in Natural Language Processing (NLP) for better spam identification. By utilising machine learning techniques efficiently, spam and phishing e-mails are to be detected and blocked on the server side itself.
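The contrast drawn above, between matching a fixed word list and learning from word usage, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy messages, the `BLOCKED_WORDS` list, the crude suffix stripping that stands in for morphological analysis, and the choice of a small naive Bayes classifier (the abstract does not name the learning algorithm or its features).

```python
import math
import re
from collections import Counter

# Hypothetical fixed word list standing in for a word-matching filter.
BLOCKED_WORDS = {"lottery", "winner"}

# Invented toy messages; not data from the paper.
TRAINING = [
    ("Verify your account now to avoid suspension", "spam"),
    ("Your account was suspended, confirm your password", "spam"),
    ("Claim your lottery prize, winner", "spam"),
    ("Meeting moved to Monday, agenda attached", "ham"),
    ("Please review the attached project report", "ham"),
    ("Lunch on Friday with the project team", "ham"),
]


def normalise(text):
    """Lowercase, tokenise, and strip a few suffixes.

    The suffix stripping is only a crude stand-in for real morphological analysis.
    """
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        for suffix in ("ing", "ed", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems


def word_list_filter(text):
    """Flag a message only if it contains a word from the fixed list."""
    return any(tok in BLOCKED_WORDS for tok in re.findall(r"[a-z]+", text.lower()))


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over normalised tokens."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def fit(self, samples):
        for text, label in samples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(normalise(text))

    def predict(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in normalise(text):
                score += math.log((counts[tok] + 1) / (total + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


model = NaiveBayes()
model.fit(TRAINING)

# A message that avoids every word on the fixed list.
new_message = "Confirming passwords: verify the suspended accounts"
print("word list flags it:", word_list_filter(new_message))
print("learned label:", model.predict(new_message))
```

In this toy setting the fixed list has nothing to match in the new message, while the learned model can still relate inflected forms such as "passwords" and "suspended" to words seen in training. A real system would need a proper morphological analyser and a far larger labelled corpus.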


References


Nguyen, M., Nguyen, T., & Nguyen, T. H. (2018). A deep learning model with hierarchical LSTMs and supervised attention for anti-phishing. arXiv preprint arXiv:1805.01554.

Anti-Phishing Working Group. (2018). Phishing Activity Trends Report 1st Quarter 2018. Available: http://docs.apwg.org/reports/apwg_trends_report_q1_2018.pdf

PhishLabs. (2018). 2018 Phish Trends & Intelligence Report. Available: https://info.phishlabs.com/hubfs/2018%20PTI%20Report/PhishLabs%20Trend%20Report_2018-digital.pdf

Anti-Phishing Working Group. (2016). Phishing Activity Trends Report 4th Quarter 2016. Available: http://docs.apwg.org/reports/apwg_trends_report_q4_2016.pdf

Anti-Phishing Working Group. (2015). Phishing Activity Trends Report 1st-3rd Quarter 2015. Available: http://docs.apwg.org/reports/apwg_trends_report_q1-q3_2015.pdf

Phishing Websites Data Set - Machine Learning Repository. Available: https://archive.ics.uci.edu/ml/datasets/Phishing+Websites

Mohammed, M. A., Mostafa, S. A., Obaid, O. I., Zeebaree, S. R., Abd Ghani, M. K., Mustapha, A., ... & AL-Dhief, F. T. (2019). An anti-spam detection model for emails of multi-natural language. Journal of Southwest Jiaotong University, 54(3).

Subramaniam, T., Jalab, H. A., & Taqa, A. Y. (2010). Overview of textual anti-spam filtering techniques. International Journal of Physical Sciences, 5(12), 1869-1882.

Sharma, A. K., & Sahni, S. (2011). A comparative study of classification algorithms for spam email data analysis. International Journal on Computer Science and Engineering, 3(5), 1890-1895.

Chhabra, P., Wadhvani, R., & Shukla, S. (2010). Spam filtering using support vector machine. Special Issue of IJCCT, 1(2), 3.

