Phishing Detection Using Natural Language Processing and Behavioural Analysis: A Multi-Faceted Approach

Abuh Ibrahim Sani¹, Oludolamu Onimole², Adetunji Oludele Adebayo³, Nathaniel Adeniyi Akande⁴, & Uju Judith Eziokwu⁵

¹Cybersecurity Analyst/Independent Researcher, EyBrids Limited, Nigeria
²Cybersecurity Operation Analyst/Independent Researcher, Teesside University, Middlesbrough
³Information Security Manager/Independent Researcher, University of Bradford
⁴Cybersecurity Analyst/Independent Researcher, University of Bradford
⁵Data Analyst/Independent Researcher, University of Bradford, UK
DOI: http://doi.org/10.37502/IJSMR.2025.81105

Abstract

Phishing remains one of the most common and damaging forms of cyberattack. It exploits human psychology and trust to compromise systems, steal credentials, and cause financial loss. Conventional defences, which often rely on static heuristics or domain reputation lists, struggle to keep pace with the rapidly changing linguistic styles and technical infrastructure used by attackers. This paper presents a hybrid phishing detection framework that combines Natural Language Processing (NLP) with Behavioural Analysis to improve accuracy, interpretability, and resilience. The linguistic indicators include urgency, sentiment, and pragmatic intent; the behavioural features include irregular sender activity and recipient interaction. Classification is performed with transformer-based architectures and ensemble learning. The model uses datasets drawn from PhishTank and the Enron email corpus, and applies Adaptive Synthetic Sampling (ADASYN) to correct class imbalance. In experimental evaluation, the system achieves 98.7% accuracy, 97.9% recall, and an AUC of 0.99, outperforming single-modality systems. Incorporating interpretable features also makes the model's decisions more transparent and gives analysts actionable information. The results show that a multimodal, privacy-conscious, and explainable framework substantially improves phishing detection, offering a practical and scalable enhancement to modern email security systems.

Keywords: Behavioural Analysis, Cybersecurity, Hybrid Detection, Natural Language Processing, Phishing
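The abstract names ADASYN as the remedy for class imbalance between phishing and legitimate messages. As a rough illustration only, the dependency-free Python below sketches the standard ADASYN procedure; it is not the authors' implementation. The function names, the two-dimensional "hybrid" feature vectors (a hypothetical urgency score and a hypothetical sender-irregularity score), and the defaults k = 5 and beta = 1.0 are all assumptions. ADASYN assigns each minority sample a difficulty ratio (the share of majority-class points among its k nearest neighbours), then generates more synthetic samples around harder points by interpolating towards nearby minority samples.

```python
import random

# Minimal ADASYN sketch (illustrative only; not the paper's implementation).


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def _k_nearest(point, candidates, k, exclude=None):
    """Indices of the k candidates closest to `point`, skipping index `exclude`."""
    order = sorted(
        (i for i in range(len(candidates)) if i != exclude),
        key=lambda i: _sq_dist(point, candidates[i]),
    )
    return order[:k]


def adasyn(X, y, minority_label=1, k=5, beta=1.0, seed=0):
    """Return (synthetic_X, synthetic_y) for the minority class, ADASYN-style.

    X: list of numeric feature vectors; y: list of class labels.
    beta: desired balance level (1.0 aims for equal class sizes).
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority_label]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    G = int(round((n_maj - n_min) * beta))  # total synthetic samples wanted
    if G <= 0 or n_min < 2:
        return [], []

    # Step 1: difficulty ratio r_i = share of majority-class points among the
    # k nearest neighbours of each minority sample in the full dataset.
    ratios = []
    for i in min_idx:
        nbrs = _k_nearest(X[i], X, k, exclude=i)
        ratios.append(sum(1 for j in nbrs if y[j] != minority_label) / len(nbrs))

    total = sum(ratios)
    if total == 0:
        # No minority sample has a majority neighbour; this sketch generates nothing.
        return [], []

    minority = [X[i] for i in min_idx]
    k_min = min(k, n_min - 1)
    syn_X = []
    for pos, i in enumerate(min_idx):
        # Step 2: harder samples (higher r_i) receive proportionally more points.
        g_i = int(round(ratios[pos] / total * G))
        # Step 3: interpolate towards randomly chosen minority-class neighbours.
        min_nbrs = _k_nearest(X[i], minority, k_min, exclude=pos)
        for _ in range(g_i):
            z = minority[rng.choice(min_nbrs)]
            lam = rng.random()
            syn_X.append([a + lam * (b - a) for a, b in zip(X[i], z)])
    return syn_X, [minority_label] * len(syn_X)


if __name__ == "__main__":
    # Hypothetical hybrid features: [urgency score, sender-irregularity score].
    legit = [[0.1 * i, 0.1 * j] for i in range(5) for j in range(4)]  # 20 samples
    phish = [[0.15, 0.15], [0.35, 0.25], [1.0, 1.0], [1.1, 0.9]]  # 4 samples
    X = legit + phish
    y = [0] * len(legit) + [1] * len(phish)
    syn_X, syn_y = adasyn(X, y, minority_label=1, k=5)
    print(len(syn_X), "synthetic phishing samples")
```

Because each per-sample count g_i is rounded, the total number of synthetic samples can differ slightly from G. In practice a maintained implementation such as `imblearn.over_sampling.ADASYN` would normally be preferred, and oversampling is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.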
