Mitigating Cyber Threats: Machine Learning and Explainable AI for Phishing Detection
DOI: https://doi.org/10.21015/vtse.v13i2.2129
Abstract
The rapid growth in the number of organizations and users online has accelerated the adoption of new technologies and increased the complexity of online security. Phishing attacks surged in 2024, with 932,923 incidents reported in Q3 alone, driven by advanced AI-enabled social engineering tactics. Ranging from simple scams to sophisticated schemes that exploit emails, URLs, text messages, and social media platforms, phishing attacks deceive victims into disclosing sensitive information or inadvertently installing malware, often enlisting the compromised devices in larger botnets. Despite advances in cybersecurity measures, phishing remains a critical threat that causes substantial financial and reputational damage to businesses. Machine Learning (ML) algorithms have recently demonstrated remarkable efficacy in phishing detection; however, many high-performing models operate as black boxes, raising concerns about transparency, interpretability, and trustworthiness, all of which are essential in high-stakes applications for ensuring reliability, accountability, and regulatory compliance. To address this issue, this research integrates ML techniques with Explainable Artificial Intelligence (XAI) methodologies to enhance model interpretability and transparency in phishing detection. The proposed approach employs Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Random Forest, k-Nearest Neighbors (KNN), Twin Support Vector Machine (Twin SVM), and Convolutional Neural Networks (CNN), evaluated across four publicly available datasets to assess performance and interpretability. The findings show that XGBoost achieved the highest accuracy, at 99.65%. The Local Interpretable Model-agnostic Explanations (LIME) method was applied to elucidate feature importance and the models' decision-making processes.
This comprehensive approach aims to strengthen cybersecurity resilience against phishing threats while promoting model transparency and regulatory compliance.
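The idea behind the LIME step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the `lime` package: it re-implements the core idea (perturb one instance, weight the samples by proximity, fit a weighted linear surrogate) in plain Python. The feature names, the `black_box` scoring function, the perturbation scales, and the kernel width are all hypothetical stand-ins chosen for illustration; in practice the black box would be a trained model such as XGBoost, and binary features would be perturbed differently.

```python
import math
import random

# Hypothetical URL features; illustrative only, not taken from the paper's datasets.
FEATURES = ["url_length", "num_dots", "has_ip_address", "uses_https"]


def black_box(x):
    """Stand-in for an opaque classifier: returns P(phishing) for a feature vector."""
    z = 0.03 * x[0] + 0.6 * x[1] + 2.5 * x[2] - 1.5 * x[3] - 4.0
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def explain(instance, scales, n_samples=2000, kernel_width=1.0, seed=0):
    """LIME-style local surrogate: weighted linear fit of black_box around `instance`."""
    rng = random.Random(seed)
    d = len(instance)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Perturb each feature in standardized units, then map back to raw units.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [instance[j] + z[j] * scales[j] for j in range(d)]
        dist_sq = sum(v * v for v in z)
        # Exponential kernel: samples close to the instance count more.
        weights.append(math.exp(-dist_sq / kernel_width ** 2))
        rows.append([1.0] + z)  # intercept column plus standardized offsets
        targets.append(black_box(x))
    # Weighted least squares via the normal equations, with a tiny ridge term.
    k = d + 1
    xtx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(k)]
           for i in range(k)]
    for i in range(1, k):
        xtx[i][i] += 1e-6
    xty = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets))
           for i in range(k)]
    coef = solve(xtx, xty)
    return dict(zip(FEATURES, coef[1:]))  # drop the intercept


# Explain one hypothetical URL: 60 characters, 3 dots, raw IP host, no HTTPS.
explanation = explain([60.0, 3.0, 1.0, 0.0], scales=[20.0, 1.0, 0.5, 0.5])
for name, weight in sorted(explanation.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>15}: {weight:+.4f}")
```

Each coefficient estimates how the predicted phishing probability changes locally per one standardized unit of the feature, so the sign and relative magnitude indicate which features push this particular prediction toward or away from the phishing class.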
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License (CC BY) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
This work is licensed under a Creative Commons Attribution License (CC BY).