Phishing is currently one of the most severe cybersecurity challenges facing the online community. With damages running into millions of dollars in financial and brand losses, phishing activity continues unabated. This has led to an arms race between attackers and the online security community, which demands continual investigation. In this paper, a new approach to phishing detection is investigated, based on the concept of a minimal feature set applied to selected machine learning algorithms. The goal is to identify the most efficient machine learning method without the high computational cost usually incurred by a non-minimal feature corpus. Using a frequency analysis approach, a 13-dimensional feature set was generated, consisting of 85% URL-based features and 15% non-URL-based features. URL-based features dominate because they are observed to be more regularly exploited by phishers in most zero-day attacks. The proposed minimal feature set is then used to train a number of classifiers: Random Tree, Decision Tree, Artificial Neural Network, Support Vector Machine and Naïve Bayes. The approach was evaluated using 10-fold cross-validation on a dataset consisting of 10000 phishing instances. The results indicate that Random Tree outperforms the other classifiers, with an accuracy of 96.1% and a Receiver Operating Characteristic (ROC) value of 98.7%. Thus, the approach provides performance metrics for various state-of-the-art machine learning approaches popular in phishing detection, which can stimulate further research evaluating other ML techniques with the minimal feature set approach.
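
The 10-fold cross-validation protocol named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`k_fold_splits`, `cross_validated_accuracy`, `fit_majority`) are hypothetical, the majority-class baseline is only a placeholder for the classifiers listed above, and the 13-element feature vectors in the usage example are dummy values standing in for the paper's feature set, which the abstract does not enumerate.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(features, labels, fit, k=10, seed=0):
    """Average per-fold accuracy; fit(train_X, train_y) returns a predict(x) callable."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(labels), k, seed):
        predict = fit([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(features[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority(train_X, train_y):
    """Placeholder baseline (assumption): always predict the most common training label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority


if __name__ == "__main__":
    # Dummy data: 100 samples with 13 features each, 70 labelled phishing (1).
    X = [[0] * 13 for _ in range(100)]
    y = [1] * 70 + [0] * 30
    print(cross_validated_accuracy(X, y, fit_majority))
```

Any classifier can be compared under the same protocol by passing a different `fit` callable, which keeps the fold assignment identical across models so that their scores are measured on the same splits.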




