Phishing continues to grow as phishers discover new exploits and attack vectors for hosting malicious content, and the traditional response of takedowns and blacklists does not appear to impede them significantly. A handful of law enforcement projects, such as the FBI's Digital PhishNet and the Internet Crime Complaint Center (ic3.gov), have shown that they can collect phishing data in substantial volumes, but these collections have not yet produced a significant decline in criminal phishing activity. This paper demonstrates a new system for prioritizing investigative resources, intended to reduce the time and effort spent examining this form of online crime. The research presents a means of correlating phishing websites by showing that certain websites were created from the same phishing kit. Such kits contain the content files needed to build the counterfeit website and often contain additional clues to the identity of their creators. A clustering algorithm is presented that uses collected phishing kits to establish clusters of related phishing websites. The ability to correlate websites gives law enforcement and other stakeholders a means of prioritizing the allocation of limited investigative resources by identifying frequently repeating phishing offenders.
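The abstract does not spell out the clustering procedure. As a minimal sketch, assuming each phishing website and each collected kit is reduced to the set of MD5 digests of its content files (in the spirit of the deep MD5 matching of Wardman and Warner, 2008) and that set overlap is scored with the Kulczynski (1927) coefficient, each website could be grouped under the kit it most resembles. The function names, the sample data, and the 0.8 threshold are hypothetical illustrations, not details taken from this paper.

```python
import hashlib
from typing import Dict, List, Optional, Set


def file_hashes(files: Dict[str, bytes]) -> Set[str]:
    """Reduce a site or kit (file name -> file bytes) to a set of MD5 digests."""
    return {hashlib.md5(data).hexdigest() for data in files.values()}


def kulczynski(a: Set[str], b: Set[str]) -> float:
    """Kulczynski similarity: the mean of the shared fraction of each set."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return 0.5 * (shared / len(a) + shared / len(b))


def cluster_by_kit(
    sites: Dict[str, Set[str]],
    kits: Dict[str, Set[str]],
    threshold: float = 0.8,  # hypothetical cut-off, not from the paper
) -> Dict[str, List[str]]:
    """Assign each site to its most similar kit if the score meets the threshold."""
    clusters: Dict[str, List[str]] = {kit: [] for kit in kits}
    for site, site_hashes in sites.items():
        best_kit: Optional[str] = None
        best_score = 0.0
        for kit, kit_hashes in kits.items():
            score = kulczynski(site_hashes, kit_hashes)
            if score > best_score:
                best_kit, best_score = kit, score
        # Sites that resemble no kit closely enough stay unclustered.
        if best_kit is not None and best_score >= threshold:
            clusters[best_kit].append(site)
    return clusters


if __name__ == "__main__":
    # Hypothetical data: site2 is kit_A with one file missing, site4 is unrelated.
    kits = {
        "kit_A": file_hashes({"index.html": b"bank A login", "style.css": b"body{}",
                              "post.php": b"mail($to)"}),
        "kit_B": file_hashes({"index.html": b"bank B login", "logo.gif": b"GIF89a"}),
    }
    sites = {
        "site1": file_hashes({"index.html": b"bank A login", "style.css": b"body{}",
                              "post.php": b"mail($to)"}),
        "site2": file_hashes({"index.html": b"bank A login", "style.css": b"body{}"}),
        "site3": file_hashes({"index.html": b"bank B login", "logo.gif": b"GIF89a"}),
        "site4": file_hashes({"index.html": b"unrelated page"}),
    }
    print(cluster_by_kit(sites, kits))
    # {'kit_A': ['site1', 'site2'], 'kit_B': ['site3']}
```

Here site2 scores 0.5 × (2/2 + 2/3) ≈ 0.83 against kit_A, so it still joins that cluster despite the missing file. Under this sketch, a kit whose cluster keeps growing would point to the kind of frequently repeating offender the abstract describes.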


Aaron, G. and Rasmussen, R. (2010). ‘Global Phishing Survey 2H/2009’. Counter eCrime Operations Summit IV. May 11-13, 2010. São Paulo, Brazil.

Abu-Nimeh, S., Nappa, D., Wang, X., and Nair, S. (2007). ‘A Comparison of Machine Learning Techniques for Phishing Detection’. APWG eCrime Researchers Summit, October 4-5, 2007. Pittsburgh, PA.

Anti-Phishing Working Group (2009). ‘APWG/CMU CUPS Phishing Education Landing Page Project: Optimizing Counter-eCrime Consumer Education Through Just-in-Time Delivery of Computer Safety Instruction’, APWG Public Education Initiative, Lexington, MA.

Anti-Phishing Working Group (2010), ‘APWG’, http://www.antiphishing.org/, July 17, 2010.

Basnet, R., Mukkamala, S., and Sung, A. (2008), ‘Detection of Phishing Attacks: A Machine Learning Approach’, Studies in Fuzziness and Soft Computing, 226: 373-383.

Cao, Y., Han, W., and Le, Y. (2008). ‘Anti-phishing Based on Automated Individual White-list’. ACM Workshop on Digital Identity Management. October 31, 2008. Alexandria, VA.

Chandrasekaran, M., Narayanan, K., and Upadhyaya, S. (2006). ‘Phishing Email Detection Based on Structural Properties’. New York State Cybersecurity Conference Symposium on Information Assurance: Intrusion Detection and Prevention. July 14-15, 2006. Albany, NY.

Cova, M., Kruegel, C., and Vigna, G. (2008). ‘There is No Free Phish: An Analysis of “Free” and Live Phishing Kits’. USENIX Workshop on Offensive Technologies. July 28, 2008. San Jose, CA.

Digital PhishNet (2010), ‘Digital PhishNet’, https://www.digitalphishnet.org/, July 17, 2010.

Fette, I., Sadeh, N., and Tomasic, A. (2007). ‘Learning to Detect Phishing Emails’. International Conference on World Wide Web. May 8-12, 2007. Banff, Alberta, Canada.

Han, J., and Kamber, M. (2001), Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, CA.

Internet Crime Complaint Center (2010), ‘Internet Crime Complaint Center’, http://www.ic3.gov/, July 17, 2010.

Irani, D., Webb, S., Giffin, J., and Pu, C. (2008). ‘Evolutionary Study of Phishing’. eCrime Researchers Summit. October 14-16, 2008. Atlanta, GA.

Jakobsson, M. and Myers, S., Eds. (2006), Phishing and Countermeasures: Understanding the Increasing Problem of Electronic Identity Theft, Wiley-Interscience, Hoboken, NJ.

Kulczynski, S. (1927), ‘Die Pflanzenassoziationen der Pieninen’, Intern. Acad. Pol. Sci. Lett. Cl. Sci. Math. Nat., Ser. B, 1927(2): 180.

Lanier, M.M. and Henry, S. (2004), Essential Criminology, Westview Press, Boulder, CO.

L'Huillier, G., Weber, R., and Figueroa, N. (2009). ‘Online Phishing Classification Using Adversarial Data Mining and Signaling Games’. ACM SIGKDD Workshop on Cybersecurity and Intelligence Informatics. June 28, 2009. Paris, France.

Li, S. and Schmitz, R. (2009). ‘A Novel Anti-Phishing Framework Based on Honeypots’. APWG eCrime Researchers Summit. October 20-21, 2009. Tacoma, WA.

Litan, A. (2009), ‘The War on Phishing Is Far From Over’. Gartner, Inc., Research ID Number G00166605, April 2, 2009.

Ludl, C., McAllister, S., Kirda, E., and Kruegel, C. (2007). ‘On the Effectiveness of Techniques to Detect Phishing Sites’. International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. July 12-13, 2007. Lucerne, Switzerland.

McAfee (2010), ‘McAfee SiteAdvisor Software’, http://www.siteadvisor.com/, July 17, 2010.

Microsoft Safety (2010), ‘Anti-phishing Technologies’, http://www.microsoft.com/mscorp/safety/technologies/antiphishing/, April 8, 2010.

Microsoft Safety (2010), ‘Sender ID’, http://www.microsoft.com/mscorp/safety/technologies/senderid/default.mspx, April 8, 2010.

Moore, T. and Clayton, R. (2007). ‘Examining the Impact of Website Takedown on Phishing’. APWG eCrime Researchers Summit. October 4-5, 2007. Pittsburgh, PA.

Mozilla Messaging (2010), ‘Thunderbird 3 Features’, http://www.mozillamessaging.com/en-US/thunderbird/features/, April 8, 2010.

Netcraft (2010), ‘Anti-Phishing Toolbar’, http://toolbar.netcraft.com/, June 30, 2010.

Prakash, P., Kumar, M., Kompella, R.R., and Gupta, M. (2010). ‘PhishNet: Predictive Blacklisting to Detect Phishing Attacks’. IEEE Conference on Computer Communications. March 15-19, 2010. San Diego, CA.

Provos, N. (2009). ‘Google: Malware Sites on the Upswing’. eWeek, August 27, 2009. http://securitywatch.eweek.com/online_malware/google_malware_sites_on_the_upswing.html

Ronda, T., Saroiu, S., and Wolman, A. (2008). ‘iTrustPage: A User-Assisted Anti-Phishing Tool’. ACM SIGOPS/EuroSys European Conference on Computer Systems. April 1-4, 2008. Glasgow, Scotland.

Saberi, A., Vahidi, M., and Bidgoli, B.M. (2007). ‘Learn to Detect Phishing Scams Using Learning and Ensemble Methods’. Web Intelligence and Intelligent Agent Technology Workshops, IEEE/WIC/ACM International Conferences. November 2-5, 2007. Silicon Valley, CA.

Sheng, S., Wardman, B., Warner, G., Cranor, L.F., Hong, J., and Zhang, C. (2009). ‘An Empirical Analysis of Phishing Blacklists’. CEAS 2009 – Sixth Conference on Email and Anti-Spam. July 16-17, 2009. Mountain View, CA.

Soldo, F., El Defrawy, K., Markopoulou, A., Krishnamurthy, B., and van der Merwe, J. (2008). ‘Filtering Sources of Unwanted Traffic’. Information Theory and Applications Workshop. Jan 27, 2008 – Feb 1, 2008. San Diego, CA.

Stamm, S., Ramzan, Z., and Jakobsson, M. (2006). ‘Drive-by Pharming’. Information and Communications Security 2007. December 12-15, 2007. Zhengzhou, China.

Symantec (2010), ‘Firewall – Anti Virus – Phishing Protection | Norton 360’, http://www.symantec.com/norton/360, July 17, 2010.

Tauberer, J. (2008), ‘Add-ons for Thunderbird: Sender Verification Antiphishing Extension’, https://addons.mozilla.org/en-US/thunderbird/addon/345, April 8, 2010.

Toolan, F. and Carthy, J. (2009). ‘Phishing Detection Using Classifier Ensembles’. APWG eCrime Researchers Summit. October 20-21, 2009. Tacoma, WA.

Valdes, A., Almgren, M., Cheung, S., Deswarte, Y., Dutertre, B., Levy, J., Saïdi, H., Stavridou, V., and Uribe, T. (2003), ‘An Architecture for an Adaptive Intrusion-Tolerant Server’, Security Protocols: Lecture Notes in Computer Science, 2845: 569-574.

Wardman, B. (2010). ‘UAB Phishing Data Mine’. University of Alabama at Birmingham Computer and Information Sciences Department Technical Report Number: UABCIS-TR-2010-111710-1. November 17, 2010. http://www.cis.uab.edu/forensics/TechReports/PhishingDataMine.pdf.

Wardman, B., Shukla, G., and Warner, G. (2009). ‘Identifying Vulnerable Websites by Analysis of Common Strings in Phishing URLs’. APWG eCrime Researchers Summit. October 20-21, 2009. Tacoma, WA.

Wardman, B. and Warner, G. (2008). ‘Automating Phishing Website Identification Through Deep MD5 Matching’. eCrime Researchers Summit. October 15-16, 2008. Atlanta, GA.

Weaver, R. and Collins, M. (2007). ‘Fishing for Phishes: Applying Capture-Recapture Methods to Estimate Phishing Populations’. APWG eCrime Researchers Summit. October 4-5, 2007. Pittsburgh, PA.