Proposal / Submission Type

Peer Reviewed Paper

Location

Daytona Beach, Florida

Start Date

21-5-2015 9:10 AM

Abstract

Phishing websites, phish, attempt to deceive users into exposing their passwords, user IDs, and other sensitive information by imitating legitimate websites, such as banks, product vendors, and service providers. Phishing investigators need fast automated tools to analyze the volume of phishing attacks seen today. In this paper, we present the Simple Set Comparison tool. The Simple Set Comparison tool is a fast automated tool that groups phish by imitated brand allowing phishing investigators to quickly identify and focus on phish targeting a particular brand. The Simple Set Comparison tool is evaluated against a traditional clustering algorithm over a month's worth of phishing data, 19,825 confirmed phish. The results show clusters of comparable quality, but created more than 37 times faster than the traditional clustering algorithm.

Keywords: phishing, phish kits, phishing investigation, data mining, parallel processing

Comments

Session Chair: Ezhil S. Kalaimannan, University of West Florida

 
May 21st, 9:10 AM

Phishing Intelligence Using the Simple Set Comparison Tool

Daytona Beach, Florida

Phishing websites, phish, attempt to deceive users into exposing their passwords, user IDs, and other sensitive information by imitating legitimate websites, such as banks, product vendors, and service providers. Phishing investigators need fast automated tools to analyze the volume of phishing attacks seen today. In this paper, we present the Simple Set Comparison tool. The Simple Set Comparison tool is a fast automated tool that groups phish by imitated brand allowing phishing investigators to quickly identify and focus on phish targeting a particular brand. The Simple Set Comparison tool is evaluated against a traditional clustering algorithm over a month's worth of phishing data, 19,825 confirmed phish. The results show clusters of comparable quality, but created more than 37 times faster than the traditional clustering algorithm.

Keywords: phishing, phish kits, phishing investigation, data mining, parallel processing