The Association of Digital Forensics, Security and Law (ADFSL)
Spam-related cyber crimes have become a serious threat to society. Current spam research mainly aims to detect spam more effectively. We believe that identifying and disrupting the supporting infrastructure used by spammers is a more effective way of stopping spam than filtering. Terminating spam hosts will greatly reduce the profit a spammer can generate and thwart the spammer's ability to send more spam. This research proposes an algorithm that clusters spam domains extracted from spam emails based on their hosting IP addresses and traces those IP addresses over a period of time. The results show that (1) many seemingly unrelated spam campaigns are actually related once the domain names in the URLs are investigated; (2) spammers have a sophisticated mechanism for combating URL blacklisting, registering many new domain names every day and flushing out old domains; (3) the domains are hosted at different IP addresses across several networks, mostly in China, where legislation is not as tight as in the United States; and (4) old IP addresses are replaced by new ones from time to time, but the old and new addresses remain strongly correlated. This paper demonstrates an effective use of data mining to relate spam emails for the purpose of identifying the supporting infrastructure used for spamming and other cyber criminal activities.
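The abstract does not spell out the clustering algorithm, so the following is only a minimal sketch of one plausible reading: treat domains and hosting IP addresses as nodes of a graph, link each domain to the IPs it resolves to, and report the connected components as clusters. The function name `cluster_domains_by_ip`, the input layout, and the sample domains and addresses (reserved documentation ranges) are all assumptions made for illustration, not the authors' implementation, and the sketch leaves out the tracing of IP addresses over time.

```python
from collections import defaultdict


def cluster_domains_by_ip(domain_ips):
    """Group domains that share a hosting IP, directly or transitively.

    domain_ips maps a domain name to the set of IP addresses it was
    observed on (hypothetical input format). Returns a sorted list of
    clusters, each a sorted list of domain names.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag nodes so a domain and an IP can never collide as keys.
    for domain, ips in domain_ips.items():
        find(("domain", domain))
        for ip in ips:
            union(("domain", domain), ("ip", ip))

    clusters = defaultdict(set)
    for domain in domain_ips:
        clusters[find(("domain", domain))].add(domain)
    return sorted(sorted(members) for members in clusters.values())


# Made-up observations: two domains share 192.0.2.11, two share 198.51.100.7.
observations = {
    "pills-a.example": {"192.0.2.10", "192.0.2.11"},
    "pills-b.example": {"192.0.2.11"},
    "watches.example": {"198.51.100.7"},
    "replica.example": {"198.51.100.7"},
    "loans.example": {"203.0.113.5"},
}
result = cluster_domains_by_ip(observations)
print(result)
```

With this reading, two campaigns whose message content looks unrelated fall into the same cluster as soon as their domains resolve to a common address, which matches the kind of relationship the abstract reports.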
Wei, Chun; Sprague, Alan; Warner, Gary; and Skjellum, Anthony. "Clustering Spam Domains and Destination Websites: Digital Forensics with Data Mining," Journal of Digital Forensics, Security and Law: Vol. 5, Article 2.
Available at: http://commons.erau.edu/jdfsl/vol5/iss1/2