Prior Publisher: The Association of Digital Forensics, Security and Law (ADFSL)


In recent years, Internet technologies have changed enormously, enabling faster connections, higher data rates and mobile usage. As a result, huge amounts of data and files can be sent easily, a capability that insiders and attackers often exploit to steal intellectual property. In response, data leakage prevention systems (DLPS) have been developed that analyze network traffic and raise an alert when a data leak is detected. Although the general concepts behind their detection techniques are known, these systems are mostly closed and commercial. In this paper we present a new technique for network traffic analysis based on approximate matching (a.k.a. fuzzy hashing), which is widely used in digital forensics to correlate similar files. We demonstrate how to optimize approximate matching algorithms and apply them to single network packets. Our contribution is a straightforward concept that does not require comprehensive configuration: hash the file and store the digest in the database. In our experiments we obtained false positive rates between 10⁻⁴ and 10⁻⁵ and an algorithm throughput of over 650 Mbit/s.
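The workflow summarized above (hash each protected file once, store the result in a database, then test every packet payload against it) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes content-defined chunking in the spirit of context-triggered piecewise hashing and a Bloom filter as the database, with SHA-256 swapped in as the chunk hash. The window size, modulus, minimum chunk length, filter size, number of hash functions and the `min_hits` threshold are all hypothetical values, as are the names `chunks`, `BloomFilter`, `add_file` and `packet_matches`.

```python
import hashlib

# All parameters below are hypothetical, chosen only for illustration.
WINDOW = 7             # bytes in the rolling window that decides chunk boundaries
MODULUS = 64           # boundary when window sum % MODULUS == MODULUS - 1 (~64-byte chunks)
MIN_CHUNK = 8          # ignore very short chunks, which match too easily by chance
FILTER_BITS = 1 << 20  # size of the Bloom filter in bits
NUM_HASHES = 5         # bit positions set per chunk (at most 8 with one SHA-256 digest)


def chunks(data: bytes):
    """Yield content-defined chunks: a boundary depends only on the last WINDOW bytes."""
    start = 0
    window_sum = 0
    for i, byte in enumerate(data):
        window_sum += byte
        if i >= WINDOW:
            window_sum -= data[i - WINDOW]  # slide the window forward by one byte
        if i >= WINDOW - 1 and window_sum % MODULUS == MODULUS - 1:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]  # trailing bytes after the last boundary


class BloomFilter:
    """Bit array that answers 'possibly stored' or 'definitely not stored'."""

    def __init__(self, bits: int = FILTER_BITS, k: int = NUM_HASHES):
        self.bits = bits
        self.k = k
        self.array = bytearray(bits // 8)

    def _positions(self, item: bytes):
        # Derive k bit indices from disjoint 4-byte slices of one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        for j in range(self.k):
            yield int.from_bytes(digest[4 * j:4 * j + 4], "big") % self.bits

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.array[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all((self.array[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def add_file(db: BloomFilter, data: bytes) -> None:
    """'Hash the file and store the digest': insert every chunk of a protected file."""
    for chunk in chunks(data):
        if len(chunk) >= MIN_CHUNK:
            db.add(chunk)


def packet_matches(db: BloomFilter, payload: bytes, min_hits: int = 3) -> bool:
    """Flag a packet if enough of its complete chunks are found in the database."""
    # The first and last chunks are usually cut by the packet boundaries, so skip them.
    inner = list(chunks(payload))[1:-1]
    hits = sum(1 for chunk in inner if len(chunk) >= MIN_CHUNK and chunk in db)
    return hits >= min_hits
```

Because a boundary depends only on the preceding WINDOW bytes and not on where the packet starts, a packet carrying a contiguous slice of a protected file reproduces the file's chunks between its first and last boundaries exactly. The Bloom filter keeps the database compact and turns each lookup into a fixed number of bit tests, at the cost of a tunable false positive probability.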




