Prior Publisher

The Association of Digital Forensics, Security and Law (ADFSL)


Investigating seized devices within digital forensics represents a challenging task due to the increasing amount of data. Common procedures utilize automated file identification, which reduces the amount of data an investigator has to examine manually. In the past years the research field of approximate matching arises to detect similar data. However, if n denotes the number of similarity digests in a database, then the lookup for a single similarity digest is of complexity of O(n). This paper presents a concept to extend existing approximate matching algorithms, which reduces the lookup complexity from O(n) to O(log(n)). Our proposed approach is based on the well-known divide and conquer paradigm and builds a Bloom filter-based tree data structure in order to enable an efficient lookup of similarity digests. Further, it is demonstrated that the presented technique is highly scalable operating a trade-off between storage requirements and computational efficiency. We perform a theoretical assessment based on recently published results and reasonable magnitudes of input data, and show that the complexity reduction achieved by the proposed technique yields a 2 20-fold acceleration of look-up costs.


Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM , 13 , 422--426.

Breitinger, F., & Baier, H. (2013). Similarity preserving hashing: Eligible properties and a new algorithm mrsh-v2. In M. Rogers & K. Seigfried-Spellar (Eds.), Digital forensics and cyber crime (Vol. 114, pp. 167--182).

Springer Berlin Heidelberg. Retrieved from http://dx.doi.org/10.1007/ 978-3-642-39891-9 11 doi: 10.1007/978-3-642-39891-9 11

Breitinger, F., Baier, H., & White, D. (2014). On the database lookup problem of approximate matching. 1st Digital Forensics Research Conference EU (DFRWS-EU’14).

Breitinger, F., Liu, H., Winter, C., Baier, H., Rybalchenko, A., & Steinebach, M. (2013, Sept). Towards a process model for hash functions in digital forensics. 5th International Conference on Digital Forensics & Cyber Crime.

Breitinger, F., Stivaktakis, G., & Baier, H. (2013, August). FRASH: A framework to test algorithms of similarity hashing. In 13th Digital Forensics Research Conference (DFRWS’13). Monterey.

Breitinger, F., Stivaktakis, G., & Roussev, V. (2013, Sept). Evaluating detection error trade-offs for bytewise approximate matching algorithms. 5th ICST Conference on Digital Forensics & Cyber Crime (ICDF2C).

Broder, A., & Mitzenmacher, M. (2005). Network Applications of Bloom Filters: A Survey. Internet Mathematics, 1 (4), 485--509.

Gallagher, P., & Director, A. (1995). Secure Hash Standard (SHS) (Tech. Rep.). National Institute of Standards and Technologies, Federal Information Processing Standards Publication 180-1.

Kornblum, J. (2006, September). Identifying almost identical files using context triggered piecewise hashing. Digital Investigation, 3 , 91--97. Retrieved from http://dx.doi.org/10.1016/ j.diin.2006.06.015 doi: 10.1016/j.diin.2006.06.015

Mullin, J. (1990, may). Optimal semijoins for distributed database systems. IEEE Transactions on Software Engineering, 16 (5), 558 -560. doi: 10.1109/32.52778

NIST Information Technology Laboratory. (2013). National Software Reference Library. (http://www.nsrl.nist.gov)

Noll, L. C. (1994--2012). Fnv hash. http://www.isthe.com/chongo/tech/ comp/fnv/index.html.

Roussev, V. (2010). Data fingerprinting with similarity digests. In K.-P. Chow & S. Shenoi (Eds.), Advances in digital forensics vi (Vol. 337, pp. 207--226).

Springer Berlin Heidelberg. Retrieved from http://dx.doi.org/10.1007/ 978-3-642-15506-2 15 doi: 10.1007/978-3-642-15506-2\ 15

Roussev, V. (2011, August). An evaluation of forensic similarity hashes. Digital Investigation, 8 , 34--41. Retrieved from http://dx.doi.org/10.1016/ j.diin.2011.05.005 doi: 10.1016/j.diin.2011.05.005

Roussev, V. (2012). Managing terabyte-scale investigations with similarity digests. In G. Peterson & S. Shenoi (Eds.), Advances in digital forensics viii (Vol. 383, pp. 19--34).Springer Berlin Heidelberg. Retrieved from http://dx.doi.org/ 10.1007/978-3-642-33962-2 2 doi: 10.1007/978-3-642-33962-2 2

Roussev, V., III, G. G. R., & Marziale, L. (2007, September). Multi-resolution similarity hashing. Digital Investigation, 4 , 105--113. doi: 10.1016/j.diin.2007.06.011

Winter, C., Schneider, M., & Yannikos, Y. (2013). F2s2: Fast forensic similarity search through indexing piecewise hashsignatures. Digital Investigation, 0 , -. (Article in Press - no journal issue assigned by now) doi: http://dx.doi.org/ 10.1016/j.diin.2013.08.003



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.