Prior Publisher
The Association of Digital Forensics, Security and Law (ADFSL)
Abstract
Investigating seized devices within digital forensics represents a challenging task due to the increasing amount of data. Common procedures utilize automated file identification, which reduces the amount of data an investigator has to examine manually. In the past years the research field of approximate matching arises to detect similar data. However, if n denotes the number of similarity digests in a database, then the lookup for a single similarity digest is of complexity of O(n). This paper presents a concept to extend existing approximate matching algorithms, which reduces the lookup complexity from O(n) to O(log(n)). Our proposed approach is based on the well-known divide and conquer paradigm and builds a Bloom filter-based tree data structure in order to enable an efficient lookup of similarity digests. Further, it is demonstrated that the presented technique is highly scalable operating a trade-off between storage requirements and computational efficiency. We perform a theoretical assessment based on recently published results and reasonable magnitudes of input data, and show that the complexity reduction achieved by the proposed technique yields a 2 20-fold acceleration of look-up costs.
References
Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM , 13 , 422--426.
Breitinger, F., & Baier, H. (2013). Similarity preserving hashing: Eligible properties and a new algorithm mrsh-v2. In M. Rogers & K. Seigfried-Spellar (Eds.), Digital forensics and cyber crime (Vol. 114, pp. 167--182).
Springer Berlin Heidelberg. Retrieved from http://dx.doi.org/10.1007/ 978-3-642-39891-9 11 doi: 10.1007/978-3-642-39891-9 11
Breitinger, F., Baier, H., & White, D. (2014). On the database lookup problem of approximate matching. 1st Digital Forensics Research Conference EU (DFRWS-EU’14).
Breitinger, F., Liu, H., Winter, C., Baier, H., Rybalchenko, A., & Steinebach, M. (2013, Sept). Towards a process model for hash functions in digital forensics. 5th International Conference on Digital Forensics & Cyber Crime.
Breitinger, F., Stivaktakis, G., & Baier, H. (2013, August). FRASH: A framework to test algorithms of similarity hashing. In 13th Digital Forensics Research Conference (DFRWS’13). Monterey.
Breitinger, F., Stivaktakis, G., & Roussev, V. (2013, Sept). Evaluating detection error trade-offs for bytewise approximate matching algorithms. 5th ICST Conference on Digital Forensics & Cyber Crime (ICDF2C).
Broder, A., & Mitzenmacher, M. (2005). Network Applications of Bloom Filters: A Survey. Internet Mathematics, 1 (4), 485--509.
Gallagher, P., & Director, A. (1995). Secure Hash Standard (SHS) (Tech. Rep.). National Institute of Standards and Technologies, Federal Information Processing Standards Publication 180-1.
Kornblum, J. (2006, September). Identifying almost identical files using context triggered piecewise hashing. Digital Investigation, 3 , 91--97. Retrieved from http://dx.doi.org/10.1016/ j.diin.2006.06.015 doi: 10.1016/j.diin.2006.06.015
Mullin, J. (1990, may). Optimal semijoins for distributed database systems. IEEE Transactions on Software Engineering, 16 (5), 558 -560. doi: 10.1109/32.52778
NIST Information Technology Laboratory. (2013). National Software Reference Library. (http://www.nsrl.nist.gov)
Noll, L. C. (1994--2012). Fnv hash. http://www.isthe.com/chongo/tech/ comp/fnv/index.html.
Roussev, V. (2010). Data fingerprinting with similarity digests. In K.-P. Chow & S. Shenoi (Eds.), Advances in digital forensics vi (Vol. 337, pp. 207--226).
Springer Berlin Heidelberg. Retrieved from http://dx.doi.org/10.1007/ 978-3-642-15506-2 15 doi: 10.1007/978-3-642-15506-2\ 15
Roussev, V. (2011, August). An evaluation of forensic similarity hashes. Digital Investigation, 8 , 34--41. Retrieved from http://dx.doi.org/10.1016/ j.diin.2011.05.005 doi: 10.1016/j.diin.2011.05.005
Roussev, V. (2012). Managing terabyte-scale investigations with similarity digests. In G. Peterson & S. Shenoi (Eds.), Advances in digital forensics viii (Vol. 383, pp. 19--34).Springer Berlin Heidelberg. Retrieved from http://dx.doi.org/ 10.1007/978-3-642-33962-2 2 doi: 10.1007/978-3-642-33962-2 2
Roussev, V., III, G. G. R., & Marziale, L. (2007, September). Multi-resolution similarity hashing. Digital Investigation, 4 , 105--113. doi: 10.1016/j.diin.2007.06.011
Winter, C., Schneider, M., & Yannikos, Y. (2013). F2s2: Fast forensic similarity search through indexing piecewise hashsignatures. Digital Investigation, 0 , -. (Article in Press - no journal issue assigned by now) doi: http://dx.doi.org/ 10.1016/j.diin.2013.08.003
Recommended Citation
Breitinger, Frank; Rathgeb, Christian; and Baier, Harald
(2014)
"An Efficient Similarity Digests Database Lookup – A Logarithmic Divide & Conquer Approach,"
Journal of Digital Forensics, Security and Law: Vol. 9
, Article 13.
DOI: https://doi.org/10.15394/jdfsl.2014.1178
Available at:
https://commons.erau.edu/jdfsl/vol9/iss2/13
Included in
Computer Engineering Commons, Computer Law Commons, Electrical and Computer Engineering Commons, Forensic Science and Technology Commons, Information Security Commons