Prior Publisher
The Association of Digital Forensics, Security and Law (ADFSL)
Abstract
In the era of big data, the volume of digital data is increasing rapidly, making it difficult for investigators to examine it in a reasonable amount of time. A major requirement of modern forensic investigation is the ability to automatically filter correlated data, thereby reducing and focusing the investigator's manual effort. Approximate matching is a technique for measuring the “closeness” of two digital artifacts. mvHash-B is a well-known approximate matching scheme that measures the similarity between two digital objects and produces a ‘score of similarity’ on a scale of 0 to 100. However, no security analysis of mvHash-B is available in the literature. In this work, we perform the first academic security analysis of this algorithm and show that an attacker can “fool” it by driving the similarity score close to zero even when the objects are very similar. By similarity of the objects, we mean semantic similarity for text and visual match for images.
The designers of mvHash-B claimed that the scheme is secure against ‘active manipulation’. We contest this claim. We propose an algorithm that takes a given document and produces another of the same size that preserves its semantic or visual meaning (for text and image files, respectively) but has a low similarity score with the original as measured by mvHash-B. In our experiments, we show that the similarity score can be reduced from 100 to less than 6 for text and image documents. We performed experiments with 50 text files and 200 images, and the average similarity score between the original file and the file produced by our algorithm was 4 for text files and 6 for image files. In fact, when the original file was small, the similarity score between the two files was almost always close to 0.
To improve the security of mvHash-B against active adversaries, we propose a modification to the scheme and show that it prevents the attack described in this work.
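The general idea of such an attack can be illustrated with a toy example. The sketch below is not mvHash-B and not the algorithm proposed in this work: it assumes a hypothetical similarity measure (`toy_similarity`, a Jaccard index over byte 4-grams scaled to 0 to 100) and a hypothetical perturbation (`perturb`, which flips the lowest bit of every fourth byte). Flipping low-order bits is only plausible as a meaning-preserving change for pixel-like data; the sketch is meant solely to show how small, size-preserving edits can push a feature-based score from 100 towards 0.

```python
def ngram_set(data: bytes, n: int = 4) -> set:
    """Collect the distinct byte n-grams of `data`."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def toy_similarity(a: bytes, b: bytes, n: int = 4) -> int:
    """Hypothetical score in 0..100: Jaccard index of byte n-gram sets.

    This is an illustrative stand-in, NOT the mvHash-B comparison function.
    """
    sa, sb = ngram_set(a, n), ngram_set(b, n)
    if not sa and not sb:
        return 100  # two inputs with no n-grams are treated as identical
    return round(100 * len(sa & sb) / len(sa | sb))


def perturb(data: bytes, step: int = 4) -> bytes:
    """Flip the lowest bit of every `step`-th byte; the length is unchanged.

    Illustrative assumption: such low-order changes are visually negligible
    for pixel-like data. This is NOT the attack described in the paper.
    """
    out = bytearray(data)
    for i in range(0, len(out), step):
        out[i] ^= 0x01
    return bytes(out)


if __name__ == "__main__":
    original = bytes(range(256)) * 4   # synthetic 1024-byte input
    modified = perturb(original)       # same size, one bit changed per 4 bytes
    print(len(original) == len(modified))
    print(toy_similarity(original, original))
    print(toy_similarity(original, modified))
```

With `step` equal to the n-gram length, every 4-gram of the modified input contains exactly one altered byte, so on this synthetic input the two feature sets share nothing even though only one bit in every four bytes was changed.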
Recommended Citation
Chang, Donghoon; Sanadhya, Somitra; and Singh, Monika (2016) "Security Analysis of MVhash-B Similarity Hashing," Journal of Digital Forensics, Security and Law: Vol. 11, Article 2.
DOI: https://doi.org/10.15394/jdfsl.2016.1376
Available at: https://commons.erau.edu/jdfsl/vol11/iss2/2
Included in
Computer Engineering Commons, Computer Law Commons, Electrical and Computer Engineering Commons, Forensic Science and Technology Commons, Information Security Commons