Wednesday, May 30, 2012

A Fuzzy Hashing Approach Based on Random Sequences and Hamming Distance

Frank Breitinger, Center for Advanced Security Research Darmstadt (CASED) and Department of Computer Science, Hochschule Darmstadt, Germany
Harald Baier, Center for Advanced Security Research Darmstadt (CASED) and Department of Computer Science, Hochschule Darmstadt, Germany

Proposal / Submission Type

Peer Reviewed Paper

Location

Richmond, Virginia

Start Date

30-5-2012 1:45 PM

Abstract

Hash functions are well-known methods in computer science to map arbitrarily large input to bit strings of a fixed length that serve as unique input identifiers/fingerprints. A key property of cryptographic hash functions is that even if only one bit of the input is changed, the output behaves pseudo-randomly, and therefore similar files cannot be identified. However, in the area of computer forensics it is also necessary to find similar files (e.g. different versions of a file), for which we need a similarity-preserving hash function, also called a fuzzy hash function. In this paper we present a new approach for fuzzy hashing called bbHash. It is based on the idea of ‘rebuilding’ an input as well as possible using a fixed set of randomly chosen byte sequences, called building blocks, of byte length l (e.g. l = 128). The procedure is as follows: slide through the input byte by byte, read out the current input byte sequence of length l, and compute the Hamming distances of all building blocks against the current input byte sequence. Each building block with a Hamming distance smaller than a certain threshold contributes to the file’s bbHash. We discuss the advantages and disadvantages of bbHash compared to other fuzzy hash approaches. A key property of bbHash is that it is the first fuzzy hashing approach based on a comparison to external data structures.

Keywords: Fuzzy hashing, similarity preserving hash function, similarity digests, Hamming distance, computer forensics.
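
The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how a matching building block contributes to the digest, nor whether the Hamming distance is taken over bits or bytes, so the sketch assumes a bit-level distance and simply records the index of each matching block. The names `hamming_distance`, `make_building_blocks` and `bbhash_sketch`, as well as the seed and the parameter values in the usage note, are hypothetical.

```python
import random
from typing import List


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings.

    Assumption: the distance is measured at bit level.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def make_building_blocks(count: int, length: int, seed: int = 0) -> List[bytes]:
    """Create a fixed set of randomly chosen byte sequences (building blocks).

    A fixed seed keeps the set identical between runs, so digests of
    different files are computed against the same blocks.
    """
    rng = random.Random(seed)
    return [bytes(rng.randrange(256) for _ in range(length)) for _ in range(count)]


def bbhash_sketch(data: bytes, blocks: List[bytes], threshold: int) -> List[int]:
    """Slide through the input byte by byte and collect matching block indices.

    Assumption: a block "contributes" by having its index appended to the
    digest whenever its distance to the current window is below the threshold.
    """
    length = len(blocks[0])
    digest = []
    for offset in range(len(data) - length + 1):
        window = data[offset:offset + length]
        for index, block in enumerate(blocks):
            if hamming_distance(window, block) < threshold:
                digest.append(index)
    return digest
```

For example, `bbhash_sketch(data, make_building_blocks(16, 128), threshold=400)` would compare every 128-byte window of `data` against 16 blocks. Every window is checked against every block, so the cost grows with the product of input length, block count and block length.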

Scholarly Commons Citation

Breitinger, Frank and Baier, Harald, "A Fuzzy Hashing Approach Based on Random Sequences and Hamming Distance" (2012). Annual ADFSL Conference on Digital Forensics, Security and Law. 15.
https://commons.erau.edu/adfsl/2012/wednesday/15

Included in

Computer Engineering Commons, Computer Law Commons, Electrical and Computer Engineering Commons, Forensic Science and Technology Commons, Information Security Commons
