Drives found during investigations often contain useful information in the form of email addresses, which can be acquired by searching the raw drive data independently of the file system. Using this data, we can build a picture of the social networks in which a drive owner participated, perhaps even better than by investigating their online profiles maintained by social-networking services, because drives contain much data that users have not approved for public display. However, many addresses found on drives, such as sales and support links, are not forensically interesting. We developed a program to filter these out using a Naïve Bayes classifier, and it eliminated 73.3% of the addresses from a representative corpus. We show that the byte-offset proximity of the remaining addresses found on a drive, their word similarity, and their number of co-occurrences over a corpus are good measures of association between addresses, and we used this data to build graphs of the interconnections both between addresses and between drives. The results provided several new insights into our test data.
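To make the idea of byte-offset proximity concrete, the following is a minimal sketch and not the program described here. It assumes a simplified address pattern, a hypothetical 64-byte window, and two illustrative helper names (`find_addresses`, `proximity_pairs`); it scans raw bytes without consulting any file system and counts pairs of distinct addresses that occur close together.

```python
import re
from collections import Counter

# Simplified address pattern (an assumption; real address syntax is broader).
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_addresses(raw: bytes):
    """Return (byte_offset, address) pairs found in raw data, in offset order."""
    return [(m.start(), m.group().decode("ascii").lower())
            for m in EMAIL_RE.finditer(raw)]

def proximity_pairs(hits, window=64):
    """Count unordered pairs of distinct addresses whose start offsets
    differ by at most `window` bytes (the window size is hypothetical)."""
    counts = Counter()
    for i, (off_a, addr_a) in enumerate(hits):
        for off_b, addr_b in hits[i + 1:]:
            if off_b - off_a > window:
                break  # hits are in offset order, so later ones are farther away
            if addr_a != addr_b:
                counts[tuple(sorted((addr_a, addr_b)))] += 1
    return counts

# Toy "drive image": two addresses near each other, one far away.
raw = (b"\x00\x01To: alice@example.org\x00Cc: bob@example.org"
       + b"\x00" * 500 + b"support@vendor.example")
hits = find_addresses(raw)
pairs = proximity_pairs(hits)
```

On this toy input, only the two nearby addresses are paired; the distant support address has no association with either. Counting the same pair across many drives would give a rough analogue of the co-occurrence measure, though the weighting used in practice may differ.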


