The major challenges in big data examination and analysis are volume, complex interdependence across content, and heterogeneity. The examination and analysis phases are considered essential to the digital forensics process. However, traditional forensic investigation techniques use one or more forensic tools to examine and analyse each resource separately. In addition, when multiple resources are included in one case, findings cannot be cross-correlated, which often leads to inefficiencies in processing and identifying evidence. Furthermore, most current forensic tools cannot cope with large volumes of data. This paper develops a novel framework for the digital forensic analysis of heterogeneous big data. The framework focuses on three core issues: data volume, data heterogeneity, and the investigator's cognitive load in understanding the relationships between artefacts. The proposed approach uses metadata to address the data volume problem, semantic web ontologies to integrate heterogeneous data sources, and artificial intelligence models to support the automated identification and correlation of artefacts, reducing the burden placed upon the investigator to understand the nature of the artefacts and the relationships between them.
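The abstract names three mechanisms but gives no implementation detail. The following is a minimal Python sketch, under explicit assumptions, of how they might fit together. First, only selected metadata fields are kept from each artefact and content is discarded, which addresses volume. Next, source-specific field names are mapped onto a shared vocabulary as (subject, predicate, object) triples, which addresses heterogeneity. Finally, artefacts from different sources are linked automatically. Everything here is hypothetical: `FIELD_MAP`, `to_triples`, `correlate`, the property names (`hasActor`, `hasTime`, `hasName`, `fromSource`), the one-hour window and the sample records. A plain dictionary stands in for an OWL ontology and a fixed rule stands in for the AI models, so this is not the paper's method.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical mapping from source-specific metadata fields to a shared vocabulary.
# A simple stand-in for a semantic web ontology; property names are illustrative only.
FIELD_MAP = {
    "filesystem": {"owner": "hasActor", "mtime": "hasTime", "path": "hasName"},
    "email": {"from": "hasActor", "date": "hasTime", "subject": "hasName"},
}


def to_triples(source, artefact_id, record):
    """Keep only mapped metadata fields (content and unmapped fields are dropped)
    and emit them as (subject, predicate, object) triples."""
    mapping = FIELD_MAP[source]
    triples = [(artefact_id, "fromSource", source)]
    for field, value in record.items():
        if field in mapping:
            triples.append((artefact_id, mapping[field], value))
    return triples


def correlate(triples, window=timedelta(hours=1)):
    """Rule-based stand-in for an AI correlation model: link artefacts from
    different sources that share an actor and occur within `window` of each other."""
    props = defaultdict(dict)
    for subject, predicate, obj in triples:
        props[subject][predicate] = obj
    links = []
    for a, b in combinations(sorted(props), 2):
        pa, pb = props[a], props[b]
        if pa["fromSource"] == pb["fromSource"]:
            continue  # only cross-source correlations are reported here
        actor = pa.get("hasActor")
        if actor is None or actor != pb.get("hasActor"):
            continue
        ta, tb = pa.get("hasTime"), pb.get("hasTime")
        if ta is None or tb is None or abs(ta - tb) > window:
            continue
        links.append((a, b, actor))
    return links


# Hypothetical artefacts from two heterogeneous sources.
records = [
    ("filesystem", "fs-1", {"path": "/home/alice/plan.docx", "owner": "alice",
                            "mtime": datetime(2016, 3, 1, 10, 5), "size": 20480}),
    ("email", "mail-1", {"from": "alice", "subject": "plan",
                         "date": datetime(2016, 3, 1, 10, 30), "body": "draft attached"}),
    ("email", "mail-2", {"from": "bob", "subject": "lunch",
                         "date": datetime(2016, 3, 1, 10, 20), "body": "noon?"}),
]
triples = [t for src, aid, rec in records for t in to_triples(src, aid, rec)]
print(correlate(triples))  # [('fs-1', 'mail-1', 'alice')]
```

The file and the first email are linked because they share an actor and fall 25 minutes apart; the second email has a different actor, and the two emails are never compared because they come from the same source. A real implementation would replace the dictionary with ontology classes and properties and the fixed rule with a learned model.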






