Prior Publisher
The Association of Digital Forensics, Security and Law (ADFSL)
Abstract
In the modern world, the use of digital devices for leisure and professional purposes is growing quickly; at the same time, criminals try to mislead authorities and hide evidence on a computer by changing a file's type. File type detection is therefore a demanding task for a digital forensic examiner. This paper proposes a new methodology, from a digital forensics perspective, that identifies altered file types with high accuracy by employing computational intelligence techniques. The methodology is applied to the three most common image file types (jpg, png and gif) as well as to uncompressed tiff images. It consists of a three-stage process: feature extraction (Byte Frequency Distribution), feature selection (genetic algorithm) and classification (neural network). Experiments were conducted on files whose types were altered from a digital forensics perspective, and the results are presented. The proposed model achieves very high accuracy in file type identification.
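To make the feature extraction stage concrete, the following is a minimal Python sketch of a Byte Frequency Distribution. It is illustrative only: the function name `byte_frequency_distribution` is hypothetical, and normalising by file length is an assumption, since the abstract does not state how the frequencies are scaled or which implementation the authors used.

```python
from collections import Counter


def byte_frequency_distribution(data: bytes) -> list[float]:
    """Return a 256-element vector of relative byte frequencies.

    Assumption: counts are divided by the total number of bytes; the
    paper's actual normalisation is not given in the abstract.
    """
    if not data:
        # An empty input has no bytes, so every frequency is zero.
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


if __name__ == "__main__":
    # Tiny in-memory sample; in practice the bytes would be read from a file.
    sample = b"\x00\x00\xff\x01"
    bfd = byte_frequency_distribution(sample)
    print(bfd[0], bfd[1], bfd[255])
```

Because the vector is built from the file's content rather than its extension or header signature, it stays the same when a file is merely renamed. In the three-stage process described above, a vector of this kind would be the input to the feature selection and classification stages.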
Recommended Citation
Karampidis, Konstantinos and Papadourakis, Giorgos (2017) "File Type Identification - Computational Intelligence for Digital Forensics," Journal of Digital Forensics, Security and Law: Vol. 12, Article 6.
DOI: https://doi.org/10.15394/jdfsl.2017.1472
Available at: https://commons.erau.edu/jdfsl/vol12/iss2/6