The Association of Digital Forensics, Security and Law (ADFSL)


In the modern world, the use of digital devices for leisure or professional purposes is growing quickly; criminals, however, try to mislead authorities and hide evidence on a computer by changing a file's type. Detecting the true file type is therefore a demanding task for a digital forensic examiner. In this paper, a new methodology is proposed, from a digital forensics perspective, to identify altered file types with high accuracy by employing computational intelligence techniques. The proposed methodology is applied to the three most common image file types (jpg, png and gif) as well as to uncompressed tiff images. It is a three-stage process involving feature extraction (Byte Frequency Distribution), feature selection (genetic algorithm) and classification (neural network). Experiments were conducted on files altered from a digital forensics perspective, and the results are presented. The proposed model shows very high accuracy in file type identification.
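As an illustration of the feature extraction stage, the following is a minimal sketch of a Byte Frequency Distribution: a 256-element vector holding how often each byte value occurs in a file. The function name `byte_frequency_distribution`, the normalization by file length, and the sample bytes are assumptions made for this example and are not taken from the paper, whose exact normalization is not stated in the abstract.

```python
from collections import Counter


def byte_frequency_distribution(data: bytes) -> list[float]:
    """Return a 256-element vector of relative byte frequencies.

    Assumption: counts are divided by the total number of bytes, so the
    vector sums to 1 for any non-empty input.
    """
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


# Hypothetical sample: the 8-byte PNG signature followed by 8 zero bytes.
sample = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]) + b"\x00" * 8
bfd = byte_frequency_distribution(sample)
```

Because the vector is computed from the content alone, it does not depend on the file extension, which is what makes it usable when the extension has been changed. In the process described above, a vector like this would be the input to feature selection and then to the classifier.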




