Using Code Metric Histograms and Genetic Algorithms to Perform Author Identification for Software Forensics
Conference proceeding

Robert Lange and Spiros Mancoridis
GECCO 2007 : Genetic and Evolutionary Computation Conference, July 7-11, 2007 University College London, London, UK, pp 2082-2089
01 Jan 2007

Abstract

Computer Science, Artificial Intelligence; Computer Science, Software Engineering; Science & Technology; Computer Science; Technology
We have developed a technique to characterize software developers' styles using a set of source code metrics. This style fingerprint can be used to identify the likely author of a piece of code from a pool of candidates. Author identification has applications in criminal justice, corporate litigation, and plagiarism detection. Furthermore, we can identify candidate developers who share similar styles, making our technique useful for software maintenance as well. Our method involves measuring the differences in histogram distributions for code metrics. Identifying a combination of metrics that is effective in distinguishing developer styles is key to the utility of the technique. Our case study involves 18 metrics, and the time involved in exhaustive searching of the problem space prevented us from adding additional metrics. Using a genetic algorithm to perform the search, we were able to find good metric combinations in hours as opposed to weeks. The genetic algorithm has enabled us to begin adding new metrics to our catalog of available metrics. This paper documents the results of our experiments in author identification for software forensics and outlines future directions of research to improve the utility of our method.
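
The abstract names the two ingredients of the method (differences between metric histograms, and a genetic algorithm that searches for a useful metric combination) but gives no formulas or parameters. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption: the sum-of-absolute-differences distance, the nearest-profile rule in identify, the accuracy-based fitness, the one-point crossover and bit-flip mutation in evolve, and all function names and default parameters.

```python
import random

def normalize(hist):
    """Scale a histogram (dict: bin -> count) so its values sum to 1."""
    total = sum(hist.values())
    return {b: c / total for b, c in hist.items()} if total else {}

def hist_distance(h1, h2):
    """Assumed distance: sum of absolute differences of normalized bins."""
    a, b = normalize(h1), normalize(h2)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))

def identify(unknown, profiles, mask):
    """Pick the candidate author closest to `unknown` on the selected metrics.

    unknown:  dict metric name -> histogram
    profiles: dict author -> (dict metric name -> histogram)
    mask:     tuple of 0/1 flags aligned with the sorted metric names
    """
    metrics = [m for m, on in zip(sorted(unknown), mask) if on]

    def total(author):
        return sum(hist_distance(unknown[m], profiles[author][m]) for m in metrics)

    return min(sorted(profiles), key=total)

def fitness(mask, samples, profiles):
    """Fraction of labeled (author, histograms) samples attributed correctly."""
    if not any(mask):
        return 0.0  # an empty metric set cannot distinguish anyone
    hits = sum(identify(s, profiles, mask) == author for author, s in samples)
    return hits / len(samples)

def evolve(n_metrics, samples, profiles, pop_size=20, generations=30,
           p_mut=0.1, seed=0):
    """Toy genetic search over metric subsets (requires n_metrics >= 2)."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_metrics))
           for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, samples, profiles)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_metrics)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # flip each bit independently with probability p_mut
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
            children.append(child)
        pop = parents + children
    return max(pop, key=score)
```

With 18 metrics there are 2^18 = 262,144 possible subsets, and each one must be scored against every labeled sample, which is why a guided search of this kind can be attractive compared with exhaustive enumeration.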

Metrics

7 Record Views
60 citations in Scopus
