To check if this algorithm was really an improvement over the old one, we compared it with the theoretical bounds and the previous approach, which was also stepwise optimal. Being all stepwise optimal, there cannot be a lot of difference; all will play more or less in the same number of average guesses, but the number of evaluated combinations and the standard deviation for both will give the best approach. In fact, in the first paper [17] we compared Genetic Mastermind to a random search approach (like the one proposed, for instance, by Strobl [10] and Rosu [20].
Several runs were made for each N,L combination; in some cases, up to 1000, in other cases, only 100. Time was also measured on several computers: AMD K62 400 MHz and Pentium 166 running GNU/Linux RedHat 6.0.
The first test would be a reality check, that is, how the algorithm scores in the standard N=6,L=4 and N=8,L=5 (also called Super Mastermind) case. Results are shown in tables 1 and 2.


These results show that our Genetic Mastermind obtains results that are somewhat better than previous bound, or consistent with them, and that it evaluates only a part of the search space to arrive at those results. In the only comparison with another genetic Mastermind [18], GenMM outperforms it; but this is only due to the fact that Bento et al.'s algorithm does not play an optimal strategy.
Once it has been proved that GenMM performs correctly, some other tests was made to check the influence of the population size on the quality of the solution. Results are shown in figure 4
This results show that, since the main problem of this algorithm is to maintain diversity, using bigger populations results usually in slightly better results, or at least more consistent results: the number of combinations evaluated, and thus the total time taken to evaluate them, has a smaller standard deviation.
Finally, another test was made to check how the problem scalated with size; since GenMM is a general solution to the game of mastermind, it should be expected that results do not degradate with problem size. To check that, 100 runs were made for a problem of length 6 and 6 to 9 colors. Results of these runs are shown in figure 5
The most interesting result of this experiment is to find that, even as search space size grows exponentially, the number of combinations evaluated grows only linearly; the number of combinations has been fitted to a straigh line f(x)=ax+b with and ; the final sum of squares of residuals was 0.0386008.