

Results

To check whether this algorithm is really an improvement over the previous one, we compared it with the theoretical bounds and with the previous approach, which was also stepwise optimal. Since both are stepwise optimal, there cannot be much difference between them: both will finish the game in roughly the same average number of guesses, so the number of evaluated combinations and the standard deviations are what decide which approach is best. In fact, in the first paper [17] we compared Genetic Mastermind to a random search approach (like the one proposed, for instance, by Strobl [10] and Rosu [20]).
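As an illustration of this comparison criterion, the following sketch shows how a batch of per-game measurements could be aggregated into the average and standard deviation figures reported in the tables below. This is not the original experimental code; the (guesses, evaluated) pairs are hypothetical placeholders.

\begin{verbatim}
# Hedged sketch: aggregating per-game measurements into the statistics
# reported in the tables (average guesses, average evaluated combinations,
# and their standard deviations). The data points are made up; they stand
# for the output of a batch of GenMM games.
from statistics import mean, stdev

def summarize(runs):
    """runs: list of (guesses, combinations_evaluated) pairs, one per game."""
    guesses = [g for g, _ in runs]
    evaluated = [e for _, e in runs]
    return {
        "avg_guesses": mean(guesses),
        "std_guesses": stdev(guesses),
        "avg_evaluated": mean(evaluated),
        "std_evaluated": stdev(evaluated),
    }

# Placeholder data, just to show the shape of the input:
print(summarize([(4, 250), (5, 310), (4, 280), (3, 190)]))
\end{verbatim}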

Several runs were made for each N, L combination; in some cases up to 1000, in others only 100. Time was also measured on several computers: an AMD K6-2 at 400 MHz and a Pentium at 166 MHz, both running Red Hat GNU/Linux 6.0.

The first test was a reality check: how the algorithm scores in the standard N=6, L=4 case and in the N=8, L=5 case (also called Super Mastermind). Results are shown in tables 1 and 2.


Table: Results in the standard case, 6 colors and 4 positions. Our results (labeled GenMM, for Genetic Mastermind) are consistent with the best results found, and the averages are lower than in Knuth's and Koyama's cases; it must be taken into account that those strategies are not stepwise optimal. Our result is also consistent with Bestavros'. The algorithm obtained better results when playing 1122 as the first guess, as suggested by Knuth [1] (GenMM (1)); results when playing 1234 first (GenMM (2)) were not as good, either in the number of guesses or in the number of evaluated combinations. GenMM was run 1000 times, with a population of 100.
Author            Avg. guesses          Avg. combinations evaluated
Knuth [1]         4.478                 All
Koyama [9]        4.340                 All
Bestavros [14]    3.8 $\pm$ 0.5         All
Rosu [20]         4.66                  Random search
GenMM (1)         4.132 $\pm$ 1.00      279 $\pm$ 188
GenMM (2)         4.312 $\pm$ 0.989     320 $\pm$ 268


Table: Results in the ``super'' case, 8 colors and 5 positions. Our results (labeled GenMM, for Genetic Mastermind) are consistent with the best results found (Rosu [20]; no standard deviation is given there, which makes the comparison difficult). The number of evaluated combinations is also consistent with Bento et al.'s, but our algorithm reaches the solution in fewer moves. GenMM was run 500 times, with a population of 400.
Author              Avg. guesses          Avg. combinations evaluated
Rosu [20]           5.88                  Random search
Bento et al. [18]   6.866                 1029.9
GenMM               5.904 $\pm$ 0.975     2171 $\pm$ 1268

These results show that our Genetic Mastermind obtains results that are somewhat better than, or consistent with, previously published ones, and that it evaluates only a fraction of the search space to reach them. In the only comparison with another genetic Mastermind [18], GenMM outperforms it, but this is only because Bento et al.'s algorithm does not play an optimal strategy.

Once it had been established that GenMM performs correctly, further tests were made to check the influence of the population size on the quality of the solution. Results are shown in figure 4.

Figure: The top graph shows the average number of guesses made for different population sizes, and the bottom graph the average number of combinations evaluated. Averages are substantially similar regardless of the population size; however, a bigger population means a lower standard deviation, which in turn means that the range of times the algorithm may take is reduced. That is why, for problems of a certain size, a bigger population is preferred. In all cases, the problem evaluated was N=6, L=6.
[Figure 4: two panels, figpopa.eps (top) and figpopb.eps (bottom)]

These results show that, since the main challenge for this algorithm is maintaining diversity, bigger populations usually yield slightly better, or at least more consistent, results: the number of combinations evaluated, and thus the total time taken to evaluate them, has a smaller standard deviation.
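A minimal sketch of this kind of population-size sweep is shown below, under the assumption that a single GenMM game can be driven by a function returning the number of guesses and of evaluated combinations; the run_genmm stand-in here is purely hypothetical and only returns random numbers so that the sketch runs end to end.

\begin{verbatim}
# Hedged sketch of a population-size sweep like the one behind figure 4.
# run_genmm is a hypothetical placeholder, NOT the real algorithm; it only
# mimics the shape of the output: (guesses, combinations_evaluated).
import random
from statistics import mean, stdev

def run_genmm(pop_size):
    return random.randint(4, 7), random.randint(3000, 9000)

def sweep(pop_sizes, runs_per_size=100):
    results = {}
    for pop in pop_sizes:
        evaluated = [run_genmm(pop)[1] for _ in range(runs_per_size)]
        results[pop] = (mean(evaluated), stdev(evaluated))
    return results

for pop, (avg, sd) in sweep([100, 200, 400]).items():
    print(f"population {pop}: {avg:.0f} +/- {sd:.0f} combinations evaluated")
\end{verbatim}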

Finally, another test was made to check how the problem scales with size; since GenMM is a general solution to the game of Mastermind, results should not be expected to degrade with problem size. To check this, 100 runs were made for a problem of length 6 with 6 to 9 colors. Results of these runs are shown in figure 5.

Figure: The top graph shows the average number of guesses made for different numbers of colors, and the bottom graph the average number of combinations evaluated, together with a fit to a straight line and the search space size, on a logarithmic plot. In all cases, the population was 400 and the combination length 6. Obviously, the figures go up with search space size, but the number of combinations evaluated increases less steeply than the search space size.
[Figure 5: two panels, colorsavg.eps (top) and colorscombavg.eps (bottom)]

The most interesting result of this experiment is that, even though the search space size grows exponentially, the number of combinations evaluated grows only linearly; the number of combinations has been fitted to a straight line $f(x)=ax+b$ with $a = 1504.22 \pm 189.1$ and $b = 1519.26 \pm 143.8$; the final sum of squares of residuals was 0.0386008.
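As a hedged illustration of how such a fit can be obtained, the sketch below performs a least-squares fit of the average number of evaluated combinations against the number of colors and lists the corresponding search space sizes for comparison. It assumes the independent variable of the fit is the number of colors; the data values are placeholders, not the measured averages.

\begin{verbatim}
# Hedged sketch: least-squares fit of evaluated combinations vs. number of
# colors, as in figure 5. It is assumed that the independent variable of
# f(x) = a*x + b is the number of colors; the y values below are
# placeholders, NOT the measured averages.
import numpy as np

colors = np.array([6.0, 7.0, 8.0, 9.0])                     # N, with L fixed at 6
evaluated = np.array([10500.0, 12000.0, 13500.0, 15000.0])  # placeholder data

a, b = np.polyfit(colors, evaluated, 1)   # coefficients of the straight line
space = colors ** 6                       # search space size N**L, for comparison

print(f"fit: f(x) = {a:.2f} * x + {b:.2f}")
print("search space sizes:", space)
\end{verbatim}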

