5. Results

5.1. Hypotheses 1 and 2

The hypothesis that achieving arc consistency using AC-3 is sufficient to solve the Zebra problem can be quickly disproved using the benchmark Zebra problem presented earlier in this paper. Once arc consistency has been achieved there still remain several variables with more than one possible value in its domain. Similar results can be seen for the Sherlock problem by applying AC-3 to a random case generated by the GenerateRandomProblem() procedure of the preceeding section.

One such sample problem appears in Appendix A

Kumar (in [Kumar92]) reports that arc consistency is not, in general, sufficient to find a solution (or that there is no solution). He then goes on to show that for a contraint graph having n nodes, it can be solved using only arc consistency, if the graph is n-consistent. He also shows that it is sometimes possible to solve a n-node bcsp with less than n-consistency.

5.2. Hypotheses 3 and 4

Given that arc consistency by itself cannot solve all binary constraint problems it follows that an algorithm such as BM-CBJ2 which can is better at solving the bcsp's. Having said that, comparing the performance of AC-3 to that of BM-CBJ2 for problems that AC-3 can solve is still an interesting exercise. Therefore, using problems generated using GenerateStrongKProblem(), we obtain the following results over 1000 test cases for each of the Zebra and Sherlock problems:

Table 3. AC-3 vs. BM-CBJ2: Zebra

MeasurementAC-3BM-CBJ2
Fewer Checks (number of cases)25975
Less Time (number of cases)277710
Average Number of Checks234103755
Minimum Number of Checks9876137
Maximum Number of Checks42086134394
Average Time (CPU + System) (seconds)1.771.72
Minimum Time (CPU + System) (seconds)1.571.39
Maximum Time (CPU + System) (seconds)1.985.24

Table 4. AC-3 vs. BM-CBJ2: Sherlock

MeasurementAC-3BM-CBJ2
Fewer Checks (number of cases)124876
Less Time (number of cases)417573
Average Number of Checks89090139170
Minimum Number of Checks33529294
Maximum Number of Checks16439161250399
Average Time (CPU + System) (seconds)2.164.67
Minimum Time (CPU + System) (seconds)1.831.50
Maximum Time (CPU + System) (seconds)2.561145.12

What is most interesting about the results is the fact that in a small number cases BM-CBJ2 is much worse than AC-3, so much so that for Sherlock most of the performance measures look worse for BM-CBJ2 than AC-3 despite the fact that it was better more than half the time. Exploring the properties of the problems that result in such bad performance is a useful future exercise.

5.3. Hypothesis 5

The hypothesis that BM-CBJ2 is more efficient at solving most of a large random sample of of Zebra problems is true, as can be seen below. Further, we can see that the algorithm tested can be ordered, both in terms of constraint checks and in terms of time, as follows: BM-CBJ2 < DoubleX < Xover < Mutate although DoubleX is only slightly better than Xover. The change to Xover to get DoubleX is not a clear winner, but crossover plus mutation is without a doubt better than mutation alone. The numerical results follow.

Table 5. Times one algorithm (row) bettered another (column)

AlgorithmBM-CBJ2MutateXoverDoubleX
ChecksTimeChecksTimeChecksTimeChecksTime
BM-CBJ2500500500500500500
Mutate0083917377
Xover00417408227223
DoubleX00427423273277

Table 6. Zebra, BM-CBJ2 vs Mutate vs Xover vs DoubleX

MeasurementBM-CBJ2MutateXoverDoubleX
Average Number of Checks35541258885538974293315161
Minimum Number of Checks412541110379620499423
Maximum Number of Checks15093132081622323868611212341720
Average Time (CPU + System) (seconds)2.1762.6522.0019.30
Minimum Time (CPU + System) (seconds)1.825.444.665.14
Maximum Time (CPU + System) (seconds)6.49686.841603.49986.12