## Friday, May 06, 2011

### [TECH] Enumerating subsets with backtracking and some interesting observations on the backtracking enumeration tree.

During my current research on enumerating all the common Hamming neighborhood of a set of strings. I needed an algorithm which can enumerate seven disjoint subsets from a given set. The fundamental building block in that is an algorithm which enumerates all the subsets of size $r$ from a given set of size $n$. I'm very much aware of the standard recursive algorithm which uses the fact ${n \choose r} = {n-1 \choose r} + {n-1 \choose r-1}$. However I wanted a much efficient version because I need to use this several times and cannot afford the constants in stack based implementations. So I wrote the following to backtracking based algorithm to enumerate the subsets. Once I did that I started observing some very interesting patterns in the number of leaves for each internal backtrack node at level $r-1$.

/*Back tracking*/
enum_count_t EnumSubSetsPrint(char *set, size_t n, size_t r){
size_t state_idx_data[r+1], i, level;
size_t *state_idx = state_idx_data;
enum_count_t n_choose_r=0;

set--;
state_idx--;
for(i=1; i<=r+1; i++){ /*initialize*/
state_idx[i] = 0;
}

level = 1;
while(level){
if(level == r+1){
/*back track*/
for(i=1; i<=r; i++){
printf("%c ", set[state_idx[i]]);
}
printf("\n");
n_choose_r++;
level--;
}else if(state_idx[level]+r-level > n-1){
level--;
}else{
/*track forward*/
state_idx[level]++; /*pick a item*/
state_idx[level+1] = state_idx[level];
level++;
}
}
return n_choose_r;
}

By looking at these patterns (at the end of this paragraph) I came up with the following interesting identities for $n \choose 3$ and $n \choose 4$.
IDENTITY-1: ${n\choose 3} = \sum_{i=1}^{n-2} (n-i)(i-1)$
IDENTITY-2: ${n\choose 4} = \sum_{j=1}^{n-3}\sum_{i=1}^{n-2-j} (i)(j)$
I'm actually printing the number of leaves under an internal node at level $r-1$ in the backtracking tree. I'm observing the following pattern.

=================
8 3 0

6 5 4 3 2 1
5 4 3 2 1
4 3 2 1
3 2 1
2 1
1
(8 \choose 3) = 56

Please enter n and r < n and print_option
(e.g. 15 14 1 or 15 14 0)
9 3 0

7 6 5 4 3 2 1
6 5 4 3 2 1
5 4 3 2 1
4 3 2 1
3 2 1
2 1
1
(9 \choose 3) = 84

Please enter n and r < n and print_option
(e.g. 15 14 1 or 15 14 0)
10 3 0

8 7 6 5 4 3 2 1
7 6 5 4 3 2 1
6 5 4 3 2 1
5 4 3 2 1
4 3 2 1
3 2 1
2 1
1
(10 \choose 3) = 120
Please enter n and r < n and print_option
(e.g. 15 14 1 or 15 14 0)
11 3 0

9 8 7 6 5 4 3 2 1
8 7 6 5 4 3 2 1
7 6 5 4 3 2 1
6 5 4 3 2 1
5 4 3 2 1
4 3 2 1
3 2 1
2 1
1
===============================

The following are for $n \choose 4$ for various values of $n$.
===============================
8 4 0

5 4 3 2 1
4 3 2 1
3 2 1
2 1
1
4 3 2 1
3 2 1
2 1
1
3 2 1
2 1
1
2 1
1
1
(8 \choose 4) = 70

Please enter n and r < n and print_option
(e.g. 15 14 1 or 15 14 0)
9 4 0

6 5 4 3 2 1
5 4 3 2 1
4 3 2 1
3 2 1
2 1
1
5 4 3 2 1
4 3 2 1
3 2 1
2 1
1
4 3 2 1
3 2 1
2 1
1
3 2 1
2 1
1
2 1
1
1

===============================

Play around with this program Enumerate.c , Enumerate.h.

Finally I enabled $LaTeX$ setting for the blog.