Friday, May 06, 2011

[TECH] Enumerating subsets with backtracking and some interesting observations on the backtracking enumeration tree.

During my current research on enumerating all the common Hamming neighborhood of a set of strings. I needed an algorithm which can enumerate seven disjoint subsets from a given set. The fundamental building block in that is an algorithm which enumerates all the subsets of size $r$ from a given set of size $n$. I'm very much aware of the standard recursive algorithm which uses the fact ${n \choose r} = {n-1 \choose r} + {n-1 \choose r-1}$. However I wanted a much efficient version because I need to use this several times and cannot afford the constants in stack based implementations. So I wrote the following to backtracking based algorithm to enumerate the subsets. Once I did that I started observing some very interesting patterns in the number of leaves for each internal backtrack node at level $r-1$.

/*Back tracking*/ 
enum_count_t EnumSubSetsPrint(char *set, size_t n, size_t r){ 
    size_t state_idx_data[r+1], i, level; 
    size_t *state_idx = state_idx_data; 
    enum_count_t n_choose_r=0;  

    set--; 
    state_idx--; 
    for(i=1; i<=r+1; i++){ /*initialize*/ 
        state_idx[i] = 0; 
    } 

    level = 1; 
    while(level){ 
        if(level == r+1){
            /*back track*/ 
            for(i=1; i<=r; i++){ 
                printf("%c ", set[state_idx[i]]); 
            } 
            printf("\n"); 
            n_choose_r++; 
            level--; 
        }else if(state_idx[level]+r-level > n-1){ 
            level--; 
        }else{  
            /*track forward*/ 
            state_idx[level]++; /*pick a item*/ 
            state_idx[level+1] = state_idx[level]; 
            level++; 
        } 
    } 
    return n_choose_r; 
} 

By looking at these patterns (at the end of this paragraph) I came up with the following interesting identities for $n \choose 3$ and $n \choose 4$.
IDENTITY-1: ${n\choose 3} = \sum_{i=1}^{n-2} (n-i)(i-1)$
IDENTITY-2: ${n\choose 4} = \sum_{j=1}^{n-3}\sum_{i=1}^{n-2-j} (i)(j) $
I'm actually printing the number of leaves under an internal node at level $r-1$ in the backtracking tree. I'm observing the following pattern.

=================
8 3 0

6 5 4 3 2 1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
(8 \choose 3) = 56 

Please enter n and r < n and print_option
 (e.g. 15 14 1 or 15 14 0)
9 3 0

7 6 5 4 3 2 1 
6 5 4 3 2 1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
(9 \choose 3) = 84 

Please enter n and r < n and print_option
 (e.g. 15 14 1 or 15 14 0)
10 3 0

8 7 6 5 4 3 2 1 
7 6 5 4 3 2 1 
6 5 4 3 2 1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
(10 \choose 3) = 120 
Please enter n and r < n and print_option
 (e.g. 15 14 1 or 15 14 0)
11 3 0

9 8 7 6 5 4 3 2 1 
8 7 6 5 4 3 2 1 
7 6 5 4 3 2 1 
6 5 4 3 2 1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
===============================

The following are for $n \choose 4$ for various values of $n$.
===============================
8 4 0

5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
4 3 2 1 
3 2 1 
2 1 
1 
3 2 1 
2 1 
1 
2 1 
1 
1 
(8 \choose 4) = 70 

Please enter n and r < n and print_option
 (e.g. 15 14 1 or 15 14 0)
9 4 0

6 5 4 3 2 1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
4 3 2 1 
3 2 1 
2 1 
1 
3 2 1 
2 1 
1 
2 1 
1 
1 

===============================

Play around with this program Enumerate.c , Enumerate.h.

Finally I enabled $LaTeX$ setting for the blog.

No comments: