Saturday, May 14, 2011

[TECH] Enumerating set partitions with $DLX$

I have been a great fan of Dr. Knuth's paper about Dancing Links (Algorithm X) . In that paper Don suggests how the combinatorial search using backtracking can be improved in practice with the original idea from Hitotumatu and Noshita. One strange thing about Dancing Links is that its not Don's original idea, however Don made it popular by writing a paper about it. Don used the $DLX$ idea to solve the exact cover (matrix cover) problem and he gave several examples (e.g. Covering an rectilinear area with Pentominos) where he certain combinatorial problems to exact cover.

I have been waiting to find a problem where I get to apply $DLX$ without reducing it to exact cover problem. I'm really happy now that simultaneous hamming neighborhood problem (given a set $S=\{s_1,s_2\ldots s_n\},\, s_i\in \Sigma^{l}$ of strings compute the set $N_d(S) = \{ t | hamming(s_i,t)\leq d ,\forall s_i \in S\}$) which I have been working lately can be solved by applying $DLX$. As I mentioned in my earlier post the algorithm to compute $N_d(S)$ needs solve a set partition problem -- which further needs an algorithm to enumerate all the subsets of a given size $r$. In my previous post we have seen how to compute $n \choose r$ efficiently with backtracking. I'm now going use $DLX$ to solve the same problem. Notice that $DLX$ is an overkill to this simple problem, however it really makes sense if we look it from the perspective of the partitioning problem -- which is what we want to solve. $DLX$ just plays the role of UNDO operation see below.

/*Implementation of $n \choose r$ using $DLX$*/
enum_count_t EnumSubSetsCountDLX(size_t n, size_t r){
 enum_count_t n_choose_r=0;
 DLXState *dlx_sentinal = GetEnumSubSetsDLXState(n); /*Initialize the DLXState*/
 DLXState *sidx_dat[r+1], **state_idx=sidx_dat; /*item picked in the state*/
 /*total # of items picked in a given state*/
 size_t scidx_dat[r+1], *state_count_idx=scidx_dat; 
 size_t level;

 state_count_idx--; state_idx--; /*1-indexing*/
 level=1; state_count_idx[level]=0; state_idx[level] = dlx_sentinal;
 while(level){
  if(level == r+1){
   n_choose_r++;
   level--; 
   if(level) Dance(state_idx[level]);
  }else if(state_count_idx[level]+r-level > n-1){ /*state_idx[level]*/
   level--; 
   if(level) Dance(state_idx[level]);
  }else{
   state_idx[level] = state_idx[level]->right; 
   Break(state_idx[level]);
   state_count_idx[level]++; 
   /*track forward*/
   state_count_idx[level+1] = state_count_idx[level]; 
   state_idx[level+1] = state_idx[level];
   level++;
  }
 }
 CleanDLXState(dlx_sentinal); /*cleanup*/
 return n_choose_r;
}

/*Break and Dance operations*/
void Break(DLXState *state){
 state->left->right = state->right;
 state->right->left = state->left;
}
void Dance(DLXState *state){
 state->left->right = state;
 state->right->left = state;
}

/*Initializes a new DLX state required to enumerate $n \choose r$*/
EnumSubSetState* GetNewEnumSStateWithDLX(DLXState *dlx_sentinal, 
 size_t n, size_t r){

 EnumSubSetState *estate = malloc(sizeof(EnumSubSetState));
 assert(estate);
 estate->dlx_sentinal = dlx_sentinal;

 estate->state_count_idx = malloc(sizeof(size_t)*(r+1));
 estate->state_idx = malloc(sizeof(DLXState *)*(r+1));
 assert(estate->state_count_idx && estate->state_idx);

 (estate->state_count_idx)--; (estate->state_idx)--;

 estate->level=1; estate->state_count_idx[1]=0; 
 estate->state_idx[1] = estate->dlx_sentinal;
 estate->n = n; estate->r = r;
 return estate;

}



Next I'll post the most interesting part which is to enumerate the partitions using $DLX$.

Friday, May 06, 2011

[TECH] Enumerating subsets with backtracking and some interesting observations on the backtracking enumeration tree.

During my current research on enumerating all the common Hamming neighborhood of a set of strings. I needed an algorithm which can enumerate seven disjoint subsets from a given set. The fundamental building block in that is an algorithm which enumerates all the subsets of size $r$ from a given set of size $n$. I'm very much aware of the standard recursive algorithm which uses the fact ${n \choose r} = {n-1 \choose r} + {n-1 \choose r-1}$. However I wanted a much efficient version because I need to use this several times and cannot afford the constants in stack based implementations. So I wrote the following to backtracking based algorithm to enumerate the subsets. Once I did that I started observing some very interesting patterns in the number of leaves for each internal backtrack node at level $r-1$.

/*Back tracking*/ 
enum_count_t EnumSubSetsPrint(char *set, size_t n, size_t r){ 
    size_t state_idx_data[r+1], i, level; 
    size_t *state_idx = state_idx_data; 
    enum_count_t n_choose_r=0;  

    set--; 
    state_idx--; 
    for(i=1; i<=r+1; i++){ /*initialize*/ 
        state_idx[i] = 0; 
    } 

    level = 1; 
    while(level){ 
        if(level == r+1){
            /*back track*/ 
            for(i=1; i<=r; i++){ 
                printf("%c ", set[state_idx[i]]); 
            } 
            printf("\n"); 
            n_choose_r++; 
            level--; 
        }else if(state_idx[level]+r-level > n-1){ 
            level--; 
        }else{  
            /*track forward*/ 
            state_idx[level]++; /*pick a item*/ 
            state_idx[level+1] = state_idx[level]; 
            level++; 
        } 
    } 
    return n_choose_r; 
} 

By looking at these patterns (at the end of this paragraph) I came up with the following interesting identities for $n \choose 3$ and $n \choose 4$.
IDENTITY-1: ${n\choose 3} = \sum_{i=1}^{n-2} (n-i)(i-1)$
IDENTITY-2: ${n\choose 4} = \sum_{j=1}^{n-3}\sum_{i=1}^{n-2-j} (i)(j) $
I'm actually printing the number of leaves under an internal node at level $r-1$ in the backtracking tree. I'm observing the following pattern.

=================
8 3 0

6 5 4 3 2 1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
(8 \choose 3) = 56 

Please enter n and r < n and print_option
 (e.g. 15 14 1 or 15 14 0)
9 3 0

7 6 5 4 3 2 1 
6 5 4 3 2 1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
(9 \choose 3) = 84 

Please enter n and r < n and print_option
 (e.g. 15 14 1 or 15 14 0)
10 3 0

8 7 6 5 4 3 2 1 
7 6 5 4 3 2 1 
6 5 4 3 2 1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
(10 \choose 3) = 120 
Please enter n and r < n and print_option
 (e.g. 15 14 1 or 15 14 0)
11 3 0

9 8 7 6 5 4 3 2 1 
8 7 6 5 4 3 2 1 
7 6 5 4 3 2 1 
6 5 4 3 2 1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
===============================

The following are for $n \choose 4$ for various values of $n$.
===============================
8 4 0

5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
4 3 2 1 
3 2 1 
2 1 
1 
3 2 1 
2 1 
1 
2 1 
1 
1 
(8 \choose 4) = 70 

Please enter n and r < n and print_option
 (e.g. 15 14 1 or 15 14 0)
9 4 0

6 5 4 3 2 1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1 
4 3 2 1 
3 2 1 
2 1 
1 
3 2 1 
2 1 
1 
2 1 
1 
1 

===============================

Play around with this program Enumerate.c , Enumerate.h.

Finally I enabled $LaTeX$ setting for the blog.

Thursday, May 05, 2011

[THEORY] Is nature a giant guessing algorithm ?

I was reading Dr. Lipton's blog on guessing and was really impressed on how he relates the power of guessing with the question whether P=NP or P!=NP.

Looks like nature on the other had seems to be solving several combinatorial problems almost instantly (e.g. folding of the proteins to minimize its energy), does nature guess the solution to the problem ?. One way to guess a solution to a problem is to look inside the building blocks which formulated the problem. Dr. Lipton gives an example were we were supposed to guess the digits which can open a lock -- the lock we normally use in the GYM. It may be hard to guess the digits which can open a lock without any help. However if we can get an X-ray of the levers we can guess the solution to the problem more easily. So does this mean nature looks at the problems in a totally different perspective ? , is there something very intuitive to the nature which the humans fail to recognize ?.

Why did all the great mathematicians failed to find a polynomial solution 3-SAT ? -- or prove it does not exist. Is there something missing in the way we are looking at a digital circuit ? , can we design guessing algorithms to solve the NP-Complete problems ?. We don't know the answers for all these. But I'm sure that nature is a great guessing algorithm and it may be impossible for us to guess its guessing algorithm.