Thursday, December 04, 2008

[TECH] A non-recursive algorithm to compute the Edit Script in O(min(n1,n2)+log(min(n1,n2)) space

Recently I was working on an area efficient and synthesizable design to compute the edit script (minimum cost sequence of INSERT, DELETE and CHANGE) operations to transform string S1 to S2. After a little bit of thought I have the following idea to build a non-recursive version of the Hirschberg's algorithm, since the algorithm is non-recursive we can build an efficient digital circuit with this idea. We use a simple circular queue and apply DFS (Depth First Search) and we can prove that the capacity of this queue at any stage of the algorithm is θ(log(min(n1,n2)). The proof is simple we choose the geometrically decreasing string which has smaller length from the given strings(min(n1,n2)). Since we do a depth first search and the depth of subproblem tree is θ(log(min(n1,n2)), so we will have atmost θ(log(min(n1,n2)) subproblems in the circular queue at any stage of the algorithm.

The core non-recursive algorithm starts off with the initial problem and finds the minimum alignment split ('q') between (S1[1.....n1/2] and S2 creates two sub problems and appends in the circular queue (currently it uses BFS). Download the linear space implementation from here Following is the outline of the core non recursive implementation.

     91     CQueue bfs_list;
     92     ESSubProblem es_sub;
     93 
     94     InitializeCQ(&bfs_list,sizeof(ESSubProblem));
     95     /*Put the toplevel subproblem into the CQueue*/
     96     es_sub.sub_s1 = s1; es_sub.sub_n1 = n1;
     97     es_sub.sub_s2 = s2; es_sub.sub_n2 = n2;
     98 
     99     AppendCQ(&bfs_list,&es_sub);
    100     while(DequeueCQ(&bfs_list,&es_sub)){
    101         sub_s1 = es_sub.sub_s1; sub_s2 = es_sub.sub_s2;
    102         sub_n1 = es_sub.sub_n1; sub_n2 = es_sub.sub_n2;
    103 
    104         if(sub_n1 <=1){
    105             /*If sub_n1 is 1 or 0 we cannot split it any more*/
    106             EditDistanceLS(sub_s1,sub_n1,sub_s2,sub_n2);
    107             i = sub_n1; j = sub_n2;
    108             /*Figure out the edit operations*/
    109             while(i || j){
    110                 op = GetOp(i,j,(i?sub_s1[i-1]:'\0'),
    111                     (j?sub_s2[j-1]:'\0'));
    112                 switch(op){
    113                     case 'I':
    114                             EditScriptS2[(&sub_s2[j-1]) - s2] = 'I';
    115                             j--;
    116                             break;
    117                     case 'D':
    118                             EditScriptS1[(&sub_s1[i-1]) - s1] = 'D';
    119                             i--;
    120                             break;
    121                     case 'C':
    122                             EditScriptS1[(&sub_s1[i-1]) - s1] =
    123                                 (sub_s1[i-1] == sub_s2[j-1])?':':'C';
    124                                 i--; j--;
    125                 }
    126             }
    127         }else {
    128             /*Add the two subproblems*/
    129             q = FindMinSplit(sub_s1,sub_n1,sub_s2,sub_n2);
    130             /*SUB PROBLEM 1*/
    131             es_sub.sub_n1 = CELING(sub_n1,2);
    132             es_sub.sub_n2 = q;
    133             AppendCQ(&bfs_list,&es_sub);
    134 
    135             /*SUB PROBLEM 2*/
    136             es_sub.sub_s1 = &(sub_s1[CELING(sub_n1,2)]);
    137             es_sub.sub_s2 = &(sub_s2[q]);
    138             es_sub.sub_n1 = sub_n1/2;
    139             es_sub.sub_n2 = sub_n2-q;
    140             AppendCQ(&bfs_list,&es_sub);
    141         }
    142     }




No comments: