Saturday, April 19, 2008

[TECH] What does computing the edit distance between a string and its reverse tell us?

I came across a problem recently which asks for the following "Given a string find out the minimum number of characters to be inserted to make it a palindrome, and also give the string". There might be several ways to solve this problem I have a simple algorithm to solve this problem in O(n2). The observation is the following reccurence.

Let LCSr[x,y] = longest common sub sequence between S1...x and Sr1...y , where Sr is the reverse of the string S.
Let X=x1x2....xk be the be the palindrome with minimum # of inserted characters to make it a palindrome, then X has to be formed from S as follows.
1. Find a index k in S such that the cost of transforming S[1..k] to S[k+1...n] is minimum and the palindrome would be X = S'[1...k]S'r[k+1...n] or X = S'[1..k-1]S[k]S'r[k+1..n]
2. The index k such that LCSr[k][len-k] is maximum.
3. Get the code here

/*Builds the LCS between the forward and backwards and finds
a index 'p' which will give minimum # of operations to transform
forward string to backward string.*/
void FindLCSReverse(char *str,unsigned int len){
    int i,j;
    int current_min,max_lcs;
    int imax,jmax,palindex,palindex_r;
    unsigned int operations;
    char path=0;
    for(i=0;i<len;i++){
        LCS[i][0] = 0;
    }
    for(j=0;j<len;j++){
        LCS[0][j] = 0;
    }
    for(i=1;i<len;i++){
        for(j=1;j<len;j++){
            /*use len-j to access other string*/
            if(buf[i] == buf[len-j]){
                LCS[i][j] = LCS[i-1][j-1]+1;
            }else{
                LCS[i][j] = (LCS[i-1][j]>LCS[i][j-1])?LCS[i-1][j]:LCS[i][j-1];
            }
        }
    }
    /*Compute the p which gives minimum inserts to transform 
     *forward string to backward
     */
    max_lcs=0;
    imax=len-1;jmax=0;
    for(i=len-1;i>=1;i--){
        if(LCS[i][len-i-1] > max_lcs){
            max_lcs = LCS[i][len-i-1];
            imax = i;
            jmax = len-i-1;
        }
        if(i>0 && LCS[i-1][len-i-1] >= max_lcs){
            max_lcs = LCS[i-1][len-i-1];
            imax = i-1;
            jmax = len-i-1;
        }
    }
    /*Now find the actual string*/
    i=imax;
    j=jmax; 

    palindex_r=0;
    while(i!=0 || j!=0){
        if(i>0 && j>0){
            if(buf[i] == buf[len-j]){
                palindrome_rev[palindex_r++] = buf[i];
                i--; j--;
                continue;
            }
            current_min = (LCS[i-1][j] > LCS[i][j-1])?LCS[i-1][j]:LCS[i][j-1];
            if(current_min == LCS[i-1][j]){
                if(current_min == LCS[i][j-1] && buf[i] < buf[len-j]){
                    palindrome_rev[palindex_r++] = buf[len-j];
                    j--;
                }else{
                    palindrome_rev[palindex_r++] = buf[i];
                    i--;
                }
            }else if(current_min == LCS[i][j-1]){
                if(current_min == LCS[i-1][j] && buf[len-j] < buf[i]){
                    palindrome_rev[palindex_r++] = buf[i];
                    i--;
                }else{
                    palindrome_rev[palindex_r++] = buf[len-j];
                    j--;
                }
            }
        }else if(j==0){
            palindrome_rev[palindex_r++] = buf[i];
            i--;
        }else if(i==0){
            palindrome_rev[palindex_r++] = buf[len-j];
            j--;
        }
        /*path=1 (left) path=2 (down) path=3 (diag)*/
    }

    for(i=0;i<palindex_r;i++){
        palindrome[palindex_r-1-i] = palindrome_rev[i];
    }
    palindrome_rev[palindex_r] = '\0';
    palindrome[palindex_r] = '\0';

    /*printf("imax = %u jmax = %u len=%u \n",imax,jmax,len);*/
    if(imax+jmax==len-1){
        printf("%s%s\n",palindrome,palindrome_rev);
    }else{
        printf("%s%c%s\n",palindrome,buf[imax+1],palindrome_rev);
    }
}

No comments: