## Saturday, April 19, 2008

### [TECH] What does computing the edit distance between a string and its reverse tell us?

I came across a problem recently which asks for the following "Given a string find out the minimum number of characters to be inserted to make it a palindrome, and also give the string". There might be several ways to solve this problem I have a simple algorithm to solve this problem in O(n2). The observation is the following reccurence.

Let LCSr[x,y] = longest common sub sequence between S1...x and Sr1...y , where Sr is the reverse of the string S.
Let X=x1x2....xk be the be the palindrome with minimum # of inserted characters to make it a palindrome, then X has to be formed from S as follows.
1. Find a index k in S such that the cost of transforming S[1..k] to S[k+1...n] is minimum and the palindrome would be X = S'[1...k]S'r[k+1...n] or X = S'[1..k-1]S[k]S'r[k+1..n]
2. The index k such that LCSr[k][len-k] is maximum.
3. Get the code here

/*Builds the LCS between the forward and backwards and finds
a index 'p' which will give minimum # of operations to transform
forward string to backward string.*/
void FindLCSReverse(char *str,unsigned int len){
int i,j;
int current_min,max_lcs;
int imax,jmax,palindex,palindex_r;
unsigned int operations;
char path=0;
for(i=0;i<len;i++){
LCS[i][0] = 0;
}
for(j=0;j<len;j++){
LCS[0][j] = 0;
}
for(i=1;i<len;i++){
for(j=1;j<len;j++){
/*use len-j to access other string*/
if(buf[i] == buf[len-j]){
LCS[i][j] = LCS[i-1][j-1]+1;
}else{
LCS[i][j] = (LCS[i-1][j]>LCS[i][j-1])?LCS[i-1][j]:LCS[i][j-1];
}
}
}
/*Compute the p which gives minimum inserts to transform
*forward string to backward
*/
max_lcs=0;
imax=len-1;jmax=0;
for(i=len-1;i>=1;i--){
if(LCS[i][len-i-1] > max_lcs){
max_lcs = LCS[i][len-i-1];
imax = i;
jmax = len-i-1;
}
if(i>0 && LCS[i-1][len-i-1] >= max_lcs){
max_lcs = LCS[i-1][len-i-1];
imax = i-1;
jmax = len-i-1;
}
}
/*Now find the actual string*/
i=imax;
j=jmax;

palindex_r=0;
while(i!=0 || j!=0){
if(i>0 && j>0){
if(buf[i] == buf[len-j]){
palindrome_rev[palindex_r++] = buf[i];
i--; j--;
continue;
}
current_min = (LCS[i-1][j] > LCS[i][j-1])?LCS[i-1][j]:LCS[i][j-1];
if(current_min == LCS[i-1][j]){
if(current_min == LCS[i][j-1] && buf[i] < buf[len-j]){
palindrome_rev[palindex_r++] = buf[len-j];
j--;
}else{
palindrome_rev[palindex_r++] = buf[i];
i--;
}
}else if(current_min == LCS[i][j-1]){
if(current_min == LCS[i-1][j] && buf[len-j] < buf[i]){
palindrome_rev[palindex_r++] = buf[i];
i--;
}else{
palindrome_rev[palindex_r++] = buf[len-j];
j--;
}
}
}else if(j==0){
palindrome_rev[palindex_r++] = buf[i];
i--;
}else if(i==0){
palindrome_rev[palindex_r++] = buf[len-j];
j--;
}
/*path=1 (left) path=2 (down) path=3 (diag)*/
}

for(i=0;i<palindex_r;i++){
palindrome[palindex_r-1-i] = palindrome_rev[i];
}
palindrome_rev[palindex_r] = '\0';
palindrome[palindex_r] = '\0';

/*printf("imax = %u jmax = %u len=%u \n",imax,jmax,len);*/
if(imax+jmax==len-1){
printf("%s%s\n",palindrome,palindrome_rev);
}else{
printf("%s%c%s\n",palindrome,buf[imax+1],palindrome_rev);
}
}