Thursday, December 25, 2008

[TECH] Simple comparision of real numbers

We know that the fixed precision representation of real numbers is done by mapping them into integer space I from real space R. However the underlying integer representation of the 32-bit floats and 64-bit doubles is a sign magnitude representation (that is the reason why we see -0.00 during output of some numerical algorithms). People seem to be adding a constant to convert the underlying sign magnitude form to 2's compliment. However following is a more logical conversion (since we know that 2's compliment is 1's compliment +1 , the motivation of 2's compliment is to avoid the -0 as we know). The following comparison routine compares two floating point numbers (add the tolerance in the terms of the number the real numbers to make it approximate comparison).

/*Less than or greater than*/
char CompareFloats(float a, float b){
    int aint = *(int *)&a;
    int bint = *(int *)&b;
    aint = (aint & (1UL<<31))? ~(aint^(1UL<<31))+1:aint;
    bint = (bint & (1UL<<31))? ~(bint^(1UL<<31))+1:bint;
    aint -= bint;
    printf("%f %s %f \n",a,(aint <0)?"<":">=",b);
    printf("There are %d real numbers \n",(aint<0)?-aint:aint);
    printf("between %f and %f (in 32 bit float representation)\n",a,b);

Thursday, December 18, 2008

[TECH] Launchpad and Bazaar

I have now started to move all my development efforts to launchpad and bazaar. I really found launchpad a great place to maintain and release my work. I tried several alternatives before I moved to launchpad, I tried to use but unfortunately its not as rich as launchpad and bazaar. I have released version 0.9 of libEditScript project whose aim to build a high performance and space efficient sequence alignment library. I have charted out the features for the next release, one thing I'm sure people would like to use is a JNI (Java Native Interface) so that people can use this space efficient code in their java code. I have seen that this problem of the need of edit script arises in several occasions, I have seen people using the space inefficient version O(n^2) of edit script computation. This is were we gain significantly in terms of the space saving. Also I have one more idea to make it more parallel in the sense I want to use some concurrent techniques to make this parallel, especially since I have circular queue I can easily use several threads to make this concurrent which I will add as a next step in the next release..

Checkout the libEditScript at Next in my list would be to make a release of the weightedmatching library.

Thursday, December 04, 2008

[TECH] A non-recursive algorithm to compute the Edit Script in O(min(n1,n2)+log(min(n1,n2)) space

Recently I was working on an area efficient and synthesizable design to compute the edit script (minimum cost sequence of INSERT, DELETE and CHANGE) operations to transform string S1 to S2. After a little bit of thought I have the following idea to build a non-recursive version of the Hirschberg's algorithm, since the algorithm is non-recursive we can build an efficient digital circuit with this idea. We use a simple circular queue and apply DFS (Depth First Search) and we can prove that the capacity of this queue at any stage of the algorithm is θ(log(min(n1,n2)). The proof is simple we choose the geometrically decreasing string which has smaller length from the given strings(min(n1,n2)). Since we do a depth first search and the depth of subproblem tree is θ(log(min(n1,n2)), so we will have atmost θ(log(min(n1,n2)) subproblems in the circular queue at any stage of the algorithm.

The core non-recursive algorithm starts off with the initial problem and finds the minimum alignment split ('q') between (S1[1.....n1/2] and S2 creates two sub problems and appends in the circular queue (currently it uses BFS). Download the linear space implementation from here Following is the outline of the core non recursive implementation.

     91     CQueue bfs_list;
     92     ESSubProblem es_sub;
     94     InitializeCQ(&bfs_list,sizeof(ESSubProblem));
     95     /*Put the toplevel subproblem into the CQueue*/
     96     es_sub.sub_s1 = s1; es_sub.sub_n1 = n1;
     97     es_sub.sub_s2 = s2; es_sub.sub_n2 = n2;
     99     AppendCQ(&bfs_list,&es_sub);
    100     while(DequeueCQ(&bfs_list,&es_sub)){
    101         sub_s1 = es_sub.sub_s1; sub_s2 = es_sub.sub_s2;
    102         sub_n1 = es_sub.sub_n1; sub_n2 = es_sub.sub_n2;
    104         if(sub_n1 <=1){
    105             /*If sub_n1 is 1 or 0 we cannot split it any more*/
    106             EditDistanceLS(sub_s1,sub_n1,sub_s2,sub_n2);
    107             i = sub_n1; j = sub_n2;
    108             /*Figure out the edit operations*/
    109             while(i || j){
    110                 op = GetOp(i,j,(i?sub_s1[i-1]:'\0'),
    111                     (j?sub_s2[j-1]:'\0'));
    112                 switch(op){
    113                     case 'I':
    114                             EditScriptS2[(&sub_s2[j-1]) - s2] = 'I';
    115                             j--;
    116                             break;
    117                     case 'D':
    118                             EditScriptS1[(&sub_s1[i-1]) - s1] = 'D';
    119                             i--;
    120                             break;
    121                     case 'C':
    122                             EditScriptS1[(&sub_s1[i-1]) - s1] =
    123                                 (sub_s1[i-1] == sub_s2[j-1])?':':'C';
    124                                 i--; j--;
    125                 }
    126             }
    127         }else {
    128             /*Add the two subproblems*/
    129             q = FindMinSplit(sub_s1,sub_n1,sub_s2,sub_n2);
    130             /*SUB PROBLEM 1*/
    131             es_sub.sub_n1 = CELING(sub_n1,2);
    132             es_sub.sub_n2 = q;
    133             AppendCQ(&bfs_list,&es_sub);
    135             /*SUB PROBLEM 2*/
    136             es_sub.sub_s1 = &(sub_s1[CELING(sub_n1,2)]);
    137             es_sub.sub_s2 = &(sub_s2[q]);
    138             es_sub.sub_n1 = sub_n1/2;
    139             es_sub.sub_n2 = sub_n2-q;
    140             AppendCQ(&bfs_list,&es_sub);
    141         }
    142     }