Saturday, October 27, 2007

[TECH] A simple O(n^2) time algorithm to construct the Ultra-metric trees.

Given a matrix 'D' which is symmetric an Ultra-metric tree 'T' is a binary tree with internal nodes corresponding to D(i,j) and leaves corresponding to rows in the matrix 'D', and a path from the root to the leaf must be strictly decreasing (Ultra-metric tree). Also D(i,j) is the lowest common ancestor for 'i' and 'j'.
The problem is given 'D' we need to figure out corresponding 'T'.

The well know fact about the Ultra-metric trees is that if we take any 3 leaves i,j,k the maximum is not unique (e.x D(i,j)=5, D(j,k)=6,D(i,k)=5) we have D(i,j) and D(i,k) having same value.
How do we test if D is Ultra-metric?

  • A trivial solution is O(n^3).
  • The Errata shows construction of Ultra-metric trees in O(n^2) and asks a question if we can check for the Ultra-metric property in O(n^2) ? which my next item answers affirmatively.
  • My observation is we can partition the leaves of the ultra-metric tree with out any need of building a complete weighted graph as in Gusfields solution, one fact is that every row in D is going to have a maximum (global) we don't need to spend O(n^2) (comparing all the elements in the matrix D) to find one. The algorithm below we perform such a check if an ultra-metric tree exists and also builds one.
    L = {1,2,....n};
     i = select_a_random_element_in(L) ;
     max = find_max(D(i,*)); /*Takes linear time*/
     /*max = D(i,q) && q should be in L*/
     Part_q = {};
     Part_i = {};
     foreach(j in L){
            if(D(j,i)<=max && D(j,q)==max){
                 Part_i += {j};
            }else if(D(j,q)<=max && D(j,i)==max){
                 Part_q += {j};
               printf("D is not ultra-metric");
               return 0;
Running time would be T(n) = T(n-k) + T(k) + O(n), clearly its quadratic O(n^2).

Tuesday, October 16, 2007

[TECH] Implementing Reliability(FEARLESS) on top of FUSE

FUSE is a user space file system implementation framework, the file system implementation can run as a user process and interact with the VFS of the kernel through a named pipe /dev/fuse, FUSE comes by default with all kernels > 2.6.8

"Any data which persists on just one disk is very unreliable" , I was talking with Rohit couple of days back and he compared disk as an "Electric Bulb", and u know about the filament in the bulb it can go off any time. So what do you think of your laptop how many disk's does it have? JUST ONE.....thats extremely unreliable until and unless your system is not backed up every day. So how can we get the reliability achieved by RAID (Redundant Array of Inexpensive Disks) ? the answer is persist what is called an "ACTIVE DATA" on some thing like FLASH memory and we can integrate this with a backup mechanism to get good reliability for personal storage devices like laptops. FEARLESS is idea which came up with this reliability for personal storage devices, you can read paper by Dr.Chandy.

Our IDEA is now to realize FEARLESS using FUSE and rsync (for backup), so FEARLESS would be just another file system similar to SSHFS but very reliable


Monday, October 08, 2007

[BDAY] unsigned char age = age++;

Today happened to be my birthday, we just need an unsigned char to represent age, can you tell whats wrong with the code in the title (or below?).

static unsigned char age;

void Birthday(){
  age = age++;

Looks like the Standard 'C' says the value in the age is undefined ? because the assignment (=) is not a sequence point where the side effects (++,--) are settled. That can make the age of the person remain the same or increase it by one its ambiguous.

Vamsi. google index this

Friday, October 05, 2007

[TECH] O(n^2/(log(n)^2)) time algorithm for finding edit distance. [How does it feel when you know have reinvented the wheel :(]

Yesterday night during the workout suddenly I thought about what will happen if we encode the t-vector {-1,0,1}in the Four Russian algorithm as an integer.....the idea was a BOOM! I really felt like EUREKA! just like Archimedes thought when he discovered the law of flotation, I felt that suddenly I speedup the edit-distance dynamic programming algorithm which is O(n^2) by a factor of O(log^2(n)), it was really great I spend working on the details all special cases and written up every thing was great till then. I discussed the solution with Dr.Wu and found that it was actually solution to some exercise problem in Gusfield's book which says that in a RAM based computation model the copy of vector 't' can be done in O(log(t)) time.

How do you feel when you know some thing which you are exited just slips from you ? its very painful and its like killing you, but thats the way the world is its full of SMART people and any path which you are taking might have been taken before........