Wednesday, March 12, 2008

[TECH] A simple algorithm to find the coefficient's of characteristic polynomial.

Its often the case that the we need coefficients of the characteristic polynomial (xA = λx) rather than just the Eigen values and Eigen vectors of the matrix A, The following simple algorithm which can determine the coefficients of the polynomial from its roots (Eigen values) would be extremely handy, its a simple dynamic programming algorithm and also illustrates how quickly we can code dynamic programming algorithms in Matlab because the matrices resize automatically.

% eigen_poly_coeff: input 'A' should be a square matrix
% The program finds the eigen values of A, and constructs
% the caracteristic polynomial 
% output is the order of coefficients in the increasing 
% degree order.
function retval = eigen_poly_coeff (A)
v = eig(A)';
B = size(v);
coeff_matrix = eye(1,(B(1,2)+1));
coeff_matrix(1,1) = v(1,1)*-1;
coeff_matrix(1,2) = 1;

for i=3:1:(B(1,2)+1)
   coeff_matrix(1,i) = 1;
   for j=(i-1):-1:1
    coeff_matrix(1,j) = (coeff_matrix(1,j)*
    if(j-1 >=1)
      coeff_matrix(1,j) = 
 retval = coeff_matrix;

Sunday, March 09, 2008

[TECH] Algorithmic details of UNIX Sort command.

I happened to look at the algorithmic details of UNIX Sort, a LINUX version of the classic UNIX sort is a part of GNU coreutils-6.9.90. This is classic example of the standard External R-Way merge , to sort a data of size N bytes with a main memory size of M so it creates N/M runs and merges R at a time, the number of passes through the data is log(N/M)/log(R) passes.In fact the lower bound(runtime) for external sorting is Ω((N/M)log(N/M)/log(R)). All the external memory sorting algorithms provided in the literature are optimal so the fight here is minimizing the constant before the number of passes.

UNIX sort treats keys are lines (strings), the algorithm followed by unix sort is in fact the R-Way merge. Let the input file size be IN_SIZE.
1. Choosing Run Size:
The sizes of the initial runs are chosen from the total physical memory (TOTAL_PHY) and available memory (AVAIL_PHY). RUN_SIZE = (MAX(TOTAL_PHY/8,AVAIL_PHY))/2
maximum of 1/8th of TOTAL_PHY and AVAIL_PHY and divided by 2. See function "default_sort_size (void)" in the code.
2. Creating Runs:
Unix sort creates a temporary file for every run. So it creates IN_SIZE/RUN_SIZE (celing) temporary files. Internally it uses merge sort to sort internally it uses an optimization mentioned in Knuth volume 3 (2nd edition), problem 5.2.4-23.
3. Merging:
The number of runs merged at any time is hard coded in the program see macro NMERGE , NMERGE is defined to be 16 so it merges exactly 16 runs at any time.

Wednesday, March 05, 2008

[TECH] Sorting partially sorted sequences.

Partially Sorted Sequence: A sequence k1,k2, of n keys such that any key kj differs from its sorted position by atmost d as an example for d=3 we have 3,5,6,1,2,4 I came across this problem recently, turns out that we can solve this problem in several ways in O(nlog(d)) time. I found an interesting and simple way to solve this problem, a simple observation reveals that the smallest element in the entire sequence of n keys exists in the first d keys.

/*Build a heap(min) H with first d elements
 *in O(d) time
    DeleteMin(H); /*output this key*/
/*Note the Heap will have atmost d keys in it
at any time so the Insert and Delete can be done in 
O(log(d)) worst case.

(n-d)log(d) = O(nlog(d))

Tuesday, March 04, 2008

[TECH] Converting an Las Vegas algorithm from an Expected Time Bound to High Probability bound.

Some times its not always possible to derive the High Probability bounds (1-n) for Las Vegas algorithms, I found an interesting way to convert any Las Vegas algorithm with expected bounds on the resources to the High Probability bounds. Assume that Tn be the expected runtime(or any resource) for a Las Vegas algorithm. The plain vanilla Las Vegas algorithm have the following structure.

  /*Select a random sample*/
  if(answer found) quit;

Basically any analysis of the above randomized algorithm should tell how long should we run the loop. So let Tn gives the expected time on how long this loop runs. What we can do is as follows

  • Let the algorithm run for exactly 2Tn steps, if the algorithm quits before that we are fine, if the algorithm does not quit after Tn times then stop and restart.

counter = 0;
 /*Pick a random sample*/
 if(counter++ == 2Tn){
    counter=0; /*restarting*/
 if(answer found) quit; 

We can now show that the above algorithm will now give high probability bounds. Let X be the random variable which indicates the runtime of the algorithm. Then using Markov inequality P[X ≥ aTn] ≤ 1/a

  • P[X ≥ 2Tn] ≤ 1/2
  • Lets assume that the algorithm runs for k ≥ 2Tn time steps, then we should have done a reset exactly k/(2Tn) times.
  • The probability of doing a reset equals P[X ≥ 2Tn] = P[reset].
  • Lets assume all the events are independent then the probability of resetting k/(2Tn) P[reset k/(2Tn) times] = P[algorithm running k time steps]
  • P[reset k/(2Tn)] ≤ (1/2)*(1/2)*(1/2)...k/(2Tn) times
  • P[reset k/(2Tn)] ≤ (1/2)k/(2Tn)
  • we want the above probability to be very small (n), to make that (1/2)k/(2Tn) = n). So the value of k for which this happens is 2Tnαlog(n)
  • So with high probability after 2Tnαlog(n) time steps if we quit the loop then we give the correct answer.
  • So we just have taken an expected bound Tn and converted into ∼O(log(n)Tn) time algorithm with high probability rather than expected bounds.