Saturday, November 30, 2013

Getting around with integer overflows during Combinatorial counting

Many closed form expressions characterizing the number of combinatorial objects involve manipulation of raising and falling factorials (e.g. $$\displaystyle\Pi_{i=1}^{k} (n-i+1)$$). These quantities accumulate very fast we need to use extreme caution during computation. Lets take the example in which we want to compute $${n\choose k} = \frac{n\times (n-1)\ldots (n-k+1)}{k\times k-1\times \ldots 1}$$ which is the ratio of two falling factorials which is always an integer. A naive algorithm which computes numerator and denominator will easily overflow even though the actual value is well within the limits of an integer. If $$n_{max}$$ is the largest value represented by the underlying data-type, the naive algorithm would overflow when: $$n\geq \sqrt[k]{n_{max}}$$. One way to fix this issue is to maintain the ratio implicitly as two integers and knocking off the GCD when ever possible. This reduces the overflow during computation see below for a simple example. Thanks to my friend Chittu for discussing this over lunch.


  public static long computeNchooseK(int n, int k){
    assert n >=1 && k<=n : "Invalid n and k to compute";
    long num=1, denom=1, factor;
    do{
      factor = gcd(n,k);
      num *= (n--/factor); denom *= (k--/factor);
      factor = gcd(num, denom);
      num /= factor; denom /= factor;
    }while(k>=1);
    assert denom == 1 : "Something went wrong in computing NchooseK";
    return num;
  }