Monday, February 01, 2010

[TECH] An efficient External sorting API

Sorting is one of the most fundamental operations on data, and very efficient algorithms exist to perform it. Most programming languages ship with some sorting API, and most of those are in-memory sorting algorithms. However, I find these APIs bloated, an unfortunate by-product of abusing and overusing object-oriented concepts. Then again, the bloated code may run fine as long as the input size is bounded by some constant, which is the case for many applications. Anyway, I guess I'm getting a little off-topic, but if you are interested, see the flames between C++ and C hackers on the Linux kernel mailing list here

On the other hand, most of these programming APIs lack a solid external sorting algorithm. External sorting is * THE MOST * fundamental operation, especially when you want to build algorithms on massive datasets that do not fit in memory. You can try my new (well, old, but wrapped in a new API) external sort; please get it here . I'll try to post some examples of how to use it when I get more time.
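
In the meantime, here is a minimal sketch of the general idea behind external sorting, written in Python. To be clear, this is not my API; it is only the textbook external merge sort, and the names `external_sort`, `_write_run` and `max_lines_in_memory` are made up for this illustration. It assumes the input is an iterable of newline-terminated text lines, sorts chunks that fit in memory into temporary "run" files, and then does a k-way merge over those files.

```python
import heapq
import os
import tempfile
from itertools import islice


def _write_run(sorted_lines):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(sorted_lines)
    return path


def external_sort(lines, max_lines_in_memory=100_000):
    """Yield newline-terminated lines in sorted order.

    Only about max_lines_in_memory lines are held in memory at once
    (hypothetical parameter name, chosen for this sketch).
    """
    it = iter(lines)
    run_paths = []
    try:
        # Phase 1: cut the input into chunks that fit in memory,
        # sort each chunk, and spill it to disk as a sorted run.
        while True:
            chunk = list(islice(it, max_lines_in_memory))
            if not chunk:
                break
            chunk.sort()
            run_paths.append(_write_run(chunk))

        # Phase 2: k-way merge. heapq.merge reads the runs lazily,
        # keeping roughly one line per run in memory.
        files = [open(path) for path in run_paths]
        try:
            yield from heapq.merge(*files)
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary run files once merging is done.
        for path in run_paths:
            os.remove(path)
```

You would call it as something like `for line in external_sort(open("big.txt")): ...`. Note that this sketch opens every run file at once during the merge, so a really huge input with a small chunk size would need a multi-pass merge to stay under the open-file limit.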