Thursday, March 24, 2011

[TECH] juggernaut-asm: An out-of-core sequence assembler

Development updates on the juggernaut-asm project: I'm now branching off the trunk to implement the feature that avoids using the UNIX sort. To check out the main branch, use the command below. This is the version of the code I used to obtain the results on 4 million reads, which the in-memory algorithm of Velvet could not handle.

cvs -d:pserver:anonymous@juggernaut-asm.cvs.sourceforge.net:/cvsroot/juggernaut-asm co .

The code to replace the UNIX sort is being developed in the branch named 'replace-unix-sort'.

cvs -d:ext:vamsi99@juggernaut-asm.cvs.sourceforge.net:/cvsroot/juggernaut-asm co -r replace-unix-sort .

Wednesday, March 23, 2011

[TECH] External R-Way Merge sorting.

http://lib-ex-sort.sourceforge.net/ is an external sorting program. The following are the limitations I wish to remove before I leave in May. Most of them are engineering changes, but they are very useful for several algorithms based on external sorting.

  • Currently the size of each key is a constant. We need to remove this restriction by adding a key_header for each key.
  • Use mmap for integer sorting to avoid copying between user and kernel space.
  • Support sorting when the data spans multiple files.
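To make the first and third items concrete, here is a minimal Python sketch of an R-way merge over variable-length keys read from several input files. It is not the lib-ex-sort code or its API. The 4-byte length prefix used as the key header, the function names (write_keys, read_keys, make_runs, external_sort) and the run_size parameter are all assumptions made for illustration. Keys are compared byte-wise.

```python
import heapq
import os
import struct
import tempfile

# Hypothetical key header: a 4-byte little-endian length stored before each key.
KEY_HEADER = struct.Struct("<I")


def write_keys(path, keys):
    """Write each key as <key_header><key bytes>."""
    with open(path, "wb") as f:
        for key in keys:
            f.write(KEY_HEADER.pack(len(key)))
            f.write(key)


def read_keys(path):
    """Yield the keys of a file written by write_keys, one at a time."""
    with open(path, "rb") as f:
        while True:
            header = f.read(KEY_HEADER.size)
            if not header:
                return  # clean end of file
            if len(header) < KEY_HEADER.size:
                raise ValueError("truncated key header in %s" % path)
            (length,) = KEY_HEADER.unpack(header)
            key = f.read(length)
            if len(key) != length:
                raise ValueError("truncated key in %s" % path)
            yield key


def make_runs(input_paths, run_size, tmp_dir):
    """Phase 1: stream keys from all input files, sort at most run_size
    keys in memory at a time, and write each sorted run to tmp_dir."""
    run_paths = []
    buffer = []

    def flush():
        if buffer:
            buffer.sort()
            run_path = os.path.join(tmp_dir, "run%d.bin" % len(run_paths))
            write_keys(run_path, buffer)
            run_paths.append(run_path)
            buffer.clear()

    for path in input_paths:
        for key in read_keys(path):
            buffer.append(key)
            if len(buffer) >= run_size:
                flush()
    flush()  # write the final, possibly shorter, run
    return run_paths


def external_sort(input_paths, output_path, run_size=1000):
    """Phase 2: merge all R sorted runs in a single pass with a heap."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = make_runs(input_paths, run_size, tmp_dir)
        # heapq.merge keeps only one pending key per run in memory.
        merged = heapq.merge(*(read_keys(p) for p in run_paths))
        write_keys(output_path, merged)
```

Calling external_sort(["a.bin", "b.bin"], "sorted.bin", run_size=2) on two files written with write_keys would produce a single sorted file in the same format. The sketch merges every run in one pass, so it opens one file handle per run; a real implementation would probably cap R and merge in several passes when there are many runs.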