Tuesday, April 29, 2008

[TECH] A simple UNIX based algorithm to run a set of tasks in parallel.

Given a set of tasks T={t1,t2....,tn} such that these tasks can be run independently in parallel, however we have a limited set of resources and our constraint is K which tells us the maximum number of process's which can run in parallel. Our problem now is to realize this in UNIX using standard system calls like fork and wait.

I came across this recently and I came across the same thing several times before. Some examples of the contexts which would need this are as follows.

  • Running test cases (regressions) in parallel, suppose we have several thousands of test cases for a product and we wish to run on a machine which can run K tests at any given instance.
  • Making builds in parallel
  • Parallel crawling/web indexing (which was my motivation to write this)

Download this code

$#ARGV == 1 or die("USAGE: perl parallel_runner.pl {file_with_commands} {no.of.process's}");
$command_file = $ARGV[0];
open(CMD_FILE,$command_file) or die("$!");
$max_process = $ARGV[1];
($max_process < 256) or die("Too many process's....use less number");
%child_hash;
$process_count = 0;
while(<CMD_FILE>){
 $cmd = $_;
 if($cmd =~ /^\#/){
  next;
 }
 if($process_count < $max_process){
  $pid = fork();
  if($pid < 0 ){
   die("$! : RESOURCE ERRORS");
  }elsif($pid == 0){
   system("$cmd");
   exit(0);
  }else{
   $child_hash{$pid} = 1;
   $process_count++;
  }
 }

 if($process_count==$max_process){
  $pid = wait();
  if($child_hash{$pid} == 1){
   $child_hash{$pid} = 0;
  }elsif($pid == -1){
   print "No childs to wait?? Confused...quitting\n";
   last;
  } 
  $process_count--;
 }
}
print "Done creating process's...now waiting for childs...\n";
$i = 0;
while($i < $max_process){
 $pid = wait();
 if($pid == -1){
  last;
 }
 $i++;
}

No comments: