## Tuesday, April 29, 2008

### [TECH] A simple UNIX based algorithm to run a set of tasks in parallel.

Given a set of tasks T={t1,t2....,tn} such that these tasks can be run independently in parallel, however we have a limited set of resources and our constraint is K which tells us the maximum number of process's which can run in parallel. Our problem now is to realize this in UNIX using standard system calls like fork and wait.

I came across this recently and I came across the same thing several times before. Some examples of the contexts which would need this are as follows.

• Running test cases (regressions) in parallel, suppose we have several thousands of test cases for a product and we wish to run on a machine which can run K tests at any given instance.
• Making builds in parallel
• Parallel crawling/web indexing (which was my motivation to write this)

$#ARGV == 1 or die("USAGE: perl parallel_runner.pl {file_with_commands} {no.of.process's}");$command_file = $ARGV[0]; open(CMD_FILE,$command_file) or die("$!");$max_process = $ARGV[1]; ($max_process < 256) or die("Too many process's....use less number");
%child_hash;
$process_count = 0; while(<CMD_FILE>){$cmd = $_; if($cmd =~ /^\#/){
next;
}
if($process_count <$max_process){
$pid = fork(); if($pid < 0 ){
die("$! : RESOURCE ERRORS"); }elsif($pid == 0){
system("$cmd"); exit(0); }else{$child_hash{$pid} = 1;$process_count++;
}
}

if($process_count==$max_process){
$pid = wait(); if($child_hash{$pid} == 1){$child_hash{$pid} = 0; }elsif($pid == -1){
print "No childs to wait?? Confused...quitting\n";
last;
}
$process_count--; } } print "Done creating process's...now waiting for childs...\n";$i = 0;
while($i <$max_process){
$pid = wait(); if($pid == -1){
last;
}
\$i++;
}