Given a set of tasks T={t1,t2....,tn} such that these tasks can be run independently in parallel, however we have a limited set of resources and our constraint is K which tells us the maximum number of process's which can run in parallel. Our problem now is to realize this in UNIX using standard system calls like fork and wait.
I came across this recently and I came across the same thing several times before. Some examples of the contexts which would need this are as follows.
- Running test cases (regressions) in parallel, suppose we have several thousands of test cases for a product and we wish to run on a machine which can run K tests at any given instance.
- Making builds in parallel
- Parallel crawling/web indexing (which was my motivation to write this)
Download this code
$#ARGV == 1 or die("USAGE: perl parallel_runner.pl {file_with_commands} {no.of.process's}"); $command_file = $ARGV[0]; open(CMD_FILE,$command_file) or die("$!"); $max_process = $ARGV[1]; ($max_process < 256) or die("Too many process's....use less number"); %child_hash; $process_count = 0; while(<CMD_FILE>){ $cmd = $_; if($cmd =~ /^\#/){ next; } if($process_count < $max_process){ $pid = fork(); if($pid < 0 ){ die("$! : RESOURCE ERRORS"); }elsif($pid == 0){ system("$cmd"); exit(0); }else{ $child_hash{$pid} = 1; $process_count++; } } if($process_count==$max_process){ $pid = wait(); if($child_hash{$pid} == 1){ $child_hash{$pid} = 0; }elsif($pid == -1){ print "No childs to wait?? Confused...quitting\n"; last; } $process_count--; } } print "Done creating process's...now waiting for childs...\n"; $i = 0; while($i < $max_process){ $pid = wait(); if($pid == -1){ last; } $i++; }