Easy Parallel Processing with PHP on the Command Line

Developers often come across tasks that can be sped up with parallel processing. When there are a lot of repetitive tasks to be done, and they can be split into independent parts, being able to run multiple processes simultaneously can speed the job up immensely. A good example is running some calculation on every line of a huge report. Here's how I was able to make a command line script in PHP that allowed me to break a process into parts and process them all at the same time. One caveat: this technique only works on the command line and on Linux/Unix servers.

Part 1: starting another process from within PHP using exec()

The exec() function will run the command given to it. Normally, your PHP script will wait for the command to complete before proceeding, but if you redirect the output to a file and end the command with a '&', exec() will run the process in the background. So instead of writing

exec("ls -al");

you can write

exec("ls -al > listing.txt &");

and your script will create a new process and then proceed to the next statement immediately.
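As a minimal sketch of the same idea, the redirect and the trailing '&' can be wrapped in a small helper. The function name background_command() is hypothetical (it is not part of PHP), and redirecting stderr with 2>&1 is an addition of mine so that error messages also end up in the log file.

```php
<?php
// Hypothetical helper (not a PHP built-in): append the output redirect and
// the trailing '&' that make exec() return without waiting for the command.
function background_command(string $command, string $logFile): string
{
    // escapeshellarg() quotes the log file name for the shell.
    // "2>&1" sends error output to the same file (an assumption, not in the original).
    return sprintf('%s > %s 2>&1 &', $command, escapeshellarg($logFile));
}

// Show the command that would be run.
echo background_command('ls -al', 'listing.txt'), "\n";

// To actually launch it in the background:
// exec(background_command('ls -al', 'listing.txt'));
```

On Linux/Unix this should print ls -al > 'listing.txt' 2>&1 & and passing that string to exec() returns immediately.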

Part 2: make your script recursive with $argv

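When run on the command line, PHP provides the $argv variable, which is an array of the command line arguments used when the script was launched. (It is an ordinary global variable rather than a superglobal, so inside a function you need to use $_SERVER['argv'] or declare "global $argv".) If you have a script called "script.php" and you call it with the command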

php script.php arg1 arg2 arg3

then var_dump($argv) will show this array:

array(4) {
  [0]=>
  string(10) "script.php"
  [1]=>
  string(4) "arg1"
  [2]=>
  string(4) "arg2"
  [3]=>
  string(4) "arg3"
}

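You can easily check if a script was called with an argument by testing isset($argv[1]). Testing the value of $argv[1] directly is unreliable, because an argument of "0" evaluates to false in PHP.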
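Here is a small sketch of reading the first argument safely. The helper name first_argument_as_offset() is hypothetical, and treating the argument as a non-negative row offset is my assumption based on the example in Part 3.

```php
<?php
// Hypothetical helper: return the first command line argument as an integer
// offset, or null if it is missing or is not a plain non-negative number.
function first_argument_as_offset(array $argv): ?int
{
    // isset() distinguishes "no argument" from the argument "0",
    // which a plain truthiness test would treat as false.
    if (!isset($argv[1]) || preg_match('/^\d+$/', $argv[1]) !== 1) {
        return null;
    }
    return (int) $argv[1];
}

// $argv exists in the global scope when the script runs on the command line.
$offset = first_argument_as_offset($argv ?? []);
echo $offset === null ? "no offset given\n" : "offset: $offset\n";
```

Running this as "php script.php 0" reports an offset of 0 rather than treating the argument as missing.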

Part 3: putting it together

Combining these two steps, you can create a script that calls itself multiple times, with all of the copies running at the same time. For example, let's say I need to process 10,000 rows in a database table, and that I want to split it up into 10 processes of 1,000 rows apiece. I have a method process_records($id, $count) that will start at record number $id and process $count rows from there. If I wanted to run them sequentially, I would write:

$processor = new MyClass();
$processor->process_records(0, 10000);

But if I want to run 10 processes simultaneously, each processing 1,000 rows, I can write the following:

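// isset() is required here: the first worker is passed "0", which is falsy in PHP,
// so a plain if ($argv[1]) would send it to the else branch and spawn workers forever.
if (isset($argv[1])) {
  // Worker: process 1,000 rows starting at the row number given on the command line.
  $processor = new MyClass();
  $processor->process_records((int) $argv[1], 1000);
} else {
  // Parent: launch one background worker per block of 1,000 rows.
  for ($i = 0; $i < 10000; $i += 1000) {
    exec("php script.php $i > $i.log &");
  }
}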

The for() loop will create new processes of "php script.php 0", "php script.php 1000", "php script.php 2000", etc., all running at the same time, each one processing 1,000 rows and writing the script's output to 0.log, 1000.log, 2000.log, etc. The parent script does not wait for them; it exits as soon as the last one has been launched. As long as you haven't maxed out your system memory or processor capacity, the whole job should finish in roughly 1/10th of the time it would take to run sequentially.
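To check the split before launching anything, the commands can be built as a list first. This is only a sketch: worker_commands() is a hypothetical helper, the 2>&1 redirect is my addition, and it assumes the "php" binary is on the PATH as in the example above.

```php
<?php
// Hypothetical helper: build one background command per chunk of rows.
function worker_commands(string $script, int $totalRows, int $chunkSize): array
{
    if ($chunkSize <= 0) {
        // A non-positive chunk size would make the loop below run forever.
        throw new InvalidArgumentException('Chunk size must be positive.');
    }

    $commands = [];
    for ($offset = 0; $offset < $totalRows; $offset += $chunkSize) {
        // Each worker gets its starting row and its own log file.
        $commands[] = sprintf(
            'php %s %d > %d.log 2>&1 &',
            escapeshellarg($script),
            $offset,
            $offset
        );
    }
    return $commands;
}

foreach (worker_commands('script.php', 10000, 1000) as $command) {
    // Printing only; replace echo with exec($command) to launch the workers.
    echo $command, "\n";
}
```

Printing the commands first makes it easy to confirm that there are 10 of them and that the offsets run from 0 to 9000 before any process is started.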
