Examples: Multi-Query

In this example we know that we need to fetch several result sets from a database. Traditionally you would make the requests one after the other, gathering results, and finally outputting a page. We’re going to use gearman to execute these queries in parallel to speed up the entire operation.

The Client

The client here could be your webapp, but to keep things quick to demonstrate and try, we’ll stick to a command line script.

<?php

$client = new GearmanClient();
$client->addServer();

// initialize the results of our 3 "query results" here
$userInfo = $friends = $posts = null;

// This sets up the callback gearman invokes as each task completes.
// The $context (the third argument we pass to addTask below) tells us
// which function's result we're handling.
$client->setCompleteCallback(function(GearmanTask $task, $context) use (&$userInfo, &$friends, &$posts) {
    switch ($context) {
        case 'lookup_user':
            $userInfo = $task->data();
            break;
        case 'baconate':
            $friends = $task->data();
            break;
        case 'get_latest_posts_by':
            $posts = $task->data();
            break;
    }
});

// Here we queue up multiple tasks to be executed with *as much* parallelism as gearmand can give us
$client->addTask('lookup_user', 'joe@joe.com', 'lookup_user');
$client->addTask('baconate', 'joe@joe.com', 'baconate');
$client->addTask('get_latest_posts_by', 'joe@joe.com', 'get_latest_posts_by');

echo "Fetching...\n";
$start = microtime(true);
$client->runTasks();
$totaltime = number_format(microtime(true) - $start, 2);

echo "Got user info in: $totaltime seconds:\n";
var_dump($userInfo, $friends, $posts);
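For contrast, here's what the serial version of the same three lookups might look like. This is a hedged sketch, not from the original article: it uses GearmanClient::doNormal(), which submits a job and blocks until it finishes, so the three calls run one after another.

```php
<?php

$client = new GearmanClient();
$client->addServer();

// Each doNormal() call blocks until that job completes, so with our
// 3-second workers this version takes roughly 9 seconds in total.
$userInfo = $client->doNormal('lookup_user', 'joe@joe.com');
$friends  = $client->doNormal('baconate', 'joe@joe.com');
$posts    = $client->doNormal('get_latest_posts_by', 'joe@joe.com');

var_dump($userInfo, $friends, $posts);
```

The addTask()/runTasks() pair in the client above is what lets gearmand hand the jobs to workers concurrently instead of waiting on each one in turn.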

The Worker

The worker, on the other hand, is still pretty simple: emulate the delay you might see when doing some real work and return a dummy result.

<?php

$worker = new GearmanWorker();
$worker->addServer();

$worker->addFunction('lookup_user', function(GearmanJob $job){
    // normally you'd do some very safe type checking and query binding to a database here.
    // ...and we're gonna fake that.
    sleep(3);
    return 'The user requested ('. $job->workload() .') is 7 feet tall and awesome';
});

$worker->addFunction('baconate', function(GearmanJob $job){
    sleep(3);
    return 'The user ('. $job->workload() .') is 1 degree away from Kevin Bacon';
});

$worker->addFunction('get_latest_posts_by', function(GearmanJob $job){
    sleep(3);
    return 'The user ('. $job->workload() .') has no posts, sorry!';
});

while ($worker->work());

The Payoff

“Hey, I ran your stupid code and it took 9 seconds! It’s not at all faster!”

# ./run/client/here
Fetching...
Got user info in: 9.00 seconds:
string(59) "The user requested (joe@joe.com) is 7 feet tall and awesome"
string(56) "The user (joe@joe.com) is 1 degree away from Kevin Bacon"
string(43) "The user (joe@joe.com) has no posts, sorry!"

Ouch. Yeah, there’s a comment in the client snippet that states:

Here we queue up multiple tasks to be executed with as much parallelism as gearmand can give us

What this means is that gearman will only run tasks in parallel if there are enough workers to handle them; failing that, it runs them with as much concurrency as the available workers allow. With a single worker, the three jobs run back to back, which is where the nine seconds comes from. Spin up three (or more) workers and you’ll see this:

# ./run/client/here
Fetching...
Got user info in: 3.00 seconds:
string(59) "The user requested (joe@joe.com) is 7 feet tall and awesome"
string(56) "The user (joe@joe.com) is 1 degree away from Kevin Bacon"
string(43) "The user (joe@joe.com) has no posts, sorry!"

And now our code is significantly faster – in fact, it’s only as slow as the slowest query. Parallelism can dramatically speed up pages that, built serially, would make the user wait on a long chain of queries before seeing a response.
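To try this yourself, one way to wire up the pieces is sketched below. It assumes gearmand is installed and that the two scripts above are saved as worker.php and client.php (both filenames are my own, not from the article).

```shell
# start the gearman job server (listens on port 4730 by default)
gearmand -d

# start three workers so all three tasks can run in parallel
php worker.php &
php worker.php &
php worker.php &

# run the client; with three workers the total should be ~3s, not ~9s
php client.php
```

With only one of the worker lines running, you should see the nine-second serial behavior from the first run above.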