In this example we’re going to geocode (estimate the lat/long of) a very large list of addresses.
The basic, non-gearman script just reads from STDIN and prints out a JSON-encoded result for every line:
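The original listing is not reproduced here, so the following is only a minimal Python sketch of such a script. The `geocode` function is a hypothetical placeholder for whatever geocoder you actually use, and the output field names (`address`, `lat`, `lng`) are likewise assumptions.

```python
import json
import sys


def geocode(address):
    """Hypothetical stand-in for the real geocoder: returns (lat, lng) or None."""
    # Assumption: swap in a call to your actual geocoding library here.
    return None


def process(lines, out):
    """Write one JSON object per non-empty input line; return how many were written."""
    count = 0
    for line in lines:
        address = line.strip()
        if not address:
            continue  # skip blank lines
        record = {"address": address, "lat": None, "lng": None}
        result = geocode(address)
        if result is not None:
            record["lat"], record["lng"] = result
        out.write(json.dumps(record) + "\n")
        count += 1
    return count


def main():
    # Read addresses from STDIN, print one JSON line per address to STDOUT.
    process(sys.stdin, sys.stdout)
```

Calling `main()` from the usual `if __name__ == "__main__":` guard turns this into a script that can sit at the end of a shell pipeline.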
This works, but it’s very slow. A 2GHz 8-core Intel Xeon E5405 would eke out 1,700 addresses/second. The problem is that a single thread simply can’t process addresses fast enough.
To speed this up we’re going to spin up multiple workers, split the file among them, and pull the results back together. By doing this we can process multiple chunks at a time and speed up the overall process.
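A worker for this could look roughly like the sketch below. It assumes a python-gearman-style client library (`GearmanWorker`, `register_task`, `work`) and a gearmand server on the default port 4730. The `geocode` function is again a hypothetical placeholder, and whether the library hands you `str` or `bytes` depends on the package and Python version, so the handler accepts both.

```python
import json


def geocode(address):
    """Hypothetical stand-in for the real geocoder: returns (lat, lng) or None."""
    return None


def geocode_chunk(data):
    """Geocode a newline-separated chunk; return one JSON line per address."""
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    out = []
    for line in data.splitlines():
        address = line.strip()
        if not address:
            continue  # skip blank lines
        out.append(json.dumps({"address": address, "latlng": geocode(address)}))
    return ("\n".join(out) + "\n") if out else ""


def run_worker(servers=("localhost:4730",)):
    # Assumption: a python-gearman-compatible package is installed.
    import gearman

    worker = gearman.GearmanWorker(list(servers))
    # Task callbacks receive (worker, job); the chunk arrives in job.data.
    worker.register_task("geocode", lambda w, job: geocode_chunk(job.data))
    worker.work()  # blocks, handling jobs until the process is stopped
```

Keeping the chunk handling in a plain function (`geocode_chunk`) means it can be tested without a running job server; `run_worker()` is the only part that touches Gearman.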
Start up a bunch of workers with supervisord, split your address file into reasonable chunks, and pipe them into gearman -f geocode. With chunks of 1000 lines and 32 workers we can get 18,000 addresses/second: more than a 10-fold improvement from just one server. Adding a slightly older server to the mix boosts that number to 30,000 addresses/second.
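The splitting and piping can be done with standard shell tools; as one possible alternative, here is a hedged Python sketch that cuts the file into 1000-line chunks and pipes each chunk into gearman -f geocode as its own job. The function names, the `parallel` default, and the use of a thread pool are assumptions for illustration, not the setup that produced the numbers above.

```python
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor


def chunks(lines, size=1000):
    """Yield successive lists of at most `size` lines."""
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def submit(chunk):
    """Pipe one chunk into `gearman -f geocode` and return the job's output."""
    proc = subprocess.run(
        ["gearman", "-f", "geocode"],
        input="".join(chunk),
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout


def geocode_file(path, out, size=1000, parallel=32, submit_fn=submit):
    """Submit chunks concurrently and write the results in input order."""
    with open(path) as f, ThreadPoolExecutor(max_workers=parallel) as pool:
        # Note: Executor.map queues every chunk up front, so the whole file
        # ends up in memory; acceptable for a sketch, not for huge inputs.
        for result in pool.map(submit_fn, chunks(f, size)):
            out.write(result)
```

The `submit_fn` parameter exists only so the chunking and ordering logic can be exercised without a job server; in normal use the default, which shells out to the gearman client, is what does the work.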
Note that you can get nearly as good performance (15,500 addresses/second in our case, on a single server) just by wrapping the basic script with gearman -w -f geocode ./scriptname. For quick-n-dirty scripts this is much quicker to spin up.