Extra Cheese

Me: Gary Bernhardt

Email: gary.bernhardt at gmail

Work: BitBacker

May 30

Processes spawn faster than threads?

In general, processes take longer to start than threads. This makes sense if you think about it - a thread lives within the memory space of its parent process, so it takes less work to set one up. (This is a gross oversimplification, but to be honest I find the details of process management incredibly uninteresting in 2008.) I assumed that this difference would hold for the Python processing module. Apparently it doesn't, at least on Mac OS X. Surprise!

Spawning 100 children with Thread took 1.04s
Spawning 100 children with Process took 0.60s

The above result is for starting and joining the children serially. I get the same results in all of these variations:

  • Starting them all at once, then joining them all at once.
  • Using 10 children or 1,000 children.
  • Having each child sleep for one second (to ensure that they're all actually alive at the same time).
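
For anyone who wants to try the first and third variations together, here is a rough sketch rather than the exact script behind the numbers above. It assumes Python 3 and the standard-library multiprocessing module standing in for processing; the names nap and time_batch are made up for this example. Note that the elapsed time includes the one-second sleep itself.

```python
import multiprocessing
import threading
import time


def nap(seconds):
    """Child body: stay alive for a while so all children overlap."""
    time.sleep(seconds)


def time_batch(child_cls, count, seconds):
    """Start `count` children, then join them all; return elapsed seconds.

    `child_cls` is assumed to accept `target=` and `args=` the way
    threading.Thread and multiprocessing.Process do.
    """
    start = time.perf_counter()
    children = [child_cls(target=nap, args=(seconds,)) for _ in range(count)]
    for child in children:
        child.start()
    for child in children:
        child.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for cls in (threading.Thread, multiprocessing.Process):
        elapsed = time_batch(cls, 100, 1.0)
        print("Batch of 100 children with %s took %.2fs" % (cls.__name__, elapsed))
```

Changing the 100 to 10 or 1,000 covers the second variation.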

I don't know whether this is due to goodness in OS X, or processing, or fork(), or just Unix in general. In any case, it's very good news. I'd dismissed processing for use on the client side of BitBacker because "process management is hard and they're too heavyweight." Clearly at least one of those complaints is invalid; maybe the other is as well. It would be a wonderful relief if I could use processes. I'm going to need parallelization of one form or another soon, and I'm definitely not going to start sprinkling threads around. Only madness lies down that path.

Here's the code that generated those results, in case you're interested:

import time, threading, processing
for cls in [threading.Thread, processing.Process]:
    start = time.time()
    for _ in range(100):
        child = cls(target=lambda: None)
        child.start()
        child.join()
    print 'Spawning 100 children with %s took %.2fs' % (
        cls.__name__, time.time() - start)
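
If you'd rather run this on Python 3, the following is a hedged port, not the script that produced the numbers above. It assumes the standard-library multiprocessing module in place of processing. The helper names do_nothing and time_serial are invented for the port. A named function replaces the lambda because the target has to be picklable on platforms where children are spawned rather than forked, and the `__main__` guard is there for the same reason.

```python
import multiprocessing
import threading
import time


def do_nothing():
    """Empty child body, standing in for the original `lambda: None`."""


def time_serial(child_cls, count=100):
    """Start and join `count` children one at a time; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        child = child_cls(target=do_nothing)
        child.start()
        child.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: multiprocessing.Process plays the role of processing.Process.
    for cls in (threading.Thread, multiprocessing.Process):
        print("Spawning 100 children with %s took %.2fs"
              % (cls.__name__, time_serial(cls)))
```

Expect the absolute numbers to differ from the ones in this post, since the interpreter, the module, and the default process start method have all changed since 2008.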

Showing 24 comments

Posted by Ed Page at Fri May 30 12:18:22 2008
In theory, spawning threads and processes on linux is pretty much the same.  The only difference between starting a thread and a process is flipping a bit to say that the page table needs to be copied on write. 

A potential reason for threads to be slower is some potential GIL use around the spawning of threads.

Posted by Christian Wyglendowski at Fri May 30 12:31:04 2008
On my Ubuntu system threads beat processes:

Linux yga-dowski 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux

Spawning 100 children with Thread took 0.02s
Spawning 100 children with Process took 0.13s


Posted by Longabow at Fri May 30 12:45:47 2008
Output on windows:

Spawning 100 children with Thread took 0.04s
Spawning 100 children with Process took 11.50s


Posted by Gary Bernhardt at Fri May 30 12:54:15 2008
The slowness of processes on Windows is unsurprising; processing has to spawn instead of fork there, which is much slower.

I apologize for my blog's screwy comments.  I need to find a plugin that supports markdown or something.

Posted by Marcus Cavanaugh at Fri May 30 13:43:42 2008
On OS X 10.5.3:

Spawning 100 children with Thread took 1.05s
Spawning 100 children with Process took 0.34s


Posted by Jesse Noller at Fri May 30 13:57:58 2008
Yes, the Windows results are unsurprising. Note that the thread-creation times are annoying. I don't know if you know I've been pushing PEP 371 (for including the processing module in the stdlib) - let me know if you do want help putting this into your app, or if you run into problems.

Posted by Joost at Fri May 30 14:21:07 2008
A reason for processes sometimes being faster than threads might be that more time has been spent on optimizing processes than threads on operating systems.

Posted by Gary Bernhardt at Fri May 30 14:47:51 2008
Jesse, PEP 371 was indirectly responsible for this post.  Someone posted to a list I'm on saying "processing might be added to the standard library; should we switch from threads to that?"  I was about to complain about process startup time, but I wanted to verify my claims before I made them.  That's where this benchmark came from.  Fortunately, it probably stopped me from looking foolish. :)

Thanks for the offer of help.  Hopefully I won't need to take you up on it, but it's much appreciated. :)

Posted by Andrew at Fri May 30 15:37:57 2008
Just a hunch, but does Mac OS X inherit this trait from FreeBSD? Anyone care to try this on FreeBSD 7.0?

Posted by Carl at Fri May 30 16:06:48 2008
I've got some values for FreeBSD.

> uname -rs
FreeBSD 7.0-STABLE
> python test.py
Spawning 100 children with Thread took 0.20s
Spawning 100 children with Process took 0.24s

Posted by Gary Bernhardt at Fri May 30 16:50:30 2008
One of our servers shows similar results to OS X:

$ uname -a
Linux hermes 2.6.16-xenU #1 SMP Sat Aug 5 20:27:10 EDT 2006 x86_64 GNU/Linux
$ python temp.py
Spawning 100 children with Thread took 0.99s
Spawning 100 children with Process took 0.38s

These are very different from Christian's results, but he's using a significantly newer kernel and we didn't control for Python version.

Posted by Finn at Fri May 30 17:01:17 2008
On my Phenom x4 Kubuntu system processing beat threading:

Linux u9ppp 2.6.24-16-generic #1 SMP Thu Apr 10 12:47:45 UTC 2008 x86_64 GNU/Linux

Spawning 100 children with Thread took 0.40s
Spawning 100 children with Process took 0.15s

processing uses all the cores - in my case 4 - whereas threading does not. It also explains the difference in time - with two cores the time should be about half, as shown by your quite elegant test program.

Posted by jjgod at Fri May 30 17:21:33 2008
On my Mac OS X 10.5.3 box:

Darwin epicalyx.local 9.3.0 Darwin Kernel Version 9.3.0: Fri May 23 00:49:16 PDT 2008; root:xnu-1228.5.18~1/RELEASE_I386 i386

Spawning 100 children with Thread took 0.02s
Spawning 100 children with Process took 0.38s

Posted by dude at Fri May 30 18:43:28 2008
Uh you should do this in C, not python. Python threads are crippled, performance-wise to one processor (where the interpreter is running because of the global lock). However processes are free to spawn on new processors, so if you got 2 processors, do the math. That's why it's recommended that you use processes and IPC if you are writing a parallel algorithm that will benefit from running on multiprocessors.

Posted by dubwise at Fri May 30 19:14:08 2008
dude: you are correct... python processes are limited to one processor, even with multiple python threads. However, I think calling them crippled is a bit misleading.

It all comes down to what you're using python threads for...

It is a problem if you're using your threads for a cpu intensive task. In this case, you'll have to share a single processor instead of using the full machine.

It is not a problem if you're not cpu bound--for example using python threads for IO where you spend much of your time waiting.

Posted by Faulkner at Fri May 30 21:15:42 2008
OSX 10.5.3

2.4Ghz

Spawning 100 children with Thread took 0.02s
Spawning 100 children with Process took 0.33s

And that is very very consistent over a number of runs.

Posted by me at Sat May 31 01:17:36 2008
(OSX 10.5.3, 1 GHz PowerPC)

uname -a
Darwin reddwarf 9.3.0 Darwin Kernel Version 9.3.0: Fri May 23 00:51:20 PDT 2008; root:xnu-1228.5.18~1/RELEASE_PPC Power Macintosh

./test.py
Spawning 100 children with Thread took 0.06s
Spawning 100 children with Process took 1.46s

Posted by kRYPT at Sat May 31 08:53:42 2008
krypt@ubuntu:~$ python2.5 test.py
Spawning 100 children with Thread took 0.40s
Spawning 100 children with Process took 0.12s

This is a Quad-core machine, so it looks like what "dude" says above is right.. processes are getting their own CPUs, Threads are not.

Note: I had to install python-processing from http://ubuntu.ynet.sk/ubuntu/pool/universe/p/python-processing/ to get the code snippet to run on Ubuntu Hardy.

Posted by Dezro at Sun Jun 1 11:31:14 2008
On OSX 10.5.3, 2GHz Core Duo, Python 2.5.1 (system default)
Darwin Gordo 9.3.0 Darwin Kernel Version 9.3.0: Fri May 23 00:49:16 PDT 2008; root:xnu-1228.5.18~1/RELEASE_I386 i386

Spawning 100 children with Thread took 0.02s
Spawning 100 children with Process took 0.45s

It's kind of bugging me that it's different for you and Marcus Cavanaugh up there than it is for jjgod, Faulkner, "me", and me.

Are you guys using python2.3 or 2.6 py3k or something?

Posted by Chris at Mon Jun 2 18:07:35 2008
This is on a 2.16GHz Core 2 Duo iMac running 10.5.3:

First run:

Spawning 100 children with Thread took 1.02s
Spawning 100 children with Process took 1.20s

Subsequent runs resulted in:

Spawning 100 children with Thread took 1.02s
Spawning 100 children with Process took 1.16s

Python is 2.5.2, installed from macports.

Posted by İsmail Dönmez at Tue Jun 3 02:09:55 2008
Good results on Ubuntu 8.04:

Linux ninbuntu 2.6.24-17-generic #1 SMP Thu May 1 13:57:17 UTC 2008 x86_64 GNU/Linux

Spawning 100 children with Thread took 0.40s
Spawning 100 children with Process took 0.14s

Posted by Gary Bernhardt at Tue Jun 3 12:18:49 2008
Dezro, I agree... the difference in our results is puzzling.  I'm running the framework build of Python 2.5.1, downloaded from python.org.  Maybe Marcus is also running a downloaded build, rather than Apple's preinstalled copy.  It's definitely not an OS version thing - he's running Leopard and I'm running Tiger and we get the same results.

Posted by Marcus Cavanaugh at Mon Jun 30 20:45:50 2008
Delayed response, I know. Anyway, I'm running 2.5, I think standard from Apple:

Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin

