Extra Cheese

Me: Gary Bernhardt

Email: gary.bernhardt at gmail

Work: BitBacker

May 30

Processes spawn faster than threads?

In general, processes take longer to start than threads. This makes sense if you think about it - a thread lives within the memory space of its parent process, so it takes less work to set one up. (This is a gross oversimplification, but to be honest I find the details of process management incredibly uninteresting in 2008.) I assumed that this difference would hold for the Python processing module. Apparently it doesn't, at least on Mac OS X. Surprise!

Spawning 100 children with Thread took 1.04s
Spawning 100 children with Process took 0.60s

The above result is for starting and joining the children serially. Processes still come out ahead of threads in all of these variations:

  • Starting them all at once, then joining them all at once.
  • Using 10 children or 1,000 children.
  • Having each child sleep for one second (to ensure that they're all actually alive at the same time).

I don't know whether this is due to goodness in OS X, or processing, or fork(), or just Unix in general. In any case, it's very good news. I'd dismissed processing for use on the client side of BitBacker because "process management is hard and they're too heavyweight." Clearly at least one of those complaints is invalid; maybe the other is as well. It would be a wonderful relief if I could use processes. I'm going to need parallelization of one form or another soon, and I'm definitely not going to start sprinkling threads around. Only madness lies down that path.

Here's the code that generated those results, in case you're interested:

import time, threading, processing
for cls in [threading.Thread, processing.Process]:
    start = time.time()
    for _ in range(100):
        child = cls(target=lambda: None)
        child.start()
        child.join()
    print 'Spawning 100 children with %s took %.2fs' % (
        cls.__name__, time.time() - start)
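
For anyone rerunning this today, here is a rough Python 3 sketch of the same measurement. It assumes the standard library's multiprocessing module (the processing package was adopted into the standard library under that name in Python 2.6). The names child_task and time_spawn, and the batched and nap parameters, are mine rather than anything from the original script; they cover the start-all-then-join-all and sleeping-children variations listed above. The target is a module-level function instead of a lambda because start methods that pickle the target cannot handle a lambda.

```python
import multiprocessing
import threading
import time


def child_task(nap):
    """Body of each child: optionally sleep so children overlap in time."""
    if nap:
        time.sleep(nap)


def time_spawn(cls, count=100, batched=False, nap=0.0):
    """Start `count` children of type `cls` and return the elapsed seconds.

    With batched=False each child is joined right after it starts (the
    serial case). With batched=True all children are started first and
    joined afterwards.
    """
    start = time.perf_counter()
    pending = []
    for _ in range(count):
        child = cls(target=child_task, args=(nap,))
        child.start()
        if batched:
            pending.append(child)
        else:
            child.join()
    for child in pending:
        child.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for cls in (threading.Thread, multiprocessing.Process):
        elapsed = time_spawn(cls)
        print("Spawning 100 children with %s took %.2fs"
              % (cls.__name__, elapsed))
```

I wouldn't expect the numbers to match the ones above; they will depend on the operating system, the Python version, and which start method multiprocessing uses.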

May 28

Add me on twitter!

I've been twittering for a while, so I guess it's safe to link to my Twitter from here. And since I'm doing that, I may as well do a few others:

These are the only social sites I use regularly. You should friend me on them!


Apr 11

Shell Meme Wins

My most common shell commands:

171 hg
144 fg
77 rm
71 ls
38 cd
28 vi
24 nosetests
17 killall
15 tissue
15 python
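
The post doesn't say how those counts were produced. One way to get a similar tally, sketched here in Python under the assumption that each history line starts with the command name, is to count the first word of every line. The function name top_commands and its limit parameter are made up for this example.

```python
from collections import Counter


def top_commands(history_lines, limit=10):
    """Return the `limit` most frequent first words as (command, count) pairs."""
    counts = Counter()
    for line in history_lines:
        words = line.split()
        if words:  # skip blank lines
            counts[words[0]] += 1
    return counts.most_common(limit)
```

Feeding it the lines of a shell history file (wherever your shell keeps one) should give a list shaped like the one above.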

I tend to keep Vim open for a long time, running many commands from within it. That's why I don't have lots of task switching like Mike. I learned Emacs before Vim, so blame it on that. I also usually run tests from within Vim; otherwise nosetests would definitely be #1.

Tissue is a ticketing system I've been working on. It's super simple - the whole ticket database is stored in a single plaintext file. The idea is to fit in better with DVCSes like Mercurial. Having a single monolithic Trac instance breaks down when you have dozens of repositories, each of which may have certain tickets fixed or not. By storing the ticket database in a plaintext file within the repository, you get (1) explicit ties between code fixes and ticket changes, and (2) free merging of modified tickets when the corresponding code merge happens. I'm about to switch BitBacker from Trac to this, so it will hopefully get released some time.


Dec 28

Human-Readable Encryption Keys

For BitBacker, we use 128-bit AES encryption, which means our keys are really long and annoying - 32 characters long when printed in hex. And not only do the users sometimes have to type them in, but they have to write them down on paper. (We can't store the key on our servers because then we'd be able to read the user's files; and we obviously can't trust it to their hard drive because that's what we're backing up.)

Somehow, we have to present these random 128-bit keys to the user, and I think I've found a pretty good way. We use RFC 1751, which defines a "Convention for Human-Readable 128-bit Keys" - basically just a mapping of blocks of bits to strings of English words. Here's an example in Python using the RFC 1751 module in PyCrypto:

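>>> import os
>>> from Crypto.Util import RFC1751
>>> from binascii import hexlify as bin_to_hex # Assumed stand-in for my hex helper
>>> key = os.urandom(16) # Generate 16 random bytes (128 bits)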
>>> bin_to_hex(key) # Show the key in hex (32 characters)
'61aa60e43a5e7fdb4b86a4897b52a0dc'
>>> y = RFC1751.key_to_english(key)
>>> y # Show the pass phrase version of the key
'BUSY BARN RUB DOLE TAUT TOOK ALTO PRY KIT WALL MUG CURT'
>>> # The transformation is always reversible
>>> bin_to_hex(RFC1751.english_to_key(y))
'61aa60e43a5e7fdb4b86a4897b52a0dc'
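
To make the "blocks of bits to words" idea concrete, here is a rough sketch of the arithmetic as I understand the RFC: each 64-bit half of the key gets 2 parity bits appended, and the resulting 66 bits are read as six 11-bit indices into a 2048-word dictionary, giving 12 words for a 128-bit key. This is not PyCrypto's code. The WORDS list is a placeholder (the real dictionary holds short English words), and every function name here is hypothetical.

```python
import os

# Placeholder for the RFC's 2048-entry dictionary. The real list contains
# short English words, not generated tokens like these.
WORDS = ["W%04d" % i for i in range(2048)]


def block_to_indices(block):
    """Turn one 8-byte block into six 11-bit dictionary indices."""
    assert len(block) == 8
    value = int.from_bytes(block, "big")
    # Parity: add up the 32 two-bit pairs and keep the low two bits.
    parity = sum((value >> shift) & 0b11 for shift in range(0, 64, 2)) & 0b11
    padded = (value << 2) | parity  # 64 data bits + 2 parity bits = 66 bits
    return [(padded >> shift) & 0x7FF for shift in range(55, -1, -11)]


def indices_to_block(indices):
    """Inverse of block_to_indices; drops the parity bits without checking them."""
    padded = 0
    for index in indices:
        padded = (padded << 11) | index
    return (padded >> 2).to_bytes(8, "big")


def key_to_words(key):
    """Encode a key whose length is a multiple of 8 bytes as a word string."""
    assert len(key) % 8 == 0
    words = []
    for offset in range(0, len(key), 8):
        indices = block_to_indices(key[offset:offset + 8])
        words.extend(WORDS[i] for i in indices)
    return " ".join(words)


def words_to_key(phrase):
    """Decode a word string produced by key_to_words back into bytes."""
    indices = [WORDS.index(word) for word in phrase.split()]
    blocks = [indices_to_block(indices[i:i + 6])
              for i in range(0, len(indices), 6)]
    return b"".join(blocks)


key = os.urandom(16)
phrase = key_to_words(key)
```

A real decoder would also verify the parity bits, which is what lets it catch some mistyped words; this sketch skips that check.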

The keys are still *very* long, of course, and this is unavoidable for our application. But when translated to words, I think it's easier to write them down or type them in without making a mistake. The image below shows BitBacker giving me a pass phrase. (This feature hasn't even gone into beta yet - it's little more than a mockup. So please don't judge it too harshly!)

[Image: Screen shot of BitBacker's pass phrase handling]

When the user clicks "Continue" here, BitBacker actually makes him re-enter the generated pass phrase he wrote down. To be honest, BitBacker's pass phrase handling is quite annoying. But that's a heck of a lot better than losing your pass phrase, which would make your backups inaccessible! This is the one place in all of BitBacker that isn't optimized for "least user annoyance". Encryption keys are just way too important to mess around with, and I think that most existing software is far too lax with them (including BitBacker's competitors).

(This was derived from a comment I left on Jeff Atwood's "Software Registration Keys" post.)


Dec 22

My blog woes have been soothed

It seems I've mostly solved my "blog woes". I got some quite helpful replies (still visible on Blogger, although the comments didn't come over to my new blog). I also got emails from Will Guaraldi about PyBlosxom, and from Lloyd Dalton about blog_my.

I took at least a brief look at each system mentioned in the comments and emails, but I decided on PyBlosxom. If you're reading this in a web browser, what you're seeing is PyBlosxom rendering a theme I ported from Tumblr, with all of my old Blogger blog's content imported. Quite frankensteinian indeed, as far as blogs go.

It turns out that my impression of PyBlosxom's size when I wrote my "blog woes" post was a bit off - I didn't realize just how little functionality resides in the core. It's pretty slim, but with a decent selection of plugins. I only needed tags, wbgarchives, and metadate, but there are plenty more for those who want more features. With the tag and metadate plugins, I managed to keep my blog posts in almost exactly the format I've always used, so that was nice.

PyBlosxom nicely solves my biggest concern, which I didn't explicitly state in my original post: I want to keep all of the files related to my blog in a Mercurial repository. I've succeeded in that - my entire blog is in Mercurial now. That includes configuration files, the .htaccess file, the template, the entries, and even the queue of unfinished entries. If I ever need to, I should be able to move the blog to another host in a matter of minutes. Not that I ever intend to leave WebFaction (note: that's an affiliate link), which is where it's happily hosted now.

With that all out of the way, hopefully I can quit the detestable practice of metablogging, which I'd managed to avoid for my entire first year. Thanks to everyone who made a suggestion, and special thanks to the PyBlosxom developers.