May 30
Processes spawn faster than threads?
In general, processes take longer to start than threads. This makes sense
if you think about it - a thread lives within the memory space of its parent
process, so it takes less work to set one up. (This is a gross
oversimplification, but to be honest I find the details of process management
incredibly uninteresting in 2008.) I assumed that this difference would hold
for the Python processing module. Apparently it doesn't, at least on Mac OS
X. Surprise!
Spawning 100 children with Thread took 1.04s
Spawning 100 children with Process took 0.60s
The above result is for starting and joining the children serially. Process
still comes out ahead of Thread in all of these variations:
- Starting them all at once, then joining them all at once.
- Using 10 children or 1,000 children.
- Having each child sleep for one second (to ensure that they're all
actually alive at the same time).
I don't know whether this is due to goodness in OS X, or processing, or
fork(), or just Unix in general. In any case, it's very good news. I'd
dismissed processing for use on the client side of BitBacker because "process management is
hard and they're too heavyweight." Clearly at least one of those complaints
is invalid; maybe the other is as well. It would be a wonderful relief if I
could use processes. I'm going to need parallelization of one form or another
soon, and I'm definitely not going to start sprinkling threads around. Only
madness lies down that path.
Here's the code that generated those results, in case you're
interested:
import time, threading, processing

for cls in [threading.Thread, processing.Process]:
    start = time.time()
    # Start and join each child serially, so only one is alive at a time.
    for _ in range(100):
        child = cls(target=lambda: None)
        child.start()
        child.join()
    print 'Spawning 100 children with %s took %.2fs' % (
        cls.__name__, time.time() - start)
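For anyone rerunning this today: processing was later folded into the standard library as multiprocessing. Below is a rough Python 3 sketch of the same measurement that also covers the "start them all, then join them all" variation. The names time_spawn, noop, and fork_ctx are invented for this sketch, and it assumes a Unix-like system where the "fork" start method is available. It makes no claim about which side wins on your machine.

```python
import multiprocessing
import threading
import time

# Assumption: a Unix-like system, where the "fork" start method exists.
fork_ctx = multiprocessing.get_context("fork")


def noop():
    """Child body that does nothing, so only spawn/join cost is measured."""


def time_spawn(cls, count, serial=True, target=noop):
    """Return the seconds taken to start and join `count` children."""
    start = time.perf_counter()
    if serial:
        # One child alive at a time, as in the original loop.
        for _ in range(count):
            child = cls(target=target)
            child.start()
            child.join()
    else:
        # Start every child first, then join them all.
        children = [cls(target=target) for _ in range(count)]
        for child in children:
            child.start()
        for child in children:
            child.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for cls in (threading.Thread, fork_ctx.Process):
        for serial in (True, False):
            elapsed = time_spawn(cls, 100, serial=serial)
            mode = "serially" if serial else "all at once"
            print("Spawning 100 children %s with %s took %.2fs"
                  % (mode, cls.__name__, elapsed))
```

For the sleeping variation, pass target=lambda: time.sleep(1) with serial=False. A lambda works here only because "fork" does not need to pickle the target.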
May 28
Add me on Twitter!
I've been twittering for a while, so I guess it's safe to link to my
Twitter from here. And since I'm doing that, I may as well do a few
others:
These are the only social sites I use regularly. You should friend me on
them!
Apr 11
Shell Meme Wins
My most common shell commands:
171 hg
144 fg
77 rm
71 ls
38 cd
28 vi
24 nosetests
17 killall
15 tissue
15 python
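The post doesn't show the command that produced this list. One way to get a similar tally is to count the first word of every history line. This is a minimal sketch, not the author's method: top_commands is a made-up name, and the ~/.bash_history path is an assumption that only fits bash with its default settings.

```python
import os
from collections import Counter


def top_commands(history_lines, n=10):
    """Count the first word of each non-empty line; return the n most common."""
    counts = Counter()
    for line in history_lines:
        words = line.split()
        if words:
            counts[words[0]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Assumption: bash history in its default location.
    path = os.path.expanduser("~/.bash_history")
    if os.path.exists(path):
        with open(path, errors="replace") as history:
            for command, count in top_commands(history):
                print("%5d %s" % (count, command))
```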
I tend to keep Vim open for a long
time, running many commands from within it. That's why I don't have lots of
task switching like Mike. I learned Emacs
before Vim, so blame it on that. I also usually run tests from within Vim;
otherwise nosetests would definitely be #1.
Tissue is a ticketing system I've been working on. It's super simple - the
whole ticket database is stored in a single plaintext file. The idea is to
fit in with DVCSes like Mercurial better. Having a single monolithic Trac
instance breaks down when you have dozens of repositories, each of which may
have certain tickets fixed or not. By storing the ticket database in a
plaintext file within the repository, you get (1) explicit ties between code
fixes and ticket changes, and (2) free merging of modified tickets when the
corresponding code merge happens. I'm about to switch BitBacker from Trac to
Tissue, so it will hopefully get released at some point.
Dec 28
Human-Readable Encryption Keys
For BitBacker, we use 128-bit AES
encryption, which means our keys are really long and annoying - 32 characters
long when printed in hex. And not only do the users sometimes have to type
them in, but they have to write them down on paper. (We can't store the key on
our servers because then we'd be able to read the user's files; and we
obviously can't trust it to their hard drive because that's what we're backing
up.)
Somehow, we have to present these random 128-bit keys to the user, and I
think I've found a pretty good way. We use RFC 1751, which defines a
"Convention for Human-Readable 128-bit Keys" - basically just a mapping of
blocks of bits to strings of English words. Here's an example in Python using
the RFC 1751 module in PyCrypto:
>>> import os
>>> from Crypto.Util import RFC1751
>>> # bin_to_hex isn't defined in this post; hexlify is an assumed stand-in
>>> from binascii import hexlify as bin_to_hex
>>> key = os.urandom(16) # Generate 16 random bytes (128 bits)
>>> bin_to_hex(key) # Show the key in hex (32 characters)
'61aa60e43a5e7fdb4b86a4897b52a0dc'
>>> y = RFC1751.key_to_english(key)
>>> y # Show the pass phrase version of the key
'BUSY BARN RUB DOLE TAUT TOOK ALTO PRY KIT WALL MUG CURT'
>>> # The transformation is always reversible
>>> bin_to_hex(RFC1751.english_to_key(y))
'61aa60e43a5e7fdb4b86a4897b52a0dc'
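A 128-bit key becomes twelve words because of a fixed bit layout. As I read RFC 1751, each 64-bit half of the key gets two parity bits appended, and the resulting 66 bits are cut into six 11-bit indices into a 2048-word dictionary. The Python 3 sketch below computes only those indices and checks the parity on the way back. The function names are made up for illustration and the word list is left out, so treat it as a picture of the arithmetic and not as a replacement for PyCrypto's RFC1751 module.

```python
def parity(block):
    """Sum the 32 two-bit pairs of a 64-bit block and keep the low 2 bits."""
    total = 0
    for shift in range(0, 64, 2):
        total += (block >> shift) & 0b11
    return total & 0b11


def key_to_indices(key):
    """Turn a key (a multiple of 8 bytes) into 11-bit dictionary indices."""
    if len(key) % 8:
        raise ValueError("key length must be a multiple of 8 bytes")
    indices = []
    for offset in range(0, len(key), 8):
        block = int.from_bytes(key[offset:offset + 8], "big")
        padded = (block << 2) | parity(block)  # 64 data bits + 2 parity bits
        for shift in range(55, -1, -11):       # six 11-bit fields
            indices.append((padded >> shift) & 0x7FF)
    return indices


def indices_to_key(indices):
    """Reverse key_to_indices, rejecting any block whose parity is wrong."""
    if len(indices) % 6:
        raise ValueError("need six indices per 64-bit block")
    key = bytearray()
    for offset in range(0, len(indices), 6):
        padded = 0
        for index in indices[offset:offset + 6]:
            padded = (padded << 11) | index
        block, check = padded >> 2, padded & 0b11
        if check != parity(block):
            raise ValueError("parity check failed")
        key += block.to_bytes(8, "big")
    return bytes(key)


if __name__ == "__main__":
    key = bytes.fromhex("61aa60e43a5e7fdb4b86a4897b52a0dc")
    indices = key_to_indices(key)
    print(len(indices), "indices, each below 2048")
    print(indices_to_key(indices) == key)
```

The parity bits are what let a mistyped word be caught: a single flipped bit within a 64-bit block changes the pair sum, so the check fails and no wrong key comes back.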
The keys are still *very* long, of course, and this is unavoidable for our
application. But when translated to words, I think it's easier to write them
down or type them in without making a mistake. The image below shows
BitBacker giving me a pass phrase. (This feature hasn't even gone into beta
yet - it's little more than a mockup. So please don't judge it too
harshly!)
When the user clicks "Continue" here, BitBacker actually makes him re-enter
the generated pass phrase he wrote down. To be honest, BitBacker's pass
phrase handling is quite annoying. But that's a heck of a lot better than
losing your pass phrase, which would make your backups inaccessible! This is
the one place in all of BitBacker that isn't optimized for "least user
annoyance". Encryption keys are just way too important to mess around
with, and I think that most existing software is far too lax with them
(including BitBacker's competitors).
(This was derived from a comment I left on Jeff Atwood's "Software
Registration Keys" post.)
Dec 22
My blog woes have been soothed
It seems I've mostly solved my "blog woes".
I got some quite helpful replies (still visible on Blogger,
although the comments didn't come over to my new blog). I also got emails
from Will Guaraldi about PyBlosxom, and from Lloyd Dalton about blog_my.
I took at least a brief look at each system mentioned in the comments and
emails, but I decided on PyBlosxom. If you're reading this in a web browser,
what you're seeing is PyBlosxom rendering a theme I ported from Tumblr, with
all of my old Blogger blog's content imported. Quite frankensteinian indeed,
as far as blogs go.
It turns out that my impression of PyBlosxom's size when I wrote my "blog
woes" post was a bit off - I didn't realize just how little functionality
resides in the core. It's pretty slim, but with a decent selection of
plugins. I only needed tags, wbgarchives, and metadate,
but there are plenty more for those who want more features. With the tag and
metadate plugins, I managed to keep my blog posts in almost exactly the format
I've always used, so that was nice.
PyBlosxom nicely solves my biggest concern, which I didn't explicitly state
in my original post: I want to keep all of the files related to my blog in a
Mercurial repository. I've succeeded in that - my entire blog is in Mercurial
now. That includes
configuration files, the .htaccess file, the template, the entries, and even
the queue of unfinished entries. If I ever need to, I should be able to move
the blog to another host in a matter of minutes. Not that I ever intend to
leave WebFaction (note: that's an affiliate link), which is where it's happily
hosted now.
With that all out of the way, hopefully I can quit the detestable practice
of metablogging, which I'd managed to avoid for my entire first year. Thanks
to everyone who made a suggestion, and special thanks to the PyBlosxom
developers.