Software Craftsmanship: Geographical Distribution

Jun 23, 2009

I often compute statistics from publicly available data, tell myself I'll blog about them, and then never do. For once, I'm actually doing it!

The graph below shows how many Software Craftsmanship Manifesto signatories each US state has relative to its total population. Only states with at least five signatories are included.

I'm about to move from Cleveland to Seattle – #4 to #1. Not bad!

Have a look at the source code if you'd like. (It was hacked together; please judge it gently.)
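The linked source isn't reproduced here, but the calculation behind a graph like this is small. Here's a minimal sketch, assuming two dictionaries keyed by state; the function name, the state names, and all of the numbers are placeholders I made up for illustration, not the real signatory or census data:

```python
def signatories_per_million(signatories, population, min_signatories=5):
    """Return (state, rate) pairs sorted by signatories per million residents."""
    rates = {
        state: count * 1_000_000 / population[state]
        for state, count in signatories.items()
        # Drop states with too few signatories to say anything meaningful.
        if count >= min_signatories
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Placeholder inputs: hypothetical states and counts, NOT real data.
signatories = {"A": 10, "B": 4, "C": 30}
population = {"A": 2_000_000, "B": 1_000_000, "C": 10_000_000}

ranking = signatories_per_million(signatories, population)
for state, rate in ranking:
    print("%s: %.1f signatories per million" % (state, rate))
```

State B drops out because it falls under the five-signatory cutoff, which is the same filter the graph uses.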

UPDATE: In the comments, Joel Helbling suggested a Google Charts map. Here's one with states colored from red to green for lowest to highest signatories per million residents (source):



Dingus Screencast: A mock/stub library with automatic isolation

Apr 01, 2009

Dingus is a mocking/stubbing library I've been working on for about a year. It grew out of a now-defunct project's test suite, and I've used it in about 3,500 lines of unit test code. It does two things that are pretty novel:

  1. A dingus allows you to do almost anything to it, including nesting accesses arbitrarily deep. If you have a dingus d, you can say 99 * (d.foo.bar.baz() ** 'hello')[15] and you'll just get another dingus out. This lets you use dinguses to replace dependencies in legacy code without thinking about what interface they must conform to.
  2. If you want it to, Dingus can automatically replace your dependencies with dinguses. You just tell it what class you're testing, and it will replace everything else in that class's module with a dingus. This fully isolates your code under test without requiring any work from you.
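To make the first point concrete, here's a toy object that behaves the same way for the expression above. This is not Dingus's implementation or its API; the class name Anything is hypothetical, and the sketch only covers the handful of operators that the example expression needs:

```python
class Anything:
    """Toy stand-in: every supported operation returns another Anything."""

    def __getattr__(self, name):
        # Only called for attributes that don't exist yet. Dunder lookups are
        # refused so that tools like copy and pickle behave normally.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        child = Anything()
        # Cache the child so that d.foo is the same object on every access.
        setattr(self, name, child)
        return child

    def __call__(self, *args, **kwargs):
        return Anything()

    def __getitem__(self, key):
        return Anything()

    def _binary(self, other):
        return Anything()

    # Forward and reflected versions, so both d * 99 and 99 * d work.
    __mul__ = __rmul__ = __pow__ = __rpow__ = _binary


d = Anything()
result = 99 * (d.foo.bar.baz() ** 'hello')[15]
print(type(result).__name__)
```

Because nothing here checks an interface, an object like this can stand in for any dependency. The real library also records what was done to each dingus, which this sketch leaves out.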

The second point above is probably a bit hard to think about from such a short description, so I've created a screencast to show it off. I TDD up a bit of code in the screencast, but it is not intended to be a good example of TDD or test design. I use a lot of stubbing and interaction assertions because I'm trying to show off Dingus' features. In real-world code, you should avoid assertions about object interaction as much as you can.

If you'd like to watch the screencast, it's available on Vimeo, but the version embedded on the page is low quality. If you log in to Vimeo, there's a download link in the bottom right of the page. Otherwise, you can view the full-size version from my blog's server. (But please be gentle on its bandwidth!)



Twitter Transposition: Version Control

Feb 25, 2009

This blog has been quiet lately, due at least partly to Twitter. In an effort to get it going again, I'd like to start posting some summaries of my Twitter output. This will not be a daily Twitter bridge or anything like that. These are hand-selected from my entire tweet history. This first batch is on version control.

All links have been inserted after the fact, but everything else remains unchanged and in roughly chronological order. Indented entries are continuations of the thoughts in their parents.

For more of my ranting, you can follow me on Twitter.

  • To everyone who lost power due to the storm: you are now advocates of distributed version control and distributed ticketing. ;)
  • Recommending git as an intro to DSCM is like recommending C++ as an intro to OO. Just thought I'd throw that out there...
    • Good OO *can* be done in C++, but it takes a lot of learning and is prone to error. The same goes for DSCM and Git. ;)
  • The way most people use version control is downright offensive. Trailing whitespace changes in diffs? Seriously? Grow some discipline.
  • Workflow using patch queues: Spike a feature (patch 1), then replace the spiked classes with TDDed ones (patches 2..n), then fold patches.
    • That workflow gives you: easy spike-to-TDD transition; everything nicely versioned; a single changeset at the end; no history rewriting.
  • Git was designed by insane space aliens. Whether this is a good or bad thing is a personal preference.
  • One day I will write an editor-VCS that stores all files as the list of vim commands originally used to create them. <0.95 ;)>
  • The funny thing about rebase, patch queues, and multiple heads: Once you truly understand one of them, you understand all of them.
  • Using a DVCS has made me worry a lot about repo size, which I shouldn't have to worry about. I never would've expected this problem.
  • Wish list: Fancy VCS: When I refactor a test, automatically check that the new one would've failed at the point where I originally TDDed it.
  • "This is more complex than OpenGL!" - @jleedev, about five words into my explanation of Mercurial patch queues.


Processes spawn faster than threads?

May 30, 2008

In general, processes take longer to start than threads. This makes sense if you think about it - a thread lives within the memory space of its parent process, so it takes less work to set one up. (This is a gross oversimplification, but to be honest I find the details of process management incredibly uninteresting in 2008.) I assumed that this difference would hold for the Python processing module. Apparently it doesn't, at least on Mac OS X. Surprise!

Spawning 100 children with Thread took 1.04s
Spawning 100 children with Process took 0.60s

The above result is for starting and joining the children serially. Processes still come out ahead of threads in all of these variations:

  • Starting them all at once, then joining them all at once.
  • Using 10 children or 1,000 children.
  • Having each child sleep for one second (to ensure that they're all actually alive at the same time).

I don't know whether this is due to goodness in OS X, or processing, or fork(), or just Unix in general. In any case, it's very good news. I'd dismissed processing for use on the client side of BitBacker because "process management is hard and they're too heavyweight." Clearly at least one of those complaints is invalid; maybe the other is as well. It would be a wonderful relief if I could use processes. I'm going to need parallelization of one form or another soon, and I'm definitely not going to start sprinkling threads around. Only madness lies down that path.

Here's the code that generated those results, in case you're interested:

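import time, threading, processing

# Time 100 serial start/join cycles, first with threads, then with processes.
for cls in [threading.Thread, processing.Process]:
    start = time.time()
    for _ in range(100):
        # Each child does nothing, so only spawn/join overhead is measured.
        child = cls(target=lambda: None)
        child.start()
        child.join()
    print 'Spawning 100 children with %s took %.2fs' % (
        cls.__name__, time.time() - start)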
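The "start them all at once, then join them all at once" variation could be sketched like this. It's a rough equivalent rather than the code I ran: it assumes Python 3 and the standard-library multiprocessing module (the name under which processing was added to the standard library in Python 2.6), and the helper names noop and time_spawn are made up for the example. Expect the timings to vary by platform and by how processes get started there.

```python
import multiprocessing
import threading
import time


def noop():
    """Do nothing; the children exist only to be started and joined."""


def time_spawn(cls, count=100):
    """Start `count` children at once, join them all, return elapsed seconds."""
    start = time.perf_counter()
    # A module-level target (rather than a lambda) stays picklable on
    # platforms where child processes are spawned instead of forked.
    children = [cls(target=noop) for _ in range(count)]
    for child in children:
        child.start()
    for child in children:
        child.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for cls in (threading.Thread, multiprocessing.Process):
        print("Spawning 100 children with %s took %.2fs"
              % (cls.__name__, time_spawn(cls)))
```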


Shell Meme Wins

Apr 11, 2008

My most common shell commands:

171 hg
144 fg
77 rm
71 ls
38 cd
28 vi
24 nosetests
17 killall
15 tissue
15 python

I tend to keep Vim open for a long time, running many commands from within it. That's why I don't have lots of task switching like Mike. I learned Emacs before Vim, so blame it on that. I also usually run tests from within Vim; otherwise nosetests would definitely be #1.

Tissue is a ticketing system I've been working on. It's super simple: the whole ticket database is stored in a single plaintext file. The idea is to fit in better with DVCSes like Mercurial. Having a single monolithic Trac instance breaks down when you have dozens of repositories, each of which may have certain tickets fixed or not. By storing the ticket database in a plaintext file within the repository, you get (1) explicit ties between code fixes and ticket changes, and (2) free merging of modified tickets when the corresponding code merge happens. I'm about to switch BitBacker from Trac to this, so it will hopefully get released sometime.