Extra Cheese

A Blog


String Calculator Kata in Python

Jan 06, 2010

For those who aren't familiar, katas are those impressive sequences of movements that you've surely seen martial arts guys perform. Code katas are the same idea applied to writing code: you solve a problem many times, mastering the movements, and then perform it for others.

My friend Corey Haines has worked with some other people to run Katacasts, a site dedicated to code katas, posting them roughly once per week. Recently, there's been a string of solutions to the same problem, with my Python version as this week's entry.

Briefly, here's the problem I solve. I have numbers coming in as a string, separated by commas or newlines. My job is to add those numbers and return the sum. There are two complications:

  1. If the first line of the string has the form "//*", then * is also accepted as a delimiter. This works for any delimiter string, not just *.
  2. Negative numbers must be rejected.
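For readers who haven't seen the kata, here's a minimal sketch of one possible solution. It is not the code from the screencast. The function name `add`, the header format (`//`, then the delimiter, then a newline), and the rule that an empty string sums to 0 are assumptions based on the usual statement of the kata.

```python
import re


def add(numbers: str) -> int:
    """Sum numbers separated by commas, newlines, or a custom delimiter."""
    if not numbers:
        return 0  # assumption: an empty string sums to zero

    delimiters = [",", "\n"]
    # Assumed header format: "//<delimiter>\n<numbers>"
    if numbers.startswith("//"):
        custom, numbers = numbers[2:].split("\n", 1)
        # Try the custom delimiter first so a multi-character one
        # isn't broken apart by "," or "\n".
        delimiters.insert(0, custom)

    # re.escape makes delimiters such as "*" match literally.
    pattern = "|".join(re.escape(d) for d in delimiters)
    values = [int(token) for token in re.split(pattern, numbers)]

    negatives = [v for v in values if v < 0]
    if negatives:
        raise ValueError(f"negatives not allowed: {negatives}")
    return sum(values)


print(add("1,2\n3"))      # 6
print(add("//*\n1*2,3"))  # 6
```

Collecting every negative before raising, rather than stopping at the first one, lets the error message list all of them at once.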

You can see the kata and read my brief commentary on the Katacasts site, or go straight to Vimeo to watch it. It's only 4:32 long, so it's not a big commitment.

String Calculator Kata In Python from Gary Bernhardt on Vimeo.

My screencasts always prompt questions about my Vim configuration, so take a look at my dotfiles repo if you're interested. For this kata, I'm slowing down intentionally – typing slower and inserting small, regular pauses so the viewer has time to look around a bit. I may post a "hard mode" version at full speed if people show interest.

Comments are encouraged, of course – the purpose of a kata is improvement!

If you like this, by the way, you may also enjoy the refactoring screencast I posted recently.



The Limits of TDD

Nov 10, 2009

My last post about TDD generated some great responses, some of which were skeptical. A few common complaints about TDD came up, posed with civility, so I'd like to address them.

Complaint: You weren't stupid enough

When TDDing Fibonacci, we could get to a point where we have this function (and I did write exactly this code in my last post):

def fib(n):
    if n == 0:
        return 0
    else:
        return 1

But why should we write that? Why not this instead?

def fib(n):
    return [0, 1, 1][n]

This comes down to how we define "simple". In TDD, we make tests pass by making the simplest possible change. So, which of the above two is simpler?

Defining that word is our job; TDD as a process says nothing about it. The definition is a huge variable and, in my experience, it's the primary axis along which our skill as TDDers grows once we reach minimal competence. Note that we still have to define "simple" even if we're not doing TDD, but we won't have the test-driving pressure forcing the definition to be refined.

Regardless of how "simple" is defined, we must eventually accept that an arbitrarily long list is not the simplest thing. At that point, we refactor. Depending on the definition of simple, it may take seven tests to get to the final refactor instead of five. So what? Two TDDers need not generate the same tests, and this isn't a problem at all.
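To make that concrete, here's a hedged sketch: the same table of expected values (the six cases from the tests in my last post) checks both the growing lookup-list version and a general version it might be refactored into. The names `fib_table`, `fib_general`, and `check` are illustrative only.

```python
# Expected values, one per test: fib(n) for n = 0..5
EXPECTED = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5}


def fib_table(n):
    # The "list" version: one more entry for every new test.
    return [0, 1, 1, 2, 3, 5][n]


def fib_general(n):
    # The refactored version: no longer grows with each test.
    if n <= 1:
        return n
    return fib_general(n - 1) + fib_general(n - 2)


def check(fib):
    for n, expected in EXPECTED.items():
        assert fib(n) == expected, (fib.__name__, n)


check(fib_table)
check(fib_general)
```

Both pass every existing test, which is why the tests alone can't say which is "simpler"; the difference only shows when the next test arrives and `fib_table(6)` raises `IndexError`.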

Complaint: TDDed tests are prescriptive

This is a complaint that TDDed code does exactly what the tests say it should do, so there might be bugs. If I write the wrong test, the reasoning goes, then it will drive me to write the wrong code.

When would we write the wrong test? Only when we misunderstand the problem. If we misunderstand the problem, and we go straight to the code, then we'll be encoding our incorrect understanding directly in the code. That's bad. By writing the tests first, we have some extra protection against misunderstanding: every assumption about what the system should do is encoded as a test, and each test has a good name.

Often, this will point out our confusion during the TDD process – we'll find that we want to write a test whose name contradicts another test's name. Even if we translate our misunderstanding into a bug, however, good test names make it easy to revisit our assumptions later. A subtle, five-character change to the code may have been driven by a sixty-character test name, which will be easier to understand.

Complaint: Choosing tests is hard

When TDDing Fibonacci, I tested fib(0) first. Why did I test fib(1) next instead of fib(37) or fib(51)?

Because it was obvious! The problem domain of a unit test is necessarily small, so it's usually clear what the next step is. If the next step isn't clear, it probably means that the unit under test is too large (making it hard to think about extending it for another case), or that we don't understand the problem well enough (making it hard to think about what the code should do at all). In either case, TDD has just helped us: it's either pointed out a bad design, which we should fix, or it's pointed out a gap in our knowledge about the problem, in which case we should put the keyboard away and fill that gap.

Complaint: The code you TDDed was bad

The particular code I came up with in my last blog post was a slow, recursive Fibonacci solution. Two people mentioned this in the comments.

TDD doesn't solve problems like "my run time is superlinear" or "my database loads aren't eager enough." It's not supposed to solve those problems! TDD frees us to solve those hard problems well by (1) pushing us toward a good, decoupled design and (2) providing us with large, fast test suites.
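As a sketch of point (2): once the tests from my last post exist, swapping the exponential-time recursive version for a linear-time iterative one is a safe change, because the same tests confirm that the behavior didn't move. The loop below is a standard iterative Fibonacci, not code from that post.

```python
def fib(n):
    # Walk the sequence forward, keeping only the last two values: O(n) time.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# The same six assertions that drove the recursive version still pass.
assert fib(0) == 0
assert fib(1) == 1
assert fib(2) == 1
assert fib(3) == 2
assert fib(4) == 3
assert fib(5) == 5
```

The tests didn't find the faster algorithm; they just make it cheap to check that the faster algorithm is still correct.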

Complaint: TDD requires too much typing

This one has the easiest answer of all: typing is not the bottleneck. Just think about it for a minute. Go back and look at how many lines of code you actually generated yesterday. How long would it take you to type it all in one long burst? A few minutes? Seriously, typing is not the bottleneck.

TDD is not magic

Let's recap:

  • Complaint: You weren't stupid enough
  • Response: There's more than one legitimate definition of "stupid".
  • Complaint: TDDed tests are prescriptive
  • Response: This is a feature. Stating our assumptions up front exposes misunderstandings.
  • Complaint: Choosing tests is hard
  • Response: This is also a feature. It tells us that our design is bad or that we don't understand the problem.
  • Complaint: The code you TDDed was bad!
  • Response: TDD does not free us from thinking. TDD is not magic.
  • Complaint: It's too much typing.
  • Response: Typing is not the bottleneck.

Many complaints about TDD are complaints that it doesn't solve some problem. These are not problems with TDD – it's not supposed to solve every problem!

Dynamic languages don't make coffee, continuous integration doesn't shine shoes, and TDD doesn't make code scale. It's simply the basis of a solid, disciplined process for building software – a beginning, not an end.



How I Started TDD

Nov 05, 2009

This story is about the first code I ever wrote with proper TDD. I'd been doing test-first for several months, but I didn't understand the design aspect. Fortunately, Corey Haines wanted to learn Python, and I wanted to learn TDD, so we paired up at a Coding Dojo. It went something like this.[1]

Corey: Let's write a test.
def test_fib_of_0_is_0():
    assert fib(0) == 0
1 test failed; 0 tests passed.
Corey: Now let's make it pass.
Me: Well, we could iterate...
Corey: Why?
Me: Because it's fibonacci...
Corey: The test says it returns zero!
Me: Oh. Well, OK.
def fib(n):
    return 0
1 test passed.
Corey: Let's write another test.
def test_fib_of_1_is_1():
    assert fib(1) == 1
1 test failed; 1 test passed.
Corey: Now let's make it pass.
Me: OK, we need to recursively...

I stop myself. I know what this got me last time.

Me: We can check for which input we got.
Corey: We don't even need that.
def fib(n):
    return n
2 tests passed.
Corey: Let's write another test.
def test_fib_of_2_is_1():
    assert fib(2) == 1
1 test failed; 2 tests passed.
Corey: Now let's make it pass.

I pause while I find the correct answer.

Me: Only the zero case is different.
def fib(n):
    if n == 0:
        return 0
    else:
        return 1
3 tests passed.

(I consider the implications of this. "Only the zero case is different." This is an inductive system, so it needs a basis case. Zero is only half of the basis case of the Fibonacci sequence, but I never had to think about a basis case or recursion to write this code. The tests showed me what the code needed to do.)

Corey: Let's write another test.
def test_fib_of_3_is_2():
    assert fib(3) == 2
1 test failed; 3 tests passed.
Me: Another if?
Corey: Another if.
def fib(n):
    if n == 0:
        return 0
    elif n < 3:
        return 1
    else:
        return 2
4 tests passed.
Corey: Refactor!
Me: I don't know...

My brain hurts for a moment.

def fib(n):
    if n < 2:
        return n
    else:
        return n - 1
4 tests passed.

The full basis case is in place and we don't even need recursion yet. I'm surprised by how many cases we've written without needing recursion or iteration.

Corey: Another test.
def test_fib_of_4_is_3():
    assert fib(4) == 3
5 tests passed.
Me: It passed without changes. Is that OK?
Corey: Another test!
def test_fib_of_5_is_5():
    assert fib(5) == 5
1 test failed; 5 tests passed.

I think I can handle this now.

def fib(n):
    if n < 2:
        return n
    elif n == 5:
        return 5
    else:
        return n - 1
6 tests passed.
Corey: Refactor!
Me: Combine them into... recursion?
Corey: Combine them into recursion.
def fib(n):
    if n <= 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
6 tests passed.

This isn't a perfect example of TDD, but that's not the point. The first thing you need to understand is the rough process: write the smallest failing test you can; then write the smallest code to make it pass; then refactor without changing behavior.
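If you want to replay the session, here's its end state collected into one file: the six tests and the final recursive implementation. Test runners such as pytest collect functions named `test_*`; the small loop at the bottom is only an assumed stand-in so the file can run with no tools installed.

```python
def fib(n):
    if n <= 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)


def test_fib_of_0_is_0():
    assert fib(0) == 0


def test_fib_of_1_is_1():
    assert fib(1) == 1


def test_fib_of_2_is_1():
    assert fib(2) == 1


def test_fib_of_3_is_2():
    assert fib(3) == 2


def test_fib_of_4_is_3():
    assert fib(4) == 3


def test_fib_of_5_is_5():
    assert fib(5) == 5


if __name__ == "__main__":
    # Minimal stand-in for a test runner: call every test_* function.
    tests = [f for name, f in sorted(globals().items()) if name.startswith("test_")]
    for test in tests:
        test()
    print(f"{len(tests)} tests passed.")  # 6 tests passed.
```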

After getting this lesson from Corey, I went off and TDDed a couple thousand lines of code with almost no outside feedback. I was doing it very poorly, and often became frustrated, but in retrospect it was still the best code I'd ever written.

It takes years to learn how to do this well, and consistently, across a wide variety of situations. I've been doing it for two years, and I still have non-trivial problems, but I can almost always move forward confidently.

Building software without TDD was crushingly stressful, but I couldn't see it at the time. It was only shown to me when I started working one test at a time, one line of code at a time, with verification that the entire system is working in less than two seconds.

1 In reality, the Coding Dojo probably went only vaguely like this, and this isn't even the problem we solved, but that's not the point. This is what the first true TDD session always looks like.


My Personal Failures in Test Isolation

Oct 28, 2009

My position paper for SDTConf was about test doubles and my problems with refactoring around fully isolated tests.

Digression: Isolation

There are many colloquial definitions of "unit test". When I use the term, I'm almost always talking about a test that executes code in exactly one production class. If it collaborates, it collaborates only with test doubles like mocks, stubs, and fakes. Every test's world contains 30 or so lines of production code. If you've not heard of this, it probably sounds crazy. It's not.
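Here's a minimal sketch of what I mean, using the standard library's `unittest.mock` rather than any particular mock library of mine. The `Checkout` and `PriceList` classes are hypothetical: the test executes only `Checkout`, and its one collaborator is replaced by a stub, so no other production code runs.

```python
from unittest import mock


class PriceList:
    """Hypothetical collaborator; imagine it reads prices from a database."""

    def price_of(self, item):
        raise NotImplementedError("talks to the real database")


class Checkout:
    """The single production class under test."""

    def __init__(self, price_list):
        self.price_list = price_list

    def total(self, items):
        return sum(self.price_list.price_of(item) for item in items)


def test_total_sums_the_price_of_each_item():
    # Stub: spec= restricts the double to PriceList's real interface.
    prices = mock.Mock(spec=PriceList)
    prices.price_of.side_effect = {"apple": 3, "pear": 5}.get
    assert Checkout(prices).total(["apple", "pear", "apple"]) == 11


test_total_sums_the_price_of_each_item()
```

Because `PriceList.price_of` never runs, the test's world is just `Checkout`. The flip side is that the stub encodes assumptions about `PriceList`'s interface, so moving behavior between the two classes means rewriting those assumptions too.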

J. B. Rainsberger found my position paper and responded to it. He quotes me:

In my TDD practice with test doubles I’ve found that, now that all code is 100% isolated, it’s almost impossible to refactor across classes with confidence unless I totally rewrite them.
J. B. replies, in part:

I interpret his comment as though that disappoints him. I invite Gary, and you, to consider an alternative interpretation:

  1. Rewriting classes, rather than refactoring them, shows good compliance with the Open/Closed Principle, which encourages me.
  2. Needing to refactor across multiple classes, as opposed to re-implementing everything behind a given interface, probably indicates a layering problem, which I’d expect to notice with duplication in the isolated tests. That encourages me, because I like it when my tests expose design flaws to me.

Abstraction Errors

J. B. is absolutely correct: this is about layers, and my failure to layer my software correctly. I've only been doing TDD for two or three years, and I still make non-trivial, multiple-class-spanning abstraction errors. The layering errors creep in across many tests and I just don't see them early enough. I can feel myself slowly getting better at this, but it's taking a while.

The trouble is that I often notice the problem only after it already exists, and then my dilemma is "how do I fix it?" If I try to replace the hard dependency with an abstraction while avoiding a rewrite of the class, I lose confidence in my tests.

My concern is that isolation doesn't work well unless you never make certain classes of errors. There's too much coupling in this skill set! It doesn't have a reasonable entry point; you have to be a relentless jerk – as I clearly am – to break in. I want to ease people into isolation without telling them "just wait five years and you'll be fine."

Vertical Changes in Semantics

I have a closely-related example that isn't a refactoring, but displays exactly this problem. I wrote Mote, a test runner for Python. The original design had the suite class collecting all of the tests, then handing them off to the result printer.

I wanted to replace this with a pure pull process, where the printer pulls one example at a time through the entire stack, with the goal being to output the cases as they're evaluated rather than all at once. But the "push" concept pervaded most of the core classes! A context, upon being created, would recursively create its child contexts and examples. A suite, upon being created, would create its contexts. The isolated tests made this change hard! I had to rewrite many core classes to maintain confidence, and I actually never finished because it sent me into a horrible death spiral of self-doubt about isolation!
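The push/pull distinction is easier to see in code. This sketch is hypothetical and far simpler than Mote's internals: `push_suite` runs every example while building its result list and hands the finished list to the printer, whereas `pull_suite` is a generator, so the printer pulls one example at a time and each result is printed as soon as it's evaluated.

```python
def run(example):
    """Run one example, reporting "ok" or "FAIL" instead of raising."""
    try:
        example()
        return "ok"
    except AssertionError:
        return "FAIL"


def push_suite(examples):
    # Push: every example runs while the result list is being built;
    # the printer only receives the finished list at the end.
    return [(ex.__name__, run(ex)) for ex in examples]


def pull_suite(examples):
    # Pull: the generator runs each example only when the printer asks.
    for ex in examples:
        yield ex.__name__, run(ex)


def printer(results, log):
    for name, outcome in results:
        log.append(f"printed {name}: {outcome}")


def demo(suite):
    log = []

    def example_a():
        log.append("ran example_a")

    def example_b():
        log.append("ran example_b")
        raise AssertionError("deliberate failure")

    printer(suite([example_a, example_b]), log)
    return log


print(demo(push_suite))
# ['ran example_a', 'ran example_b', 'printed example_a: ok', 'printed example_b: FAIL']
print(demo(pull_suite))
# ['ran example_a', 'printed example_a: ok', 'ran example_b', 'printed example_b: FAIL']
```

In this toy, switching from push to pull touches only the suite. In a design where every layer does its work at construction time, the same switch touches every layer, which is the kind of vertical change described above.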

Digression: Mote

Mote has many more problems than this, and I consider its internals one of my greatest TDD failures. This is sort of depressing, since I failed to effectively TDD a tool that I was writing to help me do TDD. I think I know why the design ended up so bad, but that's a topic for another blog post entirely.

I have a vague notion of the answer to this push-pull problem. The suite was directly instantiating contexts, which were directly instantiating examples. While I was writing it, I could feel that it was wrong. Usually, when that feeling crops up, I know how to improve the design accordingly. In this case, for some reason, I didn't, and I pushed forward because I wanted Mote to be in a working state for my own personal use. Now I have the same problem I've had with refactorings: I've introduced a suboptimal design and I have to improve it, but I'll end up rewriting 100% of the code to change 10%.
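One common way to remove that kind of direct instantiation is to inject a factory. The sketch below is an assumption about what such a fix could look like, not what I did to Mote, and `Context`, `HardWiredSuite`, `Suite`, and `make_context` are hypothetical names. A test can hand `Suite` a fake factory, and swapping push-style contexts for pull-style ones becomes a change at the point where the suite is constructed rather than inside it.

```python
class Context:
    def __init__(self, name):
        self.name = name


class HardWiredSuite:
    def __init__(self, context_names):
        # Hard dependency: the suite itself decides which class to build.
        self.contexts = [Context(name) for name in context_names]


class Suite:
    def __init__(self, context_names, make_context=Context):
        # Injected dependency: any callable that takes a name will do.
        self.contexts = [make_context(name) for name in context_names]


def test_suite_builds_one_context_per_name():
    made = []

    def fake_context(name):
        made.append(name)
        return name

    suite = Suite(["parsing", "printing"], make_context=fake_context)
    assert made == ["parsing", "printing"]
    assert suite.contexts == ["parsing", "printing"]


test_suite_builds_one_context_per_name()
```

The default argument keeps existing callers working, which is one way to introduce the seam without rewriting every caller at once.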

I don't need to be sold on isolation or abstraction; I'm already sold, as evidenced by having written a mock library that forcefully isolates your system under test. I'm not even looking for answers to refactoring and vertical-change problems: I know that these classes of problems grow out of a lack of design skills, I know which design skills those are, and I know how to improve them.

What I'm now looking for is how to grow these skills in a person from the ground up. What if there is a way to do those nasty, vertical refactors with higher confidence? Maybe not full confidence, but enough to prevent the horrible death spiral of self-doubt? As things stand, it's very hard to help other people learn these techniques and it just bothers me.



Dingus Screencast: A mock/stub library with automatic isolation

Apr 01, 2009

Dingus is a mocking/stubbing library I've been working on for about a year. It grew out of a now-defunct project's test suite, and I've used it in about 3,500 lines of unit test code. It does two things that are pretty novel:

  1. A dingus allows you to do almost anything to it, including nesting accesses arbitrarily deep. If you have a dingus d, you can say 99 * (d.foo.bar.baz() ** 'hello')[15] and you'll just get another dingus out. This lets you use dinguses to replace dependencies in legacy code without thinking about what interface they must conform to.
  2. If you want it to, Dingus can automatically replace your dependencies with dinguses. You just tell it what class you're testing, and it will replace everything else in that class's module with a Dingus. This fully isolates your code under test without requiring any work from you.
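To give a feel for point 1, here's a toy object that behaves the same way on the surface: every attribute access, call, index, and arithmetic operator returns another instance. `Absorber` is a hypothetical class for illustration only. It is not how Dingus is implemented, and unlike a real dingus it records nothing about how it was used.

```python
class Absorber:
    """Toy stand-in: any supported operation on it yields another Absorber."""

    def __getattr__(self, name):
        # Only called for attributes not found normally, e.g. d.foo.bar
        return Absorber()

    def __call__(self, *args, **kwargs):
        return Absorber()

    def __getitem__(self, key):
        return Absorber()

    def _absorb(self, other):
        return Absorber()

    # Binary operators in both directions, so 99 * d works as well as d * 99.
    __mul__ = __rmul__ = _absorb
    __pow__ = __rpow__ = _absorb
    __add__ = __radd__ = _absorb


d = Absorber()
result = 99 * (d.foo.bar.baz() ** 'hello')[15]
print(type(result).__name__)  # Absorber
```

The reflected methods matter for expressions like `99 * ...`: `int.__mul__` returns `NotImplemented` for an unknown type, so Python falls back to the right operand's `__rmul__`.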

The second point above is probably a bit hard to think about from such a short description, so I've created a screencast to show it off. I TDD up a bit of code in the screencast, but it is not intended to be a good example of TDD or test design. I use a lot of stubbing and interaction assertions because I'm trying to show off Dingus' features. In real-world code, you should avoid assertions about object interaction as much as you can.
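For a rough idea of the mechanism behind point 2, here's a sketch using only the standard library: replace every module-level name except the class under test with a `unittest.mock.MagicMock`. The `isolate` helper and the `report` module are hypothetical, and this is not Dingus's API. To keep the sketch in one file, the module is built from a string with `exec`; in real code it would be an ordinary imported module.

```python
import types
from unittest import mock

# A hypothetical module: one class under test plus its dependencies.
SOURCE = '''
import json

def load_config(path):
    raise RuntimeError("touches the filesystem")

class Report:
    def render(self, path):
        config = load_config(path)
        return json.dumps(config)
'''

module = types.ModuleType("report")
exec(SOURCE, module.__dict__)


def isolate(module, keep):
    """Patch every module-level name except dunders and `keep` with a MagicMock.

    Returns the started patchers so the caller can stop them afterwards.
    """
    patchers = []
    for name in list(vars(module)):
        if name == keep or name.startswith("__"):
            continue
        patcher = mock.patch.object(module, name, mock.MagicMock(name=name))
        patcher.start()
        patchers.append(patcher)
    return patchers


patchers = isolate(module, keep="Report")
try:
    module.json.dumps.return_value = "{}"
    print(module.Report().render("cfg.json"))  # {}
    module.load_config.assert_called_once_with("cfg.json")
finally:
    for patcher in patchers:
        patcher.stop()  # restore the real json and load_config
```

Because `render` looks up `load_config` and `json` in its module's globals at call time, the class under test sees the doubles without any change to its code. The price is a test built on interaction assertions like `assert_called_once_with`, which, as noted above, should be used sparingly.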

If you'd like to watch the screencast, it's available on Vimeo, but the version on the page is low quality. If you log in to Vimeo, there's a download link in the bottom right of the page. Otherwise, you can view the full size version from my blog's servers. (But please be gentle on my server's bandwidth!)