Extra Cheese

A Blog


Test Double Injection Inversion

Jan 21, 2010

In Dependency Injection Inversion, Uncle Bob wonderfully explains the difference between Dependency Injection and Dependency Injection Frameworks, a topic I've done a lot of thinking about recently. You should go read his post right now if you haven't yet.

At the end, he provides the test code below as an example of testing some dependency-injected Java code:

public class BillingServiceTest {
  private LogSpy log;

  @Before
  public void setup() {
    log = new LogSpy();
  }

  @Test
  public void approval() throws Exception {
    BillingService bs = new BillingService(new Approver(), log);
    bs.processCharge(9000, "Bob");
    assertEquals("Transaction by Bob for 9000 approved",
                 log.getLogged());
  }

  @Test
  public void denial() throws Exception {
    BillingService bs = new BillingService(new Denier(), log);
    bs.processCharge(9000, "Bob");
    assertEquals("Transaction by Bob for 9000 denied",
                 log.getLogged());
  }
}

class Approver implements CreditCardProcessor {
  public boolean approve(int amount, String id) {
    return true;
  }
}

class Denier implements CreditCardProcessor {
  public boolean approve(int amount, String id) {
    return false;
  }
}

class LogSpy implements TransactionLog {
  private String logged;

  public void log(String s) {
    logged = s;
  }

  public String getLogged() {
    return logged;
  }
}

It's perfectly fine Java code, and it wonderfully demonstrates the power of injection. After the code, Uncle bob says:

It would have been tragic to use a mocking framework for such a simple set of tests.

In Java, I agree completely. In a more modern language, I disagree completely! I've translated his example to Python using my Dingus test double library to illustrate the simplicity that doubles can provide:

class BillingServiceTest:
    def setup(self):
        self.log = Dingus()

    def test_approval(self):
        approver = Dingus(approve__returns=True)
        bs = BillingService(approver, self.log)
        bs.process_charge(9000, 'Bob')
        assert self.log.calls(
            'log',
            'Transaction by Bob for 9000 approved').once()

    def test_denial(self):
        denier = Dingus(approve__returns=False)
        bs = BillingService(approver, self.log)
        bs.process_charge(9000, 'Bob')
        assert self.log.calls(
            'log',
            'Transaction by Bob for 9000 denied').once()

In a real system, I'd factor these tests slightly differently; I've left them as close to Bob's as possible. This is 13 ELOC vs. Bob's 38 – only about a third as much code! Some of the difference is in his testing library's ceremony, but most of it is in his test doubles. For example, he says:

class Approver implements CreditCardProcessor {
  public boolean approve(int amount, String id) {
    return true;
  }
}

That is a lot of code! All it really says is "the approve method always returns true", with the rest being a complex dance around Java's rigidity. This is a liability for programmers working in such languages, as well as a learning barrier for new testers. In my Python version, the following takes the place of the Approver class, as well as its instantiation:

approver = Dingus(approve__returns=True)

That line of code is so close to "the approve method always returns true" that I can't imagine it being any clearer. Of course, if the magic double underscores turn you off, you can also say:

approver = Dingus(approve=returner(True))

Digression

I'd love to hear what you think about those two alternate forms. I want to deprecate one, but I don't know which.

I fear that statements like Uncle Bob's about test doubles may lead newer programmers, and static-only programmers, astray. His advice is wonderful, but only in certain domains. Like so many things in software, doubling is far easier when the shackles of Javaesque type systems are removed. And, if you worry that the complexity is simply moved into the test double library, fear not: Dingus is currently 193 ELOC long, including plenty of features not mentioned here!



Functional and Non-Functional Testing

Jan 21, 2010

Different types of complexity interact with test suites in different ways. Consider BitBacker, an online backup product that I worked on for three years. It had very few functional requirements. At its core, it only had to let the user choose files, back them up, and restore them. Almost all of the complexity was non-functional: it had to look good and be easy to use, of course, but it also had to be fast and secure. I spent most of my development effort on the "fast" and "secure" parts.

(Note that when I say "functional" here, I'm talking about requirements for a system's behavior. This has nothing to do with functional programming.)

In this type of app, where there are so few functional requirements, functional test fragility is less of a problem. In a recent discussion with Jonathan Penn, I mentioned that the backup and restore functionality were tested at the unit, subsystem, and full-stack levels: three different levels of tests, all testing the same thing. He asked me whether this made refactoring difficult. It didn't.

BitBacker's functional requirements were never going to change. When the user backed up and then restored files, they had to be identical to the originals. That's all. It took 17,000 lines of code to make that happen efficiently and securely, but the surface area of the user-facing problem is tiny.

I didn't know this at the time. If I was building a business app instead of a backup system, I probably would've ended up with a similar test suite, and in that situation it would've been a burden. Fortunately, I got lucky, and I've learned this lesson by retrospecting about my luck rather than retrospecting about some pain that I felt.

What about automated non-functional tests? The topic is murky in general, and I only know how to test small subsets of the non-functional requirement space. I don't know how to automate testing of user experience, for example.

I have done automated performance testing, however. At one point, I wrote tests for BitBacker that ran backups across a wide range of file counts and asserted that the backup time grew linearly with the number of files. That's clearly a non-functional test, but how fragile is it?

It's very fragile, of course, unless you run it on massive file counts that would've taken far too long for my patience at the time. I left the file counts small, so it ran fast but broke constantly, which eventually led me to remove it.

I replaced the test with a system that could kick off various predefined processes ("do an empty backup", "back up 1,000 files", etc.), graphing the runtimes and memory footprints across revisions in version control. One look at those performance graphs would show whether, and where, there was a problem. This gave me a different kind of feedback: instead of defining "success" and "failure", it would alert me to a change, which I could then investigate on my own.

I suspect that this is a fundamental property of non-functional testing. Trying to fully automate it and boil it down to a set of pass/fail assertions, while sometimes possible, seems prone to fragility. It may be that non-functional testing is best achieved by dashboard apps, like my performance-over-revisions graph, or an app that renders every page in a user flow automatically and highlights recent changes in appearance.



Refactoring A Cyclomatic Complexity Script

Nov 16, 2009

I recently blogged about the cyclomatic complexity script I wrote that highlights Python code based on its complexity. The script itself is written in Python, but the code was rushed together. I decided to do some cleanup, recording the process as an example of refactoring.

You can see a lossless copy of the video by logging into Vimeo and clicking "Download Quicktime version" at the bottom right of the page. I recommend it; MPEG artifacting is no fun.

Refactoring A Cyclomatic Complexity Module from Gary Bernhardt on Vimeo.

I make one notable mistake – I leave an unneeded reference to "code_or_node" lying around. Just imagine that I delete it at the very end. :)



Are your tests lying to you?

Apr 07, 2007

If you've written a test for a module, and the module is changed in the future, there are three things that can happen:

  1. The test keeps passing because nothing is broken. (Good.)
  2. The test fails because something is wrong. (Great - this is the test's job!)
  3. The test keeps passing, but it silently stops testing the thing it claims to (BAD, BAD, BAD!).

Scenario 3 above is very dangerous, and it's a major problem in testing. What you have in that situation is a lying test: it says "I'm testing feature x," but actually passes without doing so. In other words, you have a test that no longer warns you if you break something.

If you've not been bitten by this, it might not be an obvious problem. To make it a little more clear, let's look at a toy example (in Python, of course!) Here's a silly WebClient class and its test.

class WebClient:
    """An HTTP client that supports both SSL and plain connections"""
    def __init__(self):
        self.use_ssl = False

    def get(self, url):
        # Hand any request off to external functions
        if self.use_ssl:
            return get_with_ssl(url)
        else:
            return get_without_ssl(url)

def test_web_client():
    # Make sure everything works with normal HTTP
    client = WebClient()
    assert client.get('/') == expected_data #defined elsewhere

    # Make sure everything works with SSL as well
    client.use_ssl = True
    assert client.get('/') == expected_data

This works fine - the test passes and it tests what it claims to. But what happens if someone renames the use_ssl attribute later?

class WebClient:
    def __init__(self):
        self.using_ssl = False

    def get(self, url):
        # Hand any request off to external functions
        if self.using_ssl:
            return get_with_ssl(url)
        else:
            return get_without_ssl(url)

Take a look back at the test. It's no longer testing what it claims to, because "use_ssl" no longer means anything to WebClient. The test still passes, though - it's just that neither of the two get() calls actually uses SSL.

This is a serious problem - you need to be able to trust your tests, but for all you know your tests are giving you false positives. The question, then, is how can we detect this type of mistake? Well, there is a simple method that will catch at least some of them. What you need is a meta-test: a test that ensures that the tests aren't lying to you. It's really not that bad; here's the pseudocode:

for each test in the suite:
    for each line of code that isn't an assertion:
        remove that line of code (but not the rest)
        run the test and make sure that it fails

Basically, this meta-test is ensuring that every line of code in the test is required: removing any line should cause the test to fail. This sounds complicated, but it only has to be implemented once. Once it exists as a nose plugin, for example, you can use it without writing any extra code.

Let's look at how this would affect the example. Here's the testing code again:

def test_web_client():
    # Make sure everything works with normal HTTP
    client = WebClient()
    assert client.get('/') == expected_data #defined elsewhere

    # Make sure everything works with SSL as well
    client.use_ssl = True
    assert client.get('/') == expected_data

The meta-test will step through, removing each relevant line and making sure that the test fails. The only executable lines that aren't assertions are 3 and 7. When it removes line 3, the test will fail because "client" won't be defined. So that iteration of the meta-test passes. When it removes line 7, the test will still pass. Because the test passes with a line removed, the meta-test will fail. The meta-test has detected the fact that line 7 isn't necessary, which is a red flag that says "this test might lie to you later!"

It's important to note that the meta-test will fail even when the test is working. It really is a meta-test: it's only testing the test. This is a good thing. It tells you when you've written a crappy test - a test that isn't paying enough attention.

Let's return to the example and try to fix it. To make the meta-test pass again, the test could be changed to be more sensitive to WebClient's state:

def test_web_client():
    # Make sure everything works with normal HTTP
    client = WebClient()
    assert client.get('/') == expected_data #defined elsewhere
    assert client.use_ssl == False

    # Make sure everything works with SSL as well
    client.use_ssl = True
    assert client.get('/') == expected_data
    assert client.use_ssl == True

Now the meta-test passes, and the original test_web_client is more resilient to silent failures. If someone renames WebClient's use_ssl attribute, the test won't silently stop testing like it did before. Instead, line 5 will raise an exception and the test will fail.

Of course, this isn't foolproof. If you added line 10 but not line 5, you wouldn't be doing yourself any good (figuring out why is left as an exercise for the reader :). The meta-test would still pass, though, and you would still have a test that may lie to you in the future. So this meta-testing method isn't a magic bullet that will force you to write good tests. For a careful tester, though, it throws up a red flag for tests that might be susceptible to very subtle errors.

(Nitpicker's corner: Yes, the problem in this test was caused by questionable design in WebClient itself. Using an instance variable to control a class's behavior in this way is error-prone to begin with. This testing problem also arises in much more subtle situations, though; I have the scars to prove it.)