A blog by Gary Bernhardt, Creator & Destroyer of Software

Are your tests lying to you?

07 Apr 2007

If you've written a test for a module, and the module is changed in the future, there are three things that can happen:

  1. The test keeps passing because nothing is broken. (Good.)
  2. The test fails because something is wrong. (Great – this is the test's job!)
  3. The test keeps passing, but it silently stops testing the thing it claims to (BAD, BAD, BAD!).

Scenario 3 above is very dangerous, and it's a major problem in testing. What you have in that situation is a lying test: it says "I'm testing feature x," but actually passes without doing so. In other words, you have a test that no longer warns you if you break something.

If you haven't been bitten by this, it might not seem like an obvious problem. To make it a little clearer, let's look at a toy example (in Python, of course!). Here's a silly WebClient class and its test.

class WebClient:
    """An HTTP client that supports both SSL and plain connections"""
    def __init__(self):
        self.use_ssl = False

    def get(self, url):
        # Hand any request off to external functions
        if self.use_ssl:
            return get_with_ssl(url)
        else:
            return get_without_ssl(url)

def test_web_client():
    # Make sure everything works with normal HTTP
    client = WebClient()
    assert client.get('/') == expected_data  # defined elsewhere

    # Make sure everything works with SSL as well
    client.use_ssl = True
    assert client.get('/') == expected_data

This works fine – the test passes and it tests what it claims to. But what happens if someone renames the use_ssl attribute later?

class WebClient:
    def __init__(self):
        self.using_ssl = False

    def get(self, url):
        # Hand any request off to external functions
        if self.using_ssl:
            return get_with_ssl(url)
        else:
            return get_without_ssl(url)

Take a look back at the test. It's no longer testing what it claims to, because "use_ssl" no longer means anything to WebClient. The test still passes, though – it's just that neither of the two get() calls actually uses SSL.
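
The reason Python lets this slip by silently: assigning to an attribute that doesn't exist isn't an error – it just quietly creates a new one. A tiny illustration (the class here is a hypothetical stand-in for the renamed WebClient):

class Renamed:
    def __init__(self):
        self.using_ssl = False   # the attribute the class actually reads

obj = Renamed()
obj.use_ssl = True               # no error: Python just creates a new, unused attribute
print(obj.using_ssl)             # False -- the flag the class consults is untouched
print(obj.use_ssl)               # True  -- but nothing ever looks at this one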

This is a serious problem – you need to be able to trust your tests, but for all you know they're passing without testing what they claim to. The question, then, is: how can we detect this kind of mistake? Well, there's a simple method that will catch at least some of them. What you need is a meta-test: a test that ensures your tests aren't lying to you. It's really not that bad; here's the pseudocode:

for each test in the suite:
    for each line of code that isn't an assertion:
        remove that line (leaving the rest of the test intact)
        run the test and make sure that it fails

Basically, this meta-test is ensuring that every line of code in the test is required: removing any line should cause the test to fail. This sounds complicated, but it only has to be implemented once. Once it exists as a nose plugin, for example, you can use it without writing any extra code.
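
To make that concrete, here's a rough, self-contained sketch of the idea (the meta_test name and the remove-a-source-line approach are just for illustration – a real plugin would hook into the test runner and handle the edge cases this ignores):

import inspect
import textwrap

def meta_test(test_func):
    """Report the non-assert lines whose removal the test doesn't notice.

    A rough sketch of the idea, not a real plugin: it works on raw
    source lines, so multi-line statements and other edge cases
    aren't handled.
    """
    source = textwrap.dedent(inspect.getsource(test_func))
    lines = source.splitlines()
    unnoticed = []

    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip the def line, blank lines, comments, and assertions.
        if (i == 0 or not stripped or stripped.startswith('#')
                or stripped.startswith('assert ')):
            continue

        # Rebuild the test without this line and re-execute its source.
        mutated = '\n'.join(lines[:i] + lines[i + 1:])
        namespace = dict(test_func.__globals__)
        try:
            exec(compile(mutated, '<meta-test>', 'exec'), namespace)
        except SyntaxError:
            continue  # Removing this line broke a multi-line statement.
        try:
            namespace[test_func.__name__]()
        except Exception:
            continue  # Good: the test fails without this line.
        unnoticed.append(stripped)  # Bad: the test passed anyway.

    return unnoticed

A non-empty result means some line of the test could disappear without the test noticing.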

Let's look at how this would affect the example. Here's the testing code again:

def test_web_client():
    # Make sure everything works with normal HTTP
    client = WebClient()
    assert client.get('/') == expected_data  # defined elsewhere

    # Make sure everything works with SSL as well
    client.use_ssl = True
    assert client.get('/') == expected_data

The meta-test will step through, removing each relevant line and making sure that the test fails without it. The only executable lines that aren't assertions are "client = WebClient()" and "client.use_ssl = True". When it removes "client = WebClient()", the test will fail because client won't be defined, so that iteration of the meta-test passes. When it removes "client.use_ssl = True", though, the test will still pass. Because the test passes with a line removed, the meta-test will fail. The meta-test has detected that the "client.use_ssl = True" line isn't necessary, which is a red flag that says "this test might lie to you later!"
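
With hypothetical stand-ins for the pieces that are "defined elsewhere", the meta_test sketch from above flags exactly that line:

# Hypothetical stand-ins so the example can actually run.
expected_data = 'response body'

def get_with_ssl(url):
    return expected_data

def get_without_ssl(url):
    return expected_data

print(meta_test(test_web_client))
# ['client.use_ssl = True'] -- its removal went unnoticed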

It's important to note that the meta-test fails here even though the test itself still passes. It really is a meta-test: it's only testing the test. This is a good thing. It tells you when you've written a crappy test – a test that isn't paying enough attention.

Let's return to the example and try to fix it. To make the meta-test pass again, the test could be changed to be more sensitive to WebClient's state:

def test_web_client():
    # Make sure everything works with normal HTTP
    client = WebClient()
    assert client.get('/') == expected_data  # defined elsewhere
    assert client.use_ssl == False

    # Make sure everything works with SSL as well
    client.use_ssl = True
    assert client.get('/') == expected_data
    assert client.use_ssl == True

Now the meta-test passes, and the original test_web_client is more resilient to silent failures. If someone renames WebClient's use_ssl attribute, the test won't silently stop testing like it did before. Instead, the first new assertion ("assert client.use_ssl == False") will raise an AttributeError and the test will fail.
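
With the same hypothetical stand-ins as before, the meta_test sketch also comes back clean for this version, because removing "client.use_ssl = True" now trips the final assertion:

print(meta_test(test_web_client))   # [] -- every non-assert line is now load-bearing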

Of course, this isn't foolproof. If you added the final "assert client.use_ssl == True" but not the earlier "assert client.use_ssl == False", you wouldn't be doing yourself any good (figuring out why is left as an exercise for the reader :). The meta-test would still pass, though, and you would still have a test that may lie to you in the future. So this meta-testing method isn't a magic bullet that will force you to write good tests. For a careful tester, though, it throws up a red flag for tests that might be susceptible to very subtle errors.

(Nitpicker's corner: Yes, the problem in this test was caused by questionable design in WebClient itself. Using an instance variable to control a class's behavior in this way is error-prone to begin with. This testing problem also arises in much more subtle situations, though; I have the scars to prove it.)