Extra Cheese

A Blog


Functional and Non-Functional Testing

Jan 21, 2010

Different types of complexity interact with test suites in different ways. Consider BitBacker, an online backup product that I worked on for three years. It had very few functional requirements. At its core, it only had to let the user choose files, back them up, and restore them. Almost all of the complexity was non-functional: it had to look good and be easy to use, of course, but it also had to be fast and secure. I spent most of my development effort on the "fast" and "secure" parts.

(Note that when I say "functional" here, I'm talking about requirements for a system's behavior. This has nothing to do with functional programming.)

In this type of app, where there are so few functional requirements, functional test fragility is less of a problem. In a recent discussion with Jonathan Penn, I mentioned that the backup and restore functionality was tested at the unit, subsystem, and full-stack levels: three different levels of tests, all testing the same thing. He asked me whether this made refactoring difficult. It didn't.

BitBacker's functional requirements were never going to change. When the user backed up and then restored files, they had to be identical to the originals. That's all. It took 17,000 lines of code to make that happen efficiently and securely, but the surface area of the user-facing problem is tiny.

I didn't know this at the time. If I had been building a business app instead of a backup system, I probably would've ended up with a similar test suite, and in that situation it would've been a burden. Fortunately, I got lucky, and I've learned this lesson by retrospecting about my luck rather than about some pain that I felt.
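BitBacker's entire functional surface fits in a single round-trip check: back files up, restore them, and compare bytes. Below is a minimal sketch of such a check in Python. The `backup` and `restore` functions are hypothetical stand-ins (plain directory copies), not BitBacker's code; the idea is that the same comparison works whether those two functions call a single unit, a subsystem, or the full stack.

```python
import os
import shutil
import tempfile


def backup(source_dir, store_dir):
    # Hypothetical stand-in: the real, fast and secure backup would go here.
    shutil.copytree(source_dir, os.path.join(store_dir, "snapshot"))


def restore(store_dir, target_dir):
    # Hypothetical stand-in for the matching restore.
    shutil.copytree(os.path.join(store_dir, "snapshot"), target_dir)


def tree_contents(root):
    """Map each file's path (relative to root) to its bytes."""
    contents = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                contents[os.path.relpath(path, root)] = f.read()
    return contents


def round_trip_preserves(files):
    """Write `files` (relative path -> bytes), back up, restore, and compare."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source")
        store = os.path.join(tmp, "store")
        target = os.path.join(tmp, "restored")
        os.makedirs(source)
        os.makedirs(store)
        for rel_path, data in files.items():
            path = os.path.join(source, rel_path)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(data)
        backup(source, store)
        restore(store, target)
        # The whole functional requirement: restored files equal the originals.
        return tree_contents(target) == tree_contents(source)
```

With these stand-ins, `round_trip_preserves({"notes.txt": b"hello", "photos/cat.jpg": bytes(range(256))})` returns True. However much the implementation behind `backup` and `restore` changes, this assertion doesn't need to.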

What about automated non-functional tests? The topic is murky in general, and I only know how to test small subsets of the non-functional requirement space. I don't know how to automate testing of user experience, for example.

I have done automated performance testing, however. At one point, I wrote tests for BitBacker that ran backups across a wide range of file counts and asserted that the backup time grew linearly with the number of files. That's clearly a non-functional test, but how fragile is it?

It's very fragile, of course, unless you run it on massive file counts, and runs that large would've taken far too long for my patience at the time. I left the file counts small, so the test ran fast but broke constantly, which eventually led me to remove it.
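For concreteness, here is a hedged sketch of the kind of linear-growth assertion described above. It is not the original BitBacker test: `run_backup` is a hypothetical callable taking a file count, and the "roughly linear" rule (per-file time stays within a tolerance of the fastest per-file time) is one assumption among many possible ones.

```python
import time


def time_backup(run_backup, file_count):
    """Wall-clock seconds for one call to run_backup(file_count)."""
    start = time.perf_counter()
    run_backup(file_count)
    return time.perf_counter() - start


def is_roughly_linear(samples, tolerance=0.5):
    """Check that seconds-per-file is about constant across runs.

    `samples` is a list of (file_count, seconds) pairs with file_count > 0.
    Linear growth means every run's per-file rate should be within
    `tolerance` (as a fraction) of the fastest per-file rate.
    """
    rates = [seconds / count for count, seconds in samples]
    return max(rates) <= (1 + tolerance) * min(rates)


# Hypothetical usage, with run_backup standing in for a real backup entry point:
#   samples = [(n, time_backup(run_backup, n)) for n in (1000, 2000, 4000)]
#   assert is_roughly_linear(samples)
```

With small file counts, fixed startup costs and timer noise can easily push one run's per-file rate past any tolerance, which is one way an assertion like this ends up failing for reasons that have nothing to do with a real regression.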

I replaced the test with a system that could kick off various predefined processes ("do an empty backup", "back up 1,000 files", etc.), graphing the runtimes and memory footprints across revisions in version control. One look at those performance graphs would show whether, and where, there was a problem. This gave me a different kind of feedback: instead of defining "success" and "failure", it would alert me to a change, which I could then investigate on my own.

I suspect that this is a fundamental property of non-functional testing. Trying to fully automate it and boil it down to a set of pass/fail assertions, while sometimes possible, seems prone to fragility. It may be that non-functional testing is best achieved by dashboard apps, like my performance-over-revisions graph, or an app that renders every page in a user flow automatically and highlights recent changes in appearance.
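The revision-by-revision system described above might be sketched like this. It's an assumption-laden illustration, not the original tool: scenarios are plain callables keyed by name (like "do an empty backup"), runtime comes from `time.perf_counter`, peak memory from `tracemalloc`, and the revision identifiers are whatever your version control provides. Graphing is left out; the point is that `changed_scenarios` reports "look here" rather than pass or fail.

```python
import time
import tracemalloc


def measure(scenario):
    """Run one scenario; return (seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        scenario()
    finally:
        elapsed = time.perf_counter() - start
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return elapsed, peak


def record_revision(history, revision, scenarios):
    """Store (seconds, peak_bytes) for every named scenario under `revision`."""
    history[revision] = {name: measure(fn) for name, fn in scenarios.items()}


def changed_scenarios(history, old_rev, new_rev, threshold=0.25):
    """Names whose runtime or peak memory moved by more than `threshold`
    (as a fraction) between two revisions. A prompt to investigate,
    not a verdict."""
    changed = []
    for name, (new_time, new_mem) in history[new_rev].items():
        if name not in history[old_rev]:
            continue  # scenario added in the new revision; nothing to compare
        old_time, old_mem = history[old_rev][name]
        for old, new in ((old_time, new_time), (old_mem, new_mem)):
            if old and abs(new - old) / old > threshold:
                changed.append(name)
                break
    return changed
```

Feeding `history` into any plotting tool gives the performance-over-revisions graph; the threshold only decides what gets flagged for a human to look at.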



Test Double Injection Inversion

Jan 21, 2010

In Dependency Injection Inversion, Uncle Bob wonderfully explains the difference between Dependency Injection and Dependency Injection Frameworks, a topic I've done a lot of thinking about recently. You should go read his post right now if you haven't yet.

At the end, he provides the test code below as an example of testing some dependency-injected Java code:

public class BillingServiceTest {
  private LogSpy log;

  @Before
  public void setup() {
    log = new LogSpy();
  }

  @Test
  public void approval() throws Exception {
    BillingService bs = new BillingService(new Approver(), log);
    bs.processCharge(9000, "Bob");
    assertEquals("Transaction by Bob for 9000 approved",
                 log.getLogged());
  }

  @Test
  public void denial() throws Exception {
    BillingService bs = new BillingService(new Denier(), log);
    bs.processCharge(9000, "Bob");
    assertEquals("Transaction by Bob for 9000 denied",
                 log.getLogged());
  }
}

class Approver implements CreditCardProcessor {
  public boolean approve(int amount, String id) {
    return true;
  }
}

class Denier implements CreditCardProcessor {
  public boolean approve(int amount, String id) {
    return false;
  }
}

class LogSpy implements TransactionLog {
  private String logged;

  public void log(String s) {
    logged = s;
  }

  public String getLogged() {
    return logged;
  }
}

It's perfectly fine Java code, and it wonderfully demonstrates the power of injection. After the code, Uncle Bob says:

It would have been tragic to use a mocking framework for such a simple set of tests.

In Java, I agree completely. In a more modern language, I disagree completely! I've translated his example to Python using my Dingus test double library to illustrate the simplicity that doubles can provide:

class BillingServiceTest:
    def setup(self):
        self.log = Dingus()

    def test_approval(self):
        approver = Dingus(approve__returns=True)
        bs = BillingService(approver, self.log)
        bs.process_charge(9000, 'Bob')
        assert self.log.calls(
            'log',
            'Transaction by Bob for 9000 approved').once()

    def test_denial(self):
        denier = Dingus(approve__returns=False)
        bs = BillingService(denier, self.log)
        bs.process_charge(9000, 'Bob')
        assert self.log.calls(
            'log',
            'Transaction by Bob for 9000 denied').once()

In a real system, I'd factor these tests slightly differently; I've left them as close to Bob's as possible. This is 13 ELOC vs. Bob's 38 – only about a third as much code! Some of the difference is in his testing library's ceremony, but most of it is in his test doubles. For example, he says:

class Approver implements CreditCardProcessor {
  public boolean approve(int amount, String id) {
    return true;
  }
}

That is a lot of code! All it really says is "the approve method always returns true", with the rest being a complex dance around Java's rigidity. This is a liability for programmers working in such languages, as well as a learning barrier for new testers. In my Python version, the following takes the place of the Approver class, as well as its instantiation:

approver = Dingus(approve__returns=True)

That line of code is so close to "the approve method always returns true" that I can't imagine it being any clearer. Of course, if the magic double underscores turn you off, you can also say:

approver = Dingus(approve=returner(True))

Digression

I'd love to hear what you think about those two alternate forms. I want to deprecate one, but I don't know which.

I fear that statements like Uncle Bob's about test doubles may lead newer programmers, and static-only programmers, astray. His advice is wonderful, but only in certain domains. Like so many things in software, doubling is far easier when the shackles of Javaesque type systems are removed. And, if you worry that the complexity is simply moved into the test double library, fear not: Dingus is currently 193 ELOC long, including plenty of features not mentioned here!
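To make the two spellings from the digression concrete, here is a toy double written from scratch. It is a hypothetical sketch, not how Dingus is actually implemented; `ToyDouble` and the Python `BillingService` are illustrative names. In this sketch, both forms reduce to the same thing: a per-method behavior function, with `approve__returns=True` being shorthand for `approve=returner(True)`.

```python
def returner(value):
    """Return a function that ignores its arguments and returns `value`."""
    def function(*args, **kwargs):
        return value
    return function


class ToyDouble:
    """A tiny hypothetical test double (not Dingus's real implementation).

    Every public attribute is a callable that records its calls. Keyword
    arguments configure behavior: name=callable, or name__returns=value.
    """

    def __init__(self, **behaviors):
        self._calls = []
        self._behaviors = {}
        for key, value in behaviors.items():
            if key.endswith("__returns"):
                self._behaviors[key[: -len("__returns")]] = returner(value)
            else:
                self._behaviors[key] = value

    def __getattr__(self, name):
        # Only reached for attributes not found normally.
        if name.startswith("_"):
            raise AttributeError(name)
        behavior = self._behaviors.get(name, returner(None))

        def method(*args, **kwargs):
            self._calls.append((name, args))
            return behavior(*args, **kwargs)

        return method

    def calls(self, name, *args):
        """Recorded calls to `name` with exactly these positional arguments."""
        return [call for call in self._calls if call == (name, args)]


class BillingService:
    """Hypothetical Python port of the service under test."""

    def __init__(self, processor, log):
        self.processor = processor
        self.log = log

    def process_charge(self, amount, customer_id):
        approved = self.processor.approve(amount, customer_id)
        outcome = "approved" if approved else "denied"
        self.log.log("Transaction by %s for %d %s" % (customer_id, amount, outcome))
```

Usage mirrors the tests above: `BillingService(ToyDouble(approve__returns=True), log).process_charge(9000, 'Bob')` leaves exactly one matching entry in `log.calls('log', 'Transaction by Bob for 9000 approved')`, and `ToyDouble(approve=returner(False))` produces the "denied" message instead.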



String Calculator Kata in Python

Jan 06, 2010

For those who aren't familiar, katas are those impressive sequences of movements that you've surely seen martial arts guys perform. Code katas are the same idea applied to writing code: you solve a problem many times, mastering the movements, and then perform it for others.

My friend Corey Haines has worked with some other people to run the Katacasts site, which is dedicated to code katas and posts them roughly once per week. Recently, there's been a string of solutions to the same problem, with my Python version as this week's entry.

Briefly, here's the problem I solve. I have numbers coming in as a string, separated by commas or newlines. My job is to add those numbers and return the sum. There are two complications:

  1. If the first line of the string is of the form "//*", then * is also a possible delimiter. The delimiter can be any string, not just *.
  2. Negative numbers must be rejected.
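Here is a minimal sketch of one solution in Python, not the one from the screencast. It assumes a few details the summary above leaves open: an empty input sums to 0, the custom-delimiter line ends with a newline, and rejecting negatives means raising ValueError with every negative number listed.

```python
import re


def add(numbers):
    """Sum integers separated by commas or newlines.

    A first line of the form "//<delimiter>" adds <delimiter> (any string)
    as another separator. Negative numbers raise ValueError.
    """
    delimiters = [",", "\n"]
    if numbers.startswith("//"):
        header, numbers = numbers[2:].split("\n", 1)
        if header:
            delimiters.append(header)
    if not numbers:
        return 0
    # Longest delimiters first, so a custom ",," isn't split as two ","s.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    values = [int(token) for token in re.split(pattern, numbers)]
    negatives = [v for v in values if v < 0]
    if negatives:
        raise ValueError("negatives not allowed: " + ", ".join(map(str, negatives)))
    return sum(values)
```

For example, `add("1,2\n3")` returns 6 and `add("//***\n1***2,3")` also returns 6, because the custom delimiter is escaped before it's used in the regular expression.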

You can see the kata and read my brief commentary on the Katacasts site, or go straight to Vimeo to watch it. It's only 4:32 long, so it's not a big commitment.

String Calculator Kata In Python from Gary Bernhardt on Vimeo.

My screencasts always prompt questions about my Vim configuration, so take a look at my dotfiles repo if you're interested. For this kata, I'm slowing down intentionally – typing more slowly and inserting small, regular pauses so the viewer has time to look around a bit. I may post a "hard mode" version at full speed if people show interest.

Comments are encouraged, of course – the purpose of a kata is improvement!

If you like this, by the way, you may also enjoy the refactoring screencast I posted recently.



On Abstraction

Dec 16, 2009

Some people seem to consider abstraction a bad word. I think that this is misguided and impedes progress – all software is abstraction. Understanding what our abstractions mean, and what makes them good or bad, is the core of design.

For now, let's define abstractions as concepts; nothing more. If it's a concept in your head, it's an abstraction. (I've tried to define the word more fully about ten times, deleting each definition in turn.)

The interesting part of abstractions is their violation. First, the textual definition of an abstraction – a class, for example – can violate itself. This happens when a class presents information at more than one level of abstraction. Here's Grady Booch, from "Object-Oriented Analysis and Design":

[The] class Dog is functionally cohesive if its semantics embrace the behavior of a dog, the whole dog, and nothing but the dog.

It's a wonderfully terse explanation, but it doesn't go far enough for our purposes because it doesn't address relationships.

Example

A Person class can have a first_name field. But should Person also have a set of address fields like street and zip_code? Probably not. These fields are part of an Address, which is a concept that exists independent of Person. Moving them into an Address class reifies this natural abstraction in our code, making it mirror the way the ideas are structured in our brains.
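In code, the Person/Address split might look like the following sketch. It uses Python dataclasses purely for brevity, and it keeps only the fields named above.

```python
from dataclasses import dataclass


@dataclass
class Address:
    # An address is a concept of its own, independent of any Person.
    street: str
    zip_code: str


@dataclass
class Person:
    first_name: str
    # Person *has* an Address rather than owning street/zip_code fields itself.
    address: Address
```

Anything else that has an address (a business, a warehouse) can now reuse `Address` instead of growing its own copy of the fields.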

This is sort of a restatement of the Single Responsibility Principle (SRP), which is sort of a restatement of the principle of cohesion. We have many tools for thinking about this idea because it's important.

Abstractions can also be violated from outside. If an object exposes a set of fields to me, I should avoid looking into those fields' structure. In other words, I must respect the abstraction provided by the object. If I feel the need to violate the abstraction, I need to reconsider how to modify the boundaries to match that need, rather than violating the boundaries by crossing them.

This is the moment when design happens: I can take the path of short-term gain by reaching into my collaborators' collaborators, or I can take the path of long-term gain by refactoring my design to match the conceptual model.

Example

Suppose I have a Person and need to tell the SnailMailer to send him mail. The SnailMailer, as currently designed, takes a street, a zip_code, etc. I could pull the data out of the address fields, like person.address.zip_code, then pass them to the SnailMailer. But in doing that, I would violate the Person abstraction.

Instead, I should step back and think about the contract of the SnailMailer. It would be better to pass in the Person's Address instead of its components. That way, I rely only on the Person abstraction (it has an Address) and the SnailMailer abstraction (it sends to addresses). I remove my dependency on the structure of a Person's Address (street, zip, etc.), and I remove my dependency on the SnailMailer's expectations about address fields (street, zip, etc.). The SnailMailer can decide how to deal with those.
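A small Python sketch of the better contract follows. `SnailMailer.send` and `notify` are hypothetical names chosen for illustration, and the mailer just records what it would send.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    zip_code: str


@dataclass
class Person:
    first_name: str
    address: Address


class SnailMailer:
    """Hypothetical mailer whose contract is 'send to an Address'."""

    def __init__(self):
        self.sent = []

    def send(self, address, message):
        # Only the mailer knows which Address fields it needs.
        label = "%s, %s" % (address.street, address.zip_code)
        self.sent.append((label, message))


def notify(person, mailer, message):
    # Relies only on "a Person has an Address" and "a mailer sends to addresses".
    # The Demeter-violating alternative would pull out person.address.street,
    # person.address.zip_code, etc. here and pass them along one by one.
    mailer.send(person.address, message)
```

If the SnailMailer later needs a new field, such as a country, only `send` and `Address` change; `notify` doesn't.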

This is sort of a restatement of the Law of Demeter, which is sort of a restatement of part of the principle of coupling. These are symmetric with the definition side of abstraction in a pleasing way:

  • My abstraction vs. your use of it;
  • Single Responsibility Principle vs. Law of Demeter;
  • Cohesion vs. Coupling.

Most of the design principles we talk about regularly, like those listed above, are syntactic – they are properties of the text of the code. But syntax is only a means; the thing that really matters is that the semantic model of the code mirror the semantic model in our brains. Thinking about (or being preached to about) the design principles in isolation can make them feel arbitrary; it's much better to view them in the light of abstraction integrity.

Abstraction is important! The result of programming isn't simply a computation; it's also a set of ideas made concrete in a programming language. Nothing can beat the long-term business value of ideas expressed clearly in code.



Refactoring A Cyclomatic Complexity Script

Nov 16, 2009

I recently blogged about the cyclomatic complexity script I wrote that highlights Python code based on its complexity. The script itself is written in Python, but the code was rushed together. I decided to do some cleanup, recording the process as an example of refactoring.

You can see a lossless copy of the video by logging into Vimeo and clicking "Download Quicktime version" at the bottom right of the page. I recommend it; MPEG artifacting is no fun.

Refactoring A Cyclomatic Complexity Module from Gary Bernhardt on Vimeo.

I make one notable mistake – I leave an unneeded reference to "code_or_node" lying around. Just imagine that I delete it at the very end. :)
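The script itself isn't reproduced here. As a rough sketch of its core idea, scoring Python functions by cyclomatic complexity, a common approximation is one plus the number of decision points in each function. The set of node types counted below is an assumption; real tools, and the script in the video, may count differently.

```python
import ast

# Node types treated as one decision point each (an approximation).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                ast.ExceptHandler, ast.comprehension)


def complexity(func_node):
    """1 + decision points found anywhere inside func_node.

    Note: ast.walk descends into nested functions, so their branches
    also count toward the enclosing function's score.
    """
    score = 1
    for node in ast.walk(func_node):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            score += len(node.values) - 1
    return score


def function_complexities(source):
    """Map each function name in `source` to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
```

A highlighter would then map those scores onto the source lines of each function and color them by score.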