Extra Cheese

A Blog


On Abstraction

Dec 16, 2009

Some people seem to consider abstraction a bad word. I think that this is misguided and impedes progress – all software is abstraction. Understanding what our abstractions mean, and what makes them good or bad, is the core of design.

For now, let's define abstractions as concepts; nothing more. If it's a concept in your head, it's an abstraction. (I've tried to define the word more fully about ten times, deleting each definition in turn.)

The interesting part of abstractions is their violation. First, the textual definition of an abstraction – a class, for example – can violate itself. This happens when a class presents information at more than one level of abstraction. Here's Grady Booch, from "Object Oriented Analysis and Design":

"[The] class Dog is functionally cohesive if its semantics embrace the behavior of a dog, the whole dog, and nothing but the dog."

It's a wonderfully terse explanation, but doesn't go far enough for our purposes because it doesn't address relationships.

Example

A Person class can have a first_name field. But should Person also have a set of address fields like street and zip_code? Probably not. These fields are part of an Address, which is a concept that exists independent of Person. Moving them into an Address class reifies this natural abstraction in our code, making it mirror the way the ideas are structured in our brains.
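A minimal Python sketch of that split. The use of dataclasses and the sample values are assumptions made for illustration; only the names Person, Address, first_name, street, and zip_code come from the text.

```python
from dataclasses import dataclass


@dataclass
class Address:
    # Address is its own concept; it exists independently of any Person.
    street: str
    zip_code: str


@dataclass
class Person:
    first_name: str
    # A Person has an Address; it carries no street or zip_code of its own.
    address: Address


person = Person("Ada", Address("12 Main St", "98101"))
```

Person now talks about exactly one level of abstraction: it knows that it has an Address, and nothing about what an Address is made of.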

This is sort of a restatement of the Single Responsibility Principle (SRP), which is sort of a restatement of the principle of cohesion. We have many tools for thinking about this idea because it's important.

Abstractions can also be violated from outside. If an object exposes a set of fields to me, I should avoid looking into those fields' structure. In other words, I must respect the abstraction provided by the object. If I feel the need to violate the abstraction, I should reconsider where its boundaries lie and move them to match that need, rather than crossing them.

This is the moment when design happens: I can take the path of short-term gain by reaching into my collaborators' collaborators, or I can take the path of long-term gain by refactoring my design to match the conceptual model.

Example

Suppose I have a Person and need to tell the SnailMailer to send him mail. The SnailMailer, as currently designed, takes a street, a zip_code, etc. I could pull the data out of the address fields, like person.address.zip_code, then pass them to the SnailMailer. But in doing that, I would violate the Person abstraction.

Instead, I should step back and think about the contract of the SnailMailer. It would be better to pass in the Person's Address instead of its components. That way, I rely only on the Person abstraction (it has an Address) and the SnailMailer abstraction (it sends to addresses). I remove my dependency on the structure of a Person's Address (street, zip, etc.) and I remove my dependency on the SnailMailer's expectations about address fields (street, zip, etc.). The SnailMailer can decide how to deal with those.
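Here is a rough Python sketch of the better contract. The send method, its letter argument, the sent list, and the mail_person helper are all hypothetical names invented for the example; the text only says that the SnailMailer should accept an Address.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    zip_code: str


@dataclass
class Person:
    first_name: str
    address: Address


class SnailMailer:
    """Hypothetical mailer whose contract is 'send to an Address'."""

    def __init__(self):
        self.sent = []

    def send(self, address, letter):
        # Only the mailer decides which Address fields it needs.
        label = f"{address.street}, {address.zip_code}"
        self.sent.append((label, letter))


def mail_person(mailer, person, letter):
    # The caller relies on two facts only: a Person has an Address,
    # and a SnailMailer sends to Addresses.
    mailer.send(person.address, letter)


mailer = SnailMailer()
mail_person(mailer, Person("Ada", Address("12 Main St", "98101")), "Hello")
```

Note that mail_person never mentions street or zip_code. If Address grows a country field, only Address and SnailMailer have to change.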

This is sort of a restatement of the Law of Demeter, which is sort of a restatement of part of the principle of coupling. These are symmetric with the definition side of abstraction in a pleasing way:

  • My abstraction vs. your use of it;
  • Single Responsibility Principle vs. Law of Demeter;
  • Cohesion vs. Coupling.

Most of the design principles we talk about regularly, like those listed above, are syntactic – they are properties of the text of the code. But syntax is only a means; the thing that really matters is that the semantic model of the code mirror the semantic model in our brains. Thinking about (or being preached to about) the design principles in isolation can make them feel arbitrary; it's much better to view them in the light of abstraction integrity.

Abstraction is important! The result of programming isn't simply a computation; it's also a set of ideas made concrete in a programming language. Nothing can beat the long-term business value of ideas expressed clearly in code.



Refactoring A Cyclomatic Complexity Script

Nov 16, 2009

I recently blogged about the cyclomatic complexity script I wrote that highlights Python code based on its complexity. The script itself is written in Python, but the code was rushed together. I decided to do some cleanup, recording the process as an example of refactoring.

You can see a lossless copy of the video by logging into Vimeo and clicking "Download Quicktime version" at the bottom right of the page. I recommend it; MPEG artifacting is no fun.

Refactoring A Cyclomatic Complexity Module from Gary Bernhardt on Vimeo.

I make one notable mistake – I leave an unneeded reference to "code_or_node" lying around. Just imagine that I delete it at the very end. :)



My Personal Failures in Test Isolation

Oct 28, 2009

My position paper for SDTConf was about test doubles and my problems with refactoring around fully isolated tests.

Digression: Isolation

There are many colloquial definitions of "unit test". When I use the term, I'm almost always talking about a test that executes code in exactly one production class. If it collaborates, it collaborates only with test doubles like mocks, stubs, and fakes. Every test's world contains 30 or so lines of production code. If you've not heard of this, it probably sounds crazy. It's not.
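To make that definition concrete, here is a small sketch of such a test using the standard library's unittest and unittest.mock. The Greeter class and its collaborator are hypothetical, invented only to show the shape of an isolated test.

```python
import unittest
from unittest.mock import Mock


class Greeter:
    """Hypothetical production class: the only real code under test."""

    def __init__(self, mailer):
        self.mailer = mailer

    def greet(self, address):
        self.mailer.send(address, "Welcome!")


class GreeterTest(unittest.TestCase):
    def test_sends_welcome_letter(self):
        mailer = Mock()     # test double standing in for the collaborator
        address = object()  # opaque stand-in; Greeter never looks inside it

        Greeter(mailer).greet(address)

        # The test checks the conversation between objects,
        # not the collaborator's real behavior.
        mailer.send.assert_called_once_with(address, "Welcome!")


result = unittest.TextTestRunner().run(
    unittest.defaultTestLoader.loadTestsFromTestCase(GreeterTest)
)
```

No real mailer or address class is ever executed, so a failure here can only point at Greeter.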

J. B. Rainsberger found my position paper and responded to it. He quotes me:

In my TDD practice with test doubles I’ve found that, now that all code is 100% isolated, it’s almost impossible to refactor across classes with confidence unless I totally rewrite them.

J. B. replies, in part:

I interpret his comment as though that disappoints him. I invite Gary, and you, to consider an alternative interpretation:

  1. Rewriting classes, rather than refactoring them, shows good compliance with the Open/Closed Principle, which encourages me.
  2. Needing to refactor across multiple classes, as opposed to re-implementing everything behind a given interface, probably indicates a layering problem, which I’d expect to notice with duplication in the isolated tests. That encourages me, because I like it when my tests expose design flaws to me.

Abstraction Errors

J. B. is absolutely correct: this is about layers, and my failure to layer my software correctly. I've only been doing TDD for two or three years, and I still make non-trivial, multiple-class-spanning abstraction errors. The layering errors creep in across many tests and I just don't see them early enough. I can feel myself slowly getting better at this, but it's taking a while.

The trouble is that I often notice a layering error only after it already exists, and then my dilemma is "how do I fix it?" If I try to replace the hard dependency with an abstraction while avoiding a rewrite of the class, I lose confidence in my tests.

My concern is that isolation doesn't work well unless you never make certain classes of errors. There's too much coupling in this skill set! It doesn't have a reasonable entry point; you have to be a relentless jerk – as I clearly am – to break in. I want to ease people into isolation without telling them "just wait five years and you'll be fine."

Vertical Changes in Semantics

I have a closely-related example that isn't a refactoring, but displays exactly this problem. I wrote Mote, a test runner for Python. The original design had the suite class collecting all of the tests, then handing them off to the result printer.

I wanted to replace this with a pure pull process, where the printer pulls one example at a time through the entire stack, with the goal being to output the cases as they're evaluated rather than all at once. But the "push" concept pervaded most of the core classes! A context, upon being created, would recursively create its child contexts and examples. A suite, upon being created, would create its contexts. The isolated tests made this change hard! I had to rewrite many core classes to maintain confidence, and I actually never finished because it sent me into a horrible death spiral of self-doubt about isolation!
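The difference between the two shapes can be sketched in a few lines of Python. This is not Mote's code: Example, push_results, pull_results, and print_results are hypothetical names, and the whole stack is flattened to a single list so the contrast stays visible.

```python
class Example:
    """Hypothetical test example: a name plus a callable body."""

    def __init__(self, name, body):
        self.name = name
        self.body = body

    def run(self):
        try:
            self.body()
            return (self.name, "ok")
        except AssertionError:
            return (self.name, "FAIL")


def passing():
    assert 1 + 1 == 2


def failing():
    assert 1 + 1 == 3


EXAMPLES = [Example("adds", passing), Example("breaks", failing)]


def push_results(examples):
    # Push: every example is evaluated before the printer sees anything.
    return [example.run() for example in examples]


def pull_results(examples):
    # Pull: a generator, so each example runs only when the printer asks.
    for example in examples:
        yield example.run()


def print_results(results):
    # The printer is the same either way; it just iterates.
    for name, status in results:
        print(f"{name}: {status}")


print_results(pull_results(EXAMPLES))
```

With the list version nothing is printed until the last example has finished; with the generator version each line appears as soon as its example has run. The hard part described above is that this laziness has to reach through every layer, not just the outermost one.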

Digression: Mote

Mote has many more problems than this, and I consider its internals one of my greatest TDD failures. This is sort of depressing, since I failed to effectively TDD a tool that I was writing to help me do TDD. I think I know why the design ended up so bad, but that's a topic for another blog post entirely.

I have a vague notion of the answer to this push-pull problem. The suite was directly instantiating contexts, which were directly instantiating examples. While I was writing it, I could feel that it was wrong. Usually, when that feeling crops up, I know how to improve the design accordingly. In this case, for some reason, I didn't, and I pushed forward because I wanted Mote to be in a working state for my own personal use. Now I have the same problem I've had with refactorings: I've introduced a suboptimal design and I have to improve it, but I'll end up rewriting 100% of the code to change 10%.
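To make the direct instantiation concrete, here is a minimal sketch alongside one conventional alternative, injecting a factory. All names are hypothetical, it is not Mote's actual code, and it is only one possible direction rather than a worked-out answer.

```python
class Context:
    """Hypothetical collaborator."""

    def __init__(self, name):
        self.name = name


class HardWiredSuite:
    # Hard dependency: the suite names the concrete Context class itself,
    # so an isolated test cannot substitute a double without patching.
    def __init__(self, names):
        self.contexts = [Context(name) for name in names]


class Suite:
    # The dependency is now an abstraction: any callable that turns
    # a name into a context-like object.
    def __init__(self, names, make_context=Context):
        self.contexts = [make_context(name) for name in names]


# In a test, the factory can hand back anything, including a plain marker.
fake_contexts = Suite(["a", "b"], make_context=lambda name: f"fake-{name}").contexts
```

The production call site stays the same (Suite(names) still builds real Contexts), while an isolated test can pass in a double.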

I don't need to be sold on isolation or abstraction; I'm already sold, as evidenced by having written a mock library that forcefully isolates your system under test. I'm not even looking for answers to refactoring and vertical-change problems: I know that these classes of problems grow out of a lack of design skills, I know which design skills those are, and I know how to improve them.

What I'm now looking for is how to grow these skills in a person from the ground up. What if there is a way to do those nasty, vertical refactors with higher confidence? Maybe not full confidence, but enough to prevent the horrible death spiral of self-doubt? As things stand, it's very hard to help other people learn these techniques and it just bothers me.