A blog by Gary Bernhardt, Creator & Destroyer of Software

On Abstraction

16 Dec 2009

Some people seem to consider abstraction a bad word. I think that this is misguided and impedes progress – all software is abstraction. Understanding what our abstractions mean, and what makes them good or bad, is the core of design.

For now, let's define abstractions as concepts; nothing more. If it's a concept in your head, it's an abstraction. (I've tried to define the word more fully about ten times, deleting each definition in turn.)

The interesting part of abstractions is their violation. First, the textual definition of an abstraction – a class, for example – can violate itself. This happens when a class presents information at more than one level of abstraction. Here's Grady Booch, from "Object-Oriented Analysis and Design":

"[The] class Dog is functionally cohesive if its semantics embrace the behavior of a dog, the whole dog, and nothing but the dog."

It's a wonderfully terse explanation, but doesn't go far enough for our purposes because it doesn't address relationships.

Example

A Person class can have a first_name field. But should Person also have a set of address fields like street and zip_code? Probably not. These fields are part of an Address, which is a concept that exists independent of Person. Moving them into an Address class reifies this natural abstraction in our code, making it mirror the way the ideas are structured in our brains.
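As a minimal sketch of that split, here is one way it could look in Python. The dataclass layout and the sample values are my own assumptions for illustration; only the names Person, Address, first_name, street and zip_code come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Address:
    # An Address is a concept of its own, independent of any Person.
    street: str
    zip_code: str


@dataclass
class Person:
    first_name: str
    # Person holds an Address rather than loose street/zip_code fields.
    address: Address


# Sample values are made up for illustration.
home = Address(street="1 Main St", zip_code="00000")
person = Person(first_name="Ada", address=home)
```

Person now says only "I have an Address"; what an address is made of lives in one place.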

This is sort of a restatement of the Single Responsibility Principle (SRP), which is sort of a restatement of the principle of cohesion. We have many tools for thinking about this idea because it's important.

Abstractions can also be violated from outside. If an object exposes a set of fields to me, I should avoid looking into those fields' structure. In other words, I must respect the abstraction provided by the object. If I feel the need to violate the abstraction, I should reconsider where its boundaries lie and move them to match that need, rather than simply crossing them.

This is the moment when design happens: I can take the path of short-term gain by reaching into my collaborators' collaborators, or I can take the path of long-term gain by refactoring my design to match the conceptual model.

Example

Suppose I have a Person and need to tell the SnailMailer to send him mail. The SnailMailer, as currently designed, takes a street, a zip_code, etc. I could pull the data out of the address fields, like person.address.zip_code, then pass them to the SnailMailer. But in doing that, I would violate the Person abstraction.

Instead, I should step back and think about the contract of the SnailMailer. It would be better to pass in the Person's Address instead of its components. That way, I rely only on the Person abstraction (it has an Address) and the SnailMailer abstraction (it sends to addresses). I remove my dependency on the structure of a Person's Address (street, zip, etc.) and I remove my dependency on the SnailMailer's expectations about address fields (street, zip, etc.). The SnailMailer can decide how to deal with those.
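Here is a rough sketch of that contract in Python. The send method, its letter argument, the sent list and the label format are all hypothetical, invented for illustration; the point is only that the caller hands over an Address and never names its fields.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    zip_code: str


@dataclass
class Person:
    first_name: str
    address: Address


class SnailMailer:
    """Hypothetical mailer: it accepts a whole Address, not its parts."""

    def __init__(self):
        # Records (label, letter) pairs instead of really mailing anything.
        self.sent = []

    def send(self, address, letter):
        # Only the mailer knows which address fields it needs.
        label = f"{address.street}, {address.zip_code}"
        self.sent.append((label, letter))


person = Person("Ada", Address(street="1 Main St", zip_code="00000"))
mailer = SnailMailer()

# The caller relies on "a Person has an Address" and
# "a SnailMailer sends to addresses", and nothing deeper.
mailer.send(person.address, "Hello")
```

If Address later grows a country field, only Address and SnailMailer change; the calling code stays as it is.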

This is sort of a restatement of the Law of Demeter, which is sort of a restatement of part of the principle of coupling. These are symmetric with the definition side of abstraction in a pleasing way:

  • My abstraction vs. your use of it;
  • Single Responsibility Principle vs. Law of Demeter;
  • Cohesion vs. Coupling.

Most of the design principles we talk about regularly, like those listed above, are syntactic – they are properties of the text of the code. But syntax is only a means; the thing that really matters is that the semantic model of the code mirror the semantic model in our brains. Thinking about (or being preached to about) the design principles in isolation can make them feel arbitrary; it's much better to view them in the light of abstraction integrity.

Abstraction is important! The result of programming isn't simply a computation; it's also a set of ideas made concrete in a programming language. Nothing can beat the long-term business value of ideas expressed clearly in code.