Extra Cheese

A Blog


On Abstraction

Dec 16, 2009

Some people seem to consider abstraction a bad word. I think that this is misguided and impedes progress – all software is abstraction. Understanding what our abstractions mean, and what makes them good or bad, is the core of design.

For now, let's define abstractions as concepts; nothing more. If it's a concept in your head, it's an abstraction. (I've tried to define the word more fully about ten times, deleting each definition in turn.)

The interesting part of abstractions is their violation. First, the textual definition of an abstraction – a class, for example – can violate itself. This happens when a class presents information at more than one level of abstraction. Here's Grady Booch, from "Object Oriented Analysis and Design":

"[The] class Dog is functionally cohesive if its semantics embrace the behavior of a dog, the whole dog, and nothing but the dog."

It's a wonderfully terse explanation, but doesn't go far enough for our purposes because it doesn't address relationships.

Example

A Person class can have a first_name field. But should Person also have a set of address fields like street and zip_code? Probably not. These fields are part of an Address, which is a concept that exists independent of Person. Moving them into an Address class reifies this natural abstraction in our code, making it mirror the way the ideas are structured in our brains.
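Here is a minimal sketch of that split in Python. The field names first_name, street, and zip_code come from the example above; the use of dataclasses and the sample values are my own assumptions, chosen only to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class Address:
    # An Address is its own concept; it exists independent of any Person.
    street: str
    zip_code: str


@dataclass
class Person:
    # Person stays at one level of abstraction: it has an Address
    # instead of carrying street and zip_code fields itself.
    first_name: str
    address: Address


# Sample values are illustrative only.
home = Address(street="1 Main St", zip_code="12345")
person = Person(first_name="Ada", address=home)
```

The address details still exist, but Person no longer has to know what they are.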

This is sort of a restatement of the Single Responsibility Principle (SRP), which is sort of a restatement of the principle of cohesion. We have many tools for thinking about this idea because it's important.

Abstractions can also be violated from outside. If an object exposes a set of fields to me, I should avoid looking into those fields' structure. In other words, I must respect the abstraction provided by the object. If I feel the need to violate the abstraction, I need to reconsider how to modify the boundaries to match that need, rather than violating the boundaries by crossing them.

This is the moment when design happens: I can take the path of short-term gain by reaching into my collaborators' collaborators, or I can take the path of long-term gain by refactoring my design to match the conceptual model.

Example

Suppose I have a Person and need to tell the SnailMailer to send him mail. The SnailMailer, as currently designed, takes a street, a zip_code, etc. I could pull the data out of the address fields, like person.address.zip_code, then pass them to the SnailMailer. But in doing that, I would violate the Person abstraction.

Instead, I should step back and think about the contract of the SnailMailer. It would be better to pass in the Person's Address instead of its components. That way, I rely only on the Person abstraction (it has an Address) and the SnailMailer abstraction (it sends to addresses). I remove my dependency on the structure of a Person's Address (street, zip, etc.) and I remove my dependency on the SnailMailer's expectations about address fields (street, zip, etc.). The SnailMailer can decide how to deal with those.
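A rough sketch of the redesigned contract, in Python. Everything beyond the names Person, Address, SnailMailer, street, and zip_code is hypothetical: the send method, the send_greeting helper, and the sent list (which records mail instead of delivering it) are assumptions made so the sketch can run on its own.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    zip_code: str


@dataclass
class Person:
    first_name: str
    address: Address


class SnailMailer:
    """Hypothetical mailer that records what it would send."""

    def __init__(self):
        self.sent = []

    def send(self, address, message):
        # Only the mailer looks inside the Address; it decides which
        # fields it needs and how to format them.
        label = f"{address.street}, {address.zip_code}"
        self.sent.append((label, message))


def send_greeting(person, mailer):
    # The caller relies on two facts only: a Person has an Address,
    # and a SnailMailer sends to Addresses. It never reaches into
    # person.address.zip_code.
    mailer.send(person.address, f"Hello, {person.first_name}!")
```

If Address later gains or renames a field, send_greeting does not change; only the SnailMailer does.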

This is sort of a restatement of the Law of Demeter, which is sort of a restatement of part of the principle of coupling. The usage side of abstraction is symmetric with the definition side in a pleasing way:

  • My abstraction vs. your use of it;
  • Single Responsibility Principle vs. Law of Demeter;
  • Cohesion vs. Coupling.

Most of the design principles we talk about regularly, like those listed above, are syntactic – they are properties of the text of the code. But syntax is only a means; the thing that really matters is that the semantic model of the code mirror the semantic model in our brains. Thinking about (or being preached to about) the design principles in isolation can make them feel arbitrary; it's much better to view them in the light of abstraction integrity.

Abstraction is important! The result of programming isn't simply a computation; it's also a set of ideas made concrete in a programming language. Nothing can beat the long-term business value of ideas expressed clearly in code.



Showing 3 comments

Posted by Steve Howell at Wed Dec 16 15:39:55 2009

I agree with your points, but I think the examples do not bring out the trickier issues when it comes to abstraction.  Pulling Address out of Person is kind of a no-brainer.

Dog is also an easy target.  Of course, the Dog class should express dogness, and only dogness, but to paraphrase you, the simplicity breaks down as soon as you introduce relationships.  If the Dog has an Owner, and that Owner is a Person, and the Ownership relationship affects both dog and person, which abstractions are important to make explicit in code?

Abstractions need to be cohesive, for sure, but I've found that you still get lots of churn on otherwise well-written code when the boundaries get blurry.  In the case of modelling the connection between Dog and Owner, your first pass at design might just make a somewhat arbitrary decision as to which class owns (no pun intended) the relationship. 

Often the best decision is driven by practical considerations that are more holistic than the individual classes.  You can have forces that go beyond each class taken in isolation, like how often the Dog and Owner classes get used elsewhere, which side of the relationship is one-to-many, how those classes get persisted to a database, which class is heavier to begin with, etc.

You can resolve the conundrum by introducing a new DogOwnership class, but this never solves your problem completely, because Dog and Owner still have a mutual relationship, even if it is just through a class.

A final thing that drives your level of abstraction is the language itself, of course.  Abstractions need to carry their weight, whether that "weight" is syntactic or semantic.  You are more likely to create useful abstractions in a language that has lightweight syntax, for example.  I suppose you are also more likely to create useless abstractions, too!


Posted by Gary Bernhardt at Wed Dec 16 15:54:09 2009

I agree, Steve, and I do apologize for the crappy examples. You know that I'm bad at coming up with them. :)

I'd like to write more about the subtleties you describe, but I find topics like that very difficult to explain without getting buried in boring details. This is probably a fault in my own writing skills. :)


Posted by Steve Howell at Thu Dec 24 12:58:02 2009

The most powerful abstractions for large scale systems, and even small scale systems, are often not traditional objects.

You also have protocols, document models, and relational databases, just to name a few.

It would be interesting to hear your thoughts on those.

When I worked at MerchantLink, we processed over a billion transactions per year through our credit card gateway, and we supported a multitude of vendors and partners, so you can imagine how large the codebase had grown over more than a decade.  Obviously, a lot of care and craftsmanship went into the individual modules.  When financial transactions break, believe me, you hear about them quickly from customers.

The strength of our architecture, though, was our ability to break the system into a series of components that interacted with each other through protocols, whether those protocols were public (HTTP, etc.) or proprietary (our secret sauce).

Arguably the most powerful abstraction ever invented is HTML.  The structure of the markup language itself influences the conceptual structure of websites.  Since HTML gets produced as a linear document, I think its weakness is that it doesn't present a powerful abstraction in terms of positioning elements with respect to each other (i.e. relationships).  And CSS does not do much better.

Having worked with you on a Django project, I also know that you are aware of the tension between the relational database model and OO in terms of defining abstraction.  The fact that relational databases strive to be, in fact, "relational" suggests that they are attempting to solve the hard problems.  It is open for debate how well they do it.  Tables are useful abstractions, but they do not have "behavior" per se, so they are not true objects, and most ORMs do little more than provide some sugar on top of the inherent relational model.

Hope that's food for thought.  It seems that OO works best for medium-sized components within a greater context of large-system architecture.  Most large systems I've worked on in the last 15 years or so eventually combine protocols, markup language, and relational databases.  Many small problems are more easily solved with functional programming than OO.  Objects usually get driven out in code that falls within the range of 100 to 10,000 lines.  If you push objects too far, it probably means you are not thinking enough about protocols.

