A blog by Gary Bernhardt, Creator & Destroyer of Software

Introducing Another Wildly Ambitious Database Project

17 Feb 2007

One week ago today, I started hacking on a new project: a database implemented as a RESTful HTTP service. Brian has been pestering me to post about it since before any code was even written, so here we are.

I've been calling the project RESTDB, but that's only because I haven't come up with a better name yet. It's sort of relational, but not quite. Depending on how you squint, you might think it is. Likewise, it's not completely RESTful: it lacks arbitrary POST semantics. Despite these caveats, it's quite similar to both RDBMSes and RESTful systems. Let's have a look.

Defining the Schema

We're going to build a multi-user TODO list (basically stolen from Brian's TurboGears tutorial). As with a traditional RDBMS, the first step is to define our schema:

class User(Resource):
    email = String(key=True)
    lists = List(Link('TodoList'))

class TodoList(Resource):
    id = Integer(key=True)
    title = String()
    items = List(Link('Item'))

class Item(Resource):
    id = Integer(key=True)
    value = String()

Obviously, this looks a lot like a SQLObject table definition. One thing is very different, though: the way resources are related to each other. In SQLObject, each TodoList would have a foreign key that points to its User. In RESTDB, this is inverted: the User contains a list of links that point to TodoLists. In SQL terms, you can think of this as a list of foreign keys. This has many serious implications, both for the database's implementation and for how clients interact with it. In the interest of brevity, I will valiantly gloss over every single one of them for now.

The Data

Now that we've defined our schema, let's see what's going on from an HTTP perspective. The definition above will lead to a URL structure like this (with arbitrary example records inserted):

/User
/User/me@example.com
/TodoList
/TodoList/1
/Item
/Item/1
/Item/2
/Item/3

These resources' structures are dictated by the resource classes we defined above. Resources are stored as simple JSON data, so they're human readable even in their raw form. For example, here's what "/TodoList/1" might look like:

{
    'id': 1,
    'title': 'Groceries',
    'items': [
        '/Item/1',
        '/Item/2',
        '/Item/3',
    ]
}

It's just plain old JSON data, but it follows our schema: there's an ID, a title, and a list of item links. You don't even need a database client program to look at it; just point your web browser at the resource and you'll get back the JSON representation. You can download the whole database with wget if you want to.

The Client

Of course, this database isn't designed to be used by humans directly; human readability is just a nice bonus. wgetting your database is a neat gimmick, but what we really care about manipulating it with code.

To illustrate how simple the client is, here's the entire client-side definition for our todo list database:

c = Client('127.0.0.1:17321')
User = Resource(c, 'User')
TodoList = Resource(c, 'TodoList')
Item = Resource(c, 'Item')

That's it: all you have to do is tell it the names of the resources. Note that this doesn't mean that there aren't constraints on the data – there are! Lots of constraints – all the constraints you care to define! It's just that they're only on the server side. If you step out of line, the server will slap you with an "HTTP 400: No Shenanigans Allowed".

We'll get to the shenanigans in a minute. First, let's try the client out by creating a user:

>>> me = User.post(email="me@example.com", lists=[])
>>> me.email
u'me@example.com'
>>> me.lists
[]

All we do is POST a new user resource with an email address and no todo lists. This is literally just an HTTP POST to /User. The database responds with an HTTP "Location" header to tell the client that the new resource is at "/User/me@example.com".

Now that our user is securely fastened to the database, let's create a todo list with some items:

>>> # Create a todo list and assign it to the user
>>> my_list = TodoList.post(title="Groceries", items=[])
>>> me.lists.append(my_list)
>>> me.put()

>>> # Create some items and add them to the todo list
>>> i1 = Item.post(id=0, value="Milk")
>>> i2 = Item.post(id=1, value="Eggs")
>>> i3 = Item.post(id=2, value="Bread")
>>> my_list.items += [i1, i2, i3]
>>> my_list.put()

We POST a new TodoList, just like we POSTed a new user before. Then we have to update the user resource to point at the new list. me.lists is just a plain old Python list, so we append the new todo list, then PUT me to update the server's copy. We then repeat the same process to add items to the todo list.

Light's Green; Trap's Clean

Now that we've trapped a bunch of data in our database, let's start from scratch and pull the todo list's items back:

>>> me = User.get("me@example.com")
>>> print [item.value for item in me.lists[0].items]
[u'Milk', u'Eggs', u'Bread']

Awesome.

But wait, I've conveniently left a loose end untied! I claimed that shenanigans were strictly forbidden. So far, we've been acting nice and giving the server exactly what it wants. Now let's try to feed the server some crap:

>>> me = User.post(email="you@example.com", lists=123)
Traceback (most recent call last):
  ...
client.BadRequestError: "123" is not a list
>>> me = User.post(screw_you_server="EXPLODE!")
Traceback (most recent call last):
  ...
client.BadRequestError: Didn't expect field "screw_you_server"

It's having none of it! It will snub your stupidly-formed data all day long. And it's not just simple things like types that are enforced. You can define regex constraints for your strings, ranges for your numbers, and whatever else you can dream up. You can suffocate your precious data with constraints. Your links are guaranteed to reference valid resources; your URLs are guaranteed to match your data; your data is guaranteed to match your schema.

Everything I've shown here is real, working code. A few of the things I've mentioned, like link validity constraints, aren't done yet, but they're coming. Unfortunately, I can't point you at subversion just yet, because I don't have anywhere to host it. That will hopefully change soon, and you'll be able to prod it for yourself. For now, you'll have to make do with imagining how awesome it would be to speed up your database by sticking a plain old HTTP caching proxy in front of it.