BitWorking

Gloves

Last time I talked about the creation of Robaccia I got to the point of a working framework and just waved my hands and said you could keep going and "just" add conventions. I have pointed out that "just" is a dangerous word, so let's walk through the rest of the steps to building a Rails/Django-like web framework.

Update: Just so there's no confusion, the title of this post comes from The Complicator's Gloves.

The key point of adding 'conventions' is to take a load off the user. You need to actually remove two kinds of load, cognitive and manual. Cognitive load is the number of concepts you need to hold in your head. The fewer the number of concepts, and the more uniformly they are applied, the easier the system will be to use. Manual load is just the amount of manual stuff, like typing, that you need to do. Why should I have to manually create a directory structure when a computer is capable of doing that?

Motivation

Before diving into lots of code and details let's clarify what I'm trying to accomplish here. I am not trying to write Rails for Python. My mission in creating this software, and this write-up, is to lay out the core ideas of constraints, cognitive load, and manual load, and how to apply them to create a Rails/Django-like web framework, in the hopes that it is helpful, so that you can use it to create your own Rails-like framework in your language of choice.

In this example I make some design decisions that might seem a bit extreme. They are. You probably should make different choices for your web framework based on your programming language and problem domain.

The Happy Path

I am going to borrow a term from testing, Happy Path, and use it in the context of creating our conventions. We need to pick a happy path when using our framework, and we need to knock down as many barriers, and make things as easy as possible, as long as users stick to that happy path.

Design

If you are familiar with Ruby on Rails or Django then the development story should be familiar. Here's the core of our design document. Creating an employee application should consist of the following steps:

    $ robaccia createproject myproject 
    $ cd myproject 
    $ robaccia addmodelview employees # Creates  view, templates, and model.
    $ gvim ./models/employees.py      # Add table columns.
    $ robaccia createdb
    $ robaccia run
    $ firefox http://localhost:8080/employees/

Of course, we may already have a model in place, and want to just create another view on that model. Adding a view-only collection should be as simple as:

    $ robaccia addview employees
    $ robaccia run
    $ firefox http://localhost:8080/employees/

Preliminaries

Currently Robaccia consists of a couple modules and a bunch of conventions for how to lay out a project. They include the files:

To accommodate more complex projects we'll update most of those files to be directories, and add another directory for log files:

    /views
    /templates
    /models
    /logs

We'll leave urls.py as a file, but rename it to dispatcher.py, because while it does match based on the incoming URI, it also matches on the request method, so let's name it for what it does.

So let's look at the directory structure we have for a Robaccia-based application:

    views/
    models/
    templates/
    logs/
    dispatcher.py

And if we add a new resource to our application we distribute the model, view, and template files under each of those directories.

    views/name.py
    models/name.py
    templates/name/list.html
    templates/name/retrieve.html
    ...
    logs/

Now Robaccia, using selector, allows you to dispatch to any view from any form of URI and any method. We need to simplify that, by adding constraints, and guide those constraints using a conceptual model. You can't possibly be surprised that I would choose a RESTful collection as an organizing principle. In particular, we'll assume that everything you want to create will fit into wsgicollection. We will, obviously, build in an escape hatch, but RESTful collections will be our happy path.

In general, using the notation of selector, we are looking at URIs of the form:

 /...people/[{id}][;{noun}] 

And dispatching requests to URIs of that form to functions with nice names:

  GET    /people               list()
  POST   /people               create()
  GET    /people/1             retrieve()
  PUT    /people/1             update()
  DELETE /people/1             delete()
  GET    /people;create_form   get_create_form()
  GET    /people/1;edit_form   get_edit_form()
  

We'll wrap all those target functions up into a single class and make instances of those classes a WSGI application:

from wsgicollection import Collection

class People(Collection):
    # GET /people/
    def list(self, environ, start_response):
        pass

    # POST /people/
    def create(self, environ, start_response):
        pass

    # GET /people/1
    def retrieve(self, environ, start_response):
        pass

    # PUT /people/1
    def update(self, environ, start_response):
        pass

    # DELETE /people/1
    def delete(self, environ, start_response):
        pass

    # GET /people/;create_form
    def get_create_form(self, environ, start_response):
        pass

    # POST /people/1;comment_form
    def post_comment_form(self, environ, start_response):
        pass
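
The mapping from request method and URI shape to function name can be sketched in a few lines. This is only an illustration of the convention in the table above, not wsgicollection's actual code; the helper name function_name() and its arguments are made up for this example.

```python
# Hypothetical helper: pick the member function name for a request,
# following the table above. Not taken from wsgicollection itself.
COLLECTION_METHODS = {"GET": "list", "POST": "create"}
MEMBER_METHODS = {"GET": "retrieve", "PUT": "update", "DELETE": "delete"}


def function_name(method, member_id=None, noun=None):
    """Return the name of the function to call, or None if nothing fits."""
    method = method.upper()
    if noun:
        # GET /people;create_form      -> get_create_form
        # POST /people/1;comment_form  -> post_comment_form
        return "%s_%s" % (method.lower(), noun)
    if member_id:
        # A request aimed at a single member of the collection.
        return MEMBER_METHODS.get(method)
    # A request aimed at the collection as a whole.
    return COLLECTION_METHODS.get(method)
```

A collection class could then look the name up on itself with getattr() and answer with an error status when the function is missing, which is presumably why removing a member function is enough to stop handling that request.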

So if a collection is our organizing principle, what about our URI structure? Let's keep it simple and presume everything will fall into a simple URI template:


/{view:alnum}/[{id:unreserved}][;{noun:unreserved}]

That is, each {view} represents a collection, and {view}/{id} is a member of that collection. That's very simple and we have introduced a number of constraints by restricting ourselves to this URI structure:

Content type
We presume that this collection has a single content type, such as HTML, or JSON. There is no provision for having a single collection serve up content in different media types. If you want to handle different media types then you can create two views that reference the same underlying model.
Nesting
The namespace is very flat, not allowing nesting of resources beyond the simple collection. For example, if you were creating a blogging application, you couldn't create the main blog as a collection of entries, /blog/{id}, and then have a collection of comments for each blog entry, à la /blog/{id}/comments/{commentid}, but you could have a collection that worked on all comments received, /comments/{id}. Again, you can certainly do the nested collections for comments with the tools we are going to supply, but that is not on the happy path.
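
To make the URI template concrete, here is a rough regular-expression equivalent. It is a sketch only: I'm assuming 'alnum' means ASCII letters and digits and 'unreserved' means the RFC 3986 unreserved characters, and parse_path() is a name invented for this example, not part of wsgidispatcher.

```python
import re

# Assumption: 'alnum' is ASCII letters and digits, and 'unreserved' is the
# RFC 3986 unreserved set (letters, digits, '-', '.', '_', '~').
URI_PATTERN = re.compile(
    r"^/(?P<view>[A-Za-z0-9]+)/"           # /{view:alnum}/
    r"(?P<id>[A-Za-z0-9._~-]+)?"           # [{id:unreserved}]
    r"(?:;(?P<noun>[A-Za-z0-9._~-]+))?$"   # [;{noun:unreserved}]
)


def parse_path(path):
    """Return a dict with 'view', 'id' and 'noun', or None if no match."""
    match = URI_PATTERN.match(path)
    return match.groupdict() if match else None
```

Under these assumptions /people/1;edit_form yields the view 'people', the id '1' and the noun 'edit_form', while a nested path such as /blog/1/comments/2 does not match at all, which is the flat namespace constraint in action.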

Also, in the intervening time since Robaccia was initially released I've written my own version of selector, wsgidispatcher, so we'll switch to using that. The activity around Kid has diminished and moved mostly to Genshi, so we will also migrate to Genshi for templating.

RESTful collections, and a highly constrained URI scheme, are the concepts we're using to reduce cognitive load. On the manual load side, we'll build some command line tools to automate the generation of stubs and directories.

Implementation

To fulfill our design we'll take the current Robaccia through a number of incremental steps. The first step is modularization.

Modularize robaccia

We need to convert Robaccia from a set of conventions and a few lines of code into a library of its own. First, we consolidate all the code into a single installable module. We'll also create a stub for the 'robaccia-admin' program, the one we'll eventually use to create the skeleton for a project.

 
   robaccia.py => robaccia/__init__.py # controversial in some quarters
   wsgidispatcher.py => robaccia/wsgidispatcher.py
   wsgicollection.py => robaccia/wsgicollection.py
   mimetypes.py => robaccia/mimetypes.py

The only change that this requires in our code is a change in the imports for wsgicollection and wsgidispatcher.

Now let's create a setup.py so our new module can be installed. This uses the built-in distutils library, and the configuration file is really a Python program.

[setup.py]

Note that this setup file not only installs the robaccia library, but also the 'robaccia-admin' program.

robaccia-admin

We want to create a program that we can run that takes commands as arguments, just like 'svn' or 'bzr'. In addition we want help, and also, like 'bzr', a way to make the command set extensible. At the very least, we want to make adding new commands to 'robaccia-admin' easy. The simplest thing that could possibly work is to have the command name match a function name in the module, and just look them up on the fly.

robaccia-bin

import sys

members = globals()

if __name__ == "__main__":
    try:
        cmd = sys.argv[1]
    except IndexError:
        cmd = "help"
    args = sys.argv[2:]
    # Unknown, private, or non-callable names all fall back to "help".
    if (cmd not in members or cmd.startswith("_")
            or not callable(members[cmd])):
        cmd = "help"
    members[cmd](args)

Note that if a command isn't found in 'members', then we assume the command "help". That gets us running commands, but how do we do help? We need two kinds of help, a short message on how to use a command, used when we request a list of commands, and a longer help description for when we want help on a single command. Also, I don't want to write a separate help file.

In Python if the first thing in your method is a string, then that string is available via the __doc__ attribute. We'll put in place a convention that the first line of the doc string is a short description, and the whole doc string will be used for help on that command. So, as an example, here is the implementation for the 'commands' command. Run 'robaccia-admin commands' to get a list of all the commands that the robaccia program supports.

robaccia-bin

def commands(args):
    """robaccia commands     

List all known commands that robaccia knows.
"""
    for name in members:
        if not name.startswith("_") and callable(members[name]):
            print members[name].__doc__.splitlines()[0]
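
The 'help' command itself isn't listed here, so the following is a guess at how it could work under the same docstring convention rather than the code that ships with Robaccia. The names short_help(), long_help() and help_text() are made up for this sketch, and it takes an explicit dictionary of commands instead of globals() so it can be tried on its own.

```python
import inspect


def commands(args):
    """robaccia commands

    List all known commands that robaccia knows.
    """


def short_help(func):
    """First line of the docstring: the one-line usage summary."""
    doc = inspect.getdoc(func) or func.__name__
    return doc.splitlines()[0]


def long_help(func):
    """The whole docstring, with its indentation cleaned up."""
    return inspect.getdoc(func) or "No help available for %s." % func.__name__


def help_text(members, name=None):
    """Long help for one command, or one summary line per command."""
    public = dict((n, f) for n, f in members.items()
                  if not n.startswith("_") and callable(f))
    if name in public:
        return long_help(public[name])
    # Unknown or missing command name: fall back to the full list.
    return "\n".join(short_help(public[n]) for n in sorted(public))
```

Calling help_text({'commands': commands}, 'commands') returns the full docstring, while leaving the name out returns just the first line of each command's docstring.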

So now we can add the 'createproject' command. The 'createproject' command just creates a directory structure populated with some default files. Looking back into 'setup.py' you can see the 'package_data' parameter, which is a list of non-source files that can be packaged with the library. We will stuff a skeleton project directory structure in there and copy it over when we create a project. We'll leverage Python's ability to introspect here, as each module has a __file__ attribute which is the location of the source file, and we will use that to construct a path to the template files.

robaccia-bin

def createproject(args):
    """robaccia createproject <name>
Creates a new project directory structure.
The target directory must not exist.
"""
    try:
        name = args[0]
    except IndexError:
        sys.exit("Error: Missing required parameter <name>.")
    if os.path.exists(name):
        sys.exit("Error: Directory '%s' already exists" % name)
    # robaccia.__file__ points at robaccia/__init__.py; step up one level
    # to reach the skeleton packaged under robaccia/templates/project.
    template_dir = os.path.abspath(
        os.path.join(robaccia.__file__, "..", "templates", "project")
    )
    shutil.copytree(template_dir, name)

addview

At this point let's skip to our second scenario and implement the 'addview' command. We'll go back later and add the 'addmodelview' command.

We'd like to keep the cognitive load as small as possible. If you look at the original implementation of Robaccia, each view had to render its own template:

!! This is the old way, what we're trying to replace !!


import selector
import view
urls = selector.Selector()
urls.add('/blog/', GET=view.list)
urls.add('/blog/{id}/', GET=view.member_get)

!! This is the old way, what we're trying to replace !!


def list(environ, start_response):
    rows = model.entry_table.select().execute()
    return robaccia.render(start_response, 'list.html', locals())

WSGICollection helps by adding an idiom and removing the need to map each request URI and method to a WSGI application, but we can do better. We can take robaccia.render() and presume that every renderer will conform to that signature. Since we have a convention for how files will be laid out, we can do even more: we can look up the template to render using the name of the WSGICollection that was called and the name of the member function that was called. The only piece of information we are missing is the file extension of the template, so we will have to pass that in also.

What we want, for example, is that if we add a collection 'fred':

   /views/fred.py
   /templates/fred/list.html
   /templates/fred/retrieve.html

Then a GET to /fred/ will end up calling views.fred.app.list(), and the template /templates/fred/list.html will be rendered.

That's fine default behavior, but we need a way to signal whether we want the default rendering to occur, or whether the collection view member function has decided to override the default behavior and do its own processing.

In normal processing a wsgicollection member function returns an iterable. We can look for things besides iterables to indicate that we should look for a template and render it. The simplest thing is to look for a dictionary, since that is what you pass into a template to get rendered. We can also accept 'None' in the response and convert that into some sort of acceptable dictionary to be passed into the templating engine.

Here is the implementation for DefaultCollection, which implements the above design:

defaultcollection.py

from wsgicollection import Collection
import os
class DefaultCollection(Collection):
    def __init__(self, ext, renderer):
        Collection.__init__(self)
        self._ext = ext
        self._renderer = renderer
    def __call__(self, environ, start_response):
        response = Collection.__call__(self, environ, start_response)
        if response == None:
            if self._id:
                response = {'id': self._id}
            else:
                response = {}
        if isinstance(response, dict):
            view = environ['wsgiorg.routing_args'][1].get('view', '.')
            template_file = os.path.join(view, self._function_name + "." + self._ext)
            return self._renderer(environ, start_response, template_file, response)
        else:
            return response

Note that we hand off most of the work to wsgicollection.Collection, and then if the response is a dictionary we look up the template and pass the name into the renderer. The only other thing to note is that the constructor now takes two new arguments, the template filename extension, and the renderer.
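
The return-value convention is easier to see when pulled out of the class. Here is a stand-alone sketch of just that decision; finish_response() and its simplified renderer signature (template file and dictionary only, no environ or start_response) are inventions for this example, not part of Robaccia.

```python
import posixpath


def finish_response(response, view, function_name, ext, render, member_id=None):
    """Apply the convention: None or a dict means 'render the default template'."""
    if response is None:
        # Nothing returned: build the smallest dictionary that makes sense.
        response = {"id": member_id} if member_id else {}
    if isinstance(response, dict):
        # The file layout tells us which template to use:
        # templates/<view>/<function name>.<ext>
        template_file = posixpath.join(view, function_name + "." + ext)
        return render(template_file, response)
    # Anything else is taken to be a normal WSGI iterable and passed through.
    return response
```

So a list() function on the 'fred' view that returns None ends up rendering fred/list.html with an empty dictionary, and a function that returns its own iterable is left alone.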

DefaultCollection now vastly simplifies our views if we just want the default templates rendered:

from robaccia.defaultcollection import DefaultCollection
from robaccia import render

class Collection(DefaultCollection):
    # GET /{view}/
    def list(self, environ, start_response):
        pass

    # GET /{view}/{id}
    def retrieve(self, environ, start_response):
        pass

    # PUT /{view}/{id}
    def update(self, environ, start_response):
        pass

    # DELETE /{view}/{id}
    def delete(self, environ, start_response):
        pass

    # POST /{view}/
    def create(self, environ, start_response):
        pass

app = Collection('html', render)

To stop handling a particular URI, such as deleting a member of the collection, just remove the associated member function.
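
The 'addview' command isn't listed in this post. As a rough idea of what it has to do, here is a sketch that writes the three stub files into a project directory. The stub contents, the 'root' parameter and the error message are all assumptions made for the example, and unlike the Python 2 listings here it is written for Python 3.

```python
import os
import tempfile

# Assumed stub contents; the real command may generate something different.
VIEW_STUB = """\
from robaccia.defaultcollection import DefaultCollection
from robaccia import render

class Collection(DefaultCollection):
    def list(self, environ, start_response):
        pass

    def retrieve(self, environ, start_response):
        pass

app = Collection('html', render)
"""

TEMPLATE_STUB = "<html><body><p>Hello World!</p></body></html>\n"


def addview(name, root="."):
    """Write stub files for a view-only collection; return the paths created."""
    stubs = [
        (os.path.join("views", name + ".py"), VIEW_STUB),
        (os.path.join("templates", name, "list.html"), TEMPLATE_STUB),
        (os.path.join("templates", name, "retrieve.html"), TEMPLATE_STUB),
    ]
    created = []
    for relative, content in stubs:
        path = os.path.join(root, relative)
        if os.path.exists(path):
            raise SystemExit("Error: '%s' already exists" % relative)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(content)
        created.append(relative)
    return created


if __name__ == "__main__":
    # Try it out in a throwaway directory rather than a real project.
    with tempfile.TemporaryDirectory() as project:
        for relative in addview("fred", root=project):
            print(" created %s" % relative)
```

The point is only that the file layout convention makes the command almost trivial: given a name, every path is already decided.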

run

We are almost done with our first phase. All we need now is a way to run our application. Since we based Robaccia on WSGI we can use wsgiref to run our application under a local web server for development purposes.

robaccia-admin

def run(args):
    """robaccia run
Start running the application under
a local web server.
"""
    from dispatcher import app
    from wsgiref.simple_server import WSGIServer, WSGIRequestHandler
    robaccia.init_logging()
    httpd = WSGIServer(('', 3100), WSGIRequestHandler)
    httpd.set_app(app)
    print "Serving HTTP on %s port %s ..." % httpd.socket.getsockname()
    httpd.serve_forever()

Note again we know the WSGI app to load because of our file layout conventions. The last piece of the puzzle is how dispatcher.py routes the incoming requests to the right view.

dispatcher.py

from robaccia.wsgidispatcher import Dispatcher
from robaccia import deferred_collection
app = Dispatcher()
app.add('/{view:alnum}/[{id:unreserved}][;{noun:unreserved}]', deferred_collection)

A dispatcher.py is copied into every project so that you can customize it later if you want to do something besides the default everything-is-a-collection convention. Any URI templates added before the one that's already there will match first.

It's deferred_collection that does our lookup of the view to call.

deferred_collection()

def deferred_collection(environ, start_response):
    """Look for a views.* module to handle this incoming
    request. Presumes the module has 
    an 'app' that is a WSGI application."""
    # Pull out the view name from the template parameters
    view = environ['wsgiorg.routing_args'][1]['view']
    # Load the named view from the 'views' directory
    m = __import__("views." + view, globals(), locals())
    # Pass along the WSGI call into the given application
    logging.getLogger('robaccia').debug("View: %s" % view)
    return getattr(getattr(m, view), 'app')(environ, start_response)

We construct the view name from the incoming path and then load that module dynamically. Of course a little error handling, like returning a 404 if the module isn't found, would be good, but you get the idea.
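
For what it's worth, here is one way that error handling might look. This is a sketch, not the code in Robaccia: it is written for Python 3 with importlib, and the not_found() helper and the 'package' parameter are made up for the example.

```python
import importlib


def not_found(environ, start_response):
    """Smallest possible 404 response (hypothetical helper)."""
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found\n"]


def deferred_collection(environ, start_response, package="views"):
    """Dispatch to <package>.<view>.app, or answer 404 if there is no such view."""
    view = environ["wsgiorg.routing_args"][1]["view"]
    try:
        module = importlib.import_module(package + "." + view)
    except ImportError:
        # No such view module. Note that this also hides import errors
        # raised inside an existing view, so log it in real code.
        return not_found(environ, start_response)
    app = getattr(module, "app", None)
    if app is None:
        # The module exists but doesn't follow the 'app' convention.
        return not_found(environ, start_response)
    return app(environ, start_response)
```

Because the URI template only lets alphanumeric view names through, the name can be handed to the import machinery without further checking.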

So that finishes it, our first phase is complete. If we include a couple simple templates for 'list.html' and 'retrieve.html' then we can run the second scenario with this code:

joe@joe-laptop:~$ robaccia-admin createproject myproject
joe@joe-laptop:~$ cd myproject/
/home/joe/myproject
joe@joe-laptop:~/myproject$ robaccia-admin addview fred
 created views/fred.py 
 created templates/fred/list.html 
 created templates/fred/retrieve.html 
joe@joe-laptop:~/myproject$ robaccia-admin run
Serving HTTP on 0.0.0.0 port 3100 ...
localhost - - [24/May/2007 11:14:49] "GET /fred/ HTTP/1.1" 200 116

Pointing our browser at http://localhost:3100/fred/ gets us:

Web page: Hello World!

Phase Two - Model

Now let's get cracking on adding in the database.

If you know me, you know I'm not a huge fan of relational databases, I prefer my stores more scalable and my columns sparse, but let's not talk about that now. All of the next generation web frameworks rely on a relational database and this section will show how to do just that.

The very first thing we are going to need is a configuration file for the database. We'll call it dbconfig.py and throw it into the default project files so it is always present.

On top of the files we added for 'addview' we will add in a model, and some template code for adding and editing collection members.

DefaultModelCollection does everything that DefaultCollection does, but now we have a 'model' to keep track of. Dictionaries now flow from the model into the templates, and there is also a parser that turns incoming request bodies into dictionaries.

defaultmodelcollection.py

from wsgicollection import Collection
import os
from robaccia import http200, http405, http404, http303
class DefaultModelCollection(Collection):
    def __init__(self, ext, renderer, parser, model):
        Collection.__init__(self)
        self._ext = ext
        self._renderer = renderer # converts dicts to representations
        self._model = model
        self._parser = parser     # converts representations to dicts
        self._repr = {}           # request representation as a dict()
    def __call__(self, environ, start_response):
        response = Collection.__call__(self, environ, start_response)
        if environ['REQUEST_METHOD'] in ['PUT', 'POST']:
            self._repr = self._parser(environ)
        if response == None:
            primary = self._model.primary_key.columns.keys()[0]
            view = environ['wsgiorg.routing_args'][1].get('view', '.')
            template_file = os.path.join(view, self._function_name + "." + self._ext)
            method = environ.get('REQUEST_METHOD', 'GET')
            if self._id:
                if (method == "POST" and "_method" in self._repr
                        and self._repr["_method"] in ["PUT", "DELETE"]):
                    method = self._repr["_method"]
                    del self._repr["_method"]
                if method == 'GET':
                    result = self._model.select(self._model.c[primary]==self._id
                       ).execute()
                    row = result.fetchone()
                    if None == row:
                        return http404(environ, start_response)
                    data = dict(zip(result.keys, row))
                    return self._renderer(environ, start_response, template_file, 
                          {"row": data, "primary": primary})
                elif method == 'PUT':
                    self._model.update(self._model.c[primary]==self._id
                        ).execute(self._repr)
                    return http303(environ, start_response, self._id)
                elif method == 'DELETE':
                    self._model.delete(self._model.c[primary]==self._id).execute()
                    return http303(environ, start_response, "./")
                else:
                    print method
                    return http405(environ, start_response)
            else:
                if method == 'GET':
                    result = self._model.select().execute()
                    meta = self._model.columns.keys()
                    data = [dict(zip(result.keys, row)) for row in result.fetchall()]
                    return self._renderer(environ, start_response, template_file, 
                       {"data": data, "primary": primary, "meta": meta})
                elif method == 'POST':
                    self._model.insert(self._repr).execute()
                    return http303(environ, start_response, ".")
        else:
            return response

Note that the constructor takes a renderer, a parser, and a model. The renderer takes dictionaries and turns them into response bodies. The parser takes incoming request bodies and turns them into dictionaries. We already have an HTML renderer; all we need to do form processing is something that takes incoming 'application/x-www-form-urlencoded' data and converts it into a dictionary, which is easy to come by.
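
As an illustration of how little such a parser needs, here is a sketch written for Python 3's standard library (the listings in this post are Python 2). The names form_parser(), effective_method() and fake_post() are mine, not Robaccia's, and the sketch assumes UTF-8 form data and ignores repeated field names.

```python
import io
from urllib.parse import parse_qsl


def form_parser(environ):
    """Turn an application/x-www-form-urlencoded request body into a dict."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    body = environ["wsgi.input"].read(length).decode("utf-8")
    # If a field is repeated, the last value wins in this simple version.
    return dict(parse_qsl(body, keep_blank_values=True))


def effective_method(environ, form):
    """Let an HTML form tunnel PUT or DELETE through POST via '_method'."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST" and form.get("_method") in ("PUT", "DELETE"):
        # Remove the marker so it never reaches the model.
        method = form.pop("_method")
    return method


def fake_post(body):
    """Build a minimal environ for trying the parser out (demo only)."""
    return {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
```

The second function mirrors the '_method' check in DefaultModelCollection above, which exists because browsers will only submit forms with GET or POST. In that class the override is only applied to requests aimed at a single member.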

We need to update our 'view' to handle the 'edit' and 'new' forms, which just means adding 'get_edit_form()' and 'get_new_form()' functions to the view. The templates are also pretty simple. Here is the updated 'list.html':

list.html

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:py="http://genshi.edgewall.org/"  >
<head>
<title>Fred</title>
</head>
<body>
<p><a href=";new_form">New</a></p>
<p py:for="row in data">
  <p py:for="key, value in row.iteritems()">
    <b>$key</b>: $value
  </p>
  <a href="${row[primary]}">${row[primary]}</a>
  <hr/>
</p>
</body>
</html>

And now we're able to meet our initial design:

joe@joe-laptop:~$ robaccia-admin createproject myprj
joe@joe-laptop:~$ cd myprj/
/home/joe/myprj
joe@joe-laptop:~/myprj$ robaccia-admin addmodelview employees
 created views/employees.py 
 created models/employees.py 
 created templates/employees/list.html 
 created templates/employees/retrieve.html 
 created templates/employees/get_edit_form.html 
 created templates/employees/get_new_form.html 
joe@joe-laptop:~/myprj$ vim models/employees.py
joe@joe-laptop:~/myprj$ cat models/employees.py
from sqlalchemy import Table, Column, Integer, String
import dbconfig
table = Table('employees', dbconfig.metadata,
        Column('id', Integer(), primary_key=True),
        Column('name', String(250)), 
        Column('title', String(250)),
        Column('office_number', Integer()),
        )

joe@joe-laptop:~/myprj$ robaccia-admin createdb
joe@joe-laptop:~/myprj$ robaccia-admin run
Serving HTTP on 0.0.0.0 port 3100 ...

The database is initially empty. Click on New to create a new member in the collection.

Web page: New

Creating a new employee.

Web page: Form for creating a new employee

Our collection list after adding a couple of new employees. Note the link to individual employees.

Web page: List of two added employees

From an individual collection member we can edit and delete the employee.

Web page: A single employee

A form for editing the employee.

Web page: Editing a single employee

Lessons

We came up with a rather simple database-to-web framework, but in reality we haven't lost any flexibility over the original Robaccia. All the pieces of URI dispatching, views, and templates are present; all we did was pave the happy path.

We paved the happy path by embracing the following constraints:

  1. All resources fit into a collection.
  2. A single URI path structure.
  3. A fixed model interface: SQLAlchemy.
  4. A fixed project layout.
  5. WSGI as the method of communication between components.
  6. A single media-type for any single view. (You can have different views with different media-types that use the same model.)
  7. Not really mentioned explicitly, but all the configuration was done through Python files.

Observations

The code so far has renderers and parsers for HTML and HTML forms, but we could easily add support for JSON or Atom.

There isn't much that ties us to SQLAlchemy; just DefaultModelCollection and the initial files put down by 'robaccia addmodelview'. We could easily switch out to another ORM, or even drop the RDBMS and move to another kind of data store.

There is a whole slew of stuff that hasn't been done like handling changes to the database, deployment, security, large collections that require paging, etc. Those are all solvable problems, and not the point of this exercise, which was to demonstrate applying constraints to reduce cognitive and manual load to create a Rails/Django-like web framework.

The code is now available on code.google.com. If you run it under Python 2.5 you will only need to install SQLAlchemy and Genshi. For Python 2.4 you will also need to install wsgiref.

I'm not quite to the point in my projects where I can leverage WSGI directly, so I've done something very similar within TurboGears. Take a look: http://www.graffitiweb.org/blog/2007/05/25/restresource-03-crud-for-turbogears/ or directly to the code at: http://microapps.googlecode.com/svn/restresource/tags/0.3/ One thing, I've been thinking about is a slight tweak to your WSGICollection url scheme where /col/1;edit_form should be /col;1/edit_form and /col/add_form. The advantage to this URL scheme is that context becomes a lot more clear cut. A post to "./" from the forms will always go to the right place--and that way the form doesn't need to know where it lives, so to speak.

Posted by sky on 2007-05-25

2007-05-24