BitWorking

This is Joe Gregorio's writings (archives), projects and status updates.

Why so many Python web frameworks?

When asked about the plethora of web frameworks for Python, the answer is often that it is way too easy to put one together in Python. That certainly seems plausible, since there are so many libraries that implement the components of a web framework, and if it's easy to plug those pieces together then maybe that lowers the barrier to entry for new frameworks. So let's give it a shot. We'll pick some components and spend a couple of hours seeing how far we can get building a web framework, which we'll call Robaccia.

Executive Summary: Robaccia was built in three hours and a total of 60 lines of Python code.

[Update: Added a link to the WSGI Wiki and cleaned up some typos. And yes, robaccia.py could be even shorter if I had used the mimetypes module.]

For each type of library we are going to need I will choose just one. Because I have to. Does that mean that's the library I prefer, or that the other ones are not good? No. It means I had to choose one. Please don't feel slighted if I didn't choose your favorite templating/routing/sql library.

Templating
There are quite a few templating libraries available for Python, such as Myghty, Cheetah, etc. I chose Kid: "a simple template language for XML based vocabularies".
SQL
For interfacing to the database I chose SQLAlchemy. There are others like SQLObject.
Routing
We need some way to route incoming HTTP requests to the right handlers. For this I chose Selector. Again, there are other options in the Python universe like Routes.
WSGI
WSGI, as defined by PEP 333, is the conceptual glue that holds this all together. The best way to think of WSGI is as the Java servlet API for Python. It is a standard interface between web servers and Python web applications or frameworks, to promote web application portability across a variety of web servers. You can learn more about WSGI and find servers, frameworks, middleware, etc. on the WSGI Wiki.
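
To make that interface concrete, here is a minimal sketch of a WSGI application and of calling it in-process, the way a server would. It is written for a current Python 3 interpreter rather than the Python 2.4 used in the rest of this article, and the names hello_app and call_app are made up for illustration. Only the (environ, start_response) calling convention comes from the WSGI specification.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A complete WSGI application: a callable taking environ and start_response."""
    body = ("Hello from %s" % environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]  # an iterable of byte strings


def call_app(app, path):
    """Invoke a WSGI app in-process and collect status, headers and body."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(hello_app, "/blog/")
print(status, body)
```

Everything else in this article, from the router to the views, is just another callable with this same shape.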

Now that we have all of our components let's start plugging them together.

Actually, at this point you should probably go off and run through the Django tutorial if you haven't already, to give you an idea of what we are aiming for, not that we are going to get anywhere close to the fit and finish of Django.

We're going to follow the classic model/view/controller paradigm, but in the case of web frameworks it is more like model/view/template/dispatcher, so every application will have four required files: model.py, view.py, urls.py and a templates directory. Let's throw in one more file, dbconfig.py, that allows you to set up access to your database.

What we'll do is start building a weblog application from these pieces but being very careful about what lands in the application and what becomes part of the framework. The first thing we need to create is a model, which we will do using SQLAlchemy, and capture in model.py.

model.py

from sqlalchemy import Table, Column, String
import dbconfig

entry_table = Table('entry', dbconfig.metadata,
             Column('id', String(100), primary_key=True),
             Column('title', String(100)),
             Column('content', String(30000)),
             Column('updated', String(20), index=True)
         )

Now that's a pure Python description of our model, and the configuration in dbconfig.py is equally simple.

dbconfig.py

from sqlalchemy import *

metadata = BoundMetaData('sqlite:///tutorial.db')

One of the first things you do in the Django tutorial is use such a model to actually create the tables in the database. We'll do the same here, with 'manage.py' which is the first thing in our Robaccia framework.

manage.py

import os, sys

def create():
    from sqlalchemy import Table
    import model
    for (name, table) in vars(model).iteritems():
        if isinstance(table, Table):
            table.create()

if __name__ == "__main__":
    if 'create' in sys.argv:
        create()

Which we can now use to create the database.

$ python manage.py create
$
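
The discovery trick inside create(), scanning a module's namespace for objects of a given type, can be sketched in isolation. This is a Python 3 illustration (hence items() instead of iteritems()); the Table class below is a hypothetical stand-in, not SQLAlchemy's, and fake_model plays the role of model.py.

```python
import types


class Table:
    """Hypothetical stand-in for sqlalchemy.Table; it only remembers its name."""

    def __init__(self, name):
        self.name = name


def find_tables(module):
    """Return the names of all module-level attributes that are Table instances."""
    return sorted(
        attr for attr, value in vars(module).items() if isinstance(value, Table)
    )


# Build a throwaway module object that plays the role of model.py.
fake_model = types.ModuleType("fake_model")
fake_model.entry_table = Table("entry")
fake_model.comment_table = Table("comment")
fake_model.not_a_table = "just a string"

print(find_tables(fake_model))
```

The appeal of this approach is that adding a table to model.py requires no change to manage.py.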

Now that our database table is created we can go into the Python interpreter and manipulate the data via the 'model' module. Note that we could have also gone into the interpreter to create the table, but that's not normally how you would proceed. In the interpreter session below we add two rows to the table.

$ python
Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import model
>>> i = model.entry_table.insert()
>>> i.execute(id='first-post', title="Some Title", content="Some pithy text...",
...    updated="2006-09-01T01:00:00Z")

>>> i.execute(id='second-post', title="Moving On", content="Some not so pithy words...",
...    updated="2006-09-01T01:01:00Z")

>>>
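
For readers who want to see what this amounts to at the database level, here is a rough sketch using Python 3's standard sqlite3 module in place of SQLAlchemy. The CREATE TABLE statement and the index name are my assumptions about an equivalent schema; the DDL that SQLAlchemy actually emits may differ.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Assumed SQL equivalent of entry_table; not SQLAlchemy's literal output.
conn.execute(
    """
    CREATE TABLE entry (
        id      VARCHAR(100) PRIMARY KEY,
        title   VARCHAR(100),
        content VARCHAR(30000),
        updated VARCHAR(20)
    )
    """
)
conn.execute("CREATE INDEX ix_entry_updated ON entry (updated)")

# The same two rows inserted in the interpreter session above.
rows = [
    ("first-post", "Some Title", "Some pithy text...", "2006-09-01T01:00:00Z"),
    ("second-post", "Moving On", "Some not so pithy words...", "2006-09-01T01:01:00Z"),
]
conn.executemany(
    "INSERT INTO entry (id, title, content, updated) VALUES (?, ?, ?, ?)", rows
)

titles = [title for (title,) in conn.execute("SELECT title FROM entry ORDER BY updated")]
print(titles)
```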

Now we have a model with some data in it, time to introduce the URLs and the views. The urls.py file contains information on how the incoming requests are to be routed to views, and view.py contains all those view targets.

urls.py

import selector
import view

urls = selector.Selector()
urls.add('/blog/', GET=view.list)
urls.add('/blog/{id}/', GET=view.member_get)
urls.add('/blog/;create_form', POST=view.create, GET=view.list)
urls.add('/blog/{id}/;edit_form', GET=view.member_get, POST=view.member_update)

Selector maps URIs to views. If an incoming request has a URI that matches then the request gets dispatched to the associated handler. Both Selector and the handler are WSGI compliant objects, which will make plugging all this together much easier.
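
To show the idea behind that dispatch step, here is a toy router written for Python 3. It is a sketch under my own assumptions, not how Selector is implemented: TinyRouter and show_entry are hypothetical names, and the only detail borrowed from Selector is that matched URL variables are placed in environ['selector.vars'].

```python
import re


class TinyRouter:
    """Toy URL dispatcher; illustrative only, not Selector's implementation."""

    def __init__(self):
        self.routes = []  # list of (compiled pattern, {method: app})

    def add(self, template, **methods):
        # Turn '/blog/{id}/' into a regex with a named group for 'id'.
        parts = re.split(r"\{(\w+)\}", template)
        regex = "".join(
            "(?P<%s>[^/]+)" % part if i % 2 else re.escape(part)
            for i, part in enumerate(parts)
        )
        self.routes.append((re.compile("^" + regex + "$"), methods))

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for pattern, methods in self.routes:
            match = pattern.match(path)
            if match is None:
                continue
            app = methods.get(environ.get("REQUEST_METHOD", "GET"))
            if app is None:
                start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
                return [b"Method Not Allowed"]
            # Mirror the 'selector.vars' key that view.py reads.
            environ["selector.vars"] = match.groupdict()
            return app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]


def show_entry(environ, start_response):
    entry_id = environ["selector.vars"]["id"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [entry_id.encode("utf-8")]


router = TinyRouter()
router.add("/blog/{id}/", GET=show_entry)

statuses = []
body = router(
    {"PATH_INFO": "/blog/first-post/", "REQUEST_METHOD": "GET"},
    lambda status, headers: statuses.append(status),
)
print(statuses, body)
```

Because the router is itself a WSGI callable, a server cannot tell it apart from a single application.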

view.py


import robaccia
import model

def list(environ, start_response):
    rows = model.entry_table.select().execute()
    return robaccia.render(start_response, 'list.html', locals())

def member_get(environ, start_response):
    id = environ['selector.vars']['id']
    row = model.entry_table.select(model.entry_table.c.id==id).execute().fetchone()
    return robaccia.render(start_response, 'entry.html', locals())

def create(environ, start_response):
    pass
def create_form(environ, start_response):
    pass
def member_edit_form(environ, start_response):
    pass
def member_update(environ, start_response):
    pass

Note that in the above code only list() and member_get() are implemented.

In my first implementation the view handlers did the rendering of the templates themselves and then put everything together to fit into the WSGI model, but that was just repeated code in every view, so it got factored out into our second piece of Robaccia:

robaccia.py

import kid
import os

extensions = {
    'html': 'text/html',
    'atom': 'application/atom+xml'
}

def render(start_response, template_file, vars):
    # Split on the last dot only, so 'a.b.html' still yields 'html'.
    ext = template_file.rsplit(".", 1)
    contenttype = "text/html"
    if len(ext) > 1 and (ext[1] in extensions):
        contenttype = extensions[ext[1]]

    template = kid.Template(file=os.path.join('templates', template_file), **vars)
    body = template.serialize(encoding='utf-8')

    start_response("200 OK", [('Content-Type', contenttype)])
    return [body]

The render() function looks at the extension of the template and uses that to determine what to use as the content-type. Then the template and variables are passed into Kid to be processed. The whole thing is processed and returned in a way that conforms to WSGI. Here is the list.html template:

list.html

<?xml version="1.0" encoding="utf-8"?>
<html xmlns:py="http://purl.org/kid/ns#">
<head>
<title>A Robaccia Blog</title>
</head>
<div py:for="row in rows.fetchall()">
<h2>${row.title}</h2>
<div>${row.content}</div>
<p><a href="./${row.id}/">${row.updated}</a></p>
</div>
</html>
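
Stepping back to render() for a moment: its extension lookup can be pulled out into a small helper, sketched here for Python 3. The function name content_type_for is hypothetical, and using os.path.splitext is my substitution for the string split in robaccia.py. The mapping itself is the one from that file.

```python
import os

# Same mapping as in robaccia.py.
EXTENSIONS = {
    "html": "text/html",
    "atom": "application/atom+xml",
}


def content_type_for(template_file, default="text/html"):
    """Pick a Content-Type from the template's final extension."""
    ext = os.path.splitext(template_file)[1].lstrip(".")
    return EXTENSIONS.get(ext, default)


for name in ("list.html", "feed.atom", "notes.txt", "README"):
    print(name, "->", content_type_for(name))
```

Unknown or missing extensions fall back to text/html, which matches the default in render().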

So let's take stock of where we are: urls.urls is a WSGI compliant application that looks at the incoming calls and dispatches to the WSGI compliant applications listed in view.py. Each of those in turn uses the model in model.py and passes the results through templates in the templates directory to generate the responses.

Now all we need to do is run the code. Since we are dealing with WSGI applications we can use wsgiref. Let's add a 'run' option to manage.py.

manage.py

import os, sys

def create():
    from sqlalchemy import Table
    import model
    for (name, table) in vars(model).iteritems():
        if isinstance(table, Table):
            table.create()

def run():
    import urls
    if os.environ.get("REQUEST_METHOD", ""):
        from wsgiref.handlers import BaseCGIHandler
        BaseCGIHandler(sys.stdin, sys.stdout, sys.stderr, os.environ).run(urls.urls)
    else:
        from wsgiref.simple_server import WSGIServer, WSGIRequestHandler
        httpd = WSGIServer(('', 8080), WSGIRequestHandler)
        httpd.set_app(urls.urls)
        print "Serving HTTP on %s port %s ..." % httpd.socket.getsockname()
        httpd.serve_forever()

if __name__ == "__main__":
    if 'create' in sys.argv:
        create()
    if 'run' in sys.argv:
        run()

The run() function looks at the environment variables to determine if it is being run as a CGI application; otherwise it runs the application under its own server on port 8080.
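
The CGI branch can be tried out without a web server by handing the handler in-memory streams. The following Python 3 sketch assumes a trivial stand-in application, demo_app, in place of urls.urls, and looks_like_cgi is a hypothetical helper that repeats the REQUEST_METHOD check from run().

```python
import io
from wsgiref.handlers import BaseCGIHandler


def demo_app(environ, start_response):
    """Stand-in for urls.urls; any WSGI application would do."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def looks_like_cgi(environ):
    """Same test run() uses: a CGI environment carries REQUEST_METHOD."""
    return bool(environ.get("REQUEST_METHOD", ""))


# Pretend to be a CGI request by giving the handler in-memory streams.
fake_environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/blog/"}
stdout = io.BytesIO()
if looks_like_cgi(fake_environ):
    BaseCGIHandler(io.BytesIO(), stdout, io.StringIO(), fake_environ).run(demo_app)

response = stdout.getvalue()
print(response.decode("iso-8859-1"))
```

The captured bytes begin with a CGI-style Status line followed by the headers, a blank line, and the body, which is what a web server expects to read from a CGI script.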

$ python manage.py run
Serving HTTP on 0.0.0.0 port 8080 ...

Point your browser at http://localhost:8080/blog/ and you should get the blog's main page, the list.html template filled in with the two entries we put in the system earlier. That's it, our application is running and our framework is functional.

And what if we want to run our application via CGI? That file is just a few lines long:

main.cgi

#!/usr/bin/python2.4
import manage
manage.run()

Summary

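So what do we have here? A set of conventions for how to lay out files in a directory:

model.py
The data model, expressed as SQLAlchemy tables.
view.py
The views, each one a WSGI application.
urls.py
A Selector instance that maps URIs to those views.
templates
A directory of Kid templates.
dbconfig.py
The database connection settings.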

Beyond those files, which actually implement our example web service, we have manage.py, main.cgi, and robaccia.py: the sum total of our framework code, which comes to about 60 lines. That's not a lot of glue code to bring together four powerful libraries like SQLAlchemy, Kid, Selector, and wsgiref. And because we used WSGI throughout, we can easily plug in WSGI pieces that handle authentication, caching, logging, etc.
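
As an illustration of how such a piece plugs in, here is a minimal logging middleware sketched for Python 3. The names request_logger and blog_app are hypothetical, and blog_app merely stands in for urls.urls. A real deployment would more likely pick an existing middleware component than write its own.

```python
def request_logger(app, log):
    """Wrap a WSGI app so each request's method, path and status land in `log`."""

    def middleware(environ, start_response):
        def recording_start_response(status, headers, exc_info=None):
            log.append(
                (environ.get("REQUEST_METHOD"), environ.get("PATH_INFO"), status)
            )
            return start_response(status, headers, exc_info)

        return app(environ, recording_start_response)

    return middleware


def blog_app(environ, start_response):
    """Hypothetical stand-in for urls.urls."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


log = []
wrapped = request_logger(blog_app, log)
body = wrapped(
    {"REQUEST_METHOD": "GET", "PATH_INFO": "/blog/"},
    lambda status, headers, exc_info=None: None,
)
print(log, body)
```

The wrapped object is again a plain WSGI callable, so it could be passed to the server in place of urls.urls without touching any other file.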

Now let's be clear also about what we do not have when compared to Django. We don't have an instant admin interface, we don't have generic views, automatic form generation, automatic form handling, the Django community, bug tracking, IRC, etc, etc.

What I want to draw your attention to is the touch-points between the major components. How much code did we have to write to make the data model consumable by Kid templates? None. How much translation code did we have to write to hook our WSGI views into Selector? None. And how much code did we have to write to pull information out of URLs and use them in pulling information out of our model? About one line:

id = environ['selector.vars']['id']

The nice part about the ocean of components that exists for building Python web frameworks is that the same is true for all of them: they would only require a small amount of glue code. Our little framework would be about the same size if I had instead chosen SQLObject, Cheetah and Routes.

Oh yeah, did I tell you why I chose the name Robaccia? It means trash in Italian. It's a throwaway. So go on, get out of here, go work on one of the dozens of already established web frameworks for Python.

2006-09-05