BitWorking

This is Joe Gregorio's writings (archives), projects and status updates.

RESTify DayTrader

Let's RESTify DayTrader:

DayTrader is benchmark application built around the paradigm of an online stock trading system. Originally developed by IBM as the Trade Performance Benchmark Sample, DayTrader was donated to the Apache Geronimo community in 2005. The application allows users to login, view their portfolio, lookup stock quotes, and buy or sell stock shares.

Why build a RESTful web service for DayTrader? Because I frequently hear that REST can't be applied to complex situations. I also want to use the example as motivation for talking about some of the idioms that are available to handle more extensive requirements.

Here are the Business Operations that are offered by the DayTrader system.

Business Operations:

login
logout
buy
sell
getMarketSummary
queueOrder
completeOrder
cancelOrder
orderCompleted
getOrders
getClosedOrders
createQuote
getQuote
getAllQuotes
updateQuotePriceVolume
getHoldings
getHolding
getAccountData
getAccountProfileData
updateAccountProfile
register
resetTrade

We will create a RESTful interface into this system.

Let's pull the nouns out of the desciptions above.

Before we go further let's quickly review collections, a concept not only used in the APP but also in Rails (pfd):

Method URI                   Description
GET    /people               list members
POST   /people               create member
GET    /people/1             retrieve member
PUT    /people/1             update member
DELETE /people/1             delete member

That review will come in handy because lots of DayTrader can be viewed as collections. For example, here are the things you can do with an Order:

create an open_order
get all open orders
cancel an open order
get all closed orders

Now that's just a restriced collection. It also appears that there are two types of orders, open and closed, and each of those types of orders fit into a collection model. That is, there is a collection of open orders, and a collection of closed orders. The closed orders collection is also a read-only collection.

Similarly Quotes also fall neatly into a collection model.

Holdings look like a collection with a read-only interface, i.e. you can only list and retrieve collection members.

Accounts, not surprisingly, can also be viewed as a collection.

I'm going to gloss over the representations used in our service at this time. You could use Atom or JSON, and since things look like they are forming into collections then you could use APP or JEP to implement your collections.

We will also ignore authentication and authorization as they are orthogonal to our protocol. I.e. our protocol would work the same whether we used Digest, Basic over HTTPS, or Google Auth.

So what we have so far is that all of our objects can be organized into five collections. Let's give them a URI structure:

Accounts        /{acct_id}
Orders (Open)   /{acct_id}/open_orders/{order_id}
Orders (Closed) /{acct_id}/closed_orders/{order_id}
Quote           /{acct_id}/quotes/{quote_id}
Holdings        /{acct_id}/holdings/{holding_id}

We can see how these operations line up with the Business Operations given earlier. Note that profile data is considered another view into an account.

URI Method Collection Operation Business Operation
/acct/ POST accounts create register
/acct/{acct_id} GET accounts retrieve getAccountData
/acct/{acct_id};profile GET profile retrieve getAccountProfileData
/acct/{acct_id};profile PUT profile update updateAccountProfile
/{acct_id}/open_orders/ POST orders create buy/sell/queueOrder
/{acct_id}/open_orders/{order_id} DELETE orders delete cancelOrder
/{acct_id}/open_orders/ GET orders list getOrders
/{acct_id}/closed_orders/ GET orders list getClosedOrders
/{acct_id}/quotes/ POST quotes create createQuote
/{acct_id}/quotes/{quote_id} GET quotes retrieve getQuote
/{acct_id}/quotes/ GET quotes list getAllQuotes
/{acct_id}/quotes/{quote_id} PUT quotes update updateQuotePriceVolume
/{acct_id}/holdings/ GET holdings list getHoldings
/{acct_id}/holdings/{holding_id} GET holdings retrieve getHolding

So at least functionally we have everything covered.

Note that this also gives some insight into functionality that might be missing, and the obvious place to put that functionality. For example, on accounts we only have 'create' and 'retrieve'. There is no way to currently delete or update an account, the obvious place to locate that functionality would using DELETE or PUT on /acct/{acct_id} respectively.

So this rather complex interface boils down to five collections. Now let's dig into the tough stuff.

Orders should be "reliable"

This is rather weighty stuff, and you don't want a single order missed, nor do you want the same order entered twice. First realize that this sort of reliability is only needed for open_orders, and we dont' need it for accounts, quotes, closed_orders, or holdings.

What guranatees we are looking for?

  1. A single order is only added once.
  2. If there is a failure we can try again.

The answer is suprisingly easy and revolves around making sure each new order is a unique request. For example, if we were using HTML forms then we could create 'tickets' on the server side that get included as a hidden input in the form for creating a new order. If an order creation request is received with a ticket you've already seen then you can discard the request as a duplicate. As long as the client keeps trying to submit the form until it is successful then you have met the requirements.

Here is the method applied in a way that's amenable to HTML forms:

Client                       'ticket' collection
  -------------- POST --------------->
  <------------ ticket ---------------
Client                      'open_orders' collection
   -------------- POST ------------->
            (order + ticket)
   <--------- 201 Created -----------
       (Location: open order URI)

Of course, this works for forms since the user can always press the 'back' button for a request that has failed and re-submit, or press F5-Refresh and say "yes" to the little dialog that asks if you want to re-submit the POST data. Now that is necessary since HTML only supports GET and POST, and a browser has no idea if the POST you happen to be doing is idempotent. Another variation on this is to incorporate the ticket into the URI that the form POSTs to, and not place it in a hidden form control.

Since we are creating a web service we can actually construct the service a little differently, instead of creating a ticket that goes into the data to be submitted, we can use the ticket to construct a 'pending_order', the URI of which the client will PUT the order to. Upon a successful PUT the pending_order will be moved to the open_orders collection and the client can be redirected to its location via a (303 See Other) response.

Client                       'pending order' collection
  -------------------- POST --------------->
  <--------------- 201 Created -------------
          (Location: pending order)
Client                      'pending order'
   -------------- PUT ------------->
                (order)
   <--------- 303 See Other --------
       (Location: open order URI)

This is better since PUT is idempotent and you can keep trying until the request is successful. Once the PUT is successful the client is redirected to get the newly created resource via GET from the URI returned in the Location: header of the response.

That's the simplified high-level overview. For that to really work we need to dig into some details.

Details

To our list of collections we need to add the 'pending_orders' collection. To create a pending order you POST to the collection, which creates a new pending_order, and the URI of that newly created order is returned in the Location: header of the 201 Created response.

URI Method Collection Operation Business Operation
/{acct_id}/pending_orders/ POST pending_orders create create pending order
/{acct_id}/pending_orders/{ticket} PUT pending_orders update update empty pending order

Note that I've used {ticket} instead of {pending_order_id}, since {ticket} is unique and to reinforce where the id of pending_order collection members come from.

Uniqueness

For a ticket scheme to work you need your tickets to have these four properties:

  1. They must be unique.
  2. Require zero server side storage for 'open' tickets.
  3. Tickets must not be forgeable.
  4. Successfully used tickets must be stored, and associated with the created order.

Obviously #1 follows from our requirements. Property #2 comes from the fact that we may be facing either a malicious or poorly written client and we want to avoid a resource exhaustion attack where a client requests many new tickets. If an open ticket, one that hasn't been used yet, requires storage on the server side then you open yourself up to such a resource exhaustion attack. Note that we say the successfully used tickets get stored, which is different from storing all open tickets, since we are recording a successful order when we do that, not just creating a throwaway ticket.

To create such a storageless ticket we need to create something unique, such as a GUID, UUID, etc. But we also need to determine real tickets from bogus tickets, so we need to form them so that only the server can generate and validate them. So we will start with a UUID. To make sure only the server can generate tickets, we will add a hash to the end of our ticket, and that hash value will be formed using the UUID and a secret only the server knows.

In Python:


uuid = uuid.uuid5(uuid.NAMESPACE_DNS, 'bitworking.org')
secret = "some_secret_key_only_the_server_knows"
B = uuid + ":" + secret
hash = sha.sha(B).hexdigest()
ticket = uuid + ":" + hash

We can use this ticket as the member id in our pending_orders collection and can validate it easily on the server by using B = uuid + ":" + secret to recreate B and then use that B to validate the supplied hash. If that is good and we haven't seen that ticket before then we can proceed to create the order.

Retry

Any response from the server is good, as it gives the client information about the request. So the only real problem is if a request fails without getting a response from the server. In that case we would like to be able to re-try the operation. If we retry we would like to be assured that if the previous request was successful, but we didn't receive the response, a duplicate order will not be created.

It looks like there are four possible states we can be in after sending a request to create a new order:

  1. We didn't get a response
  2. Order created (303 See Other)
  3. Order was created during a previous request. (303 See Other)
  4. Some othere response with a status code indicating the request was invalid, malformed, unauthorized, etc.

From the clients perspective they see only three possible states. A non-response, which means that they can try the operation again, since PUT is indempotent. In the case of a successful, or duplicate, request the client will be directed to the corresponding open_order. In all other cases the client will have to do 'the right thing' based on the HTTP status code.

The Ticket Collection

We get a benefit from treating tickets as members of a collection. Each has a URL, and even though we never store the unused tickets, we can still return a 200 for valid open tickets, and 404s for tickets that have actually been used (remember that we have to store them). We can return a 200 for valid open tickets since the server can validate the ticket and can then confirm that the ticket has not been previously used. We could also allow DELETE on a pending_order to invalidate them, and we could allow GET on the pending_order collection to retrieve a list of all pending_orders.

For more reading on this subject you can look at the variations on the above theme in:

Hopefullly what you will take away from this example is not the explicit utility of building a RESTful interface to DayTrader, but that REST can be applied to complex scenarios, that the collection model can make such modelling easier, and that HTTP does offer mechanisms for reliability. I hope it goes without saying that now that you have a RESTful interface you can start taking advantage of other HTTP mechanisms like caching, etags, and gzip.

Good article, thank you. On my current RESTful project, I think we have achieved the ideal, of which I am really thrilled. The complete set of state information for each individual client is passed to the server with each invocation to the server. Sounds like too much data flying back and forth, but after we finished the infrastructure, it turned out not to be true. The JSON data is a manageable size, and transaction states do not need to be saved. When I designed and wrote the back end framework, I strictly adhered to this philosophy, and found that I did not need a 'pending' transaction queue. The Dojo front end receives complete, updated state data from the server after each transaction. Therefore the front end knows if the transaction needs to be queued and retried. I realize that this implementation is front-end heavy, and maybe Dojo makes this design more suitable. But it works out very well for this app, and maybe other transaction based apps as well? I could not imagine trying to queue and retry front end transactions without a tool such as Dojo. Using just HTTP protocol seems to push more of the state data for every client down to the server, which means much more complexity. I'd love for you to talk more about how you resolve this problem in HTTP. I am sure the problem varies based on data set, but hearing about this would be interesting. Thanks for opening this conversation. Gloria

Posted by gloriajw on 2007-06-16

Gloria,

Thanks, I'm glad you liked the article.

I could not imagine trying to queue and retry front end transactions without a tool such as Dojo. Using just HTTP protocol seems to push more of the state data for every client down to the server, which means much more complexity. I'd love for you to talk more about how you resolve this problem in HTTP. I am sure the problem varies based on data set, but hearing about this would be interesting.

I'd love to hear more specifics about the application you're building, or at least the specific scenario you're talking about, can you provide more details or provide a link to something online?

Posted by Joe on 2007-06-16

Oh, I wish I could, but I'm under NDA. We will get to brag and blog about it soon, but not right now. Let's say your transaction fails in the middle of posting to the server. How do you recover from this state using just HTTP, without storing pending or partial transaction info for every client, on the server? We solved this problem with Dojo, which is robust enough to determine state info and queues transactions on the front end. This means the server treats every transaction as a complete entity, with state info and all. If it fails, the onus is on the front end to queue, retry, and remember it's state. The global server state is always reported up, of course, to all clients. How do you solve it without the heavy js front end, and without the complex back end? Thanks, Gloria

Posted by gloriajw on 2007-06-16

Nice post indeed. I do, however, have a few remarks on how this REST mapping could be improved.

First off, us Rails folk have gone back to using slash instead of semicolon for various reasons. But that's a matter of preference. And similarly, I would go for accounts instead of acct.

Without knowing too much about how DayTrader works, I think it is safe to assume you can't just place an order under someone else's account. Because you already get the account information from the authentication, prefixing actions with acct_id is probably redundant.

As for the reliability and tickets, the easiest way to implement this is to just let the client generate the UUID as the ticket only needs to be unique in the scope of the account. Otherwise, I'd probably prefer "muxing" them as proposed in the third link you posted on the subject.

If for some reason it would be desirable to generate tickets through a different path, doing so through a separate /tickets collection is, in my opinion, cleaner because it is more decoupled and that way the same ticket interface could be reused for other resources too. And you're not actually creating a pending_order when sending a POST to /pending_orders, you just want a token.

Finally, instead of having a separate collection for every order status I would strongly recommend placing them all under /orders and implementing a ?status GET switch on the list action. When an order gets closed, it does not actually become a different order and hence it should retain the same URI. This would also allow you to get the status of a specific order by sending a GET to it's location.

Thanks to you (and Thijs, through our blog over at Fingertips) for giving me some interesting material to ponder late at night.

Posted by Norbert Crombach on 2007-06-16

Norbert,

Because you already get the account information from the authentication, prefixing actions with acct_id is probably redundant.

Each person's account is a separate resource and should have a separate URI. This allows me to do things like distribute load among servers, use a proxy cache in front of my main server, and to even change the auth mechanism on an account by account basis. Also, having what amounts to an accounts collection allows you add in new operations, such as deleting an account in their obvious places (DELETE /{acct}/), which might be done by an administrative user. This is something I addressed in the article when I said that the RESTful modeling helps your design.

As for the reliability and tickets, the easiest way to implement this is to just let the client generate the UUID as the ticket only needs to be unique in the scope of the account.

Except for the part where I addressed that in the article when I said I didn't trust the client.

And you're not actually creating a pending_order when sending a POST to /pending_orders, you just want a token.

Actually, I am creating a new resource when I post to /pending_orders, one that accepts PUTs. It's not just the token, it's the whole PUT idempotence thing, something I addressed repeatedly in the article.

Finally, instead of having a separate collection for every order status I would strongly recommend placing them all under /orders and implementing a ?status GET switch on the list action.

That is one way to do it. But that has the drawback of making the semantics a little muddled since you can delete open orders but can't delete closed orders.

Posted by joe on 2007-06-16

Joe,

Each person's account is a separate resource and should have a separate URI. ... Also, having what amounts to an accounts collection allows you add in new operations, such as deleting an account in their obvious places (DELETE /{acct}/), which might be done by an administrative user.

Absolutely. The accounts themselves are obviously resources and should be treated as such.

What I meant was that specifying the acct_id does not make sense for quotes, as quotes are not linked to a specific account. For resources like orders and holdings, this could be done either way, since they should have some kind of system wide unique identifier.

This allows me to do things like distribute load among servers, use a proxy cache in front of my main server, ...

As REST is inherently stateless it should be possible to do so in most cases, regardless of which URI layout is used.

Except for the part where I addressed that in the article when I said I didn't trust the client.

The only mention of this I can find is related to the "zero server side storage" requirement, which is satisfied by what I proposed. Because the tickets are meant as a protection for the client and not so much for the server I believe it is the responsibility of the client to handle them properly. Then again, for sensitive applications like stock trading, it might indeed be a good idea not to trust the client side implementation.

Actually, I am creating a new resource when I post to /pending_orders, one that accepts PUTs. It's not just the token, it's the whole PUT idempotence thing, something I addressed repeatedly in the article.

Perhaps I should have read those parts a little better before responding the first time, especially the piece under "The Ticket Collection". You're right, the pending_orders do really act like "normal" resources.

The point about decoupling still holds true. And I would much prefer to POST to /orders with a ticket than using the /pending_orders collection for creating a new order. That way you can return a 201 for the newly opened order as well.

That is one way to do it. But that has the drawback of making the semantics a little muddled since you can delete open orders but can't delete closed orders.

This is something I have a very strong opinion on. Sending a DELETE to a closed order should just return a 422 or so, that's not really a problem. Persistent URIs are extremely important for workable REST style architectures though.

Posted by Norbert Crombach on 2007-06-16

Thank you for an excellent article. It helped to clarify several things for me. I think that I am finally wrapping my head around the REST approach. However, I have an implementation issue (in Rails) on which I would appreciate feedback.

What is the recommended way to handle views of the data? For example, suppose you were going to present the quotes data in a list view, a summary view, and a detail view. The underlying data is the same, but the presentation is different. How would you form the URI to indicate the different views? Would you simply append a variable to the URI indicating the desired view? (e.g. something like: /quotes?view=list). I am leaning toward this approach since search will be implemented as /quotes;search?q=searchterm (unless you have a better suggestion for this as well :)

I haven't yet found an approach that feels right.

Posted by Mark on 2007-06-18

Joe,

When I posted before, I realized that the REST philosophy as it was taught to me was not correct. I now know that the REST philosophy only applies to the communication between front and back end, and not the state handling of each.

Realizing this, I understand why you have this pending transaction server implementation. But I still wonder if it's more robust in this particular model to push the state handling into the front end using js, and use the back end as the validation/processing without storing partial/incomplete individual client transactions.

This way there is no cleanup/recovery time for for the server to find pending transactions, match them up with new ones, handle them correctly. A simple check for dups based on session id and front-end generated unique transaction id happens, and then it's done. If the transaction fails, the server records nothing, much the same as the philosophy behind a database transaction.

I am curious to know your thoughts regarding this, since my judgment has been skewed by some outstanding Dojo developers.

Thank you,
Gloria

Posted by gloriajw on 2007-06-18

Mark,

I don't use Rails myself, but I can point you to work I've done in Robaccia where I use a trailing ";{word}" on the path to select different views.

Posted by joe on 2007-06-18

Gloria,

I am going to restrict my comments to this particular example of once-and-only-once creation. For a more general write-up of benefits of doing things RESTfully versus a more traditional RPC approach I'll point to an older article of mine: REST and WS-*.

This way there is no cleanup/recovery time for for the server to find pending transactions...

That doesn't exist in the pending_order solution either. The members of the 'pending_order' collection are merely algorithmic, they require no storage on the server, they are merely single points in an infinite URI space.

Posted by joe on 2007-06-19

Excellent article. Given that you're a fan of collection resources, too, I wonder what your opinion is on overlapping sets - e.g. if I have {acct_id}/orders (representing all orders for acct_id), I might also have /orders (representing all orders for all accounts). Now what's the best way to handle the fact that /orders/13 might be the same as /stefan/orders/47? (Or should that rather be /orders/47 and stefan/orders/47?) Should there be two URIs for the same resources, or should one redirect to the other? Currently, I'm using the approach of having {acct_id}/orders return a list of orders that include links to /orders/{id}.

Posted by Stefan Tilkov on 2007-06-19

The arguments I've made in other venues (and have heard others make) focus more on how the URL is (or is not) suited for encoding complex data, not whether a REST service can handle a complex set of methods. The above is a good example of clean REST application, but the data you have to deal with just isn't that complex, either in terms of what you send or what you receive in response. But when REST-enthusiasts turn away tools like XML Schema and/or WSDL (never mind that they're looking at their own description language in WADL currently), you might also end up writing new code for each project just to handle the data<->XML transitions.

Also, I have to say that while I'm sure there are WS-* fans who are vehemently anti-REST, most that I've dealt with directly feel that both tools are useful; use the tool that fits the job, don't try to fit the job to the tool. The only people I've encountered who insist that the other side is useless crap are certain REST folks.

(I'm not saying this article pushes that agenda, like I said earlier I think it's a good illustration of tackling a richer API with REST. And I did see the earlier post about the snide cracks in the W3C article, but I wouldn't say that's the same as the person who said to me, "anyone trying to convince you to ever use SOAP or WS-* is only trying to sell you something.")

Posted by Randy on 2007-06-19

Randy,

The arguments I've made in other venues (and have heard others make) focus more on how the URL is (or is not) suited for encoding complex data, not whether a REST service can handle a complex set of methods.

I don't believe I've encouraged, in this article, or anywhere else, the encoding of all information in the URI. Believing that REST is somehow defined by packing information into a URI is a rather simplistic, and wrong, strawman.

The above is a good example of clean REST application, but the data you have to deal with just isn't that complex, either in terms of what you send or what you receive in response.

I initially tackled this example because a frequent complaint against REST was that it was only good for simple examples. This is a moderately complex example. I have no intention of getting dragged into an endless game of guess-what-I-think-is-complex-this-week.

never mind that they're looking at their own description language in WADL currently

Watch who you're painting with that broad brush, I'm not a fan of WADL.

Also, I have to say that while I'm sure there are WS-* fans who are vehemently anti-REST, most that I've dealt with directly feel that both tools are useful; use the tool that fits the job, don't try to fit the job to the tool.

I think you missed a previous article of mine.

Posted by joe on 2007-06-19

Stefan,

Now what's the best way to handle the fact that /orders/13 might be the same as /stefan/orders/47?

I would try to avoid having more than one URI for the same resource if possible, but there's probably cases where it can't be avoided. It's one of those things where you may have to weigh keeping the interface rational versus cachability. I.e. having more than one URI may make sense for a specific application, but the more URIs there are the worse it is for caching.

Currently, I'm using the approach of having {acct_id}/orders return a list of orders that include links to /orders/{id}.

Yes, that's a good solution. Now that doesn't mean that a list of orders can't contain the complete information for each order, just that the list should also include the URI of each order resource. Balancing the granularity of your resources is another application specific exercise.

Posted by joe on 2007-06-19

2007-06-15