Google App Engine

I've just started as a Developer Advocate for Google App Engine and there's been a lot of talk of "lock-in" recently. Google will have more to say officially in the future, as obviously my blog isn't the place to read about Google's official position, but I would like to point out the following things that have come up in conversation that people didn't know:

The development kit is open sourced.
All of the documentation for App Engine is CC licensed.
Much of the underlying technology has been described in academic publications.
Some of that underlying technology has been open sourced, such as protocol buffers.

Also, you can see that other issues are on the roadmap to be addressed, such as bulk upload and download.

I've played around with App Engine and have an app that I use for tracking my time. You are correct that you aren't technically locked in, but from a practical point of view, you are. For example, GFS and BigTable are well described and even have some open source implementations. But it would take a lot of effort for the average development shop to replicate those technologies. So if your application really was getting a high volume of usage it would be a lot of effort to port it to using a "more standard" backend like Oracle or MySQL.

Posted by Dave Tauzell on 2008-11-06

It would certainly suit Google to have available a version of dev_appserver that could conceivably be used for production — pointing to the "ported to EC2" nonsense (as I've had Googlers on the AppEngine team do) just seems disingenuous.

You could let someone else do it, but that doesn't inspire anywhere near the same confidence that a Google-maintained project would. It would be quite helpful for testing — because of the lack of concurrency some types of applications are simply unusable on dev_appserver. A hypothetical prod_appserver would have to include:

A superficial clone of "Google Frontend", reverse-proxying a maintained pool of wsgi servers
A datastore clone that would allow some concurrent access: it doesn't have to be fast (it'd be best if it performed with the same pokey profile as the real one). Network transparency would be nice but not urgently needed.
The ability to connect to a native memcached server
A realistic replacement for the Users API, preferably OpenID-based (it'd be nice to not have to make a bunch of Google Accounts for testing)

It wouldn't have to be anywhere near as polished or 'scalable' as the real AppEngine: just close enough for real testing, just scalable enough to respond to simultaneous requests, just supported closely enough to appear sincere.

Posted by Fred Blasdel on 2008-11-06

If the App Engine team really can't see how incredibly proprietary App Engine appears to outsiders, y'all have a big problem. I've had a couple of discussions about this with folks at Google, and I'm pretty disappointed at the disingenuous tone about lock-in I've gotten so far. The general answer seems to be "There's no lock-in: it's almost all Open Source!"

Unfortunately, the "lock-in" problem has less to do with open source than it does with the simple fact that App Engine is drastically different from any other sort of application hosting available anywhere.

It's not that it's hard to take an application written against App Engine elsewhere; it's literally impossible:

Nothing else on the planet (well, nothing approaching production quality) will start WSGI servers given app.yaml, nor is there a spec or a schema describing how some hypothetical app.yaml server would work.
The datastore API runs against something (BigTable) that I can't buy at any price from anywhere but Google.
Worse, the non-relational-database world is even more proprietary and incompatible than the relational world. If I decide to ditch Oracle I've got to rewrite a bunch of SQL queries... but at least both use SQL! I literally would have to rewrite my application from scratch if I wanted to replace App Engine's datastore with, say, CouchDB.
Worse still, bulk data export only starts to solve the problem. In practice I've had to tweak my model design significantly to make it perform well on App Engine; this means that my very schema is tied to App Engine's behavior in uncomfortable ways. I suspect this is another shakeout of the complete lack of standardization in the non-relational database world.
The authentication API only works with Google accounts, and it seems to access them using some "shared knowledge" that I can't duplicate outside of App Engine.
I can't use urllib or anything else that talks to sockets (httplib2!); I have to use urlfetch. Yes, urlfetch is open source... but am I really going to write code against that API when the code isn't intended for App Engine?
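One way to limit the urlfetch dependency described above is to keep it behind a single function. The following is only a sketch, not anything App Engine ships: `fetch_body` and its `opener` parameter are hypothetical names invented for this illustration, and it assumes `urlfetch.fetch(url)` returns a response object with a `content` attribute, as the SDK documents.

```python
import urllib.request

try:
    # Only importable inside the App Engine Python runtime or SDK.
    from google.appengine.api import urlfetch
except ImportError:
    urlfetch = None


def fetch_body(url, opener=None):
    """Return the response body for `url` as bytes.

    `opener` is a hypothetical hook (not part of any App Engine API):
    a callable taking a URL and returning bytes, handy for tests.
    """
    if opener is not None:
        return opener(url)
    if urlfetch is not None:
        # On App Engine, outbound HTTP has to go through urlfetch.
        return urlfetch.fetch(url).content
    # Anywhere else, fall back to the standard library.
    with urllib.request.urlopen(url) as response:
        return response.read()
```

Application code would then call `fetch_body(...)` everywhere and never import urlfetch directly. That does not remove the lock-in concern, but it confines it to one module.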

So look. I obviously have a horse in this race and want App Engine to do well. More than that, it's damned cool tech and I want to be able to use it. I very well might be able to ignore the lock-in stuff given how nice App Engine promises to be. But the first step needs to be some real honesty about the unprecedented levels of lock-in App Engine represents.

On preview: this comes across as pretty harsh, and I'm sorry for that. I really want to like App Engine, but I'm just very frustrated that an honest discussion about lock-in really isn't happening. With luck this blog post will be the beginnings of that discussion!

Posted by Jacob Kaplan-Moss on 2008-11-06

Jacob, I don't think your comment is entirely straightforward either — the whole point of using a non-relational database is that it does not pretend to be general purpose, that it applies hard constraints to provide an abstraction specific to your usage. Of course each non-relational database is going to be a special snowflake — that's why we're using them! A good chunk of Django's charm stems from its ORM, but it's so obviated on AppEngine — and I think that's a good thing overall.

I don't agree with you about the Users API, especially when Google has just committed to providing OpenIDv2 authentication for Google Accounts — the only thing that is not straightforwardly replicable is the full functionality of the nickname() instance method on User objects.

I also don't begrudge the need for urlfetch, as the design of AppEngine totally precludes the availability of vanilla sockets. Their intention is for all traffic to be via HTTP: buffered, proxied, and accounted for via Google Frontend. If they let you at raw sockets, you could get up to all sorts of trouble! Their urlfetch is certainly not a perfect solution (no way to set a timeout), but it is not awful. Anything you could do with sockets but not with urlfetch is disallowed for a reason.

A large part of why progress on AppEngine is so slow-going is because the group working on it is kept small, yet I wouldn't really have it any other way — it's certainly preferable to the confused mess of marketing materials that is Windows Azure!

Posted by Fred Blasdel on 2008-11-06

I agree with Jacob's tone.

There are projects for small non-profits, et al that I would love to host on GAE due to the great availability, cost, and performance factors it can bring them.

However, after having taken a harder look at GAE I have severe reservations. Overall it really feels incomplete, and that in and of itself gives me a lot of insight into the App Engine team's culture. The attitude impressed upon me is one of "Do it our way.. the google way.. the only way". It's also pretty clear that nobody with an outside perspective was asked for feedback.

Honestly, Joe, while I appreciate your sentiment, which I believe is genuine, I gotta be honest that I chuckled at these two points:

Much of the underlying technology has been described in academic publications.
Some of that underlying technology has been open sourced, such as protocol buffers.

Seriously? That just amplifies the concerns as far as I'm concerned. "Some is open source" and "much is in academic publications". Wow! I'm the first to say I'm no academic but the majority of technical academic publications I've read are pretty vague to the point that duplicating their efforts without cooperation would take significant effort - certainly beyond the scope of what the typical developer can do. So really? This is supposed to salve concerns?

Which brings me to another point - what is the target audience of Google App Engine? Cute django apps? It's way too proprietary for me to recommend it to anyone except for the most fickle and transient web applications.

I guess what it really comes down to for me is frankly the sheer arrogance of it. Not the individual persons who I am certain are well-intentioned and believe in their projects - but the notion that we are all supposed to just trust Google and build infrastructure on systems with no migration path off whatsoever.

I'm happy to use GMail to host my private domains; it works great because it has high availability, a great UI, and lets me connect to it in all sorts of ways! Blackberry, Web, IMAP, et al. Supports all the stuff I want like SSL, etc. These were bigger factors in my decision to use it than cost. It also has great anti-spam filters, et al. Best of all if I choose in the future to migrate away I can change a few MX records; dump all of my email off via IMAP/POP/etc and be done with it.

In this way App Engine is a big contrast and a huge step backwards towards obfuscated proprietary systems and massive massive vendor lock-in.

- bri

Posted by brian on 2008-11-08

Until there’s another provider and some people have successfully migrated their App Engine apps to that other service, there’s lock-in for practical purposes.

That said, the app I’m developing has neither a database nor user logins, so it is not as lock-in prone as many other apps. However, it’s written in Java… (Yes, I realize that offering the full JVM functionality on App Engine would be very problematic and limiting the functionality would be problematic, too, when existing code expects to be able to do stuff.)

Posted by Henri Sivonen on 2008-11-09

Jacob,

I understand your frustration. I do want this to begin the conversation, and I hope to have some more news to share in the coming days.

Nothing else on the planet (well, nothing approaching production quality) will start WSGI servers given app.yaml, nor is there a spec or a schema describing how some hypothetical app.yaml server would work.

Here is the documentation for app.yaml, though I'm not sure that addresses your concern completely.

The datastore API runs against something (BigTable) that I can't buy at any price from anywhere but Google.

Not quite true.

Worse, the non-relational-database world is even more proprietary and incompatible than the relational world. If I decide to ditch Oracle I've got to rewrite a bunch of SQL queries... but at least both use SQL! I literally would have to rewrite my application from scratch if I wanted to replace App Engine's datastore with, say, CouchDB.

This is something (MegaData) that I've been talking about for a while now, and it's one of the reasons that I was drawn to App Engine. The point I've been making is that datastores built to manage huge amounts of data work differently from a traditional RDBMS. The general-purpose-RDBMS-all-your-data-in-fourth-normal-form approach does not scale. The reality is that there is no one-right-way to handle large amounts of data today; Google has BigTable and that's the model exposed in App Engine, Amazon has exposed their model with S3, and there are plenty of other competing ideas in this space, such as CouchDB, the streaming database Michael Stonebraker talks about in his paper, "'One size fits all': an idea whose time has come and gone", and who-knows-what is ultimately going to be built with Drizzle.

While all of them are different, there are some underlying commonalities that will be surprising and uncomfortable if you are coming from a relational background, such as a restricted scope for transactions, a lack of joins, and the accompanying denormalization of data. That's because you can't do those things efficiently in a general way across a large number of machines. Given that, are there idioms and best practices that are common across all these systems? Could you build an abstraction layer that worked across all of them? I don't know the answers to those questions. I do know it's different, I do know it can be frustrating, and I do want to work on bridging that gap between RDBMS's and MegaData datastores.
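To make the "no joins, so denormalize" point above concrete, here is a small sketch using plain Python dicts as a stand-in for a datastore. It is not the App Engine datastore API; every name in it (`authors`, `posts`, `add_post`, `rename_author`, `render_post`) is made up for the illustration.

```python
# Two "kinds" of entity, keyed by id. There is no join operation available.
authors = {"a1": {"name": "Ada"}}
posts = {}


def add_post(post_id, author_id, title):
    # Denormalize: copy the author's name onto the post at write time,
    # so rendering a post later needs only one lookup.
    posts[post_id] = {
        "author_id": author_id,
        "author_name": authors[author_id]["name"],
        "title": title,
    }


def rename_author(author_id, new_name):
    # The price of denormalizing: a rename must fan out to every copy.
    authors[author_id]["name"] = new_name
    for post in posts.values():
        if post["author_id"] == author_id:
            post["author_name"] = new_name


def render_post(post_id):
    # A single key lookup; no second query against `authors`.
    post = posts[post_id]
    return "%s by %s" % (post["title"], post["author_name"])
```

Reads get cheap and predictable, while writes take on the bookkeeping a relational join would otherwise have done at query time.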

Posted by Joe Gregorio on 2008-11-10

I do want this to begin the conversation, and I hope to have some more news to share in the coming days.

Extremely good news! As far as I'm concerned, as long as there's an open conversation about the issues (and benefits) that's a real win even if there's stuff that still needs to be private. I do prefer the "don't announce until it works" approach over the hype and vaporware that dominates so much of the industry. But when "don't hype" becomes an excuse to not discuss shortcomings (*looks at Apple*)... that's bad.

I'm very much looking forward to seeing what's next... these are certainly interesting times to be a web developer.

Could you build an abstraction layer that worked across all of [the non-relational databases]?

I think you can*. If you look at the internals of relational databases, you'll notice that they by no means work the same way. Even databases that implement common patterns (e.g. MVCC) do so in radically different ways. However, they do expose a (mostly) common interface in the form of SQL.

Now, anyone who's worked with relational databases for some time knows that even if you ignore the different SQL dialects there are dramatic differences in performance and semantics across different engines. Even something seemingly simple like SELECT COUNT(*) has radically different characteristics in different databases. Like all abstractions, SQL leaks.

But this isn't a problem -- or, more accurately, the benefits of a common interface outweigh the drawbacks. You write your SQL against the idealized abstract interface, and then when it starts hurting you dive in and optimize for your specific situation.

I see no reason why we can't treat MegaData stores (nice term, BTW) in a similar manner. Expose a common interface that works "most of the time," and then just dive down and work at a lower level when the abstraction leaks.
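A rough sketch of what such a common interface might look like follows. This is purely illustrative and is not the Django work mentioned in this comment: `Store`, `InMemoryStore`, `filter`, and `native` are hypothetical names, and the interface deliberately offers only key lookups and equality filters, since joins are not something these stores share.

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Hypothetical lowest-common-denominator interface: no joins."""

    @abstractmethod
    def put(self, kind, key, entity):
        """Save `entity` (a dict) under `kind` and `key`."""

    @abstractmethod
    def get(self, kind, key):
        """Return the entity dict, or None if it does not exist."""

    @abstractmethod
    def filter(self, kind, **equals):
        """Return entities of `kind` whose properties equal the given values."""


class InMemoryStore(Store):
    """Toy backend; a real one would wrap BigTable, CouchDB, and so on."""

    def __init__(self):
        self._kinds = {}

    def put(self, kind, key, entity):
        self._kinds.setdefault(kind, {})[key] = dict(entity)

    def get(self, kind, key):
        entity = self._kinds.get(kind, {}).get(key)
        return dict(entity) if entity is not None else None

    def filter(self, kind, **equals):
        return [
            dict(entity)
            for entity in self._kinds.get(kind, {}).values()
            if all(entity.get(name) == value for name, value in equals.items())
        ]

    @property
    def native(self):
        # Escape hatch for when the abstraction leaks: the raw backend object.
        return self._kinds
```

Code written against `Store` stays portable across backends. Code that reaches for `native` is explicitly marked as backend-specific, which mirrors the "dive down when it hurts" approach described for SQL.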

* At least, I'm giving it a shot in Django. We'll see where it goes.

Posted by Jacob Kaplan-Moss on 2008-11-11

Jacob,

Here's one of the bits of news to share that I alluded to earlier: gae-sqlite.

Posted by Joe Gregorio on 2008-11-12