Monday, November 03, 2008

Ongoing Cloud Investigations

September was a busy month.

That may sound like an excuse... and I guess it is. However, my last few weeks of work were very busy, and I also managed to get away at the end of the month to spend some time unwinding on holiday in Cuba.

But I've not forgotten this blog entirely - and now I'm back to give a brief update.


Firstly, I've carried on reading about and playing with Google App Engine. The limits in the data storage continue to vex me. I kind of like statistics... and most of the web apps I want to write seem to involve data manipulation - including GROUP BY type SQL clauses and aggregate functions such as COUNT, MAX, MIN and SUM. Because of the semi-structured nature of Google's BigTable (which is based on very good principles designed to help applications be extremely quick and scalable), these functions simply aren't available. The workarounds I've looked into for these problems are:
  • Calculating all the required COUNT, MAX, MIN, SUM type stats in Python code when required - but this means fetching and scanning every relevant row on each request, which is certainly not ideal and not really scalable as a solution.
  • Storing all the required stats in special BigTable entries - and updating these stats whenever a web request arrives to add new data, or to update or delete existing data. This is more efficient than the first solution - but requires a lot of additional code to be written and I'm worried it might be a particularly error-prone set of code...
  • Storing all the required stats in special BigTable entries - and updating these stats offline using some batch process. This is quite awkward because the datastore API is not particularly friendly to batch processes (e.g. the upper limit of 1000 rows returned per query, although this can be worked around) and because App Engine does not support batch processes directly - so you have to mimic them yourself using special web hits (noting that each web hit can only consume at most 3 seconds of processing and that Google might block these requests for reasons of load balancing...)
In short, I'm more and more frequently considering reverting to non-cloud application models for many of my planned apps.
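
To make the second workaround above a bit more concrete, here is a minimal sketch of keeping running stats up to date on every write. It deliberately does not use the App Engine datastore API: plain Python dictionaries stand in for the data rows and the "special" stats entries, and the names GroupStats and StatsStore are made up for illustration. It also ignores transactions and concurrent requests, which is where much of the error-prone code would really live.

```python
class GroupStats:
    """Running COUNT/SUM/MIN/MAX for one group (stand-in for a stats entry)."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.minimum = None
        self.maximum = None


class StatsStore:
    """Hypothetical in-memory store; dicts stand in for datastore entities."""

    def __init__(self):
        self.rows = {}   # group name -> list of raw values
        self.stats = {}  # group name -> GroupStats

    def add(self, group, value):
        # Store the raw value, then update the running stats incrementally.
        self.rows.setdefault(group, []).append(value)
        stats = self.stats.setdefault(group, GroupStats())
        stats.count += 1
        stats.total += value
        stats.minimum = value if stats.minimum is None else min(stats.minimum, value)
        stats.maximum = value if stats.maximum is None else max(stats.maximum, value)

    def delete(self, group, value):
        values = self.rows[group]
        values.remove(value)  # raises ValueError if the value is not stored
        stats = self.stats[group]
        # COUNT and SUM can simply be decremented...
        stats.count -= 1
        stats.total -= value
        # ...but MIN and MAX need a rescan if the deleted value was an extreme.
        if not values:
            stats.minimum = stats.maximum = None
        elif value in (stats.minimum, stats.maximum):
            stats.minimum = min(values)
            stats.maximum = max(values)


store = StatsStore()
for group, value in [("a", 3), ("a", 7), ("a", 5), ("b", 10)]:
    store.add(group, value)
store.delete("a", 7)

a = store.stats["a"]
print(a.count, a.total, a.minimum, a.maximum)
```

Even in this toy version the delete path is noticeably fiddlier than the add path, because removing the current minimum or maximum forces a rescan of the remaining rows.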

Secondly, in the last week (while I was away), Microsoft has finally and fully unveiled its Azure platform - its answer to Cloud computing, to Google App Engine and to Amazon's S3 and EC2 services. I've downloaded the Community Technology Preview of this and got it partially working - good old Microsoft - they're only supporting Vista as a development platform initially! To speed up my learning curve I've also spent some time watching some of the PDC speeches - especially:
  • Developing and deploying your first service - http://channel9.msdn.com/pdc2008/ES01/
  • Best practices - http://channel9.msdn.com/pdc2008/ES03/
I'm also planning to watch:
  • The keynote at some point soon - given by Don Box who's always good for talking, coding and *most importantly* not using PowerPoint -
  • Some storage tech talks - http://channel9.msdn.com/pdc2008/ES04/ and http://channel9.msdn.com/pdc2008/ES07/
From what I've seen so far... Microsoft has really hit the ground running here:
  • Their solution looks very polished
  • Their feature set is already more extensive in some places than Google's - e.g. they have provided a queue feature and further have provided facilities for background "worker" processing too.
  • There are a few places, however, where their support is slightly less than finished - e.g. there were questions in some of the tech sessions about the lack of secondary data indexing, and enterprise developers were already pushing hard for more advanced transaction support and for more privacy features (how suitable is the cloud for private or enterprise data?)
  • Their tool support is superb - there are some gaps where you have to drop down to command line tools, but already the level of Visual Studio integration is very significant - plus the team is clearly looking to push the newer ASP.NET MVC application pattern into use for the cloud apps.
  • Behind the generic Azure web application hosting service, there are further Microsoft services in production - including the Live services (for user login), and the SQL Data Services which would provide massively scalable (but not massively gigantically BigTable scalable) full SQL data services - which would specifically help with my COUNT, MIN, MAX, AVG stats.
  • The marketing team really are already pushing hard - Microsoft is definitely 100% behind the cloud.
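
As a rough illustration of the kind of per-group stats query meant here, the sketch below uses Python's built-in sqlite3 module purely as a stand-in for "a store that speaks full SQL". It says nothing about the syntax SQL Data Services itself accepts, and the scores table and its columns are made up for the example.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for a full SQL store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores (player, points) VALUES (?, ?)",
    [("ann", 3), ("ann", 7), ("bob", 10), ("ann", 5)],
)

# One query produces every per-group aggregate; no hand-maintained stats rows.
rows = conn.execute(
    "SELECT player, COUNT(*), MIN(points), MAX(points), SUM(points), AVG(points) "
    "FROM scores GROUP BY player ORDER BY player"
).fetchall()
conn.close()

for row in rows:
    print(row)
```

The point is simply that the database does the grouping and aggregation, so none of the bookkeeping code from the workarounds is needed.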
So what does this mean about my direction? Well, actually, having looked at the Microsoft products I am kind of tempted especially as:
  • Having spent so long working on Windows and with Microsoft code and tools, Visual Studio just feels like "home" and the Intellisense is so much better than Python's in Eclipse.
  • The fact that Microsoft's platform is so new gives me a chance to establish more of an early uniqueness - perhaps a chance to make a bit of a splash/name for myself (although my lack of experience of .NET 3.5, of LINQ, of WCF, of ASP.NET MVC, etc. will leave me quite some way behind at the start).
  • I think there'll be lots of opportunities in both the .NET Cloud and in the Google Cloud (and beyond - Amazon are still a pioneer here, plus I haven't mentioned Rackspace and Mosso yet!) so it does no harm to try them both.
However, at a purely functional level I'm not convinced that I'll have any fewer problems achieving functional websites with Microsoft's Cloud solution instead of Google's - I think both are fairly similar and will show similar problems.

And, of course, I'm also very disappointed that the development tools for the Microsoft Cloud aren't yet Windows XP compatible...
