Thursday, November 20, 2008

Azure data challenges

My first app on Azure - stacka - is progressing quite well.

However, I keep hitting problems, especially with things that would be simple and straightforward in a conventional SQL database but often prove somewhat trickier in the distributed Azure storage.

Here's an example.

For stacka, I've got a StackTable structure - within this each StackRow is indexed by:
  • PartitionKey - userid
  • RowKey - a creation tickcount based number - similar to that used by smarx in his ongoing blogger worked example
My thinking behind this was that the partitioning would make it easy to search by user (which is a common task), while the row key would ensure that results were returned in time order.
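As a rough illustration of that key scheme, here is a minimal Python sketch (stacka itself is not written in Python). The function names, the 100 ns tick size and the 19-digit zero padding are all assumptions made for the example, not the actual stacka code. The padding matters because the keys are compared as strings, so every key needs the same width for string order to match numeric order.

```python
from datetime import datetime, timezone

# Assumption: ticks are 100 ns intervals counted from year 1 (as .NET does).
TICKS_PER_SECOND = 10_000_000
EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)


def make_row_key(created: datetime) -> str:
    """Hypothetical helper: zero-padded tick count for a creation time."""
    delta = created - EPOCH
    ticks = (delta.days * 86_400 + delta.seconds) * TICKS_PER_SECOND
    ticks += delta.microseconds * 10
    # Fixed width, so that comparing keys as strings matches comparing times.
    return f"{ticks:019d}"


def make_stack_key(user_id: str, created: datetime) -> tuple:
    """Hypothetical helper: the (PartitionKey, RowKey) pair for a StackRow."""
    return (user_id, make_row_key(created))


earlier = make_row_key(datetime(2008, 11, 20, 9, 0, tzinfo=timezone.utc))
later = make_row_key(datetime(2008, 11, 20, 10, 0, tzinfo=timezone.utc))
print(earlier < later)
```

Within a single user's partition, sorting these row keys as strings gives creation order.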

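However, what I hadn't understood was that Azure table storage does not return results sorted by RowKey alone. Instead, they are sorted by the tuple (PartitionKey, RowKey), using normal lexicographical (alphabetical) ordering. So a query across all users returns each user's stacks in time order, but one user after another rather than interleaved by time.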
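To make the difference concrete, here is a small in-memory Python simulation. It does not talk to Azure at all. The user ids and shortened row keys are made up, and Python's tuple comparison is only standing in for the storage service's ordering.

```python
# Made-up (PartitionKey, RowKey) pairs; row keys are zero-padded tick counts.
rows = [
    ("bob", "0003"),
    ("alice", "0004"),
    ("bob", "0001"),
    ("alice", "0002"),
]

# Tuple comparison looks at the partition key first and the row key second,
# which approximates the order the table hands results back in.
table_order = sorted(rows)

# The order I actually wanted: by creation time, regardless of user.
time_order = sorted(rows, key=lambda row: row[1])

print(table_order)
print(time_order)
```

In the first list all of alice's rows come before any of bob's. The second list interleaves the users by time, but here it is produced by sorting in memory after every row has already been fetched.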

And (unlike in SQL) what I can't now do is easily reorder the stacks in the query - i.e. I cannot simply add a global "" to the query.

So now I'm looking at:
  • possibly rewriting this indexing, so that the Stacks are no longer partitioned by userid but are still returned in time order. This will make it quicker to get the most recent N entities - but will make it slower to get all Stacks associated with one particular user.
  • possibly adding an additional (manually maintained) table which stores the most recent StackRows with a time-based ordering. Because this table doesn't need to store everything - just the top N entities - it doesn't need to be particularly large.
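The second option can be sketched in a few lines of Python. This is only an in-memory stand-in for the idea, not a real table: the class name RecentStacks, its methods and the capacity parameter are all hypothetical, and in the real app each add would be an extra write alongside the write to StackTable.

```python
import bisect


class RecentStacks:
    """Hypothetical in-memory stand-in for a small 'most recent N' table."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._rows = []  # kept sorted ascending by (row_key, user_id)

    def add(self, user_id, row_key):
        # The time-based row key comes first, so order is by time, not by user.
        bisect.insort(self._rows, (row_key, user_id))
        if len(self._rows) > self.capacity:
            del self._rows[0]  # drop the oldest entry to stay at N rows

    def newest_first(self):
        return list(reversed(self._rows))


recent = RecentStacks(capacity=2)
recent.add("bob", "0001")
recent.add("alice", "0003")
recent.add("bob", "0002")
print(recent.newest_first())
```

The main StackTable stays partitioned by userid for the per-user queries, and the application is responsible for keeping this small table in step with it.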
There are definitely some interesting patterns that will arise while coding with Azure....
