As an example, Google App Engine uses Google Datastore, not a standard database, to store data. Does anybody have any tips for using Google Datastore instead of databases? It seems I've trained my mind to think 100% in object relationships that map directly to table structures, and now it's hard to see anything differently. I can understand some of the benefits of Google Datastore (e.g. performance and the ability to distribute data), but some good database functionality is sacrificed (e.g. joins).

Does anybody who has worked with Google Datastore or BigTable have any good advice to working with them?


Solution 1:

There's two main things to get used to about the App Engine datastore when compared to 'traditional' relational databases:

  • The datastore makes no distinction between inserts and updates. When you call put() on an entity, that entity gets stored to the datastore with its unique key, and anything that has that key gets overwritten. Basically, each entity kind in the datastore acts like an enormous map or sorted list.
  • Querying, as you alluded to, is much more limited. No joins, for a start.

The key thing to realise - and the reason behind both these differences - is that Bigtable basically acts like an enormous ordered dictionary. Thus, a put operation just sets the value for a given key - regardless of any previous value for that key, and fetch operations are limited to fetching single keys or contiguous ranges of keys. More sophisticated queries are made possible with indexes, which are basically just tables of their own, allowing you to implement more complex queries as scans on contiguous ranges.

Once you've absorbed that, you have the basic knowledge needed to understand the capabilities and limitations of the datastore. Restrictions that may have seemed arbitrary probably make more sense.

The key thing here is that although these are restrictions over what you can do in a relational database, these same restrictions are what make it practical to scale up to the sort of magnitude that Bigtable is designed to handle. You simply can't execute the sort of query that looks good on paper but is atrociously slow in an SQL database.

In terms of how to change how you represent data, the most important thing is precalculation. Instead of doing joins at query time, precalculate data and store it in the datastore wherever possible. If you want to pick a random record, generate a random number and store it with each record. There's a whole cookbook of this sort of tips and tricks here.

Solution 2:

The way I have been going about the mind switch is to forget about the database altogether.

In the relational db world you always have to worry about data normalization and your table structure. Ditch it all. Just layout your web page. Lay them all out. Now look at them. You're already 2/3 there.

If you forget the notion that database size matters and data shouldn't be duplicated then you're 3/4 there and you didn't even have to write any code! Let your views dictate your Models. You don't have to take your objects and make them 2 dimensional anymore as in the relational world. You can store objects with shape now.

Yes, this is a simplified explanation of the ordeal, but it helped me forget about databases and just make an application. I have made 4 App Engine apps so far using this philosophy and there are more to come.