How to implement "autoincrement" on Google AppEngine

Solution 1:

If you absolutely have to have sequentially increasing numbers with no gaps, you'll need to use a single entity, which you update in a transaction to 'consume' each new number. You'll be limited, in practice, to about 1-5 numbers generated per second - which sounds like it'll be fine for your requirements.

Solution 2:

If you drop the requirement that IDs must be strictly sequential, you can use a hierarchical allocation scheme. The basic idea/limitation is that transactions must not affect multiple storage groups.

For example, assuming you have the notion of "users", you can allocate a storage group for each user (creating some global object per user). Each user has a list of reserved IDs. When allocating an ID for a user, pick a reserved one (in a transaction). If no IDs are left, make a new transaction allocating 100 IDs (say) from the global pool, then make a new transaction to add them to the user and simultaneously withdraw one. Assuming each user interacts with the application only sequentially, there will be no concurrency on the user objects.

Solution 3:

The gaetk - Google AppEngine Toolkit now comes with a simple library function to get a number in a sequence. It is based on Nick Johnson's transactional approach and can be used quite easily as a foundation for Martin von Löwis' sharding approach:

>>> from gaeth.sequences import * 
>>> init_sequence('invoce_number', start=1, end=0xffffffff)
>>> get_numbers('invoce_number', 2)
[1, 2]

The functionality is basically implemented like this:

def _get_numbers_helper(keys, needed):
  results = []

  for key in keys:
    seq = db.get(key)
    start = seq.current or seq.start
    end = seq.end
    avail = end - start
    consumed = needed
    if avail <= needed:
      seq.active = False
      consumed = avail
    seq.current = start + consumed
    seq.put()
    results += range(start, start + consumed)
    needed -= consumed
    if needed == 0:
      return results
  raise RuntimeError('Not enough sequence space to allocate %d numbers.' % needed)

def get_numbers(needed):
  query = gaetkSequence.all(keys_only=True).filter('active = ', True)
  return db.run_in_transaction(_get_numbers_helper, query.fetch(5), needed)

Solution 4:

If you aren't too strict on the sequential, you can "shard" your incrementer. This could be thought of as an "eventually sequential" counter.

Basically, you have one entity that is the "master" count. Then you have a number of entities (based on the load you need to handle) that have their own counters. These shards reserve chunks of ids from the master and serve out from their range until they run out of values.

Quick algorithm:

You need to get an ID.
Pick a shard at random.
If the shard's start is less than its end, take it's start and increment it.
If the shard's start is equal to (or more oh-oh) its end, go to the master, take the value and add an amount n to it. Set the shards start to the retrieved value plus one and end to the retrieved plus n.

This can scale quite well, however, the amount you can be out by is the number of shards multiplied by your n value. If you want your records to appear to go up this will probably work, but if you want to have them represent order it won't be accurate. It is also important to note that the latest values may have holes, so if you are using that to scan for some reason you will have to mind the gaps.

Edit

I needed this for my app (that was why I was searching the question :P ) so I have implemented my solution. It can grab single IDs as well as efficiently grab batches. I have tested it in a controlled environment (on appengine) and it performed very well. You can find the code on github.

Solution 5:

Take a look at how the sharded counters are made. It may help you. Also do you really need them to be numeric. If unique is satisfying just use the entity keys.