How to get something random in datastore (AppEngine)?
Currently i'm using something like this:
images = Image.all()
count = images.count()
random_numb = random.randrange(1, count)
image = Image.get_by_id(random_numb)
But it turns out that the ids in the datastore on AppEngine don't start from 1. I have two images in datastore and their ids are 6001 and 7001.
Is there a better way to retrieve random images?
The datastore is distributed, so IDs are non-sequential: two datastore nodes need to be able to generate an ID at the same time without causing a conflict.
To get a random entity, you can attach a random float between 0 and 1 to each entity on create. Then to query, do something like this:
rand_num = random.random()
entity = MyModel.all().order('rand_num').filter('rand_num >=', rand_num).get()
if entity is None:
entity = MyModel.all().order('rand_num').get()
Edit: Updated fall-through case per Nick's suggestion.
Another solution (if you don't want to add an additional property). Keep a set of keys in memory.
import random
# Get all the keys, not the Entities
q = ItemUser.all(keys_only=True).filter('is_active =', True)
item_keys = q.fetch(2000)
# Get a random set of those keys, in this case 20
random_keys = random.sample(item_keys, 20)
# Get those 20 Entities
items = db.get(random_keys)
The above code illustrates the basic method for getting only keys and then creating a random set with which to do a batch get. You could keep that set of keys in memory, add to it as you create new ItemUser Entities, and then have a method that returns a n random Entities. You'll have to implement some overhead to manage the memcached keys. I like this solution better if you're performing the query for random elements often (I assume using a batch get for n Entities is more efficient than a query for n Entities).
I think Drew Sears's answer above (attach a random float to each entity on create) has a potential problem: every item doesn't have an equal chance of getting picked. For example, if there are only 2 entities, and one gets a rand_num of 0.2499, and the other gets 0.25, the 0.25 one will get picked almost all the time. This might or might not matter to your application. You could fix this by changing the rand_num of an entity every time it is selected, but that means each read also requires a write.
And pix's answer will always select the first key.
Here's the best general-purpose solution I could come up with:
num_images = Image.all().count()
offset = random.randrange(0, num_images)
image = Image.all().fetch(1, offset)[0]
No additional properties needed, but the downside is that count() and fetch() both have performance implications if the number of Images is large.