Pagination in CouchDB?
How would I go about implementing the queries required for pagination?
Basically, when page 1 is requested, get the first 5 entries. For page 2, get the next 5 and so on.
I plan to use this via the couchdb-python module, but that shouldn't make any difference to the implementation.
Solution 1:
The CouchDB Guide has a good discussion of pagination, including lots of sample code, here: http://guide.couchdb.org/draft/recipes.html#pagination Here's their algorithm:
- Request rows_per_page + 1 rows from the view
- Display rows_per_page rows, store the last row as next_startkey
- As page information, keep startkey and next_startkey
- Use the next_* values to create the next link, and use the others to create the previous link
N.B.: The proper way to fetch pages in CouchDB is by specifying a starting key, not a starting index like you might think. But how do you know what key to start the 2nd page on? The clever solution: "Instead of requesting 10 rows for a page, you request 11 rows, but display only 10 and use the values in the 11th row as the startkey for the next page."
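The steps above can be sketched in Python roughly as follows. This is only an illustration: `ROWS` and `fetch_rows` are hypothetical in-memory stand-ins for a real view query (they are not part of CouchDB or couchdb-python), and `limit` is the name current CouchDB releases use for the row cap.

```python
from bisect import bisect_left

# Hypothetical stand-in for a CouchDB view: twelve rows sorted by key.
ROWS = [{"key": k, "id": "doc-%02d" % k} for k in range(1, 13)]

def fetch_rows(startkey=None, limit=None):
    """Mimic a view query: rows with key >= startkey, at most `limit` rows."""
    keys = [row["key"] for row in ROWS]
    start = 0 if startkey is None else bisect_left(keys, startkey)
    end = len(ROWS) if limit is None else start + limit
    return ROWS[start:end]

def get_page(fetch, rows_per_page, startkey=None):
    """Return (rows_to_display, next_startkey).

    One extra row is requested; if it comes back, its key becomes the
    startkey of the next page. next_startkey is None on the last page.
    """
    rows = fetch(startkey=startkey, limit=rows_per_page + 1)
    if len(rows) > rows_per_page:
        return rows[:rows_per_page], rows[rows_per_page]["key"]
    return rows, None

page1, next_startkey = get_page(fetch_rows, 5)
page2, next_startkey2 = get_page(fetch_rows, 5, next_startkey)
```

With couchdb-python, the `fetch` callable would presumably wrap something like `db.view(view_name, startkey=..., limit=...)`, but that wiring is an assumption and is not shown here.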
If you expect multiple documents to emit identical keys, you'll need to use startkey_docid in addition to startkey to paginate correctly, because startkey alone is no longer sufficient to uniquely identify a row. The docid parameters have no effect unless you also provide a startkey: CouchDB first looks at the startkey parameter, then uses startkey_docid to further refine the beginning of the range if multiple potential starting rows have the same key but different document IDs. The same applies to endkey_docid, which refines endkey.
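To see why the document ID matters when keys repeat, here is a small sketch. The `ROWS` data and the `fetch_rows` function are hypothetical in-memory stand-ins that only imitate how a view orders rows by key and then by document ID; they are not a real CouchDB call.

```python
from bisect import bisect_left

# Hypothetical view rows, ordered by (key, id). Several rows share a key.
ROWS = [
    {"key": "2009-01", "id": "a"},
    {"key": "2009-01", "id": "b"},
    {"key": "2009-01", "id": "c"},
    {"key": "2009-02", "id": "d"},
    {"key": "2009-02", "id": "e"},
]

def fetch_rows(startkey=None, startkey_docid=None, limit=None):
    """Mimic a view query; startkey_docid is ignored without a startkey."""
    if startkey is None:
        start = 0
    else:
        pairs = [(row["key"], row["id"]) for row in ROWS]
        start = bisect_left(pairs, (startkey, startkey_docid or ""))
    end = len(ROWS) if limit is None else start + limit
    return ROWS[start:end]

def get_page(rows_per_page, startkey=None, startkey_docid=None):
    """Return (rows_to_display, next_position) with next_position = (key, id)."""
    rows = fetch_rows(startkey, startkey_docid, rows_per_page + 1)
    if len(rows) > rows_per_page:
        extra = rows[rows_per_page]
        return rows[:rows_per_page], (extra["key"], extra["id"])
    return rows, None

page1, pos = get_page(2)          # rows a, b; next position is ("2009-01", "c")
page2, pos2 = get_page(2, *pos)   # rows c, d
stuck, _ = get_page(2, pos[0])    # key only: starts at row a again
```

The last call shows the failure mode: with the key alone, the second page begins at the first row carrying that key, so rows from the previous page are repeated.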
Solution 2:
The CouchDB HTTP View API gives plenty of scope to do paging efficiently.
The simplest method would use startkey and count (called limit in later CouchDB releases). Count is the maximum number of entries CouchDB will return for that view request, which is up to your design, and startkey is where you want CouchDB to start. The view response also tells you how many entries there are in total (the total_rows field), allowing you to calculate how many pages there will be if you want to show that to users.
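As a small illustration of the page-count calculation, assuming the total comes from the `total_rows` field of the view response (the helper name `page_count` is made up for this sketch):

```python
import math

def page_count(total_rows, rows_per_page):
    """Number of pages needed to show total_rows entries."""
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    return math.ceil(total_rows / rows_per_page)
```

Note that `total_rows` counts every row in the view, not only the rows inside a startkey/endkey range, so this figure is only meaningful when you page over the whole view.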
So the first request would not specify a startkey, just the count for the number of entries you want to show. You can then note the key of the last entry returned and use that as the start key for the next page. In this simple form, you will get an overlap, where the last entry of one page is the first of the next. If this is not desirable it is trivial to simply not display the last entry of the page.
A simpler method of doing this is to use the skip parameter to work out the starting document for the page; however, this method should be used with caution. The skip parameter simply causes the internal engine to not return entries that it is iterating over. While this gives the desired behaviour, it is much slower than finding the first document for the page by key. The more documents that are skipped, the slower the request will be.
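For comparison, the skip-based approach boils down to simple arithmetic on the page number. A minimal sketch, where `skip_params` is a hypothetical helper and `limit` is the current name for the row cap (called `count` in older releases):

```python
def skip_params(page, rows_per_page):
    """Build view query parameters for a 1-based page number using skip."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"skip": (page - 1) * rows_per_page, "limit": rows_per_page}

params = skip_params(3, 5)  # page 3 with 5 rows per page
```

The resulting dictionary could be passed as query parameters to the view request. The cost noted above still applies: the server walks past every skipped row, so high page numbers get progressively slower.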
Solution 3:
This is what I have come up with so far: get the IDs of all posts, then retrieve the actual items for the first x IDs.
It's not terribly efficient, but more so than retrieving all the posts and then throwing most of them away. That said, to my surprise, it seemed to run quite quickly: I ran the posthelper.page() method 100 times and it took about 0.5 seconds.
I didn't want to post this in the actual question, so it wouldn't influence the answers as much. Here's the code:
from couchdb import Server

# `config` and `Error404` come from the surrounding application (not shown).

# Map function: emit one row per post, keyed by document ID.
allPostsUuid = """
function(doc) {
    if (doc.type == 'post') {
        emit(doc._id, null);
    }
}
"""

class PostsHelper:
    def __init__(self):
        server = Server(config.dbhost)
        # Store the handle on the instance; __init__ must not return a value.
        self.db = server[config.dbname]

    def _getPostByUuid(self, uuid):
        return self.db.get(uuid)

    def page(self, number=1):
        number -= 1  # convert to a zero-based page index
        start = number * config.perPage
        end = start + config.perPage
        # Temporary view: fetches every post ID on each call.
        allUuids = [x.key for x in self.db.query(allPostsUuid)]
        ret = [self._getPostByUuid(x) for x in allUuids[start:end]]
        if len(ret) == 0:
            raise Error404("Invalid page (%s results)" % len(allUuids))
        return ret