How to listen for changes to a MongoDB collection?

I'm creating a sort of background job queue system with MongoDB as the data store. How can I "listen" for inserts to a MongoDB collection before spawning workers to process the job?

Do I need to poll every few seconds to see if there are any changes from last time, or is there a way my script can wait for inserts to occur?

This is a PHP project that I am working on, but feel free to answer in Ruby or language agnostic.


Solution 1:

What you are thinking of sounds a lot like triggers. MongoDB does not have any support for triggers, however some people have "rolled their own" using some tricks. The key here is the oplog.

When you run MongoDB in a Replica Set, all of the MongoDB actions are logged to an operations log (known as the oplog). The oplog is basically just a running list of the modifications made to the data. Replicas Sets function by listening to changes on this oplog and then applying the changes locally.

Does this sound familiar?

I cannot detail the whole process here, it is several pages of documentation, but the tools you need are available.

First some write-ups on the oplog - Brief description - Layout of the local collection (which contains the oplog)

You will also want to leverage tailable cursors. These will provide you with a way to listen for changes instead of polling for them. Note that replication uses tailable cursors, so this is a supported feature.

Solution 2:

MongoDB has what is called capped collections and tailable cursors that allows MongoDB to push data to the listeners.

A capped collection is essentially a collection that is a fixed size and only allows insertions. Here's what it would look like to create one:

db.createCollection("messages", { capped: true, size: 100000000 })

MongoDB Tailable cursors (original post by Jonathan H. Wage)

Ruby

coll = db.collection('my_collection')
cursor = Mongo::Cursor.new(coll, :tailable => true)
loop do
  if doc = cursor.next_document
    puts doc
  else
    sleep 1
  end
end

PHP

$mongo = new Mongo();
$db = $mongo->selectDB('my_db')
$coll = $db->selectCollection('my_collection');
$cursor = $coll->find()->tailable(true);
while (true) {
    if ($cursor->hasNext()) {
        $doc = $cursor->getNext();
        print_r($doc);
    } else {
        sleep(1);
    }
}

Python (by Robert Stewart)

from pymongo import Connection
import time

db = Connection().my_db
coll = db.my_collection
cursor = coll.find(tailable=True)
while cursor.alive:
    try:
        doc = cursor.next()
        print doc
    except StopIteration:
        time.sleep(1)

Perl (by Max)

use 5.010;

use strict;
use warnings;
use MongoDB;

my $db = MongoDB::Connection->new;
my $coll = $db->my_db->my_collection;
my $cursor = $coll->find->tailable(1);
for (;;)
{
    if (defined(my $doc = $cursor->next))
    {
        say $doc;
    }
    else
    {
        sleep 1;
    }
}

Additional Resources:

Ruby/Node.js Tutorial which walks you through creating an application that listens to inserts in a MongoDB capped collection.

An article talking about tailable cursors in more detail.

PHP, Ruby, Python, and Perl examples of using tailable cursors.

Solution 3:

Check out this: Change Streams

January 10, 2018 - Release 3.6

*EDIT: I wrote an article about how to do this https://medium.com/riow/mongodb-data-collection-change-85b63d96ff76

https://docs.mongodb.com/v3.6/changeStreams/


It's new in mongodb 3.6 https://docs.mongodb.com/manual/release-notes/3.6/ 2018/01/10

$ mongod --version
db version v3.6.2

In order to use changeStreams the database must be a Replication Set

More about Replication Sets: https://docs.mongodb.com/manual/replication/

Your Database will be a "Standalone" by default.

How to Convert a Standalone to a Replica Set: https://docs.mongodb.com/manual/tutorial/convert-standalone-to-replica-set/


The following example is a practical application for how you might use this.
* Specifically for Node.

/* file.js */
'use strict'


module.exports = function (
    app,
    io,
    User // Collection Name
) {
    // SET WATCH ON COLLECTION 
    const changeStream = User.watch();  

    // Socket Connection  
    io.on('connection', function (socket) {
        console.log('Connection!');

        // USERS - Change
        changeStream.on('change', function(change) {
            console.log('COLLECTION CHANGED');

            User.find({}, (err, data) => {
                if (err) throw err;

                if (data) {
                    // RESEND ALL USERS
                    socket.emit('users', data);
                }
            });
        });
    });
};
/* END - file.js */

Useful links:
https://docs.mongodb.com/manual/tutorial/convert-standalone-to-replica-set
https://docs.mongodb.com/manual/tutorial/change-streams-example

https://docs.mongodb.com/v3.6/tutorial/change-streams-example
http://plusnconsulting.com/post/MongoDB-Change-Streams

Solution 4:

Since MongoDB 3.6 there will be a new notifications API called Change Streams which you can use for this. See this blog post for an example. Example from it:

cursor = client.my_db.my_collection.changes([
    {'$match': {
        'operationType': {'$in': ['insert', 'replace']}
    }},
    {'$match': {
        'newDocument.n': {'$gte': 1}
    }}
])

# Loops forever.
for change in cursor:
    print(change['newDocument'])