MongoDB Many-to-Many Association

How would you do a many-to-many association with MongoDB?

For example; let's say you have a Users table and a Roles table. Users have many roles, and roles have many users. In SQL land you would create a UserRoles table.

Users:
    Id
    Name

Roles:
    Id
    Name

UserRoles:
    UserId
    RoleId

How is same sort of relationship handled in MongoDB?


Solution 1:

Depending on your query needs you can put everything in the user document:

{name:"Joe"
,roles:["Admin","User","Engineer"]
}

To get all the Engineers, use:

db.things.find( { roles : "Engineer" } );

If you want to maintain the roles in separate documents then you can include the document's _id in the roles array instead of the name:

{name:"Joe"
,roles:["4b5783300334000000000aa9","5783300334000000000aa943","6c6793300334001000000006"]
}

and set up the roles like:

{_id:"6c6793300334001000000006"
,rolename:"Engineer"
}

Solution 2:

Instead of trying to model according to our years of experience with RDBMS's, I have found it much easier to model document-repository solutions using MongoDB, Redis, and other NoSQL data stores by optimizing for the read use cases, while being considerate of the atomic write operations that need to be supported by the write use cases.

For instance, the uses of a "Users in Roles" domain follow:

  1. Role - Create, Read, Update, Delete, List Users, Add User, Remove User, Clear All Users, Index of User or similar to support "Is User In Role" (operations like a container + its own metadata).
  2. User - Create, Read, Update, Delete (CRUD operations like a free-standing entity)

This can be modeled as the following document templates:

User: { _id: UniqueId, name: string, roles: string[] }
    Indexes: unique: [ name ]
Role: { _id: UniqueId, name: string, users: string[] }
    Indexes: unique: [ name ]

To support the high frequency uses, such as Role-related features from the User entity, User.Roles is intentionally denormalized, stored on the User as well as Role.Users having duplicate storage.

If it is not readily apparent in the text, but this is the type of thinking that is encouraged when using document repositories.

I hope that this helps bridge the gap with regard to the read side of the operations.

For the write side, what is encouraged is to model according to atomic writes. For instance, if the document structures require acquiring a lock, updating one document, then another, and possibly more documents, then releasing the lock, likely the model has failed. Just because we can build distributed locks doesn't mean that we are supposed to use them.

For the case of the User in Roles model, the operations that stretch our atomic write avoidance of locks is adding or removing a User from a Role. In either case, a successful operation results in both a single User and a single Role document being updated. If something fails, it is easy to perform cleanup. This is the one reason the Unit of Work pattern comes up quite a lot where document repositories are used.

The operation that really stretches our atomic write avoidance of locks is clearing a Role, which would result in many User updates to remove the Role.name from the User.roles array. This operation of clear then is generally discouraged, but if needed can be implemented by ordering the operations:

  1. Get the list of user names from Role.users.
  2. Iterate the user names from step 1, remove the role name from User.roles.
  3. Clear the Role.users.

In the case of an issue, which is most likely to occur within step 2, a rollback is easy as the same set of user names from step 1 can be used to recover or continue.

Solution 3:

I've just stumbled upon this question and, although it's an old one, I thought it would be useful to add a couple of possibilities not mentioned in the answers given. Also, things have moved on a bit in the last few years, so it is worth emphasising that SQL and NoSQL are moving closer to each other.

One of the commenters brought up the wise cautionary attitude that “if data is relational, use relational”. However, that comment only makes sense in the relational world, where schemas always come before the application.

RELATIONAL WORLD: Structure data > Write application to get it
NOSQL WORLD: Design application > Structure data accordingly

Even if data is relational, NoSQL is still an option. For example, one-to-many relationships are no problem at all and are widely covered in MongoDB docs

A 2015 SOLUTION TO A 2010 PROBLEM

Since this question was posted, there have been serious attempts at bringing noSQL closer to SQL. The team led by Yannis Papakonstantinou at the University of California (San Diego) have been working on FORWARD, an implementation of SQL++ which could soon be the solution to persistent problems like the one posted here.

At a more practical level, the release of Couchbase 4.0 has meant that, for the first time, you can do native JOINs in NoSQL. They use their own N1QL. This is an example of a JOIN from their tutorials:

SELECT usr.personal_details, orders 
        FROM users_with_orders usr 
            USE KEYS "Elinor_33313792" 
                JOIN orders_with_users orders 
                    ON KEYS ARRAY s.order_id FOR s IN usr.shipped_order_history END

N1QL allows for most if not all SQL operations including aggregration, filtering, etc.

THE NOT-SO-NEW HYBRID SOLUTION

If MongoDB is still the only option, then I'd like to go back to my point that the application should take precedence over the structure of data. None of the answers mention hybrid embedding, whereby most queried data is embedded in the document/object, and references are kept for a minority of cases.

Example: can information (other than role name) wait? could bootstrapping the application be faster by not requesting anything that the user doesn't need yet?

This could be the case if user logs in and s/he needs to see all the options for all the roles s/he belongs to. However, the user is an “Engineer” and options for this role are rarely used. This means the application only needs to show the options for an engineer in case s/he wants to click on them.

This can be achieved with a document which tells the application at the start (1) which roles the user belongs to and (2) where to get information about an event linked to a particular role.

   {_id: ObjectID(),
    roles: [[“Engineer”, “ObjectId()”],
            [“Administrator”, “ObjectId()”]]
   }

Or, even better, index the role.name field in the roles collection, and you may not need to embed ObjectID() either.

Another example: is information about ALL the roles requested ALL the time?

It could also be the case that the user logs in to the dashboard and 90% of the time performs tasks linked to the “Engineer” role. Hybrid embedding could be done for that particular role in full and keep references for the rest only.

{_id: ObjectID(),
  roles: [{name: “Engineer”, 
           property1: value1,
           property2: value2
          },   
          [“Administrator”, “ObjectId()”]
         ]
}

Being schemaless is not just a characteristic of NoSQL, it could be an advantage in this case. It's perfectly valid to nest different types of objects in the “Roles” property of an user object.

Solution 4:

in case when employee and company is entity-object try to use following schema:

employee{
   //put your contract to employee
   contracts:{ item1, item2, item3,...}
}

company{
   //and duplicate it in company
   contracts:{ item1, item2, item3,...}
}