Getting a distinct aggregation of an array field across indexes

I'm trying to learn MongoDB and see how it could be useful to me for analytics. I'm playing around with the JavaScript console available on their website and have created the following documents:

{"title": "Cool", "_id": {"$oid": "503e4dc0cc93742e0d0ccad3"}, "tags": ["twenty", "sixty"]}
{"title": "Other", "_id": {"$oid": "503e4e5bcc93742e0d0ccad4"}, "tags": ["ten", "thirty"]}
{"title": "Ouch", "_id": {"$oid": "503e4e72cc93742e0d0ccad5"}, "tags": ["twenty", "seventy"]}
{"title": "Final", "_id": {"$oid": "503e4e72cc93742e0d0ccad6"}, "tags": ["sixty", "seventy"]}

What I'd like to do is query so I get a list of unique tags for all of these objects. The result should look something like this:

["ten", "twenty", "thirty", "sixty", "seventy"]

How do I query for this? I tried using distinct(), but the call always fails without even running the query.


Solution 1:

The code that fails on their website works on an actual MongoDB instance:

> db.posts.insert({title: "Hello", tags: ["one", "five"]});
> db.posts.insert({title: "World", tags: ["one", "three"]});
> db.posts.distinct("tags");
[ "one", "three", "five"]

Weird.
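For anyone wondering why distinct("tags") returns individual tags rather than whole arrays: when the field holds an array, distinct() treats each element as a separate value. Here is a rough plain-JavaScript sketch of that behaviour, runnable without a database. The helper name distinctValues is made up for illustration and is not part of the MongoDB API, and the order of the output is not meant to mirror what the server returns.

```javascript
// Hypothetical helper (not a MongoDB function): mimics how distinct()
// handles a field that may hold either a scalar or an array.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // documents without the field add nothing
    // An array contributes each of its elements; a scalar contributes itself.
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) seen.add(item);
  }
  return Array.from(seen);
}

// The four documents from the question, without their _id fields.
const posts = [
  { title: "Cool", tags: ["twenty", "sixty"] },
  { title: "Other", tags: ["ten", "thirty"] },
  { title: "Ouch", tags: ["twenty", "seventy"] },
  { title: "Final", tags: ["sixty", "seventy"] },
];

console.log(distinctValues(posts, "tags"));
```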

Solution 2:

You can use the aggregation framework. Depending on how you'd like the results structured, you can use either

var pipeline = [
    { "$unwind": "$tags" },           // emit one document per tag
    { "$group": { _id: "$tags" } }    // one group per distinct tag
];
var R = db.tb.aggregate(pipeline);
printjson(R);

{
        "result" : [
                {
                        "_id" : "seventy"
                },
                {
                        "_id" : "ten"
                },
                {
                        "_id" : "sixty"
                },
                {
                        "_id" : "thirty"
                },
                {
                        "_id" : "twenty"
                }
        ],
        "ok" : 1
}

or

var pipeline = [
    { "$unwind": "$tags" },    // emit one document per tag
    { "$group":
        { _id: null, tags: { "$addToSet": "$tags" } }    // collect the distinct tags into a single array
    }
];
var R = db.tb.aggregate(pipeline);
printjson(R);

{
        "result" : [
                {
                        "_id" : null,
                        "tags" : [
                                "seventy",
                                "ten",
                                "sixty",
                                "thirty",
                                "twenty"
                        ]
                }
        ],
        "ok" : 1
}
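If you use the first pipeline but still want a flat array like the one in the question, you can map over the grouped documents and pull out each _id. A small sketch in plain JavaScript, using the "result" array from the first output above as literal data; the variable names are only illustrative. On newer shells, where aggregate() returns a cursor rather than a document with a "result" field, you would likely need to call toArray() on the cursor first.

```javascript
// The "result" array printed by the first pipeline, copied as literal data.
const groupResult = [
  { _id: "seventy" },
  { _id: "ten" },
  { _id: "sixty" },
  { _id: "thirty" },
  { _id: "twenty" },
];

// Each group's _id is the tag itself, so mapping over _id gives a flat list.
const tagList = groupResult.map((doc) => doc._id);

console.log(tagList);
```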

Solution 3:

You should be able to use this:

db.mycollection.distinct("tags").sort()
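Note that distinct() returns a plain JavaScript array, so sort() here is the ordinary Array sort and orders the strings alphabetically. That will not give the "ten", "twenty", "thirty", ... order shown in the question. If that order matters, you could pass a comparator. The sketch below assumes a hand-written rank table (tagRank is hypothetical, as is the idea that the asker wants numeric order) and uses a literal array as a stand-in for the distinct() result.

```javascript
// Stand-in for the array returned by db.mycollection.distinct("tags").
const distinctTags = ["seventy", "ten", "sixty", "thirty", "twenty"];

// Hypothetical rank table: maps each tag to the number it spells out.
const tagRank = { ten: 10, twenty: 20, thirty: 30, sixty: 60, seventy: 70 };

// Default sort: alphabetical by string value.
const alphabetical = [...distinctTags].sort();

// Comparator sort: orders tags by their numeric rank instead.
const byRank = [...distinctTags].sort((a, b) => tagRank[a] - tagRank[b]);

console.log(alphabetical); // [ 'seventy', 'sixty', 'ten', 'thirty', 'twenty' ]
console.log(byRank);       // [ 'ten', 'twenty', 'thirty', 'sixty', 'seventy' ]
```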

Solution 4:

Another way of getting the unique array elements, using the aggregation pipeline:

db.blogs.aggregate(
  [
    {$group:{_id : null, uniqueTags : {$push : "$tags"}}},
    {$project:{
      _id : 0,
      uniqueTags : {
        $reduce : {
          input : "$uniqueTags", 
          initialValue :[], 
          in : {$let : {
            vars : {elem : { $concatArrays : ["$$this", "$$value"] }},
            in : {$setUnion : "$$elem"}
          }}
        }
      }
    }}
  ]
)

Collection:

> db.blogs.find()
{ "_id" : ObjectId("5a6d53faca11d88f428a2999"), "name" : "sdfdef", "tags" : [ "abc", "def", "efg", "abc" ] }
{ "_id" : ObjectId("5a6d5434ca11d88f428a299a"), "name" : "abcdef", "tags" : [ "abc", "ijk", "lmo", "zyx" ] }
> 

Pipeline:

>   db.blogs.aggregate(
...     [
...       {$group:{_id : null, uniqueTags : {$push : "$tags"}}},
...       {$project:{
...         _id : 0,
...         uniqueTags : {
...           $reduce : {
...             input : "$uniqueTags", 
...             initialValue :[], 
...             in : {$let : {
...               vars : {elem : { $concatArrays : ["$$this", "$$value"] }},
...               in : {$setUnion : "$$elem"}
...             }}
...           }
...         }
...       }}
...     ]
...   )

Result:

{ "uniqueTags" : [ "abc", "def", "efg", "ijk", "lmo", "zyx" ] }
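To make the two stages easier to follow, here is a rough plain-JavaScript analogue of what the pipeline does, runnable without MongoDB. It is only an approximation of the server's behaviour: map() stands in for $group with $push, reduce() for $reduce, and a Set for $setUnion. The final sort is there only to make the output easy to compare, since a set union does not promise any particular order.

```javascript
// The two documents from the blogs collection, without their _id fields.
const blogs = [
  { name: "sdfdef", tags: ["abc", "def", "efg", "abc"] },
  { name: "abcdef", tags: ["abc", "ijk", "lmo", "zyx"] },
];

// Analogue of $group + $push: gather every tags array into an array of arrays.
const pushed = blogs.map((doc) => doc.tags);

// Analogue of $reduce: start from [] and, at each step, concatenate the
// current array with the accumulated value, then drop duplicates
// (the role $setUnion plays in the pipeline).
const uniqueTags = pushed.reduce(
  (value, current) => Array.from(new Set(current.concat(value))),
  []
);

// Sorted copy, only for readable output.
console.log([...uniqueTags].sort());
```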