MongoDB - Find documents with earliest occurrence of duplicate value

Solution 1:

You can do the following in an aggregation pipeline:

  1. $unwind links so that each document is split into one document per link.
  2. $sort on isoDate in ascending order so that, within each link, the earliest document comes first.
  3. $group by links to count how many documents share each link and to record the identifier of the earliest one. In your example, title is taken as the unique identifier.
  4. $match on count > 1 to keep only links that are shared by more than one document.
  5. $group on the identifier found in step 3 to dedupe it, since one document can be the earliest for several links.
  6. $lookup the original documents by that identifier, then use $replaceRoot to return them in their original shape.
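```javascript
db.collection.aggregate([
  // 1. One document per element of the links array
  {
    $unwind: "$links"
  },
  // 2. Earliest isoDate first, so $first in the next stage picks the earliest document
  {
    $sort: {
      isoDate: 1
    }
  },
  // 3. Per link: title of the earliest document and number of documents sharing the link
  {
    $group: {
      _id: "$links",
      first: {
        $first: "$title"
      },
      count: {
        $sum: 1
      }
    }
  },
  // 4. Keep only links that occur in more than one document
  {
    $match: {
      count: {
        $gt: 1
      }
    }
  },
  // 5. A document can be the earliest for several links, so dedupe the titles
  {
    $group: {
      _id: "$first"
    }
  },
  // 6. Fetch the full original documents ("collection" stands for your collection name)
  {
    $lookup: {
      from: "collection",
      localField: "_id",
      foreignField: "title",
      as: "rawDocument"
    }
  },
  {
    $unwind: "$rawDocument"
  },
  {
    $replaceRoot: {
      newRoot: "$rawDocument"
    }
  }
])
```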
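To make the logic of each stage easier to follow, here is a plain JavaScript sketch that emulates the same six steps on an in-memory array. It is only an illustration, not MongoDB code: the sample documents are hypothetical, and the function name earliestDuplicates is made up for this example. Field names (title, links, isoDate) follow the pipeline above.

```javascript
// Hypothetical sample documents; only the field names come from the pipeline above.
const docs = [
  { title: "A", links: ["x", "y"], isoDate: new Date("2021-01-01") },
  { title: "B", links: ["x"], isoDate: new Date("2021-02-01") },
  { title: "C", links: ["y", "z"], isoDate: new Date("2021-03-01") },
  { title: "D", links: ["w"], isoDate: new Date("2021-04-01") },
  { title: "E", links: ["z"], isoDate: new Date("2021-05-01") },
];

// Hypothetical helper emulating the aggregation pipeline step by step.
function earliestDuplicates(docs) {
  // Steps 1-2: one row per link, then sort ascending by isoDate.
  const unwound = docs
    .flatMap((doc) => doc.links.map((link) => ({ ...doc, links: link })))
    .sort((a, b) => a.isoDate - b.isoDate);

  // Step 3: group by link, keeping the earliest title and a count.
  const groups = new Map();
  for (const row of unwound) {
    const group = groups.get(row.links);
    if (group) {
      group.count += 1;
    } else {
      groups.set(row.links, { first: row.title, count: 1 });
    }
  }

  // Steps 4-5: keep links shared by more than one document, dedupe the titles.
  const titles = new Set();
  for (const { first, count } of groups.values()) {
    if (count > 1) titles.add(first);
  }

  // Step 6: "look up" the original documents by title.
  return docs.filter((doc) => titles.has(doc.title));
}

console.log(earliestDuplicates(docs).map((d) => d.title)); // [ 'A', 'C' ]
```

Here A is the earliest document for both x and y, so step 5 collapses it to a single entry, while C is the earliest for z. Note that in MongoDB the order of the final documents is not guaranteed unless you add another $sort.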

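Because the pipeline uses title as the identifier, $lookup will also pull in unrelated documents that happen to share a title. If titles are not guaranteed to be unique, a possible variant is to carry the document _id through instead. This is a sketch under that assumption, not a tested drop-in replacement; the variable name pipelineById is hypothetical, and "collection" is the same placeholder collection name as above.

```javascript
// Hypothetical variant: track _id instead of title so $lookup joins on a unique field.
// After $unwind, every row keeps the _id of the document it came from.
const pipelineById = [
  { $unwind: "$links" },
  { $sort: { isoDate: 1 } },
  // Earliest document _id per link, plus how many documents share the link
  { $group: { _id: "$links", first: { $first: "$_id" }, count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } },
  // Dedupe: one row per earliest document
  { $group: { _id: "$first" } },
  // Fetch the full original document ("collection" is a placeholder name)
  { $lookup: { from: "collection", localField: "_id", foreignField: "_id", as: "rawDocument" } },
  { $unwind: "$rawDocument" },
  { $replaceRoot: { newRoot: "$rawDocument" } },
];

// In the mongo shell: db.collection.aggregate(pipelineById)
```

The $lookup is still needed here because the unwound rows only hold a single link, not the original links array.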