What's the best approach to prefill a Core Data store when using NSPersistentCloudKitContainer?
I have a scenario where I parse objects from a JSON file and store them in my Core Data store. I'm now using NSPersistentCloudKitContainer,
and when I run the app on a different device, it parses the JSON file again and adds the same objects to Core Data. That results in duplicate objects.
Now I'm wondering whether there is:
- An easy way to check that an entity already exists remotely?
- Any other way to avoid objects being saved twice in CloudKit?
- A way to get notified when fetching data from the remote store has finished?
Solution 1:
Maybe it's too late to answer, but I've been working on the same issue recently. After weeks of research, I'd like to leave what I've learned here, in the hope that it helps someone with the same problem.
An easy way to check that an entity already exists remotely?
Any other way to avoid objects being saved twice in CloudKit?
Yes, we can check whether the entity already exists in iCloud, but that's not the best way to decide whether to parse the JSON file and save it to the Core Data persistent store. The app may not be signed in to an Apple ID / iCloud account, or there may be network issues, so checking whether the entity exists remotely is not reliable.
The current solution is to deduplicate the data ourselves: add a UUID field to every data object imported from the JSON file, and remove objects that share the same UUID. Most of the time I also add a lastUpdate field, so that we can keep the most recent object.
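To make the rule concrete, here is a minimal sketch in plain Swift, without Core Data. `TagRecord` and `deduplicated(_:)` are hypothetical names used only for illustration. It also assumes the UUID is the same on every device (for example, stored in the JSON file itself rather than generated at import time), otherwise two devices would never produce matching identifiers.

```swift
import Foundation

// Hypothetical stand-in for a managed object; only the fields used for deduplication.
struct TagRecord: Equatable {
    let uuid: UUID
    let name: String
    let lastUpdate: Date
}

/// Returns one record per UUID, keeping the one with the most recent `lastUpdate`.
func deduplicated(_ records: [TagRecord]) -> [TagRecord] {
    var winners: [UUID: TagRecord] = [:]
    for record in records {
        if let current = winners[record.uuid], current.lastUpdate >= record.lastUpdate {
            continue // the record we already hold is at least as recent
        }
        winners[record.uuid] = record
    }
    return Array(winners.values)
}
```

The order of the returned array is unspecified because it comes from a dictionary; sort it afterwards if order matters.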
A way to get notified when fetching data from the remote store has finished?
We can add an observer of NSPersistentStoreRemoteChange, and get notifications whenever the remote store changes.
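The notification is only posted if the store is configured for it. A minimal sketch of that configuration follows, assuming it runs before the stores are loaded; the helper name `enableRemoteChangeTracking(on:)` is hypothetical, while the two option keys are the Core Data constants.

```swift
import CoreData

/// Turns on persistent history tracking and remote change notifications
/// for a single store description. (Hypothetical helper name.)
func enableRemoteChangeTracking(on description: NSPersistentStoreDescription) {
    description.setOption(true as NSNumber,
                          forKey: NSPersistentHistoryTrackingKey)
    description.setOption(true as NSNumber,
                          forKey: NSPersistentStoreRemoteChangeNotificationPostOptionKey)
}
```

It would typically be applied to `container.persistentStoreDescriptions.first` before calling `loadPersistentStores`.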
Apple provides a demo project on using Core Data with CloudKit that explains the deduplication quite well.
Synchronizing a Local Store to the Cloud https://developer.apple.com/documentation/coredata/synchronizing_a_local_store_to_the_cloud
WWDC2019 session 202: Using CoreData with CloudKit https://developer.apple.com/videos/play/wwdc2019/202
The whole idea is to listen for changes in the remote store, keep track of the change history, and deduplicate our data whenever new data comes in. (Of course, we need some field to determine whether an object is a duplicate.) The persistent store provides a history tracking feature, so we can fetch the transactions as they are merged into the local store and run our deduplication process on them. Let's say we parse the JSON and import Tags when the app launches:
// Use a serial queue so that only one history-processing operation runs at a time
private lazy var historyQueue: OperationQueue = {
    let queue = OperationQueue()
    queue.maxConcurrentOperationCount = 1
    return queue
}()
lazy var persistentContainer: NSPersistentContainer = {
    let container = NSPersistentCloudKitContainer(name: "CoreDataCloudKitDemo")
    ...
    // set the persistentStoreDescription to track history and generate notifications (NSPersistentHistoryTrackingKey, NSPersistentStoreRemoteChangeNotificationPostOptionKey)
    // load the persistentStores
    // set the mergePolicy of the viewContext
    ...
    // Observe Core Data remote change notifications.
    NotificationCenter.default.addObserver(
        self, selector: #selector(type(of: self).storeRemoteChange(_:)),
        name: .NSPersistentStoreRemoteChange, object: container.persistentStoreCoordinator)
    return container
}()
@objc func storeRemoteChange(_ notification: Notification) {
    // Process persistent history to merge changes from other coordinators.
    historyQueue.addOperation {
        self.processPersistentHistory()
    }
}
// Fetch the changes since the last update, deduplicate any newly inserted data, and save the updated token
private func processPersistentHistory() {
    // Run in a background context so that the view context is not blocked.
    // When the background context is saved, its changes merge into the view context based on the merge policy.
    let taskContext = persistentContainer.newBackgroundContext()
    taskContext.performAndWait {
        // Fetch history received from outside the app since the last token
        let historyFetchRequest = NSPersistentHistoryTransaction.fetchRequest!
        let request = NSPersistentHistoryChangeRequest.fetchHistory(after: lastHistoryToken)
        request.fetchRequest = historyFetchRequest
        let result = (try? taskContext.execute(request)) as? NSPersistentHistoryResult
        guard let transactions = result?.result as? [NSPersistentHistoryTransaction],
              !transactions.isEmpty
        else { return }
        // Tags from remote store
        var newTagObjectIDs = [NSManagedObjectID]()
        let tagEntityName = Tag.entity().name
        // Collect the .insert changes in the transactions that we want to deduplicate
        for transaction in transactions where transaction.changes != nil {
            for change in transaction.changes!
            where change.changedObjectID.entity.name == tagEntityName && change.changeType == .insert {
                newTagObjectIDs.append(change.changedObjectID)
            }
        }
        if !newTagObjectIDs.isEmpty {
            deduplicateAndWait(tagObjectIDs: newTagObjectIDs)
        }
        // Update the history token using the last transaction.
        lastHistoryToken = transactions.last!.token
    }
}
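As an aside on `lastHistoryToken`, which the code above reads and writes but does not declare: one option is to archive it to disk so that history processing resumes where it left off after a relaunch. The sketch below is an assumption about how that could look, not part of the original answer; the function names and the file location are hypothetical.

```swift
import CoreData

/// Archives the token to `url`. NSPersistentHistoryToken supports secure coding.
func storeHistoryToken(_ token: NSPersistentHistoryToken, at url: URL) throws {
    let data = try NSKeyedArchiver.archivedData(withRootObject: token,
                                                requiringSecureCoding: true)
    try data.write(to: url, options: .atomic)
}

/// Returns the previously archived token, or nil if none exists or it cannot be decoded.
func loadHistoryToken(from url: URL) -> NSPersistentHistoryToken? {
    guard let data = try? Data(contentsOf: url) else { return nil }
    return try? NSKeyedUnarchiver.unarchivedObject(ofClass: NSPersistentHistoryToken.self,
                                                   from: data)
}
```

A nil token passed to `fetchHistory(after:)` fetches the whole history, so a missing file simply means everything is processed once.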
In processPersistentHistory() we collect the object IDs of the newly inserted Tags, so that we can deduplicate them in any other managed object context:
private func deduplicateAndWait(tagObjectIDs: [NSManagedObjectID]) {
    let taskContext = persistentContainer.newBackgroundContext()
    // Use performAndWait because each step relies on the sequence. Since historyQueue runs in the background, waiting won’t block the main queue.
    taskContext.performAndWait {
        tagObjectIDs.forEach { tagObjectID in
            self.deduplicate(tagObjectID: tagObjectID, performingContext: taskContext)
        }
        // Save the background context to trigger a notification and merge the result into the viewContext.
        // Note: save(with:) appears to be a helper extension from Apple's sample project, not a Core Data API; a plain `try taskContext.save()` is the built-in equivalent.
        taskContext.save(with: .deduplicate)
    }
}
private func deduplicate(tagObjectID: NSManagedObjectID, performingContext: NSManagedObjectContext) {
    // Get the tag by its objectID
    guard let tag = performingContext.object(with: tagObjectID) as? Tag,
          let tagUUID = tag.uuid else {
        fatalError("###\(#function): Failed to retrieve a valid tag with ID: \(tagObjectID)")
    }
    // Fetch all tags with the same uuid
    let fetchRequest: NSFetchRequest<Tag> = Tag.fetchRequest()
    // Sort by lastUpdate, newest first, so the latest Tag is kept
    fetchRequest.sortDescriptors = [NSSortDescriptor(key: "lastUpdate", ascending: false)]
    // If uuid is a UUID rather than a String, pass it as `tagUUID as CVarArg`.
    fetchRequest.predicate = NSPredicate(format: "uuid == %@", tagUUID)
    // Return if there are no duplicates.
    guard var duplicatedTags = try? performingContext.fetch(fetchRequest), duplicatedTags.count > 1 else {
        return
    }
    // Pick the first tag as the winner.
    guard let winner = duplicatedTags.first else {
        fatalError("###\(#function): Failed to retrieve the first duplicated tag")
    }
    duplicatedTags.removeFirst()
    remove(duplicatedTags: duplicatedTags, winner: winner, performingContext: performingContext)
}
The most difficult part (in my opinion) is handling the relationships of the duplicated objects that get deleted. Let's say our Tag object has a one-to-many relationship with a Category object (each Tag may have multiple Categories):
private func remove(duplicatedTags: [Tag], winner: Tag, performingContext: NSManagedObjectContext) {
    duplicatedTags.forEach { tag in
        // Delete the tag AFTER we handle the relationship,
        // and be careful: the relationship's delete rule also applies on deletion
        defer { performingContext.delete(tag) }
        if let categorys = tag.categorys as? Set<Category> {
            for category in categorys {
                // Re-map each category to the winner Tag; otherwise its ofTag becomes nil when the duplicated Tag is deleted
                category.ofTag = winner
            }
        }
    }
}
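The re-mapping step can be tried in isolation with plain Swift classes. This is only a sketch: `TagNode`, `CategoryNode`, and `reassignCategorys(from:to:)` are hypothetical stand-ins for the managed objects above. Core Data maintains the inverse relationship for us, so the sketch updates both sides by hand.

```swift
final class TagNode {
    let name: String
    var categorys: [CategoryNode] = []
    init(name: String) { self.name = name }
}

final class CategoryNode {
    let name: String
    weak var ofTag: TagNode?
    init(name: String, ofTag: TagNode) {
        self.name = name
        self.ofTag = ofTag
        ofTag.categorys.append(self)
    }
}

/// Moves every category owned by a duplicate over to the winner,
/// leaving the duplicates with no categories so they can be discarded safely.
func reassignCategorys(from duplicates: [TagNode], to winner: TagNode) {
    for duplicate in duplicates where duplicate !== winner {
        for category in duplicate.categorys {
            category.ofTag = winner
            winner.categorys.append(category)
        }
        duplicate.categorys.removeAll()
    }
}
```

The ordering is the point: the categories are re-pointed first, and only then is it safe to drop the duplicate.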
One interesting thing is, if the Category objects are also added from the remote store, they may not yet exist when we handle the relationship, but that's another story.