How do iOS devices perform on-device machine learning on photos while most of the photos are uploaded to iCloud?
When iCloud Photos is enabled, photos are uploaded to iCloud and only a low-resolution copy is kept on the phone for browsing; the high-resolution original is downloaded on demand when the user opens it.
The Photos app performs on-device machine learning when the device is idle and charging (typically while you're sleeping) to detect faces in your photo library so that you get "People" albums. Face detection is just one of the features the app uses ML for.
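For a sense of what that kind of on-device analysis can look like, here is a minimal sketch using Apple's public Vision framework, which exposes face detection to third-party apps. This is only an illustration of work that can run entirely on the device; it is not Apple's internal Photos pipeline, and the background-queue dispatch here merely stands in for the real idle/charging scheduling.

```swift
import Vision
import UIKit

// Minimal sketch: on-device face detection with the Vision framework.
// Illustrative only; not Apple's internal Photos pipeline.
func detectFaces(in image: UIImage, completion: @escaping ([VNFaceObservation]) -> Void) {
    guard let cgImage = image.cgImage else {
        completion([])
        return
    }

    let request = VNDetectFaceRectanglesRequest { request, _ in
        // Each observation carries a normalized bounding box for one detected face.
        let faces = request.results as? [VNFaceObservation] ?? []
        completion(faces)
    }

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    // Heavy work like this is typically deferred to idle/charging time.
    DispatchQueue.global(qos: .background).async {
        do {
            try handler.perform([request])
        } catch {
            completion([])
        }
    }
}
```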
What confuses me is that the low-resolution copies just don't seem good enough to perform ML on, especially face detection. So how can the device analyze those low-res photos and still give correct predictions?
I have two guesses:
- Don't care about low or high resolution, just analyze whatever copy is on the device. (I don't really believe this is the case, because the face detection is too accurate to be the result of ML on 480p photos.)
- On-demand loading: download the original photo from iCloud while the ML algorithm is running, and delete it from the device when it is done; see the sketch after this list. (Isn't this a waste of bandwidth for the iCloud servers?)
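The second guess can at least be approximated with public APIs. The sketch below uses PhotoKit's PHImageManager with isNetworkAccessAllowed enabled, which fetches the full-resolution original from iCloud on demand and hands it to the caller in memory, so nothing extra has to stay on the device afterwards. Whether the Photos app does something like this internally is, of course, exactly what this question is asking.

```swift
import Photos
import UIKit

// Sketch of guess 2: fetch the full-resolution original on demand.
// With isNetworkAccessAllowed = true, PhotoKit pulls the original from iCloud
// if only the low-res thumbnail is stored locally. The decoded image lives in
// memory for the duration of the analysis and is not written back to the library.
func withOriginalImage(for asset: PHAsset, analyze: @escaping (UIImage) -> Void) {
    let options = PHImageRequestOptions()
    options.isNetworkAccessAllowed = true   // allow the iCloud download
    options.deliveryMode = .highQualityFormat

    PHImageManager.default().requestImage(
        for: asset,
        targetSize: PHImageManagerMaximumSize,
        contentMode: .aspectFit,
        options: options
    ) { image, _ in
        if let image = image {
            analyze(image)   // e.g. run the face-detection sketch above
        }
    }
}
```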
Edit 1: I'm interested in how it will handle the case where I manually add some photos to a person (this action changes the model, or at least eventually changes it once it learns from the new photos that weren't detected previously), so all photos must be analyzed again using the new model. At that point, most of the high-resolution images (let's say 199 GB out of 200 GB) exist only in iCloud, so will the system pull them down one by one to perform ML with the new model again? What if I "edit" the ML regularly, by adding undetected photos to a People album a few at a time rather than all at once? If so, this could be a huge drain on the iCloud servers... don't forget some people have more than 1 TB of photos in iCloud.
Edit 2:
Don't get me wrong, I'm purely interested in how this whole thing works, because to me it is done very nicely. And to be clear, I completely trust Apple on the "on-device" promise; that is exactly why I asked this question: to understand the technical details that make it both respect privacy and give accurate results.
Obviously the phone cannot do its work without the data, so it performs the on-device ML right after the photo is taken, before the original is uploaded to iCloud and the full-resolution copy is removed from the device; or, if that wasn't possible, it downloads the photo data from iCloud and performs the on-device ML (without necessarily storing the photo on the device).
Whether or not that is a "bandwidth waste" is entirely subjective. If you care about the privacy benefits of performing ML on-device, it is not a waste; if you do not care, it could be seen as a waste.
UPDATE: With your edit, you added a question about what happens when you add new information after the fact, for example changing a name or adding a photo to a specific detected person. The phone does not actually need to download all the photos again and redo the whole thing.
When the phone has the original image, it can extract features from the photo using machine learning. It is not known exactly which features the Photos app looks at, but imagine extracting numbers such as the distance between the eyes, the color of the eyes, and so on. Another machine-learning step can then group these numbers into clusters of individuals, which you can then name.
These numbers are stored on the phone and allow you to update certain information and "recalculate" the detected persons without needing the full image data again.
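As a rough analogy built on public APIs, Vision can compute a compact "feature print" (an embedding) for an image; once those numbers are stored, regrouping later only needs the stored vectors, not the original photos. Apple does not document the actual face features or thresholds Photos uses, so treat this purely as an illustration of the idea; the distance threshold below is a hypothetical value.

```swift
import Vision
import UIKit

// Compute a compact feature vector (embedding) for an image once,
// while the full-resolution data is available.
func featurePrint(for image: UIImage) -> VNFeaturePrintObservation? {
    guard let cgImage = image.cgImage else { return nil }
    let request = VNGenerateImageFeaturePrintRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
    } catch {
        return nil
    }
    return request.results?.first as? VNFeaturePrintObservation
}

// Later, decide whether two stored feature prints belong to the same cluster
// (person) purely from the numbers, with no image data involved.
// The threshold is hypothetical; real systems tune this value.
func isSamePerson(_ a: VNFeaturePrintObservation,
                  _ b: VNFeaturePrintObservation,
                  threshold: Float = 0.5) -> Bool {
    var distance: Float = 0
    do {
        try a.computeDistance(&distance, to: b)
        return distance < threshold
    } catch {
        return false
    }
}
```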