Where does Apple's predictive keyboard get its "contextual" content from?
So I was playing around with iOS 8 when I noticed that, upon typing "new" anywhere, the predictive keyboard suggested the term "NewTerm". I recall using this word in a tweet, so I brushed it off. But then, when writing the word "swift", the keyboard suggested "Swisslapse" after I had typed "swi". This shocked me quite a bit, as I had only ever used the term "Swisslapse" in a private iMessage chat, which I had deleted from all my devices three months earlier. I tried typing a few other words that I had used recently in Safari (running in private mode), but to my relief the keyboard did not suggest them.
However, this raises a few questions:
- Does Apple cache my iMessage chats?
- Does Apple cache my keyboard input in selected apps?
- Is it possible to request a purge of my caches?
- How can I disable further caching (if any)?
I should add that I installed iOS 8 on the 9th of September, right after the GM seed was released in the developer center. It is therefore impossible for iOS 8's predictive keyboard to have learnt these words when I originally typed them.
Based on the answer provided by Brian Nickel: iOS added these words to my local keyboard dictionary (i.e. the words I want autocorrect to learn), which I then synced to iOS 8 when I restored my device from a backup. As a result, the predictive keyboard suggests them whenever it thinks they are relevant.
So it seems that Apple isn't infringing upon our privacy after all. Glad to have that cleared up!
Speaking only from my experience with iOS 7, the device saves any word you type but do not let autocorrect change into your "Keyboard dictionary". The logic is that if you didn't want it corrected, it is a word you have used and may use again.
I did a basic test. I typed "Swisslapse" into Messages but did not send it. After a while, I typed "Swiss" and it came up as an autocomplete suggestion. I confirmed it came up in Notes too. To verify that this wasn't specific to Apple's apps, I typed a new word into Avocado and it also showed up as a completion suggestion in Notes.
You can clear out your keyboard dictionary in Settings.app under General > Reset > Reset Keyboard Dictionary. This answer shows the path to the cache, and you could theoretically remove just the offending words by using a third-party tool to access and modify the file.
As for preventing new words from being learnt, I would speculate that disabling Auto-Correction would do the trick in iOS 7. Even though "Swisslapse" is now an autocomplete suggestion, it still shows up as a misspelled word, so it is not interacting with the spell-check dictionary. iOS 8 may have more fine-grained settings, but you have to assume that the predictive keyboard learns from everything you type, and you may have to disable it altogether.
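To make the behaviour described above concrete (a word you type and keep gets learnt, a reset wipes the list, and, per the speculation above, nothing new is learnt with Auto-Correction off), here is a toy model. It is a sketch under those assumptions, not Apple's implementation; `LearnedDictionary` and its methods are hypothetical names.

```python
class LearnedDictionary:
    """Toy model of a keyboard's learned-word list (hypothetical, not Apple's code)."""

    def __init__(self, known_words):
        # Built-in lexicon, stored lowercase for case-insensitive checks.
        self.known = {w.lower() for w in known_words}
        # Learned words: lowercase key -> spelling as the user typed it.
        self.learned = {}

    def commit_word(self, word, autocorrect_enabled=True):
        """Call when the user finishes a word and does not accept a correction."""
        key = word.lower()
        # Assumption: nothing is learned while Auto-Correction is switched off.
        if autocorrect_enabled and key not in self.known:
            self.learned[key] = word

    def completions(self, prefix):
        """Learned words whose lowercase form starts with the typed prefix, sorted."""
        p = prefix.lower()
        return sorted(spelling for key, spelling in self.learned.items() if key.startswith(p))

    def forget(self, word):
        """Drop a single learned word, as a third-party file editor might."""
        self.learned.pop(word.lower(), None)

    def reset(self):
        """Counterpart of General > Reset > Reset Keyboard Dictionary."""
        self.learned.clear()
```

With this model, committing "Swisslapse" once makes `completions("swi")` return it in every "app" that shares the object, which matches the Messages/Notes/Avocado test above.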
As for your question about iMessage: Apple is adamant that it cannot read your messages in transit, as they are encrypted on the sender's device and can only be decrypted on the receiver's device(s). I doubt that Apple keeps the message contents on the device once you delete them there. I haven't tested it, but I also doubt that incoming messages are scanned for words to learn for autocorrect, though the keyboard has started scanning them for predictive text responses. (E.g., "Pizza or Chinese?" generates "Pizza" and "Chinese" as the first two predictive responses.)
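The "Pizza or Chinese?" behaviour looks like simple pattern matching on questions that offer a choice. Here is a crude sketch of that idea; it is only my guess, not how iOS actually does it, and `suggest_replies` is a made-up name.

```python
import re

# Hypothetical heuristic: an incoming "X or Y?" question yields X and Y as replies.
_CHOICE = re.compile(r"^\s*(.+?)\s+or\s+(.+?)\s*\?\s*$", re.IGNORECASE)


def suggest_replies(message):
    """Return the two options of an "X or Y?" question, else an empty list."""
    match = _CHOICE.match(message)
    if match:
        return [match.group(1), match.group(2)]
    return []
```

It is deliberately naive: "Is it for you or me?" would yield "Is it for you" as the first option, which a real implementation would presumably handle better.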
Apple's patents include an explanation of how it works:
US patents:
- Patent No. 8,232,973, "Method, device, and graphical user interface providing word recommendations for text input"
- Patent No. 8,074,172, "Method, system, and graphical user interface for providing word recommendations" (i.e. predictive text)
... However, the size of these portable communication devices also restricts the size of the text input device, such as a physical or virtual keyboard, in the portable device. With a size-restricted keyboard, designers are often forced to make the keys smaller or overload the keys. Both may lead to typing mistakes and thus more backtracking to correct the mistakes. This makes the process of communication by text on the devices inefficient and reduces user satisfaction with such portable communication devices.
... The set of strings are compared against a dictionary. Words in the dictionary that have any of the set of strings as a prefix are identified (206). As used herein, "prefix" means that the string is a prefix of a word in the dictionary or is itself a word in the dictionary. A dictionary, as used herein, refers to a list of words. The dictionary may be pre-made and stored in the memory. The dictionary may also include usage frequency rankings for each word in the dictionary. A usage frequency ranking for a word indicates (or more generally, corresponds to) the statistical usage frequency for that word in a language. In some embodiments, the dictionary may include different usage frequency rankings for different variants of a language. For example, a dictionary of words in the English language may have different usage frequency rankings with respect to American English and British English.
In some embodiments, the dictionary may be customizable. That is, additional words may be added to the dictionary by the user. Furthermore, in some embodiments, different applications may have different dictionaries with different words and usage frequency rankings. For example, an email application and an SMS application may have different dictionaries, with different words and perhaps different usage frequency rankings within the same language.
The identified words are the candidate words that may be presented to the user as recommended replacements for the input sequence. The candidate words are scored (208). Each candidate word is scored based on a character-to-character comparison with the input sequence and optionally other factors. Further details regarding the scoring of candidate words are described below, in relation to FIGS. 3 and 7A-7C. A subset of the candidate words are selected based on predefined criteria (210) and the selected subset is presented to the user (212). In some embodiments, the selected candidate words are presented to the user as a horizontal listing of words.
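The quoted steps (206) to (212) can be sketched as a small pipeline: find dictionary words that have one of the candidate strings as a prefix, score them, and keep the best few. The detailed scoring in FIGS. 3 and 7A-7C is not reproduced in the excerpt, so the scoring rule here (count of position-by-position character matches, with usage frequency as tie-breaker), the word list, the frequency numbers, and the function names are all my own assumptions.

```python
# Made-up word list: word -> usage frequency ranking, per language variant.
DICTIONARY = {
    "en_US": {"color": 900, "colour": 50, "colonel": 120, "swift": 400, "swim": 350, "swiss": 300},
    "en_GB": {"color": 60, "colour": 850, "colonel": 130, "swift": 380, "swim": 360, "swiss": 320},
}


def prefix_matches(strings, words):
    """Step 206: words having any string as a prefix (a string that is itself a word counts)."""
    return {w for w in words for s in strings if w.startswith(s)}


def score(candidate, typed):
    """Step 208 stand-in: count positions where candidate and typed characters agree."""
    return sum(1 for c, t in zip(candidate, typed) if c == t)


def recommend(typed, strings, words, limit=3):
    """Steps 206-212: find candidates, score them, keep the best `limit` for display."""
    candidates = prefix_matches(strings, words)
    # Higher score first, then more frequent word, then alphabetical for stability.
    ranked = sorted(candidates, key=lambda w: (-score(w, typed), -words[w], w))
    return ranked[:limit]
```

For example, `recommend("colo", ["colo"], DICTIONARY["en_US"])` ranks "color" first, while the `en_GB` table puts "colour" first, which is the American/British difference the quote mentions. Because `words` is just a parameter, the customizable and per-application dictionaries described in the quote would simply be different dicts passed in, such as `{**DICTIONARY["en_US"], "swisslapse": 1}`.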
I don't intend to provide a complete explanation of how it works, just a guide to it.
So what about a word that isn't in the dictionary? Notice that my dictionary does not have it, so it is underlined in red and I am offered the option to look it up.
The choices are:
1. Look it up and correct it
2. Add it to the dictionary as typed
3. Ignore it
The predictive keyboard logic takes all three choices into account, even the ignored one, and assumes the word is what I wanted. So in your case, you probably did not add the word to your dictionary, but you used it more than once, so it was marked as the most probable (predictive) suggestion.
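The claim that every choice feeds the prediction, and that a word used more than once becomes the most probable suggestion, boils down to a usage counter. A minimal sketch of that idea, where the class name, the choice labels, and the equal weighting of all three choices are assumptions:

```python
from collections import Counter


class UsageModel:
    """Counts word usage regardless of how the red underline was resolved (toy model)."""

    CHOICES = {"corrected", "added", "ignored"}

    def __init__(self):
        self.counts = Counter()

    def record(self, word, choice):
        """`word` is the final text left in the field; `choice` is one of CHOICES."""
        if choice not in self.CHOICES:
            raise ValueError(f"unknown choice: {choice!r}")
        # Assumption: every choice counts equally, so an ignored word still
        # counts as "what I wanted".
        self.counts[word] += 1

    def most_probable(self, prefix):
        """Most-used word starting with `prefix`, or None if there is none."""
        ranked = sorted(
            (w for w in self.counts if w.startswith(prefix)),
            key=lambda w: (-self.counts[w], w),
        )
        return ranked[0] if ranked else None
```

Ignoring "Swisslapse" twice and typing "Swiss" once is then enough for "Swisslapse" to win on the prefix "Swi", even though it was never added to the dictionary.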