JavaScript trick? How does Scribd make it difficult to even copy and paste text?

Lately, I have noticed that Scribd makes it very difficult for free users to browse through a document hosted on their site. There is no way to search within a document, let alone download it.

They use JavaScript to load pages on demand in the browser, so the browser's "Save as" feature does not help much.

To my amazement, I saw that even copying and pasting text puts gibberish on the clipboard! To find out what was going on, I turned off JavaScript in the browser and loaded the same document again. Voila, the page itself showed the gibberish. So it looks like Scribd's JavaScript somehow decodes the gibberish text and then displays it in the browser.

What puzzles me is this: even with JavaScript enabled and the text rendered properly in the browser, if I look at the DOM objects corresponding to the text I select, I still see the gibberish.

So now I am confused. The text is displayed correctly to the user, but the DOM objects still contain gibberish. What kind of JavaScript hooks or code is the site using to keep the gibberish in the DOM objects and still render the decoded text?

Is there a way I can access the decoded text? My intention is not to reverse engineer the decoding algorithm, but to find out where the decoded text is being stored.

Example document is:

http://www.scribd.com/doc/143886351/OCP-Upgrade-to-Oracle-Database-12c-Student-Guide-vol-1-Exam-1Z0-060

See what happens when you turn JavaScript on and off!


Look at the font-family for the span. They use a custom font (in this case ff6).

They presumably do this so that more PDF documents display correctly. A PDF document is not required to use a standard character set for its text. It only needs character codes that map to the glyphs in the embedded font.


If you look at the displayed text vs. the "gibberish", you can see that some of the letters are the same, while others are substituted. For example, "Mltmrprfsm Jblbemr" is "Enterprise Manager". Given enough text, you should be able to build a quick translation table. Already, we know that M translates to E, L -> N, T, R, P, and S are unchanged, F -> I, J -> M, B -> A, E -> G, etc. Given some time, detective work, and modest programming skills, one could translate the whole document.
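A minimal sketch of such a translation table in Python, built from the "Enterprise Manager" pair quoted above. The function names build_table and decode are my own, and the sketch assumes a plain one-to-one character substitution with upper and lower case treated as separate symbols, which may not hold for every font or document:

```python
def build_table(garbled, clear):
    """Build a character translation table from one aligned sample pair.

    Assumes each garbled character always stands for the same clear
    character (a simple substitution); raises ValueError otherwise.
    """
    if len(garbled) != len(clear):
        raise ValueError("samples must have the same length")
    table = {}
    for g, c in zip(garbled, clear):
        # setdefault stores the first mapping seen and returns the
        # existing one afterwards, so a mismatch means a conflict.
        if table.setdefault(g, c) != c:
            raise ValueError(f"{g!r} maps to both {table[g]!r} and {c!r}")
    return table


def decode(text, table, unknown="?"):
    """Translate text, marking characters not yet in the table."""
    return "".join(table.get(ch, unknown) for ch in text)


# Sample pair taken from the document discussed above.
table = build_table("Mltmrprfsm Jblbemr", "Enterprise Manager")
print(decode("Mltmrprfsm Jblbemr", table))  # Enterprise Manager
```

Characters that have not been seen yet come out as "?", so you can tell which letters still need a sample. Feeding in more known pairs (and merging the resulting tables) fills the gaps.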

Of course, there's no guarantee that the next document would use the same ff6 font that Dan D. mentioned, so grabbing that font for local use should be your next step if you want to save the text for later.