Capture text from website that's disabled copy-paste, printing and viewing source

I'd like to capture text from a website (for personal usage, not business) but unfortunately they have disabled copy-paste, printing and viewing the source.

Is there another way of capturing the text?


Solution 1:

Is there another way of capturing the text?

Likely so, but the best method depends on several factors:

  • What is the intended use of this text? (Does it need to be editable, or do you simply want to read it?)

  • What type of text is it? (i.e. is it "normal" text or e.g. a .pdf or another type of document?)

  • Is the text produced by JavaScript?

  • What security measures has the site implemented to protect this text?

  • How much effort are you willing to expend to copy the text?

Images

As hinted at in the answer by @John, producing an image of the text would be one simple way to obtain a copy.

Personally, however, I wouldn't choose a smartphone for copying a website. Most browser windows can be maximized, and most operating systems offer options for taking screenshots (pictures of program windows or your desktop).

One drawback to this method might be the need to edit or stitch images together to obtain complete documents and/or the need to extract text with optical character recognition (OCR). There are free programs available that will allow you to do this, but this is obviously more complex than simple "cut and paste" operations.

Source Code

You can (potentially) get the source code of the web page by saving the web page from the browser (as HTML) or by using another program (e.g. curl or wget) to download the page. However, this may not always capture the desired text. If the text is rendered by JavaScript, or is an embedded document, it may be loaded/rendered by the browser as a secondary operation after the "basic" page source is loaded.
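As a rough illustration of the download approach, here is a minimal Python sketch using only the standard library. The names `TextExtractor`, `html_to_text` and `fetch_text` are my own, and the `User-Agent` header is an assumption (some sites reject requests without one). It only sees text present in the initial HTML, so anything rendered later by JavaScript will be missing.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_text(url):
    """Download a page and return its text (hypothetical helper)."""
    # Assumption: a browser-like User-Agent is sent with the request.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

`html_to_text` also works on a page you saved from the browser: read the file into a string and pass it in.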

Removing The Blocking Code

Unfortunately they have disabled copy-paste, printing and viewing the source.

This is almost certainly done via JavaScript. It might be possible to use a browser plugin to run custom JavaScript (a "user script") on the page that removes the blocking JavaScript, so the page behaves normally. That said, this would probably depend on the user script running before the site's own scripts, as well as on finding or creating a script able to remove the offending code in the first place.
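A related idea, which is not a user script but applies the same principle to a saved copy of the page, is to strip the most common blocking hooks from the HTML before opening it locally. The sketch below is in Python; `strip_blocking` and the attribute list are my own illustration and are not exhaustive. It assumes the blocking is done with inline attributes or `user-select: none` styles, and it will not help if the handlers are attached from a script (e.g. via `addEventListener`).

```python
import re

# Inline event-handler attributes commonly used to block selection, copying
# and the context menu. Illustrative list only, not exhaustive.
BLOCKING_ATTRS = (
    "oncopy", "oncut", "onpaste", "oncontextmenu",
    "onselectstart", "ondragstart", "onmousedown", "onkeydown",
)

# Matches e.g. ` oncopy="return false"`, ` oncopy='...'` or ` oncopy=value`.
_ATTR_PATTERN = re.compile(
    r"\s+(?:%s)\s*=\s*(?:\"[^\"]*\"|'[^']*'|[^\s>]+)" % "|".join(BLOCKING_ATTRS),
    re.IGNORECASE,
)

# Matches `user-select: none;` including common vendor-prefixed forms.
_STYLE_PATTERN = re.compile(
    r"(?:-webkit-|-moz-|-ms-)?user-select\s*:\s*none\s*;?",
    re.IGNORECASE,
)


def strip_blocking(html):
    """Return `html` with common copy-blocking attributes and styles removed."""
    html = _ATTR_PATTERN.sub("", html)
    return _STYLE_PATTERN.sub("", html)
```

This is a plain regular-expression pass rather than a real HTML parser, so it can also alter matching text that happens to appear inside scripts or page content. Check the result before relying on it.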

Browser Automation

Modern browsers (e.g. Chrome, Chromium, Firefox) can be driven from scripting languages such as JavaScript or Python (my personal preference). Importantly, they can (in certain instances) extract text from the page.
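As one possible illustration, the sketch below uses the third-party Selenium package from Python. It assumes Selenium 4 is installed and that Firefox and its driver are available on your machine. The function names and the choice of headless Firefox are my own, and some sites may still need extra waiting or a login before the text appears.

```python
def extract_body_text(driver, url):
    """Load `url` in an already-started WebDriver and return the visible text."""
    driver.get(url)
    # "tag name" is the locator strategy string behind Selenium's By.TAG_NAME.
    return driver.find_element("tag name", "body").text


def print_page_text(url):
    """Start headless Firefox, print the page's visible text, then quit."""
    # Imported here so extract_body_text can be used without Selenium loaded.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        print(extract_body_text(driver, url))
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Calling `print_page_text("https://example.com/")` should print whatever text the browser renders in the page body, including text produced by JavaScript that a plain download would miss.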