How to get text from a webpage from the command line?
I am looking for a command line equivalent to the Get text from webpage option in /Applications/Automator.app. The Get text from webpage option is pretty self-explanatory: it gets only the text from a webpage, without the HTML tags, CSS, JavaScript, etc. I know I can use wget or curl, but those give me all the HTML tags, CSS, JavaScript, etc., not the text-only version of the webpage.
Your best option is the textutil command. Read the man page for the details, but something like:
textutil -convert txt webpage.html
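should work, assuming you have already saved the page locally as webpage.html (for example with curl or wget). By default the converted text is written to webpage.txt alongside the input file.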
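If you would rather skip the intermediate file, the download and the conversion can probably be combined into a single pipeline. This is only a minimal sketch: it assumes macOS, and that your version of textutil supports the -stdin, -stdout and -format options (check man textutil to confirm). The function name page_text is a hypothetical helper, not part of either tool.

```shell
# Hypothetical helper: print the text-only version of a web page.
# Assumes macOS, where textutil is available.
page_text() {
  # -s hides curl's progress meter, -L follows redirects
  curl -sL "$1" |
    # read HTML from stdin and write plain text to stdout
    textutil -stdin -stdout -format html -convert txt
}

# Usage (the URL is just a placeholder):
#   page_text https://example.com > page.txt
```

Note that this converts whatever HTML the server returns, so text that a page builds later with JavaScript will not show up in the output.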