I used wget to download HTML files; where are the images stored?

Firefox was loading very slowly, so I decided to use wget to save the HTML files instead. I used the following command:

wget http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter

The files have been saved in my home folder, but I don't know where the images are stored. I need them for use in Anki.

So where are the images stored?


I prefer to use --page-requisites (-p for short) instead of -r here: it downloads everything the page needs to display, but no other pages, and I don't have to think about which kinds of files I want.

In practice I usually use something like

wget -E -H -k -p http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter

This means:

  • -E: Append .html to the file name if the file is HTML but its name doesn't already end in .html or similar
  • -H: Also download files from other hosts
  • -k: After downloading, convert the links in each document so they point to the downloaded files
  • -p: Download anything the page needs for proper offline viewing
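With -H, wget will also fetch requisites hosted on third-party servers (CDNs, ad hosts and so on), which can pull in more than you want. A possible refinement, using wget's standard -D (--domains) option to limit host spanning, is shown below; the domain list is only an example taken from the question's URL:

wget -E -H -k -p -D s-anand.net http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter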

Using the -r parameter should make wget download the whole folder, including your images.

wget -r http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter
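Keep in mind that -r recurses five levels deep by default, so on a heavily cross-linked site it can fetch far more than one chapter. A more conservative sketch, using wget's standard depth and parent controls, is:

wget -r -l 1 -np http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter

Here -l 1 limits recursion to one level and -np (--no-parent) stops wget from climbing into parent directories.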

Downloading the image files separately as well

I think this command could get you started.

wget -r -P /save/location -A jpeg,jpg,bmp,gif,png http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter

It lets you specify the location where the images are saved (-P) and which file types to retrieve (-A). Downloading just the images this way may well be easier.

Source:

-r enables recursive retrieval. See Recursive Download for more information.

-P sets the directory prefix where all files and directories are saved to.

-A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma-separated list (as seen above). See Types of Files for more information.
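Since -A accepts patterns as well as suffixes, and since wget normally recreates the site's directory tree under the prefix, one possible variation is to flatten the output with -nd (--no-directories); the flags are standard wget, and /save/location is the same placeholder as above:

wget -r -nd -P /save/location -A "*.jpeg,*.jpg,*.bmp,*.gif,*.png" http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter

With -nd every accepted file lands directly in /save/location rather than in a tree of host and path subdirectories, which makes importing into Anki simpler.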

Copying the image files from your folder

I have noticed that the website uses PNG image files. You can simply copy those from your folder. The following command should be run in the folder where you stored the webpage:

find . -name "*.png" -exec cp '{}' ./some_dir/somewhere/ \;
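Note that the destination directory must already exist; create it first with mkdir -p. If you have GNU findutils and coreutils, a sketch that batches the copies into a single cp invocation (some_dir/somewhere is the same placeholder as above) is:

mkdir -p ./some_dir/somewhere/
find . -name "*.png" -exec cp -t ./some_dir/somewhere/ '{}' +

cp -t names the target directory first, which lets find append many file names to one cp call (the + terminator) instead of running cp once per file.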