I used wget to download HTML files; where are the images stored?

Firefox was loading very slowly, so I decided to use wget to save the HTML files instead. I used the following command:

wget http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter

The files have been saved in my home folder, but I don't know where the images are stored. I need them for use in Anki.

So where are the images stored?


I prefer to use --page-requisites (-p for short) instead of -r here: it downloads everything the page needs to display, but no other pages, and I don't have to think about which kinds of files I want.

In practice I usually use something like

wget -E -H -k -p http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter

This means:

  • -E: Append .html to the file name if the file is HTML but its name doesn't already end in .html or similar
  • -H: Also download files from other hosts
  • -k: After downloading, convert the links in each document so they point to the downloaded files
  • -p: Download anything the page needs for proper offline viewing
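With -H, wget will also fetch requisites hosted on third-party servers (CDNs, ad hosts and so on), which can pull in more than you want. A possible refinement, using wget's standard -D (--domains) option to limit host spanning, is shown below; the domain list is only an example taken from the question's URL:

wget -E -H -k -p -D s-anand.net http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter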

Using the -r parameter should make wget download the whole folder, including your images.

wget -r http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter
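Keep in mind that -r recurses five levels deep by default, so on a heavily cross-linked site it can fetch far more than one chapter. A more conservative sketch, using wget's standard depth and parent controls, is:

wget -r -l 1 -np http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter

Here -l 1 limits recursion to one level and -np (--no-parent) stops wget from climbing into parent directories.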

Downloading the image files separately as well

I think this command could get you started.

wget -r -P /save/location -A jpeg,jpg,bmp,gif,png http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter

It lets you specify the location where the images are saved (-P) and which file types to retrieve (-A). Downloading just the images this way may well be easier.

Source:

-r enables recursive retrieval. See Recursive Download for more information.

-P sets the directory prefix where all files and directories are saved to.

-A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma-separated list (as seen above). See Types of Files for more information.
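Since -A accepts patterns as well as suffixes, and since wget normally recreates the site's directory tree under the prefix, one possible variation is to flatten the output with -nd (--no-directories); the flags are standard wget, and /save/location is the same placeholder as above:

wget -r -nd -P /save/location -A "*.jpeg,*.jpg,*.bmp,*.gif,*.png" http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter

With -nd every accepted file lands directly in /save/location rather than in a tree of host and path subdirectories, which makes importing into Anki simpler.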

Copying the image files from your folder

I have noticed that the website uses PNG image files. You can simply copy those from your folder. The following command should be run in the folder where you stored the webpage:

find . -name "*.png" -exec cp '{}' ./some_dir/somewhere/ \;
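Note that the destination directory must already exist; create it first with mkdir -p. If you have GNU findutils and coreutils, a sketch that batches the copies into a single cp invocation (some_dir/somewhere is the same placeholder as above) is:

mkdir -p ./some_dir/somewhere/
find . -name "*.png" -exec cp -t ./some_dir/somewhere/ '{}' +

cp -t names the target directory first, which lets find append many file names to one cp call (the + terminator) instead of running cp once per file.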