I used wget to download HTML files; where are the images stored?
Firefox was loading very slowly, so I decided to use wget to save the HTML files. I used the following command:
wget http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter
The files have been saved in my home folder, but I don't know where the images are stored. I need them to use in Anki.
So where are the images stored?
I prefer to use --page-requisites (-p for short) instead of -r here, as it downloads everything the page needs to display but no other pages, and I don't have to think about what kind of files I want.
In practice I usually use something like:
wget -E -H -k -p http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter
This means:
- -E: Append .html to the file name if it is an HTML file but doesn't end in .html or similar
- -H: Download files from other hosts, too
- -k: After downloading, convert the links in the page so that they point to the downloaded files
- -p: Download anything the page needs for proper offline viewing
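As for where the images end up: as far as I know, with -p (or -r) wget by default recreates a directory tree named after each host inside the current directory, so the images should sit somewhere below those folders rather than next to the HTML file. Here is a minimal sketch for locating them, assuming the wget command was run in the directory you pass in. The function name list_downloaded_images and the list of extensions are my own choices for illustration.

```shell
# Hypothetical helper: print every image file found below a download directory.
# $1 is the directory wget was run in (defaults to the current directory).
list_downloaded_images() {
    find "${1:-.}" -type f \( \
        -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \
        -o -iname '*.gif' -o -iname '*.bmp' \) | sort
}
```

Running list_downloaded_images ~ after the download prints one path per line, which shows which host folders the images landed in.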
Using the -r parameter should enable wget to download the whole folder, including your images.
wget -r http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter
Downloading the image files separately as well
I think this command could get you started.
wget -r -P /save/location -A jpeg,jpg,bmp,gif,png http://textbook.s-anand.net/ncert/class-xii/chemistry/hello-this-first-chapter
It lets you specify where to save the images and which file types you want. Downloading just the images this way may be easier.
Source:
-r enables recursive retrieval. See Recursive Download for more information.
-P sets the directory prefix where all files and directories are saved to.
-A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma-separated list (as seen above). See Types of Files for more information.
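If you want the images in one flat folder instead of a copy of the site's directory tree, the same options can be wrapped in a small function. This is only a sketch: the name fetch_page_images and the default folder ./images are made up, and I have added -l 1 (limit recursion to one level) and -nd (do not create directories). Without -H, images served from other hosts will not be fetched.

```shell
# Hypothetical helper: fetch only the image files linked from one page
# into a single flat folder.
# $1 = page URL, $2 = destination folder (defaults to ./images)
fetch_page_images() {
    url="$1"
    dest="${2:-./images}"
    mkdir -p "$dest" || return 1
    # -r -l 1 : follow links one level deep from the starting page
    # -nd     : do not recreate the site's directory tree
    # -P      : directory prefix to save into
    # -A      : comma-separated accept list of file suffixes
    wget -r -l 1 -nd -P "$dest" -A jpeg,jpg,bmp,gif,png "$url"
}
```

For example, fetch_page_images followed by the chapter URL and ~/anki-images should leave only the matching image files in that folder, since wget deletes the HTML pages it had to download to find the links.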
Copying the image files from your folder
I noticed that the website uses PNG image files, so you can just copy those out of your download folder. Run this in the folder where you stored the web page; the destination directory must already exist.
find . -name "*.png" -exec cp '{}' ./some_dir/somewhere/ \;
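A slightly more defensive variant of the copy step, as a sketch: it creates the destination first, matches a few common image extensions case-insensitively, and expects the destination to live outside the folder being searched so that find does not pick up the copies. The function name collect_images is hypothetical. Files that share a name will overwrite each other in the destination.

```shell
# Hypothetical helper: copy all images found below a source folder
# into one destination folder.
# $1 = folder to search, $2 = destination (should be outside $1)
collect_images() {
    src="${1:-.}"
    dest="$2"
    if [ -z "$dest" ]; then
        echo "usage: collect_images SRC DEST" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    find "$src" -type f \( \
        -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.gif' \) \
        -exec cp {} "$dest"/ \;
}
```

For example, collect_images ./textbook.s-anand.net ~/anki-images would gather the images from the downloaded site folder, assuming that is where wget saved them.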