Downloading images linked from a website
Is it possible to download all .jpg and .png files linked from a website? I want to download the images from every post of every thread of [this forum][1] that contains a link. For example, [this post][2] contains a link to [this file][3].
I've tried with wget:
wget -r -np http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread?
and it copied all the HTML files of that thread, although I don't know why it jumps from ...thread?comment=336 to ...thread?comment=3232 when it had been going one by one up to comment 336.
Solution 1:
Try with this command:
wget -P path/where/save/result -A jpg,png -r http://www.mtgsalvation.com/forums/creativity/artwork/
According to the wget man page:
-A acclist --accept acclist
Specify comma-separated lists of file name suffixes or patterns to
accept or reject (@pxref{Types of Files} for more details).
-P prefix
Set directory prefix to prefix. The directory prefix is the direc‐
tory where all other files and subdirectories will be saved to,
i.e. the top of the retrieval tree. The default is . (the current
directory).
-r
--recursive
Turn on recursive retrieving.
Try this:
mkdir wgetDir
wget -P wgetDir http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread?page=145
This command fetches the HTML page and puts it in wgetDir. When I tried it, I found this file:
340782-official-digital-rendering-thread?page=145
Then I tried this command:
wget -P wgetDir -A png,jpg,jpeg,gif -nd --force-html -r -i "wgetDir/340782-official-digital-rendering-thread?page=145"
and it downloaded the images. So it seems to work, although I do not know whether these pictures are the ones you want to download.
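Once that works for a single page, the same two steps can be repeated over the whole thread. A minimal sketch, assuming the thread has 149 pages; it only prints the wget commands, so you can inspect them before piping the output to sh to actually run them:

```shell
# Emit the two wget commands for every page of the thread (149 pages assumed):
# first save the page, then pull its images with --force-html.
base='http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread'
mkdir -p wgetDir
for i in $(seq 1 149); do
    echo "wget -O wgetDir/page$i.html '$base?page=$i'"
    echo "wget -P wgetDir -A png,jpg,jpeg,gif -nd --force-html -r -i wgetDir/page$i.html"
done
```

Saved as getpages.sh, `sh getpages.sh | sh` would execute the printed commands.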
Solution 2:
If wget's recursive filtering doesn't pick everything up, you can drive it from a small C program: fetch each page of the thread, scan every post body for img src= links, and download each one.

#include <stdio.h>
#include <stdlib.h> /* for system() */

int main (void)
{
    char body[] = "forum-post-body-content", notes[] = "p-comment-notes",
         img[] = "img src=", link[200], cmd[512] = {0}, file[32];
    int c, pos = 0, pos2 = 0, fin = 0, i, j, num = 0, found = 0;
    FILE *fp;

    for (i = 1; i <= 149; ++i)
    {
        /* download page i of the thread into a local file */
        sprintf(cmd, "wget -O page%d.txt 'http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread?page=%d'", i, i);
        system(cmd);
        sprintf(file, "page%d.txt", i); /* file[] must be big enough for "page149.txt" */
        fp = fopen(file, "r");
        if (fp == NULL)
        {
            fprintf(stderr, "Can't open %s\n", file);
            continue;
        }
        while ((c = fgetc(fp)) != EOF)
        {
            if (body[pos] == c) /* match "forum-post-body-content" */
            {
                if (pos == 22) /* full marker matched: we are inside a post body */
                {
                    pos = 0;
                    while (fin == 0) /* scan until "p-comment-notes" ends the body */
                    {
                        c = fgetc(fp);
                        if (feof(fp))
                            break;
                        if (notes[pos] == c)
                        {
                            if (pos == 14) /* end-of-body marker matched */
                            {
                                fin = 1;
                                pos = -1;
                            }
                            ++pos;
                        }
                        else if (pos > 0)
                            pos = 0;
                        if (img[pos2] == c) /* match "img src=" */
                        {
                            if (pos2 == 7)
                            {
                                pos2 = 0;
                                while (found == 0) /* copy the quoted URL */
                                {
                                    c = fgetc(fp);
                                    link[pos2] = c;
                                    /* URL ends in 'g' (.jpg/.jpeg/.png) plus closing quote */
                                    if (pos2 > 0 && link[pos2 - 1] == 'g' && link[pos2] == '\"')
                                        found = 1;
                                    ++pos2;
                                }
                                --pos2;
                                found = 0;
                                char link2[pos2];
                                for (j = 1; j < pos2; ++j) /* strip the surrounding quotes */
                                    link2[j - 1] = link[j];
                                link2[j - 1] = '\0';
                                sprintf(cmd, "wget -O /home/arturo/Dropbox/Digital_Renders/%d '%s'", ++num, link2);
                                system(cmd);
                                pos2 = -1;
                            }
                            ++pos2;
                        }
                        else if (pos2 > 0)
                            pos2 = 0;
                    }
                    fin = 0;
                }
                ++pos;
            }
            else
                pos = 0;
        }
        fclose(fp);
        if (remove(file)) /* delete the downloaded page once scanned */
            fprintf(stderr, "Can't remove file\n");
    }
    return 0;
}
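For comparison, the character-by-character scan the C program performs can be sketched in shell with sed: pull each img src URL out of a saved page and hand it to wget. This is a rough sketch, not an HTML parser; it assumes double-quoted src attributes and at most one img tag per line of the saved file (page1.txt here stands in for any page saved by the commands above):

```shell
# Extract img-src URLs from a saved thread page and download each one.
# Assumes double-quoted src attributes, one <img> per line.
sed -n 's/.*img src="\([^"]*\)".*/\1/p' page1.txt | while read -r url; do
    wget "$url"
done
```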