Using wget to download PDF files from a site that requires cookies to be set

I think you need --keep-session-cookies in addition to --save-cookies: --save-cookies alone discards session cookies, so you need both flags.

Basically, you run

wget --keep-session-cookies --save-cookies ..... url

to log in and get your session cookie, then

wget --load-cookies ...... url

to download the PDF.
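
Filled in, that pair of commands might look like this (the URL, cookie file name, and form fields below are placeholders for illustration, not taken from any real site):

# log in via POST and save the session cookie to cookies.txt
wget --keep-session-cookies --save-cookies cookies.txt \
     --post-data 'user=myuser&pw=mypassword' \
     http://example.com/login.aspx

# reuse the saved cookie to fetch the protected PDF
wget --load-cookies cookies.txt http://example.com/files/report.pdf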


Maybe this will help. The site I was trying to log into had some hidden form fields that I needed to fetch before I could successfully log in. So the first wget grabs the login page to find those extra fields, the second wget logs into the site and saves the cookies, and the third uses those cookies to get the page you're after.

#!/bin/bash

# get the login page so we can pull the hidden field's value out of it
wget -a log.txt -O loginpage.html http://foobar/default.aspx
# this parsing is specific to that page's markup: the value is the 9th
# token on the matching line once '=' is turned into whitespace
hiddendata=$(grep value loginpage.html | grep foobarhidden | tr '=' ' ' | awk '{print $9}' | sed 's/"//g')
rm loginpage.html

# log into the site and save the (session) cookies
postData="user=fakeuser&pw=password&foobarhidden=${hiddendata}"
wget -a log.txt -O /dev/null --post-data "${postData}" --keep-session-cookies --save-cookies cookies.txt http://foobar/default.aspx

# use the saved cookies to get the page you're after (quote the URL so
# the shell doesn't try to glob the '?')
wget -a log.txt -O results.html --load-cookies cookies.txt "http://foobar/lister.aspx?id=42"
rm cookies.txt
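
As an aside, the awk extraction above assumes the hidden field's value is always the 9th token on its line, which is fragile. If that breaks for your page, a sed substitution along these lines (assuming a standard <input ... name="foobarhidden" ... value="..."> tag) keys on the attribute names instead:

# pull the value="..." attribute from the foobarhidden input tag
hiddendata=$(sed -n 's/.*foobarhidden[^>]*value="\([^"]*\)".*/\1/p' loginpage.html)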

There's some useful information on this other SO post: