How to download a website for offline use with an authenticated username and password?
I have an account on the tutorial website testdriven.io, and I would like to download the tutorials for offline use so that my team members can learn from them without having to log in with my credentials.
So far, I have tried several approaches without success.
First, I logged in to my account and started the download with
wget -r --mirror -p --convert-links -P . https://testdriven.io/courses/
However, the result was an offline copy of the site without my login, so the tutorial content was limited accordingly.
Second, I tried passing the credentials as POST data, like this:
wget --save-cookies cookies.txt \
--keep-session-cookies \
--post-data '[email protected]&password=z9vi2gE82lO@sTN' \
--delete-after \
https://testdriven.io/courses/
However, it returned:
--2019-12-18 02:01:22-- https://testdriven.io/courses/
Resolving testdriven.io (testdriven.io)... 104.27.143.239, 104.27.142.239, 2606:4700:30::681b:8eef, ...
Connecting to testdriven.io (testdriven.io)|104.27.143.239|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2019-12-18 02:01:23 ERROR 403: Forbidden.
So, how can I download the full tutorial for offline use while authenticating with my username and password? Thanks.
Solution 1:
The website will store your auth information in a cookie.
You can find this in your browser's network inspector. Look under the request headers and grab the cookies for use with wget.
You will need to pass the cookie into wget, and theoretically maintain a cookie jar as well using --save-cookies and --load-cookies.
For example:
wget -r --mirror -p --convert-links -P . \
--header="Cookie: __cfduid=ddebc00435655a6a20430c65436f729851576611229; csrftoken=6QuufXScgoQkyEe18dAL9YmqhxlyJpegNtyMCr4LgAUuvBs3KUzQwqEYBvWZV4yg; sessionid=c5gbfxkhqwpblxlhatgfh3wtfgy0zgpp" \
--save-cookies cookies.txt \
--load-cookies cookies.txt \
--accept-regex '/courses/' \
https://testdriven.io/courses/auth-flask-react/
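As an alternative to the --header option, the same cookies can be written to a file that wget reads with --load-cookies. The following is a minimal sketch, assuming the Netscape cookie file format (seven tab-separated fields per line) that wget's cookie options work with. The cookie values are placeholders rather than real tokens, and the COOKIE_FILE variable name and the one-day expiry are assumptions made for illustration.

```shell
#!/bin/sh
# Placeholder values (assumptions): copy the real ones from the browser's
# network inspector, as described above.
SESSION_ID="REPLACE_WITH_SESSIONID"
CSRF_TOKEN="REPLACE_WITH_CSRFTOKEN"
COOKIE_FILE="${COOKIE_FILE:-cookies.txt}"

# Assumed expiry: one day from now, as a Unix timestamp.
EXPIRY=$(( $(date +%s) + 86400 ))

# Each cookie line has seven tab-separated fields:
# domain, subdomain flag, path, secure flag, expiry, name, value
{
  printf '# Netscape HTTP Cookie File\n'
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    testdriven.io FALSE / TRUE "$EXPIRY" sessionid "$SESSION_ID"
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    testdriven.io FALSE / TRUE "$EXPIRY" csrftoken "$CSRF_TOKEN"
} > "$COOKIE_FILE"
```

With the file in place, the mirror command above could be run with --load-cookies cookies.txt and without the --header option. Whether the server accepts the session still depends on the cookies being current.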
Solution 2:
Read man wget, especially the part that says:
--user=user
--password=password
Specify the username user and password password for both FTP and HTTP file retrieval. These parameters can be
overridden using the --ftp-user and --ftp-password options for FTP connections and the --http-user and --http-password
options for HTTP connections.
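Note that these options send the credentials through HTTP authentication, which is a different mechanism from an HTML login form backed by a session cookie, so they may not help if the site only uses the latter. A minimal sketch of how they would be used, where fetch_with_http_auth is a hypothetical helper name and WGET_USER is an assumed environment variable:

```shell
#!/bin/sh
# Sketch only: fetch a URL protected by HTTP authentication.
# fetch_with_http_auth and WGET_USER are illustrative names, not part of wget.
fetch_with_http_auth() {
  url="$1"
  # --ask-password prompts on the terminal, so the password does not end up
  # in the shell history or the process list.
  wget --user="${WGET_USER:?set WGET_USER first}" --ask-password "$url"
}
```

It could be called as fetch_with_http_auth followed by the URL, after exporting WGET_USER.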
Read about all the wget options. Would this help?
--metalink-over-http
Issues HTTP HEAD request instead of GET and extracts Metalink metadata from response headers. Then it switches to
Metalink download. If no valid Metalink metadata is found, it falls back to ordinary HTTP download.