Extracting website URLs
Is there a way in Ubuntu to find all the directories of a website?
I have a website and I want to check its internal links (directories).
Something like this:
...
The problem is that when I enter something like ubuntu.com/cloud, it doesn't show the subdirectories.
Solution 1:
Open the terminal and type:
sudo apt install lynx
lynx -dump -listonly -nonumbers "https://www.ubuntu.com/" | sort -u
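Since the question is about internal links only, the dump can also be narrowed to the site's own domain before de-duplication; a minimal sketch, assuming the target is ubuntu.com (adjust the grep pattern for your site):
# keep only links on the ubuntu.com domain itself, then de-duplicate
lynx -dump -listonly -nonumbers "https://www.ubuntu.com/" | grep -E '^https?://(www\.)?ubuntu\.com' | sort -u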
The following command builds on the one above by extracting the URLs with awk and redirecting the output to a text file named links.txt:
lynx -dump "https://www.ubuntu.com/" | awk '/http/{print $2}' | uniq -u > links.txt
Solution 2:
See this answer from superuser.com:
wget --spider -r --no-parent http://some.served.dir.ca/
ls -l some.served.dir.ca
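Since the ls -l step above relies on wget having recreated the site's directory structure locally, the full directory tree can also be listed with find; a minimal sketch using the same example host:
# list every directory wget recreated locally, one per line
find some.served.dir.ca -type d | sort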
There are also free websites that will do this for you and convert the output to XML format. I suggest looking into one of those as well to see which method is more suitable for your needs.
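Relatedly, many sites publish their own URL list as an XML sitemap at a conventional location; a quick check, assuming the target exposes one at sitemap.xml (not every site does):
# fetch the sitemap, if present, and print the URLs inside the <loc> tags
curl -s "https://www.ubuntu.com/sitemap.xml" | grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's|</loc>||'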
Edit: the OP has included a new screenshot.