Extracting website URLs

Is there a way in Ubuntu to find all the directories in a website?

I have a website, and I want to check the internal links (directories) of that website.

Something like this:

(screenshot)

...

The problem is that when I enter something like ubuntu.com/cloud, it doesn't show the subdirectories.


Solution 1:

Open the terminal and type:

sudo apt install lynx  
lynx -dump -listonly -nonumbers "https://www.ubuntu.com/" | uniq -u 

The following variant of the previous command redirects the output to a text file named links.txt:

lynx -dump "https://www.ubuntu.com/" | awk '/http/{print $2}' | uniq -u > links.txt
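If you only care about the top-level directories rather than every link, a rough follow-up sketch (assuming the internal links all start with https://www.ubuntu.com/) is to keep only those links and strip them down to their first path component:

# sketch: keep internal links and print the first path segment, e.g. "cloud"
lynx -dump -listonly -nonumbers "https://www.ubuntu.com/" \
  | grep '^https://www.ubuntu.com/' \
  | awk -F/ '$4 != "" {print $4}' \
  | sort -u > dirs.txt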

Solution 2:

See this answer from superuser.com:

wget --spider -r --no-parent http://some.served.dir.ca/
ls -l some.served.dir.ca
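Note that with --spider wget does not save pages to disk, so a rough alternative sketch (assuming you just want the list of visited URLs) is to write the crawl log to a file and extract the URLs from it:

# sketch: crawl without downloading, log to wget.log, then pull out the visited URLs
wget --spider -r --no-parent -nv -o wget.log https://www.ubuntu.com/
grep -oE 'https?://[^ ]+' wget.log | sort -u > urls.txt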

There are also free websites that will do this for you and convert the output to XML format. I suggest looking into one of those as well to see which method best suits your needs.
