Extracting website URLs
Is there a way in Ubuntu to find all the directories of a website?
I have a website and I want to check its internal links (directories).
Something like this:
...
The problem is that when I enter something like ubuntu.com/cloud, it doesn't show the subdirectories.
Solution 1:
Open the terminal and type:
sudo apt install lynx
lynx -dump -listonly -nonumbers "https://www.ubuntu.com/" | sort -u
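Since the question is about internal links only, the dump can also be narrowed to the site's own domain before de-duplication; a minimal sketch, assuming the target is ubuntu.com (adjust the grep pattern for your site):
# keep only links on the ubuntu.com domain itself, then de-duplicate
lynx -dump -listonly -nonumbers "https://www.ubuntu.com/" | grep -E '^https?://(www\.)?ubuntu\.com' | sort -u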
The following command builds on the one above by extracting the URLs with awk and redirecting the output to a text file named links.txt:
lynx -dump "https://www.ubuntu.com/" | awk '/http/{print $2}' | uniq -u > links.txt
Solution 2:
See this answer from superuser.com:
wget --spider -r --no-parent http://some.served.dir.ca/
ls -l some.served.dir.ca
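Since the ls -l step above relies on wget having recreated the site's directory structure locally, the full directory tree can also be listed with find; a minimal sketch using the same example host:
# list every directory wget recreated locally, one per line
find some.served.dir.ca -type d | sort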
There are also free websites that will do this for you and convert the output to XML format. I suggest looking into one of those as well to see which method is more suitable for your needs.
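Relatedly, many sites publish their own URL list as an XML sitemap at a conventional location; a quick check, assuming the target exposes one at sitemap.xml (not every site does):
# fetch the sitemap, if present, and print the URLs inside the <loc> tags
curl -s "https://www.ubuntu.com/sitemap.xml" | grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's|</loc>||'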
Edit: the OP has included a new screenshot.