How to grab only the 2nd-level domains from a list of subdomains
What I need
I have a list of domains like so:
a.example.com
b.foo.com
a.b.bar.com
I want the output to contain only the second-level domains and nothing else, i.e., no 3rd-level or higher. For the example list above, this is the output I'm looking for:
example.com
foo.com
bar.com
What I tried
I've tried using sed, awk, and cut as follows:
sed
cat domains.txt | sed 's/\.$//g'
cat domains.txt | sed -r 's/^(.*)_/\1\\/; s/.$//g' # this removes the last character for some reason
awk
awk '{ sub(/\.$/, ""); print $NF }' domains.txt
cat domains.txt | awk -F\. '{print $1,$2}' | tr ' ' '.' # won't work since there are 4th level domains
cut
cat domains.txt | cut -d '.' -f[field] # won't work since there are 4th level domains
When you need to match starting from the right, you can use the end anchor $ to pin the pattern to the end of the line.
grep:
grep -Po '[^.]+\.[^.]+$' domains.txt
sed:
sed 's/.*\.\([^.]\+\.[^.]\+\)$/\1/' domains.txt
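Note that -P in grep and \+ in sed are GNU extensions. As a minimal sketch of a more portable variant, assuming a grep that supports -o and -E (GNU and BSD grep both do), the same end-anchored pattern can be written as an extended regex. The function name second_level is only illustrative:

```shell
# Hypothetical helper: read hostnames on stdin, print the last two labels.
second_level() {
  # -E: extended regex, -o: print only the part of the line that matched.
  # The trailing $ forces the match to end at the end of the line, so only
  # the final "label.label" pair can match.
  grep -oE '[^.]+\.[^.]+$'
}

printf '%s\n' a.example.com b.foo.com a.b.bar.com | second_level
```

Lines that contain no dot at all do not match and are dropped from the output.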
awk has a predefined variable named NF that holds the number of fields in the current record. Combine NF with the field operator $ to get the value of the last field ($NF) rather than the field count; likewise, $(NF-1) is the second-to-last field.
awk:
awk -F . -vOFS=. '{print $(NF-1), $NF}' domains.txt
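The one-liner above prints a leading dot for lines with fewer than two fields, because $(NF-1) is then $0 or empty. A minimal sketch that guards against this, assuming such lines should be passed through unchanged (that policy and the function name last_two_labels are assumptions, not part of the original answer):

```shell
# Hypothetical helper: print the last two dot-separated fields of each line.
last_two_labels() {
  awk -F . -v OFS=. '
    # NF is the field count; $NF is the last field, $(NF-1) the one before it.
    NF >= 2 { print $(NF-1), $NF; next }
    # Fewer than two fields: print the line as it is.
    { print }
  '
}

printf '%s\n' a.example.com a.b.bar.com localhost | last_two_labels
```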
You can also reverse the text for commands like read or cut that only work from left to right.
rev, cut:
rev domains.txt | cut -d . -f1,2 | rev
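To make the reversal trick easier to follow, here is the same pipeline as a sketch with the intermediate values for one input written out in comments. It assumes rev is installed (it ships with util-linux and the BSDs); the function name last_two_rev is only illustrative:

```shell
# Hypothetical helper: reverse each line, keep the first two fields, reverse back.
last_two_rev() {
  # Input line:        a.b.bar.com
  # After first rev:   moc.rab.b.a
  # After cut -f1,2:   moc.rab
  # After second rev:  bar.com
  rev | cut -d . -f1,2 | rev
}

printf '%s\n' a.example.com b.foo.com a.b.bar.com | last_two_rev
```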
Bash only example:
while read -r; do
  printf '%s\n' "${REPLY/#"${REPLY%.*.*}".}"
done < domains.txt
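The nested parameter expansion is dense, so here is a sketch that splits it into two steps. It uses the # prefix-removal operator in place of the anchored /# substitution, which should behave the same for this purpose; the function and variable names are illustrative only:

```shell
# Hypothetical helper: print the last two labels of the hostname given as $1.
strip_subdomains() {
  host=$1
  # Step 1: remove the shortest suffix matching ".*.*" (the last two labels),
  # leaving only the subdomain part, e.g. a.b.bar.com -> a.b
  prefix=${host%.*.*}
  # Step 2: remove that subdomain part plus its trailing dot from the front.
  # Quoting "$prefix" makes it match literally rather than as a glob pattern.
  printf '%s\n' "${host#"$prefix".}"
}

strip_subdomains a.b.bar.com
```

When the name has fewer than three labels, step 1 matches nothing, so prefix equals the whole name, step 2 removes nothing, and the input is printed unchanged.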