Identifying the TLD and SLD of a URL

Solution 1:

You can't do this just by looking at the URL string. You need either to consult the Public Suffix List or to do live DNS queries. Both approaches have strengths and drawbacks.

So you can answer this kind of question from the string alone only if you know the rules of every existing TLD, which in practice is impossible.

Why? Mostly because DNS is not widely understood, because the rules vary a lot from one TLD to another, because each node in the DNS tree is free to set whatever policies it wants (including names and types of delegations) at its own level and everything below, and because, contrary to widespread belief, a . in a name does not necessarily mean a delegation, and hence does not necessarily mark an administrative or technical boundary.

Also, the term TLD is often misused. Technically it is the rightmost (topmost) label; that is what the T stands for. Hence in your example it should be uk. But not long ago it was not possible to register names directly under uk; only co.uk, org.uk, me.uk and a few others were open for registration. In that situation people would say that co.uk is the TLD, which is technically wrong but widely understood. What they really mean is the "effective TLD" or the "public suffix".
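The strict, technical reading of "TLD" is the only part you can get from the string alone. A minimal sketch (the function name top_level_label is just an illustrative choice, and it assumes the host has already been extracted from the URL):

```python
def top_level_label(host):
    """Return the rightmost label of a hostname: the TLD in the strict sense."""
    # A fully qualified name may end with a trailing dot; drop it first.
    labels = host.rstrip(".").lower().split(".")
    return labels[-1]


print(top_level_label("www.nominet.co.uk"))  # uk
```

This says nothing about where registrations actually happen; for that you need the effective TLD, which cannot be derived from the string.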

See all these valid hostnames (they work inside URLs):

  • dk
  • www.sante.gouv.fr
  • www.com.com
  • www.nominet.co.uk
  • www.uk.com
  • www.walton.k12.fl.us
  • lagazettedesancetres.blogspot.fr
  • www.al.ma.leg.br
  • ab.m.wikibooks.nom.nu
  • 1512f1.станок.спб.рус
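To show what a suffix-list lookup looks like for a few of these names, here is a rough sketch. It makes several assumptions: SAMPLE_SUFFIXES is a tiny hand-picked set standing in for the real Public Suffix List, split_host is a made-up helper name, and the wildcard and exception rules of the real list are ignored. It returns the longest matching suffix, plus that suffix with one more label to its left (what the URL spec calls the registrable domain).

```python
# Hand-picked sample for illustration only (an assumption of this sketch).
# The real Public Suffix List is far larger and also contains wildcard
# and exception rules, which are not handled here.
SAMPLE_SUFFIXES = {"uk", "co.uk", "fr", "gouv.fr", "com", "dk"}


def split_host(host, suffixes=SAMPLE_SUFFIXES):
    """Return (public_suffix, registrable_domain) for a hostname.

    registrable_domain is None when the host is itself a public suffix.
    """
    labels = host.rstrip(".").lower().split(".")
    # Try the longest candidate first so the most specific suffix wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return candidate, None
            return candidate, ".".join(labels[i - 1:])
    # No rule matched: fall back to treating the rightmost label as the suffix.
    suffix = labels[-1]
    registrable = ".".join(labels[-2:]) if len(labels) > 1 else None
    return suffix, registrable


for name in ("dk", "www.sante.gouv.fr", "www.com.com", "www.nominet.co.uk"):
    print(name, split_host(name))
```

With this sample set, www.nominet.co.uk splits at co.uk rather than uk, and www.com.com splits at com, which is exactly the kind of answer a plain "split on dots and count" approach gets wrong. In real code you would load the published list (or use a maintained library) rather than a hardcoded set.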

BTW, here is some useful terminology that I suggest using, as it is often clearer than domain/subdomain, taken from https://url.spec.whatwg.org/#host-miscellaneous:

"A host’s public suffix is the portion of a host which is included on the Public Suffix List."

"A host’s registrable domain is a domain formed by the most specific public suffix, along with the domain label immediately preceding it, if any."