How discoverable are IPv6 addresses and AAAA names by potential attackers?
It is fairly standard to receive a significant number of minor hacking attempts each day trying common username/password combinations for services like SSH and SMTP. I've always assumed these attempts use the "small" address space of IPv4 to guess IP addresses. I notice that I get zero hacking attempts on IPv6, despite my domain having AAAA records mirroring every A record and all IPv4 services also being open to IPv6.
Assuming a public DNS service (AWS Route 53) with an obscure subdomain pointing to an address with a reasonably randomised 64-bit suffix: are IPv6 addresses and/or subdomains remotely discoverable without trying every address in a /64 prefix or every subdomain in a very long list of common names?
I am of course aware that crawling the web looking for listed (sub)domain names is simple enough. I'm also aware that machines on the same subnet can use NDP. I'm more interested in whether DNS or the underlying protocols of IPv6 allow discovery / listing unknown domains and addresses by remote.
Solution 1:
Malicious bots don't guess IPv4 addresses anymore. They simply try them all. On modern systems this can take as little as a few hours.
With IPv6, this is not really possible any longer, as you've surmised. The address space is so much larger that it's not even possible to brute-force scan a single /64 subnet within a human lifetime.
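To put rough numbers on that, here is a back-of-the-envelope sketch. The probe rate of one million packets per second is purely an assumption for illustration, not a measurement of any real scanner.

```python
# Rough scan-time arithmetic. ASSUMED_RATE is a hypothetical figure.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
ASSUMED_RATE = 1_000_000  # probes per second (assumption)


def scan_seconds(address_count: int, probes_per_second: int) -> float:
    """Time to send one probe to every address at a fixed rate."""
    return address_count / probes_per_second


# The whole IPv4 address space is 2^32 addresses.
ipv4_hours = scan_seconds(2 ** 32, ASSUMED_RATE) / 3600

# A single IPv6 /64 subnet is 2^64 addresses.
subnet_years = scan_seconds(2 ** 64, ASSUMED_RATE) / SECONDS_PER_YEAR

print(f"All of IPv4: {ipv4_hours:.1f} hours")
print(f"One IPv6 /64: {subnet_years:,.0f} years")
```

At that assumed rate, all of IPv4 takes a little over an hour, while a single /64 takes over half a million years.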
Bots will have to get more creative if they are to continue blind scanning on IPv6 as on IPv4, and malicious bot operators will have to get accustomed to waiting far longer between finding any machines, let alone vulnerable ones.
Fortunately for the bad guys and unfortunately for everyone else, IPv6 adoption has gone much more slowly than it really should have. IPv6 is 23 years old but has only seen significant adoption in the last five years or so. But everyone is keeping their IPv4 networks active, and extremely few hosts are IPv6-only, so malicious bot operators have had little incentive to make the switch. They probably won't do until there is a significant abandonment of IPv4, which probably won't happen in the next five years.
I expect that blind guessing probably won't be productive for malicious bots, when they finally do move to IPv6, so they'll have to move to other means, like brute-forcing DNS names, or targeted brute-forcing of small subsets of each subnet.
For instance, a common DHCPv6 server configuration gives out addresses in ::100 through ::1ff by default. That's just 256 addresses to try, out of a whole /64. Reconfiguring the DHCPv6 server to pick addresses from a much larger range mitigates this problem.
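As a small illustration of how little work that is, the following sketch lists every candidate in such a pool. The prefix 2001:db8:1:2::/64 is a documentation prefix chosen for the example, and the function name is made up here.

```python
import ipaddress


def dhcpv6_pool_candidates(prefix: str, first: int = 0x100, last: int = 0x1FF):
    """List the addresses in a /64 whose host part lies in [first, last]."""
    network = ipaddress.IPv6Network(prefix)
    # Indexing a network returns the address at that offset from its base.
    return [network[host] for host in range(first, last + 1)]


# Example prefix only (reserved for documentation).
candidates = dhcpv6_pool_candidates("2001:db8:1:2::/64")
print(len(candidates), candidates[0], candidates[-1])
# prints: 256 2001:db8:1:2::100 2001:db8:1:2::1ff
```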
And using modified EUI-64 addresses for SLAAC reduces the search space to 2^24 multiplied by the number of assigned OUIs. While this is over 100 billion addresses, it's far less than 2^64. Random bots won't bother to search this space, but state-level malicious actors will, for targeted attacks, especially if they can make educated guesses as to which NICs might be in use, to reduce the search space further. Using RFC 7217 stable privacy addresses for SLAAC is easy (at least on modern operating systems that support it) and mitigates this risk.
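To make the EUI-64 point concrete, here is a minimal sketch of how a modified EUI-64 address is derived from a MAC address, and why the search space is tied to the OUI. The prefix and MAC are example values, and ASSUMED_OUI_COUNT is a placeholder rather than the real size of the IEEE registry.

```python
import ipaddress

ASSUMED_OUI_COUNT = 10_000  # placeholder, NOT the real registry size


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Build the modified EUI-64 SLAAC address for a MAC within a /64."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the OUI (first 3 bytes) and the NIC-specific part.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    network = ipaddress.IPv6Network(prefix)
    return network[int.from_bytes(interface_id, "big")]


# Example values only.
print(eui64_address("2001:db8:1:2::/64", "00:11:22:33:44:55"))
# prints: 2001:db8:1:2:211:22ff:fe33:4455

# Only the low 24 bits vary per OUI, so each OUI adds 2^24 candidates.
per_oui = 2 ** 24
print(per_oui, per_oui * ASSUMED_OUI_COUNT)
```

An attacker who can guess the NIC vendor only has to cover the 2^24 addresses for each of that vendor's OUIs.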
RFC 7707 describes several other ways in which reconnaissance might be performed in IPv6 networks to locate IPv6 addresses, and how to mitigate against those threats.
Solution 2:
I've found that MANY bots these days are not guessing, with IPv4 or IPv6. Security through obscurity is not security at all. Obscurity simply delays / reduces the number of attacks for a while, and then it is irrelevant.
Hackers know your company's domain name from your website or email address, and what public server IPs you publish for things like email, SPF, web servers, etc. It may take them a bit longer to learn a random server name, but they will guess the common names, like www, mail, smtp, imap, pop, pop3, ns1, etc., and then scrape your website for any additional data they can find. They will retrieve your DNS names, IPs and what ports to focus on from their store of previous scans. They will also retrieve a list of email address / password pairs from any data breaches they can find and try all of those logins, plus some extra ones, against whatever systems they think you are running on your ports. They even go to the extent of learning the names and job roles of your staff to try to execute a social engineering attack. Our spam filter is continuously bombarded with attempts by scammers claiming to be someone from management needing an urgent wire transfer of funds. Oh, they also learn who your business partners are and claim to be them, letting you know their bank details have changed. Sometimes they even know what cloud platforms your business partners are using for their invoicing.
Criminals have access to big data tools just the same as everyone else, and they have amassed a surprisingly huge amount of data. See this testimony by some IT professionals to the US Congress: https://www.troyhunt.com/heres-what-im-telling-us-congress-about-data-breaches/
Talking about data breaches, if a company loses something even as seemingly useless as a web server log, it will contain the IP addresses (v4 or v6) of everyone who used that server at that time, and what pages they accessed.
In conclusion, none of those methods requires an attacker to guess what IP you are using; they already know it.
Edit: As a bit of an exercise I spent all of 2 minutes browsing your site (from your profile), trying one of the online scan tools linked elsewhere here, and a bit of a look with nslookup and found out a few things about you. I'm guessing that one of the obscure addresses you are talking about involves
- a planet name similar to one of the ones you publish
- freeddns
- and an IPv6 address that ends with 2e85:eb7a
- and it runs ssh
I say this because most of your other published IPv6 addresses end with ::1. This is only from information that you publish publicly, plus one tiny guess. Is this the IP you wanted to hide?
Edit 2: On another quick look, I see you publish your email address on your website. Checking the https://haveibeenpwned.com/ site for which data breaches that address has been in and what data is out there on the black market, I see it's been in these breaches:
- Adobe: In October 2013. Compromised data: Email addresses, Password hints, Passwords, Usernames
- MyFitnessPal: In February 2018. Compromised data: Email addresses, IP addresses, Passwords, Usernames
- MySpace: In approximately 2008. Compromised data: Email addresses, Passwords, Usernames
- PHP Freaks: In October 2015. Compromised data: Dates of birth, Email addresses, IP addresses, Passwords, Usernames, Website activity
- QuinStreet: In approximately late 2015. Compromised data: Dates of birth, Email addresses, IP addresses, Passwords, Usernames, Website activity
Checking whether the username part of that email address is used at some other popular email providers, I see there is plenty more data. This would be another tiny guess that a bot could make. If some of it correlates with what is already known about you, then the bot can assume that it is all you; it doesn't have to be certain, reasonably likely is enough. Additional data appears in these breaches:
- Verifications.io: In February 2019. Compromised data: Dates of birth, Email addresses, Employers, Genders, Geographic locations, IP addresses, Job titles, Names, Phone numbers, Physical addresses
- River City Media Spam List: In January 2017. Compromised data: Email addresses, IP addresses, Names, Physical addresses
- Apollo (the sales engagement startup): In July 2018. Compromised data: Email addresses, Employers, Geographic locations, Job titles, Names, Phone numbers, Salutations, Social media profiles
- B2B USA Businesses: In mid-2017. Compromised data: Email addresses, Employers, Job titles, Names, Phone numbers, Physical addresses
- Bitly: In May 2014. Compromised data: Email addresses, Passwords, Usernames
- Collection #1 (unverified): In January 2019, a large collection of credential stuffing lists (combinations of email addresses and passwords used to hijack accounts on other services) was discovered being distributed on a popular hacking forum
- Dropbox: In mid-2012. Compromised data: Email addresses, Passwords
- Exploit.In (unverified): In late 2016, a huge list of email address and password pairs appeared in a "combo list" referred to as "Exploit.In"
- HauteLook: In mid-2018. Compromised data: Dates of birth, Email addresses, Genders, Geographic locations, Names, Passwords
- Pemiblanc (unverified): In April 2018, a credential stuffing list containing 111 million email addresses and passwords known as Pemiblanc was discovered on a French server
- ShareThis: In July 2018. Compromised data: Dates of birth, Email addresses, Names, Passwords
- Ticketfly: In May 2018. Compromised data: Email addresses, Names, Phone numbers, Physical addresses
While the bot is at it, it can check Facebook and see that one of the Facebook pages with your name has the same photo as on your website, and now it knows some more about you and your friends. Plus I'm guessing that the family member you list is your mother, who lists "your mother's maiden name". From Facebook it can also verify which LinkedIn profile is yours.
There is much more information online about us than people realise. Big data and machine learning analysis is real, it's here now and much of the data that has been posted or leaked online can be correlated and used. Which you should know, seeing as you list that you've done a Bachelor's degree in AI and computer science in 2003-2007. Things have come a long way since then, particularly with the advances that Google was publishing from towards the end of your degree onwards. People being people, most will only be looking to profit from you, with some using the data reasonably and legally, but others will use it any way they can.
My point with all of this is twofold: we publish more information than we think we do, and the whole point of DNS is to publish the conversion of names to IP addresses.
Solution 3:
Regarding AAAA records:
DNS is traditionally unencrypted. While there is a family of standards (DNSSEC) for signing DNS, the encryption of DNS records has had a far more haphazard deployment process, and so it is generally safest to assume that any MitM can read all of your DNS queries unless you have gone out of your way to configure encrypted DNS explicitly on the client side. You would know if you had done so because it's quite an ordeal.
(Also, your web browser is probably sending unencrypted SNI in the TLS handshake, after it has resolved the domain. It is not obvious how you would go about plugging this hole, since a VPN or Tor can still be MitM'd between the exit node or VPN termination point and the remote server. The good folks at Cloudflare are working on fixing this problem for good, but ESNI will also depend on client implementation, particularly for Chrome, if it's going to really get off the ground.)
However, MitM attacks may or may not be a problem, depending on your threat model. More important is the simple fact that DNS names are intended to be public information. Lots of people (search engines, DNS registrars, etc.) collect and publicize DNS names for entirely benign reasons. DNS resolvers typically apply rate limits, but these limits are usually quite generous, because they're meant to stop DoS attacks, not subdomain enumeration. Creating an HTTPS certificate often involves publishing the domain name for all to see, depending on the CA (Let's Encrypt does it, and so do many others). In practice, it is quite impossible to keep a domain or subdomain a secret, because just about everyone assumes they are public and makes no effort to hide them.
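The wordlist-style subdomain enumeration that generous rate limits do not prevent can be sketched roughly as follows. This is only an illustration: the function names are invented for this example, the label list is just a handful of commonly guessed names, and the demo uses a fake in-memory zone instead of real DNS so that it runs without network access.

```python
import socket
from typing import Callable, Dict, Iterable, List


def resolve_aaaa(name: str) -> List[str]:
    """Return the IPv6 addresses the system resolver gives for a name."""
    try:
        infos = socket.getaddrinfo(
            name, None, family=socket.AF_INET6, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []  # name does not resolve (or has no AAAA record)
    return sorted({info[4][0] for info in infos})


def enumerate_subdomains(
    domain: str,
    labels: Iterable[str],
    resolver: Callable[[str], List[str]] = resolve_aaaa,
) -> Dict[str, List[str]]:
    """Try each label under the domain and keep the names that resolve."""
    found = {}
    for label in labels:
        name = f"{label}.{domain}"
        addresses = resolver(name)
        if addresses:
            found[name] = addresses
    return found


# Demo against a made-up zone; no DNS queries are sent.
fake_zone = {
    "www.example.com": ["2001:db8::1"],
    "mail.example.com": ["2001:db8::25"],
}
result = enumerate_subdomains(
    "example.com",
    ["www", "mail", "smtp", "ns1"],
    resolver=lambda name: fake_zone.get(name, []),
)
print(result)
```

An obscure label that is not in anyone's wordlist survives this particular technique, which is why the other leaks described here (certificates, crawlers, published records) matter more in practice.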
So, to answer this question:
I'm more interested in whether DNS or the underlying protocols of IPv6 allow discovery / listing unknown domains and addresses by remote.
Technically, no, they don't. But that does not matter, because an enormous amount of higher-layer technology just assumes your DNS records are public, so public they will inevitably be.