How bad is IPv4 address exhaustion really?

For years the press has been writing about the problem that very few IPv4 addresses remain available. But on the other hand, I'm using a server hosting company that gladly hands out public IPv4 addresses for a small amount of money, and my private internet connection comes with a public IPv4 address.

How is that possible? Is the problem as bad as the press wants us to believe?


Solution 1:

It's very bad. Here are examples of things I have first-hand experience with consumer ISPs doing to fight the shortage of IPv4 addresses:

  • Repeatedly shuffling around IPv4 blocks between cities causing brief outages and connection resets for customers.
  • Shortening DHCP lease times from days to minutes.
  • Allowing users to choose whether they want network address translation (NAT) on the Customer Premises Equipment (CPE), then retroactively turning it on for everybody anyway.
  • Enabling NAT on the CPE for customers who had already used the opportunity to opt out of NAT.
  • Reducing the cap on the number of concurrently active media access control (MAC) addresses enforced by the CPE.
  • Deploying carrier-grade NAT (CGN) for customers who had a real IP address when they signed up for the service.

All of these reduce the quality of the product the ISP is selling to its customers. The only sensible explanation for why an ISP would do this to its customers is a shortage of IPv4 addresses.

The shortage of IPv4 addresses has led to fragmentation of the address space, which has multiple shortcomings:

  • Administrative overhead which not only costs time and money, but is also error-prone and has led to outages.
  • Heavy use of content-addressable memory (CAM) capacity on backbone routers, which a few years back led to a significant outage across multiple ISPs when it crossed the limit of a particularly popular router model.

Without NAT there is no way we could get by today with the roughly 3700 million routable IPv4 addresses. But NAT is a brittle solution which gives you less reliable connectivity and problems that are difficult to debug. The more layers of NAT, the worse it gets. Two decades of hard work have made a single layer of NAT mostly work, but we are already past the point where a single layer of NAT was sufficient to work around the shortage of IPv4 addresses.
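
(As a rough sanity check on that 3700 million figure: of the 2^32 possible IPv4 addresses, the large reserved and special-purpose blocks are not routable on the public internet. A quick back-of-the-envelope calculation, ignoring the handful of smaller special /24s, gives roughly the same number.)

    import ipaddress

    TOTAL = 2 ** 32  # every possible IPv4 address

    # Largest special-purpose blocks that are not routable on the public
    # internet (smaller reserved /24s are ignored for this rough estimate).
    reserved = [
        "0.0.0.0/8",       # "this network"
        "10.0.0.0/8",      # private
        "100.64.0.0/10",   # shared address space for CGN
        "127.0.0.0/8",     # loopback
        "169.254.0.0/16",  # link-local
        "172.16.0.0/12",   # private
        "192.168.0.0/16",  # private
        "198.18.0.0/15",   # benchmarking
        "224.0.0.0/4",     # multicast
        "240.0.0.0/4",     # reserved for future use
    ]

    unusable = sum(ipaddress.ip_network(b).num_addresses for b in reserved)
    print(f"about {(TOTAL - unusable) / 1e6:.0f} million routable addresses")
    # prints "about 3702 million routable addresses"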

Solution 2:

Before we started to run out of IPv4 addresses, we didn't (widely) use NAT. Every internet-connected computer would have its own globally unique address. When NAT was first introduced, it was to move from giving an ISP's customer one real address per device the customer used/owned to giving each customer one real address. That fixed the problem for a while (years) while we were supposed to be switching to IPv6. Instead of switching to IPv6, (mostly) everybody waited for everybody else to switch, and so (mostly) nobody rolled out IPv6. Now we are hitting the same problem again, but this time a second layer of NAT is being deployed (CGN) so that ISPs can share one real address between multiple customers.

IP address exhaustion is not a big deal if NAT is not terrible, including in the case where the end user has no control over it (Carrier Grade NAT or CGN).

But I would argue that NAT is terrible, especially in the case where the end user does not have control over it. And (as a person whose job is network engineering/administration but who has a software engineering degree) I would argue that by deploying NAT instead of IPv6, network administrators have shifted the weight of solving address exhaustion out of their field and onto end users and application developers.

So (in my opinion), why is NAT a terrible, evil thing that should be avoided?

Let's see if I can do it justice in explaining what it breaks (and what issues it causes that we've become so accustomed to that we don't even realize things could be better):

  • Network layer independence
  • Peer to peer connections
  • Consistent naming and location of resources
  • Optimal routing of traffic, hosts knowing their real address
  • Tracking the source of malicious traffic
  • Network protocols that separate data and control into separate connections

Let's see if I can explain each of those items.

Network layer independence

ISPs are supposed to just pass around layer 3 packets and not care what is in the layers above that. Whether you are passing around TCP, UDP, or something better/more exotic (SCTP maybe? or even some other protocol that is better than TCP/UDP, but is obscure because of a lack of NAT support), your ISP is not supposed to care; it's all supposed to just look like data to them.

But it doesn't all look like data to them -- not when they are implementing the "second wave" of NAT, "carrier-grade" NAT. Then they necessarily have to look at, and support, the layer 4 protocols you want to use. Right now, that practically means you can only use TCP and UDP. Other protocols would either just be blocked/dropped (the vast majority of cases in my experience) or just forwarded to the last host "inside" the NAT that used that protocol (I've seen one implementation that does this). Even forwarding to the last host that used that protocol isn't a real fix -- as soon as two hosts use it, it breaks.
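
To see why, here is a minimal sketch of the translation state a NAT keeps, in Python with made-up addresses and ports (not any real implementation): the mapping is keyed on TCP/UDP port numbers, so a layer 4 protocol without ports gives the NAT nothing to multiplex on.

    # Minimal sketch of NAT translation state, with made-up addresses and
    # ports. Not a real implementation; just enough to show why the NAT has
    # to understand the layer 4 protocol it is rewriting.
    PUBLIC_IP = "198.51.100.7"   # the NAT's single public address
    next_port = 40000            # next free public source port
    mappings = {}                # (proto, public port) -> (inside ip, inside port)

    def rewrite_outgoing(proto, src_ip, src_port):
        """Rewrite an outgoing packet's source to the NAT's public address."""
        global next_port
        if proto not in ("tcp", "udp"):
            # No port numbers means nothing to multiplex on, so a typical CGN
            # drops the packet (or forwards replies to whichever inside host
            # used that protocol last).
            raise ValueError(f"cannot NAT layer 4 protocol {proto!r}")
        public_port = next_port
        next_port += 1
        mappings[(proto, public_port)] = (src_ip, src_port)
        return PUBLIC_IP, public_port

    def rewrite_incoming(proto, dst_port):
        """Map a reply arriving at the public address back to the inside host."""
        return mappings[(proto, dst_port)]

    # Two inside hosts can share one public address only because their flows
    # can be told apart by TCP/UDP port numbers:
    print(rewrite_outgoing("tcp", "192.168.0.23", 49152))  # ('198.51.100.7', 40000)
    print(rewrite_outgoing("udp", "192.168.0.42", 51000))  # ('198.51.100.7', 40001)

    try:
        rewrite_outgoing("sctp", "192.168.0.23", 5000)
    except ValueError as err:
        print(err)                                         # cannot NAT layer 4 protocol 'sctp'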

I imagine there are some replacement protocols for TCP & UDP out there that are currently untested and unused just because of this issue. Don't get me wrong, TCP & UDP were impressively well designed and it is amazing how both of them have been able to scale up to the way we use the internet today. But who knows what we've missed out on? I've read about SCTP and it sounds good, but I have never used it because NAT made it impractical.

Peer to Peer connections

This is a big one. Actually, the biggest in my opinion. If you have two end users, both behind their own NAT, then no matter which one tries to connect first, the other user's NAT sees an incoming packet that matches no existing mapping, drops it, and the connection does not succeed.

This affects games, voice/video chat (like Skype), hosting your own servers, etc.

There are workarounds. The problem is that those workarounds cost developer time, end user time and inconvenience, or money for service infrastructure. And they aren't foolproof and sometimes break. (See other users' comments about the outage suffered by Skype.)

One workaround is port forwarding, where you program the NAT device to forward a specific incoming port to a specific computer behind the NAT device. There are entire websites devoted to how to do this for all the different NAT devices there are out there. See https://portforward.com/. This typically costs end user time and frustration.

Another workaround is to add support for things like hole punching to applications, and to maintain server infrastructure that is not behind a NAT to introduce two NATed clients to each other. This usually costs development time, and puts developers in the position of potentially maintaining server infrastructure where none would previously have been required.
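
For the curious, here is a very rough sketch of the client side of UDP hole punching. The rendezvous server address and the tiny "register"/"peer" message format are invented for illustration; real systems use STUN/ICE and handle many more corner cases (symmetric NATs, keepalives, retries).

    # Very rough sketch of the client side of UDP hole punching. The
    # rendezvous server address and the "register"/"peer" messages are
    # invented for illustration only.
    import socket

    RENDEZVOUS = ("203.0.113.10", 3478)  # hypothetical server with a public address

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 0))

    # 1. Register with the rendezvous server. This outgoing packet creates a
    #    mapping in our own NAT, and the server records our public ip:port.
    sock.sendto(b"register alice", RENDEZVOUS)

    # 2. The server tells us the public ip:port it observed for our peer,
    #    e.g. b"peer 198.51.100.7:40001".
    data, _ = sock.recvfrom(1024)
    host, port = data.split()[1].decode().split(":")
    peer = (host, int(port))

    # 3. Both peers start sending to each other's public address at roughly
    #    the same time. Each outgoing packet punches a hole in its own NAT,
    #    so the peer's packets then look like replies to an existing mapping
    #    and are let through.
    for _ in range(5):
        sock.sendto(b"hello", peer)

    data, addr = sock.recvfrom(1024)
    print("direct path established with", addr)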

(Remember what I said about deploying NAT instead of IPv6 shifting the weight of the issue from network administrators to end users and application developers?)

Consistent naming/location of network resources

Because a different address space is used on the inside of a NAT than on the outside, any service offered by a device inside a NAT has multiple addresses it can be reached by, and the correct one to use depends on where the client is accessing it from. (This is still a problem even after you get port forwarding working.)

If you have a web server inside a NAT, say on 192.168.0.23 port 80, and your NAT device (router/gateway) has an external address of 35.72.216.228, and you set up port forwarding for TCP port 80, then your web server can be reached as either 192.168.0.23 port 80 OR 35.72.216.228 port 80. The one you should use depends on whether you are inside or outside the NAT. If you are outside the NAT and use the 192.168.0.23 address, you will not get where you are expecting. If you are inside the NAT and use the external address 35.72.216.228, you might get where you want to, if your NAT implementation is an advanced one that supports hairpin, but then the web server serving your request will see the request as coming from your NAT device. This means that all traffic must go through the NAT device, even if there is a shorter path in the network behind the NAT, and it means that logs on the web server become much less useful because they all list the NAT device as the source of the connection. If your NAT implementation doesn't support hairpin, then you will not get where you were expecting to go.

And this problem gets worse as soon as you use DNS. Suddenly, if you want everything to work properly for something hosted behind NAT, you have to give different answers for the address of the service, based on who is asking (AKA split horizon DNS, IIRC). Yuck.
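
The core decision a split horizon setup has to make is simple enough to sketch (using the made-up addresses from the web server example above), but it is one more piece of configuration that exists, and has to be kept in sync, purely because of the NAT:

    # Sketch of the decision a split horizon DNS setup has to make for a
    # service hosted behind NAT. Addresses are the made-up ones from the
    # web server example above.
    import ipaddress

    INTERNAL_NET = ipaddress.ip_network("192.168.0.0/24")
    INTERNAL_ADDR = "192.168.0.23"   # the web server's address inside the NAT
    EXTERNAL_ADDR = "35.72.216.228"  # the NAT device's public address

    def answer_for(client_ip: str) -> str:
        """Return the address a given client should use to reach the server."""
        if ipaddress.ip_address(client_ip) in INTERNAL_NET:
            return INTERNAL_ADDR     # inside clients should take the direct path
        return EXTERNAL_ADDR         # outside clients can only reach the public side

    print(answer_for("192.168.0.99"))  # 192.168.0.23
    print(answer_for("8.8.8.8"))       # 35.72.216.228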

And that is all assuming you have someone knowledgeable about port forwarding and hairpin NAT and split horizon DNS. What about end users? What are their chances of getting all of this set up correctly when they buy a consumer router and an IP security camera and want it to "just work"?

And that leads me to:

Optimal routing of traffic, hosts knowing their real address

As we have seen, even with advanced hairpin NAT, traffic doesn't always flow through the optimal path. That is true even in the case where a knowledgeable administrator sets up a server and has hairpin NAT. (Granted, split horizon DNS can lead to optimal routing of internal traffic in the hands of a network administrator.)

What happens when an application developer creates a program like Dropbox and distributes it to end users who don't specialize in configuring network equipment? Specifically, what happens when I put a 4 GB file in my shared folder and then try to access it on the next computer over? Does it transfer directly between the machines, or do I have to wait for it to upload to a cloud server through a slow WAN connection, and then wait a second time for it to download through the same slow WAN connection?

For a naive implementation, it would be uploaded and then downloaded, using Dropbox's server infrastructure (which is not behind a NAT) as a mediator. But if the two machines could only realize that they are on the same network, they could transfer the file directly and much faster. So for our first less-naive implementation, we might ask the OS what IP(v4) addresses the machine has, and then check those against the addresses of other machines registered on the same Dropbox account. If one is in the same range as us, just transfer the file directly. That might work in a lot of cases. But even then there is a problem: NAT only works because we can re-use addresses. So what if the 192.168.0.23 address and the 192.168.0.42 address registered on the same Dropbox account are actually on different networks (like your home network and your work network)? Now you have to fall back to using the Dropbox server infrastructure to mediate. (In the end, Dropbox tried to solve the problem by having each Dropbox client broadcast on the local network in the hope of finding other clients. But those broadcasts do not cross any routers you might have behind the NAT, meaning it is not a full solution, especially in the case of CGN.)
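
A sketch of that less-naive check, and of the way it goes wrong, might look something like this (addresses are the made-up ones from above):

    # Sketch of the "are we on the same network?" heuristic described above,
    # and of why it can give a false positive. Addresses are made up.
    import ipaddress

    def maybe_same_lan(my_ip: str, peer_ip: str, prefix: int = 24) -> bool:
        """Guess whether two clients share a LAN by comparing address ranges."""
        my_net = ipaddress.ip_interface(f"{my_ip}/{prefix}").network
        return ipaddress.ip_address(peer_ip) in my_net

    # Looks like the same LAN...
    print(maybe_same_lan("192.168.0.23", "192.168.0.42"))  # True

    # ...but because private ranges are reused behind every NAT, 192.168.0.42
    # could just as easily be a machine on a completely different network
    # (home vs. work), in which case a direct transfer attempt fails or, worse,
    # reaches some unrelated host. The only safe fallback is the mediator.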

Static IPs

Additionally, since the first shortage (and first wave of NAT) happened when many consumer connections were not always-on connections (think dialup), ISPs could make better use of their addresses by only allocating a public/external IP address while you were actually connected. That meant that when you connected, you got whatever address was available, instead of always getting the same one. That makes running your own server that much harder, and it makes developing peer-to-peer applications harder because they need to deal with peers moving around instead of sitting at fixed addresses.

Obfuscation of the source of malicious traffic

Because NAT re-writes outgoing connections so they appear to come from the NAT device itself, all of the behavior, good or bad, is rolled into one external IP address. I have not seen any NAT device that logs each outgoing connection by default. This means that, by default, the source of past malicious traffic can only be traced back to the NAT device it went through. While more enterprise- or carrier-class equipment can be configured to log each outgoing connection, I have not seen any consumer routers that do it. I certainly think it will be interesting to see whether (and for how long) ISPs keep a log of all TCP and UDP connections made through CGNs as they roll them out. Such records would be needed to deal with abuse complaints and DMCA complaints.

Some people think that NAT increases security. If it does, it does so through obscurity. The default drop of incoming traffic that NAT makes mandatory is the same as having a stateful firewall. It is my understanding that any hardware capable of doing the connection tracking needed for NAT should be able to run a stateful firewall, so NAT doesn't really deserve any points there.

Protocols that use a second connection

Protocols like FTP and SIP (VoIP) tend to use separate connections for control and actual data content. Each protocol that does this must have helper software called an ALG (application layer gateway) on each NAT device it passes through, or work around the issue with some kind of mediator or hole punching. In my experience, ALGs are rarely if ever updated and have been the cause of at least a couple of issues I have dealt with involving SIP. Any time I hear someone report that VoIP didn't work for them because audio only worked one way, I instantly suspect that somewhere, there is a NAT gateway dropping UDP packets it can't figure out what to do with.
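
FTP's PORT command is the classic example: the client's address and data port ride inside the application payload, which a plain NAT never touches, so an ALG has to parse and rewrite it. A small illustration (addresses are made up):

    # Illustration of why FTP needs an ALG: the client's address and data
    # port are carried inside the application payload, which a NAT that only
    # rewrites IP and TCP headers never touches. Addresses are made up.

    def parse_port_command(line: str):
        """Parse an FTP 'PORT h1,h2,h3,h4,p1,p2' command into (ip, port)."""
        h1, h2, h3, h4, p1, p2 = (int(x) for x in line.split()[1].split(","))
        return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2

    # A client behind NAT announces its *internal* address for the data channel:
    print(parse_port_command("PORT 192,168,0,23,195,80"))  # ('192.168.0.23', 50000)

    # The server would try to connect to 192.168.0.23:50000, which is
    # unreachable from outside the NAT. An ALG has to spot the PORT command,
    # rewrite it to the NAT's public address, and create a matching forwarding
    # entry on the fly -- and every NAT on the path has to get this right.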

In summary, NAT tends to break:

  • alternative protocols to TCP or UDP
  • peer-to-peer systems
  • accessing something hosted behind the NAT
  • things like SIP and FTP; the ALGs meant to work around this still cause random and weird problems today, especially with SIP.

At its core, the layered approach of the network stack is relatively simple and elegant. Try to explain it to someone new to networking, though, and they inevitably assume their home network is a good, simple network to try to understand first. In a couple of cases I've seen this lead to some pretty interesting (read: excessively complicated) ideas about how routing works, born of the confusion between external and internal addresses.

I suspect that without NAT, VoIP would be ubiquitous and integrated with the PSTN, and that making calls from a cell phone or computer would be free (beyond the internet access you already pay for). After all, why would I pay for phone service when you and I can just open a 64 kbit/s VoIP stream that works just as well as the PSTN? It seems like today the number one obstacle to deploying VoIP is getting through NAT devices.

I suspect we don't usually realize how much simpler many things could be if we had the end-to-end connectivity that NAT broke. People still email (or Dropbox) themselves files because of the core problem of needing a mediator whenever two clients are both behind NAT.

Solution 3:

One big symptom of IPv4 exhaustion I didn't see mentioned in other answers is that some mobile service providers started going IPv6-only several years ago. There's a chance you've been using IPv6 for years and didn't even know it. Mobile providers are newer to the Internet game, and don't necessarily have huge pre-existing IPv4 allocations to draw from. They also require more addresses than cable/DSL/fiber, because your phone can't share a public IP address with other members of your household.
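
If you're curious whether that applies to you, one quick check (this is just a sketch; the hostname is only an example and any dual-stack site will do) is to ask for an IPv6 address and try to connect to it:

    # Quick check for working IPv6 connectivity. The hostname is only an
    # example; any site that publishes an AAAA record will do.
    import socket

    try:
        family, socktype, proto, _, sockaddr = socket.getaddrinfo(
            "www.google.com", 443, socket.AF_INET6, socket.SOCK_STREAM)[0]
        with socket.socket(family, socktype, proto) as s:
            s.settimeout(5)
            s.connect(sockaddr)
        print("IPv6 works; connected to", sockaddr[0])
    except OSError as err:   # socket.gaierror is a subclass of OSError
        print("No usable IPv6 connectivity:", err)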

My guess is IaaS and PaaS providers will be next, since their growth isn't tied to customers' physical locations. I wouldn't be surprised to see IaaS providers offering IPv6-only service at a discount soon.

Solution 4:

The major RIRs ran out of space for normal allocations a while ago. For most providers, therefore, the only sources of IPv4 addresses are their own stockpiles and the transfer markets.

There are scenarios in which it is preferable to have a dedicated public IPv4 address, but it's not absolutely essential. There are also a bunch of public IPv4 addresses that are allocated but not currently in use on the public internet (they may be in use on private networks, or not in use at all). Finally, there are older networks with far more address space allocated than they actually need.

The three largest RIRs now allow addresses to be sold both between their own members and to each other's members. So we have a market: on one side, organizations that have addresses they are not using, or addresses that could be freed up at some cost; on the other, organizations that really need more IP addresses.

What is difficult to predict is how much supply and demand there will be at each price point, and therefore what the market price will do in the future. So far the price per IP seems to have remained surprisingly low.