Is a load balancer used once per new client or all the time?
Solution 1:
There are different kinds of load-balancing. You can have multiple public IPs, which are visible in DNS records. Each of those IPs could point directly to a server. The client then chooses among them, and the client can fail over between them. You should not rely too heavily on fail-over between servers if you leave that up to the client.
You can adjust the above scenario by not handing out all your public IPs in all DNS requests. There are multiple reasons for not handing out all your public IPs:
- There may just be so many that the DNS reply would become too large.
- You may want more control over where load goes.
- You may want to direct users to servers that are geographically closer to them.
- You may want to stop telling clients about public IPs that are currently out of service.
The above methods are generally known as DNS based load-balancing.
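As a rough sketch of the client side of DNS-based load-balancing: the resolver hands the client a list of A records, the client picks one (spreading load across clients), and falls back to the others if its first pick is unreachable. The IP addresses and the `is_reachable` check are hypothetical stand-ins; a real client would use an actual DNS lookup and a real connection attempt.

```python
import random

# Hypothetical A records returned for one hostname; real values would
# come from an actual DNS lookup (e.g. socket.getaddrinfo).
A_RECORDS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

def pick_server(records, is_reachable):
    """Client-side selection with fail-over: shuffle the records the
    resolver handed out, then try them in order until one answers."""
    candidates = list(records)
    random.shuffle(candidates)  # crude load spreading across clients
    for ip in candidates:
        if is_reachable(ip):
            return ip
    raise ConnectionError("no server reachable")
```

Note that this is exactly the fail-over behaviour the answer warns about relying on: it only works as well as each client's willingness to retry.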
At the next layer in the chain, your public IPs could be virtual IPs, which can be migrated between different hardware units. Each virtual IP can only be routed to one piece of hardware at a time, so it would not make sense to have many more boxes at this layer than you have public IP addresses.
Such virtual IPs are more often used for availability; they are not very flexible as a load-balancing solution.
At the next layer you can have a conventional load-balancer. The load-balancer receives the requests from the clients and forwards them to a pool of servers. All the traffic from the client to the server has to go through the load-balancer, but the processing the load-balancer needs to perform can be extremely light.
A load-balancer at this layer can operate in two different modes: conventional proxy mode, in which one TCP connection is used between client and load-balancer and another TCP connection is used between load-balancer and server, or DSR mode, in which TCP connections are terminated on the servers behind the load-balancer.
In proxy mode the load-balancer not only has to handle all the packets from the clients, it also has to handle all the packets from the servers back to the clients. And the load-balancer needs a full TCP stack with buffering and retransmissions.
In DSR mode the load-balancer only needs simple connection tracking for each connection from clients. This reduces memory usage on the load-balancer significantly. It also means that packets from the server to the client do not have to go through the load-balancer; they are sent directly to the client (obviously passing through routers on the way). This property is the reason this mode is called Direct Server Return.
The drawback of DSR mode is that the network configuration is a bit more complicated. The packets from the load-balancer to the server cannot rely on ordinary routing alone. Since the load-balancer does not rewrite the destination IP of packets from client to server, it has to manipulate the destination address at a lower protocol layer to route the packet to the proper server, or insert a tunneling layer in order to have a layer on which to put such a destination address.
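The traffic difference between the two modes can be made concrete with a little accounting. This is a simplified model (one request/response exchange, payload bytes only, ignoring TCP overhead), but it shows why DSR matters when responses are much larger than requests:

```python
def bytes_through_lb(request_bytes, response_bytes, mode):
    """Count how much traffic crosses the load-balancer for one
    request/response exchange, in each of the two modes."""
    if mode == "proxy":
        # Both directions traverse the load-balancer.
        return request_bytes + response_bytes
    if mode == "dsr":
        # Only client->server packets cross the load-balancer; the
        # (typically much larger) response goes directly to the client.
        return request_bytes
    raise ValueError(f"unknown mode: {mode}")
```

For a 500-byte request with a 1 MB response, the proxy-mode load-balancer carries roughly 2000 times more traffic than the DSR-mode one.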
All of the above methods can be layered in front of each other. That's how you can build a site scaling to hundreds of millions of users.
Solution 2:
> so the LB receives only 0.001% of the total traffic?
No. All the traffic will flow through the load balancer. The load balancer will then pass the traffic through to the actual destination server.
Yes, the load balancer is a single point of failure (load balancing is not HA). If you need HA, get two load balancers.
Solution 3:
A load balancer usually works much like a reverse proxy.
The client initiates the connection with the load balancer and sends its request to it.
The load balancer takes the request and passes it on to the backend node and waits for the reply.
It gets the reply and forwards it to the original client.
So yes, the full traffic will flow over it, both requests and responses.
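The request/response flow described above can be sketched in a few lines. The backend function here is a hypothetical stand-in; a real reverse proxy would open a network connection to the backend node instead of calling a function.

```python
def backend_handle(request):
    # Hypothetical backend node: produce some reply for the request.
    return "reply to " + request

def reverse_proxy(request, backend=backend_handle):
    """The steps above: take the client's request, pass it on to the
    backend node, wait for the reply, and forward it back."""
    reply = backend(request)  # forward to backend and wait
    return reply              # hand the reply back to the client
```

The key point survives even in this toy form: both the request and the reply pass through the proxy function, which is why the full traffic flows over the load balancer.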