What are SO_SNDBUF and SO_RCVBUF

Solution 1:

The "SO_" prefix is for "socket option", so yes, these are per-socket settings for the per-socket buffers. There are usually system-wide defaults and maximum values.

SO_RCVBUF is simpler to understand: it is the size of the buffer the kernel allocates to hold the data arriving into the given socket during the time between it arrives over the network and when it is read by the program that owns this socket. With TCP, if data arrives and you aren't reading it, the buffer will fill up, and the sender will be told to slow down (using TCP window adjustment mechanism). For UDP, once the buffer is full, new packets will just be discarded.

SO_SNDBUF, I think, only matters for TCP (in UDP, whatever you send goes directly out to the network). For TCP, you could fill the buffer either if the remote side isn't reading (so that remote buffer becomes full, then TCP communicates this fact to your kernel, and your kernel stops sending data, instead accumulating it in the local buffer until it fills up). Or it could fill up if there is a network problem, and the kernel isn't getting acknowledgements for the data it sends. It will then slow down sending data on the network until, eventually, the outgoing buffer fills up. If so, future write() calls to this socket by the application will block (or return EAGAIN if you've set the O_NONBLOCK option).

This all is best described in the Unix Network Programming book.

Solution 2:

What is their role (generally)?

Data, that you want to send over a socket, is copied to the send buffer of the socket, so your code doesn't have to wait (=block) until the data has really been sent out to the network. When the send call returns successfully, this only means that the data has been placed into the send buffer from where the protocol implementation will read it as soon as it is ready to send that data over the network.

Keep in mind that multiple sockets from multiple processes may all want to send data at the same time, yet at any time only one data packet can be send over a network line. While sending is in progress, all other senders have to wait and once the line is free, the implementation can only process one send request after another.

Data, that arrives from the network, is written to the receive buffer of the socket by the protocol implementation, where it will wait until your code is reading it from there. Otherwise all receiving would have to stop until your code has processed the incoming packet, yet your code may do other things while a packet arrives in the background and again, the interface is shared, so the system must avoid that other processes cannot receive their network data just because your process is refusing to process its own incoming data.

Are they per-socket buffers?

Yes. Every socket has its own set of buffers.

Is there a connection between Transport layer's buffers (the TCP buffer, for example) and these buffers?

I'm not sure what you mean by "TCP buffers" but if you are referring to the TCP receive and send windows, the answer is yes.

TCP will tell the other side regularly how much room is left in your receive buffer, so that the other side will never send more data than would fit into your receive buffer. If your receive buffer is full, the other side will stop sending completely until there's room again, which will be the case as soon as soon as you read some data from it.

So if you cannot read data as often as would be required to prevent your socket buffer from running full, increasing the receive buffer size can prevent that TCP connections will have to pause sending data.

On the other hand, if the send buffer is running full, the socket will not accept any more data from your code. Any attempt to send will either block or fail with an error (non-blocking socket) until there's room again.

And as TCP can only work with the data currently in the send buffer, the send buffer size also influences the sending behavior of TCP. The TCP sending strategy can depend on various factors. One of them is the amount of data that is known to be sent. If your send buffer is just 2 KB, then TCP will never see more than 2 KB for sending, even though your app may know, that much more data is going to follow. If your send buffer is 256 KB and you put 128 KB of data into it, TCP will know that it has to send 128 KB of data for this connection and this may (and most likely will) influence the sending strategy that TCP uses.

Do they have a different behaviour/role when using stream sockets (TCP) and when using connectionless sockets (UDP)?

Yes. That's because for TCP the data you send is just a stream of bytes. There is no relationship between the bytes and the packets being sent out. Sending 80 bytes could mean sending one packet with 80 bytes or sending 10 packets with each 8 bytes. TCP will decide that on its own. Same for incoming. If there are 200 bytes in your receive buffer, you cannot know how these got there, an amount of bytes you read from a TCP socket may have been transported using any number of packets. So despite transporting data in chunks over packet based networks, a TCP connection behaves like a serial line link.

UDP on the other hand sends datagrams. If you place 80 bytes into the send buffer of an UDP socket, then these 80 bytes are for sure sent out in a single UDP packet containing 80 bytes of payload data. Data is sent in exactly the same way you write it into the send buffer. If you write 80 bytes one after another, 80 packets are sent out, each containing one byte. If you tell a TCP socket to send 200 bytes but there are only 100 bytes room left in the send buffer, TCP will add 100 bytes to the buffer and let you know that 100 of your 200 bytes were added. UDP on the ohter hand will block or fail with an error, as either all 200 bytes fit or nothing fits; there is no partial fit with UDP.

Also when receiving, datagrams are stored in the UDP receive buffer, not bytes. If a TCP socket first receives 80 bytes data and then 200 bytes data, you can perform a read call that reads all 280 bytes at once. If an UDP socket first receives a datagram with 80 bytes and then a datagram with 200 bytes and you request to read 280 bytes from it, you get exactly 80 bytes, as all data returned by a read call must be from the same datagram. You cannot read across datagram borders. Also note that if you request to only read 20 bytes, you receive the first 20 bytes of the datagram and the other 60 bytes are discarded. Next time you read data, it will be from the next datagram whose size was 200 bytes.

So the difference in two sentences: TCP sockets store bytes in the socket buffers, UDP sockes store datagrams in the socket buffers. And datagrams must fit completely in the buffers, incoming datagrams that cannot fit completely into the socket buffer are silently discarded, even though the buffer had some room available.

Solution 3:

In Windows, the send buffer does have an effect in UDP. If you blast packets out faster than the network can transmit them, eventually you will fill the socket output buffer and SendTo will fail with "would block". Increasing SO_SNDBUF will help with this. I had to increase both the send and receive buffers for a test I was doing to find the maximum packet rate I could send between a Windows box and a Linux box. I could have also handled the send size by detecting the "would block" error code, sleeping a bit, and retrying. But pumping up the send buffer size was simpler. The default in Windows is 8K, which seems needlessly small in this era of PC's with GB's of RAM!