I just came home from my exam in network programming, and one of the questions they asked us was "If you are going to stream video, would you use TCP or UDP? Give an explanation for both stored video and live video streams". They simply expected a short answer of TCP for stored video and UDP for live video, but I thought about this on my way home: is it necessarily better to use UDP for streaming live video? I mean, if you have the bandwidth for it, and say you are streaming a soccer match, or a concert for that matter, do you really need to use UDP?

Let's say that while you are streaming this concert or whatever using TCP you start losing packets (something bad happened in some network between you and the sender), and for a whole minute you don't get any packets. The video stream will pause, and after the minute is gone packets start to get through again (IP found a new route for you). What would then happen is that TCP would retransmit the minute you lost and continue sending you the live stream. Assuming the bandwidth is higher than the bit-rate of the stream and the ping is not too high, after a short while the one minute you lost will act as a buffer for the stream, so if packet loss happens again, you won't notice.
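To put rough numbers on that catch-up, here is a minimal sketch. It assumes the sender keeps queueing the stream during the outage and that the link then runs at a constant rate above the bit-rate; `catch_up_seconds` and its parameters are made-up names for illustration only. The backlog is bit-rate times outage, and it drains at the difference between bandwidth and bit-rate because the live stream keeps producing data in the meantime.

```python
def catch_up_seconds(outage_s, bitrate_bps, bandwidth_bps):
    """Seconds until the receiver has downloaded everything up to the live edge.

    Assumes a constant-rate link and a constant-bit-rate stream (both simplifications).
    """
    if bandwidth_bps <= bitrate_bps:
        raise ValueError("bandwidth must exceed the stream bit-rate to catch up")
    backlog_bits = bitrate_bps * outage_s          # data missed during the outage
    drain_rate = bandwidth_bps - bitrate_bps       # spare capacity left for the backlog
    return backlog_bits / drain_rate


# One-minute outage, 4 Mbit/s stream, 10 Mbit/s link
print(catch_up_seconds(60, 4e6, 10e6))  # 40.0
```

After those 40 seconds the player is still a minute behind live, but it holds a full minute of buffered video.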

Now, I can think of some applications where this wouldn't be a good idea, for instance video conferences, where you always need to be at the end of the stream, because delay during a video chat is just horrible. But during a soccer match or a concert, what does it matter if you are a single minute behind the stream? Plus, you are guaranteed to get all the data, so the stream would be better to save for later viewing, since it comes in without any errors.

So this brings me to my question. Are there any drawbacks that I don't know of about using TCP for live streaming? Or should it really be that, if you have the bandwidth for it, you should go for TCP, given that it is "nicer" to the network (flow and congestion control)?


Solution 1:

Drawbacks of using TCP for live video:

  1. As you mentioned, TCP buffers the unacknowledged segments for every client. In some cases this is undesirable, such as TCP streaming for very popular live events: the number of simultaneous clients (and therefore the buffering requirement) is large in this case. Pre-recorded video-casts typically don't have as much of a problem with this because viewers tend to stagger their replay activity.

  2. TCP's delivery guarantees are a blocking function which isn't helpful in interactive conversations. Assume your network connection drops for 15 seconds. When we miss part of a conversation, we naturally ask the person to repeat (or the other party will proactively repeat if it seems like you missed something). UDP doesn't care if you missed part of a conversation for the last 15 seconds; it keeps working as if nothing happened. On the other hand, the app could be designed for TCP to replay the last 15 seconds (and the person on the other end may not want or know about that). Such a replay by TCP aggravates the problem, and makes it more difficult to stay in sync with other parties in the conversation. Comparing TCP and UDP’s behavior in the face of packet loss, one could say that it’s easier for UDP to stay in sync with the state of an interactive conversation.

  3. IP multicast significantly reduces video bandwidth requirements for large audiences; multicast requires UDP (and is incompatible with TCP). Note that multicast is generally restricted to private networks; multicast over the internet is not common. I would also point out that operating multicast networks is more complicated than operating typical unicast networks.
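To give a feel for the scale behind points 1 and 3, here is a rough sketch. The function name, the audience size, and the 256 KiB per-connection send buffer are all assumptions picked for illustration, not measurements from a real deployment.

```python
def server_load(clients, bitrate_bps, send_buffer_bytes):
    """Rough server-side cost of one live stream for a given audience size."""
    return {
        # Unicast (TCP or UDP): one copy of the stream per client
        "unicast_egress_bps": clients * bitrate_bps,
        # Multicast: the server sends one copy, the network replicates it
        "multicast_egress_bps": bitrate_bps if clients else 0,
        # TCP: worst case, one full send buffer of unacknowledged data per client
        "tcp_buffer_bytes": clients * send_buffer_bytes,
    }


load = server_load(clients=100_000, bitrate_bps=4_000_000, send_buffer_bytes=256 * 1024)
print(load["unicast_egress_bps"])    # 400000000000 (400 Gbit/s)
print(load["multicast_egress_bps"])  # 4000000 (4 Mbit/s)
print(load["tcp_buffer_bytes"])      # 26214400000 (about 24 GiB)
```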

FYI, please don't use the word "packages" when describing networks. Networks send "packets".

Solution 2:

but during a soccer-match, or a concert what does it matter if you are a single minute behind the stream?

To some soccer fans, quite a bit. It has been remarked that delays of even a few seconds present in digital video streams due to encoding (or whatever) can be very annoying when, during high-profile events such as World Cup matches, you can hear the cheers and groans from the guys next door (who are watching an undelayed analog program) before you get to see the game moves that caused them.

I think that to someone who cares a lot about sports (and those are the biggest group of paying customers for digital TV, at least here in Germany), being a minute behind in a live video stream would be completely unacceptable (as in, they'd switch to your competitor where this doesn't happen).

Solution 3:

Usually a video stream is somewhat fault tolerant. So if some packets get lost (due to some router along the way being overloaded, for example), then it will still be able to display the content, but with reduced quality.

If your live stream was using TCP/IP, then it would be forced to wait for those dropped packets before it could continue processing newer data.

That's doubly bad:

  • old data has to be re-transmitted (that's probably for a frame that was already displayed and therefore worthless) and
  • new data can't arrive until after old data was re-transmitted

If your goal is to display as up-to-date information as possible (and for a live-stream you usually want to be up-to-date, even if your frames look a bit worse), then TCP will work against you.
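Here is a toy model of that blocking effect. It is a sketch under simplifying assumptions (fixed one-way delay, a single retransmission after a fixed timeout, no congestion control), and all function names and numbers are invented for illustration.

```python
def arrival_times(n, interval_ms, delay_ms, lost, rto_ms=None):
    """When each packet reaches the receiver's network stack.

    A lost packet arrives after one retransmission if rto_ms is given
    (TCP-like), or never (None) if it is not (UDP-like).
    """
    times = []
    for seq in range(n):
        sent = seq * interval_ms
        if seq in lost:
            times.append(None if rto_ms is None else sent + rto_ms + delay_ms)
        else:
            times.append(sent + delay_ms)
    return times


def tcp_delivery(n, interval_ms, delay_ms, lost, rto_ms):
    """In-order delivery: a packet is handed to the app only after all earlier ones."""
    delivered, ready = [], 0
    for t in arrival_times(n, interval_ms, delay_ms, lost, rto_ms):
        ready = max(ready, t)  # later packets wait behind the retransmitted one
        delivered.append(ready)
    return delivered


def udp_delivery(n, interval_ms, delay_ms, lost):
    """No ordering or retransmission: each packet is delivered on arrival or not at all."""
    return arrival_times(n, interval_ms, delay_ms, lost)


# 5 packets sent 20 ms apart, 50 ms one-way delay, packet 1 lost, 200 ms retransmit timeout
print(tcp_delivery(5, 20, 50, {1}, 200))  # [50, 270, 270, 270, 270]
print(udp_delivery(5, 20, 50, {1}))       # [50, None, 90, 110, 130]
```

With TCP, packets 2 to 4 are already at the receiver but stay stuck until packet 1 is re-transmitted; with UDP the application gets them on time and has to cope with the one gap.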

For a recorded stream the situation is slightly different: you'll probably be buffering a lot more (possibly several minutes!) and would rather have data re-transmitted than have some artifacts due to lost packets. In this case TCP is a good match (this could still be implemented in UDP, of course, but TCP doesn't have as many drawbacks as in the live stream case).

Solution 4:

There are some use cases suitable to UDP transport and others suitable to TCP transport.

The use case also dictates the encoding settings for the video. When broadcasting a soccer match the focus is on quality, and for a video conference the focus is on latency.

When multicast is used to deliver video to your customers, UDP is used.

Multicast requires expensive networking hardware between the broadcasting server and the customer. In practice this means that if your company owns the network infrastructure, you can use UDP and multicast for live video streaming. Even then, quality-of-service is also implemented to mark video packets and prioritize them so that no packet loss happens.

Multicast simplifies the broadcasting software because the network hardware handles distributing packets to customers. Customers subscribe to multicast channels and the network reconfigures itself to route packets to the new subscriber. By default all channels are available to all customers and can be optimally routed.

This workflow makes authorization difficult. Network hardware does not differentiate subscribed users from other users. The solution is to encrypt the video content and enable decryption in the player software when the subscription is valid.

The unicast (TCP) workflow allows the server to check the client's credentials and only serve valid subscriptions. It can even limit the number of simultaneous connections.

Multicast is not enabled over the internet.

For delivering video over the internet, TCP must be used. When UDP is used, developers end up re-implementing packet re-transmission, e.g. the BitTorrent p2p live protocol.

"If you use TCP, the OS must buffer the unacknowledged segments for every client. This is undesirable, particularly in the case of live events".

This buffer must exist in some form. The same is true for the jitter buffer on the player side. On the server it is called the "socket buffer", and server software can detect when this buffer is full and discard the appropriate video frames for live streams. It is better to use the unicast/TCP method because the server software can implement proper frame-dropping logic. Randomly missing packets in the UDP case will just create a bad user experience, like in this video: http://tinypic.com/r/2qn89xz/9
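A minimal sketch of what such frame-dropping logic could look like. This is an illustration, not code from any particular server: `Frame` and `LiveSendQueue` are hypothetical names, a bounded per-client queue stands in for the "socket buffer is full" signal, and the stream is assumed to be decodable from any keyframe onward.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    seq: int
    is_key: bool  # True for keyframes (I-frames), which can be decoded on their own


class LiveSendQueue:
    """Per-client queue that drops the backlog and resumes at the next keyframe."""

    def __init__(self, max_frames):
        self.max_frames = max_frames
        self.frames = deque()
        self.waiting_for_key = False
        self.dropped = 0

    def push(self, frame):
        """Queue a frame for sending; return False if it was dropped."""
        if self.waiting_for_key:
            if not frame.is_key:
                self.dropped += 1  # undecodable without the frames dropped earlier
                return False
            self.waiting_for_key = False
        if len(self.frames) >= self.max_frames:
            # Client is too slow for live: throw away the stale backlog
            self.dropped += len(self.frames)
            self.frames.clear()
            if not frame.is_key:
                self.dropped += 1
                self.waiting_for_key = True
                return False
        self.frames.append(frame)
        return True

    def pop(self):
        """Next frame to write to the socket, or None if the queue is empty."""
        return self.frames.popleft() if self.frames else None
```

The client then sees a clean jump forward at a keyframe boundary instead of random corruption in the middle of a group of frames.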

"IP multicast significantly reduces video bandwidth requirements for large audiences"

This is true for private networks; multicast is not enabled over the internet.

"Note that if TCP loses too many packets, the connection dies; thus, UDP gives you much more control for this application since UDP doesn't care about network transport layer drops."

UDP also doesn't care about dropping entire frames or group-of-frames so it does not give any more control over user experience.

"Usually a video stream is somewhat fault tolerant"

Encoded video is not fault tolerant. When it is transmitted over an unreliable transport, forward error correction is added to the video container. A good example is the MPEG-TS container used in satellite video broadcast, which carries several audio, video, EPG, etc. streams. This is necessary because a satellite link is not duplex communication, meaning the receiver can't request re-transmission of lost packets.
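To illustrate the idea of forward error correction, here is a toy sketch using a single XOR parity packet per group. This is an assumption chosen for simplicity: it is not the scheme used in satellite broadcast, where stronger codes (Reed-Solomon, for example) are used. The function names are made up, and all packets in a group are assumed to have the same length.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Parity packet sent alongside the group: XOR of all data packets."""
    return reduce(xor_bytes, packets)


def recover(received, parity):
    """Rebuild a group in which exactly one packet (marked None) was lost."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one lost packet per group")
    present = [p for p in received if p is not None]
    # XOR of the parity with every surviving packet leaves the missing one
    restored = reduce(xor_bytes, present, parity)
    repaired = list(received)
    repaired[missing[0]] = restored
    return repaired


group = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(group)
print(recover([b"abcd", None, b"ijkl"], parity))  # [b'abcd', b'efgh', b'ijkl']
```

The cost is visible here too: one extra packet per three is sent to every receiver, whether or not that receiver lost anything.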

When you have duplex communication available, it is always better to re-transmit data only to the clients that had packet loss than to include the overhead of forward error correction in the stream sent to all clients.

In any case, lost packets are unacceptable. Dropped frames are OK in exceptional cases when bandwidth is constrained.

The result of missing packets is visible decoding artifacts.

Some decoders can break on streams missing packets in critical places.

Solution 5:

I recommend you look at the new p2p live protocol, BitTorrent Live.

As for streaming, it's better to use UDP: first because it lowers the load on servers, but mostly because you can send packets with multicast, which is simpler than sending them to each connected client.