What is a stream?

What is a stream in the programming world? Why do we need it?

Kindly explain with the help of an analogy, if possible.


Solution 1:

A stream represents a sequence of objects (usually bytes, but not necessarily so), which can be accessed in sequential order. Typical operations on a stream:

  • read one byte. Next time you read, you'll get the next byte, and so on.
  • read several bytes from the stream into an array
  • seek (move your current position in the stream, so that next time you read you get bytes from the new position)
  • write one byte
  • write several bytes from an array into the stream
  • skip bytes from the stream (this is like read, but you ignore the data. Or if you prefer it's like seek but can only go forwards.)
  • push back bytes into an input stream (this is like "undo" for read - you shove a few bytes back up the stream, so that next time you read that's what you'll see. It's occasionally useful for parsers, as is:
  • peek (look at bytes without reading them, so that they're still there in the stream to be read later)

A particular stream might support reading (in which case it is an "input stream"), writing ("output stream") or both. Not all streams are seekable.

Push back is fairly rare, but you can always add it to a stream by wrapping the real input stream in another input stream that holds an internal buffer. Reads come from the buffer, and if you push back then data is placed in the buffer. If there's nothing in the buffer then the push back stream reads from the real stream. This is a simple example of a "stream adaptor": it sits on the "end" of an input stream, it is an input stream itself, and it does something extra that the original stream didn't.

Stream is a useful abstraction because it can describe files (which are really arrays, hence seek is straightforward) but also terminal input/output (which is not seekable unless buffered), sockets, serial ports, etc. So you can write code which says either "I want some data, and I don't care where it comes from or how it got here", or "I'll produce some data, and it's entirely up to my caller what happens to it". The former takes an input stream parameter, the latter takes an output stream parameter.

Best analogy I can think of is that a stream is a conveyor belt coming towards you or leading away from you (or sometimes both). You take stuff off an input stream, you put stuff on an output stream. Some conveyors you can think of as coming out of a hole in the wall - they aren't seekable, reading or writing is a one-time-only deal. Some conveyors are laid out in front of you, and you can move along choosing whereabouts in the stream you want to read/write - that's seeking.

As IRBMe says, though, it's best to think of a stream in terms of the operations it offers (which vary from implementation to implementation, but have a lot in common) rather than by a physical analogy. Streams are "things you can read or write". When you start connecting up stream adaptors, you can think of them as a box with a conveyor in, and a conveyor out, that you connect to other streams and then the box performs some transformation on the data (zipping it, or changing UNIX linefeeds to DOS ones, or whatever). Pipes are another thorough test of the metaphor: that's where you create a pair of streams such that anything you write into one can be read out of the other. Think wormholes :-)

Solution 2:

A stream is already a metaphor, an analogy, so there's really no need to provide another one. You can think of it basically as a pipe with a flow of water in it where the water is actually data and the pipe is the stream. I suppose it's kind of a 2-way pipe if the stream is bi-directional. It's basically a common abstraction that is placed upon things where there is a flow or sequence of data in one or both directions.

In languages such as C#, VB.Net, C++, Java etc., the stream metaphor is used for many things. There are file streams, in which you open a file and can read from the stream or write to it continuously; There are network streams where reading from and writing to the stream reads from and writes to an underlying established network connection. Streams for writing only are typically called output streams, as in this example, and similarly, streams that are for reading only are called input streams, as in this example.

A stream can perform transformation or encoding of data (an SslStream in .Net, for example, will eat up the SSL negotiation data and hide it from you; A TelnetStream might hide the Telnet negotiations from you, but provide access to the data; A ZipOutputStream in Java allows you to write to files in a zip archive without having to worry about the internals of the zip file format.

Another common thing you might find is textual streams that allow you to write strings instead of bytes, or some languages provide binary streams that allow you to write primitive types. A common thing you'll find in textual streams is a character encoding, which you should be aware of.

Some streams also support random access, as in this example. A network stream, on the other hand, for obvious reasons, wouldn't.

  • MSDN gives a good overview of streams in .Net.
  • Sun also have an overview of their general OutputStream class and InputStream class.
  • In C++, here is the istream (input stream), ostream (output stream) and iostream (bidirectional stream) documentation.

UNIX like operating systems also support the stream model with program input and output, as described here.

Solution 3:

The answers given so far are excellent. I'm only providing another to highlight that a stream is not a sequence of bytes or specific to a programming language since the concept is universal (while its implementation may be unique). I often see an abundance of explanations online in terms of SQL, or C or Java, which make sense as a filestream deals with memory locations and low level operations. But they often address how to create a filestream and operate on the potential file in their given language rather than discuss the concept of a stream.

The Metaphor

As mentioned a stream is a metaphor, an abstraction of something more complex. To get your imagination working I offer some other metaphors:

  1. you want to fill an empty pool with water. one way to accomplish this is to attach a hose to a spigot, placing the end of the hose in the pool and turning on the water.

the hose is the stream

  1. similarly, if you wanted to refill your car with gas, you would go to a gas pump, insert the nozzle into your gas tank and open the valve by squeezing the locking lever.

the hose, nozzle and associated mechanisms to allow the gas to flow into your tank is the stream

  1. if you need to get to work you would start driving from your home to the office using the freeway.

the freeway is the stream

  1. if you want to have a conversation with someone you would use your ears to hear and your mouth to speak.

your ears and eyes are streams

Hopefully you notice in these examples that the stream metaphors only exist to allow something to travel through it (or on it in the case of the freeway) and do not themselves always poses the thing they are transferring. An important distinction. We don't refer to our ears as a sequence of words. A hose is still a hose if no water is flowing through it, but we have to connect it to a spigot for it do its job correctly. A car is not the only 'kind' of vehicle that can traverse a freeway.

Thus a stream can exist that has no data travelling through it as long as it is connected to a file.

Removing the Abstraction

Next, we need to answer a few questions. I'm going to use files to describe streams so... What is a file? And how do we read a file? I will attempt to answer this while maintaining a certain level of abstraction to avoid unneeded complexity and will use the concept of a file relative to a linux operating system because of its simplicity and accessibility.

What is a file?

A file is an abstraction :)

Or, as simply as I can explain, a file is one part data structure describing the file and one part data which is the actual content.

The data structure part (called an inode in UNIX/linux systems) identities important pieces of information about the content, but does not include the content itself (or a name of the file for that matter). One of the pieces of information it keeps is a memory address to where the content starts. So with a file name (or a hard link in linux), a file descriptor (a numeric file name that the operating system cares about) and a starting location in memory we have something we can call a file.

(the key takeaway is a 'file' is defined by the operating system since it is the OS that ultimately has to deal with it. and yes, files are much more complex).

So far so good. But how do we get the content of the file, say a love letter to your beau, so we can print it?

Reading a file

If we start from the result and move backwards, when we open a file on our computer its entire contents is splashed on our screen for us to read. But how? Very methodically is the answer. The content of the file itself is another data structure. Suppose an array of characters. We can also think of this as a string.

So how do we 'read' this string? By finding its location in memory and iterating through our array of characters, one character at a time until reaching an end of file character. In other words a program.

A stream is 'created' when its program is called and it has a memory location to attach to or connect to. Much like our water hose example, the hose is ineffective if it is not connected to a spigot. In the case of the stream, it must be connected to a file for it to exist.

Streams can be further refined, e.g, a stream to receive input or a stream to send a files contents to standard output. UNIX/linux connects and keeps open 3 filestreams for us right off the bat, stdin (standard input), stdout (standard output) and stderr (standard error). Streams can be built as data structures themselves or objects which allows us to perform more complex operations of the data streaming through them, like opening the stream, closing the stream or error checking the file a stream is connected to. C++'s cin is an example of a stream object.

Surely, if you so choose, you can write your own stream.

Definition

A stream is a reusable piece of code that abstracts the complexity of dealing with data while providing useful operations to perform on data.

Solution 4:

In addition to things mentioned above there is a different kind of streams - as defined in functional programming languages such as Scheme or Haskell - a possibly infinite datastructure which is generated by some function on-demand.