How to write custom input stream in C++

I'm currently learning C++ (Coming from Java) and I'm trying to understand how to use IO streams properly in C++.

Let's say I have an Image class which contains the pixels of an image and I overloaded the extraction operator to read the image from a stream:

istream& operator>>(istream& stream, Image& image)
{
    // Read the image data from the stream into the image
    return stream;
}

So now I'm able to read an image like this:

Image image;
ifstream file("somepic.img");
file >> image;

But now I want to use the same extraction operator to read the image data from a custom stream. Let's say I have a file which contains the image in compressed form. So instead of using ifstream I might want to implement my own input stream. At least that's how I would do it in Java. In Java I would write a custom class extending the InputStream class and implementing the int read() method. So that's pretty easy. And usage would look like this:

InputStream stream = new CompressedInputStream(new FileInputStream("somepic.imgz"));
image.read(stream);

So using the same pattern maybe I want to do this in C++:

Image image;
ifstream file("somepic.imgz");
compressed_stream stream(file);
stream >> image;

But maybe that's the wrong way, don't know. Extending the istream class looks pretty complicated and after some searching I found some hints about extending streambuf instead. But this example looks terribly complicated for such a simple task.

So what's the best way to implement custom input/output streams (or streambufs?) in C++?

Solution

Some people suggested not using iostreams at all and to use iterators, boost or a custom IO interface instead. These may be valid alternatives but my question was about iostreams. The accepted answer resulted in the example code below. For easier reading there is no header/code separation and the whole std namespace is imported (I know that this is a bad thing in real code).

This example is about reading and writing vertical-xor-encoded images. The format is pretty easy. Each byte represents two pixels (4 bits per pixel). Each line is xor'd with the previous line. This kind of encoding prepares the image for compression (usually results in lot of 0-bytes which are easier to compress).

#include <cstring>
#include <fstream>

using namespace std;

/*** vxor_streambuf class ******************************************/

class vxor_streambuf: public streambuf
{
public:
    vxor_streambuf(streambuf *buffer, const int width) :
        buffer(buffer),
        size(width / 2)
    {
        previous_line = new char[size];
        memset(previous_line, 0, size);
        current_line = new char[size];
        setg(0, 0, 0);
        setp(current_line, current_line + size);
    }

    virtual ~vxor_streambuf()
    {
        sync();
        delete[] previous_line;
        delete[] current_line;
    }

    virtual streambuf::int_type underflow()
    {
        // Read line from original buffer
        streamsize read = buffer->sgetn(current_line, size);
        if (!read) return traits_type::eof();

        // Do vertical XOR decoding
        for (int i = 0; i < size; i += 1)
        {
            current_line[i] ^= previous_line[i];
            previous_line[i] = current_line[i];
        }

        setg(current_line, current_line, current_line + read);
        return traits_type::to_int_type(*gptr());
    }

    virtual streambuf::int_type overflow(streambuf::int_type value)
    {
        int write = pptr() - pbase();
        if (write)
        {
            // Do vertical XOR encoding
            for (int i = 0; i < size; i += 1)
            {
                char tmp = current_line[i];
                current_line[i] ^= previous_line[i];
                previous_line[i] = tmp;
            }

            // Write line to original buffer
            streamsize written = buffer->sputn(current_line, write);
            if (written != write) return traits_type::eof();
        }

        setp(current_line, current_line + size);
        if (!traits_type::eq_int_type(value, traits_type::eof())) sputc(value);
        return traits_type::not_eof(value);
    };

    virtual int sync()
    {
        streambuf::int_type result = this->overflow(traits_type::eof());
        buffer->pubsync();
        return traits_type::eq_int_type(result, traits_type::eof()) ? -1 : 0;
    }

private:
    streambuf *buffer;
    int size;
    char *previous_line;
    char *current_line;
};


/*** vxor_istream class ********************************************/

class vxor_istream: public istream
{
public:
    vxor_istream(istream &stream, const int width) :
        istream(new vxor_streambuf(stream.rdbuf(), width)) {}

    virtual ~vxor_istream()
    {
        delete rdbuf();
    }
};


/*** vxor_ostream class ********************************************/

class vxor_ostream: public ostream
{
public:
    vxor_ostream(ostream &stream, const int width) :
        ostream(new vxor_streambuf(stream.rdbuf(), width)) {}

    virtual ~vxor_ostream()
    {
        delete rdbuf();
    }
};


/*** Test main method **********************************************/

int main()
{
    // Read data
    ifstream infile("test.img");
    vxor_istream in(infile, 288);
    char data[144 * 128];
    in.read(data, 144 * 128);
    infile.close();

    // Write data
    ofstream outfile("test2.img");
    vxor_ostream out(outfile, 288);
    out.write(data, 144 * 128);
    out.flush();
    outfile.close();

    return 0;
}

The proper way to create a new stream in C++ is to derive from std::streambuf and to override the underflow() operation for reading and the overflow() and sync() operations for writing. For your purpose you'd create a filtering stream buffer which takes another stream buffer (and possibly a stream from which the stream buffer can be extracted using rdbuf()) as argument and implements its own operations in terms of this stream buffer.

The basic outline of a stream buffer would be something like this:

class compressbuf
    : public std::streambuf {
    std::streambuf* sbuf_;
    char*           buffer_;
    // context for the compression
public:
    compressbuf(std::streambuf* sbuf)
        : sbuf_(sbuf), buffer_(new char[1024]) {
        // initialize compression context
    }
    ~compressbuf() { delete[] this->buffer_; }
    int underflow() {
        if (this->gptr() == this->egptr()) {
            // decompress data into buffer_, obtaining its own input from
            // this->sbuf_; if necessary resize buffer
            // the next statement assumes "size" characters were produced (if
            // no more characters are available, size == 0.
            this->setg(this->buffer_, this->buffer_, this->buffer_ + size);
        }
        return this->gptr() == this->egptr()
             ? std::char_traits<char>::eof()
             : std::char_traits<char>::to_int_type(*this->gptr());
    }
};

How underflow() looks exactly depends on the compression library being used. Most libraries I have used keep an internal buffer which needs to be filled and which retains the bytes which are not yet consumed. Typically, it is fairly easy to hook the decompression into underflow().

Once the stream buffer is created, you can just initialize an std::istream object with the stream buffer:

std::ifstream fin("some.file");
compressbuf   sbuf(fin.rdbuf());
std::istream  in(&sbuf);

If you are going to use the stream buffer frequently, you might want to encapsulate the object construction into a class, e.g., icompressstream. Doing so is a bit tricky because the base class std::ios is a virtual base and is the actual location where the stream buffer is stored. To construct the stream buffer before passing a pointer to a std::ios thus requires jumping through a few hoops: It requires the use of a virtual base class. Here is how this could look roughly:

struct compressstream_base {
    compressbuf sbuf_;
    compressstream_base(std::streambuf* sbuf): sbuf_(sbuf) {}
};
class icompressstream
    : virtual compressstream_base
    , public std::istream {
public:
    icompressstream(std::streambuf* sbuf)
        : compressstream_base(sbuf)
        , std::ios(&this->sbuf_)
        , std::istream(&this->sbuf_) {
    }
};

(I just typed this code without a simple way to test that it is reasonably correct; please expect typos but the overall approach should work as described)


boost (which you should have already if you're serious about C++), has a whole library dedicated to extending and customizing IO streams: boost.iostreams

In particular, it already has decompressing streams for a few popular formats (bzip2, gzlib, and zlib)

As you saw, extending streambuf may be an involving job, but the library makes it fairly easy to write your own filtering streambuf if you need one.


Don't, unless you want to die a terrible death of hideous design. IOstreams are the worst component of the Standard library - even worse than locales. The iterator model is much more useful, and you can convert from stream to iterator with istream_iterator.