changing the delimiter for cin (c++)

I've redirected cin to read from a file stream with cin.rdbuf(inF.rdbuf()). When I use the extraction operator, it reads until it reaches a whitespace character.

Is it possible to use another delimiter? I went through the API on cplusplus.com but didn't find anything.


Solution 1:

It is possible to change the inter-word delimiter for cin, or for any other std::istream, by calling imbue (std::basic_ios::imbue) to install a custom ctype facet.

If you are reading a file in the style of /etc/passwd, the following program will read each :-delimited word separately.

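#include <iostream>
#include <locale>
#include <string>

// A ctype facet whose classification table marks only ':' and '\n' as whitespace.
struct colon_is_space : std::ctype<char> {
  colon_is_space() : std::ctype<char>(get_table()) {}
  static mask const* get_table()
  {
    // Static storage: zero-initialized, and it outlives the facet,
    // which keeps only a pointer to this table.
    static mask rc[table_size];
    rc[':'] = std::ctype_base::space;
    rc['\n'] = std::ctype_base::space;
    return &rc[0];
  }
};

int main() {
  using std::string;
  using std::cin;
  using std::locale;

  // Install the facet; the locale takes ownership of the new'd object.
  cin.imbue(locale(cin.getloc(), new colon_is_space));

  string word;
  while(cin >> word) {  // now stops at ':' or '\n', not at ' ' or '\t'
    std::cout << word << "\n";
  }
}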
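To tie this back to the question's setup, here is a sketch that first points cin at another stream's buffer with rdbuf, as in cin.rdbuf(inF.rdbuf()), and then installs the same colon_is_space facet. The helper read_words_through_cin is a hypothetical name, and saving and restoring cin's original buffer and locale is an addition of this sketch, not something the answer requires.

```cpp
#include <iostream>
#include <locale>
#include <sstream>   // std::istringstream, used in the usage example
#include <streambuf>
#include <string>
#include <vector>

// Same facet as above: only ':' and '\n' are classified as whitespace.
struct colon_is_space : std::ctype<char> {
  colon_is_space() : std::ctype<char>(get_table()) {}
  static mask const* get_table() {
    static mask rc[table_size];  // zero-initialized, outlives the facet
    rc[':'] = std::ctype_base::space;
    rc['\n'] = std::ctype_base::space;
    return &rc[0];
  }
};

// Hypothetical helper: temporarily read `source` through cin, as the question
// does with cin.rdbuf(inF.rdbuf()), splitting on ':' and '\n' only.
std::vector<std::string> read_words_through_cin(std::istream& source) {
  std::streambuf* old_buf = std::cin.rdbuf(source.rdbuf());  // also clears cin's state
  std::locale old_loc =
      std::cin.imbue(std::locale(std::cin.getloc(), new colon_is_space));

  std::vector<std::string> words;
  std::string word;
  while (std::cin >> word) {
    words.push_back(word);
  }

  std::cin.clear();         // drop the eof/fail bits left by the loop
  std::cin.imbue(old_loc);  // restore the previous locale
  std::cin.rdbuf(old_buf);  // restore the original buffer (stdin)
  return words;
}
```

For example, feeding it a std::istringstream holding the line root:x:0:0:root:/root:/bin/bash yields the seven fields root, x, 0, 0, root, /root and /bin/bash.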

Solution 2:

For strings, you can use the std::getline overloads to read using a different delimiter.
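A minimal sketch of that approach, assuming ':'-separated fields: the third argument of std::getline is the delimiter. The helper split_fields is a hypothetical name. Unlike operator>>, getline keeps embedded spaces and also returns empty fields between adjacent delimiters.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, used in the usage example
#include <string>
#include <vector>

// Hypothetical helper: split a stream into fields using the delimiter
// argument of std::getline instead of whitespace.
std::vector<std::string> split_fields(std::istream& in, char delim) {
  std::vector<std::string> fields;
  std::string field;
  // getline consumes the delimiter but does not store it in `field`.
  while (std::getline(in, field, delim)) {
    fields.push_back(field);
  }
  return fields;
}
```

Reading "a b::c" this way gives the three fields "a b", "" and "c".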

For number extraction, the delimiter isn't really "whitespace" to begin with, but any character invalid in a number.
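For instance, reading "12:34" as two integers needs no custom whitespace at all: the int extraction stops at ':' because ':' cannot be part of a number, and the colon can then be consumed with a char extraction. The function parse_pair is a hypothetical helper for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: parse "<int>:<int>". operator>> for int stops at the
// first character that cannot continue a number (here ':'), so the default
// whitespace rules are fine; the separator is read explicitly as a char.
bool parse_pair(const std::string& text, int& first, int& second) {
  std::istringstream in(text);
  char sep = '\0';
  if (!(in >> first >> sep >> second)) {
    return false;
  }
  return sep == ':';
}
```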

Solution 3:

This is an improvement on Robᵩ's answer, which is the right one (and I'm disappointed that it hasn't been accepted).

What you need to do is change the array that ctype looks at to decide what a delimiter is.

In the simplest case you could create your own:

const ctype<char>::mask foo[ctype<char>::table_size] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ctype_base::space};

On my machine '\n' is 10. I've set that element of the array to the delimiter value, ctype_base::space; the remaining elements are zero-initialized. A ctype initialized with foo would delimit only on '\n', not on ' ' or '\t'.
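As a sketch of what foo does in practice, the following reads words from a string stream imbued with that table. The names newline_only and read_lines_as_words are hypothetical, and like the answer it assumes '\n' has the value 10 (true for ASCII). The table has static storage because std::ctype<char> keeps only a pointer to it.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical table in the style of foo: element 10 ('\n' on ASCII systems)
// is marked as space, and every remaining element is zero-initialized.
// Static storage, because std::ctype<char> keeps only a pointer to the table.
static const std::ctype<char>::mask newline_only[std::ctype<char>::table_size] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, std::ctype_base::space};

// Hypothetical helper: extract "words" that are delimited only by '\n'.
std::vector<std::string> read_lines_as_words(const std::string& text) {
  std::istringstream in(text);
  in.imbue(std::locale(in.getloc(), new std::ctype<char>(newline_only)));
  std::vector<std::string> words;
  std::string word;
  while (in >> word) {
    words.push_back(word);
  }
  return words;
}
```

With the input "a b\tc\nd e" this produces two words, "a b\tc" and "d e".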

Now this is a problem, because the array passed into ctype defines more than just what a delimiter is: it also defines letters, digits, symbols, and some other classifications needed for streaming. (Ben Voigt's answer touches on this.) So what we really want to do is modify a copy of the existing mask table, not create one from scratch.
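One way to see the problem is to ask a locale how it classifies ordinary characters. In this sketch (with_table and newline_only are hypothetical names, and '\n' == 10 is assumed), a locale built from the hand-made table no longer reports 'a' as a letter or '7' as a digit, while the classic locale still does. The free functions std::isalpha, std::isdigit and std::isspace that take a locale consult that locale's ctype facet.

```cpp
#include <locale>

// Hypothetical hand-made table like foo: only element 10 ('\n' on ASCII
// systems) is marked, everything else is 0.
static const std::ctype<char>::mask newline_only[std::ctype<char>::table_size] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, std::ctype_base::space};

// Hypothetical helper: a copy of the classic locale whose ctype<char> facet
// uses `table`. The locale owns the facet; the table itself is not copied.
std::locale with_table(const std::ctype<char>::mask* table) {
  return std::locale(std::locale::classic(), new std::ctype<char>(table));
}
```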

That can be accomplished like this:

const auto temp = ctype<char>::classic_table();
vector<ctype<char>::mask> bar(temp, temp + ctype<char>::table_size);  // copy of the default table

bar[' '] ^= ctype_base::space;                          // ' ' is no longer a delimiter
bar['\t'] &= ~(ctype_base::space | ctype_base::cntrl);  // neither is '\t'
bar[':'] |= ctype_base::space;                          // ':' now is

A ctype initialized with bar would delimit on '\n' and ':' but not ' ' or '\t'.

You set up cin, or any other istream, to use your custom ctype like this (bar has to outlive the locale, because a ctype<char> built from a table stores only a pointer to it):

cin.imbue(locale(cin.getloc(), new ctype<char>(data(bar))));
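Putting those pieces together, here is a sketch that builds the modified table once and uses it on a string stream. The names colon_newline_table and read_colon_words are hypothetical. Unlike the snippet above, it clears only the space bit for both ' ' and '\t' (with &= ~) and leaves their other classifications alone; the vector is static so that it outlives every locale that points into it.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: build (once) a copy of the classic table in which
// ' ' and '\t' are no longer whitespace and ':' is.
const std::ctype<char>::mask* colon_newline_table() {
  static std::vector<std::ctype<char>::mask> bar = [] {
    const std::ctype<char>::mask* classic = std::ctype<char>::classic_table();
    std::vector<std::ctype<char>::mask> t(classic, classic + std::ctype<char>::table_size);
    t[' '] &= ~std::ctype_base::space;  // clear only the space bit
    t['\t'] &= ~std::ctype_base::space;
    t[':'] |= std::ctype_base::space;
    return t;
  }();
  return bar.data();  // static storage: outlives every facet that uses it
}

// Hypothetical helper: split on ':' and '\n' (plus the other characters the
// classic table still marks as space, such as '\r').
std::vector<std::string> read_colon_words(const std::string& text) {
  std::istringstream in(text);
  in.imbue(std::locale(in.getloc(), new std::ctype<char>(colon_newline_table())));
  std::vector<std::string> words;
  std::string word;
  while (in >> word) {
    words.push_back(word);
  }
  return words;
}
```

On the input "root:x 1:a\tb\nnext" it yields "root", "x 1", "a\tb" and "next".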

You can also switch between ctypes and the behavior will change mid-stream:

cin.imbue(locale(cin.getloc(), new ctype<char>(foo)));
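A sketch of such a mid-stream switch, reading the first two words with the stream's default ctype and the rest with a table built like bar (a copy of the classic table with ' ' and '\t' removed from, and ':' added to, the whitespace class). The helper names are hypothetical.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: a copy of the classic table, built like `bar`.
const std::ctype<char>::mask* colon_newline_table() {
  static std::vector<std::ctype<char>::mask> bar = [] {
    const std::ctype<char>::mask* classic = std::ctype<char>::classic_table();
    std::vector<std::ctype<char>::mask> t(classic, classic + std::ctype<char>::table_size);
    t[' '] &= ~std::ctype_base::space;
    t['\t'] &= ~std::ctype_base::space;
    t[':'] |= std::ctype_base::space;
    return t;
  }();
  return bar.data();
}

// Hypothetical helper: the first two words use the stream's default ctype,
// everything after the imbue uses the custom table.
std::vector<std::string> read_then_switch(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> words;
  std::string word;
  for (int i = 0; i < 2 && in >> word; ++i) {
    words.push_back(word);  // default table: split on any whitespace
  }
  in.imbue(std::locale(in.getloc(), new std::ctype<char>(colon_newline_table())));
  while (in >> word) {
    words.push_back(word);  // custom table: split on ':' and '\n', not ' '
  }
  return words;
}
```

For "one two\nthree:four five" the result is "one", "two", "three" and "four five": the first two reads split on any whitespace, the later ones only on ':' and '\n'.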

If you need to go back to default behavior, just do this:

cin.imbue(locale(cin.getloc(), new ctype<char>));
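An alternative, sketched below, is to keep the locale that imbue returns and put it back afterwards. That restores whatever ctype the stream had before, which is not necessarily the default one if the stream was already customized. The helper names are hypothetical, and the custom table is again built like bar.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: a copy of the classic table, built like `bar`.
const std::ctype<char>::mask* colon_newline_table() {
  static std::vector<std::ctype<char>::mask> bar = [] {
    const std::ctype<char>::mask* classic = std::ctype<char>::classic_table();
    std::vector<std::ctype<char>::mask> t(classic, classic + std::ctype<char>::table_size);
    t[' '] &= ~std::ctype_base::space;
    t['\t'] &= ~std::ctype_base::space;
    t[':'] |= std::ctype_base::space;
    return t;
  }();
  return bar.data();
}

// Hypothetical helper: read one word with the custom table, then restore.
std::vector<std::string> read_with_temporary_table(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> words;
  std::string word;
  // imbue returns the locale that was in effect before the call.
  const std::locale saved =
      in.imbue(std::locale(in.getloc(), new std::ctype<char>(colon_newline_table())));
  if (in >> word) {
    words.push_back(word);  // custom table: ' ' is part of the word
  }
  in.imbue(saved);          // back to the previous ctype, whatever it was
  while (in >> word) {
    words.push_back(word);  // previous table: split on any whitespace
  }
  return words;
}
```

With "a b\nc d" the first read returns "a b" and the reads after the restore return "c" and "d".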
