Inconsistent output from gcount()
I have written the following simple MRE that reproduces a bug in my program:
#include <iostream>
#include <utility>
#include <sstream>
#include <string_view>
#include <array>
#include <vector>
#include <iterator>
// this function works correctly only if the string_view contains the user-provided chars and nothing extra, like trailing null bytes
std::pair< bool, std::vector< std::string > > tokenize( const std::string_view inputStr, const std::size_t expectedTokenCount )
{
    // unnecessary implementation details
    std::stringstream ss;
    ss << inputStr.data( ); // works for null-terminated strings, but not for non-null-terminated ones
    // unnecessary implementation details
}
int main( )
{
    constexpr std::size_t REQUIRED_TOKENS_COUNT { 3 };
    std::array<char, 50> input_buffer { };
    std::cin.getline( input_buffer.data( ), input_buffer.size( ) ); // the user can enter at most 49 chars; getline( ) stores a terminating '\0'
    const auto [ hasExpectedTokenCount, foundTokens ] { tokenize( { input_buffer.data( ), input_buffer.size( ) }, REQUIRED_TOKENS_COUNT ) };

    for ( const auto& token : foundTokens ) // print the tokens
    {
        std::cout << '\'' << token << "' ";
    }

    std::cout << '\n';
}
This is a program for tokenization (for the full code, see Compiler Explorer at the link below). Also, I'm using GCC v11.2.
First of all, I want to avoid using data( ), since it's a bit less efficient. I looked at the assembly in Compiler Explorer, and apparently ss << inputStr.data( ) ends up calling strlen( ), so insertion stops when it reaches the first null byte. But what if the string_view object is not null-terminated? That's a bit concerning. So I switched to ss << inputStr; instead.
Secondly, when I do ss << inputStr;, the whole 50-character buffer is inserted into ss, with all of its null bytes. Below are some sample outputs that are wrong:
sample #1:
1 2 3
'1' '2' '3 ' // '1' and '2' are correct, '3' has lots of null bytes
sample #2 (in this one I typed a space character after 3):
1 2 3
'1' '2' '3' ' ' // an extra token consisting of 1 space char and lots of null bytes has been created!
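By the way, the symptom is easy to reproduce in isolation. Here's a small sketch I put together, with a hard-coded buffer standing in for what getline( ) leaves behind:
#include <array>
#include <iostream>
#include <string_view>

int main( )
{
    std::array<char, 8> buffer { '3' }; // '3' followed by 7 value-initialized '\0' bytes
    const std::string_view sv { buffer.data( ), buffer.size( ) };
    std::cout << '\'' << sv << "'\n"; // the string_view overload writes all 8 chars, NULs included
}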
Is there a way to fix this? What should I do to also support non-null-terminated strings? I came up with the idea of using gcount( ), as below:
const std::streamsize charCount { std::cin.gcount( ) };
// here I pass charCount instead of the size of the buffer (the cast avoids a
// narrowing conversion from the signed streamsize in the braced initializer)
const auto [ hasExpectedTokenCount, foundTokens ] { tokenize( { input_buffer.data( ), static_cast<std::size_t>( charCount ) },
                                                              REQUIRED_TOKENS_COUNT ) };
But the problem is that when the user enters fewer characters than the buffer size, gcount( ) returns a value that is 1 more than the actual number of entered chars (e.g. the user enters 5 characters but gcount( ) returns 6; apparently it also counts the '\n' delimiter that getline( ) extracts and discards without storing).
This causes the last token to also have a null byte at its end:
1 2 3
'1' '2' '3 ' // see the null byte in '3 ', it's NOT a space char
How should I fix gcount( )'s inconsistent output?
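A sketch of an adjustment based on that behavior (using std::cin.good( ) to tell the "delimiter found" case apart from the "buffer filled up" case, which sets failbit):
const std::streamsize extracted { std::cin.gcount( ) }; // includes the discarded '\n', if one was found
// drop the delimiter only when getline( ) succeeded; if it stopped because the
// buffer filled up (failbit) or the stream ended (eofbit), every extracted char was stored
const std::streamsize charCount { ( std::cin.good( ) && extracted > 0 ) ? extracted - 1 : extracted };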
Or maybe I should change the function tokenize so that it gets rid of any '\0' at the end of the string_view and then starts tokenizing it. It might sound like an XY problem, but I really need help deciding what to do.
The basic problem you have is with the operator<< functions. You've tried two of them (contrasted in the sketch after this list):
- operator<<( ostream&, const char* ), which takes characters from the pointer up to (and not including) the next NUL. As you've noted, that may be a problem if the pointer comes from a string_view without a terminating NUL.
- operator<<( ostream&, const string_view& ), which takes all the characters from the string_view, including any NULs that may be present.
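A minimal sketch contrasting the two (the buffer is hard-coded to mimic what getline( ) leaves behind):
#include <array>
#include <iostream>
#include <sstream>
#include <string_view>

int main( )
{
    std::array<char, 8> buffer { '3' }; // "3" plus zero-filled slack
    const std::string_view sv { buffer.data( ), buffer.size( ) };

    std::stringstream viaPointer;
    viaPointer << sv.data( ); // const char* overload: stops at the first NUL
    std::cout << viaPointer.str( ).size( ) << '\n'; // prints 1

    std::stringstream viaView;
    viaView << sv; // string_view overload: writes all sv.size( ) characters
    std::cout << viaView.str( ).size( ) << '\n'; // prints 8
}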
It seems that what you want to do is take characters from the string_view up to (and not including) the first NUL, or up to the end of the string_view, whichever comes first. You can do that with find and a substr up to the NUL or the end:
ss << inputStr.substr(0, inputStr.find('\0'));
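Note that when there is no NUL at all, find returns npos, and substr(0, npos) simply takes the whole view, so a non-null-terminated string_view is handled correctly too. Dropping that into your tokenize might look like this (a sketch: your original body is elided as "unnecessary implementation details", so the whitespace-splitting loop here is just one plausible way to fill in the blanks):
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

std::pair< bool, std::vector< std::string > > tokenize( const std::string_view inputStr, const std::size_t expectedTokenCount )
{
    std::stringstream ss;
    ss << inputStr.substr( 0, inputStr.find( '\0' ) ); // stop at the first NUL or the end of the view

    std::vector< std::string > tokens;
    std::string token;
    while ( ss >> token ) // split on whitespace; a hypothetical stand-in for the elided details
    {
        tokens.push_back( std::move( token ) );
    }

    return { tokens.size( ) == expectedTokenCount, std::move( tokens ) };
}
With that change, passing the full 50-byte buffer from your main should print '1' '2' '3' for your first sample, with no trailing null bytes.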