Need to know when no data appears between two token separators using strtok()
I am trying to tokenize a string, but I need to know exactly when there is no data between two separators. For example, when tokenizing the string "a,b,c,,,d,e" I need to know about the two empty slots between 'c' and 'd', which I cannot detect using strtok() alone. My attempt is shown below:
char arr_fields[num_of_fields][FIELD_LEN]; /* FIELD_LEN: assumed per-field buffer size */
char delim[] = ",\n";
char *tok;
int i;

tok = strtok(line, delim); /* line contains the data */
for (i = 0; i < num_of_fields; i++, tok = strtok(NULL, delim))
{
    if (tok)
        snprintf(arr_fields[i], FIELD_LEN, "%s", tok);
    else
        snprintf(arr_fields[i], FIELD_LEN, "%s", "-");
}
Executing the above code with the aforementioned example puts the characters a, b, c, d, e into the first five elements of arr_fields, which is not desirable. I need each character to go to a specific index of the array: if a field is missing between two characters, that gap should be recorded as is.
7.21.5.8 the strtok function
The standard says the following regarding strtok:
[#3] The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.
The quote above shows that you cannot use strtok as a solution to your specific problem: it skips over every character found in delims before a token starts, so any run of consecutive delimiters is treated as a single delimiter and no empty token is ever returned.
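To make the behaviour described in the quote concrete, here is a minimal sketch. The helper name split_with_strtok and its out/max parameters are made up for illustration; only strtok itself comes from the standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collect up to max tokens produced by strtok().
 * line is modified in place, as strtok() always does. */
size_t split_with_strtok(char *line, const char *delims, char *out[], size_t max)
{
    size_t n = 0;
    char *tok;

    for (tok = strtok(line, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;   /* runs of delimiters never yield an empty token */

    return n;
}
```

Called on a writable copy of "a,b,c,,,d,e" with "," as the delimiter, this should return 5 rather than 7, with "d" landing at index 3: the two empty fields are silently dropped.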
Am I doomed to weep in silence, or can somebody help me out?
You can easily implement your own version of strtok that does what you want; see the snippets at the end of this post.
strtok_single makes use of strpbrk (char const* src, const char* delims), which returns a pointer to the first character in the null-terminated string src that matches any character in delims.
If no matching character is found, the function returns NULL.
strtok_single
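char *
strtok_single (char * str, char const * delims)
{
  static char * src = NULL;   /* remaining input, kept between calls */
  char * p, * ret = 0;

  if (str != NULL)
    src = str;                /* new string: restart the scan */

  if (src == NULL)
    return NULL;              /* nothing left to tokenize */

  if ((p = strpbrk (src, delims)) != NULL) {
    *p = 0;                   /* terminate the field at the delimiter */
    ret = src;                /* may be an empty string */
    src = ++p;
  } else if (*src) {
    ret = src;                /* last field, no delimiter follows */
    src = NULL;
  }

  return ret;
}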
sample use
char delims[] = ",";
char data  [] = "foo,bar,,baz,biz";

char * p = strtok_single (data, delims);

while (p) {
  printf ("%s\n", *p ? p : "<empty>");
  p = strtok_single (NULL, delims);
}
output
foo
bar
<empty>
baz
biz
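Applied to the original question, the same strpbrk idea can place every field at its own index. The sketch below is only one possible arrangement: strtok_single_r and fill_fields are hypothetical names, the caller-supplied state pointer replaces the static variable used above, and the "-" placeholder is taken from the question. Unlike strtok_single, this variant also reports a trailing empty field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reentrant variant: *state plays the role of the static src. */
char *strtok_single_r(char *str, const char *delims, char **state)
{
    char *start = (str != NULL) ? str : *state;
    char *p;

    if (start == NULL)
        return NULL;            /* previous call returned the last field */

    p = strpbrk(start, delims);
    if (p != NULL) {
        *p = '\0';              /* cut the field at the delimiter */
        *state = p + 1;
    } else {
        *state = NULL;          /* last field, possibly empty */
    }
    return start;
}

/* Store each field at its own index; empty or missing fields become "-".
 * The stored pointers refer into line, which is modified in place. */
void fill_fields(char *line, const char *delims, const char *fields[], size_t n)
{
    char *state = NULL;
    char *tok = strtok_single_r(line, delims, &state);
    size_t i;

    for (i = 0; i < n; i++) {
        fields[i] = (tok != NULL && *tok != '\0') ? tok : "-";
        if (tok != NULL)
            tok = strtok_single_r(NULL, delims, &state);
    }
}
```

With a writable copy of "a,b,c,,,d,e", "," as the delimiter and n set to 7, indexes 3 and 4 should hold "-" while "d" and "e" land at indexes 5 and 6.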
You can't use strtok() if that's what you want. From the man page:
A sequence of two or more contiguous delimiter characters in the parsed string is considered to be a single delimiter. Delimiter characters at the start or end of the string are ignored. Put another way: the tokens returned by strtok() are always nonempty strings.
Therefore it is just going to jump from c to d in your example.
You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.
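As a rough idea of what manual parsing could look like, the sketch below walks the string with strcspn and steps over exactly one delimiter at a time, so empty fields keep their position. The function get_field and its parameters are invented for this example, and it does not handle CSV quoting or escaping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the index-th field (0-based) of line into out.
 * Returns 1 if the field exists (it may be empty), 0 otherwise.
 * line itself is left untouched. */
int get_field(const char *line, const char *delims, size_t index,
              char *out, size_t outsz)
{
    const char *p = line;

    for (;;) {
        size_t len = strcspn(p, delims);   /* length of the current field */

        if (index == 0) {
            if (outsz == 0)
                return 0;
            if (len >= outsz)
                len = outsz - 1;           /* truncate to fit the buffer */
            memcpy(out, p, len);
            out[len] = '\0';
            return 1;
        }
        if (p[len] == '\0')
            return 0;                      /* ran out of fields */
        p += len + 1;                      /* step over one delimiter only */
        index--;
    }
}
```

For "a,b,c,,,d,e" with "," as the delimiter, asking for field 3 should succeed with an empty string, field 5 should give "d", and field 7 should report that no such field exists.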
Lately I was looking for a solution to the same problem and found this thread.
You can use strsep().
From the manual:
The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields.
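A minimal sketch of how that might look follows. Note that strsep() is not part of ISO C; it is available on BSD systems and in glibc, and the feature-test macro below is an assumption about a glibc build. The helper name split_with_strsep is made up for illustration.

```c
#define _DEFAULT_SOURCE   /* assumed: asks glibc to declare strsep() */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collect up to max fields using strsep().
 * Empty fields come back as empty strings, so positions are preserved.
 * line is modified in place. */
size_t split_with_strsep(char *line, const char *delims, char *out[], size_t max)
{
    size_t n = 0;
    char *rest = line;    /* strsep() advances this pointer itself */
    char *tok;

    while (n < max && (tok = strsep(&rest, delims)) != NULL)
        out[n++] = tok;

    return n;
}
```

On a writable copy of "a,b,c,,,d,e" with "," as the delimiter, this should yield 7 fields, with empty strings at indexes 3 and 4 and "d" at index 5.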