difference between %ms and %s scanf
The C Standard does not define such an optional character in the scanf() formats.
The GNU C library (glibc) does define an optional 'a' indicator this way (from the man page for scanf):
An optional 'a' character. This is used with string conversions, and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free this buffer when it is no longer required. This is a GNU extension; C99 employs the 'a' character as a conversion specifier (and it can also be used as such in the GNU implementation).
The NOTES section of the man page says:
The 'a' modifier is not available if the program is compiled with gcc -std=c99 or gcc -D_ISOC99_SOURCE (unless _GNU_SOURCE is also specified), in which case the 'a' is interpreted as a specifier for floating-point numbers (see above).
Since version 2.7, glibc also provides the 'm' modifier for the same purpose as the 'a' modifier. The 'm' modifier has the following advantages:
* It may also be applied to %c conversion specifiers (e.g., %3mc).
* It avoids ambiguity with respect to the %a floating-point conversion specifier (and is unaffected by gcc -std=c99 etc.)
* It is specified in the upcoming revision of the POSIX.1 standard.
The online Linux manual page at http://linux.die.net/man/3/scanf documents only the 'm' option:
An optional 'm' character. This is used with string conversions (%s, %c, %[), and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free(3) this buffer when it is no longer required.
The POSIX standard documents this extension in its POSIX.1-2008 edition (see http://pubs.opengroup.org/onlinepubs/9699919799/functions/fscanf.html ):
The %c, %s, and %[ conversion specifiers shall accept an optional assignment-allocation character 'm', which shall cause a memory buffer to be allocated to hold the string converted including a terminating null character. In such a case, the argument corresponding to the conversion specifier should be a reference to a pointer variable that will receive a pointer to the allocated buffer. The system shall allocate a buffer as if malloc() had been called. The application shall be responsible for freeing the memory after usage. If there is insufficient memory to allocate a buffer, the function shall set errno to [ENOMEM] and a conversion error shall result. If the function returns EOF, any memory successfully allocated for parameters using assignment-allocation character 'm' by this call shall be freed before the function returns.
Using this extension, you could write:
char *p;
scanf("%ms", &p);
This causes scanf() to parse a word from standard input and allocate enough memory to store its characters plus a terminating '\0'. A pointer to the allocated array is stored into p and scanf() returns 1, unless no non-whitespace characters can be read from stdin, in which case it returns EOF. Once the word is no longer needed, p must be passed to free().
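A slightly fuller sketch of the same idea, with the return value checked and ownership of the buffer made explicit. The helper name read_word is hypothetical, and the code assumes a C library that implements the 'm' modifier (glibc 2.7 or later, or another POSIX.1-2008 implementation):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one whitespace-delimited word from `in`.
   Returns a malloc()ed string that the caller must free(), or NULL if
   no word could be read (end of file, read error, or out of memory). */
char *read_word(FILE *in)
{
    char *word = NULL;

    assert(in != NULL);
    /* %ms is not ISO C: it needs POSIX.1-2008 support in the C library. */
    if (fscanf(in, "%ms", &word) != 1)
        return NULL;    /* no buffer is handed back to the caller */
    return word;
}
```

A caller would typically write char *w = read_word(stdin); then test w against NULL, use the word, and finally call free(w).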
It is entirely possible that other systems use 'm' for similar semantics or for something else entirely. Extensions that are not part of the C Standard are not portable and should be used very carefully, documented as such, and only in circumstances where a standard approach is cumbersome, impractical, or altogether impossible.
Note that parsing a word of arbitrary size is indeed impossible with the standard version of scanf().
You can parse a word with a maximum size, and you should specify the maximum number of characters to store before the '\0':
char buffer[20];
scanf("%19s", buffer);
But this does not tell you how many more characters are available to parse in standard input. In any case, omitting the maximum number of characters invokes undefined behavior if the input word is long enough to overflow the buffer, and specially crafted input may even be used by an attacker to compromise your program:
char buffer[20];
scanf("%s", buffer); // potential undefined behavior,
// that could be exploited by an attacker.
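With the bounded form, one way to find out whether the word was cut short is to peek at the next character after the conversion. The following is only a sketch in standard C; the helper name read_short_word and its return convention are assumptions made for illustration:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads a word of at most 19 characters into buf.
   Returns 1 if the whole word was stored, 0 if the word was truncated
   (its remaining characters are left unread in the stream), and EOF if
   no word could be read at all. */
int read_short_word(FILE *in, char buf[20])
{
    int c;

    assert(in != NULL && buf != NULL);
    if (fscanf(in, "%19s", buf) != 1)
        return EOF;

    /* %19s stops at whitespace, at end of file, or after 19 characters.
       Peek at the next character to tell these cases apart. */
    c = getc(in);
    if (c == EOF)
        return 1;
    ungetc(c, in);              /* put the peeked character back */
    return isspace(c) ? 1 : 0;
}
```

When the function returns 0, the caller can decide whether to reject the input, discard the rest of the word, or keep reading it in further chunks.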