gets() function in C

I need help again! I thought it was pretty cool to use the gets() function because, unlike scanf(), it lets me read input that contains whitespace. But I read in one of the threads (student info file handling) that it is not good to use because, according to them, it is a devil's tool for creating buffer overflows (which I don't understand).

If I use the gets() function, I can do this. ENTER YOUR NAME: Keanu Reeves

If I use scanf(), I can only do this. ENTER YOUR NAME: Keanu

So I heeded their advice and replaced all my gets() calls with fgets(). The problem is now some of my code is not working anymore... Are there any functions other than gets() and fgets() that can read a whole line, whitespace included?


it is a devil's tool for creating buffer overflows

Because gets does not take a length parameter, it doesn't know how large your input buffer is. If you pass in a 10-character buffer and the user enters 100 characters -- well, you get the point.

fgets is a safer alternative to gets because it takes the buffer length as a parameter, so you can call it like this:

fgets(str, 10, stdin);

and it will read in at most 9 characters.
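
To see the limit in action, here is a minimal sketch. It is only an illustration: fgets_demo is a made-up helper, and it reads from a temporary file (via tmpfile()) instead of stdin so that the input is predictable.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for this demo: writes `text` to a temporary file,
   then reads it back with fgets() into buf, which has room for `size`
   bytes. Returns the number of characters stored, or -1 on failure. */
int fgets_demo(const char *text, char *buf, int size) {
  FILE *fp = tmpfile();
  int len = -1;

  if (fp == NULL) return -1;
  if (fputs(text, fp) != EOF) {
    rewind(fp);                       /* switch from writing to reading */
    if (fgets(buf, size, fp) != NULL) /* stores at most size - 1 chars */
      len = (int)strlen(buf);
  }
  fclose(fp);
  return len;
}
```

Called with a 10-byte buffer and the text "Keanu Charles Reeves\n", this stores the 9 characters "Keanu Cha" followed by the terminating '\0'. The rest of the line is not lost; it stays in the stream for the next read.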

the problem is now some of my code is not working anymore

This is possibly because fgets also stores the final newline (\n) character in your buffer -- if your code is not expecting this, you should remove it manually:

size_t len = strlen(str);   /* strlen returns size_t, not int */
if (len > 0 && str[len-1] == '\n')
  str[len-1] = '\0';
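
The same cleanup can also be sketched as a small helper built on strcspn(). The name strip_newline is invented here. Its return value tells you whether a newline was present; if not, either the buffer filled up before the end of the line or the input ended without one.

```c
#include <string.h>

/* Hypothetical helper: removes the newline that fgets() may have stored.
   Returns 1 if a newline was found and removed, 0 otherwise. */
int strip_newline(char *s) {
  size_t n = strcspn(s, "\n");  /* index of the first '\n', or of the '\0' */

  if (s[n] == '\n') {
    s[n] = '\0';
    return 1;
  }
  return 0;
}
```

Unlike the str[len-1] version, this needs no special case for an empty string, because strcspn() simply returns 0 there.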

As other responses have noted, gets() doesn't check the buffer space. In addition to accidental overflow problems, this weakness can be used by malicious users to create all sorts of havoc.

One of the first widespread worms, released in 1988, used gets() to propagate itself throughout the internet. Here's an interesting excerpt from Expert C Programming by Peter van der Linden which discusses how it worked:

The Early Bug gets() the Internet Worm

The problems in C are not confined to just the language. Some routines in the standard library have unsafe semantics. This was dramatically demonstrated in November 1988 by the worm program that wriggled through thousands of machines on the Internet network. When the smoke had cleared and the investigations were complete, it was determined that one way the worm had propagated was through a weakness in the finger daemon, which accepts queries over the network about who is currently logged in. The finger daemon, in.fingerd, used the standard I/O routine gets().

The nominal task of gets() is to read in a string from a stream. The caller tells it where to put the incoming characters. But gets() does not check the buffer space; in fact, it can't check the buffer space. If the caller provides a pointer to the stack, and more input than buffer space, gets() will happily overwrite the stack. The finger daemon contained the code:

main(argc, argv)
char *argv[];
{
char line[512];
...
gets(line);

Here, line is a 512-byte array allocated automatically on the stack. When a user provides more input than that to the finger daemon, the gets() routine will keep putting it on the stack. Most architectures are vulnerable to overwriting an existing entry in the middle of the stack with something bigger, that also overwrites neighboring entries. The cost of checking each stack access for size and permission would be prohibitive in software. A knowledgeable malefactor can amend the return address in the procedure activation record on the stack by stashing the right binary patterns in the argument string. This will divert the flow of execution not back to where it came from, but to a special instruction sequence (also carefully deposited on the stack) that calls execv() to replace the running image with a shell. Voilà, you are now talking to a shell on a remote machine instead of the finger daemon, and you can issue commands to drag across a copy of the virus to another machine.

Ironically, the gets() routine is an obsolete function that provided compatibility with the very first version of the portable I/O library, and was replaced by standard I/O more than a decade ago. The manpage even strongly recommends that fgets() always be used instead. The fgets() routine sets a limit on the number of characters read, so it won't exceed the size of the buffer. The finger daemon was made secure with a two-line fix that replaced:

gets(line);

by the lines:

if (fgets(line, sizeof(line), stdin) == NULL)
exit(1);

This swallows a limited amount of input, and thus can't be manipulated into overwriting important locations by someone running the program. However, the ANSI C Standard did not remove gets() from the language. Thus, while this particular program was made secure, the underlying defect in the C standard library was not removed.


You could look at this question: Safe alternative to gets(). There are a number of useful answers.

You should be more precise about why your code does not work with fgets(). As the answers in the other question explain, you have to deal with the newline that gets() omits.


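To read a whole line, spaces included, with scanf you can use a scanset:

Example:

printf("Enter name: ");

scanf("%[^\n]", name);       // [^\n] is the trick: read everything up to the newline

Note that without a field width this can overflow name exactly like gets() does, so put a limit in the format, e.g. %20[^\n] for a 21-byte array.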

You can use scanf to mimic gets. It's not pretty though.

#include <stdio.h>

#define S_HELPER(X) # X
#define STRINGIZE(X) S_HELPER(X)
#define MAX_NAME_LEN 20

int flushinput(void) {
  int ch;
  while (((ch = getchar()) != EOF) && (ch != '\n')) /* void */;
  return ch;
}

int main(void) {
  char name[MAX_NAME_LEN + 1] = {0};

  while (name[0] != '*') {
    printf("Enter a name (* to quit): ");
    fflush(stdout);
    scanf("%" STRINGIZE(MAX_NAME_LEN) "[^\n]", name); /* safe gets */
    if (flushinput() == EOF) break;
    printf("Name: [%s]\n", name);
    puts("");
  }

  return 0;
}

You're much better off reading with fgets and parsing (if needed) with sscanf.
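
A rough sketch of that fgets-then-sscanf approach, under a few assumptions: read_line and parse_age are names made up for this example, and throwing away the part of a line that does not fit is just one possible policy (it mirrors what flushinput does above).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf (room for `size`
   bytes) without the trailing newline. If the line is too long for buf,
   the excess is read and discarded. Returns buf, or NULL on EOF/error. */
char *read_line(char *buf, int size, FILE *fp) {
  size_t n;

  if (fgets(buf, size, fp) == NULL) return NULL;
  n = strcspn(buf, "\n");
  if (buf[n] == '\n') {
    buf[n] = '\0';              /* the whole line fit: drop the newline */
  } else {
    int ch;                     /* truncated: skip the rest of the line */
    while ((ch = getc(fp)) != EOF && ch != '\n') { /* discard */ }
  }
  return buf;
}

/* Hypothetical parser: extracts an integer from an already-read line.
   Returns 1 on success, 0 if the line does not start with a number. */
int parse_age(const char *line, int *age) {
  return sscanf(line, "%d", age) == 1;
}
```

For keyboard input you would call read_line(buf, sizeof buf, stdin). Because the parsing happens on a string that is already in memory, a bad line such as "abc" just makes parse_age return 0 and leaves nothing stuck in the input stream.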


EDIT explaining the scanf call and surrounding code.

The "%[" conversion specification of scanf accepts a maximum field width, and that width does not include the null terminator. So the array that holds the input must be one character larger than the field width.

To do that with only a single constant I used the STRINGIZE macro. With this macro I can use a #define'd constant both as an array size (for the variable definition) and as a string (for the specifier).
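
To make the macro trick concrete, here is a small sketch that reuses the three #defines from the program above. The functions name_format and scan_name are made up for this illustration, and sscanf stands in for scanf so the input can be a plain string.

```c
#include <stdio.h>

#define S_HELPER(X) # X
#define STRINGIZE(X) S_HELPER(X)
#define MAX_NAME_LEN 20

/* Hypothetical helper: returns the format string that the scanf call
   ends up with once the adjacent string literals are concatenated. */
const char *name_format(void) {
  return "%" STRINGIZE(MAX_NAME_LEN) "[^\n]";  /* same as "%20[^\n]" */
}

/* Hypothetical helper: same conversion as in the program above, but
   reading from a string. Returns the number of items converted. */
int scan_name(const char *input, char name[MAX_NAME_LEN + 1]) {
  return sscanf(input, "%" STRINGIZE(MAX_NAME_LEN) "[^\n]", name);
}
```

The two-level macro is needed because the # operator does not expand its argument: S_HELPER(MAX_NAME_LEN) on its own would give "MAX_NAME_LEN", while going through STRINGIZE first turns the argument into 20 and then into "20".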

There's one more aspect that deserves mention: the flushinput. If using gets, all data is written to memory (even when the buffer overflows) up to but not including the newline. To mimic that, the scanf reads a limited number of characters up to but not including the newline and, unlike gets, keeps the newline in the input buffer. So that newline needs to be removed and that's what flushinput does.

The rest of the code was mainly to set up a testing environment.