How does strcmp() work?

I've been looking around a fair bit for an answer. I'm going to make a series of my own string functions like my_strcmp(), my_strcat(), etc.

Does strcmp() work through each index of two arrays of characters and if the ASCII value is smaller at an identical index of two strings, that string is there alphabetically greater and therefore a 0 or 1 or 2 is returned? I guess what Im asking is, does it use the ASCII values of characters to return these results?

Any help would be greatly appreciated.

[REVISED]

OK, so I have come up with this... it works for all cases except when the second string is greater than the first.

Any tips?

int my_strcmp(char s1[], char s2[])
{   
    int i = 0;
    while ( s1[i] != '\0' )
    {
        if( s2[i] == '\0' ) { return 1; }
        else if( s1[i] < s2[i] ) { return -1; }
        else if( s1[i] > s2[i] ) { return 1; }
        i++;
    }   
    return 0;
}


int main (int argc, char *argv[])
{
    int result = my_strcmp(argv[1], argv[2]);

    printf("Value: %d \n", result);

    return 0;

}

Solution 1:

The pseudo-code "implementation" of strcmp would go something like:

define strcmp (s1, s2):
    p1 = address of first character of str1
    p2 = address of first character of str2

    while contents of p1 not equal to null:
        if contents of p2 equal to null: 
            return 1

        if contents of p2 greater than contents of p1:
            return -1

        if contents of p1 greater than contents of p2:
            return 1

        advance p1
        advance p2

    if contents of p2 not equal to null:
        return -1

    return 0

That's basically it. Each character is compared in turn an a decision is made as to whether the first or second string is greater, based on that character.

Only if the characters are identical do you move to the next character and, if all the characters were identical, zero is returned.

Note that you may not necessarily get 1 and -1, the specs say that any positive or negative value will suffice, so you should always check the return value with < 0, > 0 or == 0.

Turning that into real C would be relatively simple:

int myStrCmp (const char *s1, const char *s2) {
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    while (*p1 != '\0') {
        if (*p2 == '\0') return  1;
        if (*p2 > *p1)   return -1;
        if (*p1 > *p2)   return  1;

        p1++;
        p2++;
    }

    if (*p2 != '\0') return -1;

    return 0;
}

Also keep in mind that "greater" in the context of characters is not necessarily based on simple ASCII ordering for all string functions.

C has a concept called 'locales' which specify (among other things) collation, or ordering of the underlying character set and you may find, for example, that the characters a, á, à and ä are all considered identical. This will happen for functions like strcoll.

Solution 2:

Here is the BSD implementation:

int
strcmp(s1, s2)
    register const char *s1, *s2;
{
    while (*s1 == *s2++)
        if (*s1++ == 0)
            return (0);
    return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}

Once there is a mismatch between two characters, it just returns the difference between those two characters.

Solution 3:

It uses the byte values of the characters, returning a negative value if the first string appears before the second (ordered by byte values), zero if they are equal, and a positive value if the first appears after the second. Since it operates on bytes, it is not encoding-aware.

For example:

strcmp("abc", "def") < 0
strcmp("abc", "abcd") < 0 // null character is less than 'd'
strcmp("abc", "ABC") > 0 // 'a' > 'A' in ASCII
strcmp("abc", "abc") == 0

More precisely, as described in the strcmp Open Group specification:

The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared.

Note that the return value may not be equal to this difference, but it will carry the same sign.

Solution 4:

This, from the masters themselves (K&R, 2nd ed., pg. 106):

// strcmp: return < 0 if s < t, 0 if s == t, > 0 if s > t
int strcmp(char *s, char *t) 
{
    int i;

    for (i = 0; s[i] == t[i]; i++)
        if (s[i] == '\0')
            return 0;
    return s[i] - t[i];
}