How to find leap year programmatically in C

I wrote a program in C to find whether the entered year is a leap year or not. But unfortunately its not working well. It says a year is leap and the preceding year is not leap.

#include <stdio.h>
#include <conio.h>

int yearr(int year);

void main(void) {
    int year;
    printf("Enter a year:");
    scanf("%d", &year);
    if (!yearr(year)) {
        printf("It is a leap year.");
    } else {
        printf("It is not a leap year");
    }

    getch();
}

int yearr(int year) {
    if ((year % 4 == 0) && (year / 4 != 0))
        return 1;
    else
        return 0;
}

After reading the comments I edited my code as:

#include <stdio.h>
#include <conio.h>

int yearr(int year);

void main(void) {
    int year;
    printf("Enter a year:");
    scanf("%d", &year);
    if (!yearr(year)) {
        printf("It is a leap year.");
    } else {
        printf("It is not a leap year");
    }

    getch();
}

int yearr(int year) {
    if (year % 4 == 0) {
        if (year % 400 == 0)
            return 1;
        if (year % 100 == 0)
            return 0; 
    } else
        return 0;
}

Solution 1:

Most efficient leap year test:

if ((year & 3) == 0 && ((year % 25) != 0 || (year & 15) == 0))
{
    /* leap year */
}

This code is valid in C, C++, C#, Java, and many other C-like languages. The code utilizes a single TRUE/FALSE expression that consists of three separate tests:

  • 4th year test: year & 3
  • 100th year test: year % 25
  • 400th year test: year & 15

A complete discussion of how this code works appears below, but first a discussion of Wikipedia's algorithm is called for:

Wikipedia algorithm is INEFFICIENT/UNRELIABLE

Wikipedia has published a pseudo-code algorithm (See: Wikipedia: Leap year - Algorithm) that has been subjected to constant editing, opinion, and vandalism.

DO NOT IMPLEMENT WIKIPEDIA ALGORITHM!

One of the longest-standing (and inefficient) Wikipedia algorithms appeared as follows:

if year modulo 400 is 0 then
   is_leap_year
else if year modulo 100 is 0 then
   not_leap_year
else if year modulo 4 is 0 then
   is_leap_year
else
   not_leap_year

The above algorithm is inefficient because it always performs the tests for the 400th year and 100th year even for years that would quickly fail the "4th year test" (the modulo 4 test)—which is 75% of the time! By re-ordering the algorithm to perform the 4th year test first we speed things up significantly.

"MOST-EFFICIENT" PSEUDO-CODE ALGORITHM

I provided the following algorithm to Wikipedia (more than once):

if year is not divisible by 4 then not leap year
else if year is not divisible by 100 then leap year
else if year is divisible by 400 then leap year
else not leap year

This "most-efficient" pseudo-code simply changes the order of tests so the division by 4 takes place first, followed by the less-frequently occurring tests. Because "year" does not divide by four 75-percent of the time, the algorithm ends after only one test in three out of four cases.

NOTE: I have fought various Wikipedia editors to improve the algorithm published there, arguing that many novice—and professional—programmers quickly arrive at the Wikipedia page (due to top search engine listings) and implement the Wikipedia pseudo-code without any further research. Wikipedia editors repudiated and deleted every attempt I made to improve, annotate or even merely footnote the published algorithm. Apparently, they feel finding efficiencies is the programmer's problem. That may be true, but many programmers are too hurried to perform solid research!

DISCUSSION OF "MOST-EFFICIENT" LEAP YEAR TEST

Bitwise-AND in place of modulo:

I have replaced two of the modulo operations in the Wikipedia algorithm with bitwise-AND operations. Why and how?

Performing a modulo calculation requires division. One doesn't often think twice about this when programming a PC, but when programming 8-bit microcontrollers embedded in small devices you may find that a divide function cannot be natively performed by the CPU. On such CPUs, division is an arduous process involving repetitive looping, bit shifting, and add/subtract operations that is very slow. It is very desirable to avoid.

It turns out that the modulo of powers of two can be alternately achieved using a bitwise-AND operation (see: Wikipedia: Modulo operation - Performance Issues):

x % 2^n == x & (2^n - 1)

Many optimizing compilers will convert such modulo operations to bitwise-AND for you, but less advanced compilers for smaller and less popular CPUs may not. Bitwise-AND is a single instruction on every CPU.

By replacing the modulo 4 and modulo 400 tests with & 3 and & 15 (see below: 'Factoring to reduce math') we can ensure that the fastest code results without using a much slower divide operation.

There exists no power of two that equals 100. Thus, we are forced to continue to use the modulo operation for the 100th year test, however 100 is replaced by 25 (see below).

Factoring to simplify the math:

In addition to using bitwise-AND to replace modulo operations, you may note two additional disputes between the Wikipedia algorithm and the optimized expression:

  • modulo 100 is replaced by modulo 25
  • modulo 400 is replaced by & 15

The 100th year test utilizes modulo 25 instead of modulo 100. We can do this because 100 factors out to 2 x 2 x 5 x 5. Because the 4th year test already checks for factors of 4 we can eliminate that factor from 100, leaving 25. This optimization is probably insignificant to nearly every CPU implementation (as both 100 and 25 fit in 8-bits).

The 400th year test utilizes & 15 which is equivalent to modulo 16. Again, we can do this because 400 factors out to 2 x 2 x 2 x 2 x 5 x 5. We can eliminate the factor of 25 which is tested by the 100th year test, leaving 16. We cannot further reduce 16 because 8 is a factor of 200, so removing any more factors would produce a unwanted positive for a 200th year.

The 400th year optimization is greatly important to 8-bit CPUs, first, because it avoids division; but, more important, because the value 400 is a 9-bit number which is much more difficult to deal with in an 8-bit CPU.

Short-circuit Logical AND/OR operators:

The final, and most important, optimization used are the short-circuit logical AND ('&&') and OR ('||') operators (see: Wikipedia: Short-circuit evaluation), which are implemented in most C-like languages. Short-circuit operators are so named because they do not bother to evaluate the expression on the right side if the expression on the left side, by itself, dictates the outcome of the operation.

For example: If the year is 2003, then year & 3 == 0 is false. There is no way that the tests on the right side of the logical AND can make the outcome true, so nothing else gets evaluated.

By performing the 4th year test first, only the 4th year test (a simple bitwise-AND) is evaluated three-quarters (75 percent) of the time. This speeds up program execution greatly, especially since it avoids the division necessary for the 100th year test (the modulo 25 operation).

NOTE ON PARENTHESES PLACEMENT

One commenter felt parentheses were misplaced in my code and suggested the sub-expressions be regrouped around the logical AND operator (instead of around the logical OR), as follows:

if (((year & 3) == 0 && (year % 25) != 0) || (year & 15) == 0) { /* LY */ }

The above is incorrect. The logical AND operator has higher precedence than logical OR and will be evaluated first with or without the new parentheses. Parentheses around the logical AND arguments has no effect. This might lead one to eliminate the sub-groupings entirely:

if ((year & 3) == 0 && (year % 25) != 0 || (year & 15) == 0) { /* LY */ }

But, in both cases above, the right side of the logical OR (the 400th year test) is evaluated almost every time (i.e., years not divisible by 4 and 100). Thus, a useful optimization has been mistakenly eliminated.

The parentheses in my original code implement the most optimized solution:

if ((year & 3) == 0 && ((year % 25) != 0 || (year & 15) == 0)) { /* LY */ }

Here, the logical OR is only evaluated for years divisible by 4 (because of the short-circuit AND). The right side of the logical OR is only evaluated for years divisible by 4 and 100 (because of the short-circuit OR).

NOTE FOR C/C++ PROGRAMMERS

C/C++ programmers might feel this expression is more optimized:

if (!(year & 3) && ((year % 25) || !(year & 15))) { /* LY */ }

This is not more optimized! While the explicit == 0 and != 0 tests are removed, they become implicit and are still performed. Worse, the code is no longer valid in strongly-typed languages like C# where year & 3 evaluates to an int, but the logical AND (&&), OR (||) and NOT (!) operators require bool arguments.

Solution 2:

Your logic to determine a leap year is wrong. This should get you started (from Wikipedia):

if year modulo 400 is 0
       then is_leap_year
else if year modulo 100 is 0
       then not_leap_year
else if year modulo 4 is 0
       then is_leap_year
else
       not_leap_year

x modulo y means the remainder of x divided by y. For example, 12 modulo 5 is 2.

Solution 3:

Many answers talk about performance. None shows any measurement. A nice quote from gcc's documentation on __builtin_expect says this:

Programmers are notoriously bad at predicting how their programs actually perform.

Most implementations use short-circuiting of && and || as an optimization tool and go on to prescribe the "right" order for the divisibility checks for "best" performance. It is worth mentioning that short-circuiting is not necessarily an optimization feature.

Agreed, some checks might give a definitive answer (e.g. year is not multiple of 4) and make useless the subsequent tests. It seems all reasonable to immediately return at this point rather than keeping going with needless calculations. On the other hand, early returns introduce branches and this might decrease performance. (See this legendary post.) The trade-off between a branch misprediction and an unnecessary calculation is very hard to guess. Indeed, it depends on the hardware, on input data, on the exactly assembly instructions emitted by the compiler (which might change from one version to another), etc.

The sequel shall show measurements obtained in quick-bench.com. In all cases, we measure the time taken to check whether each value stored in an std::array<int, 65536> is a leap year or not. The values are pseudo-random, uniformly distributed in the interval [-400, 399]. More precisely, they are generated by the following code:

auto const years = [](){
  std::uniform_int_distribution<int> uniform_dist(-400, 399);
  std::mt19937 rng;
  std::array<int, 65536> years;
  for (auto& year : years)
    year = uniform_dist(rng);
  return years;
}();

Even when possible, I do not replace % with & (e.g. year & 3 == 0 instead of year % 4 == 0). I trust the compiler (GCC-9.2 at -O3) will do that for me. (It does.)

4-100-400 tests

Checks for leap years are, usually, written in terms of three divisibility tests: year % 4 == 0, year % 100 != 0 and year % 400 == 0. The following is a list of implementations covering all possible orders in which these checks can appear. Each implementation is prefixed with a corresponding label. (Some orders allow two different implementations, in which case, the 2nd one gets a suffix b.) They are:

b4_100_400  : year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)
b4_100_400b : (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
b4_400_100  : year % 4 == 0 && (year % 400 == 0 || year % 100 != 0)
b100_4_400  : (year % 100 != 0 && year % 4 == 0) || year % 400 == 0
b100_400_4  : (year % 100 != 0 || year % 400 == 0) && year % 4 == 0
b400_4_100  : year % 400 == 0 || (year % 4 == 0 && year % 100 != 0)
b400_100_4  : year % 400 == 0 || (year % 100 != 0 && year % 4 == 0)
b400_100_4b : (year % 400 == 0 || year % 100 != 0) && year % 4 == 0

The results are shown below. (See them live.)

4-100-400 tests

Contrarily to what many have advised, checking divisibility by 4 first does not seem to be the best thing to do. At the contrary, at least in these measurements, the three first bars are among the worst five. The best is

b100_400_4  : (year % 100 != 0 || year % 400 == 0) && year % 4 == 0

4-25-16 tests

Another provided tip (which I must confess I thought was a good one) is replacing year % 100 != 0 with year % 25 != 0. This doesn't affect correctness since we also check year % 4 == 0. (If a number is multiple of 4 then divisibility by 100 is equivalent to divisibility by 25.) Similarly, year % 400 == 0 can be replaced with year % 16 == 0 due to the presence of divisibility check by 25.

As in the last section, we have 8 implementations using 4-25-16 divisibility checks:

b4_25_16  : year % 4 == 0 && (year % 25 != 0 || year % 16 == 0)
b4_25_16b : (year % 4 == 0 && year % 25 != 0) || year % 16 == 0
b4_16_25  : year % 4 == 0 && (year % 16 == 0 || year % 25 != 0)
b25_4_16  : (year % 25 != 0 && year % 4 == 0) || year % 16 == 0
b25_16_4  : (year % 25 != 0 || year % 16 == 0) && year % 4 == 0
b16_4_25  : year % 16 == 0 || (year % 4 == 0 && year % 25 != 0)
b16_25_4  : year % 16 == 0 || (year % 25 != 0 && year % 4 == 0)
b16_25_4b : (year % 16 == 0 || year % 25 != 0) && year % 4 == 0

Results (live here):

4-25-16 tests

Again, checking divisibility by 4 first does not look a good idea. In this round the fast is

b25_16_4  : (year % 25 != 0 || year % 16 == 0) && year % 4 == 0

4-100-400 tests (no branching)

As previously mentioned branching might degrade performance. In particular, short-circuiting might be counterproductive. When this is the case, a classical trick is replacing logical operators && and || with their bit-wise counterparts & and |. The implementations become:

nb4_100_400  : (year % 4 == 0) & ((year % 100 != 0) | (year % 400 == 0))
nb4_100_400b : ((year % 4 == 0) & (year % 100 != 0)) | (year % 400 == 0)
nb4_400_100  : (year % 4 == 0) & ((year % 400 == 0) | (year % 100 != 0))
nb100_4_400  : ((year % 100 != 0) & (year % 4 == 0)) | (year % 400 == 0)
nb100_400_4  : ((year % 100 != 0) | (year % 400 == 0)) & (year % 4 == 0)
nb400_4_100  : (year % 400 == 0) | ((year % 4 == 0) & (year % 100 != 0))
nb400_100_4  : (year % 400 == 0) | ((year % 100 != 0) & (year % 4 == 0))
nb400_100_4b : ((year % 400 == 0) | (year % 100 != 0)) & (year % 4 == 0)

Results (live here):

4-100-400 tests (no branching)

A noticeable feature is that performance variation is not as pronounced as in the branching case and it makes difficult to declare a winner. We pick this one:

nb100_400_4  : ((year % 100 != 0) | (year % 400 == 0)) & (year % 4 == 0)

4-25-16 tests (no branching)

To complete the exercise, we consider the no-branching case with 4-25-16 divisibility tests:

nb4_25_16  : (year % 4 == 0) & ((year % 25 != 0) | (year % 16 == 0))
nb4_25_16b : ((year % 4 == 0) & (year % 25 != 0)) | (year % 16 == 0)
nb4_16_25  : (year % 4 == 0) & ((year % 16 == 0) | (year % 25 != 0))
nb25_4_16  : ((year % 25 != 0) & (year % 4 == 0)) | (year % 16 == 0)
nb25_16_4  : ((year % 25 != 0) | (year % 16 == 0)) & (year % 4 == 0)
nb16_4_25  : (year % 16 == 0) | ((year % 4 == 0) & (year % 25 != 0))
nb16_25_4  : (year % 16 == 0) | ((year % 25 != 0) & (year % 4 == 0))
nb16_25_4b : ((year % 16 == 0) | (year % 25 != 0)) & (year % 4 == 0)

Results (live here):

4-25-16 tests (no branching)

Once again, it's difficult to define the best and we select this one:

nb25_16_4  : ((year % 25 != 0) | (year % 16 == 0)) & (year % 4 == 0)

Champions League

It's now time to pick the best of each previous section and compare them:

b100_400_4  : (year % 100 != 0 || year % 400 == 0) && year % 4 == 0
b25_16_4    : (year % 25 != 0 || year % 16 == 0) && year % 4 == 0
nb100_400_4 : ((year % 100 != 0) | (year % 400 == 0)) & (year % 4 == 0)
nb25_16_4   : ((year % 25 != 0) | (year % 16 == 0)) & (year % 4 == 0)

Results (live here):

Champions League

This chart suggests that short-circuiting is, indeed, an optimization but divisibility by 4 should be the last to be checked rather then the first. For better performance one should check divisibility by 100 first. This is rather surprising! After all, the latter test is never enough to decide whether the year is leap or not and a subsequent test (either by 400 or by 4) is always required.

Also surprising is that for the branching versions using simpler divisors 25 and 16 is not better than using the more intuitive 100 and 400. I can offer my "theory" which also partially explains why testing for 100 first is better than testing for 4. As many pointed out, the divisibility test by 4 splits execution into (25%, 75%) parts whereas the test for 100 splits it into (1%, 99%). It doesn't matter that after the latter check, the execution must carry on to another test because, at least, the branch predictor is more likely to guess correctly which way to go. Similarly, checking divisibility by 25 splits execution into (4%, 96%) which is more challenging for the branch predictor than (1%, 99%). It looks like that it is better to minimize the entropy of the distribution, helping the branch predictor, rather than maximizing the probability of early return.

For the no branching versions, simplified divisors do offer better performance. In this case the branch predictor plays no role and, therefore, the simpler the better. Generally, the compiler can perform better optimizations with smaller numbers.

Is this it?

Did we hit the wall and found out that

b100_400_4  : (year % 100 != 0 || year % 400 == 0) && year % 4 == 0

is the most performant check for leap years? Definitely not. We haven't for instance, mixed branching operators && or || with no branching ones & and |. Perhaps... Let's see and compare the above with two other implementations. The first is

m100_400_4 : (year % 100 != 0 || year % 400 == 0) & (year % 4 == 0)

(Notice the mix of branching || and non-branching & operators.) The second is an obscure "hack":

bool b;
auto n = 0xc28f5c29 * year;
auto m = n + 0x051eb850;
m = (m << 30 | m >> 2);
if (m <= 0x28f5c28)
  b = n % 16 == 0;
else
  b = n % 4 == 0;
return b

Does the latter work? Yes it does. Instead of giving a mathematical proof I suggest comparing the code emitted for the above with the one for this more readable source:

bool b;
if (year % 100 == 0)
  b = year % 16 == 0;
else
  b = year % 4 == 0;
return b;

in Compiler Explorer. They are almost identical, the only difference being that one uses an add instruction when the other uses a lea. This should convince you that the hack code does work (as long as the other does).

Benchmark results (live here):

Final

Hold on, I hear you say, why not using the more readable code in the picture above? Well, I've tried and learned another lesson. When this code is inserted into the benchmark loop, the compiler looked at the source as a whole and decided to emit different code than when it sees the source in isolation. The performance was worse. Go figure!

And now? Is this it?

I don't know! There are many things that we could explore. The last section, for instance, showed yet another version using an if statement rather that short-circuiting. That could be a way to get even better performance. We could also try the ternary operator ?.

Be aware that all measurements and conclusions were based on GCC-9.2. With another compiler and/or version things might change. For instance, GCC from version 9.1 introduced a new improved algorithm for divisibility checks. Hence, older versions have different performances and the trade-off between an unnecessary calculation and a branch misprediction have likely changed.

The points that we can definitely conclude are:

  1. Do not overthink. Prefer clear code over difficult to understand optimizations (unless you're a library writer offering higher level APIs for your users around very efficient implementations). After all, a hand-crafted snippet might not be the most performant option when you upgrade compiler. As ex-nihilo said in a comment, "There is little point in fussing over a piece of code like this unless it is known to be a performance bottleneck."
  2. Guessing is difficult and it's better to measure.
  3. Measuring is difficult but is better than guessing.
  4. Short-circuiting/branching can be good for performance but it depends on many things including how much the code helps the branch predictor.
  5. Code emitted for a snippet in isolation might be different when it's inlined somewhere.

Solution 4:

int isLeapYear(int year)
{
   return (year % 400 == 0) || ( ( year % 100 != 0) && (year % 4 == 0 ));
}