Coupon code generation

I would like to generate coupon codes , e.g. AYB4ZZ2. However, I would also like to be able to mark the used coupons and limit their global number, let's say N. The naive approach would be something like "generate N unique alphanumeric codes, put them into database and perform a db search on every coupon operation."

However, as far as I realize, we can also attempt to find a function MakeCoupon(n), which converts the given number into a coupon-like string with predefined length.

As far as I understand, MakeCoupon should fullfill the following requirements:

  • Be bijective. It's inverse MakeNumber(coupon) should be effectively computable.

  • Output for MakeCoupon(n) should be alphanumeric and should have small and constant length - so that it could be called human readable. E.g. SHA1 digest wouldn't pass this requirement.

  • Practical uniqueness. Results of MakeCoupon(n) for every natural n <= N should be totally unique or unique in the same terms as, for example, MD5 is unique (with the same extremely small collision probability).

  • (this one is tricky to define) It shouldn't be obvious how to enumerate all remaining coupons from a single coupon code - let's say MakeCoupon(n) and MakeCoupon(n + 1) should visually differ.

    E.g. MakeCoupon(n), which simply outputs n padded with zeroes would fail this requirement, because 000001 and 000002 don't actually differ "visually".

Q:

Does any function or function generator, which fullfills the following requirements, exist? My search attempts only lead me to [CPAN] CouponCode, but it does not fullfill the requirement of the corresponding function being bijective.


Basically you can split your operation into to parts:

  1. Somehow "encrypt" your initial number n, so that two consecutive numbers yield (very) different results
  2. Construct your "human-readable" code from the result of step 1

For step 1 I'd suggest to use a simple block cipher (e.g. a Feistel cipher with a round function of your choice). See also this question.

Feistel ciphers work in several rounds. During each round, some round function is applied to one half of the input, the result is xored with the other half and the two halves are swapped. The nice thing about Feistel ciphers is that the round function hasn't to be two-way (the input to the round function is retained unmodified after each round, so the result of the round function can be reconstructed during decryption). Therefore you can choose whatever crazy operation(s) you like :). Also Feistel ciphers are symmetric, which fulfills your first requirement.

A short example in C#

const int BITCOUNT = 30;
const int BITMASK = (1 << BITCOUNT/2) - 1;

static uint roundFunction(uint number) {
  return (((number ^ 47894) + 25) << 1) & BITMASK;
}

static uint crypt(uint number) {
  uint left = number >> (BITCOUNT/2);
  uint right = number & BITMASK;
  for (int round = 0; round < 10; ++round) {
    left = left ^ roundFunction(right);
    uint temp = left; left = right; right = temp;
  }
  return left | (right << (BITCOUNT/2));
}

(Note that after the last round there is no swapping, in the code the swapping is simply undone in the construction of the result)

Apart from fulfilling your requirements 3 and 4 (the function is total, so for different inputs you get different outputs and the input is "totally scrambled" according to your informal definition) it is also it's own inverse (thus implicitely fulfilling requirement 1), i.e. crypt(crypt(x))==x for each x in the input domain (0..2^30-1 in this implementation). Also it's cheap in terms of performance requirements.

For step 2 just encode the result to some base of your choice. For instance, to encode a 30-bit number, you could use 6 "digits" of an alphabet of 32 characters (so you can encode 6*5=30 bits).

An example for this step in C#:

const string ALPHABET= "AG8FOLE2WVTCPY5ZH3NIUDBXSMQK7946";
static string couponCode(uint number) {
  StringBuilder b = new StringBuilder();
  for (int i=0; i<6; ++i) {
    b.Append(ALPHABET[(int)number&((1 << 5)-1)]);
    number = number >> 5;
  }
  return b.ToString();
}
static uint codeFromCoupon(string coupon) {
  uint n = 0;
  for (int i = 0; i < 6; ++i)
    n = n | (((uint)ALPHABET.IndexOf(coupon[i])) << (5 * i));
  return n;
}

For inputs 0 - 9 this yields the following coupon codes

0 => 5VZNKB
1 => HL766Z
2 => TMGSEY
3 => P28L4W
4 => EM5EWD
5 => WIACCZ
6 => 8DEPDA
7 => OQE33A
8 => 4SEQ5A
9 => AVAXS5

Note, that this approach has two different internal "secrets": First, the round function together with the number of rounds used and second, the alphabet you use for encoding the encyrpted result. But also note, that the shown implementation is in no way secure in a cryptographical sense!

Also note, that the shown function is a total bijective function, in the sense, that every possible 6-character code (with characters out of your alphabet) will yield a unique number. To prevent anyone from entering just some random code, you should define some kind of restictions on the input number. E.g. only issue coupons for the first 10.000 numbers. Then, the probability of some random coupon code to be valid would be 10000/2^30=0.00001 (it would require about 50000 attempts to find a correct coupon code). If you need more "security", you can just increase the bit size/coupon code length (see below).

EDIT: Change Coupon code length

Changing the length of the resulting coupon code requires some math: The first (encrypting) step only works on a bit string with even bit count (this is required for the Feistel cipher to work).

In the the second step, the number of bits that can be encoded using a given alphabet depends on the "size" of chosen alphabet and the length of the coupon code. This "entropy", given in bits, is, in general, not an integer number, far less an even integer number. For example:

A 5-digit code using a 30 character alphabet results in 30^5 possible codes which means ld(30^5)=24.53 bits/Coupon code.

For a four-digit code, there is a simple solution: Given a 32-Character alphabet you can encode *ld(32^4)=5*4=20* Bits. So you can just set the BITCOUNT to 20 and change the for loop in the second part of the code to run until 4 (instead of 6)

Generating a five-digit code is a bit trickier and somhow "weakens" the algorithm: You can set the BITCOUNT to 24 and just generate a 5-digit code from an alphabet of 30 characters (remove two characters from the ALPHABET string and let the for loop run until 5).

But this will not generate all possible 5-digit-codes: with 24 bits you can only get 16,777,216 possible values from the encryption stage, the 5 digit codes could encode 24,300,000 possible numbers, so some possible codes will never be generated. More specifically, the last position of the code will never contain some characters of the alphabet. This can be seen as a drawback, because it narrows down the set of valid codes in an obvious way.

When decoding a coupon code, you'll first have to run the codeFromCoupon function and then check, if bit 25 of the result is set. This would mark an invalid code that you can immediately reject. Note that, in practise, this might even be an advantage, since it allows a quick check (e.g. on the client side) of the validity of a code without giving away all internals of the algorithm.

If bit 25 is not set you'll call the crypt function and get the original number.


Though I may get docked for this answer I feel like I need to respond - I really hope that you hear what I'm saying as it comes from a lot of painful experience.

While this task is very academically challenging, and software engineers tend to challenge their intelect vs. solving problems, I need to provide you with some direction on this if I may. There is no retail store in the world, that has any kind of success anyway, that doesn't keep very good track of each and every entity that is generated; from each piece of inventory to every single coupon or gift card they send out those doors. It's just not being a good steward if you are, because it's not if people are going to cheat you, it's when, and so if you have every possible item in your arsenal you'll be ready.

Now, let's talk about the process by which the coupon is used in your scenario.

When the customer redeems the coupon there is going to be some kind of POS system in front right? And that may even be an online business where they are then able to just enter their coupon code vs. a register where the cashier scans a barcode right (I'm assuming that's what we're dealing with here)? And so now, as the vendor, you're saying that if you have a valid coupon code I'm going to give you some kind of discount and because our goal was to generate coupon codes that were reversable we don't need a database to verify that code, we can just reverse it right! I mean it's just math right? Well, yes and no.

Yes, you're right, it's just math. In fact, that's also the problem because so is cracking SSL. But, I'm going to assume that we all realize the math used in SSL is just a bit more complex than anything used here and the key is substantially larger.

It does not behoove you, nor is it wise for you to try and come up with some kind of scheme that you're just sure nobody cares enough to break, especially when it comes to money. You are making your life very difficult trying to solve a problem you really shouldn't be trying to solve because you need to be protecting yourself from those using the coupon codes.

Therefore, this problem is unnecessarily complicated and could be solved like this.

// insert a record into the database for the coupon
// thus generating an auto-incrementing key
var id = [some code to insert into database and get back the key]

// base64 encode the resulting key value
var couponCode = Convert.ToBase64String(id);

// truncate the coupon code if you like

// update the database with the coupon code
  1. Create a coupon table that has an auto-incrementing key.
  2. Insert into that table and get the auto-incrementing key back.
  3. Base64 encode that id into a coupon code.
  4. Truncate that string if you want.
  5. Store that string back in the database with the coupon just inserted.