I am refactoring some old code and have found a few structs containing zero-length arrays (below). The warnings are suppressed by a pragma, of course, but I have failed to create, with "new", structures that contain such structures (error 2233). The array 'byData' is used as a pointer, so why not use a pointer instead? Or an array of length 1? And of course, no comments were added to make me enjoy the process... Is there any reason to use such a thing? Any advice on refactoring these?

struct someData
{
   int nData;
   BYTE byData[0];
};

NB: It's C++, Windows XP, VS 2003.


Yes, this is a C hack.
To create an array of any length:

struct someData* mallocSomeData(int size)
{
    /* One block holds the header followed by size bytes of data. */
    struct someData* result = (struct someData*)malloc(sizeof(struct someData) + size * sizeof(BYTE));
    if (result)
    {
        result->nData = size;
    }
    return result;
}

Now you have a someData object with an array of the specified length.


There are, unfortunately, several reasons why you would declare a zero length array at the end of a structure. It essentially gives you the ability to have a variable length structure returned from an API.

Raymond Chen did an excellent blog post on the subject. I suggest you take a look at this post because it likely contains the answer you want.

Note that his post deals with arrays of size 1 instead of 0. That is because zero-length arrays are not part of standard C or C++; compilers accept them only as an extension. His post should still apply to your problem.

http://blogs.msdn.com/oldnewthing/archive/2004/08/26/220873.aspx

EDIT

Note: Even though Raymond's post says 0-length arrays are legal in C99, they are in fact still not legal in C99. What C99 added is the flexible array member, declared with empty brackets rather than [0]. Instead of a 0-length array, you should use a length-1 array here.


This is an old C hack to allow flexible-sized arrays.

In the C99 standard this is not necessary, as it supports the arr[] syntax (a flexible array member).
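For reference, here is a minimal sketch of what the C99 arr[] form might look like. Note that this is C only; flexible array members are not standard C++, so it may not help directly with VS 2003. The names some_data_fam and some_data_fam_create are made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: the last member is declared with []
   and contributes no elements to sizeof(struct some_data_fam). */
struct some_data_fam {
    int nData;
    unsigned char byData[];
};

/* Hypothetical helper: allocates the header plus `size` payload bytes.
   Copies from src when src is non-NULL, otherwise zero-fills.
   Returns NULL on a negative size or allocation failure.
   The caller releases the result with free(). */
struct some_data_fam *some_data_fam_create(const unsigned char *src, int size)
{
    struct some_data_fam *p;

    if (size < 0)
        return NULL;

    p = malloc(sizeof *p + (size_t)size);
    if (p == NULL)
        return NULL;

    p->nData = size;
    if (size > 0) {
        if (src != NULL)
            memcpy(p->byData, src, (size_t)size);
        else
            memset(p->byData, 0, (size_t)size);
    }
    return p;
}
```

With this form, sizeof *p is already the size of the fixed part, so no pragma or size-1 placeholder is needed.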


Your intuition about "why not use an array of size 1" is spot on.

The code is doing the "C struct hack" wrong, because declarations of zero length arrays are a constraint violation. This means that a compiler can reject your hack right off the bat at compile time with a diagnostic message that stops the translation.

If we want to perpetrate a hack, we must sneak it past the compiler.

The right way to do the "C struct hack" (which is compatible with C dialects going back to 1989 ANSI C, and probably much earlier) is to use a perfectly valid array of size 1:

struct someData
{
   int nData;
   unsigned char byData[1];
};

Moreover, instead of sizeof(struct someData), the size of the part before byData is calculated using:

offsetof(struct someData, byData);

To allocate a struct someData with space for 42 bytes in byData, we would then use:

struct someData *psd = (struct someData *) malloc(offsetof(struct someData, byData) + 42);

Note that this offsetof calculation is in fact the correct calculation even in the case of the array size being zero. You see, sizeof the whole structure can include padding. For instance, if we have something like this:

struct hack {
  unsigned long ul;
  char c;
  char foo[0]; /* assuming our compiler accepts this nonsense */
};

The size of struct hack is quite possibly padded for alignment because of the ul member. If unsigned long is four bytes wide, then quite possibly sizeof (struct hack) is 8, whereas offsetof(struct hack, foo) is almost certainly 5. The offsetof method is the way to get the accurate size of the preceding part of the struct just before the array.
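To see the difference on your own compiler, something like the following sketch could be used. It declares foo with size 1 so that it compiles everywhere; the helper names are invented for this example, and the exact numbers depend on the platform's alignment rules.

```c
#include <stddef.h>

struct hack {
    unsigned long ul;
    char c;
    char foo[1];   /* size 1 so the declaration is valid everywhere */
};

/* Bytes occupied by the fixed part, i.e. everything before foo. */
size_t hack_header_size(void)
{
    return offsetof(struct hack, foo);
}

/* Bytes needed for the fixed part plus n payload chars. */
size_t hack_alloc_size(size_t n)
{
    return offsetof(struct hack, foo) + n;
}

/* Bytes that would be over-requested if sizeof(struct hack) were
   used as the header size: the placeholder element plus any
   trailing padding. */
size_t hack_sizeof_overhead(void)
{
    return sizeof(struct hack) - offsetof(struct hack, foo);
}
```

Printing hack_header_size() next to sizeof(struct hack) should show the padding described above.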

So that would be the way to refactor the code: make it conform to the classic, highly portable struct hack.
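Put together, a refactored version might look roughly like this. It is only a sketch: someData_alloc is a hypothetical helper name, unsigned char stands in for BYTE, and rounding the request up to at least sizeof(struct someData) is a conservative choice of mine, not something the hack requires.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct someData {
    int nData;
    unsigned char byData[1];   /* really nData bytes */
};

/* Hypothetical helper: allocates a someData with room for `size`
   bytes in byData, zero-filled. Returns NULL on a negative size
   or allocation failure. The caller releases it with free(). */
struct someData *someData_alloc(int size)
{
    struct someData *p;
    size_t bytes;

    if (size < 0)
        return NULL;

    /* Fixed part plus payload, measured with offsetof. */
    bytes = offsetof(struct someData, byData) + (size_t)size;

    /* Conservative assumption: never request less than the
       declared struct, so tiny payloads still cover the type. */
    if (bytes < sizeof(struct someData))
        bytes = sizeof(struct someData);

    p = (struct someData *)malloc(bytes);
    if (p == NULL)
        return NULL;

    p->nData = size;
    memset(p->byData, 0, (size_t)size);
    return p;
}
```

The cast on malloc is kept so the same code also compiles as C++, which is what the question is using.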

Why not use a pointer? Because a pointer occupies extra space and has to be initialized.

There are other good reasons not to use a pointer, namely that a pointer requires an address space in order to be meaningful. The struct hack is externalizable: that is to say, there are situations in which such a layout conforms to external storage such as areas of files, packets or shared memory, in which you do not want pointers because they are not meaningful.

Several years ago, I used the struct hack in a shared memory message passing interface between kernel and user space. I didn't want pointers there, because they would have been meaningful only to the original address space of the process generating a message. The kernel part of the software had a view to the memory using its own mapping at a different address, and so everything was based on offset calculations.
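This is not the code from that project, just a small sketch of the idea: a message laid out with the struct hack can be copied byte for byte to another address (or mapped elsewhere) and still be read, because nothing in it is a pointer. The struct and function names here are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

struct message {
    unsigned int length;        /* number of payload bytes */
    unsigned char payload[1];   /* really `length` bytes */
};

/* Writes header + payload into buf. Returns the number of bytes
   used, or 0 if the message does not fit. */
size_t message_write(void *buf, size_t buf_size,
                     const void *data, unsigned int length)
{
    size_t header = offsetof(struct message, payload);
    unsigned char *out = (unsigned char *)buf;

    if (buf_size < header || length > buf_size - header)
        return 0;

    /* memcpy avoids assuming buf is suitably aligned. */
    memcpy(out + offsetof(struct message, length), &length, sizeof length);
    memcpy(out + header, data, length);
    return header + length;
}

/* Reads the payload length from a raw buffer at any address. */
unsigned int message_length(const void *buf)
{
    unsigned int length;
    memcpy(&length,
           (const unsigned char *)buf + offsetof(struct message, length),
           sizeof length);
    return length;
}

/* Locates the payload purely by offset from the buffer start. */
const unsigned char *message_payload(const void *buf)
{
    return (const unsigned char *)buf + offsetof(struct message, payload);
}
```

Had the struct held a pointer to the payload instead, the copy at the second address would still point into the first one.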


It's worth pointing out what is, IMO, the best way to do the size calculation, which is used in the Raymond Chen article linked above.

struct foo
{
    size_t count;
    int data[1];
};

size_t foo_size_from_count(size_t count)
{
    return offsetof(foo, data[count]);
}

The offset of the first entry off the end of the desired allocation is also the size of the desired allocation. IMO it's an extremely elegant way of doing the size calculation. It does not matter what the element type of the variable-size array is. The offsetof (or FIELD_OFFSET or UFIELD_OFFSET in Windows) is always written the same way. No sizeof() expressions to accidentally mess up.
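As a sanity check, the indexed offsetof can be compared against the spelled-out arithmetic. This is a sketch in plain C (hence struct foo), and foo_size_explicit is a name I made up. One caveat: as far as I know, a run-time index inside offsetof is accepted by the common compilers but is not strictly guaranteed by the standard, so treat that part as an assumption.

```c
#include <stddef.h>

struct foo {
    size_t count;
    int data[1];
};

/* Size of a struct foo holding `count` ints: the offset of the
   first element past the end. Assumes the compiler accepts a
   run-time index inside offsetof. */
size_t foo_size_from_count(size_t count)
{
    return offsetof(struct foo, data[count]);
}

/* The same size written out by hand, for comparison. */
size_t foo_size_explicit(size_t count)
{
    return offsetof(struct foo, data) + count * sizeof(int);
}
```

Both functions should return the same value for any count; the first one just never mentions the element type.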