What kind of C11 data type is an array according to the AMD64 ABI

I was researching the calling convention of x86_64 that's used on OSX and was reading the section called "Aggregates and Unions" in the System V x86-64 ABI standard. It mentions arrays, and I figured that meant a fixed-length C array, e.g. int[5].

I went down to "3.2.3 Parameter Passing" to read about how arrays were passed and if I'm understanding correctly, something like uint8_t[3] should be passed in registers as it's smaller than the four eightbyte limit imposed by rule 1 of the classification of aggregate types (page 18 near the bottom).

After compiling I see that instead it's being passed as a pointer. (I'm compiling with clang-703.0.31 from Xcode 7.3.1 on OSX 10.11.6).

The example source I was using to compile is as follows:

#include <stdio.h>

#define type char

extern void doit(const type[3]);
extern void doitt(const type[5]);
extern void doittt(const type[16]);
extern void doitttt(const type[32]);
extern void doittttt(const type[40]);

int main(int argc, const char *argv[]) {
  const char a[3] = { 1, 2, 3 };
  const char b[5] = { 1, 2, 3, 4, 5 };
  const char c[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 1, 1, 1 };
  const char d[32] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 1, 1, 1 };
  const char e[40] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };

  doit(a);
  doitt(b);
  doittt(c);
  doitttt(d);
  doittttt(e);
}

I dump that in a file named a.c and use the following command to compile: clang -c a.c -o a.o. I use otool to analyze the assembly generated (by running otool -tV a.o) and get the following output:

a.o:
(__TEXT,__text) section
_main:
0000000000000000    pushq   %rbp
0000000000000001    movq    %rsp, %rbp
0000000000000004    subq    $0x10, %rsp
0000000000000008    leaq    _main.a(%rip), %rax
000000000000000f    movl    %edi, -0x4(%rbp)
0000000000000012    movq    %rsi, -0x10(%rbp)
0000000000000016    movq    %rax, %rdi
0000000000000019    callq   _doit
000000000000001e    leaq    _main.b(%rip), %rdi
0000000000000025    callq   _doitt
000000000000002a    leaq    _main.c(%rip), %rdi
0000000000000031    callq   _doittt
0000000000000036    leaq    _main.d(%rip), %rdi
000000000000003d    callq   _doitttt
0000000000000042    leaq    _main.e(%rip), %rdi
0000000000000049    callq   _doittttt
000000000000004e    xorl    %eax, %eax
0000000000000050    addq    $0x10, %rsp
0000000000000054    popq    %rbp
0000000000000055    retq

Or equivalently, here it is on the Godbolt compiler explorer with clang3.7, which targets Linux which uses the same ABI.

So, I was wondering if anyone could lead me to what data types in C11 apply to arrays. (It looks like clang defaults to using C11 - see the blurb here right under C99 inline function).

I also did a similar investigation with ARM and found similar results, even though the ARM standard also specifies there exists an array aggregate type.

Also, is there somewhere in some standard that it's specified that a fixed length array is to be treated as a pointer?

Solution 1:

Bare arrays as function args in C and C++ always decay to pointers, just like in several other contexts.

Arrays inside structs or unions don't, and are passed by value. This is why ABIs need to care about how they're passed, even though it doesn't happen in C for bare arrays.

As Keith Thompson points out, the relevant part of the C standard is N1570 section 6.7.6.3 paragraph 7:

A declaration of a parameter as "array of type" shall be adjusted to "qualified pointer to type", where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation ... (stuff about foo[static 10], see below)

Note that multidimensional arrays work as arrays of array type, so only the outer-most level of "array-ness" is converted to a pointer to array type.
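As a minimal sketch of that adjustment, the following uses C11 _Generic to ask the compiler what type each parameter really has. The function names are hypothetical and only for illustration; they are not from the question.

```c
#include <assert.h>

/* Declared as an array of 3, but the parameter type is adjusted to `const char *`. */
int param_is_pointer(const char arr[3]) {
    return _Generic(arr, const char *: 1, default: 0);
}

/* Only the outermost dimension is adjusted: `m` has type `int (*)[4]`. */
int param_is_pointer_to_row(int m[3][4]) {
    return _Generic(m, int (*)[4]: 1, default: 0);
}

/* Hypothetical demo: in the caller, `a` is still a real 3-byte array object. */
void demo_decay(void) {
    const char a[3] = { 1, 2, 3 };
    int m[3][4] = { { 0 } };

    assert(sizeof a == 3);
    assert(param_is_pointer(a) == 1);
    assert(param_is_pointer_to_row(m) == 1);
}
```

Calling demo_decay() from main should pass all three assertions, since the array size written in the parameter declaration plays no part in the parameter's type.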

Terminology: The x86-64 ABI doc uses the same terminology as ARM, where structs and arrays are "aggregates" (multiple elements at sequential addresses). So the phrase "aggregates and unions" comes up a lot, because unions are handled similarly by the language and the ABI.

It's the recursive rule for handling composite types (struct/union/class) that brings the array-passing rules in the ABI into play. This is the only way you'll see asm that copies an array to the stack as part of a function arg, for C or C++:

struct s { int a[8]; };
void ext(struct s byval);

void foo() { struct s tmp = {{0}}; ext(tmp); }

gcc6.1 compiles it (for the AMD64 SysV ABI, with -O3) to the following:

    sub     rsp, 40    # align the stack and leave room for `tmp` even though it's never stored?
    push    0
    push    0
    push    0
    push    0
    call    ext
    add     rsp, 72
    ret

In the x86-64 ABI, pass-by-value happens by actual copying (into registers or the stack), not by hidden pointers.
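The copy is observable from C itself. Here is a small sketch, reusing the struct s layout from above with hypothetical function names, in which the callee overwrites its parameter and the caller's array is left untouched:

```c
#include <assert.h>

struct s { int a[8]; };          /* same shape as the struct above */

/* Receives its own copy of the struct, array included. */
int clobber_and_sum(struct s byval) {
    int sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += byval.a[i];
        byval.a[i] = -1;         /* modifies only the callee's copy */
    }
    return sum;
}

/* Hypothetical demo: the caller's array is unchanged after the call. */
void demo_byval(void) {
    struct s tmp = { { 1, 2, 3, 4, 5, 6, 7, 8 } };

    assert(clobber_and_sum(tmp) == 36);
    assert(tmp.a[0] == 1 && tmp.a[7] == 8);
}
```

This only demonstrates the language-level semantics; whether the 32 bytes travel on the stack or in registers is decided by the ABI classification rules, not by anything visible in the source.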

Note that return-by-value does pass a pointer as a "hidden" first arg (in rdi), when the return value is too large to fit in the 128bit concatenation of rdx:rax (and isn't a vector being returned in vector regs, etc. etc.)

It would be possible for the ABI to use a hidden pointer to pass-by-value objects above a certain size, and trust the called function not to modify the original, but that's not what the x86-64 ABI chooses to do. That would be better in some cases (especially for inefficient C++ where many copies are never modified, so the copying is wasted), but worse in other cases.

SysV ABI bonus reading: As the x86 tag wiki points out, the current version of the ABI standard doesn't fully document the behaviour that compilers rely on: clang/gcc sign/zero extend narrow args to 32bit.

Note that to promise the compiler that an array arg points to at least a given number of elements, C99 and later lets you use the static keyword in a new way: on array sizes. (It's still passed as a pointer, of course. This doesn't change the ABI).

void bar(int arr[static 10]);

This allows compiler warnings about going out of bounds. It also potentially enables better optimization if the compiler knows it's allowed to access elements that the C source doesn't. (See this blog post). However, the arg still has type int*, not an actual array, so sizeof(arr) == sizeof(int*).
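A short sketch of how this looks in use, with hypothetical function names. The type check uses C11 _Generic rather than sizeof, because some compilers warn about sizeof applied to an array parameter:

```c
#include <assert.h>

/* `static 10` promises that callers pass a pointer to at least 10 valid ints. */
int sum10(const int arr[static 10]) {
    int sum = 0;
    for (int i = 0; i < 10; i++)
        sum += arr[i];
    return sum;
}

/* The parameter is still adjusted to a plain pointer. */
int still_a_pointer(int arr[static 10]) {
    return _Generic(arr, int *: 1, default: 0);
}

/* Hypothetical demo, not from the original answer. */
void demo_static(void) {
    int v[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    assert(sum10(v) == 55);
    assert(still_a_pointer(v) == 1);
}
```

Passing an array with fewer than 10 elements to sum10 would be undefined behaviour, and is the kind of mistake a compiler may be able to diagnose thanks to the static annotation.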

The same keyword page for C++ indicates that ISO C++ does not support this usage of static; it's another one of those C-only features, along with C99 variable-length-arrays and a few other goodies that C++ doesn't have.

In C++, you can use std::array<int,10> to get compile-time size information passed to the callee. However, you have to manually pass it by reference if that's what you want, since it's of course just a class containing an int arr[10]. Unlike a C-style array, it doesn't decay to T* automatically.

The ARM doc that you linked doesn't seem to actually call arrays an aggregate type: Section 4.3 Composite Types (which discusses alignment) distinguishes arrays from aggregate types, even though they appear to be a special case of its definition for aggregates.

A Composite Type is a collection of one or more Fundamental Data Types that are handled as a single entity at the procedure call level. A Composite Type can be any of:

An aggregate, where the members are laid out sequentially in memory

A union, where each of the members has the same address

An array, which is a repeated sequence of some other type (its base type).

The definitions are recursive; that is, each of the types may contain a Composite Type as a member

"Composite" is an umbrella term that includes arrays, structs, and unions.
