How are C data types “supported directly by most computers”?

I am reading K&R's “The C Programming Language” and came across this statement [Introduction, p. 3]:

Because the data types and control structures provided by C are supported directly by most computers, the run-time library required to implement self-contained programs is tiny.

What does "supported directly by most computers" mean here? Is there an example of a data type or a control structure that isn't supported directly by a computer?


Yes, there are data types not directly supported.

On many embedded systems, there is no hardware floating point unit. So, when you write code like this:

float x = 1.0f, y = 2.0f;
return x + y;

It gets translated into something like this:

unsigned x = 0x3f800000, y = 0x40000000;
return _float_add(x, y);

Then the compiler or standard library has to supply an implementation of _float_add(), which takes up memory on your embedded system. If you're counting bytes on a really tiny system, this can add up.

Another common example is 64-bit integers (long long in the C standard since 1999), which are not directly supported by 32-bit systems. Old SPARC systems didn't support integer multiplication, so multiplication had to be supplied by the runtime. There are other examples.
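
To make that concrete, here is a rough sketch of the 64-bit case on a 32-bit target (mul64 and mul64_lowered are invented names for illustration; __muldi3 is a real helper routine from GCC's runtime library, libgcc):

/* What you write: a 64-bit multiply on a 32-bit machine. */
long long mul64(long long a, long long b)
{
    return a * b;               /* no single 32-bit multiply instruction can do this */
}

/* Roughly what the compiler emits instead: a call into its runtime library. */
extern long long __muldi3(long long, long long);   /* supplied by libgcc */

long long mul64_lowered(long long a, long long b)
{
    return __muldi3(a, b);      /* helper builds the 64-bit result from 32-bit pieces */
}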

Other languages

By comparison, other languages have more complicated primitives.

For example, a Lisp symbol requires a lot of runtime support, just like tables in Lua, strings in Python, arrays in Fortran, et cetera. The equivalent types in C are usually either not part of the standard library at all (there are no standard symbols or tables) or much simpler, needing little runtime support (a C array is little more than a block of memory whose name decays to a pointer, and nul-terminated strings are almost as simple).
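
A quick illustration of how thin that support is (plain C, nothing library-specific; demo is just a throwaway name):

void demo(void)
{
    char greeting[] = {'h', 'i', '\0'};   /* a C "string" is only a nul-terminated char array */
    char *p = greeting;                   /* in most expressions the array name decays to a pointer */
    char first = p[0];                    /* p[0] is defined as *(p + 0): plain pointer arithmetic */
    (void)first;                          /* silence unused-variable warnings */
}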

Control structures

A notable control structure missing from C is exception handling. Nonlocal exit is limited to setjmp() and longjmp(), which just save and restore certain parts of processor state. By comparison, the C++ runtime has to walk the stack and call destructors and exception handlers.
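
For concreteness, here is a minimal sketch of that kind of nonlocal exit; note that nothing walks the stack or runs any cleanup on the way out:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

static void fail(void)
{
    longjmp(env, 1);            /* restore the saved register/stack-pointer state */
}

int main(void)
{
    if (setjmp(env) == 0) {     /* saves a little processor state; returns 0 the first time */
        fail();
    } else {
        puts("recovered");      /* reached via longjmp; no destructors, no unwinding */
    }
    return 0;
}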


Actually, I'll bet that the contents of this introduction haven't changed much since 1978, when Kernighan and Ritchie first wrote them for the First Edition of the book, and that they refer to the history and evolution of C at that time more than to modern implementations.

Computers are fundamentally just memory banks and central processors, and each processor executes machine code, which is all numbers. Part of the design of each processor is its instruction set architecture; an assembly language maps a set of human-readable mnemonics more or less one-to-one onto that machine code.

The authors of the C language – and the B and BCPL languages that immediately preceded it – were intent upon defining constructs in the language that compiled into Assembly as efficiently as possible ... in fact, they were forced to by limitations in the target hardware. As other answers have pointed out, this involved branches (goto and the other flow control in C), moves (assignment), bitwise operations (& | ^), basic arithmetic (add, subtract, increment, decrement), and memory addressing (pointers). A good example is the pre-/post-increment and decrement operators in C, which Ken Thompson reportedly added to the B language specifically because they could translate directly into a single opcode once compiled.
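
As a small illustration (the exact output depends on the compiler and the target), an increment maps about as directly onto the hardware as a high-level construct can:

/* On x86 a compiler can turn this into a single instruction such as
   "incl counter" or "addl $1, counter"; on a load/store RISC machine it
   becomes a short load/add/store sequence. */
int counter;

void tick(void)
{
    counter++;
}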

This is what the authors meant when they said "supported directly by most computers". They didn't mean that other languages contained types and structures that were not supported directly - they meant that by design C constructs translated most directly (sometimes literally directly) into Assembly.

This close relation to the underlying Assembly, while still providing all the elements required for structured programming, is what led to C's early adoption, and it is what keeps C popular today in environments where the efficiency of the compiled code is still key.

For an interesting write-up of the history of the language, see The Development of the C Language - Dennis Ritchie


The short answer is: most of the language constructs supported by C are also supported by the target computer's microprocessor, so compiled C code translates very cleanly and efficiently into the microprocessor's assembly language, resulting in smaller code and a smaller footprint.

The longer answer requires a little bit of assembly language knowledge. In C, a statement such as this:

int myInt = 10;

would translate to something like this in assembly (MASM-style syntax):

myInt   dw ?            ; reserve one word of storage for myInt
        mov myInt, 10   ; store the value 10 into it

Compare this to something like C++:

MyClass myClass;
myClass.set_myInt(10);

The resulting assembly language code (depending on how much MyClass and its member functions do) could add up to hundreds of assembly language lines.

Short of writing programs directly in assembly language, pure C is probably the "skinniest" and "tightest" language you can write a program in.

EDIT

Given the comments on my answer, I decided to run a test, just for my own sanity. I created a program called "test.c", which looked like this:

#include <stdio.h>

void main()
{
    int myInt=10;

    printf("%d\n", myInt);
}

I compiled this down to assembly using gcc. I used the following command line to compile it:

gcc -S -O2 test.c

Here is the resulting assembly language:

    .file   "test.c"
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC0:
    .string "%d\n"
    .section    .text.unlikely,"ax",@progbits
.LCOLDB1:
    .section    .text.startup,"ax",@progbits
.LHOTB1:
    .p2align 4,,15
    .globl  main
    .type   main, @function
main:
.LFB24:
    .cfi_startproc
    movl    $10, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    jmp __printf_chk
    .cfi_endproc
.LFE24:
    .size   main, .-main
    .section    .text.unlikely
.LCOLDE1:
    .section    .text.startup
.LHOTE1:
    .ident  "GCC: (Ubuntu 4.9.1-16ubuntu6) 4.9.1"
    .section    .note.GNU-stack,"",@progbits

I then created a file called "test.cpp" which defined a class and printed the same thing as "test.c":

#include <iostream>
using namespace std;

class MyClass {
    int myVar;
public:
    void set_myVar(int);
    int get_myVar(void);
};

void MyClass::set_myVar(int val)
{
    myVar = val;
}

int MyClass::get_myVar(void)
{
    return myVar;
}

int main()
{
    MyClass myClass;
    myClass.set_myVar(10);

    cout << myClass.get_myVar() << endl;

    return 0;
}

I compiled it the same way, using this command:

g++ -O2 -S test.cpp

Here is the resulting assembly file:

    .file   "test.cpp"
    .section    .text.unlikely,"ax",@progbits
    .align 2
.LCOLDB0:
    .text
.LHOTB0:
    .align 2
    .p2align 4,,15
    .globl  _ZN7MyClass9set_myVarEi
    .type   _ZN7MyClass9set_myVarEi, @function
_ZN7MyClass9set_myVarEi:
.LFB1047:
    .cfi_startproc
    movl    %esi, (%rdi)
    ret
    .cfi_endproc
.LFE1047:
    .size   _ZN7MyClass9set_myVarEi, .-_ZN7MyClass9set_myVarEi
    .section    .text.unlikely
.LCOLDE0:
    .text
.LHOTE0:
    .section    .text.unlikely
    .align 2
.LCOLDB1:
    .text
.LHOTB1:
    .align 2
    .p2align 4,,15
    .globl  _ZN7MyClass9get_myVarEv
    .type   _ZN7MyClass9get_myVarEv, @function
_ZN7MyClass9get_myVarEv:
.LFB1048:
    .cfi_startproc
    movl    (%rdi), %eax
    ret
    .cfi_endproc
.LFE1048:
    .size   _ZN7MyClass9get_myVarEv, .-_ZN7MyClass9get_myVarEv
    .section    .text.unlikely
.LCOLDE1:
    .text
.LHOTE1:
    .section    .text.unlikely
.LCOLDB2:
    .section    .text.startup,"ax",@progbits
.LHOTB2:
    .p2align 4,,15
    .globl  main
    .type   main, @function
main:
.LFB1049:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $10, %esi
    movl    $_ZSt4cout, %edi
    call    _ZNSolsEi
    movq    %rax, %rdi
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
    xorl    %eax, %eax
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
.LFE1049:
    .size   main, .-main
    .section    .text.unlikely
.LCOLDE2:
    .section    .text.startup
.LHOTE2:
    .section    .text.unlikely
.LCOLDB3:
    .section    .text.startup
.LHOTB3:
    .p2align 4,,15
    .type   _GLOBAL__sub_I__ZN7MyClass9set_myVarEi, @function
_GLOBAL__sub_I__ZN7MyClass9set_myVarEi:
.LFB1056:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $_ZStL8__ioinit, %edi
    call    _ZNSt8ios_base4InitC1Ev
    movl    $__dso_handle, %edx
    movl    $_ZStL8__ioinit, %esi
    movl    $_ZNSt8ios_base4InitD1Ev, %edi
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    jmp __cxa_atexit
    .cfi_endproc
.LFE1056:
    .size   _GLOBAL__sub_I__ZN7MyClass9set_myVarEi, .-_GLOBAL__sub_I__ZN7MyClass9set_myVarEi
    .section    .text.unlikely
.LCOLDE3:
    .section    .text.startup
.LHOTE3:
    .section    .init_array,"aw"
    .align 8
    .quad   _GLOBAL__sub_I__ZN7MyClass9set_myVarEi
    .local  _ZStL8__ioinit
    .comm   _ZStL8__ioinit,1,1
    .hidden __dso_handle
    .ident  "GCC: (Ubuntu 4.9.1-16ubuntu6) 4.9.1"
    .section    .note.GNU-stack,"",@progbits

As you can clearly see, the resulting assembly file is much larger for the C++ program than for the C program. Even if you cut out all the other stuff and just compare the C "main" to the C++ "main", there is a lot of extra code.


K&R mean that most C expressions (in the language's technical sense of the term) map to one or a few machine instructions, not to a call into a support library. The usual exceptions are integer division on architectures without a hardware divide instruction, and floating point on machines with no FPU.

There's a quote:

C combines the flexibility and power of assembly language with the user-friendliness of assembly language.

(found here. I thought I remembered a different variation, like "speed of assembly language with the convenience and expressivity of assembly language".)

long int is usually the same width as the native machine registers.

Some higher level languages define the exact width of their data types, and implementations on all machines must work the same. Not C, though.

If you want to work with 128-bit ints on x86-64, or in the general case a BigInteger of arbitrary size, you need a library of functions for it. All CPUs now use two's complement as the binary representation of negative integers, but even that wasn't the case back when C was designed. (That's why some things that would give different results on non-two's-complement machines are technically undefined in the C standards.)
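
As a minimal sketch of what "wider than a register" means in practice, here are two 128-bit values added as a pair of 64-bit limbs with a manual carry (the u128 type and add128 function are invented for illustration; a bignum library just does much more of this):

#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;   /* illustrative type, not standard C */

u128 add128(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);     /* carry out of the low half */
    return r;
}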

C pointers to data or to functions work the same way as assembly addresses.

If you want ref-counted references, you have to do it yourself. If you want C++-style virtual member functions that call a different function depending on what kind of object your pointer is pointing to, the C++ compiler has to generate a lot more than just a call instruction with a fixed address.
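
Here is a rough sketch, in plain C with invented names, of the function-pointer table a C++ compiler builds behind the scenes for such a virtual call:

struct shape;                                   /* hypothetical example type */

struct shape_vtable {
    double (*area)(const struct shape *self);   /* one slot per virtual function */
};

struct shape {
    const struct shape_vtable *vt;              /* hidden pointer C++ adds to each object */
};

double shape_area(const struct shape *s)
{
    return s->vt->area(s);   /* load table pointer, load function pointer, indirect call */
}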

Strings are just arrays

Outside of library functions, the only string operations provided are read/write a character. No concat, no substring, no search. (Strings are stored as nul-terminated ('\0') arrays of 8-bit integers, not as pointer+length, so to get a substring you'd have to write a nul into the original string.)

CPUs sometimes have instructions designed for use by a string-search function, but they still usually process one byte per instruction executed, in a loop (or with the x86 rep prefix; maybe if C had been designed on x86, string search or compare would be a native operation rather than a library function call).
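
For example, even finding the length of a string is a byte-at-a-time loop; the sketch below (with the non-standard name my_strlen) is essentially what a simple strlen boils down to:

#include <stddef.h>

size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')      /* read one character per iteration */
        p++;
    return (size_t)(p - s);
}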

Many other answers give examples of things that aren't natively supported, like exception handling, hash tables, lists. K&R's design philosophy is the reason C doesn't have any of these natively.


The assembly language of a processor generally deals with jump (goto) statements, move statements, binary arithmetic and logic (XOR, NAND, AND, OR, etc.), and memory addressing. It categorizes memory into two types, instructions and data. That is about all an assembly language is (I am sure assembly programmers will argue there is more to it than that, but it boils down to this in general). C closely resembles this simplicity.

C is to assembly what algebra is to arithmetic.

"C encapsulates the basics of assembly (the processor's language)" is probably a truer statement than "Because the data types and control structures provided by C are supported directly by most computers".