Why are string literals l-value while all other literals are r-value?

A string literal is a literal with array type, and in C there is no way for an array type to exist in an expression except as an lvalue. String literals could have been specified to have pointer type (rather than array type that usually decays to a pointer) pointing to the string "contents", but this would make them rather less useful; in particular, the sizeof operator could not be applied to them.

Note that C99 introduced compound literals, which are also lvalues, so having a literal be an lvalue is no longer a special exception; it's closer to being the norm.

String literals are arrays - objects of inherently unpredictable size (i.e of user-defined and possibly large size). In general case, there's simply no other way to represent such literals except as objects in memory, i.e. as lvalues. In C99 this also applies to compound literals, which are also lvalues.

Any attempts to artificially hide the fact that string literals are lvalues at the language level would produce a considerable number of completely unnecessary difficulties, since the ability to point to a string literal with a pointer as well as the ability to access it as an array relies critically on its lvalue-ness being visible at the language level.

Meanwhile, literals of scalar types have fixed compile-time size. At the same time, such literals are very likely to be embedded directly into the machine commands on the given hardware architecture. For example, when you write something like i = i * 5 + 2, the literal values 5 and 2 become explicit (or even implicit) parts of the generated machine code. They don't exist and don't need to exist as standalone locations in data storage. There's simply no point in storing values 5 and 2 in the data memory.

It is also worth noting that on many (if not most, or all) hardware architectures floating-point literals are actually implemented as "hidden" lvalues (even though the language does not expose them as such). On platforms like x86 machine commands from floating-point group do not support embedded immediate operands. This means that virtually every floating-point literal has to be stored in (and read from) data memory by the compiler. E.g. when you write something like i = i * 5.5 + 2.1 it is translated into something like

const double unnamed_double_5_5 = 5.5;
const double unnamed_double_2_1 = 2.1;
i = i * unnamed_double_5_5 + unnamed_double_2_1;

In other words, floating-point literals often end up becoming "unofficial" lvalues internally. However, it makes perfect sense that language specification did not make any attempts to expose this implementation detail. At language level, arithmetic literals make more sense as rvalues.

I'd guess that the original motive was mainly a pragmatic one: a string literal must reside in memory and have an address. The type of a string literal is an array type (char[] in C, char const[] in C++), and array types convert to pointers in most contexts. The language could have found other ways to define this (e.g. a string literal could have pointer type to begin with, with special rules concerning what it pointed to), but just making the literal an lvalue is probably the easiest way of defining what is concretely needed.

An lvalue in C++ does not always refer to an object. It can refer to a function too. Moreover, objects do not have to be referred to by lvalues. They may be referred to by rvalues, including for arrays (in C++ and C). However, in old C89, the array to pointer conversion did not apply for rvalues arrays.

Now, an rvalue denotes no, limited or soon to be an expired lifetime. A string literal, however, lives for the entire program.

So string literals being lvalues is exactly right.

Why are string literals l-value while all other literals are r-value?

Related

Recent Posts