3.5 Declarations

The Committee decided that empty declarations are invalid (except for a special case with tags, see §3.5.2.3, and the case of enumerations such as enum {zero,one};, see §3.5.2.2). While many seemingly silly constructs are tolerated in other parts of the language in the interest of facilitating the machine generation of C, empty declarations were considered sufficiently easy to avoid.

The practice of placing the storage class specifier other than first in a declaration has been branded as obsolescent. (See §3.9.3.) The Committee feels it desirable to rule out such constructs as

        enum { aaa, aab,
           /* etc */
        zzy, zzz } typedef a2z;

in some future standard.

3.5.1 Storage-class specifiers

Because the address of a register variable cannot be taken, objects of storage class register effectively exist in a space distinct from other objects. (Functions occupy yet a third address space). This makes them candidates for optimal placement, the usual reason for declaring registers, but it also makes them candidates for more aggressive optimization.

The practice of representing register variables as wider types (as when register char is quietly changed to register int) is no longer acceptable.

3.5.2 Type specifiers

Several new type specifiers have been added: signed, enum, and void. long float has been retired and long double has been added, along with a plethora of integer types. The Committee's reasons for each of these additions, and the one deletion, are given in section §3.1.2.5 of this document.

3.5.2.1 Structure and union specifiers

Three types of bit fields are now defined: ``plain'' int calls for implementation-defined signedness (as in the Base Document), signed int calls for assuredly signed fields, and unsigned int calls for unsigned fields. The old constraints on bit fields crossing word boundaries have been relaxed, since so many properties of bit fields are implementation dependent anyway.

The layout of structures is determined only to a limited extent:

no hole may occur at the beginning;
members occupy increasing storage addresses; and
if necessary, a hole is placed on the end to make the structure big enough to pack tightly into arrays and maintain proper alignment.

Since some existing implementations, in the interest of enhanced access time, leave internal holes larger than absolutely necessary, it is not clear that a portable deterministic method can be given for traversing a structure field by field.

To clarify what is meant by the notion that ``all the fields of a union occupy the same storage,'' the Standard specifies that a pointer to a union, when suitably cast, points to each member (or, in the case of a bit-field member, to the storage unit containing the bit field).

3.5.2.2 Enumeration specifiers

3.5.2.3 Tags

As with all block structured languages that also permit forward references, C has a problem with structure and union tags. If one wants to declare, within a block, two mutually referencing structures, one must write something like:

        struct x { struct y *p; /*...*/ };
        struct y { struct x *q; /*...*/ };

But if struct y is already defined in a containing block, the first field of struct x will refer to the older declaration.

Thus special semantics has been given to the form:

        struct y;

It now hides the outer declaration of y, and ``opens'' a new instance in the current block.

QUIET CHANGE

struct

x;

3.5.3 Type qualifiers

The Committee has added to C two type qualifiers: const and volatile. Individually and in combination they specify the assumptions a compiler can and must make when accessing an object through an lvalue.

The syntax and semantics of const were adapted from C++; the concept itself has appeared in other languages. volatile is an invention of the Committee; it follows the syntactic model of const.

Type qualifiers were introduced in part to provide greater control over optimization. Several important optimization techniques are based on the principle of ``cacheing'': under certain circumstances the compiler can remember the last value accessed (read or written) from a location, and use this retained value the next time that location is read. (The memory, or ``cache'', is typically a hardware register.) If this memory is a machine register, for instance, the code can be smaller and faster using the register rather than accessing external memory.

The basic qualifiers can be characterized by the restrictions they impose on access and cacheing:

const: No writes through this lvalue. In the absence of this qualifier, writes may occur through this lvalue.
volatile: No cacheing through this lvalue: each operation in the abstract semantics must be performed. (That is, no cacheing assumptions may be made, since the location is not guaranteed to contain any previous value.) In the absence of this qualifier, the contents of the designated location may be assumed to be unchanged (except for possible aliasing.)

A translator design with no cacheing optimizations can effectively ignore the type qualifiers, except insofar as they affect assignment compatibility.

It would have been possible, of course, to specify a nonconst keyword instead of const, or nonvolatile instead of volatile. The senses of these concepts in the Standard were chosen to assure that the default, unqualified, case was the most common, and that it corresponded most clearly to traditional practice in the use of lvalue expressions.

Four combinations of the two qualifiers is possible; each defines a useful set of lvalue properties. The next several paragraphs describe typical uses of these qualifiers.

The translator may assume, for an unqualified lvalue, that it may read or write the referenced object, that the value of this object cannot be changed except by explicitly programmed actions in the current thread of control, but that other lvalue expressions could reference the same object.

const is specified in such a way that an implementation is at liberty to put const objects in read-only storage, and is encouraged to diagnose obvious attempts to modify them, but is not required to track down all the subtle ways that such checking can be subverted. If a function parameter is declared const, then the referenced object is not changed (through that lvalue) in the body of the function --- the parameter is read-only.

A static volatile object is an appropriate model for a memory-mapped I/O register. Implementors of C translators should take into account relevant hardware details on the target systems when implementing accesses to volatile objects. For instance, the hardware logic of a system may require that a two-byte memory-mapped register not be accessed with byte operations; a compiler for such a system would have to assure that no such instructions were generated, even if the source code only accesses one byte of the register. Whether read-modify-write instructions can be used on such device registers must also be considered. Whatever decisions are adopted on such issues must be documented, as volatile access is implementation-defined. A volatile object is an appropriate model for a variable shared among multiple processes.

A static const volatile object appropriately models a memory-mapped input port, such as a real-time clock. Similarly, a const volatile object models a variable which can be altered by another process but not by this one.

Although the type qualifiers are formally treated as defining new types they actually serve as modifiers of declarators. Thus the declarations

        const struct s {int a,b;} x;
        struct s  y;

declare x as a const object, but not y. The const property can be associated with the aggregate type by means of a type definition:

        typedef const struct s {int a,b;} stype;
        stype x;
        stype y;

In these declarations the const property is associated with the declarator stype, so x and y are both const objects.

The Committee considered making const and volatile storage classes, but this would have ruled out any number of desirable constructs, such as const members of structures and variable pointers to const types.

A cast of a value to a qualified type has no effect; the qualification (volatile, say) can have no effect on the access since it has occurred prior to the cast. If it is necessary to access a non-volatile object using volatile semantics, the technique is to cast the address of the object to the appropriate pointer-to-qualified type, then dereference that pointer.

3.5.4 Declarators

The function prototype syntax was adapted from C++. (See §3.3.2.2 and §3.5.4.3)

Some current implementations have a limit of six type modifiers (function returning, array of, pointer to), the limit used in Ritchie's original compiler. This limit has been raised to twelve since the original limit has proven insufficient in some cases; in particular, it did not allow for FORTRAN-to-C translation, since FORTRAN allows for seven subscripts. (Some users have reported using nine or ten levels, particularly in machine-generated C code.)

3.5.4.1 Pointer declarators

A pointer declarator may have its own type qualifiers, to specify the attributes of the pointer itself, as opposed to those of the reference type. The construct is adapted from C++.

const int * means (variable) pointer to constant int, and int * const means constant pointer to (variable) int, just as in C++, from which these constructs were adopted. (And mutatis mutandis for the other type qualifiers.) As with other aspects of C type declarators, judicious use of typedef statements can clarify the code.

3.5.4.2 Array declarators

The concept of composite types (§3.1.2.6) was introduced to provide for the accretion of information from incomplete declarations, such as array declarations with missing size, and function declarations with missing prototype (argument declarations). Type declarators are therefore said to specify compatible types if they agree except for the fact that one provides less information of this sort than the other.

The declaration of 0-length arrays is invalid, under the general principle of not providing for 0-length objects. The only common use of this construct has been in the declaration of dynamically allocated variable-size arrays, such as

        struct segment {
            short int	count;
            char	c[N];
        };

        struct segment * new_segment( const int length )
        {
            struct segment * result;
            result = malloc( sizeof segment + (length-N) );
            result->count = length;
            return result;
        }

In such usage, N would be 0 and (length-N) would be written as length. But this paradigm works just as well, as written, if N is 1. (Note, by the by, an alternate way of specifying the size of result:

            result = malloc( offsetof(struct segment,c) + length );

This illustrates one of the uses of the offsetof macro.)

3.5.4.3 Function declarators (including prototypes)

The function prototype mechanism is one of the most useful additions to the C language. The feature, of course, has precedent in many of the Algol-derived languages of the past 25 years. The particular form adopted in the Standard is based in large part upon C++.

Function prototypes provide a powerful translation-time error detection capability. In traditional C practice without prototypes, it is extremely difficult for the translator to detect errors (wrong number or type of arguments) in calls to functions declared in another source file. Detection of such errors has either occurred at runtime, or through the use of auxiliary software tools.

In function calls not in the scope of a function prototype, integral arguments have the integral widening conversions applied and float arguments are widened to double. It is thus impossible in such a call to pass an unconverted char or float argument. Function prototypes give the programmer explicit control over the function argument type conversions, so that the often inappropriate and sometimes inefficient default widening rules for arguments can be suppressed by the implementation. Modifications of function interfaces are easier in cases where the actual arguments are still assignment compatible with the new formal parameter type --- only the function definition and its prototype need to be rewritten in this case; no function calls need be rewritten.

Allowing an optional identifier to appear in a function prototype serves two purposes:

the programmer can associate a meaningful name with each argument position for documentation purposes, and
a function declarator and a function prototype can use the same syntax. The consistent syntax makes it easier for new users of C to learn the language. Automatic generation of function prototype declarators from function definitions is also facilitated.

Optimizers can also take advantage of function prototype information. Consider this example:

        extern int compare(const char * string1,
                           const char * string2) ;
        
        void func2(int x)
        {
                char * str1, * str2 ;
                    /* ... */
                x = compare(str1, str2) ;
                    /* ... */
        }

The optimizer knows that the pointers passed to compare are not used to assign new values to any objects that the pointers reference. Hence the optimizer can make less conservative assumptions about the side effects of compare than would otherwise be necessary.

The Standard requires that calls to functions taking a variable number of arguments must occur in the presence of a prototype (using the trailing ellipsis notation ,...). An implementation may thus assume that all other functions are called with a fixed argument list, and may therefore use possibly more efficient calling sequences. Programs using old-style headers in which the number of arguments in the calls and the definition differ may not work in implementations which take advantage of such optimizations. This is not a Quiet Change, strictly speaking, since the program does not conform to the Standard. A word of warning is in order, however, since the style is not uncommon in extant code, and since a conforming translator is not required to diagnose such mismatches when they occur in separate translation units. Such trouble spots can be made manifest (assuming an implementation provides reasonable diagnostics) by providing new-style function declarations in the translation units with the non-matching calls. Programmers who currently rely on being able to omit trailing arguments are advised to recode using the <stdarg.h> paradigm.

Function prototypes may be used to define function types as well:

        typedef  double (*d_binop) (double A, double B);

        struct d_funct {
            d_binop        f1;
            int            (*f2)(double, double);
        };

The structure d_funct has two fields, both of which hold pointers to functions taking two double arguments; the function types differ in their return type.

3.5.5 Type names

Empty parentheses within a type name are always taken as meaning function with unspecified arguments and never as (unnecessary) parentheses around the elided identifier. This specification avoids an ambiguity by fiat.

3.5.6 Type definitions

A typedef may only be redeclared in an inner block with a declaration that explicitly contains a type name. This rule avoids the ambiguity about whether to take the typedef as the type name or the candidate for redeclaration.

Some implementations of C have allowed type specifiers to be added to a type defined using typedef. Thus

        typedef short int small ;
        unsigned small x ;

would give x the type unsigned short int. The Committee decided that since this interpretation may be difficult to provide in many implementations, and since it defeats much of the utility of typedef as a data abstraction mechanism, such type modifications are invalid. This decision is incorporated in the rules of §3.5.2.

A proposed typeof operator was rejected on the grounds of insufficient utility.

3.5.7 Initialization

An implementation might conceivably have codes for floating zero and/or null pointer other than all bits zero. In such a case, the implementation must fill out an incomplete initializer with the various appropriate representations of zero; it may not just fill the area with zero bytes.

The Committee considered proposals for permitting automatic aggregate initializers to consist of a brace-enclosed series of arbitrary (execute-time) expressions, instead of just those usable for a translate-time static initializer. However, cases like this were troubling:

        int x[2] = { f(x[1]), g(x[0]) };

Rather than determine a set of rules which would avoid pathological cases and yet not seem too arbitrary, the Committee elected to permit only static initializers. Consequently, an implementation may choose to build a hidden static aggregate, using the same machinery as for other aggregate initializers, then copy that aggregate to the automatic variable upon block entry.

A structure expression, such as a call to a function returning the appropriate structure type, is permitted as an automatic structure initializer, since the usage seems unproblematic.

For programmer convenience, even though it is a minor irregularity in initializer semantics, the trailing null character in a string literal need not initialize an array element, as in:

        char mesg[5] = "help!" ;

(Some widely used implementations provide precedent.)

The Base Document allows a trailing comma in an initializer at the end of an initializer-list. The Standard has retained this syntax, since it provides flexibility in adding or deleting members from an initializer list, and simplifies machine generation of such lists.

Various implementations have parsed aggregate initializers with partially elided braces differently. The Standard has reaffirmed the (top-down) parse described in the Base Document. Although the construct is allowed, and its parse well defined, the Committee urges programmers to avoid partially elided initializers: such initializations can be quite confusing to read.

QUIET CHANGE

The Committee has adopted the rule (already used successfully in some implementations) that the first member of the union is the candidate for initialization. Other notations for union initialization were considered, but none seemed of sufficient merit to outweigh the lack of prior art.

This rule has a parallel with the initialization of structures. Members of structures are initialized in the sequence in which they are declared. The same can now be said of unions, with the significant difference that only one union member (the first) can be initialized.

3.4 Constant Expressions

ANSI C Rationale

3.6 Statements Index