4.3 Character Handling  -< ANSI C Rationale  -> 4.5 Mathematics              Index 

4.4  Localization  <locale.h>

C has become an international language.  Users of the language outside the United States have been forced to deal with the various Americanisms built into the standard library routines. 

Areas affected by international considerations include:

Alphabet.
The English language uses 26 letters derived from the Latin alphabet.  This set of letters suffices for English, Swahili, and Hawaiian; all other living languages use either the Latin alphabet plus other characters, or other, non-Latin alphabets or syllabaries. 

In English, each letter has an upper-case and lower-case form.  The German ``sharp S'', ß, occurs only in lower-case.  European French usually omits diacriticals on upper-case letters.  Some languages do not have the concept of two cases. 

Collation.
In both EBCDIC and ASCII the code for `z' is greater than the code for `a', and so on for other letters in the alphabet, so a ``machine sort'' gives not unreasonable results for ordering strings.  In contrast, most European languages use a codeset resembling ASCII in which some of the codes used in ASCII for punctuation characters are used for alphabetic characters. (See §2.2.1.)  The ordering of these codes is not alphabetic.  In some languages letters with diacritics sort as separate letters; in others they should be collated just as the unmarked form.  In Spanish, ``ll'' sorts as a single letter following ``l''; in German, ``ß'' sorts like ``ss''. 

Formatting of numbers and currency amounts.
In the United States the period is invariably used for the decimal point; this usage was built into the definitions of such functions as printf and scanf.  Prevalent practice in several major European countries is to use a comma; a raised dot is employed in some locales.  Similarly, in the United States a comma is used to separate groups of three digits to the left of the decimal point; a period is common in Europe, and in some countries digits are not grouped by threes.  In printing currency amounts, the currency symbol (which may be more than one character) may precede, follow, or be embedded in the digits. 

Date and time.
The standard function asctime returns a string which includes abbreviations for month and weekday names, and returns the various elements in a format which might be considered unusual even in its country of origin. 

Various common date formats include

    1776-07-04                 ISO Format
    4.7.76                     customary central
                               European and British usage
    7/4/76                     customary U.S. usage
    4.VII.76                   Italian usage
    76186                      Julian date (YYDDD)
    04JUL76                    airline usage
    Thursday, July 4, 1776     full U.S. format
    Donnerstag, 4. Juli 1776   full German format

Time formats are also quite diverse:

    3:30 PM                    customary U.S. and British format
    1530                       U.S. military format
    15h.30                     Italian usage
    15.30                      German usage
    15:30                      common European usage

The Committee has introduced mechanisms into the C library to allow these and other issues to be treated in the appropriate locale-specific manner. 
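
As an illustration of two of the areas listed above, the following is a minimal sketch that uses the library functions strcoll (collation) and strftime (date and time).  The helper names sort_words, format_iso_date, format_hhmm and format_locale_date are hypothetical and chosen only for this example.  The sketch assumes that the program is running in whatever locale was last selected; in the default "C" locale, strcoll is expected to order plain ASCII strings the same way strcmp does on common implementations.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* qsort comparator: orders two strings by the current LC_COLLATE rules. */
static int collate_cmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Sort an array of strings in the collation order of the current locale. */
void sort_words(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, collate_cmp);
}

/* ISO form, such as 1776-07-04; these conversions use digits only. */
size_t format_iso_date(char *buf, size_t n, const struct tm *t)
{
    return strftime(buf, n, "%Y-%m-%d", t);
}

/* 24-hour time, such as 15:30. */
size_t format_hhmm(char *buf, size_t n, const struct tm *t)
{
    return strftime(buf, n, "%H:%M", t);
}

/* The locale's own date representation; the result depends on LC_TIME. */
size_t format_locale_date(char *buf, size_t n, const struct tm *t)
{
    return strftime(buf, n, "%x", t);
}
```

Each formatting helper returns the number of characters written, or zero if the buffer is too small.  Only format_locale_date changes its output when a different locale is selected; the other two produce fixed layouts by design.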

The localization features of the Standard are based on these principles:

English for C source.
The C language proper is based on English.  Keywords are based on English words.  A program which uses ``national characters'' in identifiers is not strictly conforming.  (Use of national characters in comments is strictly conforming, though what happens when such a program is printed in a different locale is unspecified.)  The decimal point must be a period in C source, and no thousands delimiter may be used. 

Runtime selectability.
The locale must be selectable at runtime, from an implementation-defined set of possibilities.  Translate-time selection does not offer sufficient flexibility.  Software vendors do not want to supply different object forms of their programs in different locales.  Users do not want to use different versions of a program just because they deal with several different locales. 

Function interface.
Locale is changed by calling a function, thus allowing the implementation to recognize the change, rather than by, say, changing a memory location that contains the decimal point character. 

Immediate effect.
When a new locale is selected, affected functions reflect the change immediately.  (This is not meant to imply that, if a signal-handling function were to change the selected locale and then return to a library function, the return value from that library function must be completely correct with respect to the new locale.) 

4.4.1  Locale control

4.4.1.1  The setlocale function

setlocale provides the mechanism for controlling locale-specific features of the library.  The category argument allows parts of the library to be localized as necessary without changing the entire locale-specific environment.  Specifying the locale argument as a string gives an implementation maximum flexibility in providing a set of locales.  For instance, an implementation could map the argument string into the name of a file containing appropriate localization parameters --- these files could then be added and modified without requiring any recompilation of a localizable program. 
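
The category argument and the string-valued locale argument can be sketched as follows.  The helper name locale_is_available is hypothetical; it asks whether the implementation accepts a given locale name for one category and then restores the previous setting.  The sketch assumes that the string returned by setlocale may be overwritten by a later call, so it is copied before the trial call is made.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 1 if the implementation accepts `name` for `category`,
 * 0 otherwise.  The previously selected locale for that category
 * is put back before returning.
 */
int locale_is_available(int category, const char *name)
{
    const char *current = setlocale(category, NULL); /* query only */
    char *saved;
    int ok;

    if (current == NULL)
        return 0;

    /* Copy the name: a later setlocale call may overwrite the string. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return 0;
    strcpy(saved, current);

    ok = setlocale(category, name) != NULL; /* trial selection */
    setlocale(category, saved);             /* restore the old locale */
    free(saved);
    return ok;
}
```

A call such as locale_is_available(LC_NUMERIC, "C") is expected to succeed on any conforming implementation, since "C" is always a valid locale name; which other names are accepted is implementation-defined.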

4.4.2  Numeric formatting convention inquiry

4.4.2.1  The localeconv function

The localeconv function gives a programmer access to information about how to format numeric quantities (monetary or otherwise).  This sort of interface was considered preferable to defining conversion functions directly: even with a specified locale, the set of distinct formats that can be constructed from these elements is large, and the ones desired are very application-dependent. 
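
A minimal sketch of how an application might build its own format from these elements follows.  The helper name format_fixed2 is hypothetical, and the layout (a whole part, the locale's decimal point, and two fractional digits) is only one of the many formats an application could choose.  The sketch uses snprintf, which was added to the library after this Standard, and it handles non-negative values only.

```c
#include <locale.h>
#include <stdio.h>

/*
 * Writes `whole`, the current locale's decimal-point string, and two
 * fractional digits into buf.  Returns the value snprintf returns:
 * the number of characters the full result needs, excluding the
 * terminating null character.
 */
int format_fixed2(char *buf, size_t n, long whole, unsigned hundredths)
{
    const struct lconv *lc = localeconv();

    return snprintf(buf, n, "%ld%s%02u",
                    whole, lc->decimal_point, hundredths % 100u);
}
```

In the "C" locale the decimal_point member is ".", so the result has the familiar form; after a call to setlocale that selects a locale with a different decimal point, the same function is expected to use that string instead.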

