Miscellaneous

Section 17. Miscellaneous

17.1: What can I safely assume about the initial values of variables which are not explicitly initialized? If global variables start out as "zero," is that good enough for null pointers and floating-point zeroes?

Variables with "static" duration (that is, those declared outside of functions, and those declared with the storage class static) are guaranteed to be initialized to zero, as if the programmer had typed "= 0". Therefore, such variables are initialized to the null pointer (of the correct type; see also Section 1) if they are pointers, and to 0.0 if they are floating-point.

Variables with "automatic" duration (i.e. local variables without the static storage class) start out containing garbage, unless they are explicitly initialized. Nothing useful can be predicted about the garbage.

Dynamically-allocated memory obtained with malloc and realloc is also likely to contain garbage, and must be initialized by the calling program, as appropriate. Memory obtained with calloc contains all-bits-0, but this is not necessarily useful for pointer or floating-point values (see question 3.13, and section 1).
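As a minimal sketch of the distinction, the fragment below sets each member of a dynamically-allocated structure explicitly instead of relying on an all-bits-0 fill. The type struct node, the object global_node, and the function make_nodes are hypothetical names invented for this illustration.

```c
#include <stdlib.h>

/* Hypothetical record type, for illustration only. */
struct node {
	struct node *next;
	double weight;
	int count;
};

/* Declared outside any function, so it has static duration:
   next is a null pointer, weight is 0.0, and count is 0,
   as if "= 0" had been written for each member. */
struct node global_node;

/* Allocate n nodes and initialize every member by assignment,
   so that the pointers are true null pointers and the doubles
   are true 0.0 whatever their internal representations.
   Returns a null pointer if the allocation fails. */
struct node *make_nodes(size_t n)
{
	struct node *p = malloc(n * sizeof *p);
	size_t i;

	if (p == NULL)
		return NULL;
	for (i = 0; i < n; i++) {
		p[i].next = NULL;
		p[i].weight = 0.0;
		p[i].count = 0;
	}
	return p;
}
```

The caller is responsible for passing the returned pointer to free.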

17.2: This code, straight out of a book, isn't compiling:

        f()
        {
        char a[] = "Hello, world!";
        }

Perhaps you have a pre-ANSI compiler, which doesn't allow initialization of "automatic aggregates" (i.e. non-static local arrays and structures). As a workaround, you can make the array global or static, and initialize it with strcpy when f is called. (You can always initialize local char * variables with string literals, but see question 17.20.) See also questions 5.16 and 5.17.
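The workaround might look something like the sketch below. It is written with an ANSI-style prototype for clarity (a genuinely pre-ANSI compiler would want the old-style f() form), and returning the array is an addition made only so that the result can be inspected.

```c
#include <string.h>

/* Static array, refilled with strcpy on every call so that each
   call starts with a fresh copy of the string. */
char *f(void)
{
	static char a[14];	/* 13 characters plus the terminating '\0' */

	strcpy(a, "Hello, world!");
	return a;
}
```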

17.3: How can I write data files which can be read on other machines with different word size, byte order, or floating point formats?

The best solution is to use text files (usually ASCII), written with fprintf and read with fscanf or the like. (Similar advice also applies to network protocols.) Be skeptical of arguments which imply that text files are too big, or that reading and writing them is too slow. Not only is their efficiency frequently acceptable in practice, but the advantages of being able to manipulate them with standard tools can be overwhelming.
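A minimal sketch of the text-file approach follows. The record type struct reading and the functions write_reading and read_reading are hypothetical; the %.17g format is an assumption that suffices to round-trip an IEEE 754 double, and other floating-point formats may need a different precision.

```c
#include <stdio.h>

/* Hypothetical record, for illustration only. */
struct reading {
	long id;
	double value;
};

/* Write one record as a line of text.
   Returns 0 on success, -1 on a write error. */
int write_reading(FILE *fp, const struct reading *r)
{
	/* 17 significant digits: assumed enough for an IEEE 754 double */
	return fprintf(fp, "%ld %.17g\n", r->id, r->value) < 0 ? -1 : 0;
}

/* Read one record written by write_reading.
   Returns 0 on success, -1 on end-of-file or malformed input. */
int read_reading(FILE *fp, struct reading *r)
{
	return fscanf(fp, "%ld %lf", &r->id, &r->value) == 2 ? 0 : -1;
}
```

Because the file holds only characters, word size and byte order never enter into it; the reading machine simply converts the digits into its own representation.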

If you must use a binary format, you can improve portability, and perhaps take advantage of prewritten I/O libraries, by making use of standardized formats such as Sun's XDR (RFC 1014), OSI's ASN.1, CCITT's X.409, or ISO 8825 "Basic Encoding Rules." See also question 9.11.

17.4: How can I insert or delete a line (or record) from the middle of a file?

Short of rewriting the file, you probably can't. See also question 16.9.

17.5: How can I return several values from a function?

Either pass pointers to locations which the function can fill in, or have the function return a structure containing the desired values, or (in a pinch) consider global variables. See also questions 2.17, 3.4, and 9.2.
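The first two techniques can be sketched as follows, using a hypothetical function that yields both a quotient and a remainder (the names divmod_ptr, divmod_struct, and struct divmod are invented for this example):

```c
/* Technique 1: the caller passes pointers to locations
   which the function fills in. */
void divmod_ptr(int num, int den, int *quot, int *rem)
{
	*quot = num / den;
	*rem = num % den;
}

/* Technique 2: the function returns a structure
   containing both values. */
struct divmod {
	int quot;
	int rem;
};

struct divmod divmod_struct(int num, int den)
{
	struct divmod r;

	r.quot = num / den;
	r.rem = num % den;
	return r;
}
```

A caller would write divmod_ptr(17, 5, &q, &r) in the first case, or d = divmod_struct(17, 5) in the second.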

17.6: If I have a `char *` variable pointing to the name of a function as a string, how can I call that function?

The most straightforward thing to do is maintain a correspondence table of names and function pointers:

	int function1(), function2();

	struct {char *name; int (*funcptr)(); } symtab[] =
		{
		"function1",	function1,
		"function2",	function2,
		};

Then, just search the table for the name, and call through the associated function pointer. See also questions 9.9 and 16.11.
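The search might be sketched as below. The bodies of function1 and function2, the typedef funcptr_t, and the function lookup are assumptions made for the sake of a complete example; a large table would probably want a sorted array or a hash table rather than a linear search.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder functions standing in for function1 and function2. */
int function1(void) { return 1; }
int function2(void) { return 2; }

typedef int (*funcptr_t)(void);

static const struct {
	const char *name;
	funcptr_t funcptr;
} symtab[] = {
	{ "function1", function1 },
	{ "function2", function2 },
};

/* Return the function registered under name,
   or a null pointer if the name is not in the table. */
funcptr_t lookup(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
		if (strcmp(symtab[i].name, name) == 0)
			return symtab[i].funcptr;
	}
	return NULL;
}
```

The caller should check for a null pointer before calling through the result.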

17.7: I seem to be missing the system header file `<sgtty.h>`. Can someone send me a copy?

Standard headers exist in part so that definitions appropriate to your compiler, operating system, and processor can be supplied. You cannot just pick up a copy of someone else's header file and expect it to work, unless that person is using exactly the same environment. Ask your compiler vendor why the file was not provided (or to send a replacement copy).

17.8: How can I call FORTRAN (C++, BASIC, Pascal, Ada, LISP) functions from C? (And vice versa?)

The answer is entirely dependent on the machine and the specific calling sequences of the various compilers in use, and may not be possible at all. Read your compiler documentation very carefully; sometimes there is a "mixed-language programming guide," although the techniques for passing arguments and ensuring correct run-time startup are often arcane. More information may be found in FORT.gz by Glenn Geers, available via anonymous ftp from suphys.physics.su.oz.au in the src directory.

cfortran.h, a C header file, simplifies C/FORTRAN interfacing on many popular machines. It is available via anonymous ftp from zebra.desy.de (131.169.2.244).

In C++, a "C" modifier in an external function declaration indicates that the function is to be called using C calling conventions.

17.9: Does anyone know of a program for converting Pascal or FORTRAN (or LISP, Ada, awk, "Old" C, ...) to C?

Several public-domain programs are available:

p2c: A Pascal to C converter written by Dave Gillespie, posted to comp.sources.unix in March, 1990 (Volume 21); also available by anonymous ftp from csvax.cs.caltech.edu, file pub/p2c-1.20.tar.Z .
ptoc: Another Pascal to C converter, this one written in Pascal (comp.sources.unix, Volume 10, also patches in Volume 13?).
f2c: A Fortran to C converter jointly developed by people from Bell Labs, Bellcore, and Carnegie Mellon. To find out about f2c, send the mail message "send index from f2c" to netlib@research.att.com or research!netlib. (It is also available via anonymous ftp on netlib.att.com, in directory netlib/f2c.)

This FAQ list's maintainer also has available a list of other commercial translation products, and some for more obscure languages.

17.10: Is C++ a superset of C? Can I use a C++ compiler to compile C code?

C++ was derived from C, and is largely based on it, but there are some legal C constructs which are not legal C++. (Many C programs will nevertheless compile correctly in a C++ environment.)

17.11: I need...

Look for programs (see also question 17.12) named:

...a C cross-reference generator: cflow, calls, cscope
...a C beautifier/pretty-printer: cb, indent

17.12: Where can I get copies of all these public-domain programs?

If you have access to Usenet, see the regular postings in the comp.sources.unix and comp.sources.misc newsgroups, which describe, in some detail, the archiving policies and how to retrieve copies. The usual approach is to use anonymous ftp and/or uucp from a central, public-spirited site, such as uunet (ftp.uu.net, 192.48.96.9). However, this article cannot track or list all of the available archive sites and how to access them.

Ajay Shah maintains an index of free numerical software; it is posted periodically, and available where this FAQ list is archived (see question 17.33). The comp.archives newsgroup contains numerous announcements of anonymous ftp availability of various items. The "archie" mailserver can tell you which anonymous ftp sites have which packages; send the mail message "help" to archie@quiche.cs.mcgill.ca for information. Finally, the newsgroup comp.sources.wanted is generally a more appropriate place to post queries for source availability, but check its FAQ list, "How to find sources," before posting there.

17.13: When will the next International Obfuscated C Code Contest (IOCCC) be held? How can I get a copy of the current and previous winning entries?

The contest schedule is tied to the dates of the USENIX conferences at which the winners are announced. At the time of this writing, it is expected that the yearly contest will open in October. To obtain a current copy of the rules and guidelines, send e-mail with the Subject: line "send rules" to:

{apple,pyramid,sun,uunet}!hoptoad!judges or
judges@toad.com

(Note that these are not the addresses for submitting entries.)

Contest winners should be announced at the winter USENIX conference in January, and are posted to the net sometime thereafter. Winning entries from previous years (to 1984) are archived at uunet (see question 17.12) under the directory ~/pub/ioccc.

As a last resort, previous winners may be obtained by sending e-mail to the above address, using the Subject: "send YEAR winners", where YEAR is a single four-digit year, a year range, or "all".

17.14: Why don't C comments nest? How am I supposed to comment out code containing comments? Are comments legal inside quoted strings?

Nested comments would cause more harm than good, mostly because of the possibility of accidentally leaving comments unclosed by including the characters "/*" within them. For this reason, it is usually better to "comment out" large sections of code, which might contain comments, with #ifdef or #if 0 (but see question 5.11).

The character sequences /* and */ are not special within double-quoted strings, and do not therefore introduce comments, because a program (particularly one which is generating C code as output) might want to print them.
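Both points can be illustrated with a short sketch (the functions add and comment_text are hypothetical):

```c
/* Code containing comments is disabled with #if 0 rather
   than by wrapping it in another comment. */
int add(int a, int b)
{
#if 0
	/* This comment sits harmlessly inside the disabled section. */
	return a - (-b);	/* an earlier version, kept for reference */
#endif
	return a + b;
}

/* Inside a string literal, the comment delimiters
   are just ordinary characters. */
const char *comment_text(void)
{
	return "/* not a comment */";
}
```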

References: ANSI Appendix E p. 198, Rationale Sec. 3.1.9 p. 33.

17.15: How can I get the ASCII value corresponding to a character, or vice versa?

In C, characters are represented by small integers corresponding to their values (in the machine's character set) so you don't need a conversion routine: if you have the character, you have its value.
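In other words, the "conversion" amounts to nothing more than an assignment or a cast, as in this sketch (the function names are hypothetical):

```c
/* Return the value of c in the machine's character set.
   The cast to unsigned char keeps the result non-negative
   on machines where plain char is signed. */
int char_to_value(char c)
{
	return (unsigned char)c;
}

/* Return the character whose value is v. */
char value_to_char(int v)
{
	return (char)v;
}
```

On an ASCII machine, char_to_value('A') is 65; on a machine with a different character set it will be something else.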

17.16: How can I implement sets and/or arrays of bits?

Use arrays of char or int, with a few macros to access the right bit at the right index (try using 8 for CHAR_BIT if you don't have <limits.h>):

	#include <limits.h>		/* for CHAR_BIT */

	#define BITMASK(bit) (1 << ((bit) % CHAR_BIT))
	#define BITSLOT(bit) ((bit) / CHAR_BIT)
	#define BITSET(ary, bit) ((ary)[BITSLOT(bit)] |= BITMASK(bit))
	#define BITTEST(ary, bit) ((ary)[BITSLOT(bit)] & BITMASK(bit))
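
As a usage sketch, the fragment below repeats the four macros so that it stands alone, and adds two more, BITCLEAR and BITNSLOTS, which are assumptions of this example, as are SETSIZE and the function set_multiples:

```c
#include <limits.h>		/* for CHAR_BIT */
#include <string.h>		/* for memset */

#define BITMASK(bit) (1 << ((bit) % CHAR_BIT))
#define BITSLOT(bit) ((bit) / CHAR_BIT)
#define BITSET(ary, bit) ((ary)[BITSLOT(bit)] |= BITMASK(bit))
#define BITTEST(ary, bit) ((ary)[BITSLOT(bit)] & BITMASK(bit))

/* Assumed additions: clear a bit, and compute the number
   of array elements needed to hold nbits bits. */
#define BITCLEAR(ary, bit) ((ary)[BITSLOT(bit)] &= ~BITMASK(bit))
#define BITNSLOTS(nbits) (((nbits) + CHAR_BIT - 1) / CHAR_BIT)

#define SETSIZE 100	/* hypothetical: members are the integers 0..99 */

/* Make set contain exactly the multiples of step below SETSIZE.
   set must have BITNSLOTS(SETSIZE) elements; step must be positive. */
void set_multiples(unsigned char *set, int step)
{
	int i;

	memset(set, 0, BITNSLOTS(SETSIZE));
	for (i = 0; i < SETSIZE; i += step)
		BITSET(set, i);
}
```

A set would be declared as unsigned char set[BITNSLOTS(SETSIZE)], and membership tested with if(BITTEST(set, n)).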

17.17: What is the most efficient way to count the number of bits which are set in a value?

This and many other similar bit-twiddling problems can often be sped up and streamlined using lookup tables (but see the next question).
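One possible lookup-table arrangement is sketched below. It handles the value four bits at a time with a 16-entry table; the names nibble_bits and bitcount are hypothetical, and a 256-entry table indexed by whole bytes is an equally reasonable choice.

```c
/* Number of 1 bits in each possible 4-bit value. */
static const unsigned char nibble_bits[16] = {
	0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Count the bits set in x, four bits per table lookup. */
int bitcount(unsigned long x)
{
	int n = 0;

	while (x != 0) {
		n += nibble_bits[x & 0xF];
		x >>= 4;
	}
	return n;
}
```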

17.18: How can I make this code more efficient?

Efficiency, though a favorite comp.lang.c topic, is not important nearly as often as people tend to think it is. Most of the code in most programs is not time-critical. When code is not time-critical, it is far more important that it be written clearly and portably than that it be written maximally efficiently. (Remember that computers are very, very fast, and that even "inefficient" code can run without apparent delay.)

It is notoriously difficult to predict what the "hot spots" in a program will be. When efficiency is a concern, it is important to use profiling software to determine which parts of the program deserve attention. Often, actual computation time is swamped by peripheral tasks such as I/O and memory allocation, which can be sped up by using buffering and caching techniques.

For the small fraction of code that is time-critical, it is vital to pick a good algorithm; it is less important to "microoptimize" the coding details. Many of the "efficient coding tricks" which are frequently suggested (e.g. substituting shift operators for multiplication by powers of two) are performed automatically by even simpleminded compilers. Heavyhanded "optimization" attempts can make code so bulky that performance is degraded.

For more discussion of efficiency tradeoffs, as well as good advice on how to increase efficiency when it is important, see chapter 7 of Kernighan and Plauger's The Elements of Programming Style, and Jon Bentley's Writing Efficient Programs.

17.19: Are pointers really faster than arrays? How much do function calls slow things down? Is `++i` faster than `i = i + 1` ?

Precise answers to these and many similar questions depend of course on the processor and compiler in use. If you simply must know, you'll have to time test programs carefully. (Often the differences are so slight that hundreds of thousands of iterations are required even to see them. Check the compiler's assembly language output, if available, to see if two purported alternatives aren't compiled identically.)

It is "usually" faster to march through large arrays with pointers rather than array subscripts, but for some processors the reverse is true.

Function calls, though obviously incrementally slower than in-line code, contribute so much to modularity and code clarity that there is rarely good reason to avoid them.

Before rearranging expressions such as i = i + 1, remember that you are dealing with a C compiler, not a keystroke-programmable calculator. Any decent compiler will generate identical code for ++i, i += 1, and i = i + 1. The reasons for using ++i or i += 1 over i = i + 1 have to do with style, not efficiency. (See also question 4.7.)

17.20: Why does this code:

        char *p = "Hello, world!";
        p[0] = tolower(p[0]);

crash?

String literals are not necessarily modifiable, except (in effect) when they are used as array initializers. Try

        char a[] = "Hello, world!";

(For compiling old code, some compilers have a switch controlling whether strings are writable or not.) See also questions 2.1, 2.2, 2.8, and 17.2.
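A minimal sketch of the safe version follows; the function lower_first is a hypothetical helper. The cast to unsigned char is there because tolower is defined only for values representable as unsigned char (and for EOF).

```c
#include <ctype.h>

/* Convert the first character of buf to lower case, in place.
   buf must point to writable storage, such as an array. */
void lower_first(char *buf)
{
	if (buf[0] != '\0')
		buf[0] = tolower((unsigned char)buf[0]);
}
```

Calling lower_first(a), where a is the array declared above, is fine; passing it a pointer to a string literal is not.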

References: ANSI Sec. 3.1.4 .

17.21: This program crashes before it even runs! (When single-stepping with a debugger, it dies before the first statement in `main`.)

You probably have one or more very large (kilobyte or more) local arrays. Many systems have fixed-size stacks, and those which perform dynamic stack allocation automatically (e.g. Unix) can be confused when the stack tries to grow by a huge chunk all at once.

It is often better to declare large arrays with static duration (unless of course you need a fresh set with each recursive call).

(See also question 9.4.)

17.22: What do "Segmentation violation" and "Bus error" mean?

These generally mean that your program tried to access memory it shouldn't have, invariably as a result of improper pointer use. Common causes are uninitialized or improperly allocated pointers (see questions 3.1 and 3.2), misuse of malloc (see question 17.23), or perhaps misuse of scanf (see question 11.3).

17.23: My program is crashing, apparently somewhere down inside `malloc`, but I can't see anything wrong with it.

It is unfortunately very easy to corrupt malloc's internal data structures, and the resulting problems can be hard to track down. The most common source of problems is writing more to a malloc'ed region than it was allocated to hold; a particularly common bug is to malloc(strlen(s)) instead of strlen(s) + 1. Other problems involve freeing pointers not obtained from malloc, or trying to realloc a null pointer (see question 3.12).
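The strlen bug is avoided by allocating one extra byte for the terminating '\0', as in this sketch (the function copy_string is a hypothetical helper):

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly malloc'ed copy of s, or a null pointer
   if the allocation fails.  The caller frees the result. */
char *copy_string(const char *s)
{
	char *p = malloc(strlen(s) + 1);	/* + 1 for the '\0' */

	if (p != NULL)
		strcpy(p, s);
	return p;
}
```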

A number of debugging packages exist to help track down malloc problems; one popular one is Conor P. Cahill's "dbmalloc", posted to comp.sources.misc in September of 1992. Others are "leak," available in volume 27 of the comp.sources.unix archives; JMalloc.c and JMalloc.h in Fidonet's C_ECHO Snippets (or ask archie; see question 17.12); and MEMDEBUG from ftp.crpht.lu in pub/sources/memdebug . See also question 17.12.

17.24: Does anyone have a C compiler test suite I can use?

Plum Hall (formerly in Cardiff, NJ; now in Hawaii) sells one. The FSF's GNU C (gcc) distribution includes a c-torture-test.tar.Z which checks a number of common problems with compilers. Kahan's paranoia test, found in netlib/paranoia on netlib.att.com, strenuously tests a C implementation's floating point capabilities.

17.25: Where can I get a YACC grammar for C?

The definitive grammar is of course the one in the ANSI standard. A fleshed-out, working instance of the ANSI grammar (due to Jeff Lee) is on uunet (see question 17.12) in usenet/net.sources/ansi.c.grammar.Z (including a companion lexer). The FSF's GNU C compiler contains a grammar, as does the appendix to K&R II.

References: ANSI Sec. A.2 .

17.26: I need code to parse and evaluate expressions.

Two available packages are "defunc," posted to comp.sources.misc in December, 1993 (V41 i32,33), to alt.sources in January, 1994, and available from sunsite.unc.edu in pub/packages/development/libraries/defunc-1.3.tar.Z; and "parse," at lamont.ldgo.columbia.edu.

17.27: I need a sort of an "approximate" `strcmp` routine, for comparing two strings for close, but not necessarily exact, equality.

The traditional routine for doing this sort of thing involves the "soundex" algorithm, which maps similar-sounding words to the same numeric codes. Soundex is described in the Searching and Sorting volume of Donald Knuth's classic The Art of Computer Programming.

17.28: How can I find the day of the week given the date?

Use mktime (see questions 12.6 and 12.7), or Zeller's congruence, or see the sci.math FAQ list, or try this code posted by Tomohiko Sakamoto:

    dayofweek(y, m, d)      /* 0 = Sunday */
    int y, m, d;            /* 1 <= m <= 12,  y > 1752 or so */
    {
            static int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
            y -= m < 3;
            return (y + y/4 - y/100 + y/400 + t[m-1] + d) % 7;
    }
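
The mktime approach might be sketched as follows. The function dayofweek_mktime is a hypothetical name, and the choice of noon is an assumption meant to keep the time away from any daylight-saving changeover. Unlike the code above, this version is limited to the range of dates which the implementation's time_t can represent.

```c
#include <time.h>

/* Return the day of the week (0 = Sunday) for the given date,
   or -1 if mktime cannot represent it.  1 <= m <= 12. */
int dayofweek_mktime(int y, int m, int d)
{
	struct tm tm = {0};

	tm.tm_year = y - 1900;
	tm.tm_mon = m - 1;
	tm.tm_mday = d;
	tm.tm_hour = 12;	/* noon: well clear of DST transitions */
	tm.tm_isdst = -1;	/* let mktime work out whether DST applies */

	if (mktime(&tm) == (time_t)-1)
		return -1;
	return tm.tm_wday;	/* filled in by mktime */
}
```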

17.29: Will 2000 be a leap year? Is `(year % 4 == 0)` an accurate test for leap years?

Yes and no, respectively. The full expression for the Gregorian calendar is

        year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)

See a good astronomical almanac or other reference for details.
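
Wrapped up as a function (the name is_leap_year is hypothetical), the test might look like this:

```c
/* Return 1 if year is a leap year in the Gregorian calendar,
   0 otherwise. */
int is_leap_year(int year)
{
	return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}
```

The simple year % 4 == 0 test gives the same answer for every year from 1901 through 2099, but is wrong for 1900 and 2100.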

17.30: How do you pronounce "char"?

You can pronounce the C keyword "char" in at least three ways: like the English words "char," "care," or "car;" the choice is arbitrary.

17.31: What's a good book for learning C?

Mitch Wright maintains an annotated bibliography of C and Unix books; it is available for anonymous ftp from ftp.rahul.net in directory pub/mitch/YABL.

This FAQ list's editor maintains a collection of previous answers to this question, which is available upon request.

17.32: Are there any C tutorials on the net?

There are at least two of them:

"Notes for C programmers," by Christopher Sawtell, available from:

svr-ftp.eng.cam.ac.uk:misc/sawtell_C.shar
garbo.uwasa.fi:/pc/c-lang/c-lesson.zip
paris7.jussieu.fr:/contributions/docs

Tim Love's "C for Programmers," available from svr-ftp.eng.cam.ac.uk in the misc directory.

17.33: Where can I get extra copies of this list? What about back issues?

For now, just pull it off the net; it is normally posted to comp.lang.c on the first of each month, with an Expires: line which should keep it around all month. An abridged version is also available (and posted), as is a list of changes accompanying each significantly updated version. These lists can also be found in the newsgroups comp.answers and news.answers . Several sites archive news.answers postings and other FAQ lists, including this one; two sites are rtfm.mit.edu (directory pub/usenet), and ftp.uu.net (directory usenet). The archie server should help you find others; query it for "prog C-faq". See the meta-FAQ list in news.answers for more information; see also question 17.12.

This list is an evolving document of questions which have been Frequent since before the Great Renaming, not just a collection of this month's interesting questions. Older copies are obsolete and don't contain much, except the occasional typo, that the current list doesn't.
