
Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

1. Integer types

In order to declare a variable, you have to specify a type, which controls both how much space the variable takes up and how the bits stored within it are interpreted in arithmetic operators.

The standard C integer types are:[1]

    Name         Typical size    Signed by default?
    char         8 bits          Unspecified
    short        16 bits         Yes
    int          32 bits         Yes
    long         32 bits         Yes
    long long    64 bits         Yes

The typical size is for 32-bit architectures like the Intel i386. Some 64-bit machines might have 64-bit ints and longs, and some prehistoric computers had 16-bit ints. Particularly bizarre architectures might have even wilder bit sizes, but you are not likely to see this unless you program vintage 1970s supercomputers. Some compilers also support a long long type that is usually twice the length of a long (e.g. 64 bits on i386 machines); this may or may not be available if you insist on following the ANSI specification strictly. The general convention is that int is the most convenient size for whatever computer you are using and should be used by default.
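Because these sizes vary between machines, it can be worth checking what your own compiler uses. The following is a minimal sketch using sizeof and CHAR_BIT from limits.h; the function names bits_in and print_integer_sizes are made up for this example and are not part of any standard library. It assumes a C99 compiler (for long long and the %zu format).

```c
#include <limits.h>   /* CHAR_BIT: number of bits in a char */
#include <stddef.h>   /* size_t */
#include <stdio.h>

/* Width in bits of an object that occupies nbytes bytes. */
size_t
bits_in(size_t nbytes)
{
    return nbytes * CHAR_BIT;
}

/* Print the width of each standard integer type on this machine. */
void
print_integer_sizes(void)
{
    printf("char:      %zu bits\n", bits_in(sizeof(char)));
    printf("short:     %zu bits\n", bits_in(sizeof(short)));
    printf("int:       %zu bits\n", bits_in(sizeof(int)));
    printf("long:      %zu bits\n", bits_in(sizeof(long)));
    printf("long long: %zu bits\n", bits_in(sizeof(long long)));
}
```

Calling print_integer_sizes() from main on a typical 32-bit machine should give the sizes listed above; on many 64-bit Unix systems long is reported as 64 bits instead.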

Whether a variable is signed or not controls how its values are interpreted. In signed integers, the first bit is the sign bit and the rest are the value in 2's complement notation; so for example a signed char with bit pattern 11111111 would be interpreted as the numerical value -1 while an unsigned char with the same bit pattern would be 255. Most integer types are signed unless otherwise specified; an n-bit integer type has a range from $-2^{n-1}$ to $2^{n-1}-1$ (e.g. -32768 to 32767 for a short). Unsigned variables, which can be declared by putting the keyword unsigned before the type, have a range from 0 to $2^n-1$ (e.g. 0 to 65535 for an unsigned short).

For chars, whether the character is signed (-128..127) or unsigned (0..255) is at the whim of the compiler. If it matters, declare your variables as signed char or unsigned char. For storing actual characters that you aren't doing arithmetic on, it shouldn't matter.
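As a small illustration of the signed/unsigned distinction, the sketch below converts a value to unsigned char and reports whether plain char is signed on the current compiler. The helper names are hypothetical, and the 0..255 range in the comments assumes the usual 8-bit char.

```c
#include <limits.h>   /* CHAR_MIN, UCHAR_MAX */

/* Convert v to unsigned char and return the result as an int. */
/* With 8-bit chars this keeps only the low 8 bits, giving 0..255. */
int
as_unsigned_char(int v)
{
    return (unsigned char) v;
}

/* Nonzero if plain char is a signed type on this compiler. */
int
plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Here as_unsigned_char(-1) returns UCHAR_MAX, matching the 11111111 example: the same bit pattern that means -1 as a signed char means 255 as an unsigned char.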

1.1. C99 fixed-width types

C99 provides a stdint.h header file that defines integer types with known size independent of the machine architecture. So in C99, you can use int8_t instead of signed char to guarantee a signed type that holds exactly 8 bits, or uint64_t instead of unsigned long long to get a 64-bit unsigned integer type. The full set of types typically defined is int8_t, int16_t, int32_t, and int64_t for signed integers and the same starting with uint for unsigned integers. There are also types for integers that contain the fewest number of bits greater than some minimum (e.g., int_least16_t is a signed type with at least 16 bits, chosen to minimize space) or that are the fastest type with at least the given number of bits (e.g., int_fast16_t is a signed type with at least 16 bits, chosen to minimize time).

These are all defined using typedef; the main advantage of using stdint.h over defining them yourself is that if somebody ports your code to a new architecture, stdint.h should take care of choosing the right types automatically. The disadvantage is that, like many C99 features, stdint.h is not universally available on all C compilers.

If you need to print types defined in stdint.h, the larger inttypes.h header defines macros that give the corresponding format strings for printf.
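For example, the PRId32 and PRIu64 macros from inttypes.h expand to string literals that are pasted into the format string. The function below is only a sketch of the idea; format_fixed_width is a made-up name, and snprintf requires C99.

```c
#include <inttypes.h>  /* PRId32, PRIu64; also includes stdint.h */
#include <stdio.h>     /* snprintf, size_t */

/* Write "a b" into buf using the format macros for the exact types. */
/* Returns whatever snprintf returns (the length of the full output). */
int
format_fixed_width(char *buf, size_t size, int32_t a, uint64_t b)
{
    return snprintf(buf, size, "%" PRId32 " %" PRIu64, a, b);
}
```

The macros supply only the conversion letters and length modifiers, so the leading % has to be written explicitly.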

2. Integer constants

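Constant integer values in C can be written in any of four different ways:

  - In ordinary decimal notation, e.g. 0, 1, 97.
  - In octal notation, with a leading 0, e.g. 01, 0141.
  - In hexadecimal notation, with a leading 0x, e.g. 0x1, 0x61, 0xff.
  - As a character constant, which is a single character or escape sequence inside single quotes, e.g. 'a' or '\n'. Its value is the numeric code of the character, which is 97 for 'a' on ASCII systems.[2]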

Except for character constants, you can insist that an integer constant is unsigned or long by putting a u or l after it. So 1ul is an unsigned long version of 1. By default integer constants are (signed) ints. For long long constants, use ll, e.g., the unsigned long long constant 0xdeadbeef01234567ull. It is also permitted to write the l as L, which can be less confusing if the l looks too much like a 1.
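The sketch below writes the same value in decimal, octal, hexadecimal, and character-constant notation, and checks that the suffixes select the types described above. The enum and function names are invented for this example, and the character constant assumes an ASCII machine.

```c
/* The value 97 written four ways. */
enum {
    DECIMAL_97 = 97,
    OCTAL_97   = 0141,   /* leading 0 means octal: 1*64 + 4*8 + 1 */
    HEX_97     = 0x61,   /* leading 0x means hexadecimal: 6*16 + 1 */
    CHAR_97    = 'a'     /* assumes ASCII; other encodings differ */
};

/* Returns 1 if the suffixed constants have the expected types' sizes. */
int
suffixes_select_type(void)
{
    return sizeof(1) == sizeof(int)
        && sizeof(1ul) == sizeof(unsigned long)
        && sizeof(1L) == sizeof(long)
        && sizeof(1ull) == sizeof(unsigned long long);
}
```

Comparing sizes is only a rough check, since two different types can have the same size, but it is enough to show that the suffix changes the type of the constant.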

3. Integer operators

3.1. Arithmetic operators

The usual + (addition), - (negation or subtraction), and * (multiplication) operators work on integers pretty much the way you'd expect. The only caveat is that if the result lies outside of the range of whatever variable you are storing it in, it will be truncated instead of causing an error:

    unsigned char c;

    c = -1;             /* sets c = 255 */
    c = 255 + 255;      /* sets c = 254 */
    c = 256 * 1772717;  /* sets c = 0 */

This can be a source of subtle bugs if you aren't careful. The usual giveaway is that values you thought should be large positive integers come back as random-looking negative integers.
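A minimal sketch of both sides of this problem: store_in_uchar shows the truncation itself, and add_overflows is one common way to test whether an int addition would go out of range before performing it. Both function names are hypothetical, and the values in the comments assume 8-bit chars.

```c
#include <limits.h>   /* INT_MAX, INT_MIN */

/* Store v in an unsigned char and return what actually got stored. */
/* The value is reduced modulo UCHAR_MAX + 1 (256 for 8-bit chars). */
unsigned char
store_in_uchar(int v)
{
    unsigned char c = v;
    return c;
}

/* Returns 1 if a + b would fall outside the range of int. */
/* The comparisons are arranged so that they cannot overflow themselves. */
int
add_overflows(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;
    } else {
        return a < INT_MIN - b;
    }
}
```

Checking before the addition matters for signed types, because the C standard leaves the result of signed overflow undefined rather than guaranteeing wraparound.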

Division (/) of two integers also truncates: 2/3 is 0, 5/3 is 1, etc. For positive integers it will always round down.

Prior to C99, if either the numerator or denominator was negative, the behavior was unpredictable and depended on what your processor did; in practice this meant you should never use / if one or both arguments might be negative. The C99 standard specified that integer division always removes the fractional part, effectively rounding toward 0; so (-3)/2 is -1, 3/-2 is -1, and (-3)/-2 is 1.

There is also a remainder operator % with e.g. 2%3 = 2, 5%3 = 2, 27 % 2 = 1, etc. The sign of the modulus is ignored, so 2%-3 is also 2. The sign of the dividend carries over to the remainder: (-3)%2 and (-3)%(-2) are both -1. The reason for this rule is that it guarantees that y == x*(y/x) + y%x is always true.
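The C99 rules can be checked directly. In the sketch below, division_identity_holds tests the identity relating / and %, and mod_nonneg is a hypothetical helper for the common case where you want a remainder in the range 0..x-1 even when y is negative. Both assume x is nonzero (mod_nonneg assumes x is positive) and that the division itself does not overflow.

```c
/* Returns 1 if y == x*(y/x) + y%x. Requires x != 0. */
int
division_identity_holds(int y, int x)
{
    return y == x * (y / x) + y % x;
}

/* Remainder of y divided by x, adjusted to lie in 0..x-1. Requires x > 0. */
int
mod_nonneg(int y, int x)
{
    int r = y % x;      /* has the sign of y, or is zero */

    if (r < 0) {
        r += x;
    }
    return r;
}
```

For example, (-3) % 2 is -1 under C99, while mod_nonneg(-3, 2) returns 1.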

3.2. Bitwise operators

In addition to the arithmetic operators, integer types support bitwise logical operators that apply some Boolean operation to all the bits of their arguments in parallel. What this means is that the i-th bit of the output is equal to some operation applied to the i-th bit(s) of the input(s). The bitwise logical operators are ~ (bitwise negation: used with one argument as in ~0 for the all-1's binary value), & (bitwise AND), | (bitwise OR), and ^ (bitwise XOR, i.e. sum mod 2). These are mostly used for manipulating individual bits or small groups of bits inside larger words, as in the expression x & 0x0f, which extracts the bottom four bits stored in x and zeros out the rest.

Examples:

    x       y       expression    value
    0011    0101    x&y           0001
    0011    0101    x|y           0111
    0011    0101    x^y           0110
    0011    0101    ~x            1100
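The same operations can be tried out in code. This is a small sketch with made-up helper names; it uses unsigned int, so ~ flips every bit of the word rather than just four, and the results are masked with 0x0f where only the bottom four bits are of interest.

```c
/* Keep only the bottom four bits of x. */
unsigned int
low_nibble(unsigned int x)
{
    return x & 0x0fu;
}

/* Zero out the bottom four bits of x, keeping everything else. */
unsigned int
clear_low_nibble(unsigned int x)
{
    return x & ~0x0fu;
}

/* Flip the bottom four bits of x, keeping everything else. */
unsigned int
flip_low_nibble(unsigned int x)
{
    return x ^ 0x0fu;
}
```

With x = 0xab, low_nibble gives 0x0b, clear_low_nibble gives 0xa0, and flip_low_nibble gives 0xa4.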

The shift operators << and >> shift the bit sequence left or right: x << y produces the value $x \cdot 2^y$ (ignoring overflow); this is equivalent to shifting every bit in x y positions to the left and filling in y zeros for the missing positions. In the other direction, x >> y produces the value $\lfloor x \cdot 2^{-y} \rfloor$, by shifting x y positions to the right. The behavior of the right shift operator depends on whether x is unsigned or signed; for unsigned values, it always shifts in zeros from the left end; for signed values, most compilers shift in additional copies of the leftmost bit (the sign bit), although the C standard leaves the result for negative x up to the implementation. Shifting in copies of the sign bit makes x >> y have the same sign as x if x is signed.

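If y is negative, or is at least as large as the number of bits in x, the behavior is undefined in standard C, so an expression like x << -2 cannot be relied on to act as x >> 2. To shift in the other direction, use the other operator with a nonnegative count.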

Examples (unsigned char x):

    x           y    x << y      x >> y
    00000001    1    00000010    00000000
    11111111    3    11111000    00011111
    10111001    2    11100100    00101110

Examples (signed char x):

    x           y    x << y      x >> y
    00000001    1    00000010    00000000
    11111111    3    11111000    11111111
    10111001    2    11100100    11101110
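The unsigned examples can be reproduced with the sketch below. It assumes the C99 type uint8_t from stdint.h is available; shl8 and shr8 are hypothetical names. Because C promotes small integer types to int before shifting, the result of the left shift is cast back to uint8_t to discard any bits pushed past the eighth position.

```c
#include <stdint.h>   /* uint8_t */

/* Shift an 8-bit value left by y positions (0 <= y <= 7), */
/* discarding any bits that fall off the top. */
uint8_t
shl8(uint8_t x, unsigned int y)
{
    return (uint8_t) (x << y);
}

/* Shift an 8-bit value right by y positions (0 <= y <= 7). */
/* x is unsigned, so zeros are shifted in from the left. */
uint8_t
shr8(uint8_t x, unsigned int y)
{
    return (uint8_t) (x >> y);
}
```

For instance, shl8(0xff, 3) is 0xf8 (11111000) and shr8(0xff, 3) is 0x1f (00011111).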

Shift operators are often used with bitwise logical operators to set or extract individual bits in an integer value. The trick is that (1 << i) contains a 1 in the i-th least significant bit and zeros everywhere else. So x & (1<<i) is nonzero if and only if x has a 1 in the i-th place. This can be used to print out an integer in binary format (which standard printf won't do):

    #include <stdio.h>

    /* print n in binary, most significant bit first */
    void
    print_binary(unsigned int n)
    {
        unsigned int mask = 0;

        /* this grotesque hack creates a bit pattern 1000... */
        /* regardless of the size of an unsigned int */
        mask = ~mask ^ (~mask >> 1);

        for(; mask != 0; mask >>= 1) {
            putchar((n & mask) ? '1' : '0');
        }
    }

(See test_print_binary.c for a program that uses this.)

In the other direction, we can set the i-th bit of x to 1 by doing x | (1 << i) or to 0 by doing x & ~(1 << i). See C/BitExtraction for applications of this to build arbitrarily-large bit vectors.
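Wrapped up as functions, the three basic single-bit operations might look like the following sketch. The names set_bit, clear_bit, and test_bit are chosen for this example; i counts from 0 at the least significant bit and must be less than the number of bits in an unsigned int.

```c
/* Return x with bit i set to 1. */
unsigned int
set_bit(unsigned int x, unsigned int i)
{
    return x | (1u << i);
}

/* Return x with bit i set to 0. */
unsigned int
clear_bit(unsigned int x, unsigned int i)
{
    return x & ~(1u << i);
}

/* Return 1 if bit i of x is 1, and 0 otherwise. */
int
test_bit(unsigned int x, unsigned int i)
{
    return (x & (1u << i)) != 0;
}
```

Using 1u rather than 1 keeps the shift in unsigned arithmetic, which avoids trouble when i selects the top bit.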

3.3. Logical operators

To add to the confusion, there are also three logical operators that work on the truth-values of integers, where 0 is defined to be false and anything else is defined to be true. These are && (logical AND), || (logical OR), and ! (logical NOT). The result of any of these operators is always 0 or 1 (so !!x, for example, is 0 if x is 0 and 1 if x is anything else). The && and || operators evaluate their arguments left-to-right and ignore the second argument if the first determines the answer (this is one of the few places in C where argument evaluation order is specified); so

    0 && execute_programmer();
    1 || execute_programmer();

is in a very weak sense perfectly safe code to run.

Watch out for confusing & with &&. The expression 1 & 2 evaluates to 0, but 1 && 2 evaluates to 1. The statement 0 & execute_programmer(); is also unlikely to do what you want.
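The difference in evaluation can be observed by counting calls to a function with a side effect. This is only a sketch: noisy and the calls_with_* functions are invented for the demonstration.

```c
static int calls;   /* how many times noisy() has run */

/* Record that we were called, then return value unchanged. */
static int
noisy(int value)
{
    calls++;
    return value;
}

/* Number of times noisy() runs while evaluating first && noisy(1). */
int
calls_with_logical_and(int first)
{
    calls = 0;
    if (first && noisy(1)) {
        /* nothing to do; only the side effect matters */
    }
    return calls;
}

/* Number of times noisy() runs while evaluating first || noisy(1). */
int
calls_with_logical_or(int first)
{
    calls = 0;
    if (first || noisy(1)) {
        /* nothing to do; only the side effect matters */
    }
    return calls;
}

/* Number of times noisy() runs while evaluating first & noisy(1). */
int
calls_with_bitwise_and(int first)
{
    int result;

    calls = 0;
    result = first & noisy(1);
    (void) result;      /* the value itself is not needed */
    return calls;
}
```

With first equal to 0, the && version never calls noisy, but the & version always does, since & has no short-circuit behavior.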

Yet another logical operator is the ternary operator ?:, where x ? y : z equals the value of y if x is nonzero and z if x is zero. Like && and ||, it only evaluates the arguments it needs to:

    fileExists(badFile) ? deleteFile(badFile) : createFile(badFile);

Most uses of ?: are better done using an if-then-else statement (C/Statements).
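One place where ?: does read naturally is choosing between two values inside an expression. The sketch below shows the same computation both ways; the function names are hypothetical.

```c
/* Larger of a and b, written with the ternary operator. */
int
max_ternary(int a, int b)
{
    return a > b ? a : b;
}

/* The same computation written as an if-then-else statement. */
int
max_if(int a, int b)
{
    if (a > b) {
        return a;
    } else {
        return b;
    }
}
```

The two functions return the same result for every input; which form is clearer is largely a matter of taste and of how complicated the branches are.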

3.4. Relational operators

Logical operators usually operate on the results of relational operators or comparisons: these are == (equality), != (inequality), < (less than), > (greater than), <= (less than or equal to) and >= (greater than or equal to). So, for example,

    if(size >= MIN_SIZE && size <= MAX_SIZE) {
        puts("just right");
    }

tests if size is in the (inclusive) range [MIN_SIZE..MAX_SIZE].

Beware of confusing == with =. The code

    /* DANGER! DANGER! DANGER! */
    if(x = 5) {
        ...

is perfectly legal C, and will set x to 5 rather than testing if it's equal to 5. Because 5 happens to be nonzero, the body of the if statement will always be executed. This error is so common and so dangerous that gcc will warn you about any tests that look like this if you use the -Wall option. Some programmers will go so far as to write the test as 5 == x just so that if their finger slips, they will get a syntax error on 5 = x even without special compiler support.

4. Input and output

To input or output integer values, you will need to convert them from or to strings. Converting from a string is easy using the atoi or atol functions declared in stdlib.h; these take a string as an argument and return an int or long, respectively.[3]

Output is usually done using printf (or sprintf if you want to write to a string without producing output). Use the %d format specifier for ints, shorts, and chars that you want the numeric value of, %ld for longs, and %lld for long longs.

A contrived program that uses all of these features is given below:

    #include <stdio.h>
    #include <stdlib.h>

    /* This program can be used to show how atoi etc. handle overflow. */
    /* For example, try "overflow 1000000000000". */
    int
    main(int argc, char **argv)
    {
        char c;
        int i;
        long l;
        long long ll;

        if(argc != 2) {
            fprintf(stderr, "Usage: %s n\n", argv[0]);
            return 1;
        }

        c = atoi(argv[1]);
        i = atoi(argv[1]);
        l = atol(argv[1]);
        ll = atoll(argv[1]);

        printf("char: %d  int: %d  long: %ld  long long: %lld\n", c, i, l, ll);

        return 0;
    }
overflow.c
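The atoi family gives no way to tell whether the input was out of range or was not a number at all. When that matters, the standard strtol function from stdlib.h can be used instead, since it reports both conditions. The wrapper below is a sketch of one way to use it; parse_long is a made-up name and its return convention is just one possible design.

```c
#include <errno.h>    /* errno, ERANGE */
#include <stdlib.h>   /* strtol */

/* Parse s as a decimal long. On success, store the value in *out */
/* and return 1. Return 0 if s is empty, contains anything other */
/* than a number, or is out of range for a long. */
int
parse_long(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);

    if (end == s || *end != '\0') {
        return 0;       /* no digits at all, or trailing junk */
    }
    if (errno == ERANGE) {
        return 0;       /* too big or too small for a long */
    }

    *out = value;
    return 1;
}
```

A caller would typically print a usage message and exit when parse_long returns 0, instead of continuing with a garbage value.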

5. Alignment

Modern CPU architectures typically enforce alignment restrictions on multi-byte values, which means that the address of an int or long typically has to be a multiple of 4. This is an effect of the memory being organized as groups of 32 bits that are written in parallel instead of 8 bits. Such restrictions are not obvious when working with integer-valued variables directly, but will come up when we talk about pointers in C/Pointers.
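One place where alignment becomes visible without pointers is in the padding a compiler inserts inside structs. The sketch below assumes a C11 compiler (for the _Alignof operator) and an implementation that provides uintptr_t; the struct and function names are invented for illustration.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uintptr_t */

/* A char followed by an int: the compiler normally inserts padding */
/* after c so that i starts at a suitably aligned address. */
struct char_then_int {
    char c;
    int i;
};

/* Alignment required for an int on this machine (C11). */
size_t
int_alignment(void)
{
    return _Alignof(int);
}

/* Offset of the int member within struct char_then_int. */
size_t
int_member_offset(void)
{
    return offsetof(struct char_then_int, i);
}

/* Returns 1 if address p is a multiple of alignment. */
int
is_aligned(const void *p, size_t alignment)
{
    return (uintptr_t) p % alignment == 0;
}
```

On machines where int must be 4-byte aligned, int_member_offset() typically returns 4 rather than 1, because three bytes of padding follow the char.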



  1. The long long type wasn't added to the language officially until C99, but was supported by most compilers anyway.

  2. Certain ancient versions of C ran on machines with a different character set encoding, like EBCDIC. The C standard does not guarantee ASCII encoding.

  3. C99 also provides atoll for converting to long long.


2014-06-17 11:57