[FrontPage] [TitleIndex] [WordIndex

Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

Congratulations! You now know everything there is to know about programming in C. Now what do you do?

My recommendation would be the following: learn C++, since you know 75% of it already, and you will be able to escape from some (but not all) of the annoying limitations of C. And learn a scripting language you can be comfortable with, for writing programs quickly where performance isn't the main requirement.

1. What's wrong with C

During CS223, we had to work around fundamental limitations in C on several occasions.

C doesn't have a garbage collector

Many modern program languages will detect and free unreachable data for you automatically. C doesn't, so the programmer has to spend a lot of time worrying about when and by whom data allocated with malloc will be passed to free. Not only does this create many possibilities for error, but it also means that certain kinds of data structures in which a single component of the data structure is pointed to by an unpredictable number of other components are difficult to write in C, since it's hard to tell when it is safe to free a component. Garbage-collected languages avoid all of these problems at a slight cost in performance. Though there exists a garbage collector for C/C++ http://www.hpl.hp.com/personal/Hans_Boehm/gc/, it isn't 100% portable and may not work as well as a built-in collector.

C doesn't support any kind of polymorphism

Polymorphism is when a function can work on more than one data type. The closest C can do is either parameterized macros (see C/Macros), heavy use of void * and function pointers as in qsort, or various nasty hacks where code is automatically generated with type names filled in from a base template (see C/Macros again). Most modern programming languages have some sort of support for polymorphism, allowing you to write, for example, a generic sorting routine without resorting to void *-like departures from the type system.

C doesn't have exceptions
Exceptions are a mechanism for doing non-standard returns from a function when something blows up, which get caught using an "exception handler" that is often separate from the code handling the normal return values and which can often be used to catch exceptions from a variety of sources. Instead, C requires function writers to invent and document an ad-hoc protocol for indicating bad outcomes for every function they write, and requires function users to remember to test for bad return values. Most programmers are too lazy to do this all the time, leading to undetected run-time errors. Most modern programming languages fix this.
C doesn't support object-oriented programming very well

"Object-oriented" is a buzzword with many possible meanings (but see http://c2.com/cgi/wiki?HeInventedTheTerm). However, at minimum it means that in addition to supporting polymorphism (described above), your language should support strong encapsulation (controlling who can get at the internal data of an object) and inheritance (allowing one abstract data type to be defined by extending another). You can fake most of these things in C if you try hard enough (see C/FunctionPointers for some examples), but it is always possible to muck around with internal bits of things just because of the unlimited control C gives you over the environment. This can quickly become dangerous in large software projects.

C provides only limited support for avoiding namespace collisions

In a large C program, it's impossible to guarantee that my eat_leftovers function exported from leftovers.c doesn't conflict with your eat_leftovers function in cannibalism.c. A mediocre solution is to use longer names: leftovers_eat_leftovers vs cannibalism_eat_leftovers, and one can also play games with function pointers and global struct variables to allow something like leftovers.eat_leftovers vs cannibalism.eat_leftovers. Most modern programming languages provide an explicit package or namespace mechanism to allow the programmer to control who sees what names where.

2. What C++ fixes

On the above list, C++ fixes everything except the missing garbage collector. If you want to learn C++, you should get a copy of The C++ Programming Language, by Bjarne Stroustrup, which is the definitive reference manual. But you can get a taste of it from several on-line tutorials:

3. Other C-like languages to consider

C syntax has become the default for new programming languages targeted at a general audience. So if you haven't run into them already, you should check out Java (http://java.sun.com/) and/or C# (http://msdn/microsoft.com/vcsharp), both of which fix the many misfeatures of C (including the lack of a garbage collector and bounds checks on arrays) while retaining much of the flavor of C.

4. Scripting languages

Much current programming is done in so-called scripting languages like Perl, Python, PHP, Visual Basic, Tcl, etc. These are generally interpreted languages similar to Lisp or Scheme under the hood, with dynamic typing (type information is carried along with values, so type errors are detected only at runtime but polymorphism is provided automatically), garbage collectors, and support for many advanced programming features like objects and anonymous functions. What distinguishes scripting languages from the Lisp-like languages is that the syntax is generally more accessible to newcomers and the language runtime usually comes with very large libraries providing built-in tools for doing practical programming tasks like parsing odd input formats and interfacing to databases and network services. The result is that common programming tasks can be implemented using very few lines of code, at a slight cost in performance.

Let's look at an example in two common scripting languages, Perl and Python.

Here's a solution to CS223/Assignments/HW02, which asks you to find all the palindromes on stdin and report the first non-matching character for any non-palindrome. This version is written in Perl (http://www.perl.org):

# For each line in stdin, print PALINDROME if it is a palindrome, or index of
# the first non-matching character otherwise.

while(<>) {
    chomp;              # remove trailing newline
    if($_ eq reverse $_) {
        print "PALINDROME\n";
    } else {
        for $i (0..length($_) - 1) {
            if(substr($_, $i, 1) ne substr($_, length($_) - $i - 1, 1)) {
                print $i, "\n";

The things to notice about Perl is that the syntax is deliberately very close to C (with some idiosyncratic extensions like putting $ on the front of all variable names), and that common tasks like reading all input lines get hidden inside default constructions like while(<>) and the $_ variable that functions with no arguments like chomp operate on by default. This can allow for very compact but sometimes very incomprehensible code.

Here's a version in Python (http://www.python.org):

   1 #!/usr/bin/python
   3 """For each line in stdin, print PALINDROME if it is a palindrome, or index of
   4 the first non-matching character otherwise."""
   6 import sys
   8 for line in sys.stdin:
   9     line = line.rstrip('\n')         # remove trailing newline
  10     if line == line[::-1]:
  11         print "PALINDROME"
  12     else:
  13         mismatches = [ i for i in range(len(line)) if line[i] != line[-(i+1)] ]
  14         print min(mismatches)

Here the syntax is a little more alien if you are used to C: Python doesn't use curly braces for block structure, using indentation instead. The code above uses some other odd features of the language, such as the ability to take "slices" of sequence variables like strings (the expression line[::-1] means "take all elements of line starting from the obvious default starting point (the empty string before the first colon) to the obvious default ending point (the empty string before the second colon) stepping backwards one character at a time (the -1)), a feature the language adopted from array-processing languages like MatLab; and the ability to do list comprehensions (the large expression assigned to mismatches), a feature that Python adopted from Haskell and that Haskell adopted from set theory.

What these gain in short code length they lose in speed; run times on /usr/share/dict/words in the Zoo are







Note that for Perl and Python much of the cost is just the time to start the interpreter and parse the script, but factors of 10-100 are not unusual slowdowns when moving from C to a scripting language. The selling point of these languages is that in many applications run time is not as critical as ease and speed of implementation.

As an even shorter example, if you just want to print all the palindromes in a file, you can do that from the command line in one line of Perl, e.g:

$ perl -ne 'chomp; print $_, "\n" if($_ eq reverse $_)' < /usr/share/dict/words


2014-06-17 11:57