A thread abstracts the normal flow of control in a program: the sequence of addresses that the IP passes through. In a multi-threaded program, we have the illusion of multiple flows of control corresponding to multiple IPs. Typically this is implemented using time-sharing, where a single IP is switched between different threads. But (usually with OS support) it may also be implemented using multiple CPU cores.
Usually when we talk about threads we are assuming multiple tasks in the same address space; a single thread isolated in its own address space generally goes by the name of a process (see Processes). Some systems (e.g. Linux) use task to refer to either a thread or a process, the relevant property being that each task has its own IP and its own stack.
1. Why use threads?
Multithreaded programs are notoriously difficult to write and debug, since having multiple threads running concurrently with unpredictable timing makes a program's behavior itself unpredictable unless the programmer is very careful. So why would we both with threads? There are basically four reasons (SGG also give four, but they're a slightly different four):
- You bought a machine with more than one CPU core, and you'd like to use all of them. If you can rewrite your program to perform non-interfering tasks in separate threads, your program will run faster.
Your program is naturally structured to allow parallel computation, and it's easiest to write it this way. There are two very common design patterns that lend themselves to parallel execution: producer-consumer pipelines (e.g. tr A-Z a-z | tr -cs "a-z'" '\n' | sort | uniq -c | sort -nr, a five-stage pipeline that finds the most common words in its input when run on Unix), and server programs (e.g. a web server handling multiple simultaneous requests).
- You want to handle multiple streams of interaction at once that share common data, but you don't want one of them to slow down the others. For example, in your GUI you don't want to wait for the user to type a key before updating the clock, and conversely the user doesn't want to wait for some excessively-complicated web page to render before typing a key. (Also the web server above, which doesn't want to get stuck reading half an HTTP request.)
- You wanted to spawn 10,000 processes, but since the difference between them was only a few kilobytes it made sense to share the common bits so your computer doesn't burn up. To a certain extent this can be handled with a good shared library implementation, but even exploiting shared libraries it's usually cheaper (both in time and memory) to spawn a thread than a whole new process.
2. Thread creation: user view
Typically the user will have some sort of thread library. On POSIX systems, the standard thread library is Pthreads. A typical call to create a thread in Pthreads looks like this:
Here thread_id is a retval that returns a handle to the thread (which can be used to do further nasty things to it), attributes is a set of thread options (set to a default value by pthread_attr_init), and function_to_run, and function_argument are the standard C-style closure consisting of a function and a void * argument. Pthreads also provides procedures for waiting for threads to finish (pthread_join), and various other useful tools like mutexes.
In object-oriented languages, threads are often implemented by inheritance from some standard thread class (e.g. Thread in Java or threading.Thread in Python). Here a thread is created by creating a new object of the appropriate class and kicking it to life with some standard run() method.
For more examples see SGG §4.3.
3. User threads vs kernel threads
Processes are typically implemented as kernel threads—threads that are implemented in and supported directly by the kernel. We can also have user threads, which don't necessarily require kernel support. For example, it's not hard to hack up a primitive threading system within a user process pretty much any modern C environment by allocating some extra stacks using malloc and abusing setjmp and longjmp to switch between them (note that this will require some machine-dependent bits to get right). Similarly, Java threads in early JVM implementations ran entirely inside the Java interpreter with no interference from or knowledge of the underlying OS. But in many cases it is helpful to have OS support for user threads. There are three basic models:
Many-to-one: The simplest model as described above. A user-space thread library manages many user threads on top of a single kernel thread. The advantage is that it doesn't require kernel support. The disadvantage is that since the kernel doesn't know about user threads it can't schedule them, at most one user thread can run at a time (since the thread library can't get at more than one CPU), and any action that blocks the single kernel thread (e.g. a blocking read system call) blocks all the user threads.
One-to-one: The kernel allocates one kernel thread for each user thread, e.g. using the clone system call in Linux. This gives essentially all the advantages might want from threads: better scheduling, concurrency across multiple CPU cores, no blocking between user threads. One slight disadvantage is that requiring kernel involvement increases costs, and may put a limit on how many user threads a process can have. Most widely-used OSs use this model.
Many-to-many: A process may have many user threads and many kernel threads, with user threads mapped to the pool of kernel threads according to various possible rules. This combines the good features of the previous two models at the cost of additional complexity.
Here's something you don't see: nested threads, where one user thread in a many-to-one model may itself be divided into further threads. The closest a standard OS will get to this is running a virtual machine, although some microkernels have the machinery to support this. Exercise: What would nested threads actually look like, and why would or wouldn't you want to support them?
4. Why not use threads?
A program that uses threads will (in general) be nondeterministic: its behavior will change from one execution to the next. This makes debugging using the standard approach of making small changes and seeing if they improve things much more difficult—how do you know if the program is really fixed or just decided to behave differently this time? It also causes trouble when two (or more) threads share data structures without using some sort of ConcurrencyControl.