
Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

You will need to learn some IA-32 assembly language programming for CS422. This document starts with pointers to IA-32 assembly language documentation and then continues with some specific details that might be more directly relevant to the course.

1. What you should read instead of this page

My recommendation is to start with Kai Li's notes on IA-32 programming at http://www.cs.princeton.edu/courses/archive/fall06/cos318/docs/pc-arch.html.

Then move on to the official IA-32 Intel architecture software developer's manuals.

One trap in all of this is that gas, the GNU assembler, uses a different syntax for assembly language from the Intel style. See the gas manual for an extensive discussion of this.

A less authoritative guide to x86 assembly written in gas syntax can be found at http://en.wikibooks.org/wiki/X86_Assembly.

Two helpful gcc tricks:

  1. You can find out what the C compiler turns a given chunk of C code into using gcc -S file.c.

  2. You can make use of assembly-language code inside C code using the asm syntax in gcc. (Note that you may need to provide extra directives to the assembler if you are coding for unusual targets, e.g. 16-bit mode boot loaders should include asm(".code16gcc"); at the top of every C file). With sufficiently clever use of this feature you can keep most of your code in C and use assembly only for very specific low-level tasks (like manipulating segment registers, calling BIOS routines, or executing special-purpose instructions that never show up as a result of normal C code like int or iret). See the gcc documentation for more on using the asm mechanism.
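As a rough illustration of the second trick, the sketch below wraps a single instruction in a C function using gcc's extended asm syntax. The function name add_with_asm is made up for this example, and the fallback branch is there only so that the file still compiles on non-x86 machines.

```c
#include <stdint.h>

/* Hypothetical helper (not from the course code): add b to a using a
 * single x86 addl instruction. */
uint32_t add_with_asm(uint32_t a, uint32_t b)
{
#if defined(__i386__) || defined(__x86_64__)
    /* AT&T operand order: source first, destination second.
     * "+r" keeps a in a register as both input and output;
     * "r" puts b in a register; "cc" says the flags are clobbered. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    /* Fallback for non-x86 targets so the sketch still compiles. */
    return a + b;
#endif
}
```

On an x86 machine, running gcc -S on a file containing this function should show the addl instruction passed through into the generated assembly, which ties the two tricks together.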

http://devpit.org/wiki/Compiler_Options_for_Creating_Odd_Binaries has some nice discussion of how to generate unusual binaries using gcc and ld.

2. Real mode x86 programming

For the first few assignments you will be working in real mode, which is the x86 architecture's mode that emulates a vintage 1978 8086 CPU. The main advantage of real mode is that you have a flat 20-bit address space running from 0x00000 to 0xFFFFF and no memory management or protection issues to worry about. The disadvantage is that you only have 16-bit address registers to address this 20-bit space.

The trick that Intel's engineers came up with to handle this problem was to use segmented addressing. In addition to the four 16-bit data registers AX, BX, CX, and DX and the four 16-bit address registers BP, SP, DI, and SI there are four 16-bit segment registers (later extended to six) CS, DS, ES, and SS. Addresses in real mode are obtained by combining a 16-bit segment with a 16-bit offset by the rule 0x10*segment+offset. This operation is commonly written with a colon, so for example the physical address of the top of the stack is SS:SP = 0x10*SS+SP.
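The 0x10*segment+offset rule can be sketched in C as follows. The function name phys_addr is made up for this example, and masking the result to 20 bits is an assumption that matches the original 8086, where addresses wrap around at 1 MiB.

```c
#include <stdint.h>

/* Real-mode address translation: physical = 0x10 * segment + offset.
 * Assumption: the result wraps at 20 bits, as on the original 8086. */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFF;
}
```

Note that many segment:offset pairs name the same byte: phys_addr(0x07C0, 0x0000) and phys_addr(0x0000, 0x7C00) both give 0x7C00.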

2.1. Opcodes and operands

Instructions typically operate on one or two registers, immediate values (i.e. constants), or memory locations. An instruction is written as an opcode followed by its operands separated by commas.

Perhaps the most useful opcode is mov, equivalent to an assignment. It comes in several flavors depending on the size of the value you are moving: movb = 1 byte, movw = 2 bytes, movl = 4 bytes. If you don't add the size tag the assembler picks one based on the size of the destination operand. In AT&T syntax as used in gas the first operand is the source, the second is the destination (this is backwards from Intel syntax).

Here is movw conjugated with various addressing modes:

    movw $4, %ax     # copy the constant 0x04 into AX
    movw 4, %ax      # copy the contents of memory location DS:0x04 into AX
    movw %ax, %bx    # copy the contents of AX to BX
    movw (%si), %ax  # copy the contents of memory location pointed to by SI to AX
    movw 4(%bp), %ax # copy the contents of location at SS:BP+4 to AX
    movw %ax, %es:12(%si)  # copy AX to location ES:SI+12

Note that most of the time we don't bother specifying the segment register, but instead take the default: CS for instructions, SS for the stack (push, pop, anything using BP or SP), and DS for everything else except string instruction destinations, which use ES. But we can always specify the segment register explicitly as in the last example.
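To make the memory operands concrete, here is a small C model of what a 16-bit load from memory does. The name load16 and the mem array are invented for this sketch. It assumes little-endian byte order (which is how x86 stores words), a 16-bit offset that wraps within the segment, and a 1 MiB memory that wraps at 20 bits; a word that straddles the very end of a segment is not modeled faithfully.

```c
#include <stdint.h>

/* Read the 16-bit little-endian word at segment:(base + disp).
 * mem must point to at least 1 MiB (0x100000 bytes). */
uint16_t load16(const uint8_t *mem, uint16_t segment,
                uint16_t base, int16_t disp)
{
    /* The offset is computed in 16 bits, so it wraps at 64 KiB. */
    uint16_t offset = (uint16_t)(base + disp);
    /* Combine segment and offset, wrapping at 20 bits. */
    uint32_t lo = (((uint32_t)segment << 4) + offset) & 0xFFFFF;
    uint32_t hi = (lo + 1) & 0xFFFFF;
    /* Low byte is stored first, high byte second. */
    return (uint16_t)(mem[lo] | (mem[hi] << 8));
}
```

In this model, movw 4(%bp), %ax corresponds to ax = load16(mem, ss, bp, 4), movw (%si), %ax to ax = load16(mem, ds, si, 0), and movw 4, %ax to ax = load16(mem, ds, 0, 4).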

Arithmetic operations follow the pattern for mov, e.g.

    addw %ax, %bx    # add AX to BX (in C: bx += ax)
    incw 4(%bp)      # increment *(SS:BP+4)
    cmpw %cx, %dx    # compare DX with CX: computes DX-CX, sets flags, throws away result

Control flow is handled by jump instructions. Targets are labels which are followed by a colon (think goto in C):

loop:
    jmp loop         # very fast loop

Conditional jumps are often more useful:

    movw $0, %ax
loop16:
    incw %ax
    cmpw $10, %ax    # compare AX with the constant 10 (decimal)
    jle loop16       # jump if AX <= 10
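Conditional jumps work by testing flag bits left behind by an earlier instruction such as cmp. The sketch below models this in C for 16-bit operands. The struct and function names are invented for illustration, only the four flags relevant to comparisons are tracked, and run_loop assumes the loop above is meant to count while AX <= 10, reading $10 as decimal the way gas does.

```c
#include <stdint.h>

/* The flags that a comparison sets (a subset of the real FLAGS register). */
struct cmp_flags {
    int zf;  /* zero flag: result was zero */
    int sf;  /* sign flag: top bit of the result */
    int cf;  /* carry flag: unsigned borrow */
    int of;  /* overflow flag: signed overflow */
};

/* Model of "cmpw src, dst" in AT&T order: computes dst - src,
 * sets the flags, and discards the result. */
struct cmp_flags cmpw_flags(uint16_t dst, uint16_t src)
{
    struct cmp_flags f;
    uint16_t r = (uint16_t)(dst - src);
    f.zf = (r == 0);
    f.sf = (r >> 15) & 1;
    f.cf = (dst < src);
    /* Signed overflow: operands differ in sign and the result's
     * sign differs from dst's sign. */
    f.of = (((dst ^ src) & (dst ^ r)) >> 15) & 1;
    return f;
}

/* jle is taken when ZF is set or SF differs from OF (signed <=). */
int jle_taken(struct cmp_flags f)
{
    return f.zf || (f.sf != f.of);
}

/* C rendering of the counting loop: returns the final value of AX. */
uint16_t run_loop(void)
{
    uint16_t ax = 0;                            /* movw $0, %ax  */
    do {
        ax++;                                   /* incw %ax      */
    } while (jle_taken(cmpw_flags(ax, 10)));    /* cmpw $10, %ax; jle */
    return ax;
}
```

Under these assumptions the loop body runs 11 times and falls through with AX equal to 11, the first value for which the comparison fails.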

Two specialized jumps are call and ret, which are used for procedure calls and returns. These push or pop the IP register on the stack as appropriate. The int and iret instructions are slightly more complicated variants of these used for simulating interrupts; we'll run into these more later.
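The stack traffic behind call and ret can be modeled with a toy machine in C. Everything here is hypothetical scaffolding: struct machine, push16, pop16, do_call, and do_ret are names made up for the sketch, the stack segment is represented as a bare 64 KiB array indexed by SP, and only near calls (which save IP alone) are covered.

```c
#include <stdint.h>

/* Toy machine: just IP, SP, and the 64 KiB stack segment. */
struct machine {
    uint16_t ip;
    uint16_t sp;
    uint8_t  stack[0x10000];   /* bytes at SS:0x0000 .. SS:0xFFFF */
};

/* push: decrement SP by 2, then store the word little-endian at SS:SP. */
void push16(struct machine *m, uint16_t value)
{
    m->sp = (uint16_t)(m->sp - 2);
    m->stack[m->sp] = (uint8_t)(value & 0xFF);
    m->stack[(uint16_t)(m->sp + 1)] = (uint8_t)(value >> 8);
}

/* pop: load the word at SS:SP, then increment SP by 2. */
uint16_t pop16(struct machine *m)
{
    uint16_t lo = m->stack[m->sp];
    uint16_t hi = m->stack[(uint16_t)(m->sp + 1)];
    m->sp = (uint16_t)(m->sp + 2);
    return (uint16_t)(lo | (hi << 8));
}

/* call: push the address of the instruction after the call, then jump.
 * insn_len is the length in bytes of the call instruction itself. */
void do_call(struct machine *m, uint16_t target, uint16_t insn_len)
{
    push16(m, (uint16_t)(m->ip + insn_len));
    m->ip = target;
}

/* ret: pop the saved return address back into IP. */
void do_ret(struct machine *m)
{
    m->ip = pop16(m);
}
```

For example, starting with IP = 0x0100 and SP = 0xFFFE, do_call(&m, 0x0200, 3) leaves IP = 0x0200 and SP = 0xFFFC, and a following do_ret(&m) restores SP to 0xFFFE with IP = 0x0103.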

See the documentation for many more instructions.

2014-06-17 11:58