Documentation for Txtlisp/LitLisp

This is a preliminary manual for LitLisp, a literate-programming
system based on 'Txtlisp', a general-purpose tool for embedding Lisp
code in text files.  This mechanism is language-independent.  You can
produce Fortran code if you want to.

The purpose of Txtlisp is to process text files containing embedded
Lisp.  An embedded Lisp expression is evaluated and its value is
discarded, but anything it prints appears in the output file, in place
of the Lisp expression.  The standard extension for a Txtlisp input
file is ".txl", and in what follows I refer to that file as the .txl
file.  For more documentation on Txtlisp, see the comment at the
beginning of txtlisp.lisp in the Litlisp distribution.  (Txtlisp has
one main virtue, which is that it is very simple.)

There are two complementary ways to connect the text of a paper to the
program files it is about.  One way is to include the entire text of
the code file in the paper, although not necessarily in the order
expected by the compiler of the programming language being used.  The
other approach is to keep the code files separate all along, and
to tell LitLisp which parts of the files to include in the paper. 


I Generating Code Files

In the first approach, code files are built out of _code segments_.
Code segments can occur in any order in the .txl file.  Code segments
can contain other code segments.  The mechanisms described here
unscramble them to form complete files.  (This process is called
"tangling" by the literate-programming community, which seems
backwards.)  A file is defined by a "top-level" code segment (one not
contained in any other code segment), which is known as a _file
segment_.

In a Lisp section of a .txl file, (code-seg '<name>) begins a code
segment.  (Abbreviation: !#= <name>; see below.)  The code-seg
function takes over reading the segment, which ends when the
character sequence ~~| is seen.

Inside a segment, the characters ~~, when not followed by a vertical
bar or right paren, get you back into Txtlisp mode.  The main reason
to switch into Txtlisp mode is to insert one code segment into
another, by writing (insert-seg '<name>).  (Abbreviation: !## <name>.)
This ultimately outputs some markers that are interpreted one way when
generating text, another when generating a code file.

The other "active" character inside a segment is the colon (':'),
if it is followed by '.', '(', or '{', all of which behave as
special-purpose open brackets.  Their matching closer and the pair's
significance are given in the following table:

  :( & ):  Anything between these two will be included in the code
      file, but be replaced by an ellipsis in the text
      file.  The definition of "ellipsis" depends on what kind of text
      file is being generated; for LaTeX it is "\ldots", but for most
      formats it is just "...".

  :{ & }:  These are complementary (almost): Anything in between these
      brackets is included in the text file, and omitted completely
      from the code file.  (If the text file is a '.tex' file, one can
      insert TeX or LaTeX commands this way.)

  :. &  .:  Like :{ and }:, except that the ':.' and '.:' brackets are
      included in the text file.  Furthermore, if at all possible,
      LitLisp preserves the columnar position of characters in the
      lines containing the bracket characters.  This makes them
      useful in specifying context information.  (See below.)
  
  :\ -- This is the way to include a colon followed by a '.', '(', or
      '{' without having it be perceived as one of the open brackets. 
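
The table above can be summarized as a small Python model.  This is
only a sketch of the behavior described here, not LitLisp's actual
implementation: the function name 'render', the target names "code"
and "text", and the regular expression are all assumptions made for
illustration.  The sketch ignores nested brackets and the
column-preservation rule for ':.' and '.:'.

```python
import re

# Ellipsis text per output format, as described in the table above.
ELLIPSIS = {"tex": r"\ldots", "text": "..."}

# One alternation: the ':\' escape, then each bracket pair.
# Non-greedy bodies; DOTALL lets a bracketed span cross line breaks.
_TOKEN = re.compile(
    r":\\(?=[.({])"             # ':\' protecting a following '.', '(' or '{'
    r"|:\((?P<hide>.*?)\):"     # :( ... ):  kept in code, elided in text
    r"|:\{(?P<text>.*?)\}:"     # :{ ... }:  kept in text, dropped from code
    r"|:\.(?P<ctx>.*?)\.:",     # :. ... .:  context, shown with its brackets
    re.DOTALL,
)


def render(segment, target, text_format="text"):
    """Render a segment body for target "code" or "text" (hypothetical helper)."""

    def substitute(match):
        if match.group(0) == ":\\":
            return ":"  # drop the backslash, keep a literal colon
        if match.group("hide") is not None:
            if target == "code":
                return match.group("hide")
            return ELLIPSIS[text_format]
        if match.group("text") is not None:
            return "" if target == "code" else match.group("text")
        # Context brackets: omitted from code, shown whole in the text.
        return "" if target == "code" else match.group(0)

    return _TOKEN.sub(substitute, segment)
```

For instance, render("a :(secret): b", "text") yields "a ... b", while
the same call with target "code" yields "a secret b".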
      
Here is an example of how ':.' and '.:' can be used to supply context
information.  Suppose we want to discuss the clauses of a 'cond'
separately.  We could structure the text as follows.  First comes the
introduction of the 'cond' and a discussion of the default operation:

  !#= cond-seg
  (cond ~~!## clause1 %.
  ~~    !## clause2 %.
  ~~    (t (default-op ...))) ~~

Later we might present what 'clause1' and 'clause2' look like thus:

  !#= clause1
  :. (cond .:
           ((> (foo ...) (baz ...))) ~~)

  discussion-discussion-discussion
  !#= clause2
  :. (cond .:
           ((= zz 0) ...)   ~~)

The extra occurrences of 'cond' are there just to help the reader
"see" what follows as a cond clause.  The final result output to the
code file is simply

     (cond ((> (foo ...) (baz ...)))
           ((= zz 0) ...)
           (t (default-op ...)))

The form (file-seg '<name>) defines a file segment as described
above.  (Abbreviation: !#== <name>; see below.)  The name is
associated with an actual file by a prior declaration of the form
               (code-file <name> <filespec>)
You can define several file segments with the same name, and they
will be put in the file in the order they occur in the Txtlisp file.
(This feature is of dubious value because the whole point of literate
programming is to decouple the defining file order from the "tangled"
file order; if in doubt, use just one file segment per file.)

Rather than write out 'code-seg' and 'insert-seg', one can use the
following abbreviations:

!#== <n>   =>   (file-seg '<n>)
!#= <n>    =>   (code-seg '<n>)
!## <n>    =>   (insert-seg '<n>)

Any number of left parentheses can come between the '#' and '='.
Instead of the terminating '~~|', one can write 

      ...    ~~)

and again, any number of right parentheses can follow the '~~'.  The
purpose of these conventions is to allow you to make all the parens
balance in the source file even though segments themselves need not
balance, or satisfy any other syntactic criterion. [Oops -- this is
very Lisp-centric!  Not clear how you would make this work for other
languages, or if it's important to....]

The transition between different modes can be confusing.  The
processor starts in text (or TeX) mode, switches to "Txtlisp" mode
when it sees ~~ or a dot at the start of a line, then switches to
"litlisp" mode when it sees (code-seg ...) or (file-seg ...).  Litlisp
mode resembles Txtlisp mode, in that it consists of Lisp code inside a
text file, but it is actually closer to ordinary Lisp mode.  None of
the Txtlisp conventions for producing output to *standard-output* are
in place, and the escape character is plain old backslash, not "%".
The only active escape sequences are "~~" and ":.".  The former kicks
you back into Txtlisp mode, which must itself be terminated by an
occurrence of '%.'.

In the text output file, file segments come out as

     <<File: <filename>
       ... >>

Other code segments come out as

     <<Define <segment-name>
       ... >>

A reference to a segment (produced by insert-seg) comes out as

     <<Insert: <segment-name> >>

(These double angle brackets are descended from the Noweb
literate-programming system.)

If the output file type is "tex", then the contents of a code segment
are "verbatimized" so characters such as "\" and "#" are invisible to
TeX.  A label 'cf/'<name> is associated with the code segment, but
currently it can be used only for page references.

The various code files consist of all the file segments in
order, with all their component segments substituted in,
recursively.  
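
The recursive substitution can be pictured with the following Python
sketch.  It is a simplified model, not the LitLisp code: the names
'tangle' and 'build_file', the dictionary of segment bodies, and the
convention that an insertion occupies a whole line written as
"<<Insert: name>>" are assumptions chosen for the example.  (LitLisp
itself also allows an insertion to share a line with other code.)

```python
import re

# Hypothetical model: an insertion is a line holding only "<<Insert: name>>".
INSERT = re.compile(r"^\s*<<Insert:\s*(\S+?)\s*>>\s*$")


def tangle(name, segments, _active=()):
    """Return the lines of segment `name` with all insertions expanded."""
    if name in _active:
        chain = " -> ".join(_active + (name,))
        raise ValueError("circular segment reference: " + chain)
    if name not in segments:
        raise KeyError("undefined segment: " + name)
    lines = []
    for line in segments[name].splitlines():
        match = INSERT.match(line)
        if match:
            # Substitute the named segment in place, recursively.
            lines.extend(tangle(match.group(1), segments, _active + (name,)))
        else:
            lines.append(line)
    return lines


def build_file(file_segment_names, segments):
    """Concatenate the expanded file segments in the order given."""
    lines = []
    for name in file_segment_names:
        lines.extend(tangle(name, segments))
    return "\n".join(lines)


segments = {
    "cond-seg": "(cond\n<<Insert: clause1>>\n<<Insert: clause2>>\n      (t (default-op)))",
    "clause1": "      ((> a b) 1)",
    "clause2": "      ((= zz 0) 2)",
}
code = build_file(["cond-seg"], segments)
```

The '_active' tuple is there only to report a segment that (directly
or indirectly) inserts itself, instead of recursing forever.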

Segments always start at the beginning of a line, and are generally
assumed to take an integral number of lines.  If you try to cram more
than one segment definition or insertion into one line, you'll
probably get some newlines breaking it up.  However, I've tried hard
to make sure that if you put just one insertion on a line, and define
a segment to occupy an integral number of lines, then no spurious line
breaks will be added by the LitLisp machinery.  You should be able to
create and refer to a series of insertions with no blank lines between
them.


II Extracting Code from Source Files

The alternative approach is to start with a source file and extract
fragments of it for use in the paper.  Such a file is declared by
writing

           (source-file <name> <filespec>)

Within the source file, a fragment is indicated by two bracketing
lines:

;;;;;           <<<< id
        ...
;;;;;           >>>> id

Here the ";;;;;" indicates whatever comment notation is used by the
programming language.  For instance, in a C program one would define a
fragment 'init' by writing

/*             <<<< init   */
        ...
/*             >>>> init   */

The entire source file is processed as soon as the 'source-file'
declaration is seen.  In the rest of the paper, one indicates an
occurrence of the fragment by writing

       (use-frag idx)

where "idx" is an expression that evaluates to the id, usually just
'id.  A call to 'use-frag' produces the same sort of expression as
'code-seg' --

          <<Define <fragment-name>
              ... >>

Fragments can be nested.  However, if an inner fragment is not
explicitly referred to in a (use-frag ...), there is no mention of it
in the text output file.  

For example, suppose the source file looks like this:

   ;-*- Mode: Common-lisp; Package: nisp; Readtable: ytools; -*-
   (in-package :nisp)

   ;;;;; <<<< foo
   (deffunc foo - Integer (l - (Lst Integer))
      (let-fun ()
         (sum l 0)
       :where
   ;;;;;    <<<< sum
         (:def sum - Integer (l - (Lst Integer) total - Integer)
            (cond ((null l) total)
                  (t
                   (sum (cdr l) (+ total (car l))))))
   ;;;;;    >>>> sum
         ))
   ;;;;; >>>> foo

which has two fragments.  If there is a reference to 'foo' alone:

   This algorithm is really quite ingenious:

   ~~ (use-frag 'foo) %.

then the text file will look thus:

   This algorithm is really quite ingenious:

   <<Define foo
   (deffunc foo - Integer (l - (Lst Integer))
      (let-fun ()
         (sum l 0)
       :where
         (:def sum - Integer (l - (Lst Integer) total - Integer)
            (cond ((null l) total)
                  (t
                   (sum (cdr l) (+ total (car l))))))
         ))>>

But if we refer to the inner fragment 'sum' elsewhere, as in this
case:

   The key subroutine that makes all of our software magically
   efficacious is the 'sum' routine:

   ~~ (use-frag 'sum) %.

   For instance, here is how it is used by the otherwise drab
   procedure 'foo':

   ~~ (use-frag 'foo) %.

   which now runs in sublogarithmic time.

the text file will come out thus:

   The key subroutine that makes all of our software magically
   efficacious is the 'sum' routine:
   
   <<Define sum 
         (:def sum - Integer (l - (Lst Integer) total - Integer)
            (cond ((null l) total)
                  (t
                   (sum (cdr l) (+ total (car l))))))>>

   For instance, here is how it is used by the otherwise drab
   procedure 'foo':

   <<Define foo
   (deffunc foo - Integer (l - (Lst Integer))
      (let-fun ()
         (sum l 0)
       :where
   <<Insert: sum>>
         ))>>
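
The behavior shown in these two examples can be modeled in a few lines
of Python.  This is a sketch under stated assumptions, not the way
LitLisp is written: the names 'parse_fragments' and 'render_fragment',
the rule that any line containing "<<<< id" or ">>>> id" is a bracket
line, and the 'used' set (the ids mentioned in some use-frag call
anywhere in the paper) are all hypothetical.  Indentation of the
<<Insert: ...>> marker is not reproduced.

```python
import re

# A bracket line is any line containing "<<<< id" or ">>>> id" (assumption).
BRACKET = re.compile(r"(<<<<|>>>>)\s+(\S+)")


def parse_fragments(source_lines):
    """Map each fragment id to its body: text lines plus ("frag", id) markers."""
    fragments = {}
    stack = []  # (id, body) pairs for fragments that are still open
    for line in source_lines:
        match = BRACKET.search(line)
        if match is None:
            if stack:
                stack[-1][1].append(line)
            continue
        kind, frag_id = match.groups()
        if kind == "<<<<":
            if stack:
                # Leave a marker in the enclosing fragment.
                stack[-1][1].append(("frag", frag_id))
            stack.append((frag_id, []))
        else:
            if not stack or stack[-1][0] != frag_id:
                raise ValueError("unmatched closing bracket: " + frag_id)
            open_id, body = stack.pop()
            fragments[open_id] = body
    if stack:
        raise ValueError("unclosed fragment: " + stack[-1][0])
    return fragments


def render_fragment(frag_id, fragments, used):
    """Text-file form of a fragment, given the ids used somewhere in the paper."""
    out = []

    def emit(body):
        for item in body:
            if not isinstance(item, tuple):
                out.append(item)
            elif item[1] in used:
                out.append("<<Insert: " + item[1] + ">>")
            else:
                emit(fragments[item[1]])  # never referred to: show it inline

    emit(fragments[frag_id])
    return "<<Define " + frag_id + "\n" + "\n".join(out) + ">>"


source = [
    ";;;;; <<<< foo",
    "(deffunc foo ...)",
    ";;;;;    <<<< sum",
    "   (:def sum ...)",
    ";;;;;    >>>> sum",
    "   ))",
    ";;;;; >>>> foo",
]
fragments = parse_fragments(source)
alone = render_fragment("foo", fragments, used={"foo"})
both = render_fragment("foo", fragments, used={"foo", "sum"})
```

Here 'alone' shows the body of 'sum' inline, while 'both' shows an
<<Insert: sum>> marker in its place.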

The similarity in notation between the "segment" approach and the
"fragment" approach is not accidental.  The reader of the paper should
not be able to tell which was used, assuming care is taken to include
all the source code in either case.

One issue that comes up with fragments but not segments is that
special provision must be made to allow for alternative versions of
fragments.  Suppose in a paper we want to propose one or two
preliminary versions of a fragment before revealing the one that
actually occurs in the source file.  One way to do that would be to
create a pretend "source" file to hold the preliminary fragments.
That would be an error-prone nuisance.  Instead, one can define the
fragments in the .txl file using

    (define-frag '<id>)

and then proceed as for 'file-seg'.  (The only difference between
the two is that the newly defined fragment is not associated with any
code file.)  There is a read-macro abbreviation for 'define-frag'.  
!#> is to !#== as 'define-frag' is to 'file-seg'.  

There are other facilities for altering the code as it actually
appears in the source file.  In segments, we can hide excess detail 
using the ":( ... ):" notation.  That won't work in an actual
source file.  We can declare that an inner fragment is to be omitted
entirely by writing 

      (omit-frag '<id>)

in the .txl file.  (An ellipsis will appear in place of the fragment.)
This device allows us to omit a fragment in one paper while retaining
the ability to refer to it in another.  If we want to omit it in every
paper, we can just name the fragment '_' (underscore).  Yet another way
to cause it to be omitted is via keyword arguments to 'use-frag'.
There are actually several keyword arguments that may come in handy:

    :omit (fr1 ... frN) -- causes the inner fragments fr1,...,frN 
            to be omitted in this version of a fragment
    :replace ((f1 r1) ... (fN rN)) -- causes fragment fI to be 
            replaced by rI.
    :multi-okay B -- if B is true, indicates that if this fragment has
            been used before, that's okay.  (If B is false or omitted,
            then a warning is issued for multiple uses of the same
            fragment.) 
    :source -- The name of the source-file this fragment comes from,
            as declared in a 'source-file' form.
    :show-define -- If true, the <<Define ...>> brackets are wrapped
            around the fragment as described.  If false, they are
            suppressed (so that the reader is not reminded of the
            literate-programming machinery when it's not necessary).
            The default value comes from the global variable
            frag-show-define* (initially false).
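
To make the interaction of these keywords concrete, here is a small
Python model.  It is a hedged sketch, not LitLisp's code: the function
'use_frag' below, its Python keyword names, the 'seen' set, and the
representation of a fragment body as a list of lines and ("frag", id)
markers are invented for the example.  It also assumes that an omitted
fragment is shown as "...", that each replacement rI names another
fragment, and that omission takes precedence over replacement; the
manual does not settle these points.  The :source keyword and the
<<Insert: ...>> markers are left out.

```python
import warnings


def use_frag(frag_id, fragments, seen, omit=(), replace=(),
             multi_okay=False, show_define=False):
    """Model of use-frag's keywords (hypothetical signature)."""
    if frag_id in seen and not multi_okay:
        warnings.warn("fragment %r used more than once" % frag_id)
    seen.add(frag_id)
    swaps = dict(replace)  # e.g. {"frazzle": "frazzle-alt"}
    out = []

    def emit(body):
        for item in body:
            if not isinstance(item, tuple):
                out.append(item)
                continue
            inner = item[1]
            if inner in omit or inner == "_":
                out.append("...")  # assumed ellipsis for an omitted fragment
            else:
                emit(fragments[swaps.get(inner, inner)])

    emit(fragments[frag_id])
    text = "\n".join(out)
    if show_define:
        return "<<Define " + frag_id + "\n" + text + ">>"
    return text


fragments = {
    "frizzle": ["(defun frizzle ()", ("frag", "frazzle"), ("frag", "noise"), ")"],
    "frazzle": ["  (slow-version)"],
    "frazzle-alt": ["  (fast-version)"],
    "noise": ["  (log-everything)"],
}
seen = set()
shown = use_frag("frizzle", fragments, seen,
                 omit=("noise",), replace=(("frazzle", "frazzle-alt"),))
```

A second call for 'frizzle' with the same 'seen' set issues a warning
unless multi_okay is true.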
    
There is no readmacro abbreviation for 'use-frag', mainly because it
does not switch us into a tricky special text mode (as elucidated at
the end of section I, above).  It's just a straightforward use of the
standard 'Txtlisp' pattern:

     ~~ (use-frag 'frizzle) %.

causes some text to be generated to '*standard-output*'.  A readmacro
abbreviation would just look silly trying to accommodate cases such as

     ~~ (use-frag 'frizzle
                  :multi-okay true
                  :replace '((frazzle frazzle-alt))) %.


III Combining Segments and Fragments

The two approaches are not mutually exclusive.  One can produce one
source file and consume another.  Suppose you've written an essay,
Essay A, explaining a program, and later you decide to write a paper,
Essay B, about some aspect of it.  Obviously, Essay A must contain
segments, say

~~ !#(= random-pick-delete
          // Select a random element and mark it deleted.
          int random_pick_and_delete()
          {
            // j is the ordinal of the number we have selected.
            int j = /*rangen.*/Rand.this.nextInt(num_left);
            //System.err.print("[" + j + "] ");

            Splitter tr = this;
            int bail = n;
            int left;
///// <<<< pick-loop
            while (tr.split && bail > 0)
              {
                left = tr.less.num_left;
                // Invariant: j is the ordinal w.r.t. tr.
                //System.err.print("." + j + ".");
                bail--;
                tr.num_left--;
                if (j < left)
                  {
                    //System.err.print("{<" + left + "}");
                    tr = tr.less;
                  }
                else
                  {
                    //System.err.print("{>" + left + "}");
                    j -= left;
                    tr = tr.more;
                  }
              }
///// >>>> pick-loop
            //System.err.print("." + j + ".");
            tr.num_left--;
            tr.split = true;
            tr.d = tr.lo + j;
            tr.less = new Splitter(tr.lo, tr.d);
            tr.more = new Splitter(tr.d+1, tr.hi);
            //System.err.print("[" + tr.lo + " " + tr.hi + "] ->" + tr.d);
            return tr.d;
          }
~~)
%.

Note that the first occurrence of '~~' gets us into Txtlisp mode,
where any Lisp form can be evaluated.  The only thing that we do here
is call 'code-seg', by writing !#(= <name>, thus declaring that the
following piece of Java code is a segment.  (You were expecting Lisp?)
Within this segment, we deploy the syntax for fragments, so that in
our second paper we can refer to the fragment 'pick-loop'. To use
fragments, you have to refer to a code file.  No problem; it's not
hard to figure out what file the segment 'random-pick-delete' will wind
up in.  The fragment brackets are just passed through to the code file
unchanged.  So Essay B might say

   The heart of the \texttt{random\_pick\_and\_delete} algorithm is
   the loop shown here:

   ~~ (use-frag 'pick-loop) %.

   a piece of code so clever that several lawsuits have been filed to
   protect my rights to the intellectual property embodied therein.

The file declaration in Essay A might look like

~~
...
  (code-file principal "~/research/irreproducible/fraudulent/shady.java")
...
  !#(== principal   <-= This code-file segment ultimately includes
        ...~~)          the segment 'random-pick-delete'
...
%.     

The file declaration in Essay B is then

~~
...
     (source-file important 
                  "~/research/irreproducible/fraudulent/shady.java")

%.

One might object that the fragment annotations in Essay A are a
bizarre distraction to anyone trying to make sense of the algorithm.
Not to worry; all fragment brackets are deleted from the version of
the algorithm that appears in the text.


IV Running the Program

Litlisp can be downloaded from my website, at 

http://www.cs.yale.edu/homes/dvm/#software

It depends on two other software packages, YTools and Nisp, available
at the same place.  YTools provides the basic substrate for the other
packages. Once it is installed (see the manual, at 

http://www.cs.yale.edu/homes/dvm/#documentation)

the other two can be installed by performing the following steps:

1. Download and uncompress the tar file.

2. Spill its contents into a convenient directory.

3. Start Lisp; load YTools, and type (yt-install :litlisp).  (After
   installing Nisp, which requires the same procedure, except you type
   (yt-install :nisp).)  The installation procedure will ask you two
   questions (roughly); it basically wants to know where you put the
   source files and where it should put the binaries.  The answer to
   the latter question is usually '../bin/', meaning, "Put them in a
   sibling directory whose path starts with 'bin'."

The software has to be installed only once.  Thereafter you load it by
starting Lisp, loading YTools, and then typing (yt-load :litlisp).

To run Litlisp, you usually do one of the following. If the essay is a
LaTeX file, you do
   
    (tex-litlisp "filename.txl")

If it's html, you do

    (html-litlisp "filename.txl")

(You can leave the ".txl" off; it's assumed.)

The first time the program is run on a file, it begins to build tables
of program segments and/or fragments.  The former are kept in a
special file with name filename.seg.  The latter are kept in the file
filename.lux, which is a general repository of information produced by
'Txtlisp'.  (Actually, the segment table should probably be in the
.lux file as well.)

The segment and fragment tables can't in general be produced in one
pass (except in the rare case that no segment contains a piece defined
earlier in the .txl file; and no fragment contains a piece used
_later_ in the file).  'tex-litlisp' and 'html-litlisp' will process
the file repeatedly until the tables stabilize, up to a settable
number of iterations.  (Just set the global variable
'litlisp-num-runs*', whose default value is 5.  Usually this is more
than enough.)
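
The repeat-until-stable loop can be sketched as follows.  This is an
illustration only: 'run_until_stable' and 'process_pass' are
hypothetical names, and the real tables live in the .seg and .lux
files rather than in a Python dictionary.  The limit of 5 mirrors the
default mentioned above.

```python
def run_until_stable(process_pass, max_runs=5):
    """Rerun a pass until its tables stop changing, or max_runs is reached.

    process_pass takes the tables left by the previous run and returns
    the tables produced by this run.  Returns (tables, runs, stable).
    """
    tables = {}
    for run in range(1, max_runs + 1):
        new_tables = process_pass(tables)
        if new_tables == tables:
            return new_tables, run, True
        tables = new_tables
    return tables, max_runs, False


def toy_pass(previous_tables):
    # A real pass would also write the text output, consulting
    # previous_tables to decide, say, whether an inner fragment is
    # used elsewhere.  This toy pass just reports the same uses each time.
    return {"used": frozenset({"foo", "sum"})}


tables, runs, stable = run_until_stable(toy_pass)
```

In this toy case the first run fills in the table and the second run
confirms that nothing changed, so the loop stops after two runs.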


V Debugging

Given the complex bracketing scheme used by Txtlisp/Litlisp, it is
inevitable that you will forget a bracket and something will go
awry.  Almost always what happens is that something gets processed in
Txtlisp mode that shouldn't be.  The usual symptom is that a random
word is treated as a variable by Lisp, which halts with an "unbound
variable" message.  If the word is distinctive, you can often spot
the point in the source file where it occurs and look back for a
missing close bracket.

If that doesn't work, it's necessary to bring out the heavy artillery,
which in our case is just one howitzer, the Boolean variable
'txtlisp-dbg*'.  Setting this to true causes a _lot_ of output to be
generated about what mode Txtlisp thinks it's in at every transition
in the file.  If you're patient, you can find in this output the
places where Litlisp is misclassifying some part of your file.
