TeX Internals: Overview


Writing a TeX clone has been on my wishlist for quite a while. I have been reading the source code of Knuth's TeX on and off over the last few weeks, and it seems like a good idea to document what I have learned in the process. This post is the first in a series.

Input to TeX is passed through a macro processor, which expands macros and conditionals to generate a stream of commands. TeX runs in a loop interpreting these commands. Some commands represent content like characters and rules (TeX-speak for "black rectangles"). TeX accumulates content in a tree. Each non-leaf node represents a container, such as a paragraph, a box, or a math formula. When a container ends, its children are packed according to the container's layout rules, and the results are appended to the container's parent. For example, a simple paragraph contains characters and glue (TeX-speak for "space that can stretch or shrink"). When the paragraph ends, TeX breaks it into lines, each represented as a box node, and puts these nodes into the vbox the paragraph belongs to, usually the current page.
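To make the container idea concrete, here is a minimal Python sketch of my own, not TeX's actual code. The names `Builder` and `pack` are made up for illustration, and the "line breaker" simply chops a paragraph into fixed-size chunks, whereas TeX's real one is far more sophisticated.

```python
class Builder:
    """Keeps a stack of open containers; the bottom one stands for the page."""

    def __init__(self):
        # Each entry is (kind, children). The page is modeled as a vbox.
        self.open = [("vbox", [])]

    def begin(self, kind):
        # Open a new container nested inside the current one.
        self.open.append((kind, []))

    def add(self, node):
        # Content always goes into the innermost open container.
        self.open[-1][1].append(node)

    def end(self):
        # Close the innermost container, pack its children, and hand
        # the resulting nodes to the parent container.
        kind, children = self.open.pop()
        for node in pack(kind, children):
            self.add(node)


def pack(kind, children, line_width=5):
    """Toy layout rules (an assumption of this sketch, not TeX's algorithm)."""
    if kind == "paragraph":
        # Pretend every node is one unit wide and break greedily into lines.
        return [("hbox", children[i:i + line_width])
                for i in range(0, len(children), line_width)]
    # Any other container is kept as a single node.
    return [(kind, children)]
```

For example, after `b = Builder(); b.begin("paragraph")`, adding seven characters and calling `b.end()` leaves two hbox nodes (five characters and two characters) inside the page's vbox.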

The macro processor keeps a stack of inputs, where each input is either a file or a token list, and always reads from the top of the stack. The stack initially holds only the input file. When a macro is expanded, for example, its definition (a token list) is pushed onto the stack. When the definition references a parameter, the corresponding argument (also a token list) is pushed onto the stack. A reference counting mechanism destroys unused token lists.
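The input stack can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: tokens are plain strings, macros take no parameters, and `InputStack` and `expand_all` are hypothetical names. Python's garbage collector stands in for TeX's reference counting.

```python
class InputStack:
    """Stack of token sources; reading always happens at the top."""

    def __init__(self, main_input):
        # The stack starts with just the main input (a file in real TeX).
        self.stack = [iter(main_input)]

    def push(self, tokens):
        # Called when a macro definition or an argument must be read next.
        self.stack.append(iter(tokens))

    def next_token(self):
        # Drop exhausted sources until one of them yields a token.
        while self.stack:
            try:
                return next(self.stack[-1])
            except StopIteration:
                self.stack.pop()
        return None  # all input has been consumed


def expand_all(main_input, macros):
    """Return the fully expanded token stream for parameterless macros."""
    inputs = InputStack(main_input)
    out = []
    while (tok := inputs.next_token()) is not None:
        if tok in macros:
            # Read the macro's definition before anything else.
            inputs.push(macros[tok])
        else:
            out.append(tok)
    return out
```

Because the definition is pushed rather than substituted in place, nested macros fall out naturally: a macro encountered inside a definition just pushes another token list on top.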

The macro processor recognizes backslashes followed by letters as control sequences. A hash table maps each control sequence to either an expandable token, like a macro, or a command. A character token is interpreted as a command to append a character node to the current container. Generally speaking, each TeX primitive corresponds to a command.
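Here is a rough sketch of that lookup in Python. It leaves out a lot: real TeX decides what counts as a letter using category codes, skips spaces after a control word, and also accepts a backslash followed by a single non-letter. The names `tokenize`, `resolve`, and `MEANINGS`, and the table's contents, are my own inventions for illustration.

```python
def tokenize(text):
    """Split text into ("cs", name) and ("char", c) tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1].isalpha():
            # A backslash followed by letters forms a control sequence.
            j = i + 1
            while j < len(text) and text[j].isalpha():
                j += 1
            tokens.append(("cs", text[i + 1:j]))
            i = j
        else:
            tokens.append(("char", text[i]))
            i += 1
    return tokens


# Hypothetical table: control sequence name -> (kind, payload).
MEANINGS = {
    "par": ("command", "end_paragraph"),
    "hello": ("macro", [("char", "h"), ("char", "i")]),
}


def resolve(token, table):
    """Look up what a token means: a macro to expand or a command to run."""
    kind, value = token
    if kind == "char":
        # A character is a command to append a character node.
        return ("command", ("append_char", value))
    # "undefined" is a stand-in for how this sketch reports unknown names.
    return table.get(value, ("command", "undefined"))
```

With this, `tokenize(r"a\par b")` yields a character token, the control sequence `par`, and then two more character tokens (the space and `b`).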

In TeX, if a hash table entry or another parameter is changed in a group, its old value is restored when the group ends. This is called dynamic scoping. To implement this, TeX keeps a "save stack" where each frame is a list of changed parameters and their old values. When a group ends, TeX interprets its corresponding frame to restore the old values.
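A save stack along these lines might look as follows in Python. This is a sketch under my own assumptions: `Scope` and `_UNDEFINED` are hypothetical names, and every assignment inside a group saves the old value, whereas the real implementation is more economical.

```python
_UNDEFINED = object()  # marks "had no value before the group changed it"


class Scope:
    def __init__(self):
        self.values = {}      # current value of every parameter
        self.save_stack = []  # one frame per open group

    def begin_group(self):
        self.save_stack.append([])

    def set(self, name, value):
        if self.save_stack:
            # Record the old value in the frame of the innermost group.
            old = self.values.get(name, _UNDEFINED)
            self.save_stack[-1].append((name, old))
        self.values[name] = value

    def end_group(self):
        # Undo the changes in reverse order, so that the value from
        # before the group wins when a parameter was changed twice.
        frame = self.save_stack.pop()
        for name, old in reversed(frame):
            if old is _UNDEFINED:
                self.values.pop(name, None)
            else:
                self.values[name] = old
```

For instance, if `x` is 1, and a group sets `x` to 2, then to 3, and also defines a new `y`, then ending the group leaves `x` at 1 and `y` undefined again.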

(See TeX Internals: Parameters for the list of parameters that was removed in the 2025-12-13 revision.)