typst/ARCHITECTURE.md
2023-03-18 18:27:22 +01:00

Typst Compiler Architecture

Wondering how to contribute or just curious how Typst works? This document covers the general architecture of Typst's compiler, so you get an understanding of what's where and how everything fits together.

The source-to-PDF compilation process of a Typst file proceeds in four phases.

  1. Parsing: Turns a source string into a syntax tree.
  2. Evaluation: Turns a syntax tree and its dependencies into content.
  3. Layout: Lays out content into frames.
  4. Export: Turns frames into an output format like PDF or a raster graphic.

The Typst compiler is incremental: Recompiling a document that was compiled previously is much faster than compiling from scratch. Most of the hard work is done by comemo, an incremental compilation framework we have written for Typst. However, the compiler is still carefully written with incrementality in mind. Below we discuss the four phases and how incrementality affects each of them.

Parsing

The syntax tree and parser are located in src/syntax. Parsing is a pure function &str -> SyntaxNode without any further dependencies. The result is a concrete syntax tree reflecting the whole file structure, including whitespace and comments. Parsing cannot fail. If there are syntactic errors, the returned syntax tree contains error nodes instead. It's important that the parser deals well with broken code because it is also used for syntax highlighting and IDE functionality.

Typedness: The syntax tree is untyped: any node can have any SyntaxKind. This makes it very easy to (a) attach spans to each node (see below) and (b) traverse the tree when doing highlighting or IDE analyses (no extra complications like a visitor pattern). The typst::syntax::ast module provides a typed API on top of the raw tree. This API resembles a more classical AST and is used by the interpreter.
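The relationship between the untyped tree and a typed view on top of it can be sketched as follows. This is a toy model under stated assumptions: the `SyntaxKind` variants, the fields of `SyntaxNode`, and the `StrongView` wrapper are all hypothetical and much simpler than the real definitions in src/syntax.

```rust
// Toy model of an untyped concrete syntax tree with a typed view on top.
// All names and fields are illustrative, not Typst's actual definitions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyntaxKind {
    Markup,
    Text,
    Star,
    Strong,
    Error,
}

#[derive(Debug, Clone)]
struct SyntaxNode {
    kind: SyntaxKind,
    text: String,
    children: Vec<SyntaxNode>,
}

impl SyntaxNode {
    fn leaf(kind: SyntaxKind, text: &str) -> Self {
        SyntaxNode { kind, text: text.to_string(), children: Vec::new() }
    }

    fn inner(kind: SyntaxKind, children: Vec<SyntaxNode>) -> Self {
        SyntaxNode { kind, text: String::new(), children }
    }

    /// The tree is concrete: joining all leaves gives back the source text.
    fn full_text(&self) -> String {
        if self.children.is_empty() {
            self.text.clone()
        } else {
            self.children.iter().map(SyntaxNode::full_text).collect()
        }
    }

    /// Untyped traversal is plain recursion, with no visitor machinery.
    fn count(&self, kind: SyntaxKind) -> usize {
        let own = usize::from(self.kind == kind);
        own + self.children.iter().map(|child| child.count(kind)).sum::<usize>()
    }
}

/// Typed view: a thin wrapper that only exists for nodes of the right kind.
struct StrongView<'a>(&'a SyntaxNode);

impl<'a> StrongView<'a> {
    fn cast(node: &'a SyntaxNode) -> Option<Self> {
        if node.kind == SyntaxKind::Strong { Some(StrongView(node)) } else { None }
    }

    /// The emphasized text without the surrounding star markers.
    fn body(&self) -> String {
        self.0
            .children
            .iter()
            .filter(|child| child.kind == SyntaxKind::Text)
            .map(|child| child.text.as_str())
            .collect()
    }
}

/// Hand-built tree for the broken source `a *b*]`; the stray `]` is kept
/// as an error node instead of making parsing fail.
fn sample() -> SyntaxNode {
    SyntaxNode::inner(SyntaxKind::Markup, vec![
        SyntaxNode::leaf(SyntaxKind::Text, "a "),
        SyntaxNode::inner(SyntaxKind::Strong, vec![
            SyntaxNode::leaf(SyntaxKind::Star, "*"),
            SyntaxNode::leaf(SyntaxKind::Text, "b"),
            SyntaxNode::leaf(SyntaxKind::Star, "*"),
        ]),
        SyntaxNode::leaf(SyntaxKind::Error, "]"),
    ])
}
```

Tools such as a highlighter would work directly on `SyntaxNode`, while an interpreter would go through wrappers like `StrongView` and never see markers or whitespace it does not care about.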

Spans: After parsing, the syntax tree is numbered with span numbers. These numbers are unique identifiers for syntax nodes that are used to trace back errors in later compilation phases to a piece of syntax. The span numbers are ordered so that the node corresponding to a number can be found quickly.
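One way an ordered numbering can make lookup fast is sketched below. It assumes a pre-order numbering in which every subtree covers a contiguous range of numbers. The `Node` type, its `upper` field, and the exact numbering scheme are assumptions for illustration and may differ from what the compiler actually does.

```rust
// Toy span numbering: pre-order numbers, so each subtree owns a contiguous
// range [span, upper]. Names and scheme are hypothetical.

#[derive(Debug)]
struct Node {
    name: &'static str,
    /// Number of this node.
    span: u64,
    /// Largest number found anywhere in this node's subtree.
    upper: u64,
    children: Vec<Node>,
}

impl Node {
    fn new(name: &'static str, children: Vec<Node>) -> Self {
        Node { name, span: 0, upper: 0, children }
    }

    /// Assign pre-order numbers starting at `next`; returns the next free number.
    fn number(&mut self, next: u64) -> u64 {
        self.span = next;
        let mut free = next + 1;
        for child in &mut self.children {
            free = child.number(free);
        }
        self.upper = free - 1;
        free
    }

    /// Find the node with the given number by descending only into the one
    /// child whose range can contain it.
    fn find(&self, span: u64) -> Option<&Node> {
        if span == self.span {
            return Some(self);
        }
        if span < self.span || span > self.upper {
            return None;
        }
        // Child ranges are sorted and disjoint, so a binary search picks the
        // first child whose range ends at or after `span`.
        let idx = self.children.partition_point(|child| child.upper < span);
        self.children.get(idx)?.find(span)
    }
}

/// root(1) -> [a(2) -> [b(3), c(4)], d(5)]
fn sample_tree() -> Node {
    let mut root = Node::new("root", vec![
        Node::new("a", vec![Node::new("b", vec![]), Node::new("c", vec![])]),
        Node::new("d", vec![]),
    ]);
    root.number(1);
    root
}
```

With this layout, resolving an error's span to its node costs one binary search per tree level rather than a scan of the whole file.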

Incremental: Typst has an incremental parser that can reparse a segment of markup or a code/content block. After incremental parsing, span numbers are reassigned locally. This way, span numbers further away from an edit stay mostly stable. This is important because they are used pervasively throughout the compiler, also as input to memoized functions. The less they change, the better for incremental compilation.

Evaluation

The evaluation phase lives in src/eval. It takes a parsed Source file and evaluates it to a Module. A module consists of the Content that was written in it and a Scope with the bindings that were defined within it.

A source file may depend on other files (imported sources, images, data files), which need to be resolved. Since Typst is deployed in different environments (CLI, web app, etc.) these system dependencies are resolved through a general interface called a World. Apart from files, the world also provides configuration and fonts.
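The idea of resolving system dependencies through an interface can be sketched with a small trait. The trait below is deliberately named `MiniWorld` to make clear that it is a hypothetical stand-in: its methods and signatures are assumptions and do not mirror the real World trait.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified stand-in for an environment interface.
trait MiniWorld {
    /// Load the raw bytes of a file the document depends on.
    fn file(&self, path: &str) -> Result<Vec<u8>, String>;
    /// Names of the fonts this environment can provide.
    fn font_names(&self) -> Vec<String>;
}

/// An in-memory environment, as a test harness or web app might use.
#[derive(Default)]
struct MemoryWorld {
    files: HashMap<String, Vec<u8>>,
    fonts: Vec<String>,
}

impl MiniWorld for MemoryWorld {
    fn file(&self, path: &str) -> Result<Vec<u8>, String> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| format!("file not found: {path}"))
    }

    fn font_names(&self) -> Vec<String> {
        self.fonts.clone()
    }
}

/// Compiler-side code only talks to the trait, never to the file system.
fn read_source(world: &dyn MiniWorld, path: &str) -> Result<String, String> {
    let bytes = world.file(path)?;
    String::from_utf8(bytes).map_err(|err| format!("{path} is not valid UTF-8: {err}"))
}
```

A command-line environment would implement the same trait on top of the real file system, and `read_source` would not need to change.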

Interpreter: Typst implements a tree-walking interpreter. To evaluate a piece of source, you first create a Vm with a scope stack. Then, the AST is recursively evaluated through trait impls of the form fn eval(&self, vm: &mut Vm) -> Result<Value>. An interesting detail is how closures are dealt with: When the interpreter sees a closure / function definition, it walks the body of the closure and finds all accesses to variables that aren't defined within the closure. It then clones the values of all these variables (it captures them) and stores them alongside the closure's syntactical definition in a closure value. When the closure is called, a fresh Vm is created and its scope stack is initialized with the captured variables.
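The capture step described above can be illustrated with a tiny expression language. Everything here (`Expr`, `Scope`, `Closure`, `make_closure`) is a hypothetical toy with integer values only. It shows the technique of walking the body for free variables and cloning their values, not the actual Vm or closure types.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Toy expression language; all names are illustrative.
#[derive(Debug, Clone)]
enum Expr {
    Num(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    /// `let name = value` followed by a body that can see `name`.
    Let(String, Box<Expr>, Box<Expr>),
}

type Scope = BTreeMap<String, i64>;

/// Collect variables that are read but not defined inside the expression.
fn free_vars(expr: &Expr, bound: &mut Vec<String>, out: &mut BTreeSet<String>) {
    match expr {
        Expr::Num(_) => {}
        Expr::Var(name) => {
            if !bound.contains(name) {
                out.insert(name.clone());
            }
        }
        Expr::Add(a, b) => {
            free_vars(a, bound, out);
            free_vars(b, bound, out);
        }
        Expr::Let(name, value, body) => {
            free_vars(value, bound, out);
            bound.push(name.clone());
            free_vars(body, bound, out);
            bound.pop();
        }
    }
}

struct Closure {
    params: Vec<String>,
    body: Expr,
    /// Values cloned out of the defining scope at definition time.
    captured: Scope,
}

fn make_closure(params: Vec<String>, body: Expr, scope: &Scope) -> Closure {
    let mut bound = params.clone();
    let mut free = BTreeSet::new();
    free_vars(&body, &mut bound, &mut free);
    let captured = free
        .into_iter()
        .filter_map(|name| scope.get(&name).map(|value| (name, *value)))
        .collect();
    Closure { params, body, captured }
}

fn eval(expr: &Expr, scope: &Scope) -> Result<i64, String> {
    match expr {
        Expr::Num(n) => Ok(*n),
        Expr::Var(name) => scope
            .get(name)
            .copied()
            .ok_or_else(|| format!("unknown variable: {name}")),
        Expr::Add(a, b) => Ok(eval(a, scope)? + eval(b, scope)?),
        Expr::Let(name, value, body) => {
            let value = eval(value, scope)?;
            let mut inner = scope.clone();
            inner.insert(name.clone(), value);
            eval(body, &inner)
        }
    }
}

impl Closure {
    /// Calls start from a fresh scope holding only captures and arguments.
    fn call(&self, args: &[i64]) -> Result<i64, String> {
        if args.len() != self.params.len() {
            return Err(format!("expected {} arguments", self.params.len()));
        }
        let mut scope = self.captured.clone();
        for (param, arg) in self.params.iter().zip(args) {
            scope.insert(param.clone(), *arg);
        }
        eval(&self.body, &scope)
    }
}
```

Because the captured values are copies, later changes to the defining scope do not affect the closure, which is part of what makes calls reproducible.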

Incremental: In this phase, incremental compilation happens at the granularity of the module and the closure. Typst memoizes the result of evaluating a source file across compilations. Furthermore, it memoizes the result of calling a closure with a certain set of parameters. This is possible because Typst ensures that all functions are pure. The result of a closure call can be recycled if the closure has the same syntax and captures, even if the closure value stems from a different module evaluation (i.e. if a module is reevaluated, previous calls to closures defined in the module can still be reused).
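The reuse condition (same syntax, same captures, same arguments) can be made concrete with a hand-rolled cache. This is only a sketch of the principle: in Typst the memoization is handled by comemo, and the `ToyClosure`, `CallKey`, and `CallCache` types below are hypothetical.

```rust
use std::collections::HashMap;

/// A closure value reduced to what matters for caching: its syntax (here
/// just the source text of the body) and its captured variables.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ToyClosure {
    body: String,
    captures: Vec<(String, i64)>,
}

/// Cache key: the closure's identity by value plus the call arguments.
#[derive(Debug, PartialEq, Eq, Hash)]
struct CallKey {
    closure: ToyClosure,
    args: Vec<i64>,
}

#[derive(Default)]
struct CallCache {
    results: HashMap<CallKey, i64>,
    /// Number of calls that actually had to run the closure.
    misses: usize,
}

impl CallCache {
    /// Return a cached result if this exact call was seen before.
    /// This is only sound because `run` is assumed to be pure.
    fn call(
        &mut self,
        closure: &ToyClosure,
        args: &[i64],
        run: fn(&ToyClosure, &[i64]) -> i64,
    ) -> i64 {
        let key = CallKey { closure: closure.clone(), args: args.to_vec() };
        if let Some(&hit) = self.results.get(&key) {
            return hit;
        }
        self.misses += 1;
        let value = run(closure, args);
        self.results.insert(key, value);
        value
    }
}

/// Stand-in for "evaluating the body": adds up captures and arguments.
fn sum_everything(closure: &ToyClosure, args: &[i64]) -> i64 {
    let captured: i64 = closure.captures.iter().map(|(_, value)| *value).sum();
    captured + args.iter().sum::<i64>()
}
```

Two `ToyClosure` values built independently, as after re-evaluating a module, compare equal when their body and captures match, so the second call is served from the cache.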

Layout

The layout phase takes Content and produces one Frame per page for it. To lay out Content, we first have to realize it by applying all relevant show rules to the content. Since show rules may be defined as Typst closures, realization can trigger closure evaluation, which in turn produces content that is recursively realized. Realization is a shallow process: While collecting list items into a list that we want to lay out, we don't realize the content within the list items just yet. This only happens lazily once the list items are laid out.

Once we have realized the content into a layoutable node, we can lay it out into regions, which describe the space into which the content shall be laid out. Within these, a node is free to lay itself out as it sees fit, returning one Frame per region it wants to occupy.

Introspection: How content is realized and laid out may depend on how it is itself laid out (e.g., through page numbers in the table of contents, counters, state, etc.). Typst resolves these inherently cyclical dependencies through the introspection loop: The layout phase runs in a loop until the results stabilize. Most introspections stabilize after one or two iterations. However, some may never stabilize, so we give up after five attempts.
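The loop can be sketched as a bounded fixed-point iteration. The function below is a minimal illustration, assuming that one layout pass can be modeled as a function from the introspection data it was given to the data it actually observed. The names `introspection_loop` and `chapter_page` are hypothetical, and how the real compiler counts its five attempts may differ in detail.

```rust
const MAX_ITERATIONS: usize = 5;

/// Run `layout` until the data it reports equals the data it was given,
/// or give up after `MAX_ITERATIONS` passes.
/// Returns (final data, passes run, whether the result stabilized).
fn introspection_loop<S, F>(initial: S, mut layout: F) -> (S, usize, bool)
where
    S: PartialEq,
    F: FnMut(&S) -> S,
{
    let mut known = initial;
    for iteration in 1..=MAX_ITERATIONS {
        let observed = layout(&known);
        if observed == known {
            return (known, iteration, true);
        }
        known = observed;
    }
    (known, MAX_ITERATIONS, false)
}

/// Toy layout pass: the table of contents prints the page number it was
/// told (`_known`), and the pass reports where the chapter really landed.
/// Here the chapter always follows a title page and a one-page contents.
fn chapter_page(_known: &u32) -> u32 {
    let pages_before_chapter = 2;
    pages_before_chapter + 1
}
```

Starting from a guess of page 0, the first pass observes page 3 and the second pass confirms it, so the loop stops after two passes. A pass whose observation always differs from its input, such as `|known: &u32| known + 1`, hits the cap and is reported as not stabilized.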

Incremental: Layout caching happens at the granularity of a node. This is important because overall layout is the most expensive compilation phase, so we want to reuse as much as possible.

Export

Exporters live in src/export. They turn laid-out frames into an output file format.

  • The PDF exporter takes laid-out frames and turns them into a PDF file.
  • The built-in renderer takes a frame and turns it into a pixel buffer.
  • HTML export does not exist yet, but will in the future. However, this requires some complex compiler work because the export will start with Content instead of Frames (layout is the browser's job).

IDE

The src/ide module implements IDE functionality for Typst. It builds heavily on the other modules (most importantly, syntax and eval).

Syntactic: Basic IDE functionality is based on a file's syntax. However, the standard syntax node is a bit too limited for writing IDE tooling. It doesn't provide access to its parents or neighbours. This is fine for an evaluation-like recursive traversal, but impractical for IDE use cases. For this reason, there is an additional abstraction on top of a syntax node called a LinkedNode, which is used pervasively across the ide module.
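The general idea of such a wrapper is sketched below: the plain tree stays free of back-pointers, and a separate handle remembers how it was reached. The `Linked` type and its fields are assumptions for illustration and are not the actual LinkedNode definition.

```rust
use std::rc::Rc;

/// Plain tree node without any knowledge of its parent.
struct SyntaxNode {
    text: &'static str,
    children: Vec<SyntaxNode>,
}

/// Hypothetical handle that pairs a node with the path it was reached by.
#[derive(Clone)]
struct Linked<'a> {
    node: &'a SyntaxNode,
    parent: Option<Rc<Linked<'a>>>,
    /// Position of this node among its parent's children.
    index: usize,
}

impl<'a> Linked<'a> {
    fn root(node: &'a SyntaxNode) -> Self {
        Linked { node, parent: None, index: 0 }
    }

    /// Children handles share one reference-counted link to this handle.
    fn children(&self) -> Vec<Linked<'a>> {
        let node: &'a SyntaxNode = self.node;
        let parent = Rc::new(self.clone());
        node.children
            .iter()
            .enumerate()
            .map(|(index, child)| Linked {
                node: child,
                parent: Some(parent.clone()),
                index,
            })
            .collect()
    }

    fn parent(&self) -> Option<&Linked<'a>> {
        self.parent.as_deref()
    }

    fn next_sibling(&self) -> Option<Linked<'a>> {
        let parent = self.parent.clone()?;
        let parent_node: &'a SyntaxNode = parent.node;
        let node = parent_node.children.get(self.index + 1)?;
        Some(Linked { node, parent: Some(parent), index: self.index + 1 })
    }
}

fn sample() -> SyntaxNode {
    SyntaxNode {
        text: "root",
        children: vec![
            SyntaxNode { text: "a", children: vec![] },
            SyntaxNode { text: "b", children: vec![] },
        ],
    }
}
```

An IDE query such as "what comes right after the node under the cursor" then becomes a call to `next_sibling` on the handle, without changing how the evaluator sees the tree.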

Semantic: More advanced functionality like autocompletion requires semantic analysis of the source. To gain semantic information for things like hover tooltips, we directly use other parts of the compiler. For instance, to find out the type of a variable, we evaluate and realize the full document equipped with a Tracer that emits the variable's value whenever it is visited. From the set of resulting values, we can then compute the set of types the variable takes on. Thanks to incremental compilation, we can recycle large parts of the compilation that we had to do anyway to typeset the document.
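The tracing approach can be sketched in a few lines. The `ToyTracer`, `Value`, and `evaluate` names are hypothetical, spans are reduced to plain numbers, and evaluation is replaced by a list of variable visits. The point is only to show how observed values turn into a set of types.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

impl Value {
    fn type_name(&self) -> &'static str {
        match self {
            Value::Int(_) => "int",
            Value::Str(_) => "str",
        }
    }
}

/// Hypothetical tracer: records every value seen at one span of interest.
struct ToyTracer {
    target: u64,
    values: Vec<Value>,
}

impl ToyTracer {
    fn new(target: u64) -> Self {
        ToyTracer { target, values: Vec::new() }
    }

    /// Called by the evaluator each time a variable is visited.
    fn visit(&mut self, span: u64, value: &Value) {
        if span == self.target {
            self.values.push(value.clone());
        }
    }

    /// The set of types the traced variable took on, in sorted order.
    fn types(&self) -> BTreeSet<&'static str> {
        self.values.iter().map(Value::type_name).collect()
    }
}

/// Stand-in for evaluation: replays (span, value) variable visits.
fn evaluate(visits: &[(u64, Value)], tracer: &mut ToyTracer) {
    for (span, value) in visits {
        tracer.visit(*span, value);
    }
}
```

A hover tooltip for the variable at span 7 would create `ToyTracer::new(7)`, run the evaluation, and display the result of `types()`.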

Incremental: Syntactic IDE functionality is relatively cheap for now, so there are no special incrementality concerns. Semantic analysis with a tracer is relatively expensive. However, large parts of a traced analysis compilation can reuse memoized results from a previous normal compilation. Only the module evaluation of the active file and layout code that somewhere within evaluates source code in the active file need to re-run. This is all handled automatically by comemo because the tracer is wrapped in a comemo::TrackedMut container.

Tests

Typst has an extensive suite of integration tests. A test file consists of multiple tests that are separated by ---. For each test file, we store a reference image defining what the compiler should output. To manage the reference images, you can use the VS Code extension in tools/test-helper.

The integration tests cover parsing, evaluation, realization, layout and rendering. PDF output is sadly untested, but most bugs are in earlier phases of the compiler; the PDF output itself is relatively straightforward. IDE functionality is also mostly untested. PDF and IDE testing should be added in the future.