# 11. The Document Processor

The OpenDSA textbook compilation pipeline includes custom preprocessing of module files into compilable reStructuredText source. The main motivation for using our own document preprocessor was to support integration beyond the file level in ways that Sphinx does (or at least, did) not do. This includes the ability to number document objects (figures, tables, and equations) and to display numbered references. When we started the OpenDSA project, DocUtils did not provide such features. Some of the preprocessor features might be added to Sphinx over time, in which case we might eventually remove them from the preprocessor. You can view the DocUtils To Do list at http://docutils.sourceforge.net/docs/dev/todo.html.

## 11.1. Overview

The document processor works as a three-pass compiler. The first two passes are executed on rst files before running Sphinx, and the last pass is run against the html files produced by Sphinx. The process produces three files: two containing document and object numbers, and one used to check whether a document has been modified. All global variables are declared in a separate file (config.py).

## 11.2. First Pass

INPUT
Modules as rst source files.
OUTPUT
A JSON file (page_chapter.json) containing a dictionary of modules and their associated chapters.
DESCRIPTION
During the first pass, the document processor creates a dictionary of the highest-level elements in the document (modules). The dictionary contains tuples defined as (module_name, [chapter_name, chapter_number]).
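
The shape of this dictionary can be sketched as follows. This is a minimal illustration, not the code in preprocessor.py: the function name `build_page_chapter`, the input format (an ordered list of chapters with their modules), the sample module names, and chapter numbering that starts at 1 are all assumptions made for the example.

```python
import json


def build_page_chapter(chapters):
    """Map each module name to [chapter_name, chapter_number].

    `chapters` is a hypothetical input: an ordered list of
    (chapter_name, [module_name, ...]) pairs.
    """
    page_chapter = {}
    # Assumption: chapters are numbered from 1 in order of appearance.
    for chapter_number, (chapter_name, modules) in enumerate(chapters, start=1):
        for module_name in modules:
            page_chapter[module_name] = [chapter_name, chapter_number]
    return page_chapter


# Hypothetical book layout used only for illustration.
chapters = [
    ("Introduction", ["Intro", "Glossary"]),
    ("Sorting", ["InsertionSort", "Quicksort"]),
]
page_chapter = build_page_chapter(chapters)

# Serialized form, comparable to what page_chapter.json might hold.
page_chapter_json = json.dumps(page_chapter, indent=2)
```

Keying the dictionary by module name means later passes can find a module's chapter with a single lookup.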

## 11.3. Second Pass

INPUT
Modules as rst source files.
OUTPUT
A JSON file (table.json) containing a dictionary of all document objects and their appearance numbers.
DESCRIPTION
During the second pass, the document processor creates a dictionary of all the objects inside modules. The appearance number is the concatenation of chapter_number, module_number, and object_number. The dictionary contains tuples defined as (object_name, appearance_number).
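
As a rough sketch of how an appearance number could be assembled, consider the code below. Several details are assumptions, since the text above does not specify them: the "." separator, a single object counter per module shared by all object types, and the helper names `appearance_number` and `build_table`.

```python
def appearance_number(chapter_number, module_number, object_number, sep="."):
    """Concatenate the three numbers; the separator is an assumption."""
    return sep.join(str(n) for n in (chapter_number, module_number, object_number))


def build_table(objects):
    """Map each object name to its appearance number.

    `objects` is a hypothetical input: (object_name, chapter_number,
    module_number) triples listed in order of appearance.
    """
    counters = {}  # (chapter_number, module_number) -> objects seen so far
    table = {}
    for object_name, chapter_number, module_number in objects:
        key = (chapter_number, module_number)
        # Assumption: the object counter restarts in every module.
        counters[key] = counters.get(key, 0) + 1
        table[object_name] = appearance_number(
            chapter_number, module_number, counters[key]
        )
    return table


# Hypothetical objects: two in module 1 of chapter 2, one in module 2.
table = build_table([
    ("fig_partition", 2, 1),
    ("tab_costs", 2, 1),
    ("fig_merge", 2, 2),
])
```

With these assumptions, `table["tab_costs"]` is `"2.1.2"`: the second object in module 1 of chapter 2.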

## 11.4. Integration with Sphinx

The numref directive takes the numbers that the document preprocessor assigns to document objects (figures, tables, and equations) and uses them as hyperlink text for cross-referencing.

## 11.5. Third Pass

INPUT
Modules as html files generated by Sphinx.
OUTPUT
Modified html files with an updated table of contents and navigation bar, and section numbers augmented with a chapter number prefix.
DESCRIPTION
During the third pass, the document processor parses the html files and replaces headers and section numbers as appropriate, using the dictionaries created during the first two passes. Since our processor does not modify the Sphinx document tree, we have to modify the html files to replace the raw Sphinx section numbers with our own numbering scheme. This phase applies only to the table of contents, the navigation bar, page headers, and sections. The document processor performs the third pass on an html file only if that file has been modified by Sphinx. The file count.txt stores the latest modification times for the html files.
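
The two ideas in this pass, skipping unchanged files and prefixing section numbers, can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the function names are hypothetical, the recorded modification times are modeled as a plain dictionary because the format of count.txt is not described here, and the regular expression assumes that a section number appears as plain text at the very start of a heading element.

```python
import os
import re


def needs_third_pass(html_path, recorded_mtimes):
    """Return True if the file changed since its recorded modification time.

    `recorded_mtimes` is a hypothetical {path: mtime} dictionary standing
    in for whatever count.txt actually stores.
    """
    return recorded_mtimes.get(html_path) != os.path.getmtime(html_path)


# Assumption: headings look like <h2>1.2. Title</h2> in the Sphinx output.
SECTION_NUMBER = re.compile(r"(<h[1-6][^>]*>)(\d+(?:\.\d+)*\.)")


def prefix_section_numbers(html, chapter_number):
    """Prefix each heading's section number with the chapter number."""
    return SECTION_NUMBER.sub(
        lambda m: f"{m.group(1)}{chapter_number}.{m.group(2)}", html
    )


sample = "<h2>1.2. Overview</h2>"
numbered = prefix_section_numbers(sample, 11)
```

Here `numbered` is `<h2>11.1.2. Overview</h2>`. A pattern-based rewrite like this is fragile if the theme changes its heading markup, which is one reason the theme files listed in the next section matter when the output changes.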

## 11.6. Where things are

There are many files that affect the eventual HTML output. Here is a list of places to look if you are trying to make changes.

OpenDSA/RST/source/_themes/haiku/basic/layout.html

OpenDSA/RST/source/_themes/haiku/static/haiku.css_t

OpenDSA/RST/preprocessor.py

OpenDSA/RST/ODSAextensions

OpenDSA/tools/configure.py