11. The Document Processor
The OpenDSA textbook compilation pipeline includes custom preprocessing of module files into compilable reStructuredText source. The main motivation for using our own document pre-processor was to support integration beyond the file level in ways that Sphinx does (or at least, did) not do. This includes the ability to number document objects (figures, tables, and equations) and to display numbered references. When we started the OpenDSA project, DocUtils did not provide such features. Some of the pre-processor features might be added to Sphinx over time, in which case we might eventually remove them from the pre-processor. You can view the DocUtils To Do list at http://docutils.sourceforge.net/docs/dev/todo.html.
11.1. Overview
The document processor works as a three-pass compiler. The first two passes are executed on rst files before running Sphinx, and the last pass is run against the html files produced by Sphinx. The process results in three files: two containing document and object numbers, and one used to check whether the document has been modified. All global variables are declared in a separate file (config.py).
11.2. First Pass
- INPUT
- Modules as rst source files.
- OUTPUT
- A JSON file (page_chapter.json) containing a dictionary of modules and their associated chapters.
- DESCRIPTION
- During the first pass, the document processor creates a dictionary of the highest-level elements in the document (modules). The dictionary contains tuples defined as (module_name, [chapter_name, chapter_number]).
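The shape of the first-pass dictionary can be illustrated with a minimal Python sketch. It is not the code in preprocessor.py: the function name `build_page_chapter`, the `chapters` input layout, and numbering chapters from 1 are all assumptions made for illustration. Only the (module_name, [chapter_name, chapter_number]) entry shape and the page_chapter.json file name come from the description above.

```python
import json


def build_page_chapter(chapters):
    """Map each module name to [chapter_name, chapter_number].

    `chapters` is a hypothetical ordered list of
    (chapter_name, [module_name, ...]) pairs.
    """
    page_chapter = {}
    # Assumption: chapters are numbered from 1 in book order.
    for chapter_number, (chapter_name, modules) in enumerate(chapters, start=1):
        for module_name in modules:
            page_chapter[module_name] = [chapter_name, chapter_number]
    return page_chapter


# Hypothetical book layout, for illustration only.
chapters = [
    ("Introduction", ["Intro", "Glossary"]),
    ("Sorting", ["InsertionSort", "Quicksort"]),
]

page_chapter = build_page_chapter(chapters)

# The text that a first pass like this would write to page_chapter.json.
serialized = json.dumps(page_chapter, indent=2)
```

Keeping the chapter name and number together under the module name lets later passes look up both with a single key.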
11.3. Second Pass
- INPUT
- Modules as rst source files.
- OUTPUT
- A JSON file (table.json) containing a dictionary of all document objects and their appearance numbers.
- DESCRIPTION
- During the second pass, the document processor creates a dictionary of all the objects inside modules. The appearance number is the concatenation of chapter_number, module_number, and object_number. The dictionary contains tuples defined as (object_name, appearance_number).
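A minimal sketch of how an appearance number could be assembled is shown below. The dot separator, the function names, and the choice to count all object types with one counter per module are assumptions. The description above only states that the number concatenates chapter_number, module_number, and object_number.

```python
def appearance_number(chapter_number, module_number, object_number, sep="."):
    """Concatenate the three counters into one label (separator is assumed)."""
    return sep.join(str(n) for n in (chapter_number, module_number, object_number))


def build_object_table(objects):
    """Build {object_name: appearance_number} from objects in document order.

    `objects` is a hypothetical iterable of
    (object_name, chapter_number, module_number) triples.
    """
    counters = {}  # (chapter_number, module_number) -> objects seen so far
    table = {}
    for object_name, chapter_number, module_number in objects:
        key = (chapter_number, module_number)
        counters[key] = counters.get(key, 0) + 1
        table[object_name] = appearance_number(
            chapter_number, module_number, counters[key]
        )
    return table


# Hypothetical objects, listed in the order they appear in the book.
table = build_object_table([
    ("FigA", 2, 1),
    ("TabB", 2, 1),
    ("FigC", 2, 2),
])
```

Here the object counter restarts in each module, so the third component is the object's position within its module.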
11.4. Integration with Sphinx
The numref directive takes the numbers that the document preprocessor assigns to document objects (figures, tables, and equations) and uses them as hyperlink text for cross-referencing.
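The lookup that such a directive performs can be sketched in plain Python. This is not the OpenDSA extension code and does not use the Sphinx API: the function `reference_text`, the `kind` prefix, and the fallback to the raw label are assumptions. Only the idea of turning an object name into its number for use as link text comes from the description above.

```python
def reference_text(table, object_name, kind="Figure"):
    """Return hyperlink text for a cross-reference to `object_name`.

    `table` is the {object_name: appearance_number} dictionary from the
    second pass. Unknown names fall back to the raw label (an assumption).
    """
    number = table.get(object_name)
    if number is None:
        return object_name
    return f"{kind} {number}"


# Hypothetical second-pass output.
table = {"FigA": "2.1.1", "TabB": "2.1.2"}

figure_link = reference_text(table, "FigA")
table_link = reference_text(table, "TabB", kind="Table")
missing_link = reference_text(table, "NoSuchObject")
```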
11.5. Third Pass
- INPUT
- Modules as html files generated by Sphinx.
- OUTPUT
- Modified html files with an updated table of contents and navigation bar, and section numbers augmented with a chapter number prefix.
- DESCRIPTION
- During the third pass, the document processor parses the html files and replaces headers and section numbers as appropriate, using the dictionaries created during the first two passes. Since our processor does not modify the Sphinx document tree, we have to modify the html files to replace the raw Sphinx section number with our own numbering scheme. This phase applies only to the table of contents, the navigation bar, page headers, and sections. The document processor performs the third pass on an html file only if that file has been modified by Sphinx. The file filecount.txt stores the latest modification times for the html files.
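The two ideas in this pass, skipping unmodified files and prefixing section numbers, can be sketched as follows. This is an illustration under stated assumptions rather than the real implementation: the function names are hypothetical, recorded modification times are held in an in-memory dictionary because the format of filecount.txt is not described here, and the regular expression assumes that a section number appears as plain text at the start of a heading element.

```python
import os
import re

# Assumption: a heading looks like "<h2>1.2. Title</h2>", with the number first.
HEADING_NUMBER = re.compile(r"(<h[1-6][^>]*>)\s*((?:\d+\.)+)")


def modified_since(path, recorded_mtimes, getmtime=os.path.getmtime):
    """Return True if `path` has no recorded time or its time has changed."""
    return recorded_mtimes.get(path) != getmtime(path)


def prefix_section_numbers(html, chapter_number):
    """Prepend the chapter number to each heading's section number."""
    def add_prefix(match):
        return f"{match.group(1)}{chapter_number}.{match.group(2)}"
    return HEADING_NUMBER.sub(add_prefix, html)


# Hypothetical page fragment as Sphinx might emit it.
page = "<h1>1. Quicksort</h1><h2>1.2. Analysis</h2><p>Text 3.4. here</p>"
updated = prefix_section_numbers(page, 11)
```

Numbers outside heading elements, such as the one in the paragraph above, are left untouched because the pattern anchors on the opening heading tag.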
11.6. Where things are
There are many files that affect the eventual HTML output. Here is a list of places to look if you are trying to make changes.
OpenDSA/RST/source/_themes/haiku/basic/layout.html
OpenDSA/RST/source/_themes/haiku/static/haiku.css_t
OpenDSA/RST/preprocessor.py
OpenDSA/RST/ODSAextensions
OpenDSA/tools/configure.py