An Infrastructure for Writing E-Books on GitHub and Automatically Converting them to PDF and EPUB after each Commit

Written by: Thomas Weise; Created: 26 August 2018

Introduction

A long time ago, when I was a PhD student, I wrote the book Global Optimization Algorithms – Theory and Application, which I published on my personal website as pdf. Since I recently developed the short course Metaheuristics for Smart Manufacturing and had a very nice experience teaching it, I have decided to begin to write a new book about optimization, "An Introduction to Optimization Algorithms," to incorporate my experience during the past ten years working in the field. When writing such a book, there are a couple of desirable features to improve the workflow and results, such as:

using a version control and distributed authoring system, which allows me to easily work on and extend the book as well as to make changes to the book and commit them wherever I am,
automated conversion of the book's sources to PDF, ideally whenever I commit a change,
automated provision of the PDF version of the book at an online location,
the generation of an electronic version of the book more suitable for handheld devices like mobile phones, i.e., an EPUB version, which should automatically be built and provisioned like the PDF version,
maybe even the possibility that readers can file change requests, ask questions, or propose content to add in a structured way, and
the option to edit source code examples in an independent repository and update the book whenever they change.

In this post, I want to discuss a process, workflow, and tool support I came up with to achieve all of these features. This workflow has been designed in a way so that you can use it too, easily, for your own open books. Here we first describe the setup, which is stuff you have to do once and only once. After that, we discuss the features and commands that you have available inside your book. You can also find a set of slides about this project here.

The Setup

First, both for writing and hosting the book, we use a GitHub repository, very much like the one for the book I just began working on here. The book should be written in Pandoc's markdown syntax, which allows us to include most of the stuff we need, such as equations and citation references. For building the book, we will use Travis CI, which offers a free integration with GitHub to build open source software – triggered by repository commits. So every time you push a commit to your book repository, Travis CI will be notified and check out the repository. But we have to tell Travis CI what to do with our book's sources, namely to compile them with Pandoc, which I have packaged with all necessary tools and filters into a Docker container. Once Travis has finished downloading the container and building the book with it, it can "deploy" the produced files. For this purpose, we make use of GitHub Pages, a feature of GitHub which allows you to have a website for a repository. So we simply let Travis deploy the compiled book, in PDF and EPUB, to the book's repository website. Once the repository, website, and Travis build procedure are all set up, we can concentrate on working on our book and, whenever some section or text is finished, commit and enjoy the automatically built new versions. Since the book's sources are available as a GitHub repository, our readers can file issues to the repository, with change suggestions, discovered typos, or questions that call for clarification. They may even file pull requests with content to include.

The Repository

In order to use my workflow, you need to first have an account at GitHub and then create an open repository for your book. GitHub is built around the distributed version control system git, for which a variety of graphical user interfaces exist – see, e.g., here. If you have a Debian-based Linux system, you can install the basic git client with command line interface as follows: sudo apt-get install git. You can use either this client or such a GUI to work with your repository. I suggest that in the main branch of your repository, you put a folder book where all the raw sources and graphics for your book go. In the repository root folder, you can then leave the non-book-related things, like README.md, .travis.yml, and LICENSE.md. At this step, you should choose a license for your project, maybe a Creative Commons one, if you want.

You should now put a file named "book.md" into the "book" folder of your repository. It could just contain some random text for now; the real book comes later.
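The layout suggested above can be sketched with a few shell commands. This is only a minimal sketch: it assumes that you run it in the root folder of your cloned repository, and the placeholder text and the commit message are made up for illustration.

```shell
# Run this in the root folder of your cloned book repository.
# Create the folder that holds all raw sources and graphics of the book.
mkdir -p book

# Put a placeholder main file into it; the real content comes later.
printf '%s\n' '# My Book' '' 'Some placeholder text.' > book/book.md

# Non-book-related files stay in the repository root.
touch README.md LICENSE.md

# Afterwards you would commit and push as usual, for example:
# git add book README.md LICENSE.md
# git commit -m "initial book skeleton"
# git push origin master
```

The .travis.yml file also belongs in the root folder; it is discussed further below.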

The `gh-pages` Branch

Since we want the book to be built and published to the internet automatically, we should have a gh-pages branch in our repository as well. I assume that you have a Unix/Linux system with git installed. In that case, you can do this as follows (based on here and here), replacing YOUR_USER_NAME with your user name and YOUR_REPOSITORY with your repository name:

git clone --depth=50 --branch=master https://github.com/YOUR_USER_NAME/YOUR_REPOSITORY.git YOUR_USER_NAME/YOUR_REPOSITORY
cd YOUR_USER_NAME/YOUR_REPOSITORY
git checkout --orphan gh-pages
ls -la |awk '{print $9}' |grep -v git |xargs -I _ rm -rf ./_
git rm -rf .
git commit --allow-empty -m "root commit"
git push origin gh-pages

You can now safely delete the folder YOUR_USER_NAME/YOUR_REPOSITORY that was created during this procedure. If you go to the settings page of your repository, it should now display something like "Your site is published at https://YOUR_USER_NAME.github.io/YOUR_REPOSITORY/" under point "GitHub Pages". This is where your book will later go.
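If you would like to see what the orphan-branch steps do before running them against your real repository, you can try them in a throw-away repository. The following is only a sketch under assumptions: the temporary folder, the file name, and the identity "Demo" are made up, and nothing is pushed anywhere.

```shell
# Create a scratch repository with one ordinary commit.
demo="$(mktemp -d)"
git -C "$demo" init -q
echo "placeholder" > "$demo/README.md"
git -C "$demo" add README.md
git -C "$demo" -c user.name="Demo" -c user.email="demo@example.org" \
    commit -q -m "initial commit"

# Start a branch without any history, then empty the index and working tree.
git -C "$demo" checkout -q --orphan gh-pages
git -C "$demo" rm -rf -q .

# The empty root commit is what makes the branch exist.
git -C "$demo" -c user.name="Demo" -c user.email="demo@example.org" \
    commit -q --allow-empty -m "root commit"

# Number of commits reachable from gh-pages; the branch shares no history
# with the first commit.
git -C "$demo" rev-list --count gh-pages
```

The scratch folder can be deleted afterwards with rm -rf "$demo".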

Personal Access Token

Later, we will use Travis CI to automatically build your book and to automatically deploy it to the GitHub Pages branch of your repository. For the latter, Travis will need a so-called personal access token, as described here. You need to create such a token following the steps detailed here. Basically, you go to your personal settings page for your GitHub account, click "Developer Settings" and then "Personal Access Tokens". You click "Generate new token" and confirm your password. Then you need to choose public_repo and click "Generate token". Make sure you store the token as text somewhere safe, because we need this token text later on.

Travis CI: Building and Deployment

With that in place, we can now set up Travis CI for automated building and deployment. You can get a Travis account easily and even sign in with GitHub. When you sign into Travis, it should show you a list of your public GitHub repositories. You need to enable your book repository for automated builds.

Click on the now-activated repository in Travis and click "More Settings". Scroll down to "Environment Variables" and then add a variable named "GITHUB_TOKEN". As value, copy the text of the personal access token that we have created in the previous step.

`.travis.yml`

Now we need to finally tell Travis how to build our book, and this can be done by placing a file called .travis.yml into the root folder of your GitHub repository. This file should have the following contents, where YOUR_BOOK_OUTPUT_BASENAME should be replaced with the base name of the files to be generated (e.g., "myBook" will result in "myBook.pdf" and "myBook.epub" later):

sudo: required

language: generic

services:
  - docker

script:
  - |
    baseDir="$(pwd)" &&\
    inputDir="${baseDir}/book" &&\
    outputDir="${baseDir}/output" &&\
    mkdir -p "${outputDir}" &&\
    docker run -v "${inputDir}/":/input/ \
               -v "${outputDir}/":/output/ \
               -e TRAVIS_COMMIT=$TRAVIS_COMMIT \
               -e TRAVIS_REPO_SLUG=$TRAVIS_REPO_SLUG \
               -t -i thomasweise/docker-bookbuilder book.md YOUR_BOOK_OUTPUT_BASENAME &&\
    cd "${outputDir}"

deploy:
  provider: pages
  skip-cleanup: true
  github-token: $GITHUB_TOKEN
  keep-history: false
  on:
    branch: master
  target-branch: gh-pages
  local-dir: output

After adding this file, commit the changes and push the commit to the repository. Shortly thereafter, a new Travis build should start. If it goes well, it will produce two files, namely "http://YOUR_USER_NAME.github.io/YOUR_REPOSITORY/YOUR_BOOK_OUTPUT_BASENAME.pdf" and "http://YOUR_USER_NAME.github.io/YOUR_REPOSITORY/YOUR_BOOK_OUTPUT_BASENAME.epub" (where YOUR_USER_NAME will be the lower-case version of your user name). You can link them from the README.md file that you probably have in your project's root folder.

The Workflow and Available Commands

In your book, you can use all the features and syntax of Pandoc's markdown. My system will also automatically load and apply the pandoc-citeproc filter, which allows you to use citations to a bibliography, and the pandoc-crossref filter for numbering and referencing figures and tables.

The workhorse is an R package named bookbuildeR. This package provides a set of additional commands and pre-processing capabilities, which are detailed here. You can use the following additional commands in your markdown:

\relative.path{path} is a path expression relative to the currently processed file's directory which will be resolved to be relative to the root folder of the document. This is intended to allow you to place chapters and sections in a hierarchical folder structure and their contained images in sub-folders, while referencing them relatively from the current path.
\relative.input{file} is a command similar to \relative.path. If this command is used, it must be the only text/command on the current line. It will resolve the path to file relative to the directory of the current file and *recursively* include that file. This is another tool to allow for building documents structured in chapters, sections, and folders without needing to specify the overall, global structure anywhere, and instead specify the inclusion of files where they are actually needed.
\meta.time prints the current date and time.
\meta.date prints the current date.
\text.block{type}{label}{body} creates a text block by putting the title-cased type together with the block number in front in double-emphasis (the blocks of each type are numbered separately) and then the body. \text.block{definition}{d1}{blabla} \text.block{definition}{d2}{blubblub} will render to **Definition 1:** blabla **Definition 2:** blubblub. Blocks can be referenced in the text via their label in \text.ref{label}. The goal is to achieve theorem-style environments, but on top of markdown.
\text.ref{label} references a text block with the same label (see command above). In the example given above, \text.ref{d2} would render to Definition 2.
\relative.code{path}{lines}{tags} inserts the content of a file identified by the path path, which is interpreted relative to the current file. Different from \relative.input, this is intended for reading code, and this command therefore provides two content selection mechanisms: lines allows you to specify lines to insert, e.g., 20:22,1:4,7 would insert the lines 1 to 4, 7, and 20 to 22 (in that order). tags allows you to specify a comma-separated list of tags. A tag is a string, say "example1", and then everything between a line containing "start example1" and a line "end example1" is included. The idea is that you would put these start and end markers into line comments (in R starting with "#", in Java with "//"). If you specify multiple tags, the corresponding bodies are merged. If you specify both lines and tags, we first apply the line selection and then the tag selection only on the selected lines.
\repo.code{path}{lines}{tags} if and only if you specify a codeRepo url in your YAML metadata of the book, you can use this command, which allows you to download and access a repository with source code. This way, you can have a book project in one repository and a repository with separate, working, executable examples somewhere else. This second repository, identified by codeRepo, will be cloned once. Then, the path path is interpreted relative to the root path of the repository and you can include code from that repository in the same fashion as with \relative.code.
\repo.listing{label}{caption}{language}{path}{lines}{tags} puts the result of \repo.code{path}{lines}{tags} into a markdown/pandoc-crossref compatible listing environment that can be referenced via [@label] (where label should start with lst:), has caption caption, and is formatted for programming language language. Furthermore, this method also automatically removes meta-comments, such as /** ... */ in Java or #' .. in R, a feature currently only implemented for Java and R. If the source code repository resides on GitHub, this method will also append a link of the form (src) to the caption which goes to the file in the repository.
\repo.name is replaced with the source code repository name provided as codeRepo in the YAML metadata, without a trailing .git, if any.
\repo.commit is replaced with the commit of the source code repository that was downloaded during the book building procedure.
\direct.r{rcode} directly executes a piece rcode of R code. If the code writes any output via, e.g., cat(..), then this output is pasted into the file. If the code does not produce such output, then its return value is transformed to a string and pasted.
\relative.r{path} is similar to \direct.r, but instead executes the file referred to by path, which is relative to the current directory.
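The tag selection described for \relative.code can be pictured with a small awk filter. This is only an illustration of the idea, not the way bookbuildeR implements it. It assumes that the marker lines themselves are left out of the result, and the file Example.java, the tag example1, and the helper extract_tag are made up for this sketch.

```shell
# A small, hypothetical source file with start/end markers in line comments.
cat > Example.java <<'EOF'
public class Example {
  // start example1
  static int twice(int x) {
    return 2 * x;
  }
  // end example1
  public static void main(String[] args) { }
}
EOF

# Print only the lines strictly between the two marker lines of a tag.
# Usage: extract_tag TAG FILE
extract_tag() {
  awk -v tag="$1" '
    index($0, "start " tag) { inside = 1; next }  # marker line: begin copying
    index($0, "end " tag)   { inside = 0; next }  # marker line: stop copying
    inside                                        # copy lines in between
  ' "$2"
}

extract_tag example1 Example.java
```

For the file above, this prints the three lines of the method twice and nothing else, so the rest of the class can remain a complete, compilable program while the book only shows the relevant part.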

The following commands will only work within Travis CI builds and (intentionally) crash otherwise:

\meta.repository get the repository in format owner/repository
\meta.commit get the commit id

With these facilities, you should be able to build electronic books automatically in a nice way.
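To give an impression of how some of these commands might look in practice, the following sketch writes a tiny two-file book into a scratch folder. The folder name demo, the file names, and all texts are assumptions made for illustration; only the commands themselves are taken from the list above.

```shell
# Hypothetical layout: demo/book.md includes demo/intro/intro.md.
mkdir -p demo/intro

# The main file only carries metadata and pulls in the chapter.
# \relative.input must be the only text on its line.
cat > demo/book.md <<'EOF'
---
title: My Book
---

\relative.input{intro/intro.md}
EOF

# The chapter defines two numbered blocks and references one of them.
cat > demo/intro/intro.md <<'EOF'
# Introduction

\text.block{definition}{d1}{A book is a long text.}

\text.block{definition}{d2}{A chapter is a part of a book.}

According to \text.ref{d2}, this file is a chapter. It was built on \meta.date.
EOF
```

Because the chapter is included from where it is needed, the main file never has to list the overall folder structure of the book.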

Interaction with Source Code Repository

As discussed above, there are several commands for interacting with a source code repository. The idea is as follows: You can write your book online, by keeping the book sources in a GitHub repository. Whenever you make a commit to the repository, the book's pdf and epub files will be rebuilt, so the newest book version is always online.

If you write a book related to, e.g., computer science, you may have lots of example codes in some programming language in your book. I think that it is often nice to not just have examples on some "meta-level," but to have real, executable programs as examples. Of course, you may choose to print snippets of them in the book only, but they should be available "in full" somewhere.

For this purpose, the code repository exists. It can be a second GitHub repository, where you keep your programming examples. This second repository may have an independent build cycle, e.g., be a Maven build with unit tests executed on Travis CI, as well. You can specify the URL of this repository as codeRepo in the YAML metadata of your book and then use commands such as \repo.code{path}{lines}{tags}, \repo.listing{label}{caption}{language}{path}{lines}{tags}, \repo.name, and \repo.commit to directly access the files and the meta information of this repository.
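A book file using such a code repository could look roughly like the one written by the following sketch. The repository URL, the Java path, the tag example1, and the scratch folder demo-code are placeholders, and leaving the lines argument empty in order to select by tag only is an assumption here.

```shell
# Write a hypothetical main file that refers to a separate code repository.
mkdir -p demo-code

cat > demo-code/book.md <<'EOF'
---
title: My Book
codeRepo: https://github.com/YOUR_USER_NAME/YOUR_CODE_REPOSITORY.git
---

# Examples

\repo.listing{lst:example1}{An excerpt of the example class.}{java}{src/main/java/Example.java}{}{example1}

The code in [@lst:example1] was taken from commit \repo.commit of \repo.name.
EOF
```

Printing \repo.commit next to a listing lets readers see which state of the examples a given build of the book corresponds to.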

If you do that, you may choose to follow the approach given in plume-lib/trigger-travis to automatically trigger a build of your book when a commit to your source code repository happens. This is a good idea, because this way your book will stay up-to-date when you, e.g., fix bugs in your example codes or refactor them.

This concept means that you can edit your source code examples completely independently from the book. You could even write a book about an application you develop on GitHub and cite its sources wherever you want. By using the trigger-travis approach, you will get a new version of the book whenever you change the book and whenever you change the source code.

Infrastructure

While we have already discussed the interplay of GitHub and Travis CI to get your book compiled, we have omitted one more element of our infrastructure: Docker. Docker allows us to build something like very light-weight virtual machines (I know, they are not strictly virtual machines). For this purpose, we can build images, which are basically states of a file system. Our Travis builds load such an image, namely thomasweise/docker-bookbuilder, which provides my bookbuildeR R package on top of an R installation (thomasweise/docker-pandoc-r) on top of a Pandoc installation (thomasweise/docker-pandoc) on top of a TeX Live installation (thomasweise/docker-texlive-full). Of course, you could also use any of these containers locally or extend them in any way you want, in order to use different tools or book building procedures.
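A local build with the same container could be sketched as follows. This mirrors the docker call from .travis.yml, but it is an untested assumption that the container also runs without the two TRAVIS_ variables; at the very least, \meta.repository and \meta.commit will not work outside Travis. The function name build_book and the DOCKER override variable are made up for this sketch.

```shell
# The docker executable to use; can be overridden, e.g., for a dry run.
DOCKER="${DOCKER:-docker}"

# Build book/book.md into output/BASENAME.pdf and output/BASENAME.epub.
# Usage: build_book BASENAME   (run in the repository root)
build_book() {
  base_dir="$(pwd)"
  input_dir="${base_dir}/book"
  output_dir="${base_dir}/output"
  mkdir -p "${output_dir}"
  # Same image and arguments as in the Travis script, minus the TRAVIS_ variables.
  "${DOCKER}" run -v "${input_dir}/":/input/ \
                  -v "${output_dir}/":/output/ \
                  -t -i thomasweise/docker-bookbuilder book.md "$1"
}
```

Calling build_book myBook in the repository root would then start the build, while setting DOCKER=echo beforehand only prints the arguments that would be passed to docker.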
