Process Plan to create bookmarks from headings

This article provides a Process Plan that can be used to automatically generate bookmarks from the headings in a PDF.

The test file (taken from the PDF/VT sample files at https://pdfa.org/resource/cal-poly-pdfvt-test-suite/) contains several brochures for different cities. The Process Plan determines the headings (the city name on the first page of each brochure) and uses them to create a bookmark structure.

Let's have a look at the Process Plan:

  1. First, a matching Check determines the positions of the headings. This Check is tailored to the sample PDF file and works only for that file. To detect all headings in the PDF reliably, it combines several Check properties (font size, font type, font color, etc.).
  2. Next, the text at the positions detected in step 1 is extracted. If the position of one text snippet touches another, the two snippets are merged. This matters for headings that span two lines (like "District of Columbia", see screenshot above), because each such heading should produce only one bookmark.
  3. A loop iterates over an array to collect all headings. Once every heading has been collected, the next step starts.
  4. To inject the bookmarks into the PDF, the Process Plan uses the "Apply structures" Action. The information gathered so far is converted into a JSON structure that this Action can use for bookmarks.
  5. Finally, a Fixup sets the PDF to open with the bookmarks panel visible by default in the PDF viewer.
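The logic of steps 1 to 4 can be sketched in Python outside of pdfToolbox. This is a minimal illustration, not how the Process Plan is implemented internally. It assumes that text runs are already available with their page number, bounding box (in pt, PDF coordinates with y growing upward), font name, and font size. The class `TextRun`, the helper functions, the heading criteria (`Helvetica-Bold`, at least 24 pt), and the sample data are all hypothetical. The `{"title", "page"}` JSON shape is also an assumption and not the schema the "Apply structures" Action actually expects.

```python
import json
from dataclasses import dataclass


@dataclass
class TextRun:
    page: int    # 1-based page number
    x0: float    # bounding box in pt; y grows upward as in PDF user space
    y0: float
    x1: float
    y1: float
    text: str
    font: str
    size: float


# Hypothetical heading criteria; the real Check is tuned to the sample file.
HEADING_FONT = "Helvetica-Bold"
HEADING_MIN_SIZE = 24.0


def is_heading(run):
    """Step 1: keep only runs whose font properties match the heading style."""
    return run.font == HEADING_FONT and run.size >= HEADING_MIN_SIZE


def touches(a, b):
    """True if both runs are on the same page and their boxes overlap or share an edge."""
    return (a.page == b.page
            and a.x0 <= b.x1 and b.x0 <= a.x1
            and a.y0 <= b.y1 and b.y0 <= a.y1)


def combine(a, b):
    """Join two touching runs; the upper run (larger y1) supplies the first words."""
    top, bottom = (a, b) if a.y1 >= b.y1 else (b, a)
    return TextRun(a.page, min(a.x0, b.x0), min(a.y0, b.y0),
                   max(a.x1, b.x1), max(a.y1, b.y1),
                   top.text + " " + bottom.text, top.font, max(a.size, b.size))


def merge_touching(runs):
    """Step 2: keep merging until no two remaining boxes touch."""
    merged = list(runs)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if touches(merged[i], merged[j]):
                    merged[i] = combine(merged[i], merged.pop(j))
                    changed = True
                    break
            if changed:
                break
    return merged


def bookmarks_json(runs):
    """Steps 3-4: collect headings in page/reading order and serialize them.

    The {"title", "page"} shape is hypothetical, not the Action's real schema.
    """
    headings = merge_touching(r for r in runs if is_heading(r))
    headings.sort(key=lambda r: (r.page, -r.y1))
    return json.dumps([{"title": r.text, "page": r.page} for r in headings])


# Made-up sample data: one single-line and one two-line heading.
runs = [
    TextRun(1, 72, 700, 200, 730, "Boston", "Helvetica-Bold", 28),
    TextRun(1, 72, 600, 400, 612, "Welcome to the city", "Helvetica", 11),
    TextRun(5, 72, 700, 260, 730, "District of", "Helvetica-Bold", 28),
    TextRun(5, 72, 670, 240, 700, "Columbia", "Helvetica-Bold", 28),
]
print(bookmarks_json(runs))
# [{"title": "Boston", "page": 1}, {"title": "District of Columbia", "page": 5}]
```

The merge loop restarts after every join because a newly enlarged box can touch a run that was already checked. For the handful of headings in a brochure collection, this simple quadratic approach is sufficient.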

When the Process Plan runs, an Ask-at-runtime dialog appears in which you can slightly increase the search area for the text extraction so that all headings are found (the default value of 20 pt is suitable in most cases).
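The effect of the search area value can be illustrated with a short Python sketch. It assumes, without confirmation from the Process Plan itself, that the value is applied as a margin added to every side of a detected heading box, and that any text whose box intersects the enlarged area is picked up. The `Box` class, the helper functions, and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x0: float  # left, in pt
    y0: float  # bottom, in pt
    x1: float  # right, in pt
    y1: float  # top, in pt


def expand(box, margin=20.0):
    """Grow the box by `margin` pt on every side (20 pt mirrors the dialog default)."""
    return Box(box.x0 - margin, box.y0 - margin, box.x1 + margin, box.y1 + margin)


def intersects(a, b):
    """True if the two boxes overlap or share an edge."""
    return a.x0 <= b.x1 and b.x0 <= a.x1 and a.y0 <= b.y1 and b.y0 <= a.y1


def words_in_area(heading, words, margin=20.0):
    """Return the texts of (text, Box) pairs that intersect the expanded heading area."""
    area = expand(heading, margin)
    return [text for text, box in words if intersects(area, box)]


# Made-up layout: the detected position covers only the first heading line.
heading = Box(72, 700, 220, 720)
words = [
    ("District", Box(72, 700, 140, 720)),
    ("of", Box(145, 700, 160, 720)),
    ("Columbia", Box(72, 675, 170, 695)),
    ("Welcome", Box(72, 600, 180, 615)),
]
print(words_in_area(heading, words, margin=0))  # ['District', 'of']
print(words_in_area(heading, words))            # ['District', 'of', 'Columbia']
```

Without a margin, the second line of the heading is missed. With 20 pt, it is included, while body text further down the page stays outside the area. A much larger value would eventually start to pull in unrelated text, which is why only a slight increase is recommended.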

Result PDF with new bookmark structure