pdfChip specific HTML aspects

pdfChip accepts most valid HTML tags. Because of the large number of available tags and the even larger number of possible combinations, some of them may produce unexpected results. In addition, formatting content for a page of fixed size has different requirements than formatting a website (which must display properly on every output device), so some formatting tags do not make sense in pdfChip.

This chapter describes the special HTML features that pdfChip adds for PDF-specific needs: using PDFs (and not only images) as image sources, adding XMP metadata including PDF standards identifiers, adding an OutputIntent, and attaching (embedding) files to the created PDF document. Please refer to the CSS chapter for details regarding layout.

Use PDF as image format

pdfChip allows the usage of PDF pages as source for image tags. Since PDFs can contain more than one page, a syntax for selecting the page to be placed has been added to the HTML syntax.

The placed PDF is not rasterized; instead, the original PDF content is merged into the generated PDF document.

Adobe Illustrator (.ai) files can be used in the same way as PDF files. In that case only the PDF representation of the file is used; all internal Illustrator information stored in the file is discarded and does not become part of the new PDF file.

URL syntax for PDF pages

The URL for PDF supports the following features:

<URL>#page=<PAGE-NUM>&box=<BOXNAME>&boxadj=<LEFT>,<TOP>,<RIGHT>,<BOTTOM>
  • <URL>: the URL of a PDF file
  • <PAGE-NUM>: the page number (one-based)
  • <BOXNAME>: the page box used for placement, one of trim, crop, media, bleed, art. Default: CropBox
  • <LEFT>,<TOP>,<RIGHT>,<BOTTOM>: adjustment for the page box. Positive values extend the selected page box. Default: 0
    Values can be specified in 'mm', 'pt', 'cm', 'pc' or 'in' units. The default unit is 'pt'.
  • Note:
    If the page=<PAGE-NUM> part is missing, the first page of the PDF referenced by the URL is used for placement.

Example

Place the first page of sample.pdf:

<img src="sample.pdf">

Place the second page of sample.pdf:

<img src="sample.pdf#page=2">
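
The box and boxadj parameters can be combined with the page selection. The following line is a sketch only: it assumes that the unit is written directly after each number (for example 3mm), which the syntax description above does not spell out, and it reuses the sample.pdf file name from the examples above.

```html
<!-- Sketch: place page 2 using its TrimBox, extended by 3 mm on each side.
     The "3mm" notation (unit appended to the number) is an assumption. -->
<img src="sample.pdf#page=2&box=trim&boxadj=3mm,3mm,3mm,3mm">
```

Values without a unit would be read in points, the default unit.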

Supported HTML tags and CSS properties

  • HTML tags:
    • <img src="sample.pdf#page=2">
  • CSS properties:
    • background:url("sample.pdf#page=2")
    • background-image:url("sample.pdf#page=2")
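
As an illustration, a PDF page might be used as the background of a block element. The element size, the inline style and the paragraph text are assumptions chosen for this sketch; only the url(...) syntax comes from the list above.

```html
<!-- Sketch: use page 2 of sample.pdf as a non-repeating background. -->
<div style="width: 210mm; height: 297mm;
            background-image: url('sample.pdf#page=2');
            background-repeat: no-repeat;">
  <p>Text printed on top of the placed PDF page.</p>
</div>
```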

Use Adobe Illustrator files (.ai) as image format

Adobe Illustrator (.ai) files can also be used in the same way as PDF files. In this case only the PDF representation of the file is used; all internal Illustrator information stored in the file is discarded and is not included in the PDF file generated by pdfChip.

Support for image file formats

pdfChip supports the following image file formats:

  • GIF
  • PNG
  • JPEG, JPG
  • TIFF, TIF
  • PSD

and also:

  • SVG

For these image file formats, pdfChip passes the image data, including masks, alpha channels and ICC profiles, through to the PDF. As a result, the image resolution, width and height, bits per color component, color model, transparency, and ICC profile are fully preserved.

For PSD files and TIFF files that contain Photoshop information, the following is preserved in addition:

  • Clipping path (if present)
  • Photoshop layers are converted to spot channels
  • XMP metadata (if present)

SVG gets converted directly to PDF data structures (for details see pdfChip specific SVG aspects).

Create File Attachment annotations

File attachments can be created by using <a> link tags with pdfChip custom attributes.

A file attachment annotation is created if the <a> tag contains the following attributes:

  • href: must be present, content is ignored
  • data-cchip-embed: Path to file to embed

Optional attributes:

  • data-cchip-mimetype: MIME type of attachment (required for PDF/A-3)
  • data-cchip-desc: Description for attachment
  • data-cchip-relationship: The AFRelationship entry ("Source", "Data", "Alternative", "Supplement", "Unspecified"; required for PDF/A-3)
  • data-cchip-bookmark: Title of optional bookmark entry
  • data-cchip-bm-path: Optional path into bookmark tree

The <a> element must not be empty; it has to include some visible content.

Example

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8"/>
    <title>PDF with File Attachment annotation</title>
    <style>
         ...
    </style>
  </head>

  <body>
    <p>This is a PDF, which contains an embedded text file.<br>It is linked with the text below:</p>
    <a
      href
      data-cchip-embed="external/Hello.txt"
      data-cchip-relationship="Supplement"
      data-cchip-desc="Embedded text file"
      data-cchip-mimetype="text/plain">
      This is the File Attachment annotation.
    </a>
  </body>

</html>
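
The optional bookmark attribute could be added to the same kind of link. This fragment is a sketch under assumptions: the bookmark title and link text are made up for illustration, and data-cchip-bm-path is left out because its path notation is not described here.

```html
<!-- Sketch: file attachment annotation that also requests a bookmark entry.
     The bookmark title "Attached text file" is a hypothetical value. -->
<a
  href
  data-cchip-embed="external/Hello.txt"
  data-cchip-mimetype="text/plain"
  data-cchip-bookmark="Attached text file">
  Open the attached text file
</a>
```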

Embedding files as attachments into the generated PDF

Files can be embedded as attachments in the PDF by specifying a <link> tag whose rel attribute has the value "cchip-embedded-file". The href attribute of the link tag must point to a file.

  • rel: Value must be "cchip-embedded-file"
  • href: Path to file to embed

Optional attributes:

  • data-cchip-relationship: The AFRelationship entry ("Source", "Data", "Alternative", "Supplement", "Unspecified"; required for PDF/A-3)
  • data-cchip-filename: File name used for the embedded file. If not specified, the actual file name is used.
  • data-cchip-mimetype: MIME type of attachment (required for PDF/A-3)
  • data-cchip-desc: Description for attachment

Example

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8"/>
    <title>PDF with embedded file</title>
    <link
        rel="cchip-embedded-file"
        href="<Path to file to embed>"
        data-cchip-relationship="<type of relationship, e.g. 'Supplement'>"
        data-cchip-desc="Some description"
        data-cchip-mimetype="<Type of embedded file, e.g. 'text/xml'>"
        data-cchip-filename="<define a custom file name, e.g. 'source.xml'>"
    />
  </head>
  <body>
    ...
  </body>
</html>
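
With the placeholders filled in, the link tag might look like the following sketch. The path data/invoice.xml, the description and the file name are hypothetical values chosen for illustration.

```html
<!-- Sketch: embed a (hypothetical) XML file as a document-level attachment. -->
<link
    rel="cchip-embedded-file"
    href="data/invoice.xml"
    data-cchip-relationship="Source"
    data-cchip-desc="Invoice data in XML format"
    data-cchip-mimetype="text/xml"
    data-cchip-filename="source.xml"
/>
```

Because data-cchip-filename is set, the attachment would be stored under the name source.xml rather than invoice.xml.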

Add XMP Metadata

pdfChip allows the creation of XMP Metadata by using custom properties in <meta> tags inside <head>.

A <meta> tag is used for XMP metadata creation only if it contains all of the following attributes:

  • property
  • content
  • data-cchip-xmp-ns
  • data-cchip-xmp-prefix
  • data-cchip-xmp-property
  • data-cchip-xmp-type

The 'property' attribute

The content of this attribute is not used for XMP creation, but according to the HTML specification it has to be present.

The 'content' attribute

The content of this attribute is used as the XMP property value.

The 'data-cchip-xmp-ns' attribute

The data-cchip-xmp-ns attribute specifies the XMP namespace URI for the property.

The 'data-cchip-xmp-prefix' attribute

The data-cchip-xmp-prefix attribute specifies the preferred prefix for the XMP namespace URI of the property.

The 'data-cchip-xmp-property' attribute

The data-cchip-xmp-property attribute specifies the XMP property name.

The 'data-cchip-xmp-type' attribute

The data-cchip-xmp-type attribute specifies the XMP property value type.

Supported values (case insensitive):

  • langAlt: Creates a language alternative. Currently only the creation of the x-default entry is supported.
  • seq: Ordered list of simple types
  • bag: Unordered list of simple types
  • seqstruct: Ordered list of structured types
  • bagstruct: Unordered list of structured types

All other types are treated as simple XMP value types (e.g. Text, Date, …).
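
A simple value might be written as in the following sketch. It uses the standard XMP basic namespace and its CreatorTool property; the content value and the text in the property attribute are made up for illustration.

```html
<!-- Sketch: set xmp:CreatorTool as a simple Text value.
     "My HTML generator 1.0" is a hypothetical value. -->
<meta
    property="CreatorTool"
    content="My HTML generator 1.0"
    data-cchip-xmp-ns="http://ns.adobe.com/xap/1.0/"
    data-cchip-xmp-prefix="xmp"
    data-cchip-xmp-property="CreatorTool"
    data-cchip-xmp-type="Text"
>
```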

Arrays of simple types

The seq and bag property types create a new array if not already present and add the value to this array.
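
Following this rule, repeating a <meta> tag for the same property should yield one array with several entries. The sketch below assumes this for dc:subject, which is defined as an unordered list in the Dublin Core schema; the two keyword values are hypothetical.

```html
<!-- Sketch: the first tag creates the dc:subject bag and adds "invoice". -->
<meta property="" content="invoice"
    data-cchip-xmp-ns="http://purl.org/dc/elements/1.1/"
    data-cchip-xmp-prefix="dc"
    data-cchip-xmp-property="subject"
    data-cchip-xmp-type="bag"
>
<!-- The second tag adds "archive" to the bag that now exists. -->
<meta property="" content="archive"
    data-cchip-xmp-ns="http://purl.org/dc/elements/1.1/"
    data-cchip-xmp-prefix="dc"
    data-cchip-xmp-property="subject"
    data-cchip-xmp-type="bag"
>
```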

Arrays of structs

The seqstruct and bagstruct property types create a new array if not already present and add the struct value to this array. To specify the namespace URI and prefix for the struct, additional attributes must be present in the <meta> tag:

  • data-cchip-xmp-struct-ns
  • data-cchip-xmp-struct-prefix

Struct members can be specified by the XMP Toolkit subpath syntax:

"History[1]/stEvt:when"

Examples

Adding the "dc:title" property

This example adds a language alternative for the dc:title property.

<html>
    <head>
        <meta
            property="Subject"
            content="ccmip test (Iñtërnâtiônàlizætiøn)"
            data-cchip-xmp-ns="http://purl.org/dc/elements/1.1/"
            data-cchip-xmp-prefix="dc"
            data-cchip-xmp-property="title"
            data-cchip-xmp-type="langAlt"
        >
    </head>
</html>

Adding an "xmpMM:History" property

This example adds a sequence of 'ResourceEvent' structs:

<!-- Create an xmpMM:History sequence of stEvt:ResourceEvent structs -->
<meta property="" content="Thursday, 06 August 2015 09:45 PM"
    data-cchip-xmp-ns="http://ns.adobe.com/xap/1.0/mm/"
    data-cchip-xmp-prefix="xmpMM"
    data-cchip-xmp-property="History"
    data-cchip-xmp-type="SeqStruct"
    data-cchip-xmp-struct-name="ResourceEvent"
    data-cchip-xmp-struct-ns="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
    data-cchip-xmp-struct-prefix="stEvt"
>
<!-- Add an entry to the xmpMM:History sequence  -->
<meta property="" content="2013-09-06T16:01:13.000Z" 
    data-cchip-xmp-ns="http://ns.adobe.com/xap/1.0/mm/" 
    data-cchip-xmp-prefix="xmpMM" 
    data-cchip-xmp-property="History[1]/stEvt:when" 
    data-cchip-xmp-type="Date"
>
<meta property="" content="email_sent" 
    data-cchip-xmp-ns="http://ns.adobe.com/xap/1.0/mm/" 
    data-cchip-xmp-prefix="xmpMM" 
    data-cchip-xmp-property="History[1]/stEvt:action" 
    data-cchip-xmp-type="Text"
>
<meta property="" content="Zeitpunkt des Versands des Originals" 
    data-cchip-xmp-ns="http://ns.adobe.com/xap/1.0/mm/" 
    data-cchip-xmp-prefix="xmpMM" 
    data-cchip-xmp-property="History[1]/stEvt:parameters" 
    data-cchip-xmp-type="Text"
>
<meta property="" content="Microsoft Office Outlook 12.0" 
    data-cchip-xmp-ns="http://ns.adobe.com/xap/1.0/mm/" 
    data-cchip-xmp-prefix="xmpMM" 
    data-cchip-xmp-property="History[1]/stEvt:softwareAgent" 
    data-cchip-xmp-type="Text"
>

Create PDF Standards Identifier

pdfChip allows the creation of PDF documents that claim conformance to several PDF standards. There is no guarantee that the files actually conform, since no conformance check is performed after the PDF document has been created. A <meta> tag triggers the insertion of XMP metadata and Document Info entries for the following PDF standards:

PDF/A

If one of the PDF/A meta tags is present an XMP PDF/A Extension Schema will be created if necessary.

  • <meta property="cchip-pdfa" content="PDF/A-1a">
  • <meta property="cchip-pdfa" content="PDF/A-1b">
  • <meta property="cchip-pdfa" content="PDF/A-2a">
  • <meta property="cchip-pdfa" content="PDF/A-2u">
  • <meta property="cchip-pdfa" content="PDF/A-2b">
  • <meta property="cchip-pdfa" content="PDF/A-3a">
  • <meta property="cchip-pdfa" content="PDF/A-3u">
  • <meta property="cchip-pdfa" content="PDF/A-3b">

PDF/X

  • <meta property="cchip-pdfx" content="PDF/X-1A">
  • <meta property="cchip-pdfx" content="PDF/X-3">
  • <meta property="cchip-pdfx" content="PDF/X-4">

PDF/E

  • <meta property="cchip-pdfe" content="PDF/E-1">

PDF/VT

Requesting PDF/VT also sets PDF/X-4.

  • <meta property="cchip-pdfvt" content="PDF/VT-1">
  • <meta property="cchip-pdfvt" content="PDF/VT-2">

PDF/UA

  • <meta property="cchip-pdfua" content="PDF/UA-1">

Add Output Intents

Output Intents can be included by specifying a <link> tag whose rel attribute has the value "cchip-outputintent". The href attribute of the link tag must point to a PDF file that contains at least one Output Intent. pdfChip parses the PDF file and extracts the first Output Intent.

  • <link rel="cchip-outputintent" href="./templates/outputintent.pdf"/>

If needed, pdfChip also inserts one Output Intent for every standard requested as described in "Create PDF Standards Identifier". All Output Intents point to the same ICC profile.

  • <meta property="cchip-pdfx" ... > will result in /GTS_PDFX
  • <meta property="cchip-pdfa" ... > will result in /GTS_PDFA1
  • <meta property="cchip-pdfe" ... > will result in /GTS_PDFE
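
Put together, a document head that requests a standards identifier and supplies an Output Intent might look like the following sketch. The title and body text are placeholders; the Output Intent path is the one used in the example above.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8"/>
    <title>PDF/X-4 with Output Intent</title>
    <!-- Request the PDF/X-4 identifier (no conformance check is performed) -->
    <meta property="cchip-pdfx" content="PDF/X-4">
    <!-- Take the first Output Intent found in this PDF file -->
    <link rel="cchip-outputintent" href="./templates/outputintent.pdf"/>
  </head>
  <body>
    <p>Page content</p>
  </body>
</html>
```

According to the mapping above, this combination would result in a /GTS_PDFX Output Intent.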

How to handle parts in separate HTML files

In practice, the parts of a planned document are often spread over a number of HTML files that link to each other. pdfChip therefore has to distinguish between internal and external cross references: it must include the documents that are to become part of the generated document and adjust the links between them, while leaving external links unchanged.

To achieve this, all referenced HTML files that are to be included in the document have to be added to the CLI call:

pdfChip {path to cover/cover.html} {path to first chapter/first.html} {path to second chapter/second.html} ...

If an HTML file contains a link (<a href="...">) that points to one of the input HTML files, the link becomes a link annotation in the generated PDF. Otherwise it is kept as is and becomes a URI action for an external resource. The HTML input files can be named identically.

  • If an HTML link has an href attribute without a fragment identifier ('#'), the first page of the linked document is addressed.
  • If an HTML link has an href attribute with a fragment identifier ('#'), the substring following the # character is used as the ID of the element to be addressed.
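
The three cases might look as follows. This is a sketch under assumptions: the file names follow the CLI call above, the files are assumed to sit in one folder so that the relative links resolve, and the id "summary" and the website URL are hypothetical.

```html
<!-- cover.html (passed to pdfChip together with first.html and second.html) -->
<ul>
  <!-- No fragment: addresses the first page of first.html -->
  <li><a href="first.html">Chapter 1</a></li>
  <!-- Fragment: addresses the element with id="summary" in second.html -->
  <li><a href="second.html#summary">Summary</a></li>
  <!-- Not an input file: kept as a URI action for an external resource -->
  <li><a href="https://www.example.com/">Example website</a></li>
</ul>

<!-- second.html -->
<h2 id="summary">Summary</h2>
```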

Defining the transparency blend space

Setting the blend space can be critical to ensuring consistent rendering results. The blend space for the PDF document to be created is defined by a <link> tag in the head section of the HTML document whose 'rel' attribute has the value "cchip-transparency-blendspace". The actual blend space is specified with the following attributes:

  • data-param (required); can have one of the following values:
    • DeviceCMYK
    • DeviceRGB
    • DeviceGray
    • ICC
  • href: either contains the path to an ICC profile (only Gray, RGB and CMYK profiles are allowed) or is an empty string; it is only used if data-param = "ICC"

Whenever a transparency group is created, the following rules apply:

  • If a "cchip-transparency-blendspace" 'rel' entry exists in the head:
    • The color space defined in data-param (i.e. DeviceCMYK, DeviceRGB, DeviceGray or an ICC profile) is used.
  • If no such entry exists:
    • If an OutputIntent is defined (e.g. via <meta property="cchip-pdfx" content="PDF/X-1A">) and its destination color space is CMYK, DeviceCMYK is used as the transparency blend space.
    • If the OutputIntent defines an RGB or Gray color space as destination, the respective destination ICC profile is used.
    • If no OutputIntent is defined, the transparency blend space is set to DeviceCMYK.

Examples

With referenced ICC profile

<html>
    <head>
        ...
        <link
            rel="cchip-transparency-blendspace"
            data-param="ICC"
            href="./path/to/some/icc-profile.icc"
        />
        ...
    </head>
    <body>
        ...
    </body>
</html>

Without referenced ICC profile

<html>
    <head>
        ...
        <link
            rel="cchip-transparency-blendspace"
            data-param="DeviceCMYK"
            href=""
        />
        ...
    </head>
    <body>
        ...
    </body>
</html>