pdfChip-specific HTML aspects

Most valid HTML tags can be used in pdfChip. Because of the large number of available tags and the even larger number of possible combinations, some of them may produce unexpected results. In addition, formatting content for a page with a fixed size has different requirements than formatting a website (which must display properly on every output device), so some formatting tags do not make sense in pdfChip.

This chapter describes the special HTML features that have been added to pdfChip: using PDFs (and not only images) as image sources, adding XMP metadata (including PDF standards identifiers), adding an OutputIntent, and attaching (embedding) files to the created PDF document. Please refer to the CSS chapter for details regarding layout.

Use PDF as image format

pdfChip allows PDF pages to be used as the source for image tags. Since a PDF can contain more than one page, a syntax for selecting the page to be placed has been added to the HTML syntax.

The placed PDF is not rasterized; instead, the original PDF content is merged into the generated PDF document.

Adobe Illustrator (.ai) files can be used in the same way as PDF files. Only the PDF representation of the file is used; all internal Illustrator information stored in the file is discarded and does not become part of the new PDF file.

URL syntax for PDF pages

The URL for PDF supports the following features:

<URL>#page=<PAGE-NUM>&box=<BOXNAME>&boxadj=<LEFT>,<TOP>,<RIGHT>,<BOTTOM>
  • <URL>: the URL of a PDF file
  • <PAGE-NUM>: the page number (one-based)
  • <BOXNAME>: the page box used for placement: trim, crop, media, bleed, art. Default: CropBox
  • <LEFT>,<TOP>,<RIGHT>,<BOTTOM>: adjustment for the page box. Positive values extend the selected page box. Values can be specified in 'mm', 'pt', 'cm', 'pc' or 'in' units; the default unit is 'pt'. Default: 0
  • Note:
    If the page=<PAGE-NUM> part is missing, the first page of the PDF referenced by the URL is used for placement.

Example

Places the first page of sample.pdf

<img src="sample.pdf">

Places the second page of sample.pdf

<img src="sample.pdf#page=2">
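
The box and boxadj parameters can be combined with page in the same fragment. The following is a minimal sketch, assuming that sample.pdf has at least two pages and that a unit is appended directly to each adjustment value (for example 3mm). The exact notation for units is an assumption and should be checked against your pdfChip version.

```html
<!-- Place page 2 of sample.pdf using its TrimBox, extended by 3 mm on every side.
     Assumption: the unit is written directly after the number, e.g. "3mm". -->
<img src="sample.pdf#page=2&box=trim&boxadj=3mm,3mm,3mm,3mm">

<!-- Same page and box, extended by 10 pt on every side.
     No unit is given, so the default unit 'pt' applies. -->
<img src="sample.pdf#page=2&box=trim&boxadj=10,10,10,10">
```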

Supported HTML tags and CSS properties

  • HTML tags:
    • <img src="sample.pdf#page=2">
  • CSS properties:
    • background:url("sample.pdf#page=2")
    • background-image:url("sample.pdf#page=2")
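
As a sketch of the CSS variant, the following places the second page of sample.pdf as the background of a block element. The class name and the dimensions are hypothetical and were chosen only for illustration.

```html
<html>
    <head>
        <style>
            /* Hypothetical class: a fixed-size box showing page 2 of sample.pdf */
            .pdf-background {
                width: 210mm;
                height: 297mm;
                background-image: url("sample.pdf#page=2");
                background-repeat: no-repeat;
            }
        </style>
    </head>
    <body>
        <div class="pdf-background"></div>
    </body>
</html>
```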

Create File Attachment annotations

File attachments can be created by using <a> link tags with pdfChip custom attributes.

A file attachment annotation is created if the <a> tag contains the following attributes:

  • href (must be present, but its value is not used)
  • data-cchip-embed: Path to file to embed

Optional attributes:

  • data-cchip-mimetype: MIME type of attachment
  • data-cchip-desc: Description for attachment
  • data-cchip-relationship: the AFRelationship entry ("Source", "Data", "Alternative", "Supplement")
  • data-cchip-bookmark: Title of optional bookmark entry
  • data-cchip-bm-path: Optional path into bookmark tree
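
A file attachment link could look like the following sketch. The file path, MIME type, description and bookmark title are hypothetical values; only the attribute names are taken from the lists above.

```html
<!-- Hypothetical example: embed ./data/invoice.xml as a file attachment.
     href is present, but its value is not used for the attachment. -->
<a href="#"
    data-cchip-embed="./data/invoice.xml"
    data-cchip-mimetype="text/xml"
    data-cchip-desc="Invoice data in XML format"
    data-cchip-relationship="Source"
    data-cchip-bookmark="Invoice data"
>Invoice data</a>
```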

Add XMP Metadata

pdfChip allows the creation of XMP Metadata by using custom properties in <meta> tags inside <head>.

A <meta> tag is used for XMP metadata creation only if it contains all of the following attributes:

  • property
  • content
  • data-cchip-xmp-ns
  • data-cchip-xmp-prefix
  • data-cchip-xmp-property
  • data-cchip-xmp-type

The 'property' attribute

The content of this attribute is not actually used for XMP creation, but according to the HTML specification it has to be present.

The 'content' attribute

The content of this attribute is used as the XMP property value.

The 'data-cchip-xmp-ns' attribute

The data-cchip-xmp-ns attribute specifies the XMP namespace URI for the property.

The 'data-cchip-xmp-prefix' attribute

The data-cchip-xmp-prefix attribute specifies the preferred prefix for the XMP namespace URI of the property.

The 'data-cchip-xmp-property' attribute

The data-cchip-xmp-property attribute specifies the XMP property name.

The 'data-cchip-xmp-type' attribute

The data-cchip-xmp-type attribute specifies the XMP property value type.

Supported values (case insensitive):

  • langAlt: Creates a language alternative. Currently only the creation of the x-default entry is supported.
  • seq: Ordered list of simple types
  • bag: Unordered list of simple types
  • seqstruct: Ordered list of structured types
  • bagstruct: Unordered list of structured types

All other types are treated as simple XMP value types (e.g. Text, Date, …).

Arrays of simple types

The seq and bag property types create a new array if it is not already present and add the value to this array.
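
As an illustration, the following sketch adds two keywords to the dc:subject bag. It assumes that repeating a <meta> tag for the same property appends one item per tag, as the description above suggests; the values of the 'property' and 'content' attributes are hypothetical.

```html
<!-- First tag: creates the dc:subject bag and adds the item "pdfChip" -->
<meta
    property="Keywords"
    content="pdfChip"
    data-cchip-xmp-ns="http://purl.org/dc/elements/1.1/"
    data-cchip-xmp-prefix="dc"
    data-cchip-xmp-property="subject"
    data-cchip-xmp-type="bag"
>
<!-- Second tag: the bag already exists, so the item "HTML" is added to it -->
<meta
    property="Keywords"
    content="HTML"
    data-cchip-xmp-ns="http://purl.org/dc/elements/1.1/"
    data-cchip-xmp-prefix="dc"
    data-cchip-xmp-property="subject"
    data-cchip-xmp-type="bag"
>
```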

Arrays of structs

The seqstruct and bagstruct property types create a new array if it is not already present and add the struct value to this array. To specify the namespace URI and prefix of the struct, additional attributes must be present in the <meta> tag:

  • data-cchip-xmp-struct-ns
  • data-cchip-xmp-struct-prefix

Struct members can be specified by the XMP Toolkit subpath syntax:

"History[1]/stEvt:when"

Examples

Adding the "dc:title" property

This example adds a language alternative for the dc:title property.

<html>
    <head>
        <meta
            property="Subject"
            content="ccmip test (Iñtërnâtiônàlizætiøn)"
            data-cchip-xmp-ns="http://purl.org/dc/elements/1.1/"
            data-cchip-xmp-prefix="dc"
            data-cchip-xmp-property="title"
            data-cchip-xmp-type="langAlt"
        >
    </head>
</html>

Adding an "xmpMM:History" property

This example adds a sequence of 'ResourceEvent' structs.

<!-- Create an xmpMM:History sequence of stEvt:ResourceEvent structs -->
<meta property="" content="Thursday, 06 August 2015 09:45 PM"
    data-cchip-xmp-ns="http://ns.adobe.com/xap/1.0/mm/"
    data-cchip-xmp-prefix="xmpMM"
    data-cchip-xmp-property="History"
    data-cchip-xmp-type="SeqStruct"
    data-cchip-xmp-struct-name="ResourceEvent"
    data-cchip-xmp-struct-ns="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
    data-cchip-xmp-struct-prefix="stEvt"
>
<!-- Add an entry to the xmpMM:History sequence  -->
<meta property="" content="2013-09-06T16:01:13.000Z" 
    data-cchip-xmp-ns="http://ns.adobe.com/xap/1.0/mm/" 
    data-cchip-xmp-prefix="xmpMM" 
    data-cchip-xmp-property="History[1]/stEvt:when" 
    data-cchip-xmp-type="Date"
>
<meta property="" content="email_sent" 
    data-cchip-xmp-ns="http://ns.adobe.com/xap/1.0/mm/" 
    data-cchip-xmp-prefix="xmpMM" 
    data-cchip-xmp-property="History[1]/stEvt:action" 
    data-cchip-xmp-type="Text"
>
<meta property="" content="Zeitpunkt des Versands des Originals" 
    data-cchip-xmp-ns="http://ns.adobe.com/xap/1.0/mm/" 
    data-cchip-xmp-prefix="xmpMM" 
    data-cchip-xmp-property="History[1]/stEvt:parameters" 
    data-cchip-xmp-type="Text"
>
<meta property="" content="Microsoft Office Outlook 12.0" 
    data-cchip-xmp-ns="http://ns.adobe.com/xap/1.0/mm/" 
    data-cchip-xmp-prefix="xmpMM" 
    data-cchip-xmp-property="History[1]/stEvt:softwareAgent" 
    data-cchip-xmp-type="Text"
>

Create PDF Standards Identifier

pdfChip can create PDF documents that claim conformance to several PDF standards. There is no guarantee that the files are actually conformant, since no conformance check is performed after the PDF document has been created. A <meta> tag triggers the insertion of XMP metadata and Document Info entries for the following PDF standards:

PDF/A

If one of the PDF/A meta tags is present, an XMP PDF/A Extension Schema will be created if necessary.

  • <meta property="cchip-pdfa" content="PDF/A-1a">
  • <meta property="cchip-pdfa" content="PDF/A-1b">
  • <meta property="cchip-pdfa" content="PDF/A-2a">
  • <meta property="cchip-pdfa" content="PDF/A-2u">
  • <meta property="cchip-pdfa" content="PDF/A-2b">
  • <meta property="cchip-pdfa" content="PDF/A-3a">
  • <meta property="cchip-pdfa" content="PDF/A-3u">
  • <meta property="cchip-pdfa" content="PDF/A-3b">

PDF/X

  • <meta property="cchip-pdfx" content="PDF/X-1A">
  • <meta property="cchip-pdfx" content="PDF/X-3">
  • <meta property="cchip-pdfx" content="PDF/X-4">

PDF/E

  • <meta property="cchip-pdfe" content="PDF/E-1">

PDF/VT

PDF/VT also sets PDF/X-4

  • <meta property="cchip-pdfvt" content="PDF/VT-1">
  • <meta property="cchip-pdfvt" content="PDF/VT-2">

PDF/UA

  • <meta property="cchip-pdfua" content="PDF/UA-1">

Add Output Intents

Output Intents can be included by specifying a <link> tag whose rel attribute has the value "cchip_outputintent". The href attribute of the link tag must point to a PDF file that contains at least one Output Intent. pdfChip will parse the PDF file and extract the first Output Intent.

  • <link rel="cchip_outputintent" href="./templates/outputintent.pdf"/>

If needed, pdfChip will also insert one Output Intent for every standard requested as described in "Create PDF Standards Identifier". All Output Intents will point to the same ICC profile.

  • <meta property="cchip-pdfx" ... > will result in /GTS_PDFX
  • <meta property="cchip-pdfa" ... > will result in /GTS_PDFA1
  • <meta property="cchip-pdfe" ... > will result in /GTS_PDFE
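
Putting both parts together, a document head that requests a PDF/X-4 identifier and supplies an Output Intent might look like the following sketch. It reuses the path from the example above; the body content is a placeholder. As noted earlier, this only marks the file as PDF/X-4 and does not check whether it actually conforms.

```html
<html>
    <head>
        <!-- Request the PDF/X-4 identifier in XMP metadata and Document Info -->
        <meta property="cchip-pdfx" content="PDF/X-4">
        <!-- Take the first Output Intent found in this PDF file -->
        <link rel="cchip_outputintent" href="./templates/outputintent.pdf"/>
    </head>
    <body>
        <p>Placeholder content</p>
    </body>
</html>
```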

How to handle parts in separate HTML files

In practice, the parts of a planned document may be spread over a number of HTML files that link to each other. pdfChip therefore has to distinguish between internal and external cross references: it has to include the documents that are to become part of the generated document and adjust their links, and leave external links unchanged.

To achieve this, all (referenced) HTML files that are to be included in the document have to be added to the CLI call:

pdfChip {path to cover/cover.html} {path to first chapter/first.html} {path to second chapter/second.html} ...

If an HTML file contains a link (<a href="...">) that points to one of the input HTML files, the link becomes a link annotation; otherwise it is kept as is and becomes a URI action for an external resource. The HTML input files can be named identically.

  • If an HTML link has an href attribute that does not contain a fragment identifier ('#'), the first page of the linked document will be addressed
  • If an HTML link has an href attribute that contains a fragment identifier ('#'), the substring following the # character will be used as the ID of the element to be addressed
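
The following sketch illustrates the three cases. The file names, the ID "summary" and the external URL are hypothetical; first.html and second.html are assumed to be passed to pdfChip as input files together with cover.html.

```html
<!-- cover.html (hypothetical) -->
<ul>
    <!-- No fragment identifier: addresses the first page of first.html -->
    <li><a href="first.html">Chapter 1</a></li>
    <!-- Fragment identifier: "summary" is used as the ID to address in second.html -->
    <li><a href="second.html#summary">Chapter 2, summary</a></li>
    <!-- Not one of the input files: kept as is and becomes a URI action -->
    <li><a href="https://www.example.com/">External website</a></li>
</ul>

<!-- second.html (hypothetical): target of the fragment link above -->
<h2 id="summary">Summary</h2>
```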

Defining the transparency blend space

Setting the blend space can be critical to ensuring consistent rendering results. The blend space for the PDF document to be created is defined by a <link> tag in the head section of the HTML document whose 'rel' attribute has the value "cchip-transparency-blendspace". The actual blend space is specified by the following attributes:

  • data-param (required); can have one of the following values:
    • DeviceCMYK
    • DeviceRGB
    • DeviceGray
    • ICC
  • href: either contains the path to an ICC profile (only Gray, RGB and CMYK allowed) or is an empty string; it is only used if data-param = "ICC"

Whenever a transparency group is created, the following rules apply:

  • When a "cchip-transparency-blendspace" 'rel' entry in the head exists:
    • Colorspace defined in data-param = ... (i.e. DeviceCMYK, DeviceRGB, DeviceGray or an ICC profile) will be used.
  • If no such entry exists:
    • If an OutputIntent is defined (e.g. via <meta property="cchip-pdfx" content="PDF/X-1A">) and its destination colorspace is CMYK, DeviceCMYK will be used as the transparency blend space.
    • If the OutputIntent defines a RGB or Gray colorspace as destination, the respective destination ICC profile will be used.
    • If no OutputIntent is defined, the transparency blendspace will be set to DeviceCMYK

Examples

With referenced ICC profile

<html>
    <head>
        ...
        <link
                rel="cchip-transparency-blendspace"
                data-param = "ICC"
                href="./path/to/some/icc-profile.icc"
        />
        ...
    </head>
    <body>
        ...
    </body>
</html>

Without referenced ICC profile

<html>
    <head>
        ...
        <link
                rel="cchip-transparency-blendspace"
                data-param = "DeviceCMYK"
                href=""
        />
        ...
    </head>
    <body>
        ...
    </body>
</html>