XMP Metadata reports

Configuration

The configuration methods described here apply to the XMP metadata report created with the "Browse Metadata"/"Extract XMP metadata" action.

The configuration file enables the user to set up named filters for XMP metadata and other meta information (e.g. DocInfo Dictionary entries, or information retrieved from the PDF document directly such as image resolution).

The XMP metadata report will be created as an XML report according to the chosen config file.

File format

A report configuration file must be stored as a tab-delimited, UTF-8 encoded cfg file.
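As a minimal sketch of this file format, the following Python snippet writes rows as tab-separated fields in UTF-8. The helper name `write_config` and the file name `example.cfg` are hypothetical, and placing one key per line is an assumption based on the examples in this document.

```python
import os
import tempfile


def write_config(path, rows):
    """Write each row as one line of tab-separated fields, encoded as UTF-8."""
    # newline="\n" prevents platform-specific line ending translation.
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        for row in rows:
            fh.write("\t".join(str(field) for field in row) + "\n")


if __name__ == "__main__":
    # Rows reuse the example values shown elsewhere in this document.
    rows = [
        ("DisplayName", 0, "General"),
        ("Namespace", "dc", "http://purl.org/dc/elements/1.1/", 0, "Dublin Core"),
        ("Include", "Document", "dc", "*"),
    ]
    target = os.path.join(tempfile.mkdtemp(), "example.cfg")  # hypothetical file name
    write_config(target, rows)
    print(target)
```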

Constraints

Every prefix used in an Include, Exclude or GroupingKey entry must have a matching Namespace entry in the same config file that defines the namespace URI for this prefix.
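The constraint above can be checked before a config file is used. The sketch below is not part of the product; the function names `parse_config` and `missing_namespace_prefixes` are hypothetical. It assumes one key per line with tab-separated fields, and it assumes that the wildcard prefix `*` does not need a Namespace entry of its own.

```python
def parse_config(text):
    """Split tab-delimited config text into lists of fields, skipping blank lines."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def missing_namespace_prefixes(rows):
    """Return the sorted prefixes that are used but have no Namespace entry."""
    # Namespace rows: Namespace <Prefix> <Namespace URI> <DictFlag> <Schema>
    declared = {row[1] for row in rows if row[0] == "Namespace" and len(row) > 1}
    used = set()
    for row in rows:
        # Include / Exclude / GroupingKey rows: <Key> <Type> <Prefix> <Property>
        if row[0] in ("Include", "Exclude", "GroupingKey") and len(row) > 2:
            used.add(row[2])
    used.discard("*")  # assumption: the wildcard needs no namespace entry
    return sorted(used - declared)


if __name__ == "__main__":
    cfg = (
        "Namespace\tdc\thttp://purl.org/dc/elements/1.1/\t0\tDublin Core\n"
        "Include\tDocument\tdc\t*\n"
        "GroupingKey\tImage\txmpRights\tOwner\n"
    )
    print(missing_namespace_prefixes(parse_config(cfg)))
```

In this usage example the prefix `xmpRights` is reported because the sample text declares only `dc`.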

Example Configuration

You will find example configuration files in your CLI installation directory in "/var/Actions/Metadata/Filters/Export" for the following sections:

  • DublinCore
  • EXIF
  • General
  • IPTC
  • Photo
  • PLUS
  • Workflow

Configuration Keys

The configuration file can consist of the following elements (Keys):

DisplayName

A single entry that specifies the display name for this configuration. This key must be defined exactly once.

DictFlag

0 = Name shall be interpreted as string,

1 = Name shall be interpreted as dict key

Title

If DictFlag = 1, the dict key used to look up the display name (applies only to implemented properties)

If DictFlag = 0, the display name itself

Example

DisplayName 0 General
DisplayName 1 BOAGUI_MetaExpFilterDublinCore_long

Namespace

Defines a namespace URI for usage in the XMP metadata report and associates it with a namespace prefix and a schema name. This key is optional and can be used multiple times.

Prefix

The namespace prefix that is preferred for the namespace URI, e.g. "dc"

Namespace URI

The namespace URI, e.g. "http://purl.org/dc/elements/1.1/"

DictFlag

0 = Schema shall be interpreted as string,

1 = Schema shall be interpreted as dict key

Schema

The display name for the schema that is associated with the namespace, e.g. "Dublin Core"

Example
Namespace dc http://purl.org/dc/elements/1.1/ 0 Dublin Core 

Property

Defines a namespace property for usage in the XMP metadata report and associates it with a namespace prefix and a property name. This key is optional and can be used multiple times.

Prefix

The namespace prefix, e.g. "ptb_image"

Name

The property name, e.g. "px_width"

DictFlag

0 = Label shall be interpreted as string,

1 = Label shall be interpreted as dict key

Label

Display name for the property, e.g. "Image width in pixels"

Example
Property ptb_document file 0 File Name 

GroupingKey

Defines how XML reports are grouped. For each distinct value of the property in the namespace associated with the prefix, a separate XMP metadata report is generated. Each report contains only the elements selected by the "Include" clause that share the same value for prefix/property. This key is optional and must not be used more than once.

Type

Specifies for which objects in the PDF document the grouping key shall be searched/applied

Possible values are Document, Page, Image

Prefix

The namespace prefix

Must be included in the namespace definition

Property

The XMP property in the namespace identified by prefix to be used for report grouping

Example
GroupingKey Image xmpRights Owner 

Include

Whitelist. All metadata that matches any entry in this list and is not excluded by an Exclude statement will be exported. This key should be used at least once (otherwise no report is created) and can be used multiple times.

Type
Specifies for which objects in the PDF document this include statement will be applied
Possible values are Document, Page, Image, *
Note: * is used as a wildcard and matches all types
Prefix
The namespace prefix
Must be included in the namespace definition
Note: * is used as a wildcard and matches all prefixes
Property
The XMP property in the namespace identified by prefix to be used for matching
Note: * is used as a wildcard and matches all properties
Example
Include Image ptb_image thumbnail 

Exclude

Blacklist. All metadata that matches any entry in this list will not be exported. This key is optional and can be used multiple times.

Type
Specifies for which objects in the PDF document this exclude statement will be applied
Possible values are Document, Page, Image, *
Note: * is used as a wildcard and matches all types
Prefix
The namespace prefix
Must be included in the namespace definition
Note: * is used as a wildcard and matches all prefixes
Property
The XMP property in the namespace identified by prefix to be used for matching
Note: * is used as a wildcard and matches all properties
Example
Exclude Document dc title 

Order of filtering

Filtering will be executed in the following order:

GroupingKey not present

Include all items that

  • match at least one entry in the white list (Include key)
  • and match no entry in the black list (Exclude key)

GroupingKey present

For each value of the XMP metadata property as defined in the GroupingKey a separate report will be created which includes all items that

  • belong to an object of the type defined in the GroupingKey (e.g. "Image") and have the XMP metadata property defined in the GroupingKey (e.g. "Owner" in the "xmpRights" namespace) set to that value
  • and match at least one entry in the white list (Include key)
  • and match no entry in the black list (Exclude key)
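The filtering order described above can be sketched in Python as follows. This is an illustration of the rules as stated in this document, not the product's implementation: the data model (objects as dictionaries with a `type` and a `props` mapping) and the function names `matches`, `keep` and `build_reports` are assumptions made for the example.

```python
from collections import defaultdict


def matches(entry, item):
    """True if a (type, prefix, property) entry matches an item; '*' is a wildcard."""
    return all(pattern == "*" or pattern == value
               for pattern, value in zip(entry, item))


def keep(item, includes, excludes):
    """An item is kept if any Include entry matches and no Exclude entry matches."""
    return (any(matches(e, item) for e in includes)
            and not any(matches(e, item) for e in excludes))


def build_reports(objects, includes, excludes, grouping_key=None):
    """Return {group value: [(type, prefix, property, value), ...]}.

    Without a grouping key, all kept items land in one report under the key None.
    """
    reports = defaultdict(list)
    for obj in objects:
        if grouping_key is None:
            group = None
        else:
            g_type, g_prefix, g_prop = grouping_key
            # Objects of another type or without the grouping property are skipped.
            if obj["type"] != g_type or (g_prefix, g_prop) not in obj["props"]:
                continue
            group = obj["props"][(g_prefix, g_prop)]
        for (prefix, prop), value in obj["props"].items():
            if keep((obj["type"], prefix, prop), includes, excludes):
                reports[group].append((obj["type"], prefix, prop, value))
    return dict(reports)


if __name__ == "__main__":
    objects = [
        {"type": "Image", "props": {("xmpRights", "Owner"): "Alice",
                                    ("ptb_image", "px_width"): "800"}},
        {"type": "Image", "props": {("xmpRights", "Owner"): "Bob",
                                    ("ptb_image", "px_width"): "640"}},
    ]
    includes = [("Image", "ptb_image", "*")]
    excludes = [("*", "*", "thumbnail")]
    print(build_reports(objects, includes, excludes, ("Image", "xmpRights", "Owner")))
```

With the GroupingKey `Image xmpRights Owner`, the sample data yields one report per owner value, each holding only the `ptb_image` properties of that owner's images.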

Handling of non-XMP metadata

There are some namespaces and properties additionally defined by pdfaPilot and pdfToolbox.

DocInfo Dictionary

Namespace URI

http://www.callassoftware.com/ns/pdfaPilot2/1.0/metadatareport/document

Namespace URI

http://www.callassoftware.com/ns/pdfToolbox4/1.0/metadatareport/document

Preferred prefix

ptb_document

Available properties

DocInfo_<key>

Document info entry <key>

<key> has to be one of the following PDF document info dictionary keys:

CreationDate

The date when the PDF document was created

ModDate

The date when the PDF document was last modified

Creator

The application the original document was created with

Producer

The application the PDF was produced with

Title

The document title

Subject

The subject of the document

Keywords

The keywords for the document

Trapped

The trapped key

PageMode

The mode in which the document shall be displayed when opened (e.g. "UseOutlines")

PageLayout

The way the pages are displayed when opening (e.g. "SinglePage")

PdfXVersion

The PDF/X version (e.g. PDF/X-1a)

PdfXConformance

The PDF/X conformance level (e.g. 1a)

PdfE1Version

The PDF/E version

Example
Property ptb_document DocInfo_Creator 0 Creator 
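To illustrate the `DocInfo_<key>` naming pattern, the following sketch generates Property lines for a selection of document info keys. The function name `docinfo_property_lines` is hypothetical, and using the key itself as the label with DictFlag 0 is a choice made for this example, mirroring the Creator example above.

```python
# Document info keys listed in this section.
DOCINFO_KEYS = [
    "CreationDate", "ModDate", "Creator", "Producer", "Title", "Subject",
    "Keywords", "Trapped", "PageMode", "PageLayout", "PdfXVersion",
    "PdfXConformance", "PdfE1Version",
]


def docinfo_property_lines(keys, prefix="ptb_document"):
    """Build tab-delimited Property lines of the form DocInfo_<key>."""
    unknown = [key for key in keys if key not in DOCINFO_KEYS]
    if unknown:
        raise ValueError("unsupported document info keys: " + ", ".join(unknown))
    # DictFlag 0: the label is taken literally as a string.
    return ["\t".join(["Property", prefix, "DocInfo_" + key, "0", key]) for key in keys]


if __name__ == "__main__":
    for line in docinfo_property_lines(["Creator", "Producer", "Title"]):
        print(line)
```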

Image properties

Namespace URI

http://www.callassoftware.com/ns/pdfaPilot2/1.0/metadatareport/image

Namespace URI

http://www.callassoftware.com/ns/pdfToolbox4/1.0/metadatareport/image

Preferred prefix

ptb_image

Available properties

px_width

Image width in pixels

px_height

Image height in pixels

ppi_horizontal

Horizontal resolution in ppi

ppi_vertical

Vertical resolution in ppi

left

Offset of left image border in pt (relative to crop box)

right

Offset of right image border in pt (relative to crop box)

top

Offset of top image border in pt (relative to crop box)

bottom

Offset of bottom image border in pt (relative to crop box)

pt_llx

Quad-Point Lower Left x in Pt

pt_lly

Quad-Point Lower Left y in Pt

pt_ulx

Quad-Point Upper Left x in Pt

pt_uly

Quad-Point Upper Left y in Pt

pt_urx

Quad-Point Upper Right x in Pt

pt_ury

Quad-Point Upper Right y in Pt

pt_lrx

Quad-Point Lower Right x in Pt

pt_lry

Quad-Point Lower Right y in Pt

pt_width

Image width in pt

pt_height

Image height in pt

thumbnail

Image thumbnail

Example
Property ptb_image px_height 0 Image height in pixel 
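The pixel, point and ppi properties are related through the definition of the point (1 pt = 1/72 inch). The sketch below shows that arithmetic for an unrotated, unskewed image. Whether the reported `ppi_horizontal` and `ppi_vertical` values are computed exactly this way is an assumption, and the function name `effective_ppi` is hypothetical.

```python
POINTS_PER_INCH = 72.0


def effective_ppi(pixels, points):
    """Resolution in ppi for an image dimension given in pixels and in pt."""
    if points <= 0:
        raise ValueError("size in pt must be positive")
    # pixels per inch = pixels / (points / 72)
    return pixels * POINTS_PER_INCH / points


if __name__ == "__main__":
    # Example values: px_width = 800, pt_width = 192
    print(effective_ppi(800, 192))
```

For example, an image that is 800 pixels wide and placed at a width of 192 pt (about 2.67 inches) has a horizontal resolution of 300 ppi.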

Page properties

Namespace URI

http://www.callassoftware.com/ns/pdfaPilot2/1.0/metadatareport/page

Namespace URI

http://www.callassoftware.com/ns/pdfToolbox4/1.0/metadatareport/page

Preferred prefix

ptb_page

Available properties

nr

Page sequence number (1-based)

cropbox_width

Cropbox width

cropbox_height

Cropbox height

cropbox_left

Cropbox left

cropbox_right

Cropbox right

cropbox_top

Cropbox top

cropbox_bottom

Cropbox bottom

thumbnail

Page thumbnail

Example
Property ptb_page cropbox_top 0 Cropbox top