XMP Metadata reports
Configuration
The configuration methods described here apply to the XMP metadata report created with the "Browse Metadata"/"Extract XMP metadata" action.
The configuration file enables the user to set up named filters for XMP metadata and other meta information (e.g. DocInfo Dictionary entries, or information retrieved from the PDF document directly such as image resolution).
The XMP metadata report will be created as an XML report according to the chosen config file.
File format
A report configuration file must be stored as a tab-delimited, UTF-8 encoded .cfg file.
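Because tabs are hard to see in printed examples, a small reader can make the file layout explicit. The following is only a minimal sketch, not part of the product: the function names `parse_report_config` and `load_report_config` are hypothetical, and it assumes one entry per line, with the key in the first tab-separated field and no comment syntax.

```python
def parse_report_config(text):
    """Split tab-delimited config text into (key, fields) pairs."""
    entries = []
    for line in text.splitlines():
        # Assumption: blank lines carry no meaning and can be skipped.
        if not line.strip():
            continue
        key, *fields = [cell.strip() for cell in line.split("\t")]
        entries.append((key, fields))
    return entries


def load_report_config(path):
    # "utf-8-sig" reads plain UTF-8 and also tolerates a leading byte-order
    # mark (tolerating a BOM is an assumption, not a documented requirement).
    with open(path, encoding="utf-8-sig") as handle:
        return parse_report_config(handle.read())


sample = (
    "DisplayName\t0\tGeneral\n"
    "Namespace\tdc\thttp://purl.org/dc/elements/1.1/\t0\tDublin Core\n"
)
entries = parse_report_config(sample)
for key, fields in entries:
    print(key, fields)
```

Splitting on tabs only keeps values that contain spaces, such as "Dublin Core", in a single field.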
Constraints
Every prefix used in an Include, Exclude or GroupingKey entry must have a matching Namespace entry in the same file that defines the namespace URI for that prefix.
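This constraint can be checked before a configuration is used. The sketch below is an illustration under assumptions: entries are represented as (key, fields) pairs, the helper name `missing_namespace_prefixes` is hypothetical, and the wildcard prefix `*` is assumed not to need a Namespace entry of its own.

```python
def missing_namespace_prefixes(entries):
    """Return prefixes used by Include/Exclude/GroupingKey without a Namespace entry."""
    # For Namespace entries the prefix is the first field.
    declared = {fields[0] for key, fields in entries if key == "Namespace" and fields}
    used = set()
    for key, fields in entries:
        # For these keys the fields are: Type, Prefix, Property.
        if key in ("Include", "Exclude", "GroupingKey") and len(fields) >= 2:
            prefix = fields[1]
            if prefix != "*":  # assumption: the wildcard needs no declaration
                used.add(prefix)
    return sorted(used - declared)


entries = [
    ("Namespace", ["dc", "http://purl.org/dc/elements/1.1/", "0", "Dublin Core"]),
    ("Include", ["Document", "dc", "*"]),
    ("Exclude", ["Document", "dc", "title"]),
    ("GroupingKey", ["Image", "xmpRights", "Owner"]),
]
missing = missing_namespace_prefixes(entries)
print(missing)
```

Here the `xmpRights` prefix is reported because the sample declares only the `dc` namespace.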
Example Configuration
You will find example configuration files for the following sections in your CLI installation directory under "/var/Actions/Metadata/Filters/Export":
- DublinCore
- EXIF
- General
- IPTC
- Photo
- PLUS
- Workflow
Configuration Keys
The configuration file can consist of the following elements (Keys):
DisplayName
A single entry that specifies the display name for this configuration. This key must be defined exactly once.
DictFlag | 0 = Title is interpreted as a string, 1 = Title is interpreted as a dict key
Title | If DictFlag = 0, the display name itself. If DictFlag = 1, the dict key used to look up the display name (applies only to implemented properties)
Example
DisplayName 0 General
DisplayName 1 BOAGUI_MetaExpFilterDublinCore_long
Namespace
Defines a namespace URI for usage in the XMP metadata report and associates it with a namespace prefix and a schema name. This key is optional and can be used multiple times.
Prefix | The preferred namespace prefix for the namespace URI, e.g. "dc"
Namespace URI | The namespace URI, e.g. "http://purl.org/dc/elements/1.1/"
DictFlag | 0 = Schema is interpreted as a string, 1 = Schema is interpreted as a dict key
Schema | The display name for the schema that is associated with the namespace, e.g. "Dublin Core"
Example
Namespace dc http://purl.org/dc/elements/1.1/ 0 Dublin Core
Property
Defines a namespace property for usage in the XMP metadata report and associates it with a namespace prefix and a property name. This key is optional and can be used multiple times.
Prefix | The namespace prefix, e.g. "ptb_image"
Name | The property name, e.g. "px_width"
DictFlag | 0 = Label is interpreted as a string, 1 = Label is interpreted as a dict key
Label | The display name for the property, e.g. "Image width in pixels"
Example
Property ptb_document file 0 File Name
GroupingKey
Defines the grouping of XML reports. For each distinct value of the property in the namespace associated with the prefix, a separate XMP metadata report is generated. Each report contains only the elements selected by the Include entries that have that same value for prefix/property. This key is optional and must not be used more than once.
Type | Specifies for which objects in the PDF document the grouping key is searched and applied. Possible values are Document, Page, Image
Prefix | The namespace prefix. Must be included in the namespace definition
Property | The XMP property in the namespace identified by the prefix that is used for report grouping
Example
GroupingKey Image xmpRights Owner
Include
Whitelist. All metadata that matches any entry in this list and is not excluded by an Exclude entry will be exported. This key should be used at least once (otherwise no report content is selected) and can be used multiple times.
Type | Specifies for which objects in the PDF document this include statement is applied. Possible values are Document, Page, Image, *. Note: * is used as a wildcard and matches all types
Prefix | The namespace prefix. Must be included in the namespace definition. Note: * is used as a wildcard and matches all prefixes
Property | The XMP property in the namespace identified by the prefix that is used for matching. Note: * is used as a wildcard and matches all properties
Example
Include Image ptb_image thumbnail
Exclude
Blacklist. All metadata that matches any entry in this list will not be exported. This key is optional and can be used multiple times.
Type | Specifies for which objects in the PDF document this exclude statement is applied. Possible values are Document, Page, Image, *. Note: * is used as a wildcard and matches all types
Prefix | The namespace prefix. Must be included in the namespace definition. Note: * is used as a wildcard and matches all prefixes
Property | The XMP property in the namespace identified by the prefix that is used for matching. Note: * is used as a wildcard and matches all properties
Example
Exclude Document dc title
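The occurrence rules stated for the individual keys above (DisplayName exactly once, GroupingKey at most once, Include at least once) can be checked mechanically. This is a hedged sketch only: the function name `occurrence_problems` and the (key, fields) representation are assumptions made for illustration.

```python
from collections import Counter


def occurrence_problems(entries):
    """Check how often DisplayName, GroupingKey and Include occur."""
    counts = Counter(key for key, _fields in entries)
    problems = []
    if counts["DisplayName"] != 1:
        problems.append(
            f"DisplayName must be defined exactly once, found {counts['DisplayName']}"
        )
    if counts["GroupingKey"] > 1:
        problems.append(
            f"GroupingKey must not be used more than once, found {counts['GroupingKey']}"
        )
    if counts["Include"] == 0:
        problems.append("no Include entry, so nothing would be selected for the report")
    return problems


entries = [
    ("DisplayName", ["0", "General"]),
    ("Namespace", ["dc", "http://purl.org/dc/elements/1.1/", "0", "Dublin Core"]),
    ("Exclude", ["Document", "dc", "title"]),
]
problems = occurrence_problems(entries)
for problem in problems:
    print(problem)
```

The sample has a DisplayName but no Include entry, so only the Include rule is reported.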
Order of filtering
Filtering will be executed in the following order:
GroupingKey not present
Include all items that
- match at least one entry in the white list (Include key)
- and match no entry in the black list (Exclude key)
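The white list / black list rule above can be sketched in a few lines. This is an illustration rather than the product's implementation: metadata items and rules are modelled as hypothetical (type, prefix, property) triples, and only a bare `*` is treated as a wildcard.

```python
def rule_matches(rule, item):
    """True if every field of the rule is '*' or equals the item's field."""
    return all(r == "*" or r == v for r, v in zip(rule, item))


def select_items(items, includes, excludes):
    """Keep items matching at least one Include rule and no Exclude rule."""
    return [
        item
        for item in items
        if any(rule_matches(rule, item) for rule in includes)
        and not any(rule_matches(rule, item) for rule in excludes)
    ]


items = [
    ("Document", "dc", "title"),
    ("Document", "dc", "creator"),
    ("Image", "ptb_image", "thumbnail"),
    ("Page", "ptb_page", "nr"),
]
includes = [("Document", "dc", "*"), ("Image", "ptb_image", "thumbnail")]
excludes = [("Document", "dc", "title")]
selected = select_items(items, includes, excludes)
print(selected)
```

In this sample the document title is dropped by the Exclude rule even though the wildcard Include rule matches it, and the page number is dropped because no Include rule matches it.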
GroupingKey present
For each value of the XMP metadata property defined in the GroupingKey, a separate report is created that includes all items that
- have the XMP metadata property defined in the GroupingKey (e.g. type "Image", prefix "xmpRights", property "Owner") set to the value this report is created for
- and match at least one entry in the white list (Include key)
- and match no entry in the black list (Exclude key)
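The grouped case can be sketched the same way. The data model here is purely hypothetical: each PDF object is a dict with an `id`, a `type` and a `props` mapping from (prefix, property) to a value. The sketch also assumes that objects lacking the grouping property end up in no report, which the description above implies but does not state outright.

```python
from collections import defaultdict


def rule_matches(rule, item):
    # A rule field of "*" matches any value; otherwise the fields must be equal.
    return all(r == "*" or r == v for r, v in zip(rule, item))


def grouped_reports(objects, grouping_key, includes, excludes):
    """Build one list of selected properties per value of the grouping property."""
    group_type, group_prefix, group_property = grouping_key
    reports = defaultdict(list)
    for obj in objects:
        if obj["type"] != group_type:
            continue
        value = obj["props"].get((group_prefix, group_property))
        if value is None:
            continue  # assumption: no grouping value means no report membership
        for (prefix, prop), prop_value in obj["props"].items():
            item = (obj["type"], prefix, prop)
            included = any(rule_matches(rule, item) for rule in includes)
            excluded = any(rule_matches(rule, item) for rule in excludes)
            if included and not excluded:
                reports[value].append((obj["id"], prefix, prop, prop_value))
    return dict(reports)


objects = [
    {"id": "img1", "type": "Image",
     "props": {("xmpRights", "Owner"): "Alice", ("ptb_image", "px_width"): "800"}},
    {"id": "img2", "type": "Image",
     "props": {("xmpRights", "Owner"): "Bob", ("ptb_image", "px_width"): "640"}},
    {"id": "img3", "type": "Image",
     "props": {("ptb_image", "px_width"): "320"}},
]
reports = grouped_reports(
    objects,
    grouping_key=("Image", "xmpRights", "Owner"),
    includes=[("Image", "ptb_image", "*")],
    excludes=[],
)
print(reports)
```

With the sample data, one report is built per owner, and the image without an owner appears in neither.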
Handling of non-XMP metadata
There are some namespaces and properties additionally defined by pdfaPilot and pdfToolbox.
DocInfo Dictionary
Namespace URI | http://www.callassoftware.com/ns/pdfaPilot2/1.0/metadatareport/document
Namespace URI | http://www.callassoftware.com/ns/pdfToolbox4/1.0/metadatareport/document
Preferred prefix | ptb_document
Available properties
DocInfo_<key> | Document info entry <key>
<key> has to be one of the following PDF document info dictionary keys:
CreationDate | The date when the PDF document was created
ModDate | The date when the PDF document was last modified
Creator | The application the original document was created with
Producer | The application the PDF was produced with
Title | The document title
Subject | The subject of the document
Keywords | The keywords for the document
Trapped | The trapped key
PageMode | The mode in which the document shall be displayed when opened (e.g. "UseOutlines")
PageLayout | The way the pages are displayed when opening (e.g. "SinglePage")
PdfXVersion | The PDF/X version (e.g. PDF/X-1a)
PdfXConformance | The PDF/X conformance level (e.g. 1a)
PdfE1Version | The PDF/E version
Example
Property ptb_document DocInfo_Creator 0 Creator
Image properties
Namespace URI | http://www.callassoftware.com/ns/pdfaPilot2/1.0/metadatareport/image
Namespace URI | http://www.callassoftware.com/ns/pdfToolbox4/1.0/metadatareport/image
Preferred prefix | ptb_image
Available properties
px_width | Image width in pixels
px_height | Image height in pixels
ppi_horizontal | Horizontal resolution in ppi
ppi_vertical | Vertical resolution in ppi
left | Offset of left image border in pt (relative to crop box)
right | Offset of right image border in pt (relative to crop box)
top | Offset of top image border in pt (relative to crop box)
bottom | Offset of bottom image border in pt (relative to crop box)
pt_llx | Quad-point lower left x in pt
pt_lly | Quad-point lower left y in pt
pt_ulx | Quad-point upper left x in pt
pt_uly | Quad-point upper left y in pt
pt_urx | Quad-point upper right x in pt
pt_ury | Quad-point upper right y in pt
pt_lrx | Quad-point lower right x in pt
pt_lry | Quad-point lower right y in pt
pt_width | Image width in pt
pt_height | Image height in pt
thumbnail | Image thumbnail
Example
Property ptb_image px_height 0 Image height in pixel
Page properties
Namespace URI | http://www.callassoftware.com/ns/pdfaPilot2/1.0/metadatareport/page
Namespace URI | http://www.callassoftware.com/ns/pdfToolbox4/1.0/metadatareport/page
Preferred prefix | ptb_page
Available properties
nr | Page sequence number (1-based)
cropbox_width | Cropbox width
cropbox_height | Cropbox height
cropbox_left | Cropbox left
cropbox_right | Cropbox right
cropbox_top | Cropbox top
cropbox_bottom | Cropbox bottom
thumbnail | Page thumbnail
Example
Property ptb_page cropbox_top 0 Cropbox top