The JSON log files
Three JSON log files are created during each complete invocation of pdfToolbox:
- launch.json: created immediately upon launching pdfToolbox CLI, or when processing of a PDF is invoked in the desktop version of pdfToolbox. At this point the kfpx file has not yet been loaded and the PDF file to be processed has not yet been accessed. Only minimal information about the environment and context is collected, including the content of the command line, an automatically generated unique ID (app_uuid) and the job ID (if it was passed as a parameter on the command line).
- init.json: written once the kfpx file has been loaded and parsed, the PDF file to be processed has been opened, and some basic information has been extracted from it.
- finish.json: written when pdfToolbox exits (regardless of whether processing was successful or not).
If init.json or finish.json is missing from the logs, this indicates that pdfToolbox terminated prematurely without writing those files, which is a good starting point for post-mortem analysis. If init.json was not written, something probably went wrong while reading the kfpx file, accessing the PDF file for some basic analysis, or similar. If only finish.json is missing, something went wrong while executing the kfpx profile, creating reports, or producing some other output.
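The rules above can be applied mechanically when scanning log folders. The following Python sketch assumes that all log files of one invocation sit together in one folder (named after its app_uuid, as described for launch.json); the function names and the returned labels are hypothetical and not part of pdfToolbox.

```python
from pathlib import Path


def termination_stage(present: set[str]) -> str:
    """Classify an invocation from the set of JSON log file names present."""
    if "launch.json" not in present:
        return "not a pdfToolbox log folder"
    if "finish.json" in present:
        # pdfToolbox exited; retcode in finish.json tells success or failure.
        return "exited"
    if "init.json" in present:
        # kfpx and PDF were loaded, but pdfToolbox stopped during profile
        # execution, report creation or output generation.
        return "terminated after init"
    # Only launch.json: failure while reading the kfpx file or opening the PDF.
    return "terminated before init"


def stage_of_folder(log_dir: Path) -> str:
    """Apply termination_stage() to one per-invocation log folder (assumed layout)."""
    present = {p.name for p in log_dir.iterdir() if p.is_file()}
    return termination_stage(present)
```

Separating the pure classification from the folder access keeps the rules easy to test without touching the file system.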
Structure of 'launch.json'
name | value |
verb | type of logging file. Possible values: launch. Note: The other values (init, finish) are only used by the corresponding init.json and finish.json files. |
app_uuid | automatically generated unique ID on a per invocation basis. This unique ID is guaranteed to be the same value across all log files belonging to the same invocation (and also identical to the parent folder name containing the corresponding .json files) |
timestamp | timestamp. Format: YYYY/MM/DD hh:mm:ss |
timestamp_hour | the hour portion of the timestamp in the format hh (e.g. 20) |
timestamp_month | the month portion of the timestamp in the format MM (e.g. 07) |
timestamp_weekday | the weekday portion of the timestamp in the format W (e.g. 3 for Wednesday) |
job_id | a job ID provided via command line argument (not available in desktop version) |
process_id | process ID of the process in the operating system |
filename | file name of the file to be processed |
filepath | folder path and file name of the file to be processed |
cli_params | command line parameters |
program_name | name of the program (e.g. callas pdfToolbox CLI (x64)) |
program_version | version of the program (e.g. 10.0.461) |
platform | platform on which pdfToolbox is executed (e.g. Mac OS X 10.10.5) |
machine_ips | list of IP addresses of the machine on which pdfToolbox is executed, in the form of an array of strings. Example: machine_ips : [ "192.168.1.1", "123.45.67.89"] Note: As machines can have more than one IP address, this entry is structured as a list of entries. Known limitation: implemented for Windows/Mac OS X/Linux platforms only |
machine_name | machine name of the machine on which pdfToolbox is executed |
machine_uuid | UUID of the machine on which pdfToolbox is executed. Note: this UUID is derived from hardware parameters of the current machine, with some parts of the information removed such that the UUID is still unique but the hardware parameters as such cannot be derived from it. |
temp_folder | folder path to the temp folder used during invocation of pdfToolbox |
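The timestamp_hour, timestamp_month and timestamp_weekday entries repeat information contained in timestamp, which makes them easy to cross-check. A minimal Python sketch, assuming weekday numbering starts with Monday = 1 (as in the example below, where 2016/10/31 was a Monday and timestamp_weekday is 1); how Sunday is numbered is not documented here, and the helper names are hypothetical.

```python
from datetime import datetime


def timestamp_fields(ts: str) -> dict:
    """Derive hour, month and weekday from a 'YYYY/MM/DD hh:mm:ss' timestamp."""
    dt = datetime.strptime(ts, "%Y/%m/%d %H:%M:%S")
    return {
        "timestamp_hour": dt.hour,
        "timestamp_month": dt.month,
        # Assumption: Monday = 1. isoweekday() maps Sunday to 7; whether
        # pdfToolbox uses 7 or 0 for Sunday is not stated in this document.
        "timestamp_weekday": dt.isoweekday(),
    }


def check_timestamp_fields(record: dict) -> bool:
    """True if the derived fields in a log record agree with its timestamp."""
    derived = timestamp_fields(record["timestamp"])
    return all(record.get(key) == value for key, value in derived.items())
```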
Example content for launch.json
{
"verb" : "launch",
"app_uuid" : "0a0ceba3-7ca9-421f-a973-8caae2950690",
"timestamp" : "2016/10/31 20:08:27",
"timestamp_hour" : 20,
"timestamp_month" : 10,
"timestamp_weekday" : 1,
"process_id" : 29543,
"job_id" : "some_job_ID_provided_through_cli_parameter",
"filename" : "4-Catching Text in PDFs - Michael Fuchs.pdf",
"filepath" : "/var/folders/80a56def-aa5f-418b-9685-8614ab6d41c2/0x10dd99000/4-Catching Text in PDFs.pdf",
"cli_params" : ["--hitsperpage=50", "--report=ERROR,WARNING,TEMPLATE=OVERVIEW"],
"program_name" : "callas pdfToolbox CLI (x64)",
"program_version" : "9.1.417",
"platform" : "Mac OS X 10.10.5",
"machine_ips" : ["192.168.17.63"],
"machine_name" : "pdftoolbox_satellite_3",
"machine_uuid" : "---6E851-41E8-5060-B0A7-C0550F43E418",
"temp_folder" : "/var/folders/yb/16cjr4dn2c5_27r2khx1bvbh0000gn/T/com.callassoftware.pdfToolbox/32e5717f-240a-4a17-86fe-a95551808721",
"temp_folder_hdd" : "",
}
Structure of init.json
Description of each entry in init.json, in addition to the entries described above for "launch.json". The verb entry has the value "init".
name | value |
doc_created | timestamp when the document was created; same format as the timestamp entry, with milliseconds appended (e.g. "2015/06/03 12:23:53.000") |
doc_id1 | first of the two values in the document ID entry in the document (e.g. "DB10E96543FE2B4E93226FA065FE83BC") |
doc_id2 | second of the two values in the document ID entry in the document (e.g. "00AFB863B1344FDDBE90D24E513AB992") |
doc_modified | timestamp when the document was last modified; same format as the timestamp entry, with milliseconds appended (e.g. "2015/06/08 17:03:18.000") |
doc_pages | number of pages in the document |
doc_size | size of the PDF file in bytes (e.g. 863672, equivalent to approximately 863 KB) |
firstpage_size | structure reflecting the size of the first page in the PDF (see further below for a definition of the firstpage_size structure) |
pdf_creator | value of the creator entry in the document metadata (e.g. "Acrobat PDFMaker 10.1 for PowerPoint") |
pdf_encrypted | whether the PDF is encrypted; possible values: 0 (not encrypted) and 1 (encrypted). Note: for encrypted PDF files, init.json is only written if the correct password is specified |
pdf_version | PDF version of the PDF file (e.g. 1.7) |
pdf_writer | value of the writer entry in the document metadata (e.g. Adobe PDF Library 10.0) |
pdfa_version | if present, the PDF/A version (e.g. "PDF/A-3u") |
pdfe_version | if present, the PDF/E version (e.g. "PDF/E-2r") |
pdfua_version | if present, the PDF/UA version (e.g. "PDF/UA-1") |
pdfx_oi_icc_name | if present, the name of the PDF/X OutputIntent profile (e.g. "PSO Coated v3") |
pdfx_oi_info | if present, the text in the OutputIntent Info field |
pdfx_oi_output_cond_id | if present, the value of the PDF/X OutputIntentIdentifier (e.g. "FOGRA39") |
pdfx_version | if present, the PDF/X version (e.g. "PDF/X-1a") |
profile_filename | file name of the kfpx profile (e.g. "Convert to PDFA-1a.kfpx") |
profile_id | internal ID of the kfpx profile (e.g. "P959c755539c8439e62c516c66a4a9097") |
profile_name | human readable name of the kfpx profile (e.g. "Sheetfed offset (CMYK, RGB and spot colors) (GWG 2015)") |
variables | data structure representing the variables as evaluated upon initiating processing (equivalent to the JavaScript object app.variables; for details see documentation on "Variables and JavaScript") |
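Because the PDF standard entries are only present when the document claims conformance, code reading init.json should not assume they exist. A small Python sketch that collects the ones that are present; treating an empty string as "not present" is an assumption based on the finish.json example, where pdfua_version and pdfe_version are empty. The names used are hypothetical.

```python
# Entries that are only filled in when the PDF claims conformance to a standard.
STANDARD_KEYS = ("pdfa_version", "pdfe_version", "pdfua_version", "pdfx_version")


def declared_standards(record: dict) -> dict[str, str]:
    """Return the standard-version entries that are present and non-empty."""
    # record.get(key) is None for a missing key and "" for an empty value;
    # both are falsy and therefore skipped.
    return {key: record[key] for key in STANDARD_KEYS if record.get(key)}
```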
'firstpage_size' entry
The 'firstpage_size' structure reflects the various page geometry boxes:
- Each of the page geometry boxes (mediabox, cropbox, bleedbox, trimbox, artbox) has an array of four entries as its value, in the order left, bottom, right, top.
- Each value in the array is expressed in pt (1/72 inch).
- The cropsize represents the effective width (w) and height (h) of the CropBox, or in its absence that of the MediaBox.
- The trimsize represents the effective width (w) and height (h) of the TrimBox, or in its absence that of the CropBox, or in its absence that of the MediaBox.
- The bleed represents the effective bleed on the four sides (in the order left, bottom, right, top), i.e. the distance from the box used for the trimsize out to the BleedBox on each side. If the BleedBox is missing, all four values are 0 (zero).
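The derived cropsize, trimsize and bleed entries follow from the rules above. The Python sketch below is one reading of those rules (in particular, that bleed is measured from the box used for trimsize outward to the BleedBox, and that an absent box appears as a missing key or an empty array); pdfToolbox's own computation may differ in details such as rounding. Applied to the box values in the init.json example, it reproduces that example's trimsize and bleed. The function names are hypothetical.

```python
Box = list  # [left, bottom, right, top] in pt; [] or a missing key = box absent


def _size(box: Box) -> list:
    """Width and height of a [l, b, r, t] box."""
    left, bottom, right, top = box
    return [right - left, top - bottom]


def derived_sizes(boxes: dict) -> dict:
    """Compute cropsize, trimsize and bleed from the page geometry boxes."""
    mediabox = boxes["mediabox"]
    # An empty list is falsy, so `or` falls back to the next box in the chain:
    # CropBox -> MediaBox, and TrimBox -> CropBox -> MediaBox.
    crop = boxes.get("cropbox") or mediabox
    trim = boxes.get("trimbox") or crop
    bleedbox = boxes.get("bleedbox")
    if bleedbox:
        # Distance from the effective trim box out to the BleedBox on each side.
        bleed = [trim[0] - bleedbox[0], trim[1] - bleedbox[1],
                 bleedbox[2] - trim[2], bleedbox[3] - trim[3]]
    else:
        bleed = [0, 0, 0, 0]
    return {"cropsize": _size(crop), "trimsize": _size(trim), "bleed": bleed}
```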
Structure of firstpage_size entry
"firstpage_size" : {
"mediabox" : [l b r t] ,
"cropbox" : [l b r t] ,
"bleedbox" : [l b r t] ,
"trimbox" : [l b r t] ,
"artbox" : [l b r t] ,
"cropsize" : [w h] ,
"trimsize" : [w h] ,
"bleed" : [l b r t]
}
Example content for init.json
{
"verb" : "init",
"app_uuid" : "0a0ceba3-7ca9-421f-a973-8caae2950690",
"timestamp" : "2016/10/31 20:08:27",
"timestamp_hour" : 20,
"timestamp_month" : 10,
"timestamp_weekday" : 1,
/* ... and all further entries defined for "launch.json" */
"profile" : "P959c755539c8439e62c516c66a4a9097",
"profile_name" : "Sheetfed offset (CMYK, RGB and spot colors) (GWG 2015)",
"doc_size" : 863672,
"doc_created" : "2015/06/03 12:23:53.000",
"doc_modified" : "2016/10/31 20:08:28.000",
"pdf_version" : "1.5",
"pdf_creator" : "Acrobat PDFMaker 10.1 für PowerPoint",
"pdf_writer" : "Adobe PDF Library 10.0",
"doc_id1" : "DB10E96543FE2B4E93226FA065FE83BC",
"doc_id2" : "00AFB863B1344FDDBE90D24E513AB992",
"doc_pages" : 23,
"firstpage_size" : {
"mediabox" : [-10 0 581 615] ,
"cropbox" : [-10 0 581 615] ,
"bleedbox" : [5 5 559 570] ,
"trimbox" : [10 10 551 565] ,
"artbox" : [] ,
"cropsize" : [591 625] ,
"trimsize" : [541 555] ,
"bleed" : [5 5 8 5]
}
"variables" : {
"Calcs_for_LFP_Preflight_-_viewing_distance" : {
"eff_min_fontsize":null,
"eff_min_imageresolution":null
},
"eff_min_fontsize":200,
"eff_min_imageresolution":40,
"input_scalingfactor":100,
"input_viewingdistance":10
}
}
Structure of finish.json
Description of each entry in finish.json (in addition to those entries described above for "launch.json" and for "init.json")
name | value |
retcode | the program exit code (see the pdfToolbox CLI manual for details). Note: the value is provided as an integer. |
duration | duration of processing (essentially the difference between the timestamp at finish and the timestamp at launch), formatted as hh:mm:ss:ttt (e.g. "0:00:11:045") |
doc_corrections | number of corrections applied during processing |
doc_max_severity | maximum severity; defined values are: 3 = error, 2 = warning, 1 = info, 0 = no message |
doc_messages | number of messages, i.e. the combined total of error, warning and info messages |
doc_errors | number of error messages |
doc_errors_list | an array of error details; each array entry contains the name of a check that triggered an error, together with its count |
doc_warnings | number of warning messages |
doc_warnings_list | an array of warning details; each array entry contains the name of a check that triggered a warning, together with its count |
doc_infos | number of info messages |
doc_infos_list | an array of info details; each array entry contains the name of a check that triggered an info message, together with its count |
num_images | number of images |
num_fonts | number of fonts; two different font resources where the font name happens to be the same are counted as two fonts |
fonts | data structure representing the fonts in the PDF file |
num_spotcolors | number of spot colors; i.e. all Separation colour spaces whose name is not one of Cyan, Magenta, Yellow, Black, All or None |
spotcolor_names | array of spot colour names; for example: "spotcolor_names" : [ "Orange", "Purple" ] |
num_icc_profiles | number of all ICC profiles; excluding ICC profiles in output intents |
icc_profiles_gray | array of names of grayscale ICC profiles; excluding ICC profiles in output intents; the CalGray colourspace, which strictly speaking is not an ICC based colourspace, is reported here as "CalGray"; for example: "icc_profiles_gray" : [ "Generic Gray Profile", "Gamma 2.2 Gray"] |
icc_profiles_rgb | array of names of 3-component ICC profiles; excluding ICC profiles in output intents. RGB ICC profiles are reported by their name (content of 'desc' field), the CalRGB colourspace, which strictly speaking is not an ICC based colourspace, is reported here as "CalRGB", the Lab colorspace, which strictly speaking is not an ICC based colourspace either, is reported here as "Lab"; for example: "icc_profiles_rgb" : [ "eciRGB v2", "CalRGB", "Lab"] |
icc_profiles_cmyk | array of names of CMYK ICC profiles; excluding ICC profiles in output intents; for example: "icc_profiles_cmyk" : [ "PSO Coated v3", "US Web Coated SWOP"] |
icc_profiles_lab | array of names of "Lab" ICC profiles; the "Lab" colour space, which strictly speaking is not an ICC based colour space, is still reported here as "Lab"; for example: "icc_profiles_lab" : [ "Lab"] |
pdfx_version | if present, the PDF/X version (e.g. "PDF/X-1a") |
pdfx_oi_output_cond_id | if present, the value of the PDF/X OutputIntentIdentifier (e.g. "FOGRA39") |
pdfx_oi_info | if present, the text in the OutputIntent Info field |
pdfx_oi_icc_name | if present, the name of the PDF/X OutputIntent profile (e.g. "PSO Coated v3") |
pdfa_version | if present, the PDF/A version (e.g. "PDF/A-3u") |
pdfua_version | if present, the PDF/UA version (e.g. "PDF/UA-1") |
pdfe_version | if present, the PDF/E version (e.g. "PDF/E-2r") |
pdf_encrypted | whether the PDF is encrypted; possible values: 0 (not encrypted) and 1 (encrypted) |
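Two groups of finish.json entries lend themselves to simple post-processing: duration can be converted to milliseconds, and the message counts can be cross-checked against each other. A Python sketch; the expected severity (the highest severity for which at least one message was counted) is an assumption derived from the definitions above, and the function names are hypothetical.

```python
def duration_ms(duration: str) -> int:
    """Convert an hh:mm:ss:ttt duration such as "0:00:11:045" to milliseconds."""
    hours, minutes, seconds, millis = (int(part) for part in duration.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def expected_max_severity(record: dict) -> int:
    """Highest severity with a non-zero count: 3 = error, 2 = warning, 1 = info, 0 = none."""
    for severity, key in ((3, "doc_errors"), (2, "doc_warnings"), (1, "doc_infos")):
        if record.get(key, 0) > 0:
            return severity
    return 0


def counts_consistent(record: dict) -> bool:
    """Check doc_messages against its parts and doc_max_severity against the counts."""
    total = record["doc_errors"] + record["doc_warnings"] + record["doc_infos"]
    return (total == record["doc_messages"]
            and expected_max_severity(record) == record["doc_max_severity"])
```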
Sub-structure for fonts
"fonts" : [
{
"fontname" : "TimesNewRomanPS-BoldMT",
"fonttype" : "Type1",
"embedded" : 1,
"subset" : 1
},
{
"fontname" : "MyriadPro-BoldItalic",
"fonttype" : "Type0",
"embedded" : 1,
"subset" : 1
}
]
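Since embedded and subset are reported as 0/1 flags (as in the example above), finding fonts that were not embedded is a one-line filter. A minimal Python sketch with hypothetical helper names; it assumes each array entry carries the fontname, fonttype and embedded keys shown above.

```python
from collections import Counter


def unembedded_fonts(fonts: list[dict]) -> list[str]:
    """Names of fonts whose 'embedded' flag is 0 (or missing)."""
    return [font["fontname"] for font in fonts if not font.get("embedded")]


def font_type_counts(fonts: list[dict]) -> Counter:
    """Number of font resources per font type (Type1, Type0, ...)."""
    return Counter(font["fonttype"] for font in fonts)
```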
Sub-structure for errors_list/warnings_list/infos_list
Each entry in each of the three lists is an object with a single key-value pair: the key is the name of a check (as configured in the kfpx profile used), and the value (n, m, o, p, q, r, s, t or u in the schematic below) is the number of times the respective error, warning or info message was triggered.
"doc_errors_list" : [
{
"name of check that has triggered an error" : n
},
{
"another name of a check that has triggered an error" : m
},
{
"yet another name of a check that has triggered an error" : o
}
]
"doc_warnings_list" : [
{
"name of check that has triggered a warning" : p
},
{
"another name of a check that has triggered a warning" : q
},
{
"yet another name of a check that has triggered a warning" : r
}
]
"doc_infos_list" : [
{
"name of check that has triggered an info message" : s
},
{
"another name of a check that has triggered an info message" : t
},
{
"yet another name of a check that has triggered an info message" : u
}
]
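Because each list is an array of single-key objects rather than one object, it is often convenient to flatten it into a mapping from check name to count. A Python sketch with a hypothetical helper name; summing the counts if the same check name appears more than once is a defensive assumption, not documented behavior.

```python
from collections import Counter


def counts_by_check(entries: list[dict[str, int]]) -> Counter:
    """Flatten e.g. doc_errors_list into a Counter mapping check name -> count."""
    counts: Counter = Counter()
    for entry in entries:              # each entry is a single-key object
        for name, count in entry.items():
            counts[name] += count      # sums if a name were to repeat
    return counts
```

For example, counts_by_check(record["doc_errors_list"]).most_common(3) lists the three checks that triggered errors most often.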
Example content for finish.json
{
"verb" : "finish",
"app_uuid" : "0a0ceba3-7ca9-421f-a973-8caae2950690",
"timestamp" : "2016/10/31 20:08:27",
"timestamp_hour" : 20,
"timestamp_month" : 10,
"timestamp_weekday" : 1,
/* ... and all further entries defined for "launch.json" */
"profile" : "P959c755539c8439e62c516c66a4a9097",
"profile_name" : "Sheetfed offset (CMYK, RGB and spot colors) (GWG 2015)",
/* ... and all further entries defined for "init.json" */
"retcode" : 8,
"duration" : "0:00:11:045",
"doc_corrections" : 870,
"doc_max_severity" : 3,
"doc_messages" : 236,
"doc_errors" : 160,
"doc_errors_list" : [
{
"Font not embedded" : 17
},
{
"DeviceRGB used" : 28
},
{
"TrimBox entry missing" : 1
}
],
"doc_warnings" : 76,
"doc_warnings_list" : [
{
"Resolution less than 200ppi for continuous tone image" : 3
},
{
"Page empty" : 2
}
],
"doc_infos" : 0,
"doc_infos_list" : [
{
"Uses spot color" : 61
},
{
"Uses transparency" : 39
}
],
"pdf_encrypted" : 0,
"num_images" : 70,
"num_fonts" : 2,
"fonts" : [
{
"fontname" : "TimesNewRomanPS-BoldMT",
"fonttype" : "Type1",
"embedded" : 1,
"subset" : 1
},
{
"fontname" : "MyriadPro-BoldItalic",
"fonttype" : "Type0",
"embedded" : 1,
"subset" : 1
}
],
"num_spotcolors" : 3,
"spotcolor_names" : [
"Orange",
"Purple",
"Varnish"
],
"num_icc_profiles" : 3,
"icc_profiles_gray" : [
],
"icc_profiles_rgb" : [
"sRGB",
"eciRGB v2"
],
"icc_profiles_cmyk" : [
"PSO Coated v3"
],
"pdfx_version" : "PDF/X-4",
"pdfx_oi_output_cond_id" : "FOGRA39",
"pdfx_oi_info" : "Prepared for ISO 12647-2:2013, coated sheet fed offset",
"pdfx_oi_icc_name" : "PSO Coated v3",
"pdfa_version" : "PDF/A-2b",
"pdfua_version" : "",
"pdfe_version: ""
}