Distributed Processing

Distributed Processing mode

callas pdfToolbox Server/CLI offers a distributed processing mode in which tasks are distributed across the network to multiple "satellite" instances. The results are then returned to the "origin of processing". To support this, pdfToolbox Server/CLI can be started in several operational modes:

  • "Dispatcher" controls which tasks are processed by which machines (the "Satellites"). There must always be at least one Dispatcher in the network.
  • "Satellite" receives tasks from the "Clients" or directly from the Dispatcher (if the Dispatcher is running with hotfolders), processes them and sends them back to the "Clients".
  • "Client" asks the Dispatcher for a Satellite. After being assigned an available Satellite, it sends the task to that Satellite and receives the result after processing.
  • "Monitor" monitors the Dispatcher and displays the current situation.

All these components can run on the same machine or on different machines. The network must contain at least one Dispatcher and at least one Satellite, and at least one Client is required to submit tasks.

Communication

  1. The Client sends a request for a Satellite to the Dispatcher.
  2. The Dispatcher assigns a Satellite and sends its address to the Client.
  3. The Client sends the task to the Satellite.
  4. The Satellite returns the result to the Client.
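
The four steps above can be sketched as a small in-memory simulation. This is only an illustration of the message flow, not the actual network protocol used by pdfToolbox; the class and function names (Satellite, Dispatcher, submit) are hypothetical and exist only in this example.

```python
from dataclasses import dataclass, field


@dataclass
class Satellite:
    # Hypothetical stand-in for a running Satellite instance.
    address: str

    def process(self, task: str) -> str:
        # Steps 3 and 4: receive the task, process it, return the result.
        return f"{task}: processed by {self.address}"


@dataclass
class Dispatcher:
    # Hypothetical registry: Satellite address -> Satellite object.
    satellites: dict = field(default_factory=dict)

    def register(self, satellite: Satellite) -> None:
        self.satellites[satellite.address] = satellite

    def assign(self) -> str:
        # Step 2: answer a Client request with the address of a Satellite.
        if not self.satellites:
            raise RuntimeError("no Satellite registered")
        return next(iter(self.satellites))


def submit(dispatcher: Dispatcher, task: str) -> str:
    address = dispatcher.assign()               # steps 1 and 2
    satellite = dispatcher.satellites[address]  # stands in for a network connection
    return satellite.process(task)              # steps 3 and 4
```

Note that in this flow the Dispatcher only hands out an address; the task itself travels directly between Client and Satellite.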

Starting a Dispatcher

pdfToolbox --dispatcher [--port=<port number>]

Example:

pdfToolbox --dispatcher --port=1234

port is the port number on which the dispatcher can be reached over the network. It defaults to 1200 and can optionally be set to a different value (see example).


Starting a Dispatcher using the Server UI

It is also possible to start the server as a dispatcher on Windows and macOS using the user interface (Desktop). Hotfolder processing can be set up here as well. In this mode, the dispatcher also distributes tasks sent by other clients.

Starting a Satellite

pdfToolbox --satellite --endpoint=<Dispatcher IP number>[:<dispatcher port>] [--port=<port number>] [--connections=<number of concurrent connections>]

Example with default Dispatcher:

pdfToolbox --satellite --endpoint=10.0.0.100

Example with different Dispatcher port:

pdfToolbox --satellite --endpoint=10.0.0.100:1234 --port=4321

At least one Satellite is required to process tasks.

endpoint is the IP number and port of the Dispatcher. The dispatcher port is 1200 by default, but can be changed when the dispatcher is started (see "Starting a Dispatcher"). If the default dispatcher port is used, the dispatcher port can be omitted.

port is the one used by the Satellite to communicate with the Clients. The Satellite port is 1201 by default and can be set to a different port at startup.
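
The rule that the dispatcher port may be omitted when the default is used can be illustrated with a small helper. This is a sketch for scripts that prepare --endpoint values, assuming IPv4-style endpoints; the function name parse_endpoint is hypothetical and not part of pdfToolbox.

```python
DEFAULT_DISPATCHER_PORT = 1200  # default dispatcher port stated in this chapter


def parse_endpoint(endpoint):
    """Split '<ip>[:<port>]' into (ip, port), applying the default port.

    Hypothetical helper; assumes an IPv4 address or host name without colons.
    """
    host, separator, port = endpoint.partition(":")
    if not host:
        raise ValueError("endpoint needs an IP number")
    return host, (int(port) if separator else DEFAULT_DISPATCHER_PORT)
```

For example, parse_endpoint("10.0.0.100") resolves to port 1200, while parse_endpoint("10.0.0.100:1234") keeps the explicit port.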


Starting a Satellite using the Server UI

It is also possible to start the server as a Satellite on Windows and macOS using the user interface (Desktop). In this mode, the Satellite will not process any hotfolder jobs on the computer. A Satellite will always use the number of CPUs on the machine as the number of concurrent connections/processes. To limit this number, the Satellite must be started on the CLI with the --connections parameter.

The number of connections should not exceed the number of CPUs, as this can reduce performance per process and lead to system instability.
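
A launcher script could enforce this recommendation before starting a Satellite. The following is a minimal sketch that only builds the argument list; the function name satellite_command and its cpu_count parameter are hypothetical, while the flags are the ones described above.

```python
import os


def satellite_command(endpoint, connections=None, cpu_count=None):
    """Build a Satellite start command, capping --connections at the CPU count.

    Hypothetical helper; cpu_count can be passed explicitly for testing.
    """
    cpus = cpu_count or os.cpu_count() or 1
    args = ["pdfToolbox", "--satellite", f"--endpoint={endpoint}"]
    if connections is not None:
        # Never request fewer than 1 or more than the number of CPUs.
        capped = max(1, min(connections, cpus))
        args.append(f"--connections={capped}")
    return args
```

The resulting list can be passed to a process launcher such as subprocess.run. If connections is left out, no --connections parameter is added and the Satellite keeps its default behavior.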


Assigning more than one Dispatcher to a Satellite

To connect a Satellite to more than one Dispatcher, specify the --endpoint parameter more than once. Please refer to the "Fallback for Dispatcher" section later in this chapter.

Distributing a process using a Client

The client is called using any regular pdfToolbox command line call. To distribute the call over the network, the command line parameters --dist and --endpoint are added. The client will first ask the dispatcher for a satellite connection, then send the command to the satellite and wait for the result to be sent back from the satellite.

pdfToolbox --dist --endpoint=<dispatcher IP number>[:<dispatcher port>] <any regular pdfToolbox call>

Examples:

pdfToolbox --dist --endpoint=10.0.0.100 <anyProfile.kfpx> <myPDF.pdf>
pdfToolbox --dist --endpoint=10.0.0.100 --redistill <myPDF.pdf>
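
When client calls are issued from a script, the distribution parameters can be prepended to an existing call. A minimal sketch, assuming pdfToolbox is on the PATH; the function name distributed_call is hypothetical.

```python
def distributed_call(endpoint, call_args):
    """Prefix a regular pdfToolbox call with --dist and --endpoint.

    Hypothetical helper; call_args is the argument list of the regular call.
    """
    return ["pdfToolbox", "--dist", f"--endpoint={endpoint}", *call_args]
```

The returned list can be handed to subprocess.run, for example distributed_call("10.0.0.100", ["anyProfile.kfpx", "myPDF.pdf"]).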

Distributed processing: Variables and resources

When using normal Profiles, there is nothing to worry about when processing a file. All needed resources (like ICC profiles or "Place content"-Templates) are included in the Profile.

But sometimes advanced scripting, for example of a template, requires external resources that are defined or referenced by a variable. To ensure that these resources are passed to the satellite during distributed processing, a variant of the --setvariable=<variable> option can be used:

--setvariable=RESOLUTION:300
--setvariablepath=<path to resources file or folder>

Set the type of satellite (Optional)

Since some types of tasks should only be processed on a specific type of Satellite, it is possible to start a Satellite with one or more types set.


Each CLI call to process a task can be customized to one or more types of allowed Satellites.


Set typification for Satellite:

pdfToolbox --satellite --endpoint=<dispatcher IP number> --satellite_type=<type> [--satellite_type=<type>]

for example:

pdfToolbox --satellite --endpoint=10.0.0.100 --satellite_type=A
pdfToolbox --satellite --endpoint=10.0.0.100 --satellite_type=A --satellite_type=B


Set typification for Client:

pdfToolbox --dist --endpoint=<dispatcher IP number> --satellite_type=<type> [--satellite_type=<type>] <any regular pdfToolbox call>

for example:

pdfToolbox --dist --endpoint=10.0.0.100 --satellite_type=A <any regular pdfToolbox call>

Implementation details:

• If a Satellite has been started with a typification, only Client calls with the same type will be sent to this Satellite.

• If a Client call contains multiple typifications, all typifications must match with those set for a satellite.

• If a Client call has no typification set, it can be processed on all satellites, even if they were started with a typification.

• The <type> string must be alphanumeric and is case-sensitive.
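
One way to read the rules above is as a subset test: a Satellite is eligible if every type requested by the Client call is among the types the Satellite was started with. The sketch below follows that reading. The function name satellite_accepts is hypothetical, and the case of a typed Client call meeting an untyped Satellite is not spelled out above; this sketch assumes such a call is not sent to that Satellite.

```python
def satellite_accepts(client_types, satellite_types):
    """Return True if a Client call may be sent to a Satellite.

    Hypothetical helper modelling the typification rules as a subset test.
    Comparison is case-sensitive, so "a" and "A" are different types.
    """
    return set(client_types) <= set(satellite_types)
```

A call without any type yields an empty set, which is a subset of every set, so it is accepted by all Satellites.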

Disallow local processing

As a fallback, processing might happen locally (on the Client, or on the Dispatcher if it runs in hotfolder mode) if an action cannot be distributed, if a Satellite cannot be assigned within a given timeframe, or if no Dispatcher is available.
Local processing might not be desired for several reasons.

To prevent such local processing, both the client call and the start of a Dispatcher (when used as a server for hotfolders) can be modified with the option:

--nolocal

Example for Client:

pdfToolbox --dist --endpoint=<dispatcher IP number> --nolocal <any regular pdfToolbox call>

Local processing will be disabled and tasks will fail if no Satellite is ready for processing.

Example for Dispatcher:

pdfToolbox --dispatcher --nolocal

Here --nolocal is forwarded to child processes for hotfolder jobs. It has no effect on the processing of non-hotfolder files from a Client distributed by the Dispatcher.
If a Client wants to disable local processing, the --nolocal setting has to be set in each CLI call of the Client.
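
The effect of --nolocal on a single task can be summarized as a small decision function. This is a simplified sketch of the behavior described in this section, not the actual implementation; the function name resolve_processing and its return values are hypothetical.

```python
def resolve_processing(satellite_available, nolocal):
    """Decide where a task is processed.

    Hypothetical model: returns "satellite" or "local", or raises if the
    task can be processed nowhere because --nolocal forbids the fallback.
    """
    if satellite_available:
        return "satellite"
    if nolocal:
        raise RuntimeError("no Satellite ready and local processing is disabled")
    return "local"
```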

Fallback for Dispatcher

Some workflow systems may require a fallback for a dispatcher to ensure production stability. To cover this, a number of Dispatchers can be set up to run individually. One or more Dispatchers can be assigned to a Satellite.

Define multiple Dispatchers for a Satellite

Connects a Satellite to two (or more) Dispatchers.

pdfToolbox --satellite --endpoint=<dispatcher 1 IP> [--endpoint=<dispatcher 2 IP> [--endpoint=<dispatcher IP>]]

Set multiple Dispatchers in a Client call

Distributes a Client call via two (or more) Dispatchers. The first reachable Dispatcher with a free Satellite will handle the task.

pdfToolbox --dist --endpoint=<dispatcher 1 IP> --endpoint=<dispatcher 2 IP> [--endpoint=<dispatcher IP>] <any regular pdfToolbox call>
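
The selection rule ("first reachable Dispatcher with a free Satellite") can be sketched as follows. The status table is a stand-in for real network queries, and the function name pick_dispatcher is hypothetical.

```python
def pick_dispatcher(endpoints, free_satellites):
    """Return the first endpoint that is reachable and has a free Satellite.

    Hypothetical model: free_satellites maps an endpoint to its number of
    free Satellites; a missing entry or None means "not reachable".
    """
    for endpoint in endpoints:
        free = free_satellites.get(endpoint)
        if free is not None and free > 0:
            return endpoint
    return None  # no Dispatcher can take the task
```

The order of the --endpoint parameters matters in this model: a later Dispatcher is only used if all earlier ones are unreachable or have no free Satellite.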

Define a timeout for processing

In some workflow systems, long-running processes may not be allowed and must be terminated when a certain time frame is reached.

Due to the flexibility of distributed processing, a variety of timeouts can be set for each part:

  • for the Client call
  • for the Satellite
  • for the Dispatcher

Timeout for processing on a Satellite

  • If a timeout is defined for the client call, execution will be canceled after the specified time.
  • If a timeout is defined when a Satellite is started, all tasks processed by that Satellite will be canceled after the specified time.
  • If both are defined, the shorter timeout is used.
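
The "shorter timeout wins" rule can be written as a one-line helper. A minimal sketch, where None stands for "no timeout defined"; the function name effective_timeout is hypothetical.

```python
def effective_timeout(client_timeout=None, satellite_timeout=None):
    """Return the timeout in seconds that applies, or None if none is set.

    Hypothetical helper: if both timeouts are defined, the shorter one is used.
    """
    defined = [t for t in (client_timeout, satellite_timeout) if t is not None]
    return min(defined) if defined else None
```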

Example for Client:

pdfToolbox --dist --endpoint=<dispatcher IP> --timeout_satellite=<seconds> <any regular pdfToolbox call>

Example for Satellite:

pdfToolbox --satellite --endpoint=<dispatcher IP> --timeout=<seconds>

Timeout for local processing of Dispatcher or Client

A processing timeout can also be defined for the fallback to local processing on the Client or on the Dispatcher (when used as a server for hotfolders). This fallback applies if no satellite is available or if the type of task cannot be distributed.
If both are defined, the shorter timeframe is used.

Example for Client:

pdfToolbox --dist --endpoint=<dispatcher IP> --timeout=<seconds> <any regular pdfToolbox call>

Example for Dispatcher:

pdfToolbox --dispatcher --timeout=<seconds>

Timeout for Dispatcher to search for Satellites

Additionally, a dispatcher timeout can be set, which defines the timeframe during which the Dispatcher searches for a Satellite. This can be set individually for each client call.

Example for Client:

pdfToolbox --dist --endpoint=<dispatcher IP> --timeout_dispatcher=<seconds> <any regular pdfToolbox call>

When running the dispatcher in hotfolder mode, the setting can be specified when starting the dispatcher. It then applies to all files distributed from hotfolders.

Example for Dispatcher:

pdfToolbox --dispatcher --timeout_dispatcher=<seconds>

If a satellite or dispatcher timeout is set and the --nolocal option is defined, the task will not be processed locally. Processing will result in an error.

Setting --timeout_... or --nolocal parameters in the Additional CLI Parameters section of the Server UI when defining hotfolder jobs is not supported.

Using the CLI-Monitor

pdfToolbox --monitor --endpoint=<dispatcher IP>[:<dispatcher port>] [--endpoint=<dispatcher IP>[:<dispatcher port>]]

Example:

pdfToolbox --monitor --endpoint=10.0.0.100

Monitor is optional and mirrors the dispatcher's command line output to another computer. Endpoint is the IP number and port of the dispatcher.
If more than one dispatcher is used, multiple dispatcher IPs can be entered and monitored.

Licensing

  • Server: Regular pdfToolbox Server/CLI license required
  • Dispatcher: Dispatcher pdfToolbox Server/CLI license required
  • Satellite: Regular pdfToolbox Server/CLI license required
  • Monitor: No license required
  • Client: No license required

Distributed processing can be combined with the License Server in order to activate some or all components of these modules via this License Server. The setup is described here.

Distributed Processing in Enfocus Switch

Simply configure the appropriate settings in the Configurator for the steps to be distributed. If all tasks are to be processed on other machines (satellites), no local server license is required.
Some installations performed better when the setting "Concurrent transfers to the same site" in Switch was set to "Automatic". In addition, the "Default number of slots for concurrent elements" should not be 0.

Known limitations until version 11.1

Due to technical limitations of the communication method used (SOAP), versions of pdfToolbox before 11.1 cannot process files larger than 2 GB using distributed processing. In those versions, such files are processed locally on the client.

Since version 11.1, this technical limitation has been removed.