Distributed processing

Distributed Processing mode

callas pdfaPilot Server/CLI can be used in distributed processing mode, in which all jobs are distributed over the network to as many "satellites" as are present and the results are sent back to the originator. For this purpose, pdfaPilot Server/CLI can be started in different modes:

  • "Dispatcher" must be present exactly once in the network. This node controls which jobs are to be processed by which machines: the "satellites".
  • "Satellite" receives jobs from the clients or directly from the dispatcher (if the dispatcher is run with hotfolders), processes them and sends the results back to the clients.
  • "Client" asks the dispatcher for a satellite and, once one has been assigned, sends the job to that satellite and receives the result.
  • "Monitor" monitors the dispatcher and displays the current situation.

All of these modules can run on the same or on different machines. There needs to be exactly one dispatcher and at least one satellite. In order to submit jobs, at least one client is required. Distributed processing is supported for Windows, Mac OS X, Linux, Sun Solaris and Sun Intel. It is not available on AIX.

 

Starting a Dispatcher

--dispatcher [--port=<port number>] [--noserver] 

Example:

--dispatcher --port=1300 --noserver 

Port is the port number on which the Dispatcher can be reached over the network. The default is 1300.

  • Note: By setting the --noserver option, the Dispatcher will not observe existing hotfolders, but only distribute jobs sent in by Clients to Satellites. This option is only available using the CLI.

Starting a Dispatcher using the ServerUI

A server can also be started as a Dispatcher on Windows and MacOS using the user interface. Hotfolder processing can be set up here as well. In this mode, the Dispatcher will also distribute jobs which are sent by other Clients.

Starting a Satellite

--satellite --endpoint=<dispatcher ip number>[:<dispatcher port>] [--port=<port number>] [--connections=<number of concurrent connections>] 

Example:

--satellite --endpoint=10.0.0.100:1300 --port=1301 

In order to process jobs, at least one Satellite is required.
Endpoint is the IP number and the port of the Dispatcher. The default port is 1300, but it can be changed when the Dispatcher is started (see above).
Port is the port that the Satellite uses to communicate with the Clients. It defaults to 1301 and can optionally be set to a different port at startup.
It is highly recommended to use a different port number for the communication between Satellite and Dispatcher than for the communication between Satellite and Client.
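
The endpoint notation above (an IP number with an optional port, falling back to 1300) can be sketched in a few lines of Python. This is only an illustration for scripts that wrap the CLI, not how pdfaPilot itself parses the value; the function name parse_endpoint is hypothetical, and the sketch assumes IPv4 addresses or host names without colons.

```python
from typing import Tuple

DEFAULT_DISPATCHER_PORT = 1300  # default Dispatcher port stated in the text


def parse_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split '<ip>[:<port>]' into (ip, port), applying the default port.

    Hypothetical helper: assumes the address itself contains no colon.
    """
    host, sep, port_text = endpoint.partition(":")
    if not host:
        raise ValueError(f"missing IP number in endpoint: {endpoint!r}")
    if not sep:
        # No port given, so fall back to the documented default.
        return host, DEFAULT_DISPATCHER_PORT
    port = int(port_text)  # raises ValueError for empty or non-numeric ports
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

For example, parse_endpoint("10.0.0.100") yields the address together with port 1300, while parse_endpoint("10.0.0.100:1301") keeps the explicit port.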

Starting a Satellite using the ServerUI

A Server can also be started as a Satellite on Windows and MacOS using the user interface. In this mode, the Satellite will not process any hotfolder jobs on the computer.

  • Note: A Satellite will always use the number of CPUs on the respective machine as the number of concurrent connections/processes. To limit this number, the Satellite has to be started from the CLI with the --connections parameter. The number of connections should not exceed the number of CPUs, as this might reduce the performance per process and could result in system stability problems.
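
A start script can enforce the recommendation from the note above by capping the requested number of connections at the CPU count before building the Satellite arguments. The following is a minimal sketch under that assumption; the helper names capped_connections and satellite_args are hypothetical, and only the flags --satellite, --endpoint and --connections come from this page.

```python
import os
from typing import List, Optional


def capped_connections(requested: int, cpu_count: Optional[int] = None) -> int:
    """Return a --connections value that never exceeds the CPU count."""
    if requested < 1:
        raise ValueError("at least one connection is required")
    # os.cpu_count() may return None, so fall back to a single CPU.
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return min(requested, cpus)


def satellite_args(endpoint: str, requested: int,
                   cpu_count: Optional[int] = None) -> List[str]:
    """Build Satellite start arguments with a capped connection limit."""
    limit = capped_connections(requested, cpu_count)
    return ["--satellite", f"--endpoint={endpoint}", f"--connections={limit}"]
```

The cpu_count parameter exists only so the cap can be checked without depending on the machine the script runs on.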

Distribute a process using a Client

The client is called using any regular pdfaPilot command line command. In order to distribute the call over the network, the command line parameters --dist and --endpoint are added. The client will then first ask the Dispatcher for a Satellite connection, then send the command to the Satellite and wait until the result is sent back from the Satellite.

pdfaPilot --dist --endpoint=<dispatcher ip number>[:<dispatcher port>] <any regular pdfaPilot call> 

Examples:

pdfaPilot --dist --endpoint=10.0.0.100:1300 <myPDF.pdf> 
pdfaPilot --dist --endpoint=10.0.0.100:1300 --level=2b --analyze <myPDF.pdf> 
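
When client calls are issued from a script, the pattern above amounts to prefixing a regular call with --dist and --endpoint. The sketch below only assembles the argument list; the function name distributed_call is hypothetical, and the executable name "pdfaPilot" is taken from the examples on this page and may need to be a full path on a given system.

```python
from typing import List, Sequence


def distributed_call(endpoint: str, regular_args: Sequence[str],
                     executable: str = "pdfaPilot") -> List[str]:
    """Prefix a regular pdfaPilot call with --dist and --endpoint."""
    if not regular_args:
        raise ValueError("a regular pdfaPilot call is required")
    return [executable, "--dist", f"--endpoint={endpoint}", *regular_args]


if __name__ == "__main__":
    # Prints the command line instead of running it.
    command = distributed_call("10.0.0.100:1300",
                               ["--level=2b", "--analyze", "myPDF.pdf"])
    print(" ".join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) from the Python standard library, which passes each argument without shell quoting issues.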

Set type of satellite

As some kinds of jobs shall only be processed on a defined type of Satellite, it is possible to start a Satellite with one or more types set. Every CLI call can likewise be given one or more types that restrict which Satellites may process the job.

Set typification for Satellite:

pdfaPilot --satellite --endpoint=<dispatcher IP number> --satellite_type=<type> [--satellite_type=<type>] 

for example:

pdfaPilot --satellite --endpoint=10.0.0.100 --satellite_type=A 
pdfaPilot --satellite --endpoint=10.0.0.100 --satellite_type=A --satellite_type=B 

Set typification for Client:

pdfaPilot --dist --endpoint=<dispatcher IP number> --satellite_type=<type> [--satellite_type=<type>] <any regular pdfaPilot call>

for example:

pdfaPilot --dist --endpoint=10.0.0.100 --satellite_type=A <any regular pdfaPilot call> 

Implementation details:

  • If a Satellite has been started with a typification, only Client calls with the same type set will be sent to this Satellite.
  • If a Client call contains a number of typifications, all typifications must match those set for a Satellite.
  • If a Client call has no typification set, it can be processed on all Satellites, even if they have been started with a typification.
  • The <type>-string has to be alpha-numeric and is case sensitive.
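
One way to read the rules above is as a subset test: a job may run on a Satellite if every type requested by the Client call is among the types the Satellite was started with. The sketch below models that reading and is not the Dispatcher's actual code. The names validate_type and satellite_accepts are hypothetical, "alpha-numeric" is assumed to mean ASCII letters and digits, and the case of a typed call meeting a Satellite without any type (rejected here) is an interpretation that the rules do not spell out.

```python
import re
from typing import Iterable

# Assumption: "alpha-numeric" means ASCII letters and digits only.
_TYPE_PATTERN = re.compile(r"[A-Za-z0-9]+")


def validate_type(name: str) -> str:
    """Reject type strings that are empty or not alpha-numeric."""
    if not _TYPE_PATTERN.fullmatch(name):
        raise ValueError(f"invalid satellite type: {name!r}")
    return name


def satellite_accepts(job_types: Iterable[str],
                      satellite_types: Iterable[str]) -> bool:
    """Return True if a job with job_types may run on the Satellite.

    The comparison is case sensitive. A job without types matches every
    Satellite because the empty set is a subset of any set.
    """
    requested = {validate_type(t) for t in job_types}
    offered = {validate_type(t) for t in satellite_types}
    return requested <= offered
```

With this model, a call typed A runs on a Satellite started with A and B, a call typed A and B does not run on a Satellite started with A only, and "a" does not match "A".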

Avoid local processing

As a fallback, processing can be performed locally if the action cannot be distributed, if a Satellite cannot be assigned within a timeframe, or if no Dispatcher is available. This type of local processing might not be desired for several reasons. To avoid it, the option --nolocal can be added to the Client call as well as to the start of a Dispatcher (if run as a server with hotfolders).

Example for Client:

pdfaPilot --dist --endpoint=<dispatcher IP number> --nolocal <any regular pdfaPilot call> 

Example for Dispatcher:

pdfaPilot --dispatcher --nolocal 
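
The fallback behaviour described in this section can be summarised as a small decision function. This is a sketch of the documented outcomes only, with hypothetical names (Outcome, resolve_processing); the error outcome for --nolocal follows the note on timeouts further down this page.

```python
from enum import Enum


class Outcome(Enum):
    DISTRIBUTED = "distributed"  # job runs on a Satellite
    LOCAL = "local"              # fallback: job runs on the calling machine
    ERROR = "error"              # fallback suppressed by --nolocal


def resolve_processing(distributable: bool, dispatcher_available: bool,
                       satellite_assigned: bool, nolocal: bool) -> Outcome:
    """Model where a job ends up, given the three fallback conditions."""
    if distributable and dispatcher_available and satellite_assigned:
        return Outcome.DISTRIBUTED
    # Any failed condition triggers the fallback, unless --nolocal forbids it.
    return Outcome.ERROR if nolocal else Outcome.LOCAL
```

Here satellite_assigned stands for "a Satellite was assigned within the timeframe".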

Fallback for Dispatcher

In some workflow systems, a fallback for a Dispatcher might be required to ensure production stability. To cover this, several Dispatchers can be set up, which will run individually. One or multiple Dispatchers can be assigned to a Satellite.

Assign multiple Dispatchers to a Satellite

Connects a Satellite to two (or more) Dispatchers.

pdfaPilot --satellite --endpoint=<dispatcher 1 IP> [--endpoint=<dispatcher 2 IP>] [--endpoint=<dispatcher IP>] 

Set multiple Dispatchers in a Client call

Distributes a Client call via two (or more) Dispatchers. The first reachable Dispatcher with a free Satellite will process the job.

pdfaPilot --dist --endpoint=<dispatcher 1 IP> --endpoint=<dispatcher 2 IP> [--endpoint=<dispatcher IP>] <any regular pdfaPilot call> 
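
The selection rule "first reachable Dispatcher with a free Satellite" can be pictured as a simple ordered search. The sketch below is illustrative only: the name pick_dispatcher and the can_serve callback are hypothetical, and it assumes that Dispatchers are tried in the order in which the --endpoint options are given, which this page does not state explicitly.

```python
from typing import Callable, Iterable, Optional


def pick_dispatcher(endpoints: Iterable[str],
                    can_serve: Callable[[str], bool]) -> Optional[str]:
    """Return the first endpoint that is reachable and has a free Satellite.

    can_serve is a caller-supplied check (hypothetical); None means that
    no Dispatcher could take the job.
    """
    for endpoint in endpoints:
        if can_serve(endpoint):
            return endpoint
    return None


if __name__ == "__main__":
    # Simulated state: the first Dispatcher is down, the second one is free.
    state = {"10.0.0.100": False, "10.0.0.200": True}
    print(pick_dispatcher(state, lambda e: state[e]))
```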

Define a timeout for processing

In some workflow systems, long-running processes might not be allowed and shall be cancelled if a given timeframe is reached. Due to the flexibility of distributed processing, timeouts can be set for the individual parts:

  • for the Client call
  • for the Satellite
  • for the Dispatcher

Timeout for processing on a Satellite

When defining a timeout for the Client call, the execution will be cancelled after the given period. When defining a timeout when starting a Satellite, all jobs processed by this Satellite will be cancelled after the given period. If both are defined, the shorter timeframe will be used.

Example for Client:

pdfaPilot --dist --endpoint=<dispatcher IP> --timeout_satellite=<seconds> <any regular pdfaPilot call> 

Example for Satellite:

pdfaPilot --satellite --endpoint=<dispatcher IP> --timeout=<seconds> 
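
The rule "if both are defined, the shorter timeframe will be used" comes down to taking the minimum of the values that are actually set. A minimal sketch, with the hypothetical name effective_timeout and with None standing for "no timeout given":

```python
from typing import Optional


def effective_timeout(*timeouts: Optional[float]) -> Optional[float]:
    """Return the shortest defined timeout in seconds, or None if none is set."""
    defined = [t for t in timeouts if t is not None]
    return min(defined) if defined else None
```

For a Client call with a 60 second limit sent to a Satellite started with a 30 second limit, effective_timeout(60, 30) gives 30. If only one side defines a timeout, that value applies.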

Timeout for local processing of Dispatcher or Client

A processing timeout can also be defined for the fallback to local processing on the Client or the Dispatcher (when used as a server for hotfolders), which occurs if no Satellite is available or if the type of job cannot be distributed. If both are defined, the shorter timeframe will be used.

Example for Client:

pdfaPilot --dist --endpoint=<dispatcher IP> --timeout=<seconds> <any regular pdfaPilot call> 

Example for Dispatcher:

pdfaPilot --dispatcher --timeout=<seconds> 

Timeout for Dispatcher to search for Satellites

Additionally, a timeout can be set for the Dispatcher, which defines the timeframe in which it searches for Satellites. It can be set individually for every Client call or when starting the Dispatcher (it will then affect all distributed files). If both are defined, the shorter timeframe will be used.

Example for Client:

pdfaPilot --dist --endpoint=<dispatcher IP> --timeout_dispatcher=<seconds> <any regular pdfaPilot call> 

Example for Dispatcher:

pdfaPilot --dispatcher --timeout_dispatcher=<seconds> 

  • Note: If a timeout for Satellites or Dispatcher is set and the --nolocal option has been defined, no attempt is made to process the job locally. Processing will end with an error.
  • Note: Setting --timeout... or --nolocal parameters in the "Additional CLI parameter" area of the Server UI is not supported at the moment.

Using the CLI-Monitor

pdfaPilot --monitor --endpoint=<dispatcher ip number>:<dispatcher port> [--endpoint=<dispatcher IP>:<dispatcher port>] 

Example:

--monitor --endpoint=10.0.0.100:1300 

Monitor is optional and mirrors the command line output of the dispatcher to another computer. Endpoint is the IP number and the port of the dispatcher. When using more than one Dispatcher, multiple Dispatcher IPs can be entered and observed.

Communication

1) Client sends a request for a Satellite to the Dispatcher

2) Dispatcher assigns a Satellite and sends the address to the Client

3) Client sends the job to the Satellite

4) Satellite sends the result back to the Client
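
The four steps can be traced with a small in-memory simulation. It only mirrors the order of the messages and does not use the real network protocol. All class and function names are hypothetical, the Satellite addresses are made up, and handing the Satellite back to the Dispatcher after the job is an assumption of this sketch.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class Satellite:
    address: str

    def process(self, job: str) -> str:
        # Placeholder for the real pdfaPilot processing.
        return f"{job} processed by {self.address}"


@dataclass
class Dispatcher:
    free: Deque[Satellite] = field(default_factory=deque)

    def register(self, satellite: Satellite) -> None:
        self.free.append(satellite)

    def assign(self) -> Optional[Satellite]:
        # Step 2: hand out the next free Satellite, if there is one.
        return self.free.popleft() if self.free else None

    def release(self, satellite: Satellite) -> None:
        # Assumption: a Satellite becomes free again once its job is done.
        self.free.append(satellite)


def submit(job: str, dispatcher: Dispatcher) -> str:
    """Run one job through the four communication steps."""
    satellite = dispatcher.assign()        # steps 1 and 2
    if satellite is None:
        raise RuntimeError("no free Satellite available")
    try:
        return satellite.process(job)      # steps 3 and 4
    finally:
        dispatcher.release(satellite)
```

Note that the job and its result travel directly between Client and Satellite; in this model the Dispatcher only brokers the connection.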

Licensing requirements for distributed processing setups

  • Server: Regular pdfaPilot
    • Server/CLI license required
  • Dispatcher: Dispatcher pdfaPilot
    • Server/CLI license required
  • Satellite: Regular pdfaPilot
    • Server/CLI license required
  • Monitor:
    • No license required
  • Client:
    • No license required
