WORK IN PROGRESS!
Introduction
Consider a situation where files of different categories (i.e. groups of files that share a characteristic such as type or source) can be processed in parallel, but all the files of any one category have to be processed sequentially.
...
```dot
digraph {
  graph [rankdir=TB]
  node [shape=Mrecord,style="filled",fillcolor=lightblue]
  node [fontsize=10]
  ranksep=0.3

  subgraph cluster_0 {
    style=filled;
    color=lightgrey;
    node [style=filled,color=white];
    label = "JobScheduler";
    labeljust = "l";
    pad = 0.1;

    FileOrderSource [label="{<f0> FileOrder\nSource|<f1>Identify\ Category|<f2>Is\ Lock\ Set|{<f3>Yes|<f4>No}}"];
    Setback [label="{<f0>Setback\nWait}"];
    StartProcessing [label="{<f0>Set\ Lock|Start\ Processing\ File}"];
    EndProcessing [label="{<f0>End\ Processing\ File|<f1>Release\ Lock}"];

    FileOrderSource:f3 -> Setback -> FileOrderSource:f2;
    FileOrderSource:f4 -> StartProcessing -> EndProcessing [weight=4];
  }

  File_Transfer -> FileOrderSource;
}
```
- The most important feature of this solution is that it requires only one job chain and one set of jobs to separate the processing of different categories of files. This simplifies maintenance of the jobs and the job chain.
- A single 'load_file' File Order Source directory is used, and all files are delivered to it.
- JobScheduler uses regular expressions to identify the files arriving in this directory on the basis of their names, timestamps or file extensions, and forwards them for processing accordingly.
- JobScheduler then sets a 'lock' for the subsidiary (category) whose file is being processed. This prevents further files from this subsidiary from being processed as long as processing continues.
  Should a 'new' file from this subsidiary arrive whilst its predecessor is being processed, JobScheduler 'sets back' the job 'receiving' the new file as long as the lock for the subsidiary is set.
- JobScheduler releases this lock once processing of the 'first' file has been completed.
- The job 'receiving' the new file is then able to forward the new file for processing.
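The lock and setback decision described in the list above can be modelled in a few lines of plain JavaScript. This is a minimal sketch, not the demo's script and not the JobScheduler API. A plain object `locks` stands in for JobScheduler's LOCKs, and the function names `onFileOrder` and `onFileProcessed` are hypothetical. If a file arrives for a category whose lock is already set, it gets a setback. Otherwise the lock is set and processing starts.

```javascript
// Minimal model of the lock/setback mechanism (hypothetical names, not the JobScheduler API).
// `locks` is a plain object standing in for JobScheduler's LOCKs:
// locks[category] holds the name of the file currently being processed.
var SETBACK = "setback";
var START = "start";

// Called when a file order reaches the lock check, and again after every setback.
function onFileOrder(category, fileName, locks) {
  if (locks.hasOwnProperty(category)) {
    // Another file of this category is still being processed: wait.
    return SETBACK;
  }
  // No file of this category is being processed: set the lock and start.
  locks[category] = fileName;
  return START;
}

// Called when processing of a file has ended (successfully or not).
function onFileProcessed(category, fileName, locks) {
  // Only the file that set the lock releases it in this model.
  if (locks[category] === fileName) {
    delete locks[category];
  }
}

// Usage: two Berlin files and one Munich file arrive.
var locks = {};
var steps = [
  onFileOrder("Berlin", "a_1.csv", locks), // "start"
  onFileOrder("Berlin", "a_2.csv", locks), // "setback": the Berlin lock is set
  onFileOrder("Munich", "b_1.csv", locks)  // "start": the other category is not blocked
];
onFileProcessed("Berlin", "a_1.csv", locks);
steps.push(onFileOrder("Berlin", "a_2.csv", locks)); // "start" once the lock is released
console.log(steps.join(", ")); // start, setback, start, start
```

This model makes one design choice that the demo's release_lock script does not. The release_lock script removes the lock whenever it exists. In this sketch, only the file that set the lock can release it, so a stray release from an unrelated order cannot free another file's lock.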
...
- JobScheduler starts processing as soon as a file matching the regular expression is found in the directory.
  This directory is set in the "File Order Sources" area in the "Steps/Nodes" view of the "load_files" job chain. TODO: SCREENSHOT
- JobScheduler's aquire_lock job matches files using regular expressions and determines each file's category, for example Berlin or Munich.
  The regular expressions are also defined in the "File Order Sources" area shown in the screenshot. Note that, for simplicity, the regular expressions match the prefixes "a" and "b" in the file names rather than "Berlin" or "Munich" directly. The aquire_lock job uses a Rhino JavaScript script to try to acquire the lock and to wait if the lock is not available. This script is listed below.
...
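The prefix rules described above can also be written as JavaScript regular expressions, independently of the demo's own script. This sketch rests on some assumptions. The lock names BERLIN_PROC and MUNICH_PROC are taken from the commented-out code in the release_lock script, and the helper names `baseName` and `lockNameFor` are hypothetical. The sketch uses regular expression literals rather than strings. In a string such as `"^a[A-Za-z0-9_]*\.csv$"`, the backslash is dropped, so the dot matches any character.

```javascript
// Sketch: map a file path to a category lock name using the "a"/"b" prefix rules.
// Lock names are assumptions taken from the commented-out code of the release_lock script.
var categoryRules = [
  // In a regex literal, \. matches a literal dot. In the string "^a[A-Za-z0-9_]*\.csv$"
  // the backslash is dropped and "." would match any character.
  { pattern: /^a[A-Za-z0-9_]*\.csv$/, lockName: "BERLIN_PROC" },
  { pattern: /^b[A-Za-z0-9_]*\.csv$/, lockName: "MUNICH_PROC" }
];

// Strip the directory part; accepts both Windows (\) and Unix (/) separators.
function baseName(filePath) {
  var parts = String(filePath).split(/[\\\/]/);
  return parts[parts.length - 1];
}

// Returns the lock name for the file, or null if no rule matches.
function lockNameFor(filePath) {
  var fileName = baseName(filePath);
  for (var i = 0; i < categoryRules.length; i++) {
    if (categoryRules[i].pattern.test(fileName)) {
      return categoryRules[i].lockName;
    }
  }
  return null;
}

console.log(lockNameFor("C:\\data\\in\\a_20240101.csv")); // BERLIN_PROC
console.log(lockNameFor("/data/in/b_report.csv"));        // MUNICH_PROC
console.log(lockNameFor("/data/in/aXcsv"));               // null (no literal dot)
```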
- Once aquire_lock has found the matching category, it tries to set a semaphore (flag) using JobScheduler's built-in LOCK mechanism.
- Only one instance of each LOCK is allowed, as can be seen in the screenshot below. Once a LOCK has been assigned to a file from a category (either Berlin or Munich), all subsequent files of this category have to wait with a setback until the LOCK has been freed. TODO - ADD SCREENSHOT
- The same mechanism applies to the files of every other category. Each category has its own LOCK, so a LOCK set for one category does not block files of another category. As long as no file of a given category is being processed, the corresponding LOCK is not set, and the next file of that category can be processed straight away.
  This can be seen in the following screenshot of JobScheduler's JOE interface, which shows the progression of file orders along the "load_files" job chain.
- Once processing has finished, JobScheduler moves the file from the in folder to either the done folder (on success) or the failed folder (on error).
- After JobScheduler has moved the input file to the appropriate target directory, the release_lock job is called. This job removes the lock (semaphore) from JobScheduler and allows the next file from the same category to be processed.
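The effect of one LOCK per category can be illustrated with a small step-by-step simulation. This is a hypothetical model, not JobScheduler code. Every file is assumed to take a fixed number of steps (`duration`) to process. A file whose category lock is held is simply retried at the next step, in the way a set-back order is retried later. Each lock is released as soon as its file finishes.

```javascript
// Step-by-step simulation of per-category locking (hypothetical model).
// Each file needs `duration` steps; files waiting on a held lock are retried every step.
function simulate(files, duration) {
  var locks = {};      // category -> file currently holding the lock
  var remaining = {};  // file name -> processing steps still to do
  var waiting = files.slice();
  var running = [];
  var log = [];        // entries of the form "<step>: <file> started|finished"
  var step = 0;

  while (waiting.length > 0 || running.length > 0) {
    step++;
    // Try to start every waiting file; those whose category lock is held stay waiting (setback).
    var stillWaiting = [];
    for (var i = 0; i < waiting.length; i++) {
      var f = waiting[i];
      if (locks.hasOwnProperty(f.category)) {
        stillWaiting.push(f);
      } else {
        locks[f.category] = f.name;
        remaining[f.name] = duration;
        running.push(f);
        log.push(step + ": " + f.name + " started");
      }
    }
    waiting = stillWaiting;

    // Advance all running files by one step and release the locks of finished files.
    var stillRunning = [];
    for (var j = 0; j < running.length; j++) {
      var r = running[j];
      remaining[r.name]--;
      if (remaining[r.name] === 0) {
        delete locks[r.category];
        log.push(step + ": " + r.name + " finished");
      } else {
        stillRunning.push(r);
      }
    }
    running = stillRunning;
  }
  return log;
}

var log = simulate([
  { name: "a_1.csv", category: "Berlin" },
  { name: "a_2.csv", category: "Berlin" },
  { name: "b_1.csv", category: "Munich" }
], 2);
console.log(log.join("\n"));
```

In this run, a_1.csv and b_1.csv are processed in parallel, and a_2.csv only starts after a_1.csv has released the Berlin lock. A lock released at the end of a step is only picked up at the next step. This roughly mirrors the delay between a setback and the next attempt.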
The following screenshot from JobScheduler's JOE interface shows
```javascript
function spooler_process() {
  try {
    var parameters = spooler_task.order().params();

    // Path of the file that triggered this order (set by the File Order Source).
    var filePath = String(parameters.value("scheduler_file_path"));
    spooler_log.info("scheduler_file_path : " + filePath);

    // Name of the LOCK (category) to be released, passed as an order parameter.
    var lockName = String(parameters.value("lock_name"));
    spooler_log.info("lock_name : " + lockName);

    // Alternative (disabled): derive the lock name from the file name instead.
    // Regular expression literals are used so that \. matches a literal dot.
    // var fileName = filePath.split(/[\\\/]/).pop();
    // if (/^a[A-Za-z0-9_]*\.csv$/.test(fileName)) {
    //   lockName = "BERLIN_PROC";
    //   spooler_log.info("File matched with Berlin, lock_name : " + lockName);
    // } else if (/^b[A-Za-z0-9_]*\.csv$/.test(fileName)) {
    //   lockName = "MUNICH_PROC";
    //   spooler_log.info("File matched with Munich, lock_name : " + lockName);
    // }

    // Release the LOCK by removing it, so that the next file of this category can be processed.
    if (spooler.locks().lock_or_null(lockName)) {
      spooler.locks().lock(lockName).remove();
    }
    return true;
  } catch (e) {
    spooler_log.warn("error occurred: " + String(e));
    return false;
  }
}
```
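The order of the final steps described above (move the file, then release its lock) can be illustrated with a small model. This hypothetical sketch does no real file I/O, and the function names `targetPath` and `finishFile` are hypothetical. It assumes that the 'done' and 'failed' folders are siblings of the 'in' folder, and a plain object stands in for JobScheduler's LOCKs.

```javascript
// Sketch of the final steps for one file (hypothetical model, no real file I/O).
// Assumes the folder layout <base>/in, <base>/done and <base>/failed.
function targetPath(filePath, succeeded) {
  var sep = filePath.indexOf("\\") >= 0 ? "\\" : "/";
  var parts = filePath.split(sep);
  // Expect [..., "in", "<file name>"]; replace the "in" folder with the target folder.
  if (parts.length < 2 || parts[parts.length - 2] !== "in") {
    throw new Error("File is not in an 'in' folder: " + filePath);
  }
  parts[parts.length - 2] = succeeded ? "done" : "failed";
  return parts.join(sep);
}

// Record the move first, then release the category lock (the release_lock step).
function finishFile(filePath, succeeded, lockName, locks, moves) {
  moves.push({ from: filePath, to: targetPath(filePath, succeeded) });
  delete locks[lockName];
}

var locks = { BERLIN_PROC: "/data/in/a_1.csv" };
var moves = [];
finishFile("/data/in/a_1.csv", false, "BERLIN_PROC", locks, moves);
console.log(moves[0].to);                         // /data/failed/a_1.csv
console.log(locks.hasOwnProperty("BERLIN_PROC")); // false
```

In this sketch, if the move step throws, `finishFile` exits before the lock is released, so the lock stays set. That is the situation in which a lock has to be released by force.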
- A job that forces the release of a lock is also required for situations in which processing fails without the corresponding lock being released.
  The script for such a job, releasing the 'Berlin' lock, would look something like:
...
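A forced release can also be modelled with a plain object standing in for JobScheduler's LOCKs. This hypothetical sketch is not the script referred to above and does not use the JobScheduler API. The names `forceRelease` and `heldLocks` are illustrative. The sketch removes a lock regardless of which file set it. It reports whether anything was removed, so repeated calls are harmless. It also lists the locks that are still set, so that a lock left behind by a failed file can be spotted.

```javascript
// Sketch of a forced release in an in-memory lock model (hypothetical, not the JobScheduler API).
// Unlike a normal release, it does not check which file holds the lock.
function forceRelease(locks, lockName) {
  if (!locks.hasOwnProperty(lockName)) {
    return false; // nothing to release; calling again is harmless
  }
  var holder = locks[lockName];
  delete locks[lockName];
  console.log("Forced release of " + lockName + " (held by " + holder + ")");
  return true;
}

// Lists the locks that are currently set, e.g. to spot a lock left behind by a failed file.
function heldLocks(locks) {
  return Object.keys(locks).sort();
}

var locks = { BERLIN_PROC: "a_1.csv", MUNICH_PROC: "b_1.csv" };
console.log(heldLocks(locks).join(", "));       // BERLIN_PROC, MUNICH_PROC
var first = forceRelease(locks, "BERLIN_PROC");  // true
var second = forceRelease(locks, "BERLIN_PROC"); // false: already released
```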
- Just copy files from the 'Data/__test-files' folder to the 'in' folder.
- JobScheduler will automatically start processing within a few seconds.
- Once processing has been completed, the file(s) added to the 'in' folder will be moved to the 'done' or 'failed' folder, depending on whether processing was successful or not.
- DO NOT attempt to start an order for the job chain. This will only cause an error in the aquire_lock job.
How does the Demo Work?
{{DiagramBoxRight
...