
WORK IN PROGRESS!

Introduction

Consider the situation where files of different categories can be processed in parallel, but all the files of any one category have to be processed sequentially. A category here is a group of files that share a characteristic such as type or source.

...

Graphviz
digraph \{

graph [rankdir=TB]
node [shape=Mrecord,style="filled",fillcolor=lightblue]
node [fontsize=10]
ranksep=0.3
	subgraph cluster_0 \{

		style=filled;
		color=lightgrey;
		node [style=filled,color=white];
		label = "JobScheduler";
		labeljust = "l";
		pad = 0.1;

FileOrderSource[label= "\{<f0> FileOrder&#92;nSource|<f1>Identify&#92; Category|<f2>Is&#92; Lock&#92; Set|\{<f3>Yes|<f4>No\}\}"];
Setback[label= "\{<f0>Setback&#92;nWait\}"];
StartProcessing[label= "\{<f0>Set&#92; Lock|Start&#92; Processing&#92; File\}"];
EndProcessing[label= "\{<f0>End&#92; Processing&#92; File|<f1>Release&#92; Lock\}"];

FileOrderSource:f3 -> Setback ->  FileOrderSource:f2;
FileOrderSource:f4 -> StartProcessing -> EndProcessing [weight=4];
	\}

File_Transfer ->  FileOrderSource;
\}
  • The most important feature of this solution is that it only requires one job chain and one set of jobs to separate the processing of different categories of files, thereby simplifying maintenance of the jobs and job chain.
  • Use a single 'load_file' File Order Source directory to which all files are delivered.
  • JobScheduler would use regular expressions to identify the files arriving in this directory on the basis of their names, timestamps or file extensions and forward them for processing accordingly.
  • JobScheduler would then set a 'lock' for the subsidiary whose file is being processed to prevent further files from this subsidiary being processed as long as processing continues.
    No Format
     Should a 'new' file from this subsidiary arrive whilst its predecessor is being processed, the job 'receiving' the new file will be 'set back' by JobScheduler as long as the lock for the subsidiary is set.
    
  • This lock would be released by JobScheduler once processing of the 'first' file has been completed.
  • The job 'receiving' the new file will now be able to forward the new file for processing.
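The locking behaviour described above can be sketched as a small in-memory simulation. This is not JobScheduler code: the `locks` object and the `tryAcquire`, `release` and `handleFile` functions are hypothetical names used only to illustrate the logic, and 'Berlin' and 'Munich' serve as sample categories.

```javascript
// In-memory stand-in for JobScheduler's locks (assumption: illustration only).
var locks = {};

// Returns true if the lock for the category was free and has now been set.
function tryAcquire(category) {
  if (locks[category]) {
    return false;
  }
  locks[category] = true;
  return true;
}

// Frees the lock so that the next file of the category can proceed.
function release(category) {
  delete locks[category];
}

// Decides what happens to an arriving file: "process" or "setback".
function handleFile(category) {
  return tryAcquire(category) ? "process" : "setback";
}

var first = handleFile("Berlin");   // lock is free, the file is processed
var second = handleFile("Berlin");  // lock is set, the file is set back
var other = handleFile("Munich");   // a different category is not affected
release("Berlin");                  // processing of the first file has ended
var retry = handleFile("Berlin");   // lock is free again, the file is processed
```

In the real job chain the set back order is retried by JobScheduler after a delay, which the sketch models as a second call to `handleFile`.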

...

  • JobScheduler starts processing as soon as a file matching the regular expression is found in the directory.
    No Format
     This directory is set in the "File Order Sources" area in the "Steps/Nodes" view of the "load_files" job chain. 
     TODO: SCREENSHOT
    

Image Added

  • JobScheduler's aquire_lock job matches files using regular expressions and determines the file's category - for example, Berlin or Munich.
    No Format
     The regular expressions are also defined in the "File Order Sources" area shown in the screenshot.
     Note that for simplicity the regular expressions match the prefixes "a" and "b" in the file names and not directly "Berlin" or "Munich".
     The aquire_lock job uses a Rhino JavaScript script to try to acquire the lock and to wait if the lock is not available. This script is listed below.
    
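As an illustration of how a file name can be mapped to a category, the following sketch assigns a lock name on the basis of the file name prefix. The patterns and the lock names `BERLIN_PROC` and `MUNICH_PROC` follow the commented-out section of the script on this page; the `categories` table and the `lockNameFor` function are hypothetical helpers and not part of the JobScheduler API. In the demo itself the regular expressions are configured in the File Order Sources and the lock name reaches the job as the `lock_name` order parameter.

```javascript
// Hypothetical table mapping a file name pattern to a lock name.
var categories = [
  { pattern: /^a[A-Za-z0-9_]*\.csv$/, lockName: "BERLIN_PROC" },
  { pattern: /^b[A-Za-z0-9_]*\.csv$/, lockName: "MUNICH_PROC" }
];

// Returns the lock name for a file path, or null if no category matches.
function lockNameFor(filePath) {
  // Accept both Windows and Unix path separators.
  var parts = filePath.split(/[\\\/]/);
  var fileName = parts[parts.length - 1];
  for (var i = 0; i < categories.length; i++) {
    if (categories[i].pattern.test(fileName)) {
      return categories[i].lockName;
    }
  }
  return null;
}
```

A file that matches no pattern yields `null`, so the caller can decide whether to reject it or to process it without a lock.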

...

  • Once aquire_lock finds the matching category it will try to set a semaphore (flag) using JobScheduler's inbuilt LOCK mechanism.
  • Only one instance of each LOCK is allowed as can be seen in the screenshot below. Once a LOCK has been assigned to a file from a category (either Berlin or Munich), all subsequent files for this category have to wait with a setback until the LOCK has been freed. TODO - ADD SCREENSHOT
  • The same mechanism applies to files from other categories. As long as no file of a given category is being processed, and the corresponding LOCK has therefore not been set, a file from that category can be processed regardless of whether files from other categories are being processed.
    No Format
    
     This can be seen in the following screenshot of JobScheduler's JOE interface showing the progression of file orders along the {{load_files}} job chain.
    

Image Added

  • Once processing has finished, JobScheduler will move the file from the in folder to either the done (on success) or failed (on error) folder.
  • After the input file has been moved to the correct target directory, the release_lock job will be called. This job removes the lock/semaphore from JobScheduler and allows the next file from the same category to be processed.
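The end of processing can be sketched in the same spirit. The `heldLocks` object and the `finishProcessing` function below are hypothetical and only model the order of the two steps (move the file first, then release the lock); they do not touch the file system or the JobScheduler API.

```javascript
// In-memory stand-in for the locks currently held (assumption: illustration only).
var heldLocks = { BERLIN_PROC: true };

// Returns the target path for a processed file and frees its lock afterwards.
function finishProcessing(fileName, lockName, success) {
  // Step 1: choose the target folder according to the processing result.
  var target = (success ? "done" : "failed") + "/" + fileName;
  // Step 2: release the lock so that the next file of the category can start.
  delete heldLocks[lockName];
  return target;
}

var moved = finishProcessing("a_001.csv", "BERLIN_PROC", true);
```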

The following screenshot from the JobScheduler's JOE interface show
Image Added

Code Block
languagejavascript
function spooler_process() \{

  try \{

     var parameters = spooler_task.order().params();
    
     var filePath = "" + String(parameters.value("scheduler_file_path"));
     spooler_log.info( " scheduler_file_path : " + filePath );
     var lockName = "" + String(parameters.value( "lock_name" ));
     spooler_log.info( "  lock_name : " +  lockName );

/*
     var fileParts = filePath.split("\\");
     var fileName  = fileParts[fileParts.length-1];
     spooler_log.info( "fileName : " + fileName );

     if(fileName.match(/^a[A-Za-z0-9_]*\.csv$/)) \{
        var lockName = "BERLIN_PROC";
        spooler_log.info( "File matched with berlin lock_name : "+ lockName  );
     \}
    
     if(fileName.match(/^b[A-Za-z0-9_]*\.csv$/)) \{
        var lockName = "MUNICH_PROC";
        spooler_log.info( "File matched with munich lock_name : "+ lockName  );
     \}
*/
     if (spooler.locks().lock_or_null( lockName )) \{
        spooler.locks().lock( lockName ).remove();
     \}
     return true;

  \} catch (e) \{

    spooler_log.warn("error occurred: " + String(e));
    return false;

  \}

\}
  • A job to force the release of a lock is also required for situations in which processing fails without the corresponding lock being released.
    No Format
     The script for this job to release the 'Berlin' lock would look something like:
    

...

  • Copy files from the 'Data/__test-files' folder to the 'in' folder:
    • JobScheduler will automatically start processing within a few seconds
    • Once processing has been completed the file(s) added to the 'in' folder will be moved to the 'done' or 'failed' folders, depending on whether processing was successful or not.
  • DO NOT attempt to start an order for the job chain. This will only cause an error in the aquire_lock job.

How does the Demo Work?

{{DiagramBoxRight

...