Description of the JobSchedulerExistsFile Job - check whether a file exists
Checks for the existence of a file, a directory, or specific files inside a directory.
The polling shown in the graphic above is provided by a file order source setting in the job chain.
Example Job to Create a Result Set.
This job creates a result set. A result set contains the names of all files that match the filter criteria. The content of the result set is returned as a parameter, but it can also be written to a file.
The following parameters are useful for creating a result set:
Parameter Name | Description |
---|---|
raise_error_if_result_set_is | Raise error on expected size of result-set |
result_list_file | Name of the result-list file |
expected_size_of_result_set | Number of expected hits in result-list |
on_empty_result_set | Set next node on empty result set |
scheduler_sosfileoperations_resultset | The result of the operation as a list of items |
scheduler_sosfileoperations_resultsetsize | The amount of hits in the result set of the operation |
An example of a job XML file:
<job order="no">
  <params>
    <param name="file" value="." />
    <param name="file_spec" value="" />
    <param name="gracious" value="false" />
    <param name="max_file_age" value="0" />
    <param name="min_file_age" value="0" />
    <param name="max_file_size" value="-1" />
    <param name="min_file_size" value="-1" />
    <param name="skip_first_files" value="0" />
    <param name="skip_last_files" value="0" />
    <param name="count_files" value="false" />
    <param name="create_order" value="false" />
    <param name="create_orders_for_all_files" value="false" />
    <param name="order_jobchain_name" value="" />
    <param name="next_state" value="" />
    <param name="on_empty_result_set" value="empty" />
    <param name="expected_size_of_result_set" value="0" />
    <param name="raise_error_if_result_set_is" value="0" />
    <param name="result_list_file" value="empty" />
  </params>
  <script language="java" java_class="sos.scheduler.file.JobSchedulerExistsFile" />
</job>
This job can be used standalone (as a single job) or as an order-driven job, i.e. as a node in a job chain. Accordingly, parameters are accepted either as job parameters or as order parameters.
A job can process multiple parameters that are analyzed when the job starts. Parameters are defined in the configuration of the job or of the order, and can also be submitted via API methods. Parameters are optional or mandatory and may have default values.
Parameter Definitions
Parameters Used by JobSchedulerExistsFile
Name | Description | Mandatory | Default |
---|---|---|---|
file | File or Folder to watch for | true | . |
file_spec | Regular Expression for filename filtering | false | |
gracious | Specify error message tolerance | false | false |
max_file_age | Maximum age of a file | false | 0 |
min_file_age | Minimum age of a file | false | 0 |
max_file_size | Maximum size of a file | false | -1 |
min_file_size | Minimum size of one or multiple files | false | -1 |
skip_first_files | Number of files to remove from the top of the result-set | false | 0 |
skip_last_files | Number of files to remove from the bottom of the result-set | false | 0 |
count_files | Return the size of resultset | false | false |
create_order | Activate file-order creation | false | false |
create_orders_for_all_files | Create a file-order for every file in the result-list | false | false |
create_orders_for_new_files | Create a file-order for every new file in the result-list | false | false |
param_name_file_path | The name of the parameter that contains the name of the file to be transferred | false | --- |
order_jobchain_name | The name of the jobchain which belongs to the order | false | |
next_state | The first node to execute in a jobchain | false | |
merge_order_parameter | Merge actual order parameter into new created order | false | false |
on_empty_result_set | Set next node on empty result set | false | empty |
expected_size_of_result_set | Number of expected hits in result-list | false | 0 |
raise_error_if_result_set_is | Raise error on expected size of result-set | false | 0 |
result_list_file | Name of the result-list file | false | empty |
check_steady_state_of_files | Check the completeness of a file (steady state) | false | false |
steady_state_count | Maximum Number of Checkpoints | false | 30 |
check_steady_state_interval | Temporal distance between checkpoints | false | 1 |
Parameter file: File or Folder to watch for
The file or directory to be checked.
Supports masks for substitution in file and directory names, using format strings enclosed in [ and ]. The following format strings are supported:
[date: date format ]: date format must be a valid Java date format string, e.g. yyyyMMddHHmmss, yyyy-MM-dd.HHmmss etc.
An example:
<param name="file" value="sample/hello[date:yyyyMMdd].txt" />
On 2050-12-31 the parameter file contains the value "sample/hello20501231.txt".
This parameter also supports substituting a job parameter name with its value if the name is enclosed in % and %.
An example: <param name="file" value="%scheduler_file_path%" />
At job runtime, the parameter file contains the value of the job parameter scheduler_file_path. When Directory Monitoring with File Orders is used, the job parameter scheduler_file_path automatically contains the path of the file that triggered the order.
Data-Type : SOSOptionString
The default value for this parameter is ".".
This parameter is mandatory.
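To make the two substitution rules concrete, here is a minimal sketch that resolves `%name%` references and `[date:format]` masks in a value of the file parameter. The class `FileMaskSketch` and its method are hypothetical, not the JITL implementation; the order of the two steps and leaving unknown `%name%` references untouched are assumptions. Date formatting uses `java.text.SimpleDateFormat`, matching the "valid Java date format string" requirement.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the substitution rules for the "file" parameter; not the JITL implementation.
class FileMaskSketch {
    // Matches "[date:<format>]", tolerating blanks around the format string.
    private static final Pattern DATE_MASK = Pattern.compile("\\[date:\\s*([^\\]]+?)\\s*\\]");
    // Matches "%<parameter name>%".
    private static final Pattern PARAM_REF = Pattern.compile("%([^%]+)%");

    static String resolve(String value, Date now, Map<String, String> params) {
        // Step 1 (assumed order): replace %name% with the job parameter value; unknown names stay unchanged.
        Matcher p = PARAM_REF.matcher(value);
        StringBuffer withParams = new StringBuffer();
        while (p.find()) {
            String replacement = params.getOrDefault(p.group(1), p.group(0));
            p.appendReplacement(withParams, Matcher.quoteReplacement(replacement));
        }
        p.appendTail(withParams);

        // Step 2: replace [date:format] with 'now' formatted by java.text.SimpleDateFormat.
        Matcher d = DATE_MASK.matcher(withParams);
        StringBuffer result = new StringBuffer();
        while (d.find()) {
            String formatted = new SimpleDateFormat(d.group(1)).format(now);
            d.appendReplacement(result, Matcher.quoteReplacement(formatted));
        }
        d.appendTail(result);
        return result.toString();
    }
}
```

Called with the value `sample/hello[date:yyyyMMdd].txt` and a date of 2050-12-31, the sketch yields `sample/hello20501231.txt`, as in the example above.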
Parameter file_spec: Regular Expression for filename filtering
Regular expression for file name filtering. Matching is case-insensitive.
Only effective if the parameter file is a directory.
Some remarks on regular expressions as used in JobScheduler:
- A regular expression is not a wildcard. To get an impression of the differences, consider the wildcard *.txt, which selects all file names with the extension ".txt". A regular expression that works the same way must look like "^.*\.txt$". That looks a little strange, but regular expressions are much more flexible and powerful than wildcards when filtering more complex file names or patterns.
- The general syntax of a regular expression, also referred to as regex or regexp, is described here. It differs from other regular expression dialects, e.g. that of Perl.
Data-Type : SOSOptionRegExp
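The wildcard-versus-regex remark can be checked with a short sketch based on `java.util.regex.Pattern` with the `CASE_INSENSITIVE` flag. Whether the job tests names with `find()` or with `matches()` is an assumption here; for a fully anchored expression such as `^.*\.txt$` both give the same result. The class name `FileSpecSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: select file names with a case-insensitive regular expression.
class FileSpecSketch {
    static List<String> filter(List<String> fileNames, String fileSpec) {
        Pattern pattern = Pattern.compile(fileSpec, Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (String name : fileNames) {
            // find() searches anywhere in the name; anchors such as ^ and $ restrict the match.
            if (pattern.matcher(name).find()) {
                hits.add(name);
            }
        }
        return hits;
    }
}
```

With the names `a.txt`, `B.TXT`, `a.txt.bak` and `readme`, the expression `^.*\.txt$` selects `a.txt` and `B.TXT`, while the unanchored `\.txt` would also select `a.txt.bak`.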
Parameter gracious: Specify error message tolerance
Enables or disables error messages caused by an empty result set, i.e. when the operation executed by the job selects no files. This parameter can therefore control the sequence of nodes or states in a job chain.
Valid values: false, 0, off, no, n, nein, none; true, 1, on, yes, y, ja, j; and all.
The following rules apply when the result set is empty:
GRACIOUS | Standalone Job | Order Job |
---|---|---|
false, 0, off, no, n, nein, none | error log, Task error | error log, set_state error |
true, 1, on, yes, y, ja, j | no error log, Task success | no error log, set_state error |
all | no error log, Task success | no error log, set_state success |
For example, the setting "gracious=all" suppresses all errors caused by an empty result set and ends the job (both standalone and inside a job chain) as if no error had occurred.
Data-Type : SOSOptionGracious
The default value for this parameter is false.
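The gracious table maps three groups of values onto three outcomes for an empty result set. The following sketch encodes that table directly; the class `GraciousSketch` is hypothetical, and accepting values case-insensitively is an assumption not stated in the source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical encoding of the "gracious" table for an empty result set; not the JITL implementation.
class GraciousSketch {
    enum Outcome { ERROR, SUCCESS }

    private static final List<String> FALSE_VALUES = Arrays.asList("false", "0", "off", "no", "n", "nein", "none");
    private static final List<String> TRUE_VALUES = Arrays.asList("true", "1", "on", "yes", "y", "ja", "j");

    // 0 = false-like, 1 = true-like, 2 = "all"; case-insensitive matching is an assumption.
    private static int level(String gracious) {
        String v = gracious.trim().toLowerCase(Locale.ROOT);
        if (FALSE_VALUES.contains(v)) return 0;
        if (TRUE_VALUES.contains(v)) return 1;
        if (v.equals("all")) return 2;
        throw new IllegalArgumentException("invalid value for gracious: " + gracious);
    }

    /** Whether an empty result set is written to the error log. */
    static boolean logsError(String gracious) {
        return level(gracious) == 0;
    }

    /** Task result of a standalone job when the result set is empty. */
    static Outcome standaloneTask(String gracious) {
        return level(gracious) == 0 ? Outcome.ERROR : Outcome.SUCCESS;
    }

    /** State set for the order when the result set is empty. */
    static Outcome orderState(String gracious) {
        return level(gracious) == 2 ? Outcome.SUCCESS : Outcome.ERROR;
    }
}
```

Note the asymmetry the table implies: a true-like value makes a standalone task succeed, but an order still goes to the error state; only "all" lets an order continue as successful.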
Parameter max_file_age: Maximum age of a file
Specifies the maximum age of a file. If a file is older, it is deemed not to exist and is not included in the result list.
Data-Type : SOSOptionTime
The default value for this parameter is 0.
Parameter min_file_age: Minimum age of a file
Specifies the minimum age of a file. If a file is newer, it is classified as non-existing and is not included in the result list.
Data-Type : SOSOptionTime
The default value for this parameter is 0.
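Taken together, max_file_age and min_file_age define an age window. The sketch below checks a single file against that window using `java.time`. It is hypothetical: it treats a zero age as "no limit" (an assumption suggested by the default 0) and works with `Duration` values instead of parsing the SOSOptionTime format, which is not described here.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical age filter; treating a zero age as "no limit" is an assumption.
class FileAgeSketch {
    static boolean passes(Instant lastModified, Instant now, Duration minAge, Duration maxAge) {
        Duration age = Duration.between(lastModified, now);
        if (!maxAge.isZero() && age.compareTo(maxAge) > 0) {
            return false; // older than max_file_age: treated as non-existing
        }
        if (!minAge.isZero() && age.compareTo(minAge) < 0) {
            return false; // newer than min_file_age: treated as non-existing
        }
        return true;
    }
}
```

A file modified ten minutes ago fails a five-minute max_file_age, but passes a five-minute min_file_age.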
Parameter max_file_size: Maximum size of a file
Specifies the maximum size of a file in bytes: if the size of a file exceeds this value, the file is classified as non-existing.
Valid values for the file size are:
Value | Description |
---|---|
-1 | The value of the parameter has no effect and the parameter is not part of the filter. |
number | A number stands for the size in bytes, e.g. 40 means 40 bytes. |
numberKB | A number followed by "KB" stands for the size in kilobytes. |
numberMB | A number followed by "MB" stands for the size in megabytes. |
numberGB | A number followed by "GB" stands for the size in gigabytes. |
Data-Type : SOSOptionFileSize
The default value for this parameter is -1.
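The value formats in the table, together with max_file_size and the min_file_size parameter described next, can be illustrated by a small parser and bound check. This is a hypothetical sketch: the source does not say whether KB, MB and GB are multiples of 1024 or 1000, so the binary factor used here is an assumption, as is case-insensitive parsing of the unit suffix.

```java
import java.util.Locale;

// Hypothetical parser for file size values; KB/MB/GB as multiples of 1024 is an assumption.
class FileSizeSketch {
    static final long NO_LIMIT = -1L;

    static long parse(String value) {
        String v = value.trim().toUpperCase(Locale.ROOT);
        long factor = 1L;
        if (v.endsWith("KB")) {
            factor = 1024L;
        } else if (v.endsWith("MB")) {
            factor = 1024L * 1024L;
        } else if (v.endsWith("GB")) {
            factor = 1024L * 1024L * 1024L;
        }
        if (factor > 1L) {
            v = v.substring(0, v.length() - 2).trim(); // strip the unit suffix
        }
        long number = Long.parseLong(v);
        return number == -1L ? NO_LIMIT : number * factor;
    }

    /** Applies min_file_size and max_file_size; NO_LIMIT disables a bound. */
    static boolean passes(long sizeInBytes, long minSize, long maxSize) {
        if (maxSize != NO_LIMIT && sizeInBytes > maxSize) return false;
        if (minSize != NO_LIMIT && sizeInBytes < minSize) return false;
        return true;
    }
}
```

For example, `parse("2KB")` returns 2048 under this assumption, and a 100-byte file is excluded when max_file_size is "40".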
Parameter min_file_size: Minimum size of one or multiple files
Specifies the minimum size of one or multiple files in bytes: if the size of a file falls below this value, the file is not included in the result list of the operation.
Valid values for the file size are:
Value | Description |
---|---|
-1 | The value of the parameter has no effect and the parameter is not part of the filter. |
number | A number stands for the size in bytes, e.g. 40 means 40 bytes. |
numberKB | A number followed by "KB" stands for the size in kilobytes. |
numberMB | A number followed by "MB" stands for the size in megabytes. |
numberGB | A number followed by "GB" stands for the size in gigabytes. |
Data-Type : SOSOptionFileSize
The default value for this parameter is -1.
Parameter skip_first_files: Number of files to remove from the top of the result-set
The specified number of files is removed from the beginning of the result set produced by min_file_size, min_file_age etc. These files are excluded from further operations.
The result set is sorted according to the filter parameters used:
- min_file_age, max_file_age: in ascending order by date of last modification, the newest file first.
- min_file_size, max_file_size: in ascending order by file size, the smallest file first.
- If parameters for both file age and file size are given, the result set is sorted by file age.
Only one of skip_first_files and skip_last_files may be set at a time.
Data-Type : SOSOptionInteger
The default value for this parameter is 0.
Parameter skip_last_files: Number of files to remove from the bottom of the result-set
The specified number of files is removed from the end of the result set produced by min_file_size, min_file_age etc. These files are excluded from further operations.
The result set is sorted according to the constraining parameters used:
- min_file_age, max_file_age: in ascending order by date of last modification, the newest file first.
- min_file_size, max_file_size: in ascending order by file size, the smallest file first.
If parameters for both file age and file size are given, the set is sorted by file age.
Only one of skip_first_files and skip_last_files may be set at a time.
Data-Type : SOSOptionInteger
The default value for this parameter is 0.
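The interplay of sorting and skipping for skip_first_files and skip_last_files is sketched below. The class `SkipFilesSketch` and its `Entry` type are hypothetical. The text describes the age ordering both as "ascending" and as "newest file first"; the sketch follows "newest file first", which is an assumption. Setting both parameters is rejected, mirroring the documented restriction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of skip_first_files / skip_last_files; not the JITL implementation.
class SkipFilesSketch {
    static final class Entry {
        final String name;
        final long lastModified; // epoch millis
        final long size;         // bytes

        Entry(String name, long lastModified, long size) {
            this.name = name;
            this.lastModified = lastModified;
            this.size = size;
        }
    }

    static List<String> apply(List<Entry> files, boolean ageFilterUsed, int skipFirst, int skipLast) {
        if (skipFirst > 0 && skipLast > 0) {
            throw new IllegalArgumentException("set only one of skip_first_files and skip_last_files");
        }
        List<Entry> sorted = new ArrayList<>(files);
        // Age filter used: newest file first (assumed). Size filter only: smallest file first.
        Comparator<Entry> order = ageFilterUsed
                ? Comparator.comparingLong((Entry e) -> e.lastModified).reversed()
                : Comparator.comparingLong((Entry e) -> e.size);
        sorted.sort(order);

        // Cut skipFirst entries from the top or skipLast entries from the bottom.
        int from = Math.min(skipFirst, sorted.size());
        int to = Math.max(from, sorted.size() - skipLast);
        List<String> names = new ArrayList<>();
        for (Entry e : sorted.subList(from, to)) {
            names.add(e.name);
        }
        return names;
    }
}
```

With files a (oldest, smallest), c and b (newest, largest), skipping the first file under the age ordering leaves c and a; under the size ordering it leaves c and b.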
Parameter count_files: Return the size of resultset
If this parameter is set to "true", the number of matches is returned in the order parameter "scheduler_SOSFileOperations_file_count".
Valid values: true, 1, on, yes, y, ja, j and false, 0, off, no, n, nein
This parameter is available for order-driven jobs only; jobs in a job chain, for example, are order-driven. In standalone jobs this parameter is ignored without further notice.
Data-Type : SOSOptionBoolean
The default value for this parameter is false.
Parameter create_order: Activate file-order creation
This parameter specifies that a file order is to be created and launched for all file names in the result list, or for the first file only (see create_orders_for_all_files).
Valid values: true, 1, on, yes, y, ja, j and false, 0, off, no, n, nein
Data-Type : SOSOptionBoolean
The default value for this parameter is false.
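Use together with parameters:
- create_orders_for_all_files - Create a file-order for every file in the result-list
- order_jobchain_name - The name of the jobchain which belongs to the order
- next_state - The first node to execute in a jobchain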
Parameter create_orders_for_all_files: Create a file-order for every file in the result-list
Valid values: true, 1, on, yes, y, ja, j and false, 0, off, no, n, nein
Data-Type : SOSOptionBoolean
The default value for this parameter is false.
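Use together with parameters:
- create_order - Activate file-order creation
- order_jobchain_name - The name of the jobchain which belongs to the order
- next_state - The first node to execute in a jobchain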
Parameter create_orders_for_new_files: Create a file-order for every new file in the result-list
If this parameter is set to "true", a file order is created and started for each new file in the result set.
This parameter is in effect only if the create_orders parameter is not set or has the value "true".
Example: create a file order
create_orders_for_new_files=true
Valid values: true, 1, on, yes, y, ja, j and false, 0, off, no, n, nein.
DataType: SOSOptionBoolean
Default: false
Parameter param_name_file_path: The name of the parameter containing the name of the file to be transferred
This parameter sets the name of the parameter that contains the name of the transferred file. The default value is scheduler_file_path. Change the name from the default if the created file orders should not be handled by a file sink.
DataType: SOSOptionString
Default: ---
Parameter order_jobchain_name: The name of the jobchain which belongs to the order
The value of this parameter is the name of the job chain to be launched by the order.
Note that the name must include the subfolder path if the job chain is not located directly in the folder "live". For example, if the job chain "Test" is located in "live/sample/FileOperations/", the value to specify is "/sample/FileOperations/Test".
Data-Type : SOSOptionString
Use together with parameters:
- create_order - Activate file-order creation
- next_state - The first node to execute in a jobchain
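The rule for order_jobchain_name (prefix the chain name with its subfolder path below "live") can be expressed with `java.nio.file.Path.relativize`. The helper `JobChainNameSketch` is hypothetical; for a chain located directly in "live" it returns "/Test", and whether JobScheduler expects that leading slash in this case is an assumption.

```java
import java.nio.file.Path;

// Hypothetical helper: derive an order_jobchain_name value from the chain's folder below "live".
class JobChainNameSketch {
    static String fromLiveFolder(Path liveFolder, Path chainFolder, String chainName) {
        Path relative = liveFolder.relativize(chainFolder);
        StringBuilder name = new StringBuilder();
        for (Path part : relative) {
            // An empty relative path yields a single empty element; skip it.
            if (!part.toString().isEmpty()) {
                name.append('/').append(part.toString());
            }
        }
        return name.append('/').append(chainName).toString();
    }
}
```

For the documented example, the folder "live/sample/FileOperations" and the chain "Test" give "/sample/FileOperations/Test".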
Parameter next_state: The first node to execute in a jobchain
The value of this parameter is the name of the job chain node at which execution of the chain starts.
Data-Type : SOSOptionJobChainNode
Use together with parameters:
- create_order - Activate file-order creation
- order_jobchain_name - The name of the jobchain which belongs to the order
Parameter merge_order_parameter: Merge actual order parameter into new created order
This parameter specifies that the order to be created is extended by the parameters of the current order.
DataType: SOSOptionBoolean
Default: false
Parameter on_empty_result_set: Set next node on empty result set
This parameter sets the next node (step, job) to execute in a job chain. Its value must be a valid node name of the current job chain. If the result set is empty, e.g. because no matching files exist, the current job ends without errors and the job chain continues with the node given as the value of this parameter.
Data-Type : SOSOptionJobChainNode
The default value for this parameter is empty.
Parameter expected_size_of_result_set: Number of expected hits in result list
Data-Type : SOSOptionInteger
The default value for this parameter is 0.
Use together with parameter:
- raise_error_if_result_set_is - raise error on expected size of result-set
Parameter raise_error_if_result_set_is: Raise error on expected size of result set
This parameter raises an error if the number of hits in the result list satisfies the relation given as its value, compared with expected_size_of_result_set.
An example:
Assume that "raise_error_if_result_set_is=ne" and "expected_size_of_result_set=1" are specified. If the number of hits is not equal to 1, an error is raised.
Data-Type : SOSOptionRelOp
The default value for this parameter is 0.
Use together with parameter:
- expected_size_of_result_set - number of expected hits in result-list
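The check combines the relational operator with expected_size_of_result_set. The sketch below evaluates it for a given number of hits. Only "ne" appears in this documentation; the other operator names (eq, lt, le, gt, ge) are assumptions about what SOSOptionRelOp accepts, and the documented default value 0 is not modeled. The class `RelOpSketch` is hypothetical.

```java
import java.util.Locale;

// Hypothetical evaluation of raise_error_if_result_set_is; operator names other than "ne" are assumed.
class RelOpSketch {
    /** Returns true if "hits <relOp> expectedSize" holds, i.e. the job should raise an error. */
    static boolean raiseError(String relOp, int hits, int expectedSize) {
        switch (relOp.trim().toLowerCase(Locale.ROOT)) {
            case "eq": return hits == expectedSize;
            case "ne": return hits != expectedSize;
            case "lt": return hits < expectedSize;
            case "le": return hits <= expectedSize;
            case "gt": return hits > expectedSize;
            case "ge": return hits >= expectedSize;
            default: throw new IllegalArgumentException("unsupported operator: " + relOp);
        }
    }
}
```

For the documented example (ne, expected size 1), two hits raise an error and exactly one hit does not.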
Parameter result_list_file: Name of the result list file
If the value of this parameter specifies a valid file name, the result list is written to this file.
Data-Type : SOSOptionFileName
The default value for this parameter is empty.
Parameter check_steady_state_of_files: Check the completeness of a file (steady state)
In some file transfer scenarios the receiver of a file does not know when the sender creates the file. With a (very) large file, the receiver may try to read the file before the sender has finished writing it. If the receiver fetches the file while the sender is still writing, it gets a corrupted, incomplete file.
If this parameter is set to "true", the receiver checks the file for completeness before starting the transfer.
However, this is not a very reliable approach, because the receiver only checks the date of last modification and the size of the file. If neither changes within a time interval defined by the parameters ..., the file is assumed to be complete. If the sender terminates without writing the complete file, the network goes down, or the file is written slowly, the receiver will get a corrupted file.
A better way to avoid corrupt files is the atomic method: write the file and rename it once writing is complete. For details about this method, see the parameters atomic_suffix and atomic_prefix.
If more than one file is to be transferred, the transactional approach is the first choice. See the parameter transactional.
DataType: SOSOptionBoolean
Default: false
Parameter steady_state_count: Maximum Number of Checkpoints
The value of this option specifies the number of retries for checking the steady state of a file.
DataType: SOSOptionInteger
Default: 30
Parameter check_steady_state_interval: Temporal distance between checkpoints
The value of this option defines the temporal distance in seconds between two checkpoints.
DataType: SOSOptionTime
Alias: Steady_State_Interval
Default: 1
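The steady-state check described for check_steady_state_of_files, steady_state_count and check_steady_state_interval can be sketched with `java.nio.file.Files.size` and `Files.getLastModifiedTime`. The class `SteadyStateSketch` is hypothetical. Treating two consecutive identical checkpoints as "steady" is an assumption, and the interval is taken in milliseconds for easy testing, whereas the parameter is documented in seconds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical steady-state check; not the JITL implementation.
class SteadyStateSketch {
    /**
     * Polls size and last-modified time until two consecutive checkpoints agree
     * (assumed criterion) or maxChecks checkpoints have been taken.
     */
    static boolean isSteady(Path file, int maxChecks, long intervalMillis)
            throws IOException, InterruptedException {
        long lastSize = Files.size(file);
        FileTime lastTime = Files.getLastModifiedTime(file);
        for (int i = 0; i < maxChecks; i++) {
            Thread.sleep(intervalMillis); // interval in ms here; the job parameter is in seconds
            long size = Files.size(file);
            FileTime time = Files.getLastModifiedTime(file);
            if (size == lastSize && time.equals(lastTime)) {
                return true; // unchanged between two checkpoints: assumed complete
            }
            lastSize = size;
            lastTime = time;
        }
        return false; // still changing after maxChecks checkpoints
    }
}
```

As the text above warns, a file whose writer stalls long enough also looks "steady" to this kind of check, which is why the atomic or transactional approach is preferable.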
Return Parameters from JobSchedulerExistsFile
The order parameters described below are returned by the job JobSchedulerExistsFile to the JobScheduler.
Name | Title | Mandatory | Default |
---|---|---|---|
scheduler_file_path | File to process for a file-order | false | empty |
scheduler_file_parent | Pathname of the file to process for a file-order | false | empty |
scheduler_file_name | Name of the file to process for a file-order | false | empty |
scheduler_sosfileoperations_resultset | The result of the operation as a list of items | false | empty |
scheduler_sosfileoperations_resultsetsize | The amount of hits in the result set of the operation | false | empty |
scheduler_sosfileoperations_file_count | Return the size of the result set after a file operation | false | 0 |
Parameter scheduler_file_path: File to process for a file-order
When Directory Monitoring with File Orders is used, the job parameter scheduler_file_path automatically contains the path of the file that triggered the order.
Data-Type : SOSOptionFileName
The default value for this parameter is empty.
Parameter scheduler_file_parent: Pathname of the file to process for a file-order
Data-Type : SOSOptionFileName
The default value for this parameter is empty.
Parameter scheduler_file_name: Name of the file to process for a file-order
Data-Type : SOSOptionFileName
The default value for this parameter is empty.
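The titles suggest that scheduler_file_parent and scheduler_file_name are the directory part and the name part of scheduler_file_path. The source does not state this explicitly, so the following `java.nio.file` sketch is an assumption; the class `FileOrderParamsSketch` is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical: assumes parent and name are derived from scheduler_file_path (not confirmed by the source).
class FileOrderParamsSketch {
    /** Directory part of the path, or "" if the path has no parent. */
    static String parentOf(String schedulerFilePath) {
        Path parent = Paths.get(schedulerFilePath).getParent();
        return parent == null ? "" : parent.toString();
    }

    /** File name part of the path, or "" if there is none (e.g. a root path). */
    static String nameOf(String schedulerFilePath) {
        Path name = Paths.get(schedulerFilePath).getFileName();
        return name == null ? "" : name.toString();
    }
}
```

For "in/data/report.csv" this gives "report.csv" as the name and "in/data" (with the platform's separator) as the parent.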
Parameter scheduler_sosfileoperations_resultset: The result of the operation as a list of items
Data-Type : SOSOptionString
The default value for this parameter is empty.
Parameter scheduler_sosfileoperations_resultsetsize: The amount of hits in the result set of the operation
Data-Type : SOSOptionInteger
The default value for this parameter is empty.
Parameter scheduler_sosfileoperations_file_count: Return the size of the result set after a file operation
Data-Type : SOSOptionInteger
The default value for this parameter is 0.