Problem
Within a job chain, file_order_source starts an order when the file is created, not when the file is ready.
In some file transfer scenarios the receiver of a file has no knowledge of when the sender creates the file. With a large file, the receiver may try to read the file before the sender has finished writing it. If the receiver uses the file at that moment, it gets an incomplete and therefore corrupted file.
Solutions
Using file_order_source
There are three ways to use file_order_source:
- The sender creates a file named abc.txt~. After the transfer is completed, the sender renames the file to abc.txt. You would use a regular expression such as ^.*\.txt$ to check for the presence of files.
- The sender creates a file named abc.txt. When it is ready, the sender creates a second, 0-byte file named abc.txt.trigger. Here, you would use a regular expression such as ^.*\.txt\.trigger$. Note that this approach has the disadvantage that scheduler_file_path contains the name of the trigger file, not the name of the file that should be processed.
- Set up a job chain in which the first node checks the file size, and carry out a setback while the file size is still changing. This can be done with the JobSchedulerExistsFile job.
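The two naming conventions above can be tried out with a short Python sketch. It is only an illustration: it uses Python's re module rather than the regular expression engine of file_order_source, the trigger pattern escapes both dots, and the helper data_file_for_trigger is a hypothetical name that is not part of JobScheduler.

```python
import re

# Patterns from the text above. The trigger pattern escapes both dots
# so that "." is matched literally (an adjustment made for this sketch).
DATA_PATTERN = re.compile(r"^.*\.txt$")
TRIGGER_PATTERN = re.compile(r"^.*\.txt\.trigger$")


def data_file_for_trigger(trigger_path: str) -> str:
    """Hypothetical helper: map a trigger file name to its data file name."""
    suffix = ".trigger"
    if not trigger_path.endswith(suffix):
        raise ValueError(f"not a trigger file: {trigger_path}")
    return trigger_path[: -len(suffix)]


# File names as they might appear in the monitored directory.
names = ["abc.txt~", "abc.txt", "abc.txt.trigger"]

# Rename convention: only the renamed, complete file matches.
ready = [n for n in names if DATA_PATTERN.match(n)]

# Trigger convention: only the trigger file matches; the job then has to
# derive the data file name from the value of scheduler_file_path.
triggers = [n for n in names if TRIGGER_PATTERN.match(n)]
data_files = [data_file_for_trigger(t) for t in triggers]
```

With the trigger convention, a job that receives the trigger file name would need a mapping like data_file_for_trigger to find the file it should actually process.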
See also: Directory Monitoring with File Orders
Using the JobSchedulerExistsFile job
- This job has the advantage over file_order_source solutions that it allows the use of parameters, for example for the name of the target directory, and it allows you to configure the polling rate.
- The JobSchedulerExistsFile job also checks whether the file size is constant, i.e. whether the file is still being written, and only proceeds once the file size is no longer changing.
- The JobSchedulerExistsFile job has three parameters to manage the steady-state check:
  - check_steady_state_of_files: If true, the job checks the steady state.
  - check_steady_state_interval: Interval in seconds between two checks.
  - steady_state_count: If set, this is the maximum number of intervals. If the maximum is reached, the task is terminated with an error.
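The idea behind a steady-state check can be sketched in Python as follows. This is a minimal illustration of the principle, not the implementation of JobSchedulerExistsFile: the function name wait_for_steady_state and its arguments are assumptions, where interval loosely corresponds to check_steady_state_interval and max_checks to steady_state_count.

```python
import os
import time


def wait_for_steady_state(path, interval=1.0, max_checks=30,
                          sleep=time.sleep, getsize=os.path.getsize):
    """Return the file size once it is unchanged between two checks.

    Raises TimeoutError if the size is still changing after max_checks
    intervals. sleep and getsize are injectable so that the logic can be
    exercised without real files or real waiting.
    """
    last_size = getsize(path)
    for _ in range(max_checks):
        sleep(interval)
        size = getsize(path)
        if size == last_size:
            # Two consecutive checks saw the same size: treat as complete.
            return size
        last_size = size
    raise TimeoutError(
        f"{path} was still changing after {max_checks} checks"
    )
```

A constant size across one interval is only a heuristic: a sender that pauses for longer than the interval would still look finished, so the interval should be chosen with the slowest expected transfer in mind.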
See also: Job JobSchedulerExistsFile
Related Downloads
You can download example files covering both the file_order_source and the JobSchedulerExistsFile job solutions.