Scope
- API jobs make use of the JobScheduler API Interface
- Such jobs can be optimized for the requirements and capabilities of the environment in which they are operated.
Performance Goals
- Performance is about the effective use of resources. We can speed up the execution of certain processes, but would have to accept degraded performance for other processes.
- Performance optimization requires a clear goal. There is no such thing as an "overall speed improvement"; instead, performance improvements are oriented towards specific use cases.
Resource Consumption
- To start, let's define what the following terms mean:
- low number: 1 - 1000 objects
- medium number: 1000 - 4000 objects
- high number: 4000 - 10000 objects
- Then let's consider the difference between a single-instance environment and a distributed environment:
- In a single-instance environment, all job-related objects are managed and executed within the same instance (JobScheduler Master).
- In a distributed environment, all job-related objects are managed by the same instance (JobScheduler Master), but are executed on distributed instances (JobScheduler Agents) at run-time.
- As a first step, let's check which resources are consumed when operating jobs:
- Number of job related objects
- This is about the number of job-related objects, i.e. jobs, job chains and orders, that are available in the system, regardless of whether they are running or not.
- JobScheduler has to track events for jobs, e.g. when to start and stop them. Therefore, a high number of job-related objects affects performance. Some environments in common use include up to 15000 jobs and 5000 job chains.
- Number of job nodes
- This is about the number of jobs that are used in job nodes for job chains.
- Jobs can be re-used for any number of job chains.
- You could operate 1000 job chains, each using 5 job nodes with individual jobs, which results in a total of 5000 jobs.
- Alternatively, you could operate e.g. 100 different jobs organized in 1000 job chains, each using a different sequence of 5 jobs.
- The length of a job chain, i.e. the number of job nodes, is important:
- In common environments, job chains with up to 30 job nodes are used.
- You can operate a job chain with a high number of job nodes, e.g. 4000. This has a slight effect on performance, as JobScheduler has to check predecessor and successor job nodes.
- Resources used per job
- Depending on its nature, a job will consume:
- an instance of a JVM
- memory and CPU as required by your job implementation
- Total of running jobs
- This number has less impact on JobScheduler than you would expect, but it does affect available resources such as memory and CPU.
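The sizing terms and the job re-use arithmetic from this section can be tried out with a small sketch. It is an illustration only and does not use any JobScheduler API: the function names `size_category` and `total_objects` are hypothetical, orders are left out of the count, and because the ranges above share their boundary values (1000 and 4000), assigning a boundary value to the lower category is an assumption.

```python
def size_category(count: int) -> str:
    """Classify an object count using the low/medium/high ranges of this section.

    Assumption: a boundary value (1000 or 4000) belongs to the lower category.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if count <= 1000:
        return "low"
    if count <= 4000:
        return "medium"
    if count <= 10000:
        return "high"
    return "above the defined ranges"


def total_objects(job_chains: int, distinct_jobs: int) -> int:
    """Return jobs plus job chains (orders are not counted in this sketch)."""
    return job_chains + distinct_jobs


chains = 1000
nodes_per_chain = 5

# Scenario 1: every job node uses its own individual job.
individual_jobs = chains * nodes_per_chain  # 1000 * 5 = 5000 jobs

# Scenario 2: 100 different jobs are re-used across all job chains.
reused_jobs = 100

for label, jobs in (("individual jobs", individual_jobs),
                    ("re-used jobs", reused_jobs)):
    objects = total_objects(chains, jobs)
    print(f"{label}: {jobs} jobs + {chains} job chains = "
          f"{objects} objects ({size_category(objects)})")
```

Under these assumptions, re-using 100 jobs keeps the total at 1100 job-related objects instead of 6000, while the number of job chains and job nodes stays the same.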
Parallelism