Getting Started - Batch Framework
Overview
Batch Framework provides an environment for developers to write and execute batch jobs.
It offers several ways to configure batch jobs, a logging
manager that takes care of job logging, different repository implementations
to send and receive data between jobs, a management agent to manage
and monitor running batch jobs, and different controllers for
writing your batch jobs. The following diagram shows how the framework
works when the pool job controller is used.
Figure: Batch Framework control flow diagram
The rest of the sections explain each concept in the framework.
System Requirements
Batch Framework runs on any JRE from version 1.4.2 onward.
Download
Batch Framework 1.0 beta version can be downloaded here.
Download Batch_Framework_1.0_beta.zip from the available downloadable files. This distribution
contains the framework jar (batch_framework_1.0_beta.jar), all the jars required at runtime, and the source
code, which can be added to the debug path if needed.
Installation
Following are the steps to install the Batch Framework.
- Extract the downloaded Batch_Framework_1.0_beta.zip file into any directory.
- Add the batch_framework_1.0_beta.jar available in the extracted directory into the classpath.
- Add all the jars available in the lib directory into the classpath.
- Add framework-config.xml and batch-config.xml files in conf directory into the classpath.
This is the basic configuration needed to get going with the Batch Framework. Many
options are available to customize the framework and its running environment; these are
discussed in the later sections.
My First Batch Job
The best way to understand the framework is to write a first batch job. So, let's go ahead and write a batch job that
prints "Hello World". The following steps explain how to write this batch job and execute it using the
framework.
- Write the following Java code in com.mycompany.jobs.HelloWorldJob.java in your favourite IDE and compile it.
package com.mycompany.jobs;

import org.jmonks.batch.framework.controller.basic.BasicJobProcessor;
import org.jmonks.batch.framework.JobContext;
import org.jmonks.batch.framework.ErrorCode;

public class HelloWorldJob extends BasicJobProcessor
{
    public HelloWorldJob()
    {
    }

    public ErrorCode process(JobContext context)
    {
        System.out.println("Hello World");
        return ErrorCode.JOB_COMPLETED_SUCCESSFULLY;
    }

    public long getTotalRecordsCount()
    {
        return 10;
    }

    public long getProcessedRecordsCount()
    {
        return 5;
    }

    public Object getProcessorState()
    {
        return "give job info being processed.";
    }
}
- Add the following configuration to the batch-config.xml file in the classpath.
<job-config job-name="helloworld" status="active">
<job-controller job-controller-class-name="org.jmonks.batch.framework.controller.basic.BasicJobController">
<basic-job-processor basic-job-processor-class-name="com.mycompany.jobs.HelloWorldJob" thread-count="1"/>
</job-controller>
<job-logging-config>
<job-logger-config logger-name="com.mycompany.jobs" logger-level="DEBUG"/>
</job-logging-config>
</job-config>
- Run the following command from the command prompt, assuming the CLASSPATH variable holds all the jars and files mentioned in the Installation section.
java -classpath %CLASSPATH% org.jmonks.batch.framework.Main job-name=helloworld
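The command above hands the job name to the framework as a key=value argument. As a rough illustration only, the sketch below shows one way such arguments could be parsed into a map. LaunchArgsSketch and its parse method are hypothetical names invented for this example; they are not part of Batch Framework, and the framework's own Main class may handle its arguments differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Batch Framework: parses key=value arguments.
class LaunchArgsSketch {

    // Splits each "key=value" argument at the first '=' and collects the pairs in order.
    static Map<String, String> parse(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + arg);
            }
            config.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> config = parse(new String[] {"job-name=helloworld"});
        System.out.println("Job to run: " + config.get("job-name"));
    }
}
```

Splitting at the first '=' keeps any later '=' characters inside the value, which matters if a value is itself a URL or an expression.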
Job Controllers
Job controllers, as the name implies, define how a job should be written and how it will be
executed (controlled) at runtime. This is the most important component in the framework, and it is the component
developers really care about and work with.
Each job controller defines its own logic for writing a batch job and for how that
logic is executed at runtime. Along with the controller logic, controllers support the
admin applications by providing useful information for monitoring and management.
They provide a set of interfaces for developers to implement and put their business logic in,
and they provide a way to configure the batch job in the job configuration. As of now, the framework
provides two controllers, PoolJobController and BasicJobController.
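HelloWorldJob in the My First Batch Job section returns two counts, getTotalRecordsCount and getProcessedRecordsCount. One plausible use of such counts by a monitoring application is progress reporting. The sketch below is only an assumption about how that could look: ProgressSketch and percentComplete are hypothetical names, and the framework's admin applications may compute or present progress differently.

```java
// Hypothetical monitoring helper, not part of Batch Framework.
class ProgressSketch {

    // Returns the completed share as a whole percentage between 0 and 100.
    // A non-positive total is treated as "nothing to report" and yields 0.
    static int percentComplete(long processed, long total) {
        if (total <= 0) {
            return 0;
        }
        // Clamp so that an over-reported or negative count cannot leave the 0..100 range.
        long clamped = Math.max(0, Math.min(processed, total));
        return (int) (clamped * 100 / total);
    }

    public static void main(String[] args) {
        // The counts returned by HelloWorldJob: 5 processed out of 10.
        System.out.println(percentComplete(5, 10) + "% complete");
    }
}
```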
PoolJobController
This controller's logic revolves around a job pool, a job loader and one or more job processors.
Each batch job has a designated pool. A loader loads all the
data that needs to be processed into the pool, and one or more
job processors pick the data from the pool and process it. This controller provides
three interfaces for implementing the batch job: JobPool, which holds the job data; PoolJobLoader, which
loads the job data into the JobPool; and PoolJobProcessor, which picks the data from the pool
and processes it.
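To make the loader, pool and processor roles concrete, here is a minimal standalone sketch of the same pattern built only on java.util.concurrent. It is not the framework's implementation: PoolPatternSketch, the END_OF_DATA marker and the fixed capacity of 1000 are assumptions made for illustration. The loader example in this section finishes with loadJobData(null), which looks like an end-of-data signal; the sketch uses a marker object for the same purpose.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the pool pattern; not Batch Framework code.
class PoolPatternSketch {

    // Marker telling a processor that no more data will arrive.
    private static final Object END_OF_DATA = new Object();

    // Loads itemCount integers into a pool, processes them with processorCount
    // threads, and returns how many items were processed.
    static int run(int itemCount, int processorCount) {
        BlockingQueue<Object> pool = new ArrayBlockingQueue<>(1000);
        AtomicInteger processed = new AtomicInteger();

        Thread loader = new Thread(() -> {
            try {
                for (int i = 0; i < itemCount; i++) {
                    pool.put(Integer.valueOf(i));
                }
                // One marker per processor so that every processor thread stops.
                for (int m = 0; m < processorCount; m++) {
                    pool.put(END_OF_DATA);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread[] processors = new Thread[processorCount];
        for (int p = 0; p < processorCount; p++) {
            processors[p] = new Thread(() -> {
                try {
                    while (true) {
                        Object jobData = pool.take();
                        if (jobData == END_OF_DATA) {
                            break;
                        }
                        // Business logic on jobData would go here.
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        loader.start();
        for (Thread processor : processors) {
            processor.start();
        }
        try {
            loader.join();
            for (Thread processor : processors) {
                processor.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the job", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("Processed " + run(10, 3) + " items");
    }
}
```

Because the pool is a blocking queue, the loader and the processors can run at the same time: processors wait when the pool is empty, and the loader waits when it is full.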
For a quick understanding, let's take the task of writing a batch job "process-integers" that loads 10 Integer objects into the pool and processes them
using 3 job processors.
- Write a loader class IntegerJobLoader.java extending the AbstractPoolJobLoader class.
AbstractPoolJobLoader implements PoolJobLoader, which relieves developers of the burden of implementing
the management APIs.
package com.mycompany.jobs;

import org.jmonks.batch.framework.controller.pool.AbstractPoolJobLoader;
import org.jmonks.batch.framework.JobContext;
import org.jmonks.batch.framework.ErrorCode;

public class IntegerJobLoader extends AbstractPoolJobLoader
{
    public IntegerJobLoader()
    {
    }

    public ErrorCode loadPool(JobContext jobContext)
    {
        // Load the 10 Integer objects into the pool.
        for(int i=0;i<10;i++)
        {
            loadJobData(new Integer(i));
        }
        // Load a trailing null; presumably this tells the processors that no more data follows.
        loadJobData(null);
        return ErrorCode.JOB_COMPLETED_SUCCESSFULLY;
    }

    public long getTotalJobDataCount()
    {
        return 10;
    }
}
- Write a processor class IntegerJobProcessor.java extending the AbstractPoolJobProcessor class.
AbstractPoolJobProcessor implements PoolJobProcessor, which relieves developers of the burden of implementing
the management APIs.
package com.mycompany.jobs;

import org.jmonks.batch.framework.controller.pool.AbstractPoolJobProcessor;
import org.jmonks.batch.framework.JobContext;
import org.jmonks.batch.framework.ErrorCode;

public class IntegerJobProcessor extends AbstractPoolJobProcessor
{
    public IntegerJobProcessor()
    {
    }

    public void initialize(JobContext jobContext)
    {
    }

    public ErrorCode process(Object jobData)
    {
        Integer value=(Integer)jobData;
        // Perform business logic on jobData.
        System.out.println("Received Value = " + value.toString());
        return ErrorCode.JOB_COMPLETED_SUCCESSFULLY;
    }

    public void cleanup()
    {
    }
}
- Consider whether you need to write a JobPool implementation.
Typically, the framework comes with different implementations of JobPool.
You can choose an available JobPool implementation by configuring
it in the job configuration, as shown in the next step. As of now, the
framework has one JobPool implementation, CollectionJobPool, which uses Java collections as the pool.
- Add the following configuration to the job configuration file, which is batch-config.xml for our discussion.
<job-config job-name="process-integers" status="active">
    <job-controller job-controller-class-name="org.jmonks.batch.framework.controller.pool.PoolJobController">
        <pool-job-loader pool-job-loader-class-name="com.mycompany.jobs.IntegerJobLoader">
            <property key="any-config-param">any-config-value</property>
        </pool-job-loader>
        <pool-job-processor pool-job-processor-class-name="com.mycompany.jobs.IntegerJobProcessor" thread-count="3">
            <property key="any-config-param">any-config-value</property>
        </pool-job-processor>
        <job-pool job-pool-class-name="org.jmonks.batch.framework.controller.pool.CollectionJobPool">
            <property key="pool-size">1000</property>
        </job-pool>
    </job-controller>
    <job-logging-config>
        <job-logger-config logger-name="com.mycompany.jobs" logger-level="INFO"/>
    </job-logging-config>
</job-config>
- Run the following command from the command prompt, assuming the CLASSPATH variable holds all the jars and files mentioned in the Installation section.
java -classpath %CLASSPATH% org.jmonks.batch.framework.Main job-name=process-integers
Please see the org.jmonks.batch.framework.controller.pool
package for additional details on PoolJobController and its interfaces.
BasicJobController
This controller allows the developer to execute simple business logic. It provides BasicJobProcessor,
which has a process method. All you need to do is implement this method to perform the business
logic.
The My First Batch Job section uses this processor to write its business logic.
Follow the instructions in that section to write batch jobs using BasicJobController.
Please see the org.jmonks.batch.framework.controller.basic
package for additional details on BasicJobController and its interfaces.
Controlling the Logging
Logging is one of the most important activities in writing batch jobs. Log information
is very useful when debugging the failures of a batch job. Batch Framework comes with a
logging manager that takes care of all the logging for batch jobs. The following is the default logging configuration
from the framework configuration (framework-config.xml) file. The rest of this section
explores this configuration.
<framework-logging-config
framework-logging-level="DEBUG"
job-logging-directory="/batchserver/logs"
job-base-package-name="com.mycompany.batch"
job-logging-level="INFO"/>
The framework does its own logging to a "batch_framework.log" file in the directory given in
the "job-logging-directory" attribute of the configuration. Initially, it is set to
"/batchserver/logs". This can be changed to wherever you would like the logging to be done.
The default logging level for all the framework information is "DEBUG", which is
specified in the "framework-logging-level" attribute.
Along with its own logging, the framework provides a logging facility for all batch jobs.
For each batch job, it creates a directory in the framework logging directory (the directory specified
in the "job-logging-directory" attribute) when the job executes for the first time.
Each invocation of the batch job creates a separate file whose name is the job name with a timestamp appended.
The "job-base-package-name" attribute value is used to create a single logger for all batch
jobs, and its logging level defaults to "INFO", which is specified by the "job-logging-level" attribute value.
These two values can be changed according to your requirements. Please see
org.jmonks.batch.framework.LoggingManager for
additional details.
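The directory and file layout described above can be pictured with a small sketch. The exact file name format is not stated in this tutorial, so the timestamp pattern, the underscore separator and the ".log" extension below are assumptions, and JobLogPathSketch is a hypothetical class, not part of the framework.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of the per-job log layout; not Batch Framework code.
class JobLogPathSketch {

    // Assumed timestamp format; the framework may use a different one.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    // Builds <logging directory>/<job name>/<job name>_<timestamp>.log
    static String logFileFor(String loggingDirectory, String jobName, LocalDateTime startedAt) {
        return loggingDirectory + "/" + jobName + "/" + jobName + "_" + STAMP.format(startedAt) + ".log";
    }

    public static void main(String[] args) {
        System.out.println(logFileFor("/batchserver/logs", "helloworld", LocalDateTime.now()));
    }
}
```

With one directory per job and one file per invocation, the logs of a failed run can be found by job name and start time without searching a shared log file.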
Using a single logger for all jobs is not enough for everyone. There can be scenarios where the
logging level needs to be controlled for each batch job. To satisfy those requirements, the
framework provides the flexibility to control the logging level of each batch job. This is
done by declaring additional loggers in the job configuration. The following job configuration
defines its own loggers to control the logging.
<job-config job-name="helloworld" status="active">
<job-controller job-controller-class-name="org.jmonks.batch.framework.controller.basic.BasicJobController">
<basic-job-processor basic-job-processor-class-name="com.mycompany.jobs.HelloWorldJob" thread-count="1"/>
</job-controller>
<job-logging-config>
<job-logger-config logger-name="com.mycompany.jobs" logger-level="TRACE"/>
<job-logger-config logger-name="com.mycompany.jobs.hello" logger-level="ERROR"/>
</job-logging-config>
</job-config>
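The configuration above declares two loggers whose names overlap. Assuming the framework follows the usual hierarchical logger naming, where the most specific configured name wins, the effective level for a class could be resolved as in the sketch below. LoggerLevelSketch is a hypothetical class written for this tutorial; it does not show how LoggingManager is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of hierarchical logger levels; not Batch Framework code.
class LoggerLevelSketch {

    // The two loggers declared in the job configuration above.
    static Map<String, String> exampleConfig() {
        Map<String, String> levels = new HashMap<>();
        levels.put("com.mycompany.jobs", "TRACE");
        levels.put("com.mycompany.jobs.hello", "ERROR");
        return levels;
    }

    // Walks from the full logger name up through its parent names and returns
    // the first configured level, or defaultLevel if no ancestor is configured.
    static String effectiveLevel(Map<String, String> configured, String loggerName, String defaultLevel) {
        String name = loggerName;
        while (true) {
            String level = configured.get(name);
            if (level != null) {
                return level;
            }
            int lastDot = name.lastIndexOf('.');
            if (lastDot < 0) {
                return defaultLevel;
            }
            name = name.substring(0, lastDot);
        }
    }

    public static void main(String[] args) {
        Map<String, String> levels = exampleConfig();
        System.out.println(effectiveLevel(levels, "com.mycompany.jobs.HelloWorldJob", "INFO"));
        System.out.println(effectiveLevel(levels, "com.mycompany.jobs.hello.Greeter", "INFO"));
    }
}
```

Under this assumption, classes in com.mycompany.jobs log at TRACE while anything under com.mycompany.jobs.hello is limited to ERROR. The class name Greeter is made up for the example.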
Along with these facilities, the framework allows the logging level
of every job to be controlled at runtime through management and monitoring applications.
Job Configuration
Job configuration can be done either in XML files or in any database that supports JDBC
connectivity. Batch Framework provides two factories to choose from based on your needs.
The following XML snippet in the framework configuration (framework-config.xml) controls the
configuration of the jobs.
<job-config-factory-config job-config-factory-class-name="org.jmonks.batch.framework.config.xml.XMLJobConfigFactory">
<property key="job-config-file-classpath-location">batch-config.xml</property>
</job-config-factory-config>
By default, the framework is configured with XMLJobConfigFactory to read the job configuration from XML, specifically
from the batch-config.xml file available in the classpath. All of this can be controlled
in the framework configuration file.
XML Configuration
org.jmonks.batch.framework.config.xml.XMLJobConfigFactory allows us to configure batch jobs in XML files.
It can read the XML job configuration file either from the classpath or from an absolute path in the file system.
This is dictated by the property key in "job-config-factory-config". Use the key "job-config-file-classpath-location"
to have the factory read the configuration file from the classpath, and use the key "job-config-file-absolute-location" to have the factory
read the configuration file from the file system; in that case the value should be an absolute path.
The following XML snippet configures the factory with an absolute path.
<job-config-factory-config job-config-factory-class-name="org.jmonks.batch.framework.config.xml.XMLJobConfigFactory">
<property key="job-config-file-absolute-location">/batchserver/config/batch-config.xml</property>
</job-config-factory-config>
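The choice between the two property keys can be sketched as follows. This is only an illustration of the lookup idea, not the code of XMLJobConfigFactory: ConfigLocationSketch is a hypothetical class, and checking the classpath key before the absolute key when both are present is an assumption.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical illustration of locating the job configuration; not Batch Framework code.
class ConfigLocationSketch {

    static final String CLASSPATH_KEY = "job-config-file-classpath-location";
    static final String ABSOLUTE_KEY = "job-config-file-absolute-location";

    // Opens the job configuration named by whichever key is present.
    // Assumption: the classpath key is checked first if both keys are given.
    static InputStream open(Map<String, String> properties) {
        String classpathLocation = properties.get(CLASSPATH_KEY);
        if (classpathLocation != null) {
            InputStream in = ConfigLocationSketch.class.getClassLoader()
                    .getResourceAsStream(classpathLocation);
            if (in == null) {
                throw new UncheckedIOException(
                        new FileNotFoundException("Not found on classpath: " + classpathLocation));
            }
            return in;
        }
        String absoluteLocation = properties.get(ABSOLUTE_KEY);
        if (absoluteLocation != null) {
            try {
                return Files.newInputStream(Paths.get(absoluteLocation));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalArgumentException(
                "Expected " + CLASSPATH_KEY + " or " + ABSOLUTE_KEY + " to be configured");
    }
}
```

An absolute location keeps the job configuration outside the deployed jars, so it can be edited on the server without rebuilding the classpath.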
Please see
org.jmonks.batch.framework.config.xml.XMLJobConfigFactory for
additional details, and check out that package for all the configuration classes, which explain
the format of the job configuration.
Database Configuration
org.jmonks.batch.framework.config.db.DBJobConfigFactory allows us to keep the job configuration
in any database compliant with JDBC. As of now, SQL scripts for MySQL and Oracle have been
provided to create the tables for the job configuration.
MySQL
To configure the framework to use MySQL for the job configuration, follow these steps.
- Execute the run_all.sql script available in the <extracted_dir>/bin/dbscripts/mysql directory.
- Change the "job-config-factory-config" in the framework configuration as below, with values appropriate to your MySQL database settings.
<job-config-factory-config job-config-factory-class-name="org.jmonks.batch.framework.config.db.DBJobConfigFactory">
<property key="jdbc-driver-class-name">com.mysql.jdbc.Driver</property>
<property key="jdbc-url">jdbc:mysql://localhost:3306/batchserver</property>
<property key="username">root</property>
<property key="password">password</property>
</job-config-factory-config>
Oracle
To configure the framework to use Oracle for the job configuration, follow these steps.
- Execute the run_all.sql script available in the <extracted_dir>/bin/dbscripts/oracle directory.
- Change the "job-config-factory-config" in the framework configuration as below, with values appropriate to your Oracle database settings.
<job-config-factory-config job-config-factory-class-name="org.jmonks.batch.framework.config.db.DBJobConfigFactory">
<property key="jdbc-driver-class-name">oracle.jdbc.driver.OracleDriver</property>
<property key="jdbc-url">jdbc:oracle:thin:@hostname:1521:instancename</property>
<property key="username">scott</property>
<property key="password">tiger</property>
</job-config-factory-config>
Please see
org.jmonks.batch.framework.config.db.DBJobConfigFactory for
additional details, and check out that package for all the configuration classes, which explain
the format of the job configuration.
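Both database configurations use the same four property keys. Before pointing the framework at a database, it can help to check the values with plain JDBC. The sketch below is a hypothetical standalone check, not framework code: JdbcConfigSketch and its methods are invented for this example, and only the standard Class.forName and DriverManager.getConnection calls are used to connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for the JDBC properties; not Batch Framework code.
class JdbcConfigSketch {

    // The property keys used in the MySQL and Oracle examples above.
    static final String[] REQUIRED_KEYS = {
        "jdbc-driver-class-name", "jdbc-url", "username", "password"
    };

    // Returns the required keys that are absent or empty, in declaration order.
    static List<String> missingKeys(Map<String, String> properties) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = properties.get(key);
            if (value == null || value.isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    // Loads the driver class and opens a connection with the configured values.
    // The JDBC driver jar must be on the classpath for this to succeed.
    static Connection connect(Map<String, String> properties)
            throws ClassNotFoundException, SQLException {
        List<String> missing = missingKeys(properties);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Missing properties: " + missing);
        }
        Class.forName(properties.get("jdbc-driver-class-name"));
        return DriverManager.getConnection(
                properties.get("jdbc-url"),
                properties.get("username"),
                properties.get("password"));
    }
}
```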
Repository Configuration
The repository can be used to share data between batch jobs, to save job statistics,
and to save data for the management and monitoring applications. The framework comes with a default
configuration that uses DB4O (Database for Objects) as the repository, stored in the
"/batchserver/repository" directory. Multiple implementations of the repository are
available. By changing the framework configuration, we can switch to
a different repository implementation.
DB4O - Standalone Mode
By default, the framework comes with this configuration, set up as follows
in the framework configuration. If needed, the directory name can be changed to wherever it is needed.
<repository-config repository-class-name="org.jmonks.batch.framework.repository.db4o.Db4oRepository">
<property key="db4o-directory">/batchserver/repository</property>
</repository-config>
DB4O - Client Server Mode
The org.jmonks.batch.framework.repository.db4o.ClientServerDb4oRepository implementation allows
us to use a DB4O database running in client server mode. A script to create a DB4O server
is planned for a future phase.
Please see
org.jmonks.batch.framework.repository.db4o.ClientServerDb4oRepository for
additional details on how to configure this implementation in the framework configuration file.
MySQL
The org.jmonks.batch.framework.repository.jdbc.JdbcRepository implementation allows us to use any
database compliant with JDBC. To configure the framework to use MySQL for the repository, follow these steps.
- Execute the run_all.sql script available in the <extracted_dir>/bin/dbscripts/mysql directory. (If you have done this step for job configuration, skip it.)
- Change the "repository-config" in the framework configuration as below, with values appropriate to your MySQL database settings.
<repository-config repository-class-name="org.jmonks.batch.framework.repository.jdbc.JdbcRepository">
    <property key="jdbc-driver-class-name">com.mysql.jdbc.Driver</property>
    <property key="jdbc-url">jdbc:mysql://localhost:3306/batchserver</property>
    <property key="username">root</property>
    <property key="password">password</property>
</repository-config>
Oracle
The org.jmonks.batch.framework.repository.jdbc.JdbcRepository implementation allows us to use any
database compliant with JDBC. To configure the framework to use Oracle for the repository, follow these steps.
- Execute the run_all.sql script available in the <extracted_dir>/bin/dbscripts/oracle directory. (If you have done this step for job configuration, skip it.)
- Change the "repository-config" in the framework configuration as below, with values appropriate to your Oracle database settings.
<repository-config repository-class-name="org.jmonks.batch.framework.repository.jdbc.JdbcRepository">
    <property key="jdbc-driver-class-name">oracle.jdbc.driver.OracleDriver</property>
    <property key="jdbc-url">jdbc:oracle:thin:@hostname:1521:instancename</property>
    <property key="username">scott</property>
    <property key="password">tiger</property>
</repository-config>
Management and monitoring applications are going to use the repository heavily for their operations.
So it would be beneficial to use a repository that works in client server mode, to make it easy
to run these applications.
JMX Connector Configuration
JobManagementAgent uses the JobConnectorHelper configuration in the framework configuration file
to create the JMX connector server and register its information in the lookup location. The framework configuration
file comes with the connector helper class org.jmonks.batch.framework.management.jmxmp.RepositoryJMXMPConnectorHelper.
JMXMP - Repository
org.jmonks.batch.framework.management.jmxmp.RepositoryJMXMPConnectorHelper uses the JMXMP connector to create the
connector server and uses the repository to save the connector server information. The framework uses this connector
helper by default. Here is the configuration.
<job-connector-config job-connector-helper-class-name="org.jmonks.batch.framework.management.jmxmp.RepositoryJMXMPConnectorHelper"/>
Feedback
I am trying my best to keep this tutorial understandable and up to date with the releases.
If you see any mistakes or inconsistencies in the examples, please send me a message
(if you are a
member of SourceForge)
or an email.