Batch architecture

How do you run batch processes in an open architecture?

The following is a description of the main components of the IBM batch architecture; it is important to understand the capabilities of each in order to replicate them on an open, container-based architecture.

  • JCL
  • JES
  • Application programs (COBOL, PL/I, etc.)
  • Data (files and databases).

JCL

We can think of a JCL as a distant ancestor of a DAG (Directed Acyclic Graph): it is a set of statements, inherited from punched-card technology, that defines the process and the sequence of steps to be executed.

In the JCL we find the basic characteristics of the process or job (name, type, priority, allocated resources, etc.), the sequence of programs to be executed, the sources of input information, and what to do with the output data of the process.

The main statements found in a JCL are the following:

  • A JOB card, where the name of the process and its characteristics are defined.
  • One or more EXEC cards with each program to be executed.
  • One or more DD cards defining the files (data sets) used by the previous programs.
//JOB1    JOB (123),CLASS=C,MSGCLASS=S,MSGLEVEL=(1,1),NOTIFY=&SYSUID
//*
//STEP01   EXEC PGM=PROGRAM1
//INPUT1   DD   DSN=DEV.APPL1.SAMPLE,DISP=SHR
//OUTPUT1  DD   DSN=DEV.APPL1.CUOTA,
//              DISP=(NEW,CATLG,DELETE),VOLUME=SER=SHARED,
//              SPACE=(CYL,(1,1),RLSE),UNIT=SYSDA,
//              DCB=(RECFM=FB,LRECL=80,BLKSIZE=800)
//*

JES

The JES (Job Entry Subsystem) is the z/OS component responsible for batch processing. It performs two main tasks:

  • Scheduling the batch processes
    • Assigning the process to a class or initiator (jobs can be assigned to specific queues)
    • Defining the priority of the process
    • Allocating/limiting the resources assigned to the process (memory, time, etc.)
    • Controlling the execution sequence (STEPs) of the process
  • Executing programs
    • Validating the JCL statements
    • Loading programs (COBOL, PL/I) into memory for subsequent execution
    • Assigning the input/output files to the symbolic names defined in the COBOL or PL/I application programs
    • Logging

Application programs

Programs, usually coded in COBOL, that implement the functionality of the process.

The executable program resulting from the compilation of the source code is stored as a member of a partitioned data set (PDS) library. A specific card in the JCL (JOBLIB / STEPLIB) identifies the libraries from which the programs are to be loaded.

The JES calls the main program of the process (defined in the EXEC card of the JCL), which in turn can call various subroutines statically or dynamically.

Data

Data is accessed mainly through the use of files (datasets) and relational databases (DB2).

The input and output files are defined in the programs by means of a symbolic name.

         SELECT LOAN ASSIGN TO "INPUT1"
         ORGANIZATION IS LINE SEQUENTIAL
         ACCESS IS SEQUENTIAL.

The assignment of symbolic names to read/write files is done in the JCL, via the DD card.

//*
//INPUT1   DD   DSN=DEV.APPL1.SAMPLE,DISP=SHR

The files are generally of one of the following types:

  • Sequential: the records must be accessed in order, i.e. to read the 1000th record, the previous 999 records must be read first.
  • VSAM: there are different types of VSAM files, and records can be accessed directly using a key (KSDS) or a record number (RRDS).

In the case of access to a database (DB2), the information necessary for the connection (security, database name, etc.) is passed as parameters in the JCL.

Mainframe Batch Migration to Open Architecture

To migrate batch processes built on mainframe technology, we will replicate the functionality described above on a Kubernetes cluster.

It is therefore necessary to:

  1. Convert the JCLs (JOBs) to a tool or framework that allows the execution of workflows on a Kubernetes platform.
  2. Replicate the functionality of the JES to allow the scheduling and execution of COBOL and PL/I programs on the Kubernetes cluster.
  3. Recompile the application programs.
  4. Provide access to data (files and databases).

1 - Converting JCLs

How to convert a mainframe JCL into a DAG?

Below is a simple example of how to convert a JCL into an Argo workflow (YAML).

Other frameworks or tools that allow the definition of DAGs and have native integration with the Kubernetes platform can be used.


apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
 name: batch-job-example
spec:
 entrypoint: job
 templates:
   - name: job
     dag:
       tasks:
         - name: extracting-data-from-table-a
           template: extractor
           arguments:
         - name: extracting-data-from-table-b
           template: extractor
           arguments:
         - name: extracting-data-from-table-c
           template: extractor
           arguments:
         - name: program-transforming-table-c
           dependencies: [extracting-data-from-table-c]
           template: exec
           arguments:
         - name: program-aggregating-data
           dependencies:
             [
               extracting-data-from-table-a,
               extracting-data-from-table-b,
               program-transforming-table-c,
             ]
           template: exec
           arguments:
         - name: loading-data-into-table1
           dependencies: [program-aggregating-data]
           template: loader
           arguments:
         - name: loading-data-into-table2
           dependencies: [program-aggregating-data]
           template: loader
           arguments:
   - name: extractor
   - name: exec
   - name: loader

The example models a batch ETL process divided into three phases:

  • Extraction of data from a set of DB2 tables (extractor template).
  • Transformation and aggregation of these tables using COBOL applications (exec template).
  • Loading of the resulting information (loader template).

Each JOB is transformed into a DAG in which the sequence of tasks (STEPs) to be executed and their dependencies are defined.

Similarly to PROCs on the mainframe, it is possible to define templates for the main types of batch tasks of the installation (DB2 data download, execution of COBOL programs, file transfer, data conversion, etc.), as sketched below.
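In the workflow above, the extractor, exec and loader templates are only declared. As an illustration, a minimal sketch of what the exec template might look like follows; the image name and command are hypothetical and would correspond to the COBOL runtime image used by the installation.

   - name: exec
     inputs:
       parameters:
         - name: program            # name of the COBOL program to execute
     container:
       image: registry.example.com/batch/cobol-runtime:latest   # hypothetical runtime image
       command: ["run-step"]                                     # hypothetical entrypoint
       args: ["{{inputs.parameters.program}}"]

Each task in the DAG would then pass the program to run as a parameter in its arguments section.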

Each STEP within the DAG is executed in an independent container on a Kubernetes cluster.

Dependencies are defined at the task level in the DAG and non-linear execution trees can be built.

Result of the execution of the process, graphically displayed in Argo

2 - Replicate JES functionality

How to replicate the way the JES works?

If you’re familiar with The Twelve-Factor App, you’ll know that one of its principles is to make the application code independent of any element that might vary when it’s deployed in different environments (test, quality, production, etc.).

Storing the configuration in the environment

An app’s config is everything that is likely to vary between deploys (staging, production, developer environments, etc)

The Twelve-Factor App. III Config

We can translate the information contained in the JCLs into configuration files (config.yml) that contain the information needed to run the code in each of the environments defined in the installation: resource allocation, database connection, name and location of the input and output files, level of detail of the logging, etc.
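As an illustration, a hypothetical config.yml for the development environment might look like the following (all names, paths and values are invented for the example):

environment: dev
logging:
  level: INFO
database:
  host: postgres.dev.example.internal
  port: 5432
  name: appl1
files:
  input1: /data/dev/appl1/sample.txt         # symbolic name -> physical file
  output1: /data/dev/appl1/cuota.txt
resources:
  memory: 512Mi
  time_limit: 600                             # seconds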

To understand what functionality we need to replicate, let’s divide a JCL into two parts:

  • JOB card
  • EXEC and DD cards

//JOB1    JOB (123),CLASS=C,MSGCLASS=S,MSGLEVEL=(1,1),NOTIFY=&SYSUID
//*
//STEP01   EXEC PGM=BCUOTA
//INPUT1   DD   DSN=DEV.APPL1.SAMPLE,DISP=SHR
//OUTPUT1  DD   DSN=DEV.APPL1.CUOTA,
//              DISP=(NEW,CATLG,DELETE),VOLUME=SER=SHARED,
//              SPACE=(CYL,(1,1),RLSE),UNIT=SYSDA,
//              DCB=(RECFM=FB,LRECL=80,BLKSIZE=800)
//*

JOB card

In the JOB card, we will find the basic information for scheduling the process in Kubernetes:

  • Information needed to classify the JOB (CLASS). This allows the types of JOBs to be classified according to their characteristics and different execution parameters to be assigned to them.
  • The default output (MSGCLASS).
  • The level of information to be sent to standard output (MSGLEVEL).
  • The maximum amount of memory allocated to the JOB (REGION).
  • The maximum estimated execution time of the process (TIME).
  • User information (USER).
  • Etc.

In Kubernetes, the kube-scheduler component is responsible for performing these tasks. It searches for a node with the right characteristics to run the newly created pods.

There are several options:

  • Batch processes can use the Kubernetes Job controller, which runs a pod for each task (STEP) of the workflow and stops it when the task is completed (see the sketch after this list).
  • If more advanced functionality is required, such as defining and prioritising different execution queues, specialised schedulers such as Volcano can be used.
  • Finally, it is possible to develop a Kubernetes controller tailored to the specific needs of an installation.
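As a point of reference, a plain Kubernetes Job already covers several of the JOB card parameters described above. The sketch below maps REGION to a memory limit, TIME to activeDeadlineSeconds and CLASS to a priority class; the image and class names are illustrative:

apiVersion: batch/v1
kind: Job
metadata:
  name: job1-step01
spec:
  activeDeadlineSeconds: 600               # TIME: maximum execution time of the process
  backoffLimit: 0                          # do not retry a failed step automatically
  template:
    spec:
      restartPolicy: Never
      priorityClassName: batch-class-c     # CLASS: illustrative priority class
      containers:
        - name: step01
          image: registry.example.com/batch/cobol-runtime:latest   # hypothetical runtime image
          resources:
            limits:
              memory: 512Mi                # REGION: memory allocated to the step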

EXEC & DD cards

In each STEP of the JCL we find an EXEC card and several DD cards.

It is in these cards that the (COBOL) program to be executed and its associated input and output files are defined. Below is an example of how to transform a JCL STEP.

---
stepname: "step01"
exec:
 pgm: "bcuota"
dd:
 - name: "input1"
   dsn: "dev/appl1/sample.txt"
   disp: "shr"
   normaldisp: "catlg"
   abnormaldisp: "catlg"
 - name: "output1"
   dsn: "dev/appl1/cuota.txt"
   disp: "new"
   normaldisp: "catlg"
   abnormaldisp: "delete"

For program execution, EXEC and DD instructions are converted to YAML. This information is passed to the d8parti controller, which specialises in running batch programs.

The d8parti controller acts like the JES:

  • It is in charge of the syntax validation of the YAML file
  • It maps the symbolic names in the COBOL programs to the physical input/output files
  • It loads the COBOL programs into memory for execution
  • It writes monitoring/logging information

3 - Program compilation

How to reuse mainframe application programs?

The mainframe COBOL and PL/I programs are directly reusable on the target technical platform (Linux).

As mentioned above, the d8parti module will be responsible for the following tasks:

  • Initialise the language runtime (e.g. COBOL)
  • Assign the input/output files to the symbolic names of the program
  • Load and execute the main program (defined in the EXEC card of the JCL)

This main program can make various calls to other subroutines using a CALL statement. These calls are managed by the runtime of the language used.

We can visualise this operation as an inverted tree

Compiled programs can be stored in a shared directory and loaded at runtime (dynamic CALL), mimicking the IBM mainframe (STEPLIB).
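For example, assuming a GnuCOBOL runtime (which resolves dynamic CALLs through the COB_LIBRARY_PATH environment variable), the shared module directory could be mounted into the step container as follows; the volume and path names are illustrative:

# Fragment of a pod/template spec
containers:
  - name: step01
    image: registry.example.com/batch/cobol-runtime:latest    # hypothetical runtime image
    env:
      - name: COB_LIBRARY_PATH             # GnuCOBOL searches this path for dynamically CALLed modules
        value: /batch/loadlib
    volumeMounts:
      - name: loadlib
        mountPath: /batch/loadlib
        readOnly: true
volumes:
  - name: loadlib
    persistentVolumeClaim:
      claimName: batch-loadlib             # shared library of compiled programs (STEPLIB analogue)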

However, it is possible to change the above behaviour and implement an immutable container model, which has several advantages over the shared-library approach. In this case, the previous execution tree should be functionally decomposed into one or more repos.

Modifying any of the components of these repos generates a new version of the repo and triggers the corresponding regeneration of the container(s) that use it.

With this strategy we achieve:

  • Simplification of the application development and testing process.
  • Incremental introduction of changes to the system, minimising risks.
  • Portability of processes to different Cloud platforms (on-prem, on-cloud).

Once a business function has been isolated in a container with a standard interface, it can be modified or rewritten in any other programming language and deployed transparently without affecting the rest of the system.
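Under the immutable model, each business function would instead be packaged in its own versioned image and referenced by tag from the workflow template, so that a change to the code produces a new image rather than an update to a shared directory. A minimal illustration (names and tags are invented):

# Illustrative: one immutable, versioned image per business function
- name: exec-bcuota
  container:
    image: registry.example.com/batch/bcuota:1.4.2    # rebuilt whenever its repo changes
    command: ["bcuota"]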

4 - Data access

How to access data stored in files and SQL databases?

Files

In mainframe architecture, a Data Set is a set of related records stored in a UNIT / VOLUME.

To understand these concepts, we need to go back to the days when mass storage devices were based on tapes or cartridges. So when a process needed to access the information in a data set, the tape or cartridge had to be mounted in a UNIT and identified by a name or VOLUME.

Today, information resides on disk and does not need to be mounted/unmounted for access; we can compare mainframe VOLUMEs to an NFS share.

Different mount points can be defined for the application container in order to isolate the information and protect access (e.g. by environment: development, production). Storage is accessed by the containers via SDS (Software-Defined Storage) to decouple it from the process.
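As a sketch, each environment could expose its data through its own PersistentVolumeClaim, mounted read-only or read-write as required; the storage class and names are illustrative:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: appl1-data-dev                     # one claim per environment (dev, prod, ...)
spec:
  accessModes:
    - ReadWriteMany                        # shared between step containers, like an NFS export
  storageClassName: sds-nfs                # provided by the software-defined storage layer
  resources:
    requests:
      storage: 10Gi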

Finally, the mainframe files need to be transferred and converted (from EBCDIC to the Linux encoding) for use on the target platform. This process can be automated with off-the-shelf tools or with Spark data conversion processes.

SQL databases

The main mainframe database engine is IBM DB2, although other types of products (IMS DB, IDMS, Adabas) are still in use.

For DB2 applications, there are two main strategies for accessing data:

  • Replication of DB2 data on a new SQL database (e.g. PostgreSQL).
  • Accessing DB2 on the mainframe platform from the Kubernetes cluster using the Coexistence Proxy (DB2 Proxy).

In the first case, replication tools (e.g. IBM CDC) or ETL processes (e.g. using Spark) are used to replicate the data from the DB2 tables to a new SQL database.

The DB2 SQL statements (EXEC SQL … END-EXEC.) are pre-compiled so that they can access the new database manager. Small changes to the SQL are needed to adapt it, but there is a methodology and there are tools to carry out this process automatically:

  • DDL replication (tablespaces, tables, indexes, columns, etc.)
  • Adaptation of the DATE/TIME data types
  • SQLCODEs
  • Load and unload utilities
  • Etc.

The main drawback of this strategy is the need to maintain the data integrity of the model: the referential integrity model of the database is generally not defined in the DB2 manager and must be deduced from the logic of the applications.

All read/update processes that access the affected tables (whether batch or online) must either be migrated to the new platform or a coexistence/replication mechanism must be defined between the platforms (mainframe DB2 / next-gen SQL). This mechanism must maintain data integrity on both platforms until the migration process is complete.

For tables containing master data accessed by a large number of applications, this coexistence is particularly critical.

There is no need to maintain data integrity between platforms (mainframe / next-gen) if you choose to continue accessing DB2 on the mainframe through the coexistence proxy. Processes (online or batch) can be migrated one at a time and in stages (canary deployment).

Once the process of migrating the application programs (Online and Batch) has been completed, the data can be migrated to a new database on the target platform (Next-gen).