High-Performance Batch Processing Framework

It is hard to find a mid-to-large-sized business today that does not have at least one batch job or process that runs independently of the web application running within a web browser. In most businesses, in fact, several batch jobs or processes automate core, high-volume business tasks such as mass-mailing reports to customers, creating nightly reports for all customers, processing and/or transmitting data (files) sent from/to external partners (interface data processing), and importing/exporting bulk data in and out of web applications/databases. The characteristics of the typical “Batch Process” include:

• A long-running process that must occur on a regularly scheduled basis (say, at month-end or at midnight each day).

• A process that runs asynchronously from user interaction, i.e., it is not part of a user session in an online system. In most cases, a user does not start it and is not waiting for it to complete. Sometimes a user initiates it by clicking a button or link within an online application; the user does not want to be held up from doing anything else within the application, but does want the high-volume processing task completed as soon as possible and wants to be notified when the processing is completed (“Real-time” or “Near Real-time” Asynchronous Processing).

• There may be complex logic or calculations to perform on the data.

• The volume of data to be processed is high, usually on the order of tens of thousands to millions of records.

• The process may require a large set of data from an external system that is delivered in sets on a schedule.

The technology to develop, deploy and operate batch jobs for maximum performance has existed for a long time. Multi-core processors are standard in servers these days. Memory has become very inexpensive, and most servers come with ample RAM and disk space. Most programming languages like Java and .NET

In spite of the availability of the technology to build scalable and reliable Batch Processes, most businesses have been unable to reap its full benefits because of the following reasons:

• Setting out on the wrong foot: Batch programs, in most projects, get developed and deployed as a "Process" with no multi-threading support. Only upon deployment into Production/UAT (or, if you are fortunate, during the late phases of QA, when performance testing may get done) does it become obvious that the business/performance goals will not be met unless parallelism is built into the batch processing by means of multi-threading support. This leads to unanticipated and costly delays and a lot of rework on the code.

• The “I’m not comfortable developing multi-threaded programs” syndrome: Programming with threads and related technologies (such as JMS and RMI) is complex, tricky and error-prone. Most developers are neither experienced in nor comfortable with building multi-threaded applications. JDK 5.0 made significant improvements in multi-threading support, but even with those improvements, the complexities of developing multi-threaded batch jobs have not gone away. Additionally, since the majority of the business applications written in the past decade and a half were written for deployment on Web servers, developers have relied on those Web servers to handle issues like scalability and multi-threading. This reliance has reduced developers' experience with these skills (to non-existent in some cases).

• Debugging headaches (“I don’t understand why my code does not work”): Unwitting programmers often write code that introduces hard-to-debug programming errors. Without a strong foundation in operating system principles such as threads, deadlocks, resource pools, CPU utilization and memory management, it is very hard to debug problems.

• Where does the buck stop?: Tuning batch jobs for maximum performance requires a thorough understanding of the hardware, the operating system and the other software components running on that hardware. Most software service providers and most teams lack the well-coordinated communication and collaboration between developers, architects, network and system engineers, quality engineers and business leaders that is required to "tune" the programs for maximum performance.

• The “Tuning is not my responsibility” mindset: Most development shops believe that configuring the batch programs for optimum performance is not their responsibility and that it is the sole responsibility of the infrastructure team. Since the infrastructure team has limited knowledge of how the application is architected or built, it is unable to tune the batch jobs for optimum performance.

• Operational Challenges:

o Is it ok to kill a Batch Job? In most cases, when software upgrades or patches have to be pushed out to production and the running batch jobs have to be brought down, the system/network engineers managing the production systems have no way of knowing whether it is ok to kill a process that is executing a batch job. No visibility exists to see whether the batch job is currently processing business tasks or is just in sleep (wait) mode. This can lead to unnecessary and sometimes costly errors where a transaction does not get processed completely.

o No support to adjust the amount of audit information logged: Operationally, when problems occur (such as processes running slowly, not running at all, or stopping abruptly), system/network engineers and help desk personnel struggle to identify the source of the problem. In most situations, either too much unnecessary information or too little useful information is being written into the application’s audit log files. Most developers (or development shops) do not build in support for the infrastructure support or help desk team to adjust the level of audit trail information on the fly, which would help triage the issue towards resolution faster.

o No automated/emailed alerts & notifications to the Business managers and End-users: Most development shops or developers do not build support within batch jobs for automated alerts/notifications to be sent to Business users (via email, SMS, etc.) when problems occur within batch jobs. Keeping business managers and end-users abreast of slowdowns or delays in processing can go a long way in managing end customer expectations. This also significantly reduces the unnecessary load on the application/IT help desk and developers in “manually” handling the huge in-flow of

o Lack of consolidated support to perform a Full Health Check or a Pulse Check: Most development shops or developers do not build support within batch jobs for system/network engineers to do a health check of some or all of the vital stats of the batch job, such as memory management, CPU utilization, number of active threads and processing speed. This leaves the system engineers to rely on numerous, mostly expensive monitoring tools and utilities to check each vital stat separately, making it difficult to visualize the big picture.
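To make two of the operational gaps above concrete (a consolidated health check and audit logging that can be adjusted on the fly), here is a minimal Java sketch using only the standard library. It is not taken from any particular framework: the class name `BatchHealthCheck`, the logger name `batch.audit`, and the stat keys are assumptions made for illustration, and CPU load is left out for brevity.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; class, method, and key names are illustrative only.
final class BatchHealthCheck {

    // The logger name is an assumption made for this sketch.
    static final Logger AUDIT_LOG = Logger.getLogger("batch.audit");

    // Gathers several vital stats in one call so an operator sees one picture
    // instead of querying a separate tool for each number.
    static Map<String, Long> snapshot() {
        Runtime runtime = Runtime.getRuntime();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Long> stats = new LinkedHashMap<>();
        stats.put("usedMemoryBytes", runtime.totalMemory() - runtime.freeMemory());
        stats.put("maxMemoryBytes", runtime.maxMemory());
        stats.put("availableProcessors", (long) runtime.availableProcessors());
        stats.put("liveThreads", (long) threads.getThreadCount());
        return stats;
    }

    // Changes audit verbosity while the job is running,
    // e.g. "FINE" while triaging a problem and "WARNING" afterwards.
    static void setAuditLevel(String levelName) {
        AUDIT_LOG.setLevel(Level.parse(levelName));
    }

    public static void main(String[] args) {
        setAuditLevel("FINE");
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

In a real batch job, something like `snapshot()` and `setAuditLevel()` would be reachable from outside the process (for example through JMX or an admin port); how that is exposed is left open here.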

The net result of one or more of the above reasons is that businesses spend significant portions of their operational and development/maintenance budgets in maintaining and managing the numerous batch jobs that automate the core business tasks.

Tripod’s ‘High Performance Batch Processing Framework’ helps companies reduce the Total Cost of Ownership (TCO) of their batch jobs. Tripod leverages this framework and its Distributed Agile Development Process to address each of the challenges faced in building and managing batch jobs or processes.

Tripod’s High-Performance Batch Processing Framework is a Java-based framework that encapsulates the “collective experience” of Tripod’s architects, developers, quality assurance engineers, help desk personnel, and systems/network engineers on numerous successfully delivered and/or managed client implementations/projects that involved complex, high-volume transactional systems and batch processes. The framework provides, “out-of-the-box”, the plumbing that is needed for a batch job to be developed economically and rapidly, and for the batch job to perform, scale and be operated with ease in production. This framework “efficiently” addresses the following core aspects of any batch processing framework:

• Job control (start and stop; immediate shut-down and graceful shut-down)

• Job partitioning (partition or break up your job into smaller chunks of work)

• Parallel processing and distribution (multi-threading at two levels and distribution of load onto multiple Java virtual machines, Java application servers, and physical/virtual servers)

• Fine-grained transaction control (each chunk of work gets completed or doesn’t; ensure that work does not get completed partially and that data does not get corrupted)

• Error handling (understanding when an error occurs and what caused it, error notification, controlling whether the job chunk continues after an error and whether the overall job stops or continues)

• Job monitoring (mechanisms to check the status of the job to address questions like “is the job complete?”, “what portion of the job is complete?”, “how much more time is needed to complete the job?”, etc.)
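As a rough illustration of how several of these aspects fit together, the following Java sketch partitions a list of records into chunks, processes the chunks on a thread pool, publishes a chunk's results only if every record in it succeeds, and exposes simple progress counters. It is a sketch under stated assumptions, not Tripod's implementation: the class `ChunkedJobSketch`, its methods, and the "double each record, reject negatives" business rule are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from Tripod's framework.
final class ChunkedJobSketch {
    private final List<Integer> committed = Collections.synchronizedList(new ArrayList<>());
    private final AtomicInteger completedChunks = new AtomicInteger();
    private final AtomicInteger failedChunks = new AtomicInteger();
    private volatile int totalChunks;

    // Job partitioning: split the records into chunks of at most chunkSize.
    static <T> List<List<T>> partition(List<T> records, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, records.size());
            chunks.add(new ArrayList<>(records.subList(i, end)));
        }
        return chunks;
    }

    // Parallel processing: every chunk is handed to a fixed-size thread pool.
    void run(List<Integer> records, int chunkSize, int threads) {
        List<List<Integer>> chunks = partition(records, chunkSize);
        totalChunks = chunks.size();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<Integer> chunk : chunks) {
            pool.execute(() -> processChunk(chunk));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Transaction control: results are staged and published only if the whole chunk succeeds.
    private void processChunk(List<Integer> chunk) {
        List<Integer> staged = new ArrayList<>();
        try {
            for (int record : chunk) {
                if (record < 0) {
                    throw new IllegalArgumentException("bad record " + record);
                }
                staged.add(record * 2); // stand-in for real business logic
            }
            committed.addAll(staged);
            completedChunks.incrementAndGet();
        } catch (RuntimeException e) {
            // Error handling: this chunk's staged work is dropped; other chunks continue.
            failedChunks.incrementAndGet();
        }
    }

    // Job monitoring: the fraction of chunks that have finished, successfully or not.
    double fractionDone() {
        return totalChunks == 0 ? 0.0 : (completedChunks.get() + failedChunks.get()) / (double) totalChunks;
    }

    int completedChunks() { return completedChunks.get(); }
    int failedChunks() { return failedChunks.get(); }
    List<Integer> committed() { return new ArrayList<>(committed); }

    public static void main(String[] args) {
        ChunkedJobSketch job = new ChunkedJobSketch();
        job.run(Arrays.asList(1, 2, 3, 4, -5, 6, 7, 8, 9, 10), 3, 2);
        System.out.println("completed=" + job.completedChunks() + " failed=" + job.failedChunks());
    }
}
```

Staging results in memory stands in for a real transaction here; with a database, each chunk would typically run inside its own transaction and roll back on error.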

The key benefits that Tripod’s Batch Framework offers to its Clients are:

• Reduction in development time for new batch jobs: Each time a new batch job needs to be developed, development is limited to programming the specific business logic and not the internal “plumbing” (multithreading, memory management, shared memory, resource pooling, scheduling, run-time audit trail logging, performance, etc.) required to execute and manage the batch jobs.

• Reduction in QA time for new batch jobs: Since Tripod’s framework is performance-tested and time-tested, testing a new batch job should essentially be a simple matter of testing the business features specific to that job. Little time needs to be spent on testing “plumbing” features such as parallelism of threads, memory management and CPU management.

• Improved Quality of Tripod deliverables: This is a direct benefit of using time-tested, reusable components.

• Improved End-User/Client Satisfaction Levels: This is a direct benefit of keeping bugs out of UAT and production and of clients seeing well-performing, scalable batch jobs.

• Reduced Cost of Development and Deployment: Tripod can develop, perform quality assurance on and deliver high-transaction batch jobs faster and with fewer resources. The cost savings achieved are passed on directly to Tripod’s Clients.

• Operational efficiencies: In production, the framework offers numerous benefits:

1. Better Performance and Scalability: Overall, the jobs perform better and are more scalable. By adjusting parameters such as the concurrent thread pool size within human-readable, text-format configuration files, the performance of the batch jobs can be tuned dynamically without any code-compile-release cycles.

2. Better Operational Visibility: The framework supports ‘Run-time Inquiry’, a feature which allows the system/network engineers to improve operational visibility by easily integrating with other monitoring services/tools.

3. Better Manageability: Graceful Shutdown and Health Check are standard features that come with the framework for all batch jobs. The framework enables systems/network engineers to query the batch job ‘on the fly’ to find out whether it is busy processing business tasks, and/or to send the batch job a message (signal) to “gracefully” shut down on its own once it has completed all the tasks it is currently processing, without taking on additional tasks. System/network engineers can use these features out of the box, with no additional coding, to manage the batch jobs in production efficiently and economically.
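The operational features listed above can be pictured with a small Java sketch: the pool size is read from a text configuration, a busy inquiry reports whether work is in flight, and a graceful shutdown lets accepted work finish while refusing new tasks. This is only an illustration built on the standard `java.util.concurrent` classes; the class `ManagedBatchJob`, its methods, and the property key `batch.threadPoolSize` are assumptions, not the framework's actual API.

```java
import java.io.StringReader;
import java.util.Properties;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names and the property key are made up for illustration.
final class ManagedBatchJob {
    private final ThreadPoolExecutor pool;
    private final AtomicInteger inFlight = new AtomicInteger();

    // The pool size comes from configuration, so tuning it needs no recompile.
    ManagedBatchJob(Properties config) {
        int size = Integer.parseInt(config.getProperty("batch.threadPoolSize", "4"));
        pool = new ThreadPoolExecutor(size, size, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    int poolSize() {
        return pool.getCorePoolSize();
    }

    // Returns false if the job no longer accepts work (shutdown was requested).
    boolean submit(Runnable task) {
        inFlight.incrementAndGet();
        try {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    inFlight.decrementAndGet();
                }
            });
            return true;
        } catch (RejectedExecutionException e) {
            inFlight.decrementAndGet();
            return false;
        }
    }

    // Run-time inquiry: is any accepted task still queued or running?
    boolean isBusy() {
        return inFlight.get() > 0;
    }

    // Graceful shutdown: already accepted tasks finish, new tasks are refused.
    // Returns true if everything drained within the timeout.
    boolean shutdownGracefully(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Properties config = new Properties();
        config.load(new StringReader("batch.threadPoolSize=2"));
        ManagedBatchJob job = new ManagedBatchJob(config);
        for (int i = 0; i < 5; i++) {
            final int n = i;
            job.submit(() -> System.out.println("processed task " + n));
        }
        System.out.println("drained cleanly? " + job.shutdownGracefully(10));
        System.out.println("accepts new work? " + job.submit(() -> { }));
    }
}
```

Note that `ThreadPoolExecutor.shutdown()` also lets tasks that are already queued run to completion; a stricter reading of "currently processing" would require draining the queue separately.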

© 2009 Tripod Technologies, LLC ALL RIGHTS RESERVED

Copyright in whole and in part of this document “High Volume Batch Processing Framework” belongs to Tripod Technologies, LLC. This work may not be used, sold, transferred, adopted, abridged, copied or reproduced in whole or in part in any manner or form or in any media without the prior written consent of Tripod Technologies, LLC.
