(1)

Batch Processing How-To

Or “The Single Threaded Batch Processing Paradigm”

Stefan Rufer, Netcetera

Matthias Markwalder, SIX Card Solutions 6840

(2)

Speakers

> Stefan Rufer

– Studied business IT at the University of Applied Sciences in Bern

– Senior Software Engineer at Netcetera

– Main interest: Server side application development using JEE

> Matthias Markwalder

– Graduated from ETH Zurich

– Senior Developer + Framework Responsible at SIX Card Solutions

(3)


Why are we here?

(4)

AGENDA

> What do we do

> Sharing our experience

(5)

What do we do

> Credit / debit card transaction processing

> Backoffice batch processing application 24x7x365

> 1.7 million card transactions a day

> Volume will double by end of 2010 → be ready…

> Migrated from Forté UDS to JEE

(6)

How do we do it

> Transactional integrity at any time

> Custom batch processing framework (not Spring Batch)

> 1 controller → builds the jobs

35 workers → process the steps of jobs

(or as many as you want and your system can take)

> 1 application server (12 cores)

(7)

Batch Processing Basics

> It's simple, but parallel:

– Read file(s)

– Process a bit

– Write file(s)

> Terminology from Spring Batch

(8)

AGENDA

> What do we do

> Sharing our experience

> Wrap up + Q&A

(9)

Bake an omelet

> 200g flour, 3 eggs, 2 dl milk, 2 dl water, ½ tablespoon salt

> Stir well, wait 30 min

> Stir again

> Put a little butter in a heated pan

> Add 1 dl dough

> Bake until slightly brown, flip over, bake again half as long

> Put cheese / marmalade / applesauce / ... on top, fold

(10)

Jobs run in parallel

Motivation

> Load balancing

Example

> Complete yesterday's reports while doing today's business

How to achieve

> Use a batch scheduling application that controls your entire processing.

(11)
(12)

Load limitations

Motivation

> Load balancing

Example

> Generate 70 reports, but max 20 in parallel

How to achieve

> Limit the number of workers one job can use
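The per-job limit can be pictured as a small piece of bookkeeping in the controller. The sketch below is only an illustration under assumptions: `JobWorkerLimiter`, `setLimit`, `tryAcquire` and `release` are hypothetical names, not part of the framework described in the talk, and the controller is assumed to be single threaded.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical controller-side bookkeeping: caps how many workers one job may occupy. */
final class JobWorkerLimiter {
    private final Map<String, Integer> maxWorkers = new HashMap<>();
    private final Map<String, Integer> busyWorkers = new HashMap<>();

    /** Configures the maximum number of workers the given job may use at once. */
    void setLimit(String jobName, int limit) {
        maxWorkers.put(jobName, limit);
    }

    /** Reserves a worker slot and returns true if the job is still below its limit. */
    boolean tryAcquire(String jobName) {
        int limit = maxWorkers.getOrDefault(jobName, Integer.MAX_VALUE);
        int busy = busyWorkers.getOrDefault(jobName, 0);
        if (busy >= limit) {
            return false; // the step stays queued until a slot is released
        }
        busyWorkers.put(jobName, busy + 1);
        return true;
    }

    /** Frees a slot once a worker reports its step as finished. */
    void release(String jobName) {
        busyWorkers.computeIfPresent(jobName, (name, busy) -> busy > 1 ? busy - 1 : null);
    }
}
```

With a limit of 20 for the report job, the first 20 steps get a worker and the remaining 50 wait until slots are released, which leaves the other workers free for other jobs.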

(13)

Decouple controller + workers

Motivation

> Scalability

Example

(14)

Motivation

> Avoid structuring steps in code

Example

> Collect data, afterwards write a file.

How to achieve

> Sequential execution

> Fail on exception (rollback entire step)

(15)

Motivation

> Minimize work left

Example

> Process 30'000 transactions in 3 steps.

How to achieve

> Parallel execution

> Continue on exception (still rollback entire step)

(16)

Motivation

> Speedup

Example

> A file of 200'000 credit card authorisations and transactions has to be read into the database.

How to achieve

> Cut the input file into pieces of 10'000 lines each.

– btw: perl and sort are unbeaten for this...

> Process each piece in a parallel step.
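If the cutting has to happen in Java rather than with perl or sort, it could look roughly like the sketch below. `FileSplitter`, the piece file naming and the UTF-8 encoding are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that cuts a text file into pieces of a fixed number of lines. */
final class FileSplitter {
    /** Writes pieces of at most linesPerPiece lines into outputDir and returns them in order. */
    static List<Path> split(Path input, Path outputDir, int linesPerPiece) throws IOException {
        if (linesPerPiece <= 0) {
            throw new IllegalArgumentException("linesPerPiece must be positive");
        }
        List<Path> pieces = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            try {
                int linesInPiece = 0;
                String line;
                while ((line = reader.readLine()) != null) {
                    // Start a new piece for the first line and whenever the current one is full.
                    if (writer == null || linesInPiece == linesPerPiece) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path piece = outputDir.resolve(String.format("piece-%05d.txt", pieces.size()));
                        pieces.add(piece);
                        writer = Files.newBufferedWriter(piece, StandardCharsets.UTF_8);
                        linesInPiece = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    linesInPiece++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return pieces;
    }
}
```

Each returned path can then be handed to one parallel step, so a failed piece can be rolled back and rerun without touching the others.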

(17)

Parallelize processing

Motivation

> Speedup

Example

> Summarize accounting data and store result in database again.

How to achieve

> Group data in chunks of 10'000 and process each chunk in a parallel step.

> Choose grouping criteria carefully:

– No overlapping data areas

(18)

Parallelize processing – how to group

Motivation

> Structuring your data in parallelizable chunks

> Load balancing

Example

> Parallelize processing by client as data is distinct by design.

How to achieve

> Group by client

> Group by keys: Ranges or ids

– Ranges (1..5) can grow very large
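Grouping by explicit ids can be sketched as follows. This reflects one reading of the bullets above, namely that a list of ids keeps every chunk at a bounded size while a key range may cover an unpredictable number of rows. `Chunker` is a hypothetical helper, not part of the framework from the talk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that cuts a list of keys into non-overlapping chunks. */
final class Chunker {
    /** Returns consecutive chunks of at most chunkSize keys; every key lands in exactly one chunk. */
    static <T> List<List<T>> chunk(List<T> keys, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, keys.size());
            // Copy the sub-list so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(keys.subList(from, to)));
        }
        return chunks;
    }
}
```

Each chunk would then become the input of one parallel step.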

(19)

Parallelize writing

Motivation

> Transactional integrity while writing files.

> Easy recovery while writing files.

Example

> Collect data for the payment file.

How to achieve

> Collect data in parallel and write to a staging table.

> Staging table content very close to target file format.

(20)

Different processes write in parallel

Motivation

> Don't lock each other out

Example

> Account information changes while account balance grows.

How to achieve

> No optimistic locking

> Modify deltas on sums and counters

> Keep distinct fields for different parallel jobs
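The difference between writing absolute values and writing deltas can be shown with a small in-memory stand-in for a database row. `AccountRow` and its methods are hypothetical. In SQL, the delta write would correspond to something like `UPDATE account SET balance = balance + ? WHERE id = ?`, with table and column names assumed.

```java
/** Hypothetical in-memory stand-in for one account row shared by two parallel jobs. */
final class AccountRow {
    private long balance;
    private String holderName;

    AccountRow(long balance, String holderName) {
        this.balance = balance;
        this.holderName = holderName;
    }

    /** Absolute write: overwrites whatever another job stored in the meantime. */
    void setBalance(long newBalance) {
        balance = newBalance;
    }

    /** Delta write: adds to the current value, so concurrent additions are not lost. */
    void addToBalance(long delta) {
        balance += delta;
    }

    /** Touches a different field than the balance job, so neither job overwrites the other. */
    void rename(String newName) {
        holderName = newName;
    }

    long balance() {
        return balance;
    }

    String holderName() {
        return holderName;
    }
}
```

If two jobs both read a balance of 100 and then write 110 and 105 as absolute values, one addition is lost. With deltas of +10 and +5 the row ends at 115 regardless of the order.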

(21)

Avoid insert and update in same table and step

Motivation

> Speedup

> Avoid DB locks

Example

> Summary rows in same table as the raw data.

How to achieve

(22)

Let the database work for you

Motivation

> Simple code

> Speedup

Example

> Sorting or joining arrays in memory.

How to achieve

> Code review.

(23)

Read long, write short

Motivation

> Keep lock contention on database minimal

> Keep transactional DB overhead minimal

Example

> Fully process the whole batch of 1'000 records before starting to write to DB.

How to achieve

> 1 (one) "writing" database transaction per step.

interface IModifyingStepRunner {
    // Read and compute everything the step needs; no database writes here.
    void prepareData();
    // Write all results within the step's single database transaction.
    void writeData();
}
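How a framework might drive such a step runner is sketched below. Only `IModifyingStepRunner` with its two methods comes from the slide. `Tx` and `StepExecutor` are hypothetical stand-ins for a real transaction manager and the framework's step execution.

```java
import java.util.ArrayList;
import java.util.List;

interface IModifyingStepRunner {
    void prepareData(); // long phase: read and compute, no writes
    void writeData();   // short phase: all writes of the step
}

/** Hypothetical minimal transaction abstraction, standing in for a real transaction manager. */
interface Tx {
    void begin();
    void commit();
    void rollback();
}

/** Hypothetical executor: reads outside the writing transaction, then writes in one go. */
final class StepExecutor {
    static void execute(IModifyingStepRunner runner, Tx tx) {
        // Read long: nothing is locked for writing while data is collected.
        runner.prepareData();
        // Write short: the single writing transaction spans only the writes.
        tx.begin();
        try {
            runner.writeData();
            tx.commit();
        } catch (Throwable t) {
            tx.rollback();
            throw t;
        }
    }
}
```

The writing transaction is opened only after `prepareData()` has returned, so locks are held just for the duration of the writes.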

(24)

This omelet did not taste like grandma's!

> Even when you follow the recipe, there are hidden corners

(25)

Don't forget to catch Error

Motivation

> Application integrity delegated to DB

Example

> OutOfMemoryError caused half of a batch to be committed. Fatal as rerun cannot fix inconsistency.

How to fix

try {
    result = action.doInTransaction(status);
} catch (Throwable err) {
    // Throwable also covers Error (e.g. OutOfMemoryError), not just Exception.
    transactionManager.rollback(status);
    throw err;
}
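The effect of catching `Throwable` can be tried out without a database. In the sketch below, `FakeTransaction` and `RollbackOnThrowable` are hypothetical in-memory stand-ins, not the transaction manager used in the talk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a database transaction. */
final class FakeTransaction {
    final List<String> pending = new ArrayList<>();
    final List<String> committed = new ArrayList<>();

    void write(String row) {
        pending.add(row);
    }

    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    void rollback() {
        pending.clear();
    }
}

final class RollbackOnThrowable {
    /** Commits only if the action completes; rolls back on any Throwable, including Error. */
    static void execute(FakeTransaction tx, Runnable action) {
        try {
            action.run();
            tx.commit();
        } catch (Throwable err) {
            // catch (Exception e) would not be entered for an OutOfMemoryError.
            tx.rollback();
            throw err;
        }
    }
}
```

An action that writes one row and then fails with a simulated `OutOfMemoryError` leaves both `pending` and `committed` empty, so a rerun starts from a consistent state.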

(26)

Use BufferedReader / BufferedWriter

Motivation

> Speedup (file reading time cut in half)

Example

> Forgot to use BufferedReader in file reading framework.

How to fix

> Code review.
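What a review would look for is a missing buffer around the raw reader. A minimal sketch, with `LineCounter` as a hypothetical example class:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical example of reading line by line through a buffer. */
final class LineCounter {
    /** Wraps the given reader in a BufferedReader and counts its lines. */
    static long countLines(Reader source) throws IOException {
        // The buffer fetches large blocks from the underlying reader instead of
        // hitting it for every small read.
        try (BufferedReader reader = new BufferedReader(source)) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }
}
```

For a file, the call could be `LineCounter.countLines(new java.io.FileReader("transactions.txt"))`, where the file name is made up for illustration.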

(27)

Use 1 thread only

Motivation

> Simplicity for the programmer

> Safety (no concurrent access)

Example

> Singleton, synchronized blocks, static variables, stateful step runners – we had it all...

How to achieve

(28)

Cache wisely

Motivation

> Speedup

> Limit memory use

Example

> Tax rates do not change during a processing day, cache them long.

> Customer data will be reused if processing transactions of the same customer – cache it short.

How to achieve

> Cache per worker
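A per-worker cache with a long-lived and a short-lived part might look like the sketch below. `WorkerCache`, its method names and the use of a bounded LRU map for the short-lived part are assumptions. Because each single-threaded worker would own its own instance, no synchronization is used.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-worker cache; one instance per single-threaded worker. */
final class WorkerCache {
    // Cached long: tax rates stay for the whole processing day.
    private final Map<String, Double> taxRates = new HashMap<>();

    // Cached short: bounded map in access order, least recently used customer goes first.
    private final Map<Long, String> customers;

    WorkerCache(final int maxCustomers) {
        customers = new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > maxCustomers;
            }
        };
    }

    /** Returns the cached tax rate, loading it once on first use. */
    double taxRate(String code, Function<String, Double> loader) {
        return taxRates.computeIfAbsent(code, loader);
    }

    /** Returns the cached customer, reloading it if it has been evicted. */
    String customer(long id, Function<Long, String> loader) {
        String cached = customers.get(id);
        if (cached == null) {
            cached = loader.apply(id);
            customers.put(id, cached);
        }
        return cached;
    }
}
```

The bound on the customer map limits memory use, while consecutive transactions of the same customer still hit the cache.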

(29)

Support JDBC batch operations

Motivation

> Speedup

Example

List<Booking> bookings = new ArrayList<Booking>();
...
bookingDao.update(bookings);

How to achieve

> Enhance your database layer with a built-in JDBC batch facility.

> Execute batch after 1000 items added.
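Inside the DAO, such a facility could be built on the standard JDBC calls `PreparedStatement.addBatch()` and `executeBatch()`. The sketch below is an assumption about how `bookingDao.update(bookings)` might work. The fields of `Booking`, the table name and the column names are invented for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical booking with just enough fields for the example. */
final class Booking {
    final long id;
    final long amountInCents;

    Booking(long id, long amountInCents) {
        this.id = id;
        this.amountInCents = amountInCents;
    }
}

/** Hypothetical DAO that sends updates to the database in JDBC batches. */
final class BookingDao {
    private static final int BATCH_SIZE = 1000;
    private final Connection connection;

    BookingDao(Connection connection) {
        this.connection = connection;
    }

    /** Updates all bookings, executing the batch after every BATCH_SIZE items. */
    void update(List<Booking> bookings) throws SQLException {
        // Table and column names are assumptions for illustration.
        String sql = "UPDATE booking SET amount_in_cents = ? WHERE id = ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            int pending = 0;
            for (Booking booking : bookings) {
                statement.setLong(1, booking.amountInCents);
                statement.setLong(2, booking.id);
                statement.addBatch();
                if (++pending == BATCH_SIZE) {
                    statement.executeBatch();
                    pending = 0;
                }
            }
            // Send the remainder that did not fill a whole batch.
            if (pending > 0) {
                statement.executeBatch();
            }
        }
    }
}
```

For 2'500 bookings this results in three round trips to the database instead of 2'500. Transaction handling is left to the caller.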

(30)

Structured patching

Motivation

> Risk management

> Stay agile in production

Example

> Bug found, fixed and unit tested. Deploy to production asap.

How to achieve

> Eclipse wizard to create patch (all files involved to fix a bug)

(31)

Never, ever, update primary keys

Motivation

> Good database design

> Speedup

Example

> Homemade library always wrote entire row to database.

How to fix

> Only write changed fields (dirty flags).
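Dirty flags can be kept as a map of changed columns from which the UPDATE statement is built. `DirtyTrackingRow` below is a hypothetical sketch of that idea, not the homemade library mentioned on the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical row wrapper that remembers which columns were changed since loading. */
final class DirtyTrackingRow {
    private final String table;
    private final String keyColumn;
    // Changed columns in the order they were first set.
    private final Map<String, Object> dirty = new LinkedHashMap<>();

    DirtyTrackingRow(String table, String keyColumn) {
        this.table = table;
        this.keyColumn = keyColumn;
    }

    /** Records a changed column; the primary key is rejected. */
    void set(String column, Object value) {
        if (column.equals(keyColumn)) {
            throw new IllegalArgumentException("primary key must not be updated");
        }
        dirty.put(column, value);
    }

    boolean isDirty() {
        return !dirty.isEmpty();
    }

    /** Builds an UPDATE that touches only the changed columns, never the primary key. */
    String updateSql() {
        if (dirty.isEmpty()) {
            throw new IllegalStateException("nothing to update");
        }
        StringJoiner assignments = new StringJoiner(", ");
        for (String column : dirty.keySet()) {
            assignments.add(column + " = ?");
        }
        return "UPDATE " + table + " SET " + assignments + " WHERE " + keyColumn + " = ?";
    }
}
```

Because the key column never appears in the SET clause, the database does not have to touch the primary key index for an ordinary update.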

(32)

AGENDA

> What do we do

> Sharing our experience

(33)

Future

> Scalability is an issue with a single database server.

– Partitioning options used, but not to the end.

– Will Moore's law save us again?

(34)

If you remember just three things...

Java batch processing works and is cool :-)

Trade-offs:

> Do not stock the work, start.

> Single threaded, many JVMs.

> Designing for scalability, stability needs experts.

(35)

Stefan Rufer stefan.rufer@netcetera.ch

Netcetera AG www.netcetera.ch

Matthias Markwalder matthias.markwalder@six-group.com

(36)

Links / References

> http://en.wikipedia.org/wiki/Batch_processing

> http://static.springframework.org/spring-batch/

> http://www.bmc.com/products/offering/control-m.html

> http://www.javaspecialists.eu/

And to really learn how to bake fine omelets, buy a book:

> http://de.wikipedia.org/wiki/Marianne_Kaltenbach

(37)

Other batch processing frameworks (public only)

> http://www.bmap4j.org/

> http://freshmeat.net/projects/jppf

> http://hadoop.apache.org/
