• No results found

8.0 Environment Variables

N/A
N/A
Protected

Academic year: 2021

Share "8.0 Environment Variables"

Copied!
933
0
0

Loading.... (view fulltext now)

Full text

(1)

WebSphere

®

DataStage

and

QualityStage

Parallel

Job

Advanced

Developer

Guide

Version8.0.1

(2)
(3)

WebSphere

®

DataStage

and

QualityStage

Parallel

Job

Advanced

Developer

Guide

Version8.0.1

(4)

Note

Beforeusingthisinformationandtheproductthatitsupports,besuretoreadthegeneralinformationunder“Noticesand trademarks”onpage899.

(5)

Contents

Chapter

1.

Introduction

.

.

.

.

.

.

.

. 1

Terminology . . . 1

Chapter

2.

Job

design

tips

.

.

.

.

.

.

. 3

WebSphereDataStageDesignerinterface . . . 3

Processinglargevolumesofdata . . . 3

Modulardevelopment . . . 3

Designingforgoodperformance. . . 4

Combiningdata . . . 4

Sortingdata . . . 5

Defaultandexplicittypeconversions . . . 5

UsingTransformerstages . . . 7

UsingSequentialFilestages . . . 7

UsingDatabasestages . . . 8

Databasesparselookupvs.join . . . 8

DB2databasetips. . . 8

Oracledatabasetips . . . 9

TeradataDatabaseTips . . . 9

Chapter

3.

Improving

performance

.

.

. 11

Understandingaflow . . . 11

Scoredumps . . . 11

Examplescoredump . . . 11

Tipsfordebugging . . . 12

Performancemonitoring . . . 12 Jobmonitor . . . 12 Iostat . . . 13 Loadaverage. . . 13 Runtimeinformation . . . 13 Performancedata . . . 14

OS/RDBMSspecifictools. . . 15

Performanceanalysis . . . 15

Selectivelyrewritingtheflow . . . 16

Identifyingsuperfluousrepartitions . . . 16

Identifyingbufferingissues . . . 16

Resourceestimation . . . 17

Creatingamodel . . . 17

Makingaprojection . . . 19

Generatingaresourceestimationreport . . . . 20

Examplesofresourceestimation . . . 20

Resolvingbottlenecks . . . 23

Choosingthemostefficientoperators. . . 23

Partitionerinsertion,sortinsertion. . . 24

CombinableOperators. . . 24

DiskI/O . . . 24

Ensuringdataisevenlypartitioned . . . 25

Buffering . . . 25

Platformspecifictuning . . . 25

Tru64 . . . 25

HP-UX . . . 26

AIX . . . 26

Diskspacerequirementsofpost-release7.0.1data sets . . . 26

Chapter

4.

Link

buffering

.

.

.

.

.

.

. 27

Bufferingassumptions. . . 27

Controllingbuffering . . . 28

Bufferingpolicy . . . 28

Overridingdefaultbufferingbehavior . . . . 28

Operatorswithspecialbufferingrequirements. . 29

Chapter

5.

Specifying

your

own

parallel

stages

.

.

.

.

.

.

.

.

.

.

.

.

.

.

. 31

Definingcustomstages . . . 31

Definingcustomstages . . . 35

Definingbuildstages . . . 38

Buildstagemacros . . . 43

Informationalmacros . . . 43

Flow-controlmacros . . . 43

Inputandoutputmacros . . . 43

TransferMacros . . . 44

Howyourcodeisexecuted . . . 44

Inputsandoutputs . . . 44

ExampleBuildStage . . . 46

Definingwrappedstages . . . 46

Examplewrappedstage . . . 50

Chapter

6.

Environment

Variables

.

.

. 51

Buffering . . . 55 APT_BUFFER_FREE_RUN . . . 55 APT_BUFFER_MAXIMUM_MEMORY . . . . 56 APT_BUFFER_MAXIMUM_TIMEOUT . . . . 56 APT_BUFFER_DISK_WRITE_INCREMENT. . . 56 APT_BUFFERING_POLICY . . . 56 APT_SHARED_MEMORY_BUFFERS . . . 56

BuildingCustomStages . . . 56

DS_OPERATOR_BUILDOP_DIR . . . 56 OSH_BUILDOP_CODE . . . 57 OSH_BUILDOP_HEADER . . . 57 OSH_BUILDOP_OBJECT . . . 57 OSH_BUILDOP_XLC_BIN . . . 57 OSH_CBUILDOP_XLC_BIN . . . 57 Compiler . . . 57 APT_COMPILER . . . 57 APT_COMPILEOPT . . . 57 APT_LINKER . . . 58 APT_LINKOPT . . . 58 DB2Support . . . 58 APT_DB2INSTANCE_HOME . . . 58 APT_DB2READ_LOCK_TABLE. . . 58 APT_DBNAME . . . 58 APT_RDBMS_COMMIT_ROWS . . . 58 DB2DBDFT . . . 58 Debugging . . . 58 APT_DEBUG_OPERATOR . . . 59 APT_DEBUG_MODULE_NAMES . . . 59 APT_DEBUG_PARTITION . . . 59 APT_DEBUG_SIGNALS . . . 59 APT_DEBUG_STEP. . . 59 APT_DEBUG_SUBPROC . . . 60

(6)

APT_EXECUTION_MODE . . . 60 APT_PM_DBX . . . 60 APT_PM_GDB . . . 60 APT_PM_LADEBUG . . . 60 APT_PM_SHOW_PIDS . . . 60 APT_PM_XLDB . . . 61 APT_PM_XTERM . . . 61 APT_SHOW_LIBLOAD . . . 61 Decimalsupport. . . 61 APT_DECIMAL_INTERM_PRECISION . . . . 61 APT_DECIMAL_INTERM_SCALE. . . 61 APT_DECIMAL_INTERM_ROUND_MODE . . 61 DiskI/O . . . 61 APT_BUFFER_DISK_WRITE_INCREMENT. . . 61 APT_CONSISTENT_BUFFERIO_SIZE. . . 61 APT_EXPORT_FLUSH_COUNT . . . 62 APT_IO_MAP/APT_IO_NOMAPand APT_BUFFERIO_MAP/ APT_BUFFERIO_NOMAP . . . 62 APT_PHYSICAL_DATASET_BLOCK_SIZE . . . 62

GeneralJobAdministration . . . 62

APT_CHECKPOINT_DIR. . . 62 APT_CLOBBER_OUTPUT . . . 62 APT_CONFIG_FILE . . . 62 APT_DISABLE_COMBINATION . . . 62 APT_EXECUTION_MODE . . . 63 APT_ORCHHOME . . . 63 APT_STARTUP_SCRIPT . . . 63 APT_NO_STARTUP_SCRIPT . . . 63 APT_STARTUP_STATUS . . . 64 APT_THIN_SCORE. . . 64 JobMonitoring . . . 64 APT_MONITOR_SIZE. . . 64 APT_MONITOR_TIME . . . 64 APT_NO_JOBMON. . . 64 APT_PERFORMANCE_DATA . . . 64 Lookupsupport . . . 64 APT_LUTCREATE_MMAP . . . 64 APT_LUTCREATE_NO_MMAP . . . 65 Miscellaneous . . . 65 APT_COPY_TRANSFORM_OPERATOR. . . . 65 APT_DATE_CENTURY_BREAK_YEAR . . . . 65 APT_EBCDIC_VERSION . . . 65 APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL65 APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS65 APT_INSERT_COPY_BEFORE_MODIFY. . . . 66 APT_OLD_BOUNDED_LENGTH . . . 66 APT_OPERATOR_REGISTRY_PATH . . . 66 APT_PM_NO_SHARED_MEMORY . . . 66 APT_PM_NO_NAMED_PIPES . . . 66 APT_PM_SOFT_KILL_WAIT . . . 66 APT_PM_STARTUP_CONCURRENCY . . . . 66 APT_RECORD_COUNTS. . . 66 APT_SAVE_SCORE. . . 67 APT_SHOW_COMPONENT_CALLS . . . 67 APT_STACK_TRACE . . . 67 APT_WRITE_DS_VERSION . . . 67 OSH_PRELOAD_LIBS. . . 67 Network . . . 67 APT_IO_MAXIMUM_OUTSTANDING . . . . 68 APT_IOMGR_CONNECT_ATTEMPTS . . . . 68 APT_PM_CONDUCTOR_HOSTNAME . . . . 68 APT_PM_NO_TCPIP . . . 68 APT_PM_NODE_TIMEOUT. . . 68 APT_PM_SHOWRSH . . . 68 APT_PM_USE_RSH_LOCALLY. . . 68 NLSSupport . . . 68 APT_COLLATION_SEQUENCE . . . 68 APT_COLLATION_STRENGTH . . . 69 APT_ENGLISH_MESSAGES. . . 69 APT_IMPEXP_CHARSET. . . 69 APT_INPUT_CHARSET . . . 69 APT_OS_CHARSET . . . 69 APT_OUTPUT_CHARSET . . . 69 APT_STRING_CHARSET. . . 69 OracleSupport . . . 69 APT_ORACLE_LOAD_DELIMITED . . . 69 APT_ORACLE_LOAD_OPTIONS . . . 70 APT_ORACLE_NO_OPS . . . 70 APT_ORACLE_PRESERVE_BLANKS. . . 70 APT_ORA_IGNORE_CONFIG_FILE_PARALLELISM70 APT_ORA_WRITE_FILES . . . 70 APT_ORAUPSERT_COMMIT_ROW_INTERVAL APT_ORAUPSERT_COMMIT_TIME_INTERVAL . 70 Partitioning . . . 71 APT_NO_PART_INSERTION . . . 71 APT_PARTITION_COUNT . . . 71 APT_PARTITION_NUMBER. . . 71

Readingandwritingfiles. . . 71

APT_DELIMITED_READ_SIZE. . . 71 APT_FILE_IMPORT_BUFFER_SIZE . . . 71 APT_FILE_EXPORT_BUFFER_SIZE . . . 72 APT_IMPORT_PATTERN_USES_FILESET . . . 72 APT_MAX_DELIMITED_READ_SIZE . . . . 72 APT_PREVIOUS_FINAL_DELIMITER_COMPATIBLE72 APT_STRING_PADCHAR . . . 72 Reporting . . . 72 APT_DUMP_SCORE . . . 72 APT_ERROR_CONFIGURATION . . . 72 APT_MSG_FILELINE . . . 74 APT_PM_PLAYER_MEMORY . . . 74 APT_PM_PLAYER_TIMING. . . 74 APT_RECORD_COUNTS. . . 74 OSH_DUMP . . . 75 OSH_ECHO . . . 75 OSH_EXPLAIN . . . 75 OSH_PRINT_SCHEMAS . . . 75 SASSupport . . . 75 APT_HASH_TO_SASHASH . . . 75 APT_NO_SASOUT_INSERT . . . 75 APT_NO_SAS_TRANSFORMS . . . 75 APT_SAS_ACCEPT_ERROR. . . 75 APT_SAS_CHARSET . . . 76 APT_SAS_CHARSET_ABORT . . . 76 APT_SAS_COMMAND . . . 76 APT_SASINT_COMMAND . . . 76 APT_SAS_DEBUG . . . 76 APT_SAS_DEBUG_IO . . . 76 APT_SAS_DEBUG_LEVEL . . . 76 APT_SAS_DEBUG_VERBOSE . . . 76 APT_SAS_NO_PSDS_USTRING . . . 77 APT_SAS_S_ARGUMENT . . . 77

(7)

APT_SAS_SCHEMASOURCE_DUMP. . . 77 APT_SAS_SHOW_INFO . . . 77 APT_SAS_TRUNCATION . . . 77 Sorting . . . 77 APT_NO_SORT_INSERTION . . . 77 APT_SORT_INSERTION_CHECK_ONLY . . . 77 TeradataSupport . . . 78 APT_TERA_64K_BUFFERS . . . 78 APT_TERA_NO_ERR_CLEANUP . . . 78 APT_TERA_NO_SQL_CONVERSION . . . . 78 APT_TERA_NO_PERM_CHECKS . . . 78 APT_TERA_SYNC_DATABASE. . . 78 APT_TERA_SYNC_PASSWORD . . . 78 APT_TERA_SYNC_USER. . . 78 TransportBlocks. . . 78 APT_AUTO_TRANSPORT_BLOCK_SIZE . . . 79 APT_LATENCY_COEFFICIENT . . . 79 APT_DEFAULT_TRANSPORT_BLOCK_SIZE . . 79 APT_MAX_TRANSPORT_BLOCK_SIZE/ APT_MIN_TRANSPORT_BLOCK_SIZE . . . . 79

Guidetosettingenvironmentvariables . . . 79

Environmentvariablesettingsforalljobs . . . 79

Optionalenvironmentvariablesettings . . . . 79

Chapter

7.

Operators

.

.

.

.

.

.

.

.

. 81

StagetoOperatorMapping . . . 81

Changeapplyoperator. . . 84

Dataflowdiagram . . . 84

Properties . . . 85

Schemas . . . 86

Syntaxandoptions. . . 86

Example . . . 89

Changecaptureoperator . . . 91

Dataflowdiagram . . . 91

Keyandvaluefields . . . 91

Syntaxandoptions. . . 92

Example1:alloutputresults . . . 95

Example2:droppingoutputresults . . . 96

Compareoperator . . . 97

Dataflowdiagram . . . 97

Properties . . . 97

Syntaxandoptions. . . 99

Example1:runningthecompareoperatorin parallel . . . 101

Example2:runningthecompareoperator sequentially . . . 101

Copyoperator . . . 101

Dataflowdiagram . . . 102

Properties . . . 102

Syntaxandoptions . . . 102

PreventingWebSphereDataStagefrom removingacopyoperator . . . 103

Example1:Thecopyoperator. . . 104

Example2:runningthecopyoperator sequentially . . . 104

Diffoperator . . . 105

Dataflowdiagram . . . 105

Properties . . . 105

Transferbehavior . . . 106

Syntaxandoptions . . . 107

Example1:generalexample . . . 110

Example2:DroppingOutputResults . . . . 111

Encodeoperator . . . 112

Dataflowdiagram . . . 112

Properties . . . 112

Syntaxandoptions . . . 113

EncodingWebSphereDataStagedatasets . . . 113

Example . . . 114

Filteroperator . . . 114

Dataflowdiagram . . . 114

Properties . . . 115

Syntaxandoptions . . . 115

Jobmonitoringinformation. . . 116

Expressions . . . 117

Inputdatatypes . . . 118

Example1:comparingtwofields. . . 119

Example2:testingforanull . . . 120

Example3:evaluatinginputrecords. . . 120

Jobscenario:mailinglistforawineauction . . 120

Funneloperators . . . 122

Dataflowdiagram . . . 122

Properties . . . 122

Funneloperator . . . 123

Sortfunneloperators. . . 123

Generatoroperator . . . 126

Dataflowdiagram . . . 127

Properties . . . 127

Syntaxandoptions . . . 127

Usingthegeneratoroperator . . . 128

Example1:usingthegeneratoroperator . . . 129

Example2:executingtheoperatorinparallel 130 Example3:usinggeneratorwithaninputdata set . . . 130

Definingtheschemafortheoperator . . . . 131

Timestampfields . . . 136

Headoperator . . . 137

Dataflowdiagram . . . 138

Properties . . . 138

Syntaxandoptions . . . 138

Example1:headoperatordefaultbehavior . . 139

Example2:extractingrecordsfromalargedata set . . . 139

Example3:locatingasinglerecord . . . 140

Lookupoperator . . . 140

Dataflowdiagrams . . . 140

Properties . . . 141

Syntaxandoptions . . . 142

Partitioning . . . 145

Create-onlymode . . . 145

Example1:singlelookuptablerecord . . . . 146

Example2:multiplelookuptablerecord . . . 146

Example3:interestratelookupexample . . . 147

Example4:handlingduplicatefieldsexample 148 Mergeoperator. . . 149

Dataflowdiagram . . . 150

Properties . . . 150

Syntaxandoptions . . . 150

Mergingrecords . . . 152

Understandingthemergeoperator . . . 154

Example1:updatingnationaldatawithstate data . . . 157

(8)

Jobscenario:galacticindustries . . . 161

Missingrecords . . . 163

Modifyoperator . . . 165

Dataflowdiagram . . . 165

Properties . . . 166

Syntaxandoptions . . . 166

Transferbehavior . . . 167

Avoidingcontiguousmodifyoperators . . . . 167

Performingconversions . . . 167

Allowedconversions . . . 198

pcompressoperator . . . 200

Dataflowdiagram . . . 200

Properties . . . 200

Syntaxandoptions . . . 200

Compresseddatasets . . . 201

Example . . . 202

Peekoperator . . . 203

Dataflowdiagram . . . 203

Properties . . . 203

Syntaxandoptions . . . 204

Usingtheoperator . . . 205

PFTPoperator . . . 206

Dataflowdiagram . . . 206

Operatorproperties . . . 207

Syntaxandoptions . . . 207

Restartability . . . 212

Remdupoperator . . . 213

Dataflowdiagram . . . 213

Properties . . . 213

Syntaxandoptions . . . 214

Removingduplicaterecords . . . 215

Usingoptionstotheoperator . . . 215

Usingtheoperator . . . 217

Example1:usingremdup . . . 217

Example2:usingthe-lastoption. . . 217

Example3:case-insensitivestringmatching . . 217

Example4:usingremdupwithtwokeys . . . 218

Sampleoperator . . . 218

Dataflowdiagram . . . 218

Properties . . . 218

Syntaxandoptions . . . 219

Examplesamplingofadataset . . . 220

Sequenceoperator. . . 221

Dataflowdiagram . . . 221

Properties . . . 221

Syntaxandoptions . . . 222

ExampleofUsingthesequenceOperator . . . 222

Switchoperator . . . 223

Dataflowdiagram . . . 223

Properties . . . 224

Syntaxandoptions . . . 224

Jobmonitoringinformation. . . 228

Examplemetadataandsummarymessages . . 228

Customizingjobmonitormessages . . . 229

Tailoperator. . . 229

Dataflowdiagram . . . 229

Properties . . . 229

Syntaxandoptions . . . 230

Example1:tailoperatordefaultbehavior . . . 230

Example2:tailoperatorwithbothoptions . . 230

Transformoperator . . . 231

Runningyourjobonanon-NFSMPP . . . . 231

Dataflowdiagram . . . 231

Properties . . . 231

Syntaxandoptions . . . 232

Transferbehavior . . . 240

Thetransformationlanguage . . . 241

ThetransformationlanguageversusC . . . . 277

Usingthetransformoperator . . . 278

Example1:student-scoredistribution . . . . 278

Example2:student-scoredistributionwitha lettergradeaddedtoexample. . . 281

Example3:student-scoredistributionwitha classfieldaddedtoexample . . . 284

Example4.studentrecorddistributionwithnull scorevaluesandareject. . . 287

Example5.studentrecorddistributionwithnull scorevalueshandled . . . 289

Example6.studentrecorddistributionwith vectormanipulation . . . 292

Example7:studentrecorddistributionusing sub-record . . . 296

Example8:externalCfunctioncalls. . . 299

Writerangemapoperator. . . 301

Dataflowdiagram . . . 301

Properties . . . 301

Syntaxandoptions . . . 302

Usingthewriterangeoperator. . . 303

Chapter

8.

The

import/export

library

305

Recordschemas . . . 306

Example1:importschema . . . 306

Example2:exportschema . . . 306

Fieldandrecordproperties. . . 307

Completeandpartialschemas. . . 308

Implicitimportandexport . . . 312

Errorhandlingduringimport/export . . . . 315

ASCIIandEBCDICconversiontables . . . . 316

Importoperator . . . 322

Dataflowdiagram . . . 322

Properties . . . 322

Syntaxandoptions . . . 323

Howtoimportdata . . . 330

Example1:importingfromasingledatafile 332 Example2:importingfrommultipledatafiles 333 Exportoperator . . . 334

Dataflowdiagram . . . 334

Properties . . . 335

Syntaxandoptions . . . 336

Howtoexportdata . . . 340

Example1:datasetexporttoasinglefile . . . 344

Example2:DataSetExporttoMultiplefiles . . 345

Import/exportproperties . . . 346

Settingproperties . . . 346

Properties . . . 348

Properties:referencelisting. . . 355

Chapter

9.

The

partitioning

library

.

. 401

Theentirepartitioner. . . 401

Usingthepartitioner . . . 402

(9)

Properties . . . 402

Syntax. . . 403

Thehashpartitioner . . . 403

Specifyinghashkeys . . . 404

Example . . . 404

Usingthepartitioner . . . 405

Dataflowdiagram . . . 405

Properties . . . 405

Syntaxandoption. . . 406

Themoduluspartitioner. . . 407

Dataflowdiagram . . . 407

Properties . . . 408

Syntaxandoption. . . 408

Example . . . 408

Therandompartitioner . . . 409

Usingthepartitioner . . . 410

Dataflowdiagram . . . 410

Properties . . . 410

Syntax. . . 411

TherangePartitioner. . . 411

Considerationswhenusingrangepartitioning 412 Therangepartitioningalgorithm. . . 412

Specifyingpartitioningkeys . . . 412

Creatingarangemap . . . 413

Example:configuringandusingrange partitioner . . . 415

Usingthepartitioner . . . 415

Dataflowdiagram . . . 416

Properties . . . 416

Syntaxandoptions . . . 416

Writerangemapoperator. . . 418

Dataflowdiagram . . . 418

Properties . . . 419

Syntaxandoptions . . . 419

Usingthewriterangeoperator. . . 420

Themakerangemaputility . . . 421

Syntaxandoptions . . . 421

Usingthemakerangemaputility . . . 423

Theroundrobinpartitioner . . . 423

Usingthepartitioner . . . 424

Dataflowdiagram . . . 424

Properties . . . 424

Syntax. . . 424

Thesamepartitioner . . . 425

Usingthepartitioner . . . 425

Dataflowdiagram . . . 425

Properties . . . 426

Syntax. . . 426

Chapter

10.

The

collection

library

.

.

. 427

Theorderedcollector. . . 427

Orderedcollecting. . . 427

Properties . . . 428

Syntax. . . 428

Theroundrobincollector . . . 428

Roundrobincollecting . . . 429

Properties . . . 429

Syntax. . . 430

Thesortmergecollector . . . 430

Understandingthesortmergecollector . . . . 430

Dataflowdiagram . . . 430

Specifyingcollectingkeys . . . 431

Properties . . . 433

Syntaxandoptions . . . 433

Chapter

11.

The

restructure

library

435

Theaggtorecoperator . . . 435

Outputformats. . . 435

Properties . . . 436

Syntaxandoptions . . . 436

Example1:theaggtorecoperatorwithoutthe toplevelkeysoption . . . 438

Example2:theaggtorecoperatorwithmultiple keyoptions . . . 438

Example3:Theaggtorecoperatorwiththe toplevelkeysoption . . . 439

Thefield_exportoperator . . . 440

Dataflowdiagram . . . 440

Properties . . . 441

Syntaxandoptions . . . 441

Example . . . 442

Thefield_importoperator . . . 443

Dataflowdiagram . . . 443

Properties . . . 444

Syntaxandoptions . . . 444

Example . . . 445

Themakesubrecoperator . . . 447

Dataflowdiagram . . . 447

Properties . . . 448

Transferbehavior . . . 448

Subrecordlength . . . 448

Syntaxandoptions . . . 449

Themakevectoperator . . . 450

Dataflowdiagram . . . 450

Properties . . . 450

TransferBehavior . . . 451

Non-consecutivefields . . . 451

Syntaxandoption. . . 451

Example1:Themakevectoperator . . . 452

Example2:Themakevectoperatorwithmissing inputfields . . . 452

ThepromotesubrecOperator . . . 453

DataFlowDiagram . . . 453

Properties . . . 454

SyntaxandOption . . . 454

Example . . . 454

ThesplitsubrecOperator . . . 455

DataFlowDiagram . . . 455

Properties . . . 455

SyntaxandOption . . . 456

Example . . . 456

Thesplitvectoperator . . . 457

Dataflowdiagram . . . 457

Properties . . . 458

Syntaxandoption. . . 458

Example . . . 458

Thetagbatchoperator . . . 459

Taggedfieldsandoperatorlimitations . . . . 459

Operatoractionandtransferbehavior . . . . 460

Dataflowdiagram . . . 460

Properties . . . 461

(10)

Inputdatasetrequirements . . . 462

Syntaxandoptions . . . 462

Example1:Thetagbatchoperator,simple flatteningoftagcases . . . 464

Example2:Thetagbatchoperator,missingand duplicatecases . . . 465

Example3:Thetagbatchoperatorwithmultiple keys . . . 466

Thetagswitchoperator . . . 467

Dataflowdiagram . . . 467

Properties . . . 468

Inputandoutputinterfaceschemas . . . 468

Thecaseoption. . . 468

Usingtheoperator . . . 468

Syntaxandoptions . . . 469

Example1:Thetagswitchoperator,default behavior . . . 470

Example2:thetagswitchoperator,onecase chosen. . . 471

Chapter

12.

The

sorting

library

.

.

.

. 475

Thetsortoperator . . . 475

Configuringthetsortoperator. . . 477

Usingasorteddataset . . . 477

Specifyingsortingkeys . . . 478

Dataflowdiagram . . . 479

Properties . . . 479

Syntaxandoptions . . . 480

Example:usingasequentialtsortoperator . . 483

Example:usingaparalleltsortoperator . . . 484

Performingatotalsort . . . 485

Example:performingatotalsort . . . 487

Thepsortoperator . . . 489

Performingapartitionsort . . . 489

Configuringthepartitionsortoperator . . . . 491

Usingasorteddataset . . . 491

DataFlowDiagram . . . 493

Properties . . . 493

Syntaxandoptions . . . 494

Example:usingasequentialpartitionsort operator . . . 496

Example:usingaparallelpartitionsortoperator 497 Performingatotalsort . . . 498

Rangepartitioning. . . 500

Example:PerformingaTotalSort. . . 501

Chapter

13.

The

join

library

.

.

.

.

. 505

Dataflowdiagrams . . . 505

Properties . . . 506

Transferbehavior . . . 507

Inputdatasetrequirements . . . 507

Memoryuse. . . 507

Jobmonitorreporting . . . 507

Comparisonwithotheroperators. . . 507

Inputdatausedintheexamples . . . 508

innerjoinoperator . . . 508

Syntaxandoptions . . . 509

Example . . . 509

leftouterjoinoperator. . . 510

Syntaxandoptions . . . 510

Example . . . 511

rightouterjoinoperator . . . 512

Syntaxandoptions . . . 512

Example . . . 513

fullouterjoinoperator. . . 514

Syntaxandoptions . . . 514

Example . . . 515

Chapter

14.

The

ODBC

interface

library

.

.

.

.

.

.

.

.

.

.

.

.

.

. 517

AccessingODBCfromWebSphereDataStage. . . 517

NationalLanguageSupport . . . 517

ICUcharactersetoptions . . . 518

MappingbetweenODBCandICUcharactersets 518 Theodbcreadoperator . . . 518

Dataflowdiagram . . . 519

Properties . . . 519

Syntaxandoptions . . . 519

Operatoraction. . . 521

Columnnameconversion . . . 521

Datatypeconversion. . . 522

Externaldatasourcerecordsize . . . 522

Readingexternaldatasourcetables . . . 522

Joinoperations . . . 523

Example1:readinganexternaldatasourcetable andmodifyingafieldname . . . 523

Theodbcwriteoperator . . . 524

Writingtoamultibytedatabase . . . 525

Dataflowdiagram . . . 525

Properties . . . 525

Operatoraction. . . 525

Wheretheodbcwriteoperatorruns . . . 526

Syntaxandoptions . . . 528

Example1:writingtoanexistingexternaldata sourcetable . . . 530

Example2:creatinganexternaldatasourcetable 531 Example3:writingtoanexternaldatasource tableusingthemodifyoperator . . . 532

Otherfeatures . . . 533

Theodbcupsertoperator . . . 533

Dataflowdiagram . . . 533

Properties . . . 534

Operatoraction. . . 534

Syntaxandoptions . . . 535

Example . . . 536

Theodbclookupoperator . . . 537

Dataflowdiagram . . . 538

Properties . . . 539

Syntaxandoptions . . . 539

Example . . . 541

Chapter

15.

The

SAS

interface

library

543

UsingWebSphereDataStagetorunSAScode . . 543

WritingSASprograms . . . 543

UsingSASonsequentialandparallelsystems 544 PipelineparallelismandSAS . . . 545

ConfiguringyoursystemtousetheSAS interfaceoperators. . . 545

(11)

RepresentingSASandnon-SASDatain

DataStage . . . 547

GettinginputfromaSASdataset . . . 548

GettinginputfromaWebSphereDataStagedata setoraSASdataset . . . 549

Convertingbetweendatasettypes . . . 550

ConvertingSASdatatoWebSphereDataStagedata 551 aWebSphereDataStageexample . . . 552

ParallelizingSASsteps . . . 553

ExecutingPROCstepsinparallel. . . 559

SomepointstoconsiderinparallelizingSAS code . . . 563

UsingSASwithEuropeanlanguages . . . 564

UsingSAStodoETL. . . 565

TheSASinterfaceoperators . . . 566

SpecifyingacharactersetandSASmode . . . 566

ParallelSASdatasetsandSASInternational 568 Specifyinganoutputschema . . . 569

Controllingustringtruncation. . . 569

Generatingaproccontentsreport . . . 570

WebSphereDataStage-insertedpartitionandsort components . . . 570

Longnamesupport . . . 570

Environmentvariables . . . 571

Thesasinoperator. . . 572

Dataflowdiagram . . . 573

Properties . . . 573

Syntaxandoptions . . . 573

Thesasoperator . . . 577

Dataflowdiagram . . . 577

Properties . . . 577

Syntaxandoptions . . . 578

Thesasoutoperator . . . 582

Dataflowdiagram . . . 583

Properties . . . 583

Syntaxandoptions . . . 583

Thesascontentsoperator . . . 585

Dataflowdiagram . . . 585

Properties . . . 586

Syntaxandoptions . . . 586

Examplereports . . . 587

Chapter

16.

The

Oracle

interface

library

.

.

.

.

.

.

.

.

.

.

.

.

.

. 589

AccessingOraclefromWebSphereDataStage. . . 589

Changinglibrarypaths . . . 589

Preservingblanksinfields . . . 589

Handling#and$charactersinOraclecolumn names. . . 590

NationalLanguageSupport . . . 590

ICUcharactersetoptions . . . 590

MappingbetweenICUandOraclecharactersets 591 Theorareadoperator. . . 591

Dataflowdiagram . . . 591

Properties . . . 592

Operatoraction. . . 592

Wheretheorareadoperatorruns . . . 593

Columnnameconversion . . . 593

Datatypeconversion. . . 594

Oraclerecordsize . . . 594

Targetingthereadoperation . . . 594

Joinoperations . . . 595

Syntaxandoptions . . . 595

Example1:readinganOracletableand modifyingafieldname . . . 598

Example2:readingfromanOracletablein parallelwiththequeryoption. . . 599

Theorawriteoperator . . . 599

Writingtoamultibytedatabase . . . 599

Dataflowdiagram . . . 600

Properties . . . 600

Operatoraction. . . 600

Datatypeconversion. . . 602

Writemodes. . . 602

Matchedandunmatchedfields . . . 603

Syntaxandoptions . . . 604

Example1:writingtoanexistingOracletable 610 Example2:creatinganOracletable . . . 611

Example3:writingtoanOracletableusingthe modifyoperator . . . 611

Theoraupsertoperator . . . 612

Dataflowdiagram . . . 613

Properties . . . 613

OperatorAction . . . 613

Associatedenvironmentvariables . . . 614

Syntaxandoptions . . . 615

Example . . . 617

Theoralookupoperator . . . 617

Dataflowdiagram . . . 619

Properties . . . 619

Syntaxandoptions . . . 619

Example . . . 621

Chapter

17.

The

DB2

interface

library

623

ConfiguringWebSphereDataStageaccess . . . . 623

EstablishingaremoteconnectiontoaDB2 server . . . 624

Handling#and$charactersinDB2column names. . . 624

Usingthe-padcharoption . . . 625

RunningmultipleDB2interfaceoperatorsina singlestep . . . 625

NationalLanguageSupport . . . 626

Specifyingcharactersettings . . . 626

Preventingcharacter-setconversion . . . 626

Thedb2readoperator . . . 627

Dataflowdiagram . . . 627

Properties . . . 627

Operatoraction. . . 627

ConversionofaDB2resultsettoaWebSphere DataStagedataset. . . 628

Targetingthereadoperation . . . 629

Specifyingopenandclosecommands . . . . 630

Syntaxandoptions . . . 631

Example1:readingaDB2tablewiththetable option. . . 633

Example2:readingaDB2tablesequentially withthe-queryoption . . . 634

Example3:readingatableinparallelwiththe -queryoption . . . 634

Thedb2writeanddb2loadoperators . . . 635

(12)

Properties . . . 635

Actionsofthewriteoperators. . . 636

HowWebSphereDataStagewritesthetable:the defaultSQLINSERTstatement . . . 636

FieldconventionsinwriteoperationstoDB2 637 Datatypeconversion. . . 637

Writemodes. . . 638

Matchedandunmatchedfields . . . 639

Syntaxandoptions . . . 639

db2loadspecialcharacteristics. . . 646

Example1:AppendingDatatoanExistingDB2 Table . . . 647

Example2:writingdatatoaDB2tablein truncatemode . . . 648

Example3:handlingunmatchedWebSphere DataStagefieldsinaDB2writeoperation . . . 649

Example4:writingtoaDB2tablecontainingan unmatchedcolumn . . . 650

Thedb2upsertoperator . . . 651

Partitioningfordb2upsert . . . 651

Dataflowdiagram . . . 651

Properties . . . 651

Operatoraction. . . 652

Syntaxandoptions . . . 653

Thedb2partoperator. . . 655

Syntaxandoptions . . . 656

Example . . . 657

Thedb2lookupoperator. . . 658

Dataflowdiagram . . . 659

Properties . . . 659

Syntaxandoptions . . . 659

Example . . . 661

ConsiderationsforreadingandwritingDB2tables 662 Datatranslationanomalies . . . 662

Usinganodemap. . . 662

Chapter

18.

The

Informix

interface

library

.

.

.

.

.

.

.

.

.

.

.

.

.

. 665

ConfiguringtheINFORMIXuserenvironment . . 665

ReadoperatorsforInformix . . . 665

Dataflowdiagram . . . 666

Readoperatoraction . . . 666

Executionmode . . . 667

Columnnameconversion . . . 667

Datatypeconversion. . . 667

Example1:ReadingalldatafromanInformix table . . . 668

WriteoperatorsforInformix . . . 669

Dataflowdiagram . . . 670

Operatoraction. . . 670

Executionmode . . . 670

Columnnameconversion . . . 670

Datatypeconversion. . . 671

Writemodes. . . 671

MatchingWebSphereDataStagefieldswith columnsofInformixtable . . . 672

Limitations . . . 672

Example2:Appendingdatatoanexisting Informixtable . . . 673

Example3:writingdatatoanINFORMIXtable Example4:HandlingunmatchedWebSphere DataStagefieldsinanInformixwriteoperation . 675

Example5:WritingtoanINFORMIXtablewith anunmatchedcolumn . . . 676

hplreadoperator . . . 677

Specialoperatorfeatures . . . 677

Establishingaremoteconnectiontothehplread operator . . . 677

Dataflowdiagram . . . 678

Propertiesofthehplreadoperator . . . 678

Syntaxandoptions . . . 679

Example . . . 680

hplwriteoperatorforInformix. . . 680

Specialoperatorfeatures . . . 680

Dataflowdiagram . . . 681

Propertiesofthehplwriteoperator . . . 681

Syntaxandoptions . . . 681

Examples. . . 683

infxreadoperator . . . 683

DataFlowDiagram . . . 683

Properties . . . 684

SyntaxandOptions . . . 684

Example . . . 685

infxwriteoperator . . . 686

Dataflowdiagram . . . 686

Propertiesoftheinfxwriteoperator . . . 686

Syntaxandoptions . . . 686

Examples. . . 688

xpsreadoperator . . . 688

Dataflowdiagram . . . 689

Propertiesofthexpsreadoperator . . . 689

Syntaxandoptions . . . 689

Example . . . 691

xpswriteoperator . . . 691

Dataflowdiagram . . . 691

Propertiesofthexpswriteoperator . . . 692

Syntaxandoptions . . . 692

Examples. . . 693

Chapter

19.

The

Teradata

interface

library

.

.

.

.

.

.

.

.

.

.

.

.

.

. 695

Nationallanguagesupport . . . 695

Teradatadatabasecharactersets . . . 695

Japaneselanguagesupport . . . 695

SpecifyingaWebSphereDataStageustring characterset. . . 696

Terareadoperator . . . 696

Dataflowdiagram . . . 697

Properties . . . 697

Specifyingthequery . . . 697

Columnnameanddatatypeconversion . . . 698

terareadrestrictions . . . 699

SyntaxandOptions . . . 699

TerawriteOperator . . . 701

Dataflowdiagram . . . 702

Properties . . . 702

ColumnNameandDataTypeConversion. . . 702

Correctingloaderrors . . . 703

Writemodes. . . 704

(13)

Restrictions . . . 705

SyntaxandOptions . . . 705

Chapter

20.

The

Sybase

interface

library

.

.

.

.

.

.

.

.

.

.

.

.

.

. 709

AccessingSybasefromWebSphereDataStage. . . 709

Sybaseclientconfiguration . . . 709

NationalLanguageSupport . . . 710

TheasesybasereadeandsybasereadeOperators . . 710

Dataflowdiagram . . . 710

Properties . . . 710

OperatorAction . . . 711

WhereasesybasereadeandsybasereadeRun . . 711

Columnnameconversion . . . 711

Datatypeconversion. . . 711

Targetingthereadoperation . . . 712

JoinOperations. . . 713

SyntaxandOptions . . . 713

Example1:ReadingaSybaseTableand ModifyingaFieldName . . . 715

TheasesybasewriteandsybasewriteOperators . . 716

WritingtoaMultibyteDatabase . . . 716

Dataflowdiagram . . . 716

Properties . . . 717

OperatorAction . . . 717

WhereasesybasewriteandsybasewriteRun . . 717

DataconventionsonwriteoperationstoSybase 718 Datatypeconversion. . . 718

WriteModes . . . 719

Matchedandunmatchedfields . . . 719

SyntaxandOptions . . . 720

Example1:WritingtoanExistingSybaseTable 723 Example2:CreatingaSybaseTable . . . 724

Example3:WritingtoaSybaseTableUsingthe modifyOperator . . . 725

TheasesybaseupsertandsybaseupsertOperators 726 Dataflowdiagram . . . 727

Properties . . . 727

OperatorAction . . . 727

SyntaxandOptions . . . 728

Example . . . 730

TheasesybaselookupandsybaselookupOperators 731 Dataflowdiagram . . . 732

Properties . . . 732

SyntaxandOptions . . . 732

Example . . . 737

Chapter

21.

The

SQL

Server

interface

library

.

.

.

.

.

.

.

.

.

.

.

.

.

. 739

AccessingSQLServerfromWebSphereDataStage 739 UNIX . . . 739

Windows. . . 739

NationalLanguageSupport . . . 739

Thesqlsrvrreadoperator . . . 739

Dataflowdiagram . . . 740

Properties . . . 740

Operatoraction. . . 740

Wherethesqlsrvrreadoperatorruns . . . . 741

Columnnameconversion . . . 741

Datatypeconversion. . . 741

SQLServerrecordsize . . . 742

Targetingthereadoperation . . . 742

Joinoperations . . . 742

Syntaxandoptions . . . 743

Example1:ReadingaSQLServertableand modifyingafieldname . . . 744

Thesqlsrvrwriteoperator . . . 745

Writingtoamultibytedatabase . . . 745

Dataflowdiagram . . . 746

Properties . . . 746

Operatoraction. . . 746

Wherethesqlsrvrwriteoperatorruns . . . . 746

DataconventionsonwriteoperationstoSQL Server. . . 747

Writemodes. . . 748

Syntaxandoptions . . . 749

Example1:WritingtoanexistingSQLServer table . . . 751

Example2:CreatingaSQLServertable . . . 752

Example3:WritingtoaSQLServertableusing themodifyoperator . . . 753

Thesqlsrvrupsertoperator . . . 754

Dataflowdiagram . . . 754

Properties . . . 754

Operatoraction. . . 754

Syntaxandoptions . . . 755

Example . . . 757

Thesqlsrvrlookupoperator. . . 758

Dataflowdiagram . . . 759

Properties . . . 759

Syntaxandoptions . . . 760

Example . . . 762

Chapter

22.

The

iWay

interface

library

763

AccessingiWayfromWebSphereDataStage . . . 763

NationalLanguageSupport . . . 763

Theiwayreadoperator . . . 763

Dataflowdiagram . . . 764

Properties . . . 764

Operatoraction. . . 764

Datatypeconversion. . . 764

Syntaxandoptions . . . 765

Example:ReadingatableviaiWay . . . 767

Theiwaylookupoperator . . . 767

Dataflowdiagram . . . 769

Properties . . . 769

Syntaxandoptions . . . 769

Example:lookingupatableviaiWay . . . . 772

Chapter

23.

The

Netezza

Interface

Library

.

.

.

.

.

.

.

.

.

.

.

.

.

. 773

Netezzawriteoperator . . . 773

Netezzadataloadmethods. . . 773

nzloadmethod . . . 773

Externaltablemethod . . . 773

Writemodes. . . 773

Limitationsofwriteoperation. . . 774

Charactersetlimitations. . . 774

Badinputrecords . . . 774

(14)

Syntaxfornzwriteoperation . . . 775

Chapter

24.

The

Classic

Federation

interface

library

.

.

.

.

.

.

.

.

.

. 777

AccessingthefederateddatabasefromWebSphere DataStage . . . 777

Nationallanguagesupport . . . 777

Internationalcomponentsforunicodecharacter setparameter . . . 778

MappingbetweenfederatedandICUcharacter sets. . . 778

Readoperationswithclassicfedread. . . 778

classicfedreadproperties. . . 779

classicfedreadoperatorsyntax. . . 779

Columnnameconversion . . . 781

Datatypeconversion. . . 781

Readingexternaldatasourcetables . . . 782

Writeoperationswithclassicfedwrite . . . 782

Matchedandunmatchedfields . . . 783

classicfedwritesyntax . . . 783

Writingtomultibytedatabases . . . 786

Insertandupdateoperationswithclassicfedupsert 787 classicfedupsertProperties . . . 788

classicfedupsertsyntax . . . 788

Exampleofafederatedtablewhena classicfedupsertoperationisperformed . . . 789

LookupOperationswithclassicfedlookup . . . . 790

classicfedlookupproperties. . . 791

classicfedlookupsyntax . . . 792

Exampleofaclassicfedlookupoperation . . . 793

Chapter

25.

WebSphere

DataStage

development

kit

(job

control

interfaces)

.

.

.

.

.

.

.

.

.

.

.

.

. 795

WebSphereDataStagedevelopmentkit . . . 795

Thedsapi.hheaderfile . . . 795

Datastructures,resultdata,andthreads . . . 795

WritingWebSphereDataStageAPIprograms 796 BuildingaWebSphereDataStageAPI application . . . 796 Redistributingapplications . . . 797 APIfunctions . . . 797 DSAddEnvVar . . . 798 DSAddProject . . . 799 DSCloseJob . . . 800 DSCloseProject . . . 800 DSDeleteEnvVar . . . 801 DSDeleteProject . . . 801 DSFindFirstLogEntry. . . 802 DSFindNextLogEntry. . . 803 DSGetProjectList . . . 804 DSGetJobInfo . . . 805 DSGetLastError. . . 807 DSGetLastErrorMsg . . . 807 DSGetLinkInfo . . . 808 DSGetLogEntry. . . 809 DSGetNewestLogId . . . 810 DSGetParamInfo . . . 811 DSGetProjectInfo . . . 812 DSGetProjectList . . . 813 DSGetReposInfo . . . 813 DSGetReposUsage. . . 814 DSGetStageInfo. . . 815 DSGetProjectList . . . 817 DSListEnvVars . . . 818 DSListProjectProperties . . . 819 DSLockJob . . . 820 DSLogEvent. . . 820 DSMakeJobReport. . . 821 DSOpenJob . . . 822 DSOpenProject . . . 823 DSRunJob . . . 823 DSSetEnvVar . . . 824 DSSetGenerateOpMetaData. . . 825 DSSetJobLimit . . . 826 DSSetParam. . . 827 DSSetProjectProperty. . . 828 DSSetServerParams . . . 829 DSStopJob . . . 830 DSUnlockJob . . . 830 DSWaitForJob . . . 831 DataStructures. . . 831 DSCUSTINFO . . . 832 DSJOBINFO. . . 832 DSLINKINFO . . . 835 DSLOGDETAIL . . . 836 DSLOGEVENT. . . 837 DSPARAM . . . 837 DSPARAMINFO . . . 839 DSPROJECTINFO . . . 840 DSREPOSINFO. . . 841 DSREPOSUSAGE . . . 842 DSSTAGEINFO. . . 842 DSLINKINFO . . . 844 Errorcodes . . . 845

WebSphereDataStageBASICInterface . . . 850

DSAttachJob. . . 851 DSCheckRoutine . . . 852 DSDetachJob . . . 852 DSExecute . . . 853 DSGetCustInfo . . . 853 DSIPCPageProps . . . 854 DSGetJobInfo . . . 854 DSGetJobMetaBag. . . 857 DSGetLinkInfo . . . 857 DSGetLinkMetaData . . . 858 DSGetLogEntry. . . 859 DSGetLogSummary . . . 860 DSGetNewestLogId . . . 861 DSGetParamInfo . . . 861 DSGetProjectInfo . . . 863 DSGetStageInfo. . . 863 DSGetStageLinks . . . 865 DSGetStagesOfType . . . 866 DSGetStagesTypes. . . 866 DSGetVarInfo . . . 867 DSLogEvent. . . 867 DSLogFatal . . . 868 DSLogInfo . . . 868 DSLogToController . . . 869 DSLogWarn. . . 869

(15)

DSMakeJobReport. . . 869 DSMakeMsg. . . 870 DSPrepareJob . . . 871 DSRunJob . . . 871 DSSendMail. . . 872 DSSetGenerateOpMetaData. . . 873 DSSetJobLimit . . . 873 DSSetParam. . . 874 DSSetUserStatus . . . 874 DSStopJob . . . 875 DSTransformError. . . 875 DSTranslateCode . . . 876 DSWaitForFile . . . 876 DSWaitForJob . . . 877

JobStatusMacros . . . 877

Commandlineinterface. . . 878

CommandsforcontrollingWebSphere DataStagejobs . . . 878

CommandsforadministeringWebSphere DataStage . . . 888

Commandsforsearchingjobs . . . 892

XMLschemasandsamplestylesheets . . . 896

Accessing

information

about

the

product

.

.

.

.

.

.

.

.

.

.

.

.

.

. 897

ContactingIBM . . . 897

Accessibledocumentation . . . 897

Providingcommentsonthedocumentation . . . 897

Notices

and

trademarks

.

.

.

.

.

.

. 899

Notices . . . 899

Trademarks . . . 901

(16)
(17)

Chapter

1.

Introduction

Thismanual isintended fortheIBM®WebSphere®DataStage®user whohasmastered thebasics of paralleljobdesignandnowwantstoprogressfurther.

Themanualcoversthefollowingtopics:

v JobDesign Tips.Thischaptercontainsmiscellaneous tipsaboutdesigningparalleljobs, fromuseofthe

WebSphereDataStageDesignerinterface tohandlinglargevolumesofdata.

v ImprovingPerformance.Thischapterdescribes methodsbywhichyoucanevaluatetheperformanceof

yourparalleljobdesigns andcomeupwith strategiesforimprovingthem.

v LinkBuffering. Thischaptercontainsanin-depthdescriptionofwhenandhow WebSphereDataStage

buffersdatawithin ajob,andhow youcanchangetheautomaticsettingsifrequired.

v SpecifyingYourOwnParallelStages.ThischapterdescribetheinterfaceWebSphereDataStageprovides

fordefiningyourown paralleljobstagetypes.

v EnvironmentVariables.Thischapterlistalltheenvironmentvariablesthatareavailablefor affecting

thesetupandoperationofparalleljobs.

v Operators.Anumberofchaptersdescribetheoperatorsthatunderlietheparallel jobstages.

v

WebSphereDataStageDevelopmentKit(JobControlInterfaces).Thischapterliststhevariousinterfaces

thatenableyoutorunandcontrolWebSphereDataStagejobswithoutusingtheWebSphereDataStage Directorclient.

Terminology

Becauseofthetechnicalnatureof someof thedescriptionsinthis manual,itsometimestalksabout detailsof theengine thatdrivesparalleljobs.Thisinvolvestheuseoftermsthatmaybeunfamiliarto ordinaryparallel jobusers.

v Operators.TheseunderliethestagesinaWebSphereDataStagejob.Asinglestagemaycorrespondtoa

singleoperator, oranumberofoperators,dependingonthepropertiesyouhaveset, andwhetheryou havechosento partitionorcollect orsortdataontheinputlinktoa stage.Atcompilation,WebSphere DataStageevaluatesyour jobdesignandwillsometimesoptimizeoperatorsoutif theyarejudgedto besuperfluous,or insertotheroperatorsiftheyareneededforthelogicofthejob.

v OSH.ThisisthescriptinglanguageusedinternallybytheWebSphereDataStageparallelengine.

v Players.Playersaretheworkhorseprocessesina paralleljob.Thereisgenerallyaplayer foreach

operatoroneachnode.Playersarethechildrenofsectionleaders;thereisonesectionleaderper processingnode.Sectionleadersarestartedbytheconductor processrunningontheconductornode (theconductornodeisdefinedintheconfigurationfile).

(18)
(19)

Chapter

2.

Job

design

tips

Thischaptergivessomehintsandtipsforthegooddesignofparalleljobs.

WebSphere

DataStage

Designer

interface

Thefollowingaresometipsforsmooth useof theWebSphereDataStageDesignerwhenactuallylaying outyourjobonthecanvas.

v Tore-arrangeanexistingjobdesign,orinsert newstagetypesintoanexistingjobflow,first disconnect

thelinksfromthestagetobechanged,then thelinkswillretainanymetadataassociatedwith them. v ALookupstagecanonlyhaveoneinputstream,oneoutputstream, and,optionally,onerejectstream.

Dependingonthetypeoflookup,itcanhaveseveralreferencelinks.Tochangetheuseofparticular Lookuplinksinanexistingjobflow,disconnectthelinksfromtheLookupstage andthenright-click to changethelinktype, forexample,Stream toReference.

v TheCopystageisagoodplaceholderbetweenstagesifyouanticipatethatnew stagesorlogicwillbe

neededinthefuturewithoutdamagingexistingpropertiesandderivations.Wheninsertinga new stage,simplydragtheinputandoutputlinksfromtheCopyplaceholdertothenewstage.Unlessthe ForcepropertyissetintheCopystage,WebSphereDataStage optimizestheactualcopyoutat runtime.

Processing

large

volumes

of

data

Theabilitytoprocesslargevolumesofdatainashortperiodoftimedependsonall aspectsoftheflow andtheenvironmentbeingoptimizedformaximumthroughputand performance.Performancetuning andoptimizationareiterativeprocessesthatbeginwithjobdesignandunittests, proceedthrough

integrationandvolumetesting, andcontinuethroughouttheproductionlifecycleoftheapplication.Here aresomeperformancepointers:

v Whenwriting intermediateresultsthatwillonlybeshared betweenparallel jobs,alwayswriteto

persistentdatasets(usingData Setstages).Youshouldensurethatthedataispartitioned,andthatthe partitions,andsortorder,areretainedateverystage.AvoidformatconversionorserialI/O.

v DataSetstagesshouldbeusedtocreaterestartpointsintheevent thata joborsequenceneedstobe

rerun.But,becausedatasetsareplatformandconfigurationspecific,theyshouldnotbeusedfor long-termbackupand recoveryofsourcedata.

v Dependingonavailablesystem resources,itmaybepossibletooptimizeoverallprocessingtimeatrun

timebyallowingsmallerjobsto runconcurrently.However, caremustbe takentoplanforscenarios whensourcefilesarrivelaterthanexpected,orneedtobereprocessed intheeventofa failure. v Parallelconfigurationfilesallowthedegreeofparallelismand resourcesusedbyparalleljobstobe set

dynamicallyat runtime.Multipleconfigurationfilesshouldbe usedtooptimizeoverallthroughput andtomatchjobcharacteristicstoavailablehardwareresourcesindevelopment,test,and production modes.

Theproperconfigurationofscratchand resourcedisksandtheunderlyingfilesystemandphysical hardwarearchitecturecansignificantlyaffectoveralljobperformance.

WithinclusteredETLand databaseenvironments,resource-pool namingcanbe usedtolimit processing tospecificnodes, includingdatabase nodeswhenappropriate.

Modular

development

Youshouldaimtousemodulardevelopment techniquesinyourjobdesigns inordertomaximizethe reuseofparalleljobs andcomponentsandsave yourselftime.

(20)

v Usejobparametersinyourdesignandsupplyvaluesat runtime.Thisallowsa singlejobdesignto

processdifferentdataindifferentcircumstances,ratherthanproducingmultiple copiesofthesame job withslightlydifferentarguments.

v UsingjobparametersallowsyoutoexploittheWebSphereDataStageDirector’smultiple invocation

capability.Youcanrunseveralinvocationsofa jobatthesametimewithdifferentruntimearguments. v Useshared containerstosharecommonlogicacrossanumber ofjobs.Rememberthatshared

containersare insertedwhena jobiscompiled.Ifthesharedcontainer ischanged,thejobsusingitwill needrecompiling.

Designing

for

good

performance

Hereare sometipsfordesigninggoodperformance intoyourjobfromtheoutset.

Avoid

unnecessary

type

conversions.

Becarefultousepropersourcedatatypes,especiallyfromOracle.Youcansetthe

OSH_PRINT_SCHEMASenvironmentvariable toverifythatruntimeschemasmatchthejobdesign columndefinitions.

Ifyouare usingstage variablesona Transformerstage,ensurethattheirdatatypesmatchtheexpected resulttypes.

Use

Transformer

stages

sparingly

and

wisely

Donothavemultiplestageswherethefunctionalitycouldbeincorporatedintoa singlestage,anduse otherstagetypestoperformsimpletransformation operations(see″UsingTransformerStages″formore guidance).

Increase

sort

performance

where

possible

Carefuljobdesigncanimprovetheperformanceofsortoperations,bothinstandalone Sortstagesandin on-linksortsspecifiedintheInputspagePartitioningtabofotherstagetypes.See″Sorting Data″for guidance.

Remove

unneeded

columns

Removeunneededcolumns asearly aspossiblewithinthejobflow.Everyadditionalunusedcolumn requiresadditionalbuffermemory,whichcanimpactperformanceand makeeachrowtransfer fromone stagetothenextmoreexpensive.Ifpossible,whenreadingfromdatabases,useaselectlisttoreadjust thecolumnsrequired,ratherthantheentiretable.

Avoid

reading

from

sequential

files

using

the

Same

partitioning

method.

Unlessyouhavespecifiedmorethanonesourcefile,this willresult intheentirefilebeingreadintoa singlepartition, makingtheentiredownstreamflowrunsequentiallyunlessyouexplicitlyrepartition(see ″UsingSequential FileStages″formoretipsonusingSequential filestages).

Combining

data

Thetwomajor waysofcombiningdataina WebSphereDataStagejobare viaaLookupstageora Join stage.Howdoyoudecidewhichonetouse?

LookupandJoinstagesperformequivalentoperations:combiningtwoormoreinputdatasetsbasedon oneormorespecifiedkeys.Whenoneunsortedinputisverylargeor sortingisnotfeasible,Lookupis preferred.Whenallinputsare ofmanageablesize orarepre-sorted,Joinisthepreferredsolution.

(21)

TheLookupstageismostappropriatewhenthereferencedataforallLookupstagesinajobissmall enoughtofitintoavailable physicalmemory.Eachlookupreferencerequiresacontiguousblockof physicalmemory.TheLookupstagerequiresallbutthefirstinput(theprimaryinput)tofitintophysical memory.

IfthereferencetoalookupisdirectlyfromaDB2®or Oracletableandthenumberofinputrowsis significantlysmallerthanthereferencerows, 1:100ormore,a SparseLookupmaybe appropriate. IfperformanceissuesarisewhileusingLookup,consider usingtheJoinstage.TheJoin stagemustbe usedif thedatasetsarelargerthanavailable memoryresources.

Sorting

data

Lookatjobdesignsand trytoreorderthejobflowtocombineoperationsaroundthesamesortkeysif possible,andcoordinateyour sortingstrategy withyourhashingstrategy.Itissometimespossibleto rearrangetheorderofbusinesslogicwithina jobflowtoleveragethesamesortorder,partitioning,and groupings.

Ifdatahasalreadybeenpartitionedand sortedonaset ofkeycolumns,specifythe″don’tsort,

previouslysorted″option forthekeycolumns intheSortstage.Thisreducesthecostofsortingand takes greateradvantageof pipelineparallelism.

Whenwritingtoparalleldatasets, sortorderand partitioningarepreserved.Whenreadingfromthese datasets,trytomaintain thissortingif possiblebyusingSamepartitioningmethod.

Thestablesort optionismuchmoreexpensivethannon-stablesorts, andshouldonlybe usedif thereis aneedtomaintain roworderotherthanasneededtoperformthesort.

Theperformanceofindividual sortscanbe improvedbyincreasingthememoryusageperpartitionusing theRestrictMemoryUsage(MB)optionoftheSortstage.Thedefaultsettingis20MBperpartition. Note thatsortmemoryusagecanonlybespecifiedforstandalone Sortstages,itcannotbechangedforinline (onalink)sorts.

Default

and

explicit

type

conversions

Whenyouaremappingdatafromsourcetotarget youmayneedtoperform datatypeconversions.Some conversionshappenautomatically,andthesecantakeplaceacrosstheoutputmappingofanyparallel job stagethathasan inputandanoutputlink.Other conversionsneeda functiontoexplicitlyperform the conversion.Thesefunctionscanbe calledfromaModifystage oraTransformerstage,andarelisted in AppendixB ofWebSphereDataStageParallelJobDeveloperGuide.(Modifyisthepreferredstage forsuch conversions-see″UsingTransformerStages″.)

Thefollowingtableshowswhichconversionsareperformedautomaticallyand whichneedtobe explicitlyperformed.″d″ indicatesautomatic(default)conversion,″m″indicatesthatmanualconversion isrequired, ablanksquareindicatesthatconversionisnotpossible:

Source Field

DestinationField

int8 uint8 int16 uint16 int32 uint32 int64 uint64 sfloat dfloat

int8 d d d d d d d d dm

uint8 d d d d d d d d d

int16 dm d d d d d d d d

(22)

Source Field

DestinationField

int8 uint8 int16 uint16 int32 uint32 int64 uint64 sfloat dfloat

int32 dm d d d d d d d d uint32 d d d d d d d d d int64 dm d d d d d d d d uint64 d d d d d d d d d sfloat dm d d d d d d d d dfloat dm d d d d d d d d decimal dm d d d dm d dm dm d dm string dm d dm d d dm d d d dm ustring dm d dm d d dm d d d dm raw m m date m m m m time m m m time stamp m m m SourceField

decimal string ustring raw date time timestamp

int8 d dm dm m m m uint8 d d d int16 d dm dm uint16 d dm dm int32 d dm dm m m uint32 d m m m int64 d d d uint64 d d d sfloat d d d dfloat dm dm dm m m decimal dm dm string dm d m m m ustring dm d m m raw date m m m time m m dm timestamp m m m m

d=defaultconversion;m =modifyoperatorconversion; blank=noconversionneededorprovided Youshouldalso notethefollowingpoints abouttype conversion:

(23)

v TheenvironmentvariableAPT_STRING_PADCHARcanbe usedtochangethedefaultpadcharacter

fromanASCIINULL(0x0)toanothercharacter;forexample,anASCII space(Ox20) oraunicode space(U+0020).

v Asanalternatesolution,thePadStringfunction canbeusedtopada variable-length(Varchar) stringto

aspecified lengthusinga specifiedpadcharacter.NotethatPadStringdoesnotworkwithfixed-length (Char)stringtypes.Youmust firstconvertChartoVarcharbeforeusingPadString.

Using

Transformer

stages

Ingeneral,it isgoodpracticenottousemore Transformerstagesthanyouhaveto.Youshouldespecially avoidusingmultipleTransformerstageswherethelogiccanbe combinedintoa singlestage.

Itisoftenbetter touseotherstagetypesforcertaintypesofoperation: v Usea CopystageratherthanaTransformerforsimpleoperationssuchas:

– Providingajobdesignplaceholderonthecanvas.(Providedyoudo notset theForcepropertyto

TrueontheCopystage,thecopywillbe optimizedoutofthejobatruntime.) – Renamingcolumns.

– Droppingcolumns.

– Implicittype conversions(see″DefaultandExplicit TypeConversions″).

Notethat,ifruntimecolumn propagationisdisabled,youcanalsouseoutputmappingonastage torename,drop,or convertcolumns onastagethathasbothinputsand outputs.

v UsetheModifystageforexplicit typeconversion(see″DefaultandExplicit TypeConversions″)and

nullhandling.

v Wherecomplex,reusablelogicisrequired,orwhereexistingTransformer-stagebased jobflowsdonot

meetperformance requirements,consider buildingyour owncustomstage(seeChapter5,“Specifying yourown parallelstages,”onpage31,)

v Usea BASICTransformerstage whereyouwanttotakeadvantageofuser-definedfunctionsand

routines.

Using

Sequential

File

stages

Certainconsiderationsapplywhenreadingand writingfixed-lengthfieldsusingtheSequentialFilestage. v Ifreadingcolumnsthathaveaninherentlyvariable-widthtype (forexample,integer, decimal,or

varchar)then youshouldsettheFieldWidth propertytospecifytheactualfixed-widthoftheinput column.DothisbyselectingEdit Row...fromtheshortcutmenufora particularcolumninthe Columnstab, andspecifythewidthintheEditColumn MetaData dialogbox.

v Ifwritingfixed-widthcolumnswith typesthatare inherentlyvariable-width,then settheFieldWidth

propertyandthePadcharproperty intheEditColumnMetaDatadialogboxtomatchthewidthof theoutputcolumn.

Otherconsiderations areasfollows: v

Ifacolumn isnullable, youmust definethenullfieldvalue andlengthintheEditColumnMetaData

dialogbox.

v Becareful whenreadingdelimited,bounded-lengthvarcharcolumns(i.e.,varcharswiththelength

optionset).Ifthesourcefilehasfields whicharelongerthanthemaximumvarchar length,theseextra charactersaresilently discarded.

v AvoidreadingfromsequentialfilesusingtheSamepartitioning method.Unlessyouhavespecified

morethanonesourcefile,thiswillresult intheentirefilebeingread intoasinglepartition, makingthe entiredownstreamflowrunsequentiallyunlessyouexplicitlyrepartition.

(24)

Using

Database

stages

Thebestchoiceistouseconnectorstagesifavailable foryourdatabase.Thenext bestchoicearethe Enterprise’databasestagesasthesegive maximumparallelperformanceand featureswhencompared to ’plug-in’stages.TheEnterprisestagesare:

v DB2/UDBEnterprise v Informix ®Enterprise v OracleEnterprise v TeradataEnterprise v SQLServerEnterprise v SybaseEnterprise v ODBCEnterprise v iWayEnterprise v NetezzaEnterprise

YoushouldavoidgeneratingtargettablesinthedatabasefromyourWebSphereDataStagejob(i.e.,using theCreatewritemodeonthedatabasestage)unlesstheyareintended fortemporarystorageonly.Thisis becausethismethoddoesnotallowyouto,forexample,specifytargettablespace, andyoumay

inadvertentlyviolatedata-management policiesonthedatabase.If youwanttocreateatableona target databasefromwithin ajob,usetheOpencommandpropertyonthedatabasestagetoexplicitlycreatethe tableandallocatetablespace,oranyotheroptionsrequired.

TheOpencommandpropertyallowsyoutospecifyacommand(forexamplesomeSQL)thatwillbe executedbythedatabasebefore itprocessesanydatafromthestage.Thereisalso aClosepropertythat allowsyouto specifya commandtoexecuteafter thedatafromthestagehasbeenprocessed.(Notethat, whenusinguser-definedOpenandClosecommands,youmayneedtoexplicitly specifylockswhere appropriate.)

Database

sparse

lookup

vs.

join

Dataread byanydatabasestagecanserveasthereferenceinputtoaLookupstage.By default,this referencedataisloadedintomemorylikeanyotherreferencelink.

WhendirectlyconnectedasthereferencelinktoaLookupstage,bothDB2/UDBEnterpriseandOracle Enterprisestagesallowthelookup typetobechangedto Sparseand sendindividualSQLstatementsto thereferencedatabaseforeachincoming Lookuprow.SparseLookupisonlyavailablewhenthedatabase stageisdirectlyconnectedtothereferencelink, withnointermediate stages.

ItisimportanttonotethattheindividualSQLstatementsrequiredbya SparseLookupare anexpensive operationfromaperformanceperspective.Inmostcases,it isfastertouseaWebSphereDataStageJoin stagebetweentheinputandDB2referencedatathanitistoperforma SparseLookup.

Forscenarioswherethenumber ofinputrowsissignificantlysmaller(1:100ormore)thanthenumberof referencerowsinaDB2or Oracletable,aSparseLookupmaybeappropriate.

DB2

database

tips

Ifavailable,usetheDB2connestor. Otherwise,always usetheDB2Enterprisestageinpreferencetothe DB2/APIpluginstageforreadingfrom,writingto,andperforminglookups againsta DB2Enterprise ServerEditionwiththeDatabasePartitioningFeature(DBF).TheDB2 Enterprisestageisdesignedfor maximumperformanceand scaleabilityagainstverylargepartitionedDB2UNIX® databases.

TheDB2/APIpluginshouldonlybeusedtoreadfromand writetoDB2onother,non-UNIXplatforms. Youmight,forexample,useittoaccessmainframeeditionsthroughDB2Connect™.

(25)

Write

vs.

load

TheDB2EnterprisestageoffersthechoicebetweenSQLmethods(insert,update, upsert,delete)orfast loadermethodswhenwritingtoaDB2database.Thechoicebetweenthesemethodsdependsonthe requiredperformance,database logusage, andrecoverabilityconsiderationsasfollows:

v Thewritemethod(usinginsert,update,upsert,ordelete)communicatesdirectlywithDB2database

nodesto executeinstructionsin parallel.Alloperationsare loggedtotheDB2databaselog,andthe targettable(s) canbeaccessedbyotherusers.Timeandrow-basedcommitintervalsdeterminethe transactionsizeandavailabilityofnew rowstootherapplications.

v

TheloadmethodrequiresthattheuserrunningthejobhasDBADMprivilegeonthetargetdatabase.

Duringaloadoperationanexclusive lockisplacedontheentireDB2tablespaceintowhichthedatais beingloaded,andsothistablespacecannotbe accessedbyanyoneelsewhiletheloadistakingplace. Theloadisalsonon-recoverable:iftheloadoperationisterminatedbeforeitiscompleted,thecontents ofthetableareunusable andthetablespaceisleft intheloadpending state.Ifthis happens,the WebSphereDataStagejobmustbere-run withthestagesetto truncatemodetocleartheloadpending state.

Oracle

database

tips

Whendesigning jobsthatuseOracle sourcesortargets, notethattheparallelengine willuseits interpretationoftheOraclemetadata(forexample,exactdatatypes)based oninterrogationof Oracle, overridingwhatyoumayhavespecifiedintheColumnstab. Forthis reasonitisbesttoimportyour OracletabledefinitionsusingtheImportOrchestrateSchemaDefinitionscommandfromhe WebSphereDataStageDesigner. ChoosetheDatabasetableoption andfollowtheinstructionsfromthe wizard.

Loading

and

indexes

WhenyouusetheLoadwritemethodinanOracle Enterprisestage,youareusingtheParallel Direct Pathloadmethod.Ifyouwanttousethismethodtowritetablesthathaveindexesonthem(including indexesautomaticallygeneratedbyprimarykeyconstraints),youmust specifytheIndexModeproperty (youcansetit toMaintenanceorRebuild).Analternativeistosettheenvironmentvariable

APT_ORACLE_LOAD_OPTIONSto″OPTIONS(DIRECT=TRUE,PARALLEL=FALSE).Thisallowsthe loadingofindexed tableswithoutindexmaintenance,buttheloadisperformedsequentially.

YoucanusetheupsertwritemethodtoinsertrowsintoanOracle tablewithoutbypassingindexesor constraints.InordertoautomaticallygeneratetheSQLneeded,set theUpsertModepropertyto Auto-generatedand identifythekeycolumn(s) ontheColumns tabbyselectingtheKeycheckboxes.

Teradata

Database

Tips

YoucanusetheAdditionalConnections OptionspropertyintheTeradataEnterprisestage(whichisa dependentofDBOptionsMode)tospecifydetailsaboutthenumberofconnectionstoTeradata.The possiblevaluesofthisare:

v sessionsperplayer.ThisdeterminesthenumberofconnectionseachplayerinthejobhastoTeradata.

Thenumbershouldbeselectedsuchthat:

(sessions per player * number of nodes * players per node) = total requested sessions

Thedefaultvalueis2.Setting thistoolowonalargesystem canresult insomanyplayersthatthejob failsdue toinsufficientresources.

v requestedsessions.Thisisanumberbetween1 andthenumberofvprocsin thedatabase.Thedefault

(26)
(27)

Chapter

3.

Improving

performance

Thischapterisintendedtohelpresolve anyperformanceproblems.Itassumesthatbasicstepstoassure performancehavebeentaken: asuitableconfigurationfilehasbeenset up(seeinWebSphereDataStage ParallelJobDeveloperGuide),reasonableswapspaceconfiguredetc.andthatyouhavefollowedthedesign guidelineslaiddowninChapter2,“Jobdesigntips,”onpage3.

Understanding

a

flow

Inordertoresolveanyperformance issuesit isessentialtohaveanunderstandingoftheflowof WebSphereDataStagejobs.

Score

dumps

Tohelpunderstanda jobflowwesuggest youtakea scoredump.Dothis bysettingthe

APT_DUMP_SCOREenvironment variabletrueandrunningthejob(APT_DUMP_SCOREcanbeset in theAdministratorclient,undertheParallelReportingranch).Thiscausesa reporttobe produced whichshowstheoperators,processesanddatasetsinthejob.Thereportincludesinformationabout: v Whereandhowdataisrepartitioned.

v WhetherWebSphereDataStagehadinsertedextraoperatorsintheflow.

v Thedegreeofparallelismeachoperatorrunswith,and onwhichnodes.

v Informationaboutwhere dataisbuffered.

Thedumpscoreinformationisincludedinthejoblogwhenyourunajob.

Thescoredumpisparticularlyusefulinshowing youwhereWebSphereDataStageisinsertingadditional componentsinthejobflow. InparticularWebSphereDataStagewilladdpartitionand sortoperators wherethelogicof thejobdemands it.Sorts inparticularcanbedetrimentaltoperformance andascore dumpcanhelpyoutodetectsuperfluousoperatorsandamendthejobdesigntoremovethem.

Example

score

dump

Thefollowingscoredumpshowsa flowwithasingle dataset,whichhasa hashpartitioner,partitioning onkey″a″.Itshowsthreeoperators:generator, tsort,and peek.Tsortand peekare ″combined″,indicating thattheyhavebeenoptimizedintothesame process.All theoperatorsinthis flowarerunningonone node.

##I TFSC 004000 14:51:50(000) <main_program> This step has 1 data set:

ds0: {op0[1p] (sequential generator)

eOther(APT_HashPartitioner { key={ value=a }

})->eCollectAny

op1[2p] (parallel APT_CombinedOperatorController:tsort)}

It has 2 operators:

op0[1p] {(sequential generator)

on nodes (

lemond.torrent.com[op0,p0] )}

op1[2p] {(parallel APT_CombinedOperatorController:

(tsort) (peek) ) on nodes ( lemond.torrent.com[op1,p0] lemond.torrent.com[op1,p1] )}

References

Related documents

Using information learned from rending about Ancient Greece, have students write about the following prompt:.. &#34;Think about the jobs Ancient Greeks had and how much time

Crewe Wednesday 12.45-1.45pm Jubilee House, Crewe Free of Charge This class is ideal for anyone who is at risk of falling, has fallen or has poor balance and

organisasjonslæring, arbeidsplasslæring, uformell og formell læring, læring gjennom praksis, sosial praksis og så videre vil derfor være nyttige når man skal foreta en studie

- Intelligent Speed Adaptation &amp; Perspectives on Usage Based Insurance?. Niels Agerholm Assistant

Through early nurturing relationships that support the development of trust (Erikson, 1950) and the above named characteristics of high quality care, teachers/caregivers

The orcs continue to smash my team leaving me with few options in my turns other than standing my players up and trying to reposition them to achieve something later in the

Quantitative
Ecology
and
Resource
Management
 Doctoral
Degree
–
Timeline
and
Procedures
 
 Recommended
Timeline:
 Date
 Form
 Submitt ed*



Villar vs TIP (several students were barred from re-enrollment for participating in demonstrations) while the Court upheld the academic freedom of institutions of