
WebSphere Information

In document IBM Information Server Blue Book (Page 49-65)

You use data profiling and analysis to understand your data and ensure that it suits the integration task. WebSphere Information Analyzer is a critical component of IBM Information Server that profiles and analyzes data so that you can deliver trusted information to your users.

WebSphere Information Analyzer can automatically scan samples of your data to determine their quality and structure. This analysis aids you in understanding the inputs to your integration process, ranging from individual fields to high-level data entities. Information analysis also enables you to correct problems with structure or validity before they affect your project.

In many situations, analysis must address data, values, and rules that are best understood by business users. Particularly for comprehensive enterprise resource planning, customer relationship management, or supply chain management packages, validating data against this business knowledge is a critical step. The business knowledge, in turn, forms the basis for ongoing monitoring and auditing of data to ensure validity, accuracy, and compliance with internal standards and industry regulations.

While analysis of source data is a critical first step in any integration project, you must continually monitor the quality of the data. WebSphere Information Analyzer enables you to treat profiling and analysis as an ongoing process and create business metrics that you can run and track over time.

WebSphere Information Analyzer capabilities

IBM WebSphere Information Analyzer automates the task of source data analysis by expediting comprehensive data profiling and minimizing overall costs and resources for critical data integration projects.

WebSphere Information Analyzer represents the next generation in data analysis tools, which are characterized by these attributes:

End-to-end data profiling and content analysis

Provides standard data profiling features and quality controls. The repository holds the data analysis results and project metadata such as project-level and role-level security and function administration.

Business-oriented approach

With its task-based user interface, aids business users in easily reviewing data for anomalies and changes over time, and provides key functional and design information to developers.

Adaptable, flexible, and scalable architecture

Handles high data volumes through common parallel processing technology, and leverages common services such as connectivity to access a wide range of data sources and targets.

Scenarios for information analysis

The following scenarios show how WebSphere Information Analyzer helps organizations understand their data to facilitate integration projects.

Food distribution: Infrastructure rationalization

A leading U.S. food distributor had more than 80 separate mainframe, SAP, and JD Edwards applications supporting global production, distribution, and CRM operations. This infrastructure rationalization project included customer relationship management, order-to-cash, purchase-to-pay, human resources, finance, manufacturing, and supply chain planning. The company needed to move data from these source systems to a single target system.

The company uses WebSphere Information Analyzer to profile its source systems and create master data around key business dimensions, including customer, vendor, item (finished goods), and material (raw materials). They plan to migrate data into a single master SAP environment and a companion SAP BW reporting platform.

Financial services: Data quality assessment

A major brokerage firm had become inefficient by supporting dozens of business groups with their own applications and IT groups. Costs were excessive, regulatory compliance difficult, and it was impractical to target low-margin, middle-income investors. When the federal government mandated T+1, a regulation that changed industry standard practices, the firm had to find a way to reduce the time to process a trade from 3.5 days to 1 day, a reduction of 71.4 percent.

To meet the federal mandate, the brokerage house uses WebSphere Information Analyzer to inventory their data, identify integration points, remove data redundancies, and document disparities between applications.

The firm now has a repeatable and auditable methodology that leverages automated data analysis. By ensuring that all transactions are processed quickly and uniformly, the company is better able to track and respond to risk resulting from its clients’ and its own investments.

Transportation services: Data quality monitoring

A transportation service provider develops systems that enable its extensive network of independent owner-operators to compete in today’s tough market. The owner-operators were exposed to competition because they could not receive data quickly. Executives had little confidence in the data that they received. Productivity was slowed by excessive time reviewing manual intervention and reconciling data from multiple sources.

WebSphere Information Analyzer allows the owner-operators to better understand and analyze their legacy data. It allows them to quickly increase the accuracy of their business intelligence reports and restore executive confidence in their company data. Moving forward, they implemented a data quality solution to cleanse their customer data and spot trends over time, further increasing their confidence in the data.

WebSphere Information Analyzer in a business context

After obtaining project requirements, a project manager initiates the analysis phase of data integration to understand source systems and design target systems. Too often, analysis can be a laborious, manual process that relies on out-of-date (or nonexistent) source documentation or the knowledge of the people who maintain the source systems. But source system analysis is crucial to understanding what data is available and its current state.

Figure 26 shows the role of analysis in IBM Information Server. WebSphere Information Analyzer plays a key role in preparing data for integration by analyzing business information to assure that it is accurate, consistent, timely, and coherent.

Profiling and analysis

Examines data to understand its frequency, dependency, and redundancy and validate defined schema and definitions.

Data monitoring and trending

Uncovers data quality issues in the source system as data is extracted and loaded into target systems. Validation rules help you create business metrics that you can run and track over time.

Facilitating integration

Uses tables, columns, probable keys, and interrelationships to help with integration design decisions.

Data analysis helps you see the content and structure of data before you start a project and continues to provide useful insight as part of the integration process.

The following data management tasks use data analysis:

Data integration or migration

Data integration or migration projects (including data cleansing and matching) move data from one or more source systems to one or more target systems. Data profiling supports these projects in three critical stages:

1. Assessing sources to support or define business requirements

2. Designing reference tables and mappings from source to target systems

3. Developing and running tests to validate successful integration or migration of data into target systems

Data quality assessment and monitoring

Evaluates quality in targeted static data sources along multiple dimensions including completeness, validity (of values), accuracy, consistency, timeliness, and relevance. Data quality monitoring requires ongoing assessment of data sources. Information Analyzer supports these projects by automating many of these dimensions for in-depth snapshots over time.

Figure 26. WebSphere Information Analyzer helps users understand their data.

Asset rationalization

Looks for ways to cut costs that are associated with existing data transformation processes (for example, processor cycles) or data storage. Asset rationalization does not involve moving data, but reviews changes in data over time. WebSphere Information Analyzer supports asset rationalization during the initial assessment of source content and structure and during development and execution of data monitors to understand trends and utilization over time.

Verifying external sources for integration

Validates the arrival of new or periodic external sources to ensure that those sources still support the data integration processes that use them. This process looks at static data sources along multiple dimensions including structural conformity to prior instances, completeness, validity of values, validity of formats, and level of duplication. WebSphere Information Analyzer automates many of these dimensions over time.
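
As an illustration of three of these dimensions (structural conformity to a prior instance, completeness, and level of duplication), the following Python sketch checks a newly arrived delivery against the column list of the previous one. It is not product code: the function name verify_delivery, the report keys, and the sample rows are hypothetical, and treating None and the empty string as "missing" is an assumption.

```python
def verify_delivery(prior_columns, new_columns, new_rows):
    """Compare a new delivery with the prior column layout and measure its quality."""
    # Structural conformity: columns that disappeared or appeared.
    missing = [c for c in prior_columns if c not in new_columns]
    added = [c for c in new_columns if c not in prior_columns]

    # Completeness: share of cells that hold a non-null, non-empty value.
    total_cells = len(new_rows) * len(new_columns)
    filled = sum(
        1
        for row in new_rows
        for c in new_columns
        if row.get(c) not in (None, "")
    )

    # Level of duplication: rows that repeat an earlier row exactly.
    distinct_rows = {tuple(row.get(c) for c in new_columns) for row in new_rows}

    return {
        "missing_columns": missing,
        "added_columns": added,
        "completeness": filled / total_cells if total_cells else 0.0,
        "duplicate_rows": len(new_rows) - len(distinct_rows),
    }


report = verify_delivery(
    prior_columns=["id", "name", "country"],
    new_columns=["id", "name", "region"],
    new_rows=[
        {"id": 1, "name": "Ann", "region": "EU"},
        {"id": 1, "name": "Ann", "region": "EU"},
        {"id": 2, "name": "", "region": None},
    ],
)
print(report)
```

In this sample the report flags a renamed column (country is missing, region was added), one duplicated row, and incomplete cells, which is the kind of signal that would prompt a review before the source is used again.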

A closer look at WebSphere Information Analyzer

WebSphere Information Analyzer is an integrated tool for providing comprehensive enterprise-level data analysis. It features data profiling, analysis, and design and supports ongoing data quality monitoring.

The WebSphere Information Analyzer user interface performs a variety of data analysis tasks, as Figure 27 shows.

Figure 27. Dashboard view of a project provides high-level trends and metrics

WebSphere Information Analyzer can be used by data analysts, subject matter experts, business analysts, integration analysts, and business end users. It has the following characteristics:

Business-driven

Provides end-to-end data lifecycle management (from data access and analysis through data monitoring) to reduce the time and cost to discover, evaluate, correct, and validate data across the enterprise.

Dynamic

Draws on a single active repository for metadata to give you a common platform view.

Scalable

Leverages a high-volume, scalable, parallel processing design to provide high performance analysis of large data sources.

Extensible

Enables you to review and accept data formats and data values as business needs change.

Service oriented

Leverages IBM Information Server’s service-oriented architecture to access connectivity, logging, and security services, allowing access to a wide range of data sources (relational, mainframe, and sequential files) and the sharing of analytical results with other IBM Information Server components.

Robust analytics

Helps you understand embedded or hidden information about content, quality, and structure.

Design integration

Improves the exchange of information from business and data analysts to developers by generating validation reference data and mapping data, which reduces errors.

Robust reporting

Provides a customizable interface for common reporting services, which enables better decision making through visual representation of analysis, trends, and metrics.

IBM WebSphere AuditStage is a suite component that augments WebSphere Information Analyzer by helping you manage the definition and analysis of business rules. WebSphere AuditStage examines source and target data, analyzing across columns for valid value combinations, appropriate data ranges, accurate computations, and correct if-then-else evaluations. WebSphere AuditStage establishes metrics to weight these business rules and stores a history of these analyses and metrics that show trends in data quality.
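
The four kinds of cross-column checks named here, together with rule weighting, can be pictured with a small sketch. The Python below is plain illustrative code and does not use WebSphere AuditStage rule syntax, which this document does not show; every rule name, weight, field name, and sample record is a hypothetical assumption.

```python
VALID_COMBINATIONS = {("US", "USD"), ("DE", "EUR")}  # hypothetical reference set


def valid_combination(rec):
    """Valid value combination: country and currency must be a known pair."""
    return (rec["country"], rec["currency"]) in VALID_COMBINATIONS


def quantity_in_range(rec):
    """Appropriate data range."""
    return 1 <= rec["qty"] <= 1000


def total_is_accurate(rec):
    """Accurate computation: total must equal qty * unit_price."""
    return abs(rec["qty"] * rec["unit_price"] - rec["total"]) < 0.005


def discount_rule(rec):
    """If-then-else: large orders carry a discount, small orders do not."""
    if rec["qty"] >= 100:
        return rec["discount"] > 0
    return rec["discount"] == 0


# (rule name, weight, predicate); the weights are illustrative assumptions.
RULES = [
    ("valid_combination", 2.0, valid_combination),
    ("quantity_in_range", 1.0, quantity_in_range),
    ("total_is_accurate", 3.0, total_is_accurate),
    ("discount_rule", 1.0, discount_rule),
]


def score_records(records, rules=RULES):
    """Return a weighted pass score per record and a failure count per rule."""
    total_weight = sum(weight for _, weight, _ in rules)
    failures = {name: 0 for name, _, _ in rules}
    scores = []
    for rec in records:
        passed = 0.0
        for name, weight, predicate in rules:
            if predicate(rec):
                passed += weight
            else:
                failures[name] += 1
        scores.append(passed / total_weight)
    return scores, failures


records = [
    {"country": "US", "currency": "USD", "qty": 10,
     "unit_price": 2.5, "total": 25.0, "discount": 0},
    {"country": "DE", "currency": "USD", "qty": 200,
     "unit_price": 1.0, "total": 200.0, "discount": 0},
]
scores, failures = score_records(records)
print(scores, failures)
```

Storing the scores and failure counts from each run, keyed by run date, would give the kind of history from which data quality trends can be read.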

Where WebSphere Information Analyzer fits in the IBM Information Server architecture

WebSphere Information Analyzer uses a service-oriented architecture to structure data analysis tasks that are used by many new enterprise system architectures.

WebSphere Information Analyzer is supported by a range of shared services and reuses several IBM Information Server components.

Because WebSphere Information Analyzer has multiple discrete services, it has the flexibility to configure systems to match varied customer environments and tiered architectures. Figure 28 shows how WebSphere Information Analyzer interacts with the following elements of IBM Information Server:

IBM Information Server console

Provides a graphical user interface to access WebSphere Information Analyzer functions and organize data analysis results.

Common services

Provide general services that WebSphere Information Analyzer uses such as logging and security. Metadata services provide access, query, and analysis functions for users. Many services that are offered by WebSphere Information Analyzer are specific to its domain of enterprise data analysis such as column analysis, primary key analysis and review, and cross-table analysis.

Common repository

Holds metadata that is shared by multiple projects. WebSphere Information Analyzer organizes data from databases, files, and other sources into a hierarchy of objects. Results that are generated by WebSphere Information Analyzer can be shared with other client programs such as the WebSphere DataStage and WebSphere QualityStage Designer through their respective service layers.

Figure 28. IBM Information Server architecture

Common parallel processing engine

Addresses high throughput requirements that are inherent in analyzing large quantities of source data by taking advantage of parallelism and pipelining.

Common connectors

Provide connectivity to all the important external resources and access to the common repository from the processing engine. WebSphere Information Analyzer uses these connection services in three fundamental ways:

• Importing metadata

• Performing base analysis on source data

• Providing drill-down and query capabilities

WebSphere Information Analyzer tasks

The WebSphere Information Analyzer user interface presents an intuitive set of controls that are designed for integration development workflow.

The WebSphere Information Analyzer user interface aids you in organizing data analysis work into projects. The top-level view is called a Dashboard because it reports a summary of your key project and data metrics, both in a graphical format and in a status grid format.

The high-level status view in Figure 29 on page 50 summarizes the data sources, including their tables and columns, that were analyzed and reviewed so that managers and analysts can quickly determine the status of work. The project view of the GlobalCo project shows a high-level summary of column analysis and an aggregated summary of anomalies found, along with the Getting Started pane.

While many data analysis tools are designed to run in a strict sequence and generate one-time static views of the data, WebSphere Information Analyzer enables you to perform select integration tasks as required or combine them into a larger integration flow. These tasks fall into three categories:

Profiling and analysis

Provides complete analysis of source systems and target systems, and assesses the structure, content, and quality of data, whether at the column level, the cross-column level, the table or file level, the cross-table level, or the cross-source level. This task reports on various aspects of data including classification, attributes, formatting, frequency values, distributions, completeness, and validity.

Data monitoring and trending

Helps you assess data completeness and validity, data formats, and valid-value combinations. This task also evaluates new results against established benchmarks. By using the WebSphere AuditStage component, business users develop additional data rules to assess and measure content and quality over time. Rules can be simple column measures that incorporate knowledge from data profiling or complex conditions that test multiple fields. Validation rules assist in creating business metrics that you can run and track over time.

Facilitating integration

Provides shared analytical information, validation and mapping table generation, and testing of data transformations through cross-comparison of domains before and after processing.

Data profiling and analysis

WebSphere Information Analyzer provides extensive capabilities for profiling source data. The four main data profiling functions are column analysis, primary key analysis, foreign key analysis, and cross-domain analysis.

Figure 29. Information Analyzer project view

Column analysis

Column analysis generates a full frequency distribution and examines all values for a column to infer its definition and properties such as domain values, statistical measures, and minimum and maximum values. Each column of every source table is examined in detail. The following properties are observed and recorded:

• Count of distinct values or cardinality

• Count of empty values, null values, and non-null or empty values

• Minimum, maximum, and average numeric values

• Basic data types, including different date-time formats

• Minimum, maximum, and average length

• Precision and scale for numeric values
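
To make these properties concrete, the following Python sketch computes several of them for a single column held in memory. It is an illustration of the idea rather than the product's implementation; the function name profile_column, the result keys, and the sample values are hypothetical, and data type inference and precision and scale are left out for brevity.

```python
from collections import Counter
from statistics import mean


def profile_column(values):
    """Summarize one column given as a list of raw values (None means null)."""
    nulls = sum(1 for v in values if v is None)
    empties = sum(1 for v in values if v == "")
    present = [v for v in values if v is not None and v != ""]
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    lengths = [len(str(v)) for v in present]
    return {
        "cardinality": len(set(present)),       # count of distinct values
        "null_count": nulls,
        "empty_count": empties,
        "present_count": len(present),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "avg": mean(numeric) if numeric else None,
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "avg_length": mean(lengths) if lengths else None,
        "frequencies": Counter(present),        # full frequency distribution
    }


profile = profile_column([10, 25, 10, None, "", 40])
print(profile)
```

Keeping the full frequency distribution in the result is a deliberate choice in this sketch: the other measures can be derived from it, and later checks can reuse it without rescanning the data.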

WebSphere Information Analyzer also enables you to drill down on specific columns to define unique quality control measures for each column. Figure 30 shows a closer look at results for a table named GlobalCo_Ord_Dtl. At the top is a summary analysis of the entire table. Beneath the summary is detail for each column that shows standard data profiling results, including data classification, cardinality, and properties. When you select a column, additional tasks that are relevant to that level of analysis become available.

Figure 30. Column analysis example data view

Another function of column analysis is domain analysis. A domain is a valid set of values for an attribute. Domain analysis determines the data domain values for any data element. By using a frequency distribution, you can facilitate testing by providing a list of all the values in a column and the number of occurrences of each. Domain analysis checks whether a data element corresponds to a value in a database table or file. Figure 31 shows a frequency distribution chart that helps find anomalies in the Qtyord column.

The bar chart shows data values on the y-axis and the frequency of those values on the x-axis. This detail points out default and invalid values based on specific selection, ranges, or reference sources, and aids you in iteratively building quality metrics.
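
A frequency distribution with a range-based domain check can be sketched in a few lines of Python. The sample quantities, the valid range of 1 to 999, and the function name domain_check are assumptions made for illustration; they do not come from the GlobalCo example data.

```python
from collections import Counter


def domain_check(values, valid_domain):
    """Return the frequency distribution and the values outside the valid domain."""
    freq = Counter(values)
    invalid = {v: n for v, n in freq.items() if v not in valid_domain}
    return freq, invalid


# Hypothetical ordered quantities; 9999 stands in for a suspicious default value.
qtyord = [1, 2, 2, 5, 9999, 9999, 9999, -1]
freq, invalid = domain_check(qtyord, valid_domain=range(1, 1000))

# Text version of the chart: one row per value, bar length equals frequency.
for value, count in freq.most_common():
    flag = "  <-- outside domain" if value in invalid else ""
    print(f"{value:>6} {'#' * count}{flag}")
```

A value that is both frequent and outside the domain, such as 9999 here, is a typical sign of a default that was entered in place of real data.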

When you are validating free-form text, analyzing and understanding the extent of the quality issues is often very difficult. WebSphere Information Analyzer can show each data pattern of the text for a much more detailed quality investigation.

It helps with the following tasks:

• Uncovering trends, potential anomalies, metadata discrepancies, and undocumented business practices

• Identifying invalid or default formats and their underlying values

• Verifying the reliability of fields that are proposed as matching criteria for input to WebSphere QualityStage and WebSphere DataStage
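
One common way to derive a data pattern is to replace every letter and digit with a class symbol and then count the resulting shapes. The Python sketch below assumes a notation in which "A" stands for a letter and "9" for a digit; that notation, the function name value_pattern, and the sample phone values are illustrative assumptions, not the product's documented format.

```python
from collections import Counter


def value_pattern(text):
    """Map letters to 'A', digits to '9', and keep every other character."""
    out = []
    for ch in text:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


phones = ["555-0142", "555-0199", "5550123", "call me", "555-01A7"]
patterns = Counter(value_pattern(p) for p in phones)

for pattern, count in patterns.most_common():
    print(f"{pattern!r}: {count}")
```

The dominant pattern suggests the intended format, and the rare patterns point to the underlying values that need review.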

Primary key analysis

The primary key of a relational table is a unique identifier that a database uses to access a specific row. Primary key analysis identifies all candidate keys for one or more tables and helps you test a column or combination of columns to determine if it is a candidate for becoming the primary key. Figure 32 on page 53 shows a single-column analysis.

Figure 31. Column analysis example graphical view

The analysis presents all of the columns and the potential primary key candidates.

A duplicate check validates the use of such keys. You select the primary key candidate based on its probability for uniqueness and your business knowledge of the data involved. If you select a multi-data column as the primary key, the system will develop a frequency distribution for the concatenated values.
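
The duplicate check can be sketched as a frequency distribution over the candidate key values, with multi-column candidates combined into one value per row. In the Python below, the function name key_uniqueness and the order-line sample rows are hypothetical, and representing the combined key as a tuple rather than a concatenated string is a simplification.

```python
from collections import Counter


def key_uniqueness(rows, columns):
    """Return (uniqueness ratio, duplicated key values) for a candidate key."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    freq = Counter(keys)
    duplicates = {k: n for k, n in freq.items() if n > 1}
    ratio = len(freq) / len(keys) if keys else 0.0
    return ratio, duplicates


rows = [
    {"order_id": 1, "line_no": 1, "sku": "A"},
    {"order_id": 1, "line_no": 2, "sku": "B"},
    {"order_id": 2, "line_no": 1, "sku": "A"},
]

single_ratio, single_dups = key_uniqueness(rows, ["order_id"])
multi_ratio, multi_dups = key_uniqueness(rows, ["order_id", "line_no"])
print(single_ratio, single_dups)
print(multi_ratio, multi_dups)
```

Here order_id alone repeats, while the combination of order_id and line_no is fully unique, so the pair is the stronger candidate. Whether it is the right key still depends on business knowledge of the data.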

Foreign key analysis

Foreign key analysis examines content and relationships across tables. This analysis helps identify foreign keys, check their integrity, and check the referential integrity between the primary key and foreign keys. For example, in a Bill of Materials structure, the parent-child relationships among assemblies and subassemblies would require you to identify relationships between foreign keys and primary keys and validate their referential integrity.

A column qualifies to be a foreign key candidate if the majority (for example, 98 percent or higher) of its frequency distribution values match the frequency distribution values of a primary key column. As Figure 33 on page 54 shows, after you select a foreign key, the system performs a bidirectional test (foreign key to primary key, primary key to foreign key) of each foreign key’s referential integrity and identifies the number of referential integrity violations and "orphan" values (keys that do not match).
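
The candidate threshold and the bidirectional test can be sketched with set operations on the distinct values of the two columns. The Python below is illustrative only: the function name foreign_key_test, the result keys, and the sample customer data are hypothetical, and ignoring null foreign key values is an assumption about how such a test might treat them.

```python
def foreign_key_test(fk_values, pk_values, threshold=0.98):
    """Test a foreign key candidate against a primary key column in both directions."""
    fk_distinct = {v for v in fk_values if v is not None}
    pk_distinct = set(pk_values)

    # Share of distinct foreign key values that also occur in the primary key.
    matched = fk_distinct & pk_distinct
    match_rate = len(matched) / len(fk_distinct) if fk_distinct else 0.0

    orphans = fk_distinct - pk_distinct      # foreign key to primary key
    unreferenced = pk_distinct - fk_distinct  # primary key to foreign key
    violating_rows = sum(1 for v in fk_values if v in orphans)

    return {
        "match_rate": match_rate,
        "is_candidate": match_rate >= threshold,
        "orphans": orphans,
        "violating_rows": violating_rows,
        "unreferenced_keys": unreferenced,
    }


customers = [101, 102, 103, 104]
orders_customer_id = [101, 101, 102, 103, 999]

# A low threshold is used only so that this tiny sample qualifies.
result = foreign_key_test(orders_customer_id, customers, threshold=0.75)
print(result)
```

In this sample the value 999 is an orphan that causes one violating row, and customer 104 is a primary key value that no order references.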

Figure 32. Primary key analysis

Cross-domain analysis

Cross-domain analysis examines content and relationships across tables. This analysis identifies overlaps in values between columns, and any redundancy of data within or between tables. For example, country codes might exist in two different customer tables and you want to maintain a consistent standard for these codes. Cross-domain analysis enables you to directly compare these code values.

WebSphere Information Analyzer uses the results of column analysis for each set of columns that you want to compare. The existence of a common domain might indicate a relationship between tables or the presence of redundant fields.

Cross-domain analysis can compare any number of domains within or across sources.
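
For the country code example, the comparison reduces to the overlap between the distinct values of two columns. The following Python sketch shows one way to express it; the function name domain_overlap, the result keys, and the sample codes are hypothetical.

```python
def domain_overlap(col_a, col_b):
    """Compare the distinct values of two columns and report their overlap."""
    a, b = set(col_a), set(col_b)
    common = a & b
    return {
        "common": common,
        "only_in_a": a - b,
        "only_in_b": b - a,
        "pct_of_a": len(common) / len(a) if a else 0.0,
        "pct_of_b": len(common) / len(b) if b else 0.0,
    }


crm_country = ["US", "DE", "FR", "US", "UK"]
billing_country = ["US", "DE", "FR", "GB"]

overlap = domain_overlap(crm_country, billing_country)
print(overlap)
```

The two columns share most of their values, which suggests a common domain, while the UK and GB entries show where the two tables follow different standards for the same country.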

Data monitoring and trending

With baseline analysis, WebSphere Information Analyzer compares changes to data from one previous column analysis (a baseline) to a new, current column analysis.
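
A baseline comparison can be pictured as a difference between two sets of column measures. The Python sketch below is an assumption about how such a comparison might be expressed, not the product's method; the function name compare_to_baseline, the relative tolerance parameter, and the metric values are hypothetical.

```python
def compare_to_baseline(baseline, current, tolerance=0.0):
    """Return the metrics that changed by more than a relative tolerance."""
    changes = {}
    for name in sorted(set(baseline) | set(current)):
        old, new = baseline.get(name), current.get(name)
        if old == new:
            continue
        both_numeric = isinstance(old, (int, float)) and isinstance(new, (int, float))
        # The relative tolerance only applies when the baseline value is non-zero.
        if both_numeric and old != 0 and abs(new - old) / abs(old) <= tolerance:
            continue
        changes[name] = (old, new)
    return changes


baseline = {"cardinality": 120, "null_count": 0, "max": 500}
current = {"cardinality": 121, "null_count": 14, "max": 500}

changes = compare_to_baseline(baseline, current, tolerance=0.05)
print(changes)
```

With a 5 percent tolerance, the small drift in cardinality is ignored and only the appearance of null values is reported as a change from the baseline.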

Figure 33. Foreign key analysis

Figure 34 shows the results of comparing two distinct analyses on the
