Ab Initio FAQ's Part1.doc


1) What do you mean by specifying layouts in Ab Initio?

Ans: The layout determines whether a component runs in serial or parallel mode. If you specify a serial directory as the path, the component runs as a single stream; if you specify a multifile directory, the component runs in parallel. The path you specify also serves as the working directory of the graph, where all intermediate files are stored.

A layout can be specified as 1) propagate from neighbors, 2) URL, 3) custom, or 4) host.

Before you can run an Ab Initio graph, you must specify layouts to describe the following to the Co>Operating System:

- The location of files
- The number and locations of the partitions of multifiles
- The number of, and the locations in which, the partitions of program components execute

A layout is one of the following:

- A URL that specifies the location of a serial file
- A URL that specifies the location of the control partition of a multifile
- A list of URLs that specifies the locations of:
  o The partitions of an ad hoc multifile
  o The working directories of a program component

2) What is skew?

Ans: Skew measures how unevenly data is distributed across partitions. Controlling skew is one of many performance-tuning levers, alongside settings such as max-core.
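To make the idea concrete, here is a minimal Python sketch that computes a skew percentage per partition. The formula used (partition size minus average size, divided by the size of the largest partition) is an assumption that is not stated in this FAQ, and the function name partition_skew is made up for illustration.

```python
def partition_skew(sizes):
    """Return the skew of each partition as a percentage.

    Assumed definition (not taken from this FAQ):
    skew = (size - average size) / size of the largest partition * 100
    """
    if not sizes:
        return []
    largest = max(sizes)
    if largest == 0:
        return [0.0 for _ in sizes]
    average = sum(sizes) / len(sizes)
    return [(size - average) / largest * 100 for size in sizes]


# Four evenly loaded partitions versus one overloaded partition.
balanced = partition_skew([250, 250, 250, 250])
skewed = partition_skew([700, 100, 100, 100])
print(balanced)
print([round(value, 2) for value in skewed])
```

A value near zero for every partition means the data is evenly spread; a large positive value marks a partition that holds far more than its share.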

3) How do you read only 10 records from an input file?

Ans: 1) There is a component called Sample in the Sort component folder. If you use it after the input file, you can specify how many records you would like to pass through.

2) In the Input Table component, on the Parameters tab, you can specify how many records to read.

3) There is also a Leading Records component (in 2.11, at least) that lets you specify the number of records to read from a serial or MFS file.

4) Another way is the Read-Raw component, available in 2.11 or higher, although in practice you will have to describe and process the record structure yourself, because it works with raw data.

4) How do you go from 4-way to 8-way in a graph?

Ans: Put a partition component and a Gather component in the graph. The partition component should be 4-way MFS and the Gather should be 8-way.


5) In Ab Initio, what are upstream and downstream?

Ans: Upstream and downstream are used in conjunction with the EME for dependency and impact analysis of the graphs we have developed and saved into the repository. Basically, this helps track changes between versions, and changes to individual components and to the variables in those components.

6) I need to extract data from both an Oracle DB and a mainframe (DB2). Is it possible to extract the data directly from the databases, or do I need to convert it into flat files and load those?

Ans: You can extract directly from each of these databases. Just make sure that you have a config file set up for DB2 and for Oracle, and that all your login variables are set in your run settings. To my knowledge, you can unload the data directly from DB2, Oracle or Informix using the Unload DB Table component.

7) Does anybody know how many columns a lookup file can have? What is the maximum amount of data we can have in a lookup file? I am doing a code review for my application, and I see 8 to 10 columns in each lookup file, with a large amount of data.

Ans: 1) There is no set number of columns that a lookup file can contain. There is, however, a limit on the size of the data file. If you believe that 8-10 columns is too many, you might be correct. If the lookup contains more than 750,000 to 1M records, I would highly recommend using a join instead. The lookup will fail if it gets too large, and you will then have to code a join anyway.

2) Lookups are cached in memory during graph execution, so it is always a good idea to keep the data in a lookup to the bare minimum the requirement allows.

Don't keep any columns in the lookup file that you don't need or don't access. If the graph is partitioned, try to use lookup_local wherever possible. For this, your partition key and lookup key must match, or the lookup key must be a leading subset of the partition key.

Rule of thumb: trim from the data any fields that you don't use in downstream processing.

3) The limit for a lookup file is 2GB. Whether it is sensible to use a lookup of that size depends on what it is being used for.

9) How can I stop an executing graph midway under certain conditions, and how do I then restart it?

Ans: 1) Doing a kill -9 PID1 PID2 only kills the Ab Initio processes running on the host node. Ab Initio processes may still be running on other agent nodes. At run time Ab Initio creates a recovery file named <graph_name.rec> in the host directory specified under Run -> Settings in the GDE. If the host directory is not specified, the file is created in the default $HOME of the user specified under Run -> Settings. This recovery file contains pointers to the temporary files created dynamically during the run. To kill an Ab Initio job and all its associated processes across all nodes, execute the following two commands in the order shown:

1. m_kill -9 <recovery_file>
2. m_rollback -d <recovery_file>


If graph execution has to be stopped depending on certain conditions, use the force_error() function.

10) What are the functions used for system date?

a. today() :: Returns the internal representation of the current date on each call.

b. today1() :: Returns the internal representation of the current date on the first call.

Note: [DML represents dates internally as integer values specifying days relative to January 1, 1900.]

c. now() :: Returns the current local date and time.

d. now1() :: The first time a component calls now1, the function returns the value returned from the system function localtime. The second and subsequent times a component calls now1, it returns the same value it returned on the first call.

11) How to convert a string into date format?

Ans: The string first needs to be cast to a date in its own format. So if you have an input string field containing 20031130 and your output field is a date (YYYY-MM-DD), then use this:

out.fieldname :: (date("YYYY-MM-DD")) (date("YYYYMMDD")) in.fieldname;

Note: [However, if any input field has NULL data, the cast fails, so use the is_valid() and is_defined() functions to check the validity of the input data.]
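The same conversion and validity check can be sketched outside Ab Initio in plain Python. This is only an illustration of the logic (parse in the source format, guard against missing or invalid input, emit in the target format); the function name string_to_date is hypothetical and this is not DML.

```python
from datetime import datetime


def string_to_date(value, in_format="%Y%m%d", out_format="%Y-%m-%d"):
    """Convert e.g. "20031130" to "2003-11-30".

    Returns None for missing or invalid input instead of failing,
    which mirrors guarding the cast with is_defined()/is_valid().
    """
    if value is None:
        return None
    try:
        parsed = datetime.strptime(value.strip(), in_format)
    except ValueError:
        return None
    return parsed.strftime(out_format)


print(string_to_date("20031130"))
print(string_to_date("20031131"))  # November has 30 days, so this is invalid
print(string_to_date(None))
```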

12) What is the relation between the EME, the GDE and the Co>Operating System?

Ans: EME stands for Enterprise Meta>Environment and GDE for Graphical Development Environment, and the Co>Operating System can be described as the Ab Initio server. The relation between them is as follows. The Co>Operating System is the Ab Initio server; it is installed on a particular OS platform, which is called the native OS. The EME is like the repository in Informatica: it holds the metadata, transformations, DB config files, and source and target information. The GDE is the end-user environment where we develop graphs (like mappings in Informatica). The designer uses the GDE to design graphs and saves them to the EME or to a sandbox. The GDE is on the user side, whereas the EME is on the server side.

13) What is the use of Aggregate when we have Rollup? The Rollup component in Ab Initio is used to summarize groups of data records, so where would we use Aggregate?

Ans: Aggregate and Rollup can both summarize data, but Rollup is much more convenient to use. It is also much more self-explanatory than Aggregate when you need to understand how a particular summarization is being done. Rollup offers additional functionality as well, such as input and output filtering of records.

Aggregate and Rollup perform the same action, but Rollup keeps intermediate results in main memory, whereas Aggregate does not support intermediate results.


Ans: Ab Initio supports serial and parallel layouts, and a graph can have both at the same time. A parallel layout depends on the degree of data parallelism. If the multifile system is 4-way parallel, a component in the graph can run 4-way parallel, provided its layout is defined with the same degree of parallelism.

15) How can you run a graph infinitely?

Ans: To run a graph infinitely, the end script of the graph should call the graph's own .ksh file. So if the graph is named abc.mp, the end script should contain a call to abc.ksh. The graph will then run indefinitely.

16) Do you know what a local lookup is?

Ans: If your lookup file is a multifile that is partitioned and sorted on a particular key, the lookup_local function can be used in preference to the lookup function. It is local to a particular partition, depending on the key.

A lookup file consists of data records that can be held in main memory. This lets the transform function retrieve records much faster than it could from disk, and allows the transform component to process data records from multiple files quickly.

17) What is the difference between a lookup file and a lookup, with a relevant example?

Ans: Generally, a lookup file represents one or more serial files (flat files). The amount of data is small enough to be held in memory, which allows transform functions to retrieve records much more quickly than they could from disk.

A lookup is a component of an Ab Initio graph where we can store data and retrieve it by using a key parameter.

A lookup file is the physical file where the data for the lookup is stored.

18) How many components are there in your most complicated graph?

It depends on the type of components you use. Usually, avoid using very complicated transform functions in a graph.

19) Explain what a lookup is.

A lookup is basically a specific dataset that is keyed. It can be used to map values according to the data present in a particular file (serial or multifile). The dataset can be static or dynamic (for example, when the lookup file is generated in a previous phase and used as a lookup file in the current phase). Sometimes a hash join can be replaced by a Reformat with a lookup, if one of the inputs to the join contains few records with a short record length.

Ab Initio has built-in functions to retrieve values from the lookup using the key.

20) What is a ramp limit?

The limit parameter contains an integer that represents a number of reject events. The ramp parameter contains a real number that represents a rate of reject events relative to the number of records processed.

Number of bad records allowed = limit + (number of records * ramp). The ramp is basically a fraction (from 0 to 1).
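The threshold formula above can be sketched in a few lines of Python. The function names and the sample values (limit 50, ramp 0.01) are illustrative assumptions, and treating "more rejects than the threshold" as the abort condition is also an assumption rather than something this FAQ states.

```python
def allowed_rejects(limit, ramp, records_processed):
    """Number of bad records tolerated: limit + records * ramp."""
    return limit + ramp * records_processed


def should_abort(rejects_so_far, limit, ramp, records_processed):
    """Assumed rule: abort once rejects exceed the tolerated number."""
    return rejects_so_far > allowed_rejects(limit, ramp, records_processed)


# With limit=50 and ramp=0.01, 10,000 processed records tolerate about 150 rejects.
print(allowed_rejects(50, 0.01, 10000))
print(should_abort(149, 50, 0.01, 10000))
print(should_abort(151, 50, 0.01, 10000))
```

Setting ramp to 0 turns the threshold into a fixed count, and setting limit to 0 makes it purely proportional to the records processed.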

These two together provide the threshold value for bad records.

21) Have you worked with packages?


A multistage transform component uses packages by default. However, users can create their own set of functions in a transform function and include it in other transform functions.

22) Have you used rollup component? Describe how.

If the user wants to group records on particular field values, Rollup is the best way to do it. Rollup is a multistage transform and contains the following mandatory functions:

1. initialize
2. rollup
3. finalize

You also need to declare a temporary variable if you want to get counts for a particular group. For each group, it first calls the initialize function once, then calls the rollup function for each record in the group, and finally calls the finalize function once after the last rollup call.
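The call order described above (initialize once per group, rollup once per record, finalize once per group) can be imitated in Python. This is a sketch of the control flow only, not DML; the record layout with cust_id and amount fields and the driver function run_rollup are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def initialize(first_record):
    # Called once per group: set up the temporary variable.
    return {"count": 0, "total": 0}


def rollup(temp, record):
    # Called once per record in the group: update the temporary variable.
    return {"count": temp["count"] + 1, "total": temp["total"] + record["amount"]}


def finalize(temp, last_record):
    # Called once per group, after the last rollup call: build the output record.
    return {"cust_id": last_record["cust_id"], "count": temp["count"], "total": temp["total"]}


def run_rollup(records, key):
    output = []
    ordered = sorted(records, key=itemgetter(key))  # groupby needs grouped input
    for _, group in groupby(ordered, key=itemgetter(key)):
        group = list(group)
        temp = initialize(group[0])
        for record in group:
            temp = rollup(temp, record)
        output.append(finalize(temp, group[-1]))
    return output


purchases = [
    {"cust_id": 101, "amount": 1000},
    {"cust_id": 102, "amount": 1050},
    {"cust_id": 101, "amount": 500},
]
summary = run_rollup(purchases, "cust_id")
print(summary)
```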

23) How do you add default rules in transformer?

Add Default Rules opens the Add Default Rules dialog. Select one of the following:

- Match Names: generates a set of rules that copies input fields to output fields with the same name.
- Use Wildcard (.*) Rule: generates one rule that copies input fields to output fields with the same name.

1) If it is not already displayed, display the Transform Editor Grid.
2) Click the Business Rules tab if it is not already displayed.
3) Select Edit > Add Default Rules.

In the case of Reformat, if the destination field names are the same as, or a subset of, the source field names, there is no need to write anything in the Reformat xfr, provided you need no real transformation beyond reducing the set of fields or splitting the flow into a number of flows.

24) What is the difference between partitioning with key and round robin?

Partition by Key (hash partitioning) is a partitioning technique used to partition data when the keys are diverse. If a single key value is present in large volume, there can be large data skew. Even so, this method is used more often for parallel data processing.

Round-robin partitioning is another technique, which distributes the data uniformly across the destination data partitions. The skew is zero when the number of records is divisible by the number of partitions. A real-life example is how a pack of 52 cards is dealt among 4 players in round-robin manner.
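A small Python sketch can contrast the two techniques. It is an illustration only: the helper names are made up, and zlib.crc32 stands in for whatever hash function the partitioner really uses, which this FAQ does not specify.

```python
import zlib


def partition_round_robin(records, partitions):
    """Deal records to partitions in turn, like dealing cards."""
    result = [[] for _ in range(partitions)]
    for position, record in enumerate(records):
        result[position % partitions].append(record)
    return result


def partition_by_key(records, partitions, key):
    """Send every record with the same key value to the same partition."""
    result = [[] for _ in range(partitions)]
    for record in records:
        hashed = zlib.crc32(str(key(record)).encode("utf-8"))
        result[hashed % partitions].append(record)
    return result


# 52 cards dealt to 4 players: every hand gets 13 cards, so skew is zero.
deck = list(range(52))
hands = partition_round_robin(deck, 4)
print([len(hand) for hand in hands])

# Nine records share key "A": they all land in one partition, which causes skew.
orders = [("A", number) for number in range(9)] + [("B", 9)]
by_key = partition_by_key(orders, 4, key=lambda record: record[0])
print([len(part) for part in by_key])
```

Round robin balances record counts but scatters equal keys across partitions, whereas key partitioning keeps equal keys together at the cost of possible skew.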

25) How do you improve the performance of a graph?

There are many ways to improve the performance of a graph:

1) Use a limited number of components in a particular phase.
2) Use optimum max-core values for Sort and Join components.
3) Minimise the number of Sort components.
4) Minimise sorted Join components and, if possible, replace them with in-memory joins/hash joins.
5) Use only the required fields in the Sort, Reformat and Join components.
6) Use phasing/flow buffers in the case of Merge and sorted joins.
7) If both inputs are huge, use a sorted join; otherwise use a hash join with the proper driving port.
8) For large datasets, don't use Broadcast as the partitioner.
9) Minimise the use of regular expression functions like re_index in transform functions.
10) Avoid repartitioning data unnecessarily.

Try to run the graph in MFS for as long as possible. For this, the input files should be partitioned and, if possible, the output file should be partitioned too.

26) How do you truncate a table?

From Ab Initio, use the Run SQL component with the DDL "truncate table", or use the Truncate Table component.

27) Have you ever encountered an error called "depth not equal"?

When two components are linked together and their layouts don't match, this problem can occur during compilation of the graph. A solution is to use a partitioning component in between wherever the layout changes.

28) What is the function you would use to transfer a string into a decimal?

In this case no specific function is required if the size of the string and the decimal is the same. A decimal cast with the size in the transform function will suffice. For example, if the source field is defined as string(8) and the destination as decimal(8) (say the field name is field):

out.field :: (decimal(8)) in.field;

If the destination field is smaller than the input, the string_substring function can be used as follows. Say the destination field is decimal(5):

out.field :: (decimal(5)) string_lrtrim(string_substring(in.field, 1, 5)); /* string_lrtrim trims leading and trailing spaces */

29) What is an outer join?

An outer join is used when one wants to select all the records from a port - whether it has satisfied the join criteria or not.

30) What are Cartesian joins?

A Cartesian join joins two tables without a join key. The key should be {}.

31) What is the difference between a DB config and a CFG file?

A .dbc file has the information required for Ab Initio to connect to the database to extract or load tables or views, while a .cfg file is the table configuration file created by db_config when using components like Load DB Table.


33) Explain the difference between the TRUNCATE and DELETE commands.

TRUNCATE is a DDL command, whereas DELETE is a DML command. A TRUNCATE cannot be rolled back, whereas a DELETE can. A WHERE clause cannot be used with TRUNCATE, whereas it can be used with DELETE.

34) How can we create a job sequencer in Ab Initio, i.e. run a number of graphs one after another?

As such, there is no job sequencer supported by Ab Initio up to versions GDE 1.13.3 and Co>Op 2.12.1. But we can sequence the jobs by creating wrapper scripts in UNIX, i.e. a Korn shell script that calls the graphs in sequence.

Scheduling of the jobs can also be done with the help of a scheduling tool called Control-M. In this tool, the scripts corresponding to the graphs and the wrapper scripts are placed in order of execution, and we can monitor the execution of the graphs.

Suppose you have graphs A, B and C, where the output of A is the input to B and the output of B is the input to C. Then you write a wrapper script that calls these jobs. The script will look like this:

a.ksh
b.ksh
c.ksh
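The sequencing logic of such a wrapper (run each job in order and stop as soon as one fails) can be sketched in Python. This is an illustration rather than a replacement for the Korn shell wrapper; the function name run_in_sequence is made up, and the demo uses stand-in commands so that it does not depend on real deployed graph scripts.

```python
import subprocess
import sys


def run_in_sequence(commands):
    """Run each command in order; stop at the first non-zero exit status."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            return result.returncode
    return 0


# Stand-in jobs for the demo. In a real wrapper these would be the deployed
# scripts, for example ["ksh", "a.ksh"], ["ksh", "b.ksh"], ["ksh", "c.ksh"].
ok_job = [sys.executable, "-c", "import sys; sys.exit(0)"]
failing_job = [sys.executable, "-c", "import sys; sys.exit(3)"]

all_good = run_in_sequence([ok_job, ok_job, ok_job])
stopped_early = run_in_sequence([ok_job, failing_job, ok_job])
print(all_good, stopped_early)
```

Returning the failing job's exit status lets a scheduler see which kind of failure stopped the chain.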

You can use the next_in_sequence function, which returns a sequence of integers.

35) How do you take input data from an Excel sheet?

There is a Read Excel component that reads an Excel file either from the host or from a local drive. The DML will be a default one.

Alternatively, save it as a CSV-formatted, delimited file and read it through the Input Table component.

36) What is the function you would use to transfer a string into a decimal?

Use the reinterpret_as function to convert a string to a decimal, or a decimal to a string. Syntax, to convert a decimal into a string:

reinterpret_as(ebcdic string(13), (ebcdic decimal(13))(in.cust_amount))

37) How to run the graph without GDE?


In the run directory a graph can be deployed as a .ksh file. Now, this .ksh file can be run at the command prompt as:

ksh <script_name> <parameters if any>

38) How to work with parameterized graphs?

One of the main purposes of parameterized graphs is that if we need to run the same graph n times for different files, we set up graph parameters like $INPUT_FILE, $OUTPUT_FILE, etc., and supply their values under Edit > Parameters. These parameters are substituted at run time. We can set different types of parameters, such as positional, keyword and local.

The idea is that instead of maintaining different versions of the same graph, we maintain one version for different files.

Have you worked with packages?

Packages are nothing but reusable blocks of objects like transforms, user-defined functions, DMLs, etc. These packages must be included in the transform where you use them. For example, consider a user-defined function like:

/* string_trim.xfr */
out :: trim(input_string) =
begin
  let string(35) trimmed_string = string_lrtrim(input_string);
  out :: trimmed_string;
end;

The above xfr can then be included in the transform where you call the function:

include "~/xfr/string_trim.xfr";

This include must appear ABOVE your transform function. For more details, see "packages" in the help file.

What is an outer join?

If you want to see all the records of one input file, regardless of whether there is a matching record in the other file, then it is an outer join.

What is driving port?

In a join, it is sometimes advantageous to set the Sorted-Input parameter to "Input need not be sorted". This helps when we are sure that one of the input ports has far fewer records than the other and the data from that port can be held in memory. In this case, we set the other port as the driving port.

For example, if port in0 has 1,000 records and in1 has 1 million records, we set in1 as the driving port, so the driving port value is 1. By default, the driving port value is 0 (for in0). Sometimes it is more advisable to create a lookup instead, but that depends on the requirement and design.

What is a wrapper script? Can anyone explain in detail?

Writing a wrapper script helps you run graphs in the sequence you want. Example: you need to run 3 graphs, with the condition that after the first graph has run successfully you take the feed it generated and use it in the next graph, and so on. You then have to write a UNIX script that runs the ksh of the first graph, checks after it has finished that it ran successfully, then runs the second ksh, and so on.

What is Conditional DML? Can anyone please explain with an example?

DML that is used as a condition is known as conditional DML.

Suppose we have data that includes a header, main data and a trailer, as given below:

10 This data contains employee info.
20 emp_id, emp_name, salary
30 count

So the DML for the above structure would be:

record
  decimal(",") id;
  if (id == 10)
  begin
    string(",") info;
  end
  else if (id == 20)
  begin
    string(",") emp_id;
    string(",") name;
    string(",") salary;
  end
  else if (id == 30)
  begin
    decimal(",") count;
  end
end;
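The idea behind the conditional record format, where the leading id decides which fields follow, can be sketched in Python. This is only an analogy for the DML above: it assumes one comma-delimited record per line, and the sample values and the function name parse_record are made up for illustration.

```python
def parse_record(line):
    """Parse one comma-delimited line whose first field selects the layout."""
    fields = line.rstrip("\n").split(",")
    record_id = int(fields[0])
    if record_id == 10:    # header
        return {"id": record_id, "info": fields[1]}
    if record_id == 20:    # main data
        return {"id": record_id, "emp_id": fields[1], "name": fields[2], "salary": fields[3]}
    if record_id == 30:    # trailer
        return {"id": record_id, "count": int(fields[1])}
    raise ValueError(f"unknown record type: {record_id}")


sample = [
    "10,This data contains employee info.",
    "20,1001,Asha,5000",
    "30,1",
]
parsed = [parse_record(line) for line in sample]
for record in parsed:
    print(record)
```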

Could anybody provide the major UNIX commands for the Ab Initio multifile system?

m_mkfs - to create a multifile system
m_ls - to list all the multifiles
m_rm - to remove a multifile
m_cp - to copy a multifile

What is meant by vector field? Explain with an exam...

A vector is a sequence of the same type of elements. The element type may be any type including a vector or record type.

It is a field that tells us how many times a particular field is repeated. For example, take this input:

Cust_id  purchase_amount  purchase_date
101      1000             29.08.06
101      500              30.08.06
102      1050             31.08.06
103      1140             1.09.06
103      1000             02.09.06
103      500              30.09.06

The output is:

Cust_id  total_purchase_amount  no_purchase  purchase_date(1)  purchase_date(2)
101      1500                   2            29.08.06          30.08.06
102      1050                   1            31.08.06
103      2640                   3            1.09.06           02.09.06          and so on

Here no_purchase is the vector field, which represents the number of times a customer has made purchases.

What does dependency analysis mean in Ab Initio?

Dependency analysis answers questions about data lineage: where the data comes from, which applications produce and depend on this data, and so on.


For data parallelism, we can use partition components. For component parallelism, we can use the Replicate component. Which component(s) can we use for pipeline parallelism?

Pipeline parallelism occurs when a connected sequence of components on the same branch of a graph executes concurrently.

Components like Reformat, where we distribute the input flow to multiple output flows using output index according to some selection criteria and process those output flows simultaneously, create pipeline parallelism.

But components like Sort, where the entire input must be read before a single record is written to the output, cannot achieve pipeline parallelism.

Flow:

input file ---> reformat ---> rollup ---> filter by expression ---> o/p file

Records on the flows at one instant: 50th record, 25 records, 10 records.

Put simply, whenever you run a graph you can watch the number of records processed on each flow; this is the best example of pipeline parallelism.

What is .abinitiorc and what it contains?

.abinitiorc is the config file for Ab Initio. It is found in the user's home directory. Generally it holds the Ab Initio home path and login information, such as the ID, encrypted password and login method, for the hosts that the graph connects to at execution time.

The .abinitiorc file contains configuration variables such as AB_WORK_DIR, AB_DATA_DIR, etc. This file can be found in "$AB_HOME/Config".

What do you mean by .profile in Abinitio and what...?

.profile is a file that is executed automatically when that particular user logs in. You can change your .profile to include any commands that you want to run whenever you log in. You can even put commands in your .profile that override settings made in /etc/profile (this file is set up by the system administrator).

You can set the following in your .profile:

- Environment settings
- Aliases
- Path variables
- Name and size of your history file


What is semi-join?

In Ab Initio, there are 3 types of join:

1. Inner join
2. Outer join
3. Semi join

For an inner join, the record_requiredn parameter is true for all in ports. For an outer join, it is false for all in ports.

If you want a semi join, set record_requiredn to true for the required port and false for the other ports.

What is data mapping and data modeling?

Data mapping deals with the transformation of the extracted data at FIELD level, i.e. the transformation of the source field to the target field is specified by the mapping defined on the target field. The data mapping is specified during the cleansing of the data to be loaded. For example:

source:
string(35) name = "Siva Krishna ";

target:
string("01") nm = NULL(""); /* (maximum length is string(35)) */

Then we can have a mapping like: straight move; trim the leading or trailing spaces.

What is driving port? When do you use it?

The driving port in a join supplies the data that drives the join. That is, every record from the driving port is compared against the data from the non-driving port.

We set the driving port to the larger dataset so that the smaller, non-driving data can be kept in main memory to speed up the operation.

What is $mpjret? Where it is used in ab-initio?

$mpjret is the return value of the shell command "mp run", which executes an Ab Initio graph.

What is data cleaning? How is it done?

Put simply, it is purifying the data.

Data cleansing: the act of detecting and removing and/or correcting a database's dirty data (i.e., data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly).


The GDE (Graphical Development Environment) is a GUI for developing graphs in a simple manner.

The Co>Operating System is a distributed operating system that runs as a back-end server.

The current version of the GDE is 1.15 and of the Co>Operating System is 2.14.

2. Which process did you follow to develop a graph?

- Getting the requirements
- Preparing the mapping documents (a mapping document is the mapping between input fields and output fields using some functional logic)
- Then, using the design documents, I implement the graph with the proper components.

3. Which components have you worked with?

- Reformat
- Rollup
- Join
- Sort
- Replicate
- Partition by Expression and Partition by Key
- Redefine
- Multi update
- Lookup
- Intermediate

4. Explain the Reformat component.

Reformat changes record formats by dropping, adding or combining fields.

Ports:

- Input
- Output
- Reject
- Log
- Error

Specific parameters:

- Select
- Output Index

5. What is the difference between the Output Index and Select parameters in Reformat?

Select and Output Index are both used to filter data, but with the Select parameter we cannot get the deselected records. With the Output Index parameter you can filter the data and also route the deselected records to another output port.


Reformat can change the record format by dropping, adding or modifying fields.

Redefine Format copies the records from input to output without changing the record values.

7. Explain the Join component.

Join reads data from two or more inputs, combines the records with matching keys and sends them to the output port.

Specific parameters:

- Dedup: set to true to remove duplicates before joining.
- Driving port: the driving port is the largest input; the remaining inputs are read directly into memory. (Available only when Sorted Input is set to "In memory: Input need not be sorted".)
- Join type:
  1. Inner join
  2. Full outer join
  3. Explicit join
- Record required: available when the join type is set to Explicit. For a left outer join, set it to true for input 0 and false for input 1. For a right outer join, set it to false for input 0 and true for input 1.
- Key: the matching keys.
- Overridden key: sets alternative names for particular key fields.
- Max memory: the maximum number of bytes used before the join writes temporary files to disk. (Available only when Sorted Input is set to "Input must be sorted"; the default is 8MB.)
- Select: filters the data.
- Max-core: the maximum number of bytes used before the join writes temporary files to disk. (Available only when Sorted Input is set to "In memory: Input need not be sorted"; the default is 64MB.)
- Sorted Input: when set to "Input must be sorted", the join accepts only sorted input; when set to "In memory: Input need not be sorted", it accepts unsorted data.

Specific ports:

- Unusedn: we can retrieve the unmatched data using the unused ports.

9. Can we make an explicit join for more than two inputs?

Yes, we can make a join for more than two inputs. For example:

For three inputs, if you want a left outer join, set the record required parameter to true for input 0 and false for input 1 and input 2.

For three inputs, if you want a right outer join, set the record required parameter to false for input 0 and input 1 and true for input 2.
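How the record required flags select the join type can be sketched for the two-input case in Python. This is a simplified model written for illustration, not Ab Initio's implementation; the function explicit_join, its parameter names and the sample data are all hypothetical.

```python
from collections import defaultdict


def explicit_join(in0, in1, key, record_required0, record_required1):
    """Join two inputs; a True flag means that input must supply a record."""
    by_key0 = defaultdict(list)
    by_key1 = defaultdict(list)
    for record in in0:
        by_key0[record[key]].append(record)
    for record in in1:
        by_key1[record[key]].append(record)

    output = []
    for value in sorted(set(by_key0) | set(by_key1)):
        if record_required0 and value not in by_key0:
            continue  # in0 is required but has no record for this key
        if record_required1 and value not in by_key1:
            continue  # in1 is required but has no record for this key
        for left in by_key0.get(value, [None]):
            for right in by_key1.get(value, [None]):
                output.append((value, left, right))
    return output


def joined_keys(rows):
    return [value for value, _, _ in rows]


customers = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi"}]
orders = [{"id": 2, "amount": 10}, {"id": 3, "amount": 20}]

inner = joined_keys(explicit_join(customers, orders, "id", True, True))
left_outer = joined_keys(explicit_join(customers, orders, "id", True, False))
right_outer = joined_keys(explicit_join(customers, orders, "id", False, True))
full_outer = joined_keys(explicit_join(customers, orders, "id", False, False))
print(inner, left_outer, right_outer, full_outer)
```

The same rule extends to three inputs: a key is kept only if every input whose flag is true has at least one record for it.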

10. What is the difference between merge and join?

Both components are used to combine data based on keys. With Join we can combine two input flows, but with Merge we can combine partitioned data.

11. Explain the Sort component.

The Sort component sorts and merges the data.

Parameters:

- Key
- Max-core (default is 100MB)

12. How do you determine the maximum memory usage of a component?

The maximum available value of max-core is 2^31 - 1.

13. Explain Partition by Key and Partition by Expression.

Partition by Key: distributes records to the output flow partitions according to key value.

Partition by Expression: distributes records to the output flow partitions according to an expression.

14. What are the different types of partition components?

- Partition by Key
- Partition by Expression
- Partition by Round-robin
- Partition by Range
- Broadcast

15. Difference between broadcast and replicate?

Broadcast: combines the records it receives into a single flow and writes a copy of that flow to each output flow partition. Broadcast supports data parallelism.

Replicate: combines the records it receives into a single flow and writes a copy of that flow to each output flow. Replicate supports component parallelism.

16. What is the difference between Concatenate and Merge?

Concatenate: appends multiple flow partitions one after another.

Merge: combines multiple flow partitions that have been sorted by key.

16. What are the different departition components?

- Merge
- Interleave (combines in round-robin fashion)
- Concatenate
- Gather (combines the data arbitrarily)

16. What is the difference between Reformat and Filter by Expression?

In both components we can filter data based on a select expression, but in Reformat we cannot get the deselected records on a separate port. In Filter by Expression we have a separate deselect port.

18. Explain the difference between Aggregate and Rollup.

Both components are used for summarization, but Aggregate does not have built-in functions. In Rollup we have built-in functions such as SUM(), AVG(), COUNT(), MIN(), MAX(), FIRST(), LAST() and PRODUCT().

19. Explain the difference between rollup and scan?

Rollup gives total control over summarization. Scan produces only intermediate, or cumulative, summary records.
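The difference in output shape can be illustrated in Python: a rollup-style summary emits one record per group, while a scan-style summary emits a running value for every input record. This is a sketch under the assumption that the input is already grouped by key; the function names and sample data are made up.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# (customer id, purchase amount), already grouped by customer id
records = [("101", 1000), ("101", 500), ("102", 1050)]


def rollup_totals(rows):
    """One output record per key: the final total."""
    return [
        (key, sum(amount for _, amount in group))
        for key, group in groupby(rows, key=itemgetter(0))
    ]


def scan_totals(rows):
    """One output record per input record: the running total within its key."""
    output = []
    for key, group in groupby(rows, key=itemgetter(0)):
        amounts = [amount for _, amount in group]
        output.extend((key, running) for running in accumulate(amounts))
    return output


print(rollup_totals(records))
print(scan_totals(records))
```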

20. What are the aggregation functions in Rollup?

- temporary_type (declares the temporary variable)
- initialize (initializes the needed values)
- rollup (does the summarization)
- finalize (assigns the final value)

21. What are the different types of sort components?

- Sort
- Sort within Groups
- Checkpointed Sort
- Partition by Key and Sort

23. What is a multifile and how can we create one from the command line?

An Ab Initio multifile is a large file divided into partitions, organized in a directory tree structure so that the partitions can be processed in parallel.

We can create a multifile system from the command line using the m_mkfs command followed by the URLs of its partitions.

24. What is the difference between a phase and a checkpoint?

 Phases are used to break up a graph into blocks for performance tuning.
 Checkpoints are used for recovery.

25. Explain the different types of parallelism supported by Ab Initio.

Ab Initio supports three types of parallelism:

 Component parallelism
 Pipeline parallelism
 Data parallelism

Component parallelism:

Component parallelism occurs when program components execute simultaneously on different branches of a graph.

Pipeline parallelism:

Pipeline parallelism occurs when a connected sequence of program components on the same branch of a graph execute simultaneously.

Data parallelism:

Data parallelism occurs when you separate data into multiple divisions, allowing multiple copies of program components to operate on the data in all the divisions simultaneously.

26. Explain the types of flows in Ab Initio.

 Straight flow
 Fan-in flow
 Fan-out flow
 All-to-all flow

Straight flow: A straight flow connects two components with the same depth of parallelism.

Fan-in flow: A fan-in flow connects a component with a greater depth of parallelism to one with a lesser depth; in other words, it follows a many-to-one pattern.

Fan-out flow: A fan-out flow connects a component with a lesser number of partitions to one with a greater number of partitions; in other words, it follows a one-to-many pattern.

All-to-all flow: An all-to-all flow is used:

 To connect components with different numbers of partitions, when the result of dividing the greater number of partitions by the lesser number is not an integer
 For repartitioning of data using components with the same or different numbers of partitions (see Repartitioning)
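The partition-count rules above can be written as a small decision function. This Python sketch is only a reading of those rules; the function name flow_kind is hypothetical, and it does not cover the repartitioning case, where an all-to-all flow is chosen regardless of the counts.

```python
def flow_kind(upstream_partitions, downstream_partitions):
    """Pick a flow pattern from the partition counts on either side of the flow."""
    hi = max(upstream_partitions, downstream_partitions)
    lo = min(upstream_partitions, downstream_partitions)
    if hi == lo:
        return "straight"      # same depth of parallelism
    if hi % lo != 0:
        return "all-to-all"    # the ratio is not an integer
    if upstream_partitions > downstream_partitions:
        return "fan-in"        # many-to-one
    return "fan-out"           # one-to-many
```

For example, 4 partitions feeding 1 is a fan-in, 2 feeding 8 is a fan-out, and 4 feeding 6 needs an all-to-all flow because 6 / 4 is not an integer.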

28. Have you worked on conditional components?

You can make any component or sub graph conditional by specifying a conditional expression that the GDE evaluates at runtime to determine whether or not the component runs.

If the conditional expression evaluates to true, the GDE runs the subgraph or component. If the conditional expression evaluates to false, the GDE either disables the component and any flows connected to its ports, or replaces it with a flow, depending on your choice on the Properties dialog: Condition tab.

29. What is a subgraph?

A subgraph is a graph fragment. Just like graphs, subgraphs contain components and flows. A subgraph groups together components that perform a subtask in a graph. The subgraph creates a reusable component that performs the subtask.

30. What sorts of functions have you worked with?

 Inquiry and error functions
 String functions
 Date functions

31. Which inquiry and error functions have you used?

 is_defined (tests whether the expression is not null)

Syntax: is_defined(expr)

 is_null (tests whether the expression is null)

Syntax: is_null(expr)

 is_error (tests whether an error occurs while evaluating the expression)

Syntax: is_error(expr)

 is_valid (tests whether the expression is valid)

Syntax: is_valid(expr)

 force_error (raises an error and returns a message)

Syntax: force_error(string msg)

32. Which string functions have you worked with?

 decimal_lpad
 decimal_lrpad
 string_compare
 string_substring
 string_concat
 string_index
 string_length
 string_lpad
 string_lrpad

(Note: please go through the help document for the descriptions.)

33. How can we generate a sequence of numbers in Ab Initio?

Using the next_in_sequence function.

Syntax: int next_in_sequence( )

34. How can we get the log information in Ab Initio?

Using the write_to_log function, we can write to the log port of a component.

Syntax: write_to_log(string event_type, string event_text)

35. What is the use of the Lookup File component?

Lookup File represents one or more serial files or a multifile. The amount of data is small enough to be held in main memory. This allows a transform function to retrieve records much more quickly than it could retrieve them if they were stored on disk.

Lookup File associates key values with corresponding data values to index records and retrieve them.

Parameters for Lookup:

 Key

 Record format

How to Use Lookup File

Unlike other dataset components, Lookup File is not connected to other components in graphs. In other words, it has no ports. However, its contents are accessible from other components in the same or later phases.

You use the Lookup File in other components by calling one of the following DML functions in any transform function or expression parameter: lookup, lookup_count, or lookup_next. The first argument to these lookup functions is the name of the Lookup File. The remaining arguments are values to be matched against the fields named by the key parameter. The lookup functions return a record that matches the key values and has the format given by the RecordFormat parameter. For details, see the Data Manipulation Language Reference. A file you want to use as a Lookup File must fit into memory. If a file is too large to fit into memory, use Input File followed by Match Sorted or Join instead.

Information about Lookup Files is stored in a catalog, which allows you to share them with other graphs.
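The idea of an in-memory keyed lookup can be sketched in Python. The class and method names below are stand-ins chosen for this example, not the DML functions themselves: lookup and lookup_count imitate the behaviour described above, and lookup_all is a simplified substitute for stepping through matches with lookup_next.

```python
from collections import defaultdict

class LookupFile:
    """In-memory index from a key value to the records that carry it."""

    def __init__(self, records, key):
        self._index = defaultdict(list)
        for rec in records:
            self._index[rec[key]].append(rec)

    def lookup(self, value):
        """Return the first matching record, or None if there is none."""
        matches = self._index.get(value)
        return matches[0] if matches else None

    def lookup_count(self, value):
        """Return how many records match the key value."""
        return len(self._index.get(value, []))

    def lookup_all(self, value):
        """Return every matching record, in load order."""
        return list(self._index.get(value, []))

# Hypothetical reference data small enough to hold in memory.
countries = LookupFile(
    [
        {"code": "US", "name": "United States"},
        {"code": "IN", "name": "India"},
        {"code": "US", "name": "USA"},
    ],
    key="code",
)
first_us = countries.lookup("US")
```

Building the index once and probing it per record is what makes the lookup faster than rereading the data from disk for every record.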

36. Have you worked on lookup functions?

I have worked on the following functions:

 lookup
 lookup_count
 lookup_local
 lookup_next

(Note: please go through the help document for the descriptions.)

By clicking the Add to catalog check box.

39. Explain the performance tuning in your current project.

There are many ways the performance of a graph can be improved.

1) Use a limited number of components in a particular phase

2) Use optimum max core values for sort and join components

3) Minimize the number of sort components

4) Minimize sorted join components and, if possible, replace them with in-memory join/hash join

5) Use only the required fields in the sort, reformat, and join components

6) Use phasing/flow buffers in case of merge and sorted joins

7) If the two inputs are huge, use a sorted join; otherwise use a hash join with the proper driving port

8) For large datasets, don't use Broadcast as the partitioner

9) Minimize the use of regular expression functions like re_index in the transform functions

10) Avoid repartitioning data unnecessarily

40. What is a DB config file and how do you create it?

A DB config file holds the information Ab Initio requires to connect to the database.

Creation: In an Input Table or Output Table component, select the dbconfig file option and choose New; then give the database name, database node, database version, user ID, and password, and click Create.

41. How do you migrate your project from one environment to another?

We have two options:

 Check-in
 Check-out

(Note: please go through the help document for more information.)

42. How can we do version control in Ab Initio?

Once check-in is done, the graph is automatically updated to a new version.

Whenever you check out the graph, you need to give the tag information in the Tag tab (it represents the version).

If you want to view all versions, give the following command at the command line: AIR_OBJECT_VERSION_VERBOSE.

43. How can we debug an Ab Initio graph?

Ans: We can debug a graph using watchers. A watcher adds an intermediate file on a flow, so you can view the data that passes through the flow when you run the graph.

There are two types of watchers:

 Non-phased
 Phased

Phased: with phase break.

44. How do we add a watcher to a flow?

Add watchers on flows by doing the following:

1. Turn on debugging mode if it is not on.

2. Select the flows on which you want to place watchers.

3. Do one of the following:

o On the menu bar of the GDE, choose Debugger > Add Watcher to Flow.
o On the GDE Debugging toolbar, click the Add Watcher to Flow button.
o Right-click the flow and choose Add Watcher from the shortcut menu.

Watchers appear on the selected flows.

The actions in step 3 will remove watchers if there are watchers on all selected flows. When you run the graph the watchers turn blue, and you can view the data that has passed through the flows.

45. How do you run a graph from the command line?

Ans: We can deploy the graph as a .ksh file and then run that file from the command line.

46. What is a sandbox?

A sandbox is a collection of graphs and related files that are stored in a single directory tree, and treated as a group for purposes of version control, navigation, and migration.

A sandbox contains the following subdirectories:

 DML (holds the record format information)
 XFR (holds the transformation logic files)
 DB (holds the database connection information)
 MP (holds the graphs)
 RUN (holds the ksh files)

47. What happens when you create a sandbox in Ab Initio?

When you create a sandbox, the directory tree (DML, XFR, DB, MP, and RUN folders), the parameters, and the environment variables are created automatically. Along with these, the ABPROJECTSETUP.KSH file is created in the sandbox.

48. What sorts of error messages have you got in your project?

 Bad value found error
 Null value assignment
 Depth is not equal
 Too many files open or max core error

49. When do we get the "depth is not equal" message?

When the depth of parallelism (the number of partitions in the layout) does not match between upstream and downstream components.

50. When do we get the "too many files open" error?

This error occurs when the max core value is too low while a component is executing, so we need to set an appropriate max core value for that component.

51. How does job recovery work in Ab Initio?

Job recovery can be done in the following ways:

 If you set a checkpoint phase, a .rec file is created automatically. If the graph fails, rerunning it automatically recovers the data up to the last checkpoint.
 If you want to run the graph from the beginning, you need to perform a manual rollback from the command line.
 The command is m_rollback.
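The recovery behaviour described above can be imitated with a toy Python driver. Everything here is an assumption made for illustration: the function run_with_checkpoints, the JSON contents of the recovery file, and the file name job.rec are invented and say nothing about the real .rec format.

```python
import json
import os
import tempfile

def run_with_checkpoints(phases, rec_path):
    """Run phases in order, skipping those a previous failed run completed."""
    done = 0
    if os.path.exists(rec_path):
        with open(rec_path) as f:
            done = json.load(f)["completed"]
    for i, phase in enumerate(phases):
        if i < done:
            continue                      # finished before the failure
        phase()
        with open(rec_path, "w") as f:    # checkpoint after each phase
            json.dump({"completed": i + 1}, f)
    os.remove(rec_path)                   # clean finish: nothing to recover

log = []
state = {"fail": True}

def load():
    log.append("load")

def transform():
    if state["fail"]:
        raise RuntimeError("simulated failure")
    log.append("transform")

rec_path = os.path.join(tempfile.mkdtemp(), "job.rec")
try:
    run_with_checkpoints([load, transform], rec_path)
except RuntimeError:
    state["fail"] = False                 # "fix" the problem, then rerun
    run_with_checkpoints([load, transform], rec_path)
```

On the rerun the first phase is skipped, which is the effect of restarting from the last checkpoint; deleting the recovery file before rerunning would correspond to starting from the beginning.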

53. What is a local variable?

A local variable is a variable declared within a transform function. You can use local variables to simplify the structure of rules or to hold values used by multiple rules.

Declaration:

Here is the syntax of a local variable declaration:

let type_specifier variable_name [not NULL] [ = expression ] ;

NOTE: The declaration of a local variable must occur before the statements and rules in a transform function.

let: Keyword for declaring a variable.

type_specifier: The type of the variable.

variable_name: The name of the variable.

[not NULL]: Optional. Keywords indicating that the variable cannot take on the value of null. These must appear after the variable name and before the initial value. NOTE: If you create a local variable without the not NULL keywords, and do not assign an initial value, the local variable initially takes on the value of null.

[ = expression ]: Optional. An expression that provides an initial value for the variable.

;: A semicolon must end a variable declaration.

For example, the following local variable definitions define two variables, x and y. The value for x depends on the value of the amount field of the variable in, and the value of y depends on the value of x:

let int x = in.amount + 5;
let double y = 100.0 / x;

54. What is a global variable?

Within a package, you can create a global variable and use it in all the transform functions in that package, but you must declare the global variable outside the transform functions.

Declaration:

let type_specifier variable_name [not NULL] [ = expression ] ;

let: Keyword for declaring a variable.

type_specifier: The type of the variable.

variable_name: The name of the variable.

[not NULL]: Optional. Keywords indicating that the variable cannot take on the value of null. These must appear after the variable name and before the initial value. NOTE: If you create a global variable without the not NULL keywords, and do not assign an initial value, the global variable initially takes on the value of null.

[ = expression ]: Optional. An expression that provides an initial value for the variable.

;: A semicolon must end a variable declaration.

55. Have you ever used any m commands?

Yes, I have used commands such as m_rollback, m_cleanup, and m_dump.

56. What is the difference between m_rollback and m_cleanup?

m_rollback rolls back a partially completed graph to its beginning state. m_cleanup cleans up files left over from unsuccessfully executed graphs and manually recovered graphs.

57. How to use m_cleanup?

To find temporary files and directories before cleaning them up, you use the m_cleanup command. You can run this utility with or without arguments:

 m_cleanup prints usage for the command.
 m_cleanup -help prints usage for the command.
 m_cleanup -j job_log_file [job_log_file... ] lists the temporary files and directories listed in the log file specified by job_log_file. To specify multiple files, separate each filename with a space.

Log files have either a .hlg or .nlg suffix. A log file ending in .hlg is on the control, or host, machine of a graph. A log file ending in .nlg is on a processing machine of a graph.

The job_log_file can be an absolute or relative pathname. Paths have the following syntax:

o On the control machine: AB_WORK_DIR/host/job_id/job_id.hlg
o On a processing machine: AB_WORK_DIR/vnode/job_id-XXX/job_id.nlg, where XXX is an internal ID assigned to each machine by the Co>Operating System.

58. How can I generate DML for a database table from the command line?

Using the m_db command-line utility, we can generate the DML. The syntax is:

m_db gendml dbc_file [options] -table tablename

59. Can we do check-in and check-out through the command line?

Yes, we can do check-in and check-out using air commands such as AIR_OBJECT_IMPORT and AIR_OBJECT_EXPORT.

60. What sorts of issues have you solved in production support?

 Data quality issues
 Max core issues

1) What is EME & EME DataStore?

Ans) EME is short for Enterprise Meta>Environment. The EME is a high-performance object-oriented storage system that manages Ab Initio applications (including data formats and business rules) and related information. It provides an integrated and consolidated view of your business. It is used for version control, navigation, and migration. An EME datastore is a specific instance of the EME: the term denotes the specific EME storage that you are currently connected to through the GDE. There can be many such datastore instances resident in an environment in which the EME has been installed, but you can be connected to only one datastore at a time; this is determined by your GDE's current EME datastore settings.

2) What is Sandbox?

Ans) A sandbox is a collection of graphs and related files that are stored in a single directory tree, and treated as a group for purposes of version control, navigation, and migration. A sandbox can be a file system copy of a datastore project.

3) What is the Co>Operating System?

Ans) The Co>Operating System is core software that unites a network of computing resources (CPUs, storage disks, programs, datasets) into a production-quality data processing system.

The Co>Operating System is layered on top of the native operating systems of a collection of computers. It provides a distributed model for process execution, file management, process monitoring, checkpointing, and debugging.

The Graphical Development Environment (GDE) provides a graphical user interface into the services of the Co>Operating System.

4) What are the differences between the various GDE connection methods?

Ans) There are a number of communication methods used to communicate between the GDE and the Co>Operating System, including:

 Ab Initio Server/REXEC
 Ab Initio Server/TELNET
 DCOM
 REXEC
 RSH
 TELNET
 SSH(/Ab Initio)

When using the GDE to connect to the Co>Operating system, the normal process for a connection differs depending upon which communication method is selected. In broad terms, two things tend to happen: files are transferred from the GDE to the target host (or from the host to the GDE), and processes are started/executed on the host.

When using telnet, rexec and rsh, the basic steps are as follows.

A. The GDE transfers the execution script to the server via FTP.

B. The GDE connects to the server by means of the selected method.

C. The GDE executes that script on the server by means of the connection set up in step B.

The process is different for connection methods that use the Ab Initio Server, however. These methods include Ab Initio Server/Telnet and Ab Initio Server/Rexec, as well as SSH and DCOM. The use of the Ab Initio Control Server replaces the need for FTP and adds enhanced server-side services. When the Ab Initio Control Server is involved, the basic steps are as follows:

 The GDE connects to the server by means of the selected method.
 This connection initiates startup of the Ab Initio Control Server.
 The GDE initiates a connection to the Control Server.
 All file transfer occurs across the same Control Server connection.
 Script execution is accomplished through a new connection using the selected connection method.

5) What is metadata?

Ans) Metadata is data about data; it describes the data. The metadata associated with graphs includes the information needed to build a graph, such as record formats, key specifiers, and transform functions.

6) What is the configuration file in Ab-initio?

Ans) The Co>Operating System accepts either of two names for the per-user Ab Initio configuration file. In addition to .abinitiorc, the Co>Operating System now also accepts abinitio.abrc in order to conform to Windows file name conventions. Other supported platforms also recognize the new name. Only one configuration file is permitted, however. Using both .abinitiorc and abinitio.abrc results in an error.

7) What are the different file extensions in Ab Initio?

Ans)

.cfg   Database table configuration files for use with 2.1 Database Components
.dat   Data files (either serial files or multifiles)
.dbc   Database configuration files
.dml   Data Manipulation Language files or record format definitions
.mdc   Dataset or custom dataset components
.mp    Stored Ab Initio graphs or graph components
.mpc   Program components or custom components
.xfr   Transform function definitions or packages
.aih   Host settings
.aip   Project settings

8) What does the GDE do automatically?

Ans) The GDE provides default settings and behaviors for several features:

 Flow Buffering and Deadlock
 Layout and Record Format Propagation

9) What kinds of flat file formats does the Ab Initio Graphical Development Environment (GDE) support?

Ans) The GDE supports these flat file formats. All file types use the .dat extension.

 Serial files
 Multifiles
 Ad hoc multifiles

Serial Files:

A serial file is a flat, non-parallel file, also known as one-way parallel. You create serial files using a Uniform Resource Locator (URL) on the component's Description tab. The URL starts with file

Multifiles:

A multifile is a parallel file consisting of individual files called partitions, often stored on different disks or computers. A multifile has a control file that contains URLs pointing to one or more data files. You can divide data across partition files using these methods: random or round-robin partitioning, partitioning based on ranges or functions, and replication or broadcast, in which each partition is an identical copy of the serial data. You create multifiles using a URL on the component's Description tab.

Ad hoc Multifile:

An ad hoc multifile is also a parallel file. Unlike a multifile, however, the content of an ad hoc multifile is not stored in multiple directories. In a custom layout, the partitions are serial files. You create an ad hoc multifile using partitions on the component's Description tab.

10) What does a .dbc file contain?

Ans) A file with a .dbc extension provides the GDE with the information it needs to connect to a database. A configuration file contains the following information:

 The name and version number of the database to which you want to connect
 The name of the computer on which the database instance or server to which you want to connect runs, or on which the database remote access software is installed
 The name of the database instance, server, or provider to which you want to connect

11) What are the default parameters in a sandbox?

Ans) The default sandbox parameters in a GDE-created sandbox are these six:

 PROJECT_DIR: absolute path to the sandbox directory
 DML: relative sandbox path to the dml subdirectory
 XFR: relative sandbox path to the xfr subdirectory
 RUN: relative sandbox path to the run subdirectory
 DB: relative sandbox path to the db subdirectory
 MP: relative sandbox path to the mp subdirectory

These six parameters are automatically created (and assigned their correct value) whenever you create a sandbox.

12) What is the difference between sandbox parameters and graph parameters?

Ans) The difference between sandbox parameters and graph parameters is:

 Graph parameters are visible only to the particular graph to which they belong
 Sandbox parameters are visible to all the graphs stored in a particular sandbox

13) What is a standalone sandbox?

Ans) A sandbox that is not associated with a project is simply a special directory.

14) What is the difference between the EME and a sandbox?

Ans) The big difference between the contents of a sandbox and its corresponding project in the EME is that the project contains, for each file, each and every version that has ever been checked in by anybody. The sandbox, on the other hand, contains only the latest version of each file checked out into that sandbox.

A sandbox can be associated with only one project. However, there is no limit (other than the physical one of disk space) to the number of sandboxes that a user can have. Although a given sandbox can be associated with only one project, a given project can have any number of sandboxes.

15) What are formal graph parameters?

Ans) A formal graph parameter is a parameter you substitute for a path and/or filename when you create a graph. This allows you to specify the value of that parameter at runtime.

16) What is the order of evaluation of parameters?

Ans) When you run a graph, parameters are evaluated in the following order:

1. The host setup script is run.
2. Common (that is, included) sandbox parameters are evaluated.
3. Sandbox parameters are evaluated.
4. The project-start.ksh script is run.
5. Formal parameters are evaluated.
6. Graph parameters are evaluated.
7. The graph Start Script is run.

17) What is a transform function?

Ans) A transform function (or transform) is the logic that drives data transformation. Most commonly, transform functions express record reformatting logic. In general, however, you can use transform functions in data cleansing, record merging, and record aggregation.

To be more specific, a transform function is a collection of business rules, local variables, and statements. The transform expresses the connections between the rules, variables, and statements, as well as the connections between these elements and the input and output fields.

Transform functions are always associated with transform components; these are components that have a transform parameter: Aggregate, Denormalize Sorted, Fuse, Join, Match Sorted, MultiReformat, Normalize, Reformat, Rollup, and Scan components.

18) What is rule prioritization?

Ans) You can control the order of evaluation of rules in a transform function by assigning priority numbers to the rules. The rules are attempted in order of priority, starting with the assignment of lowest-numbered priority and proceeding to assignments of higher-numbered priorities, then finally to an assignment for which no priority has been given.
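The ordering described above can be sketched in Python. This is an analogy, not DML: the helper apply_prioritized, the (priority, rule) pairs, and the choice to treat an exception or a None result as a failed rule are assumptions made for the example.

```python
def apply_prioritized(rules, record):
    """Try rules from the lowest priority number up; unnumbered rules go last."""
    ordered = sorted(rules, key=lambda r: (r[0] is None, r[0] or 0))
    for _priority, rule in ordered:
        try:
            value = rule(record)
        except (KeyError, TypeError, ValueError, ZeroDivisionError):
            continue  # this rule failed, so fall through to the next one
        if value is not None:
            return value
    return None

# Hypothetical rules for one output field, listed out of order on purpose.
rules = [
    (None, lambda r: "unknown"),     # no priority: attempted last
    (2, lambda r: r["nickname"]),
    (1, lambda r: r["name"]),        # lowest number: attempted first
]
full = apply_prioritized(rules, {"name": "Ann", "nickname": "A"})
partial = apply_prioritized(rules, {"nickname": "A"})
empty = apply_prioritized(rules, {})
```

The unprioritized rule acts as a default that is reached only when every numbered rule has failed.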

19) What are local variables?

Ans) A local variable is a named storage location in an expression or transform function. You declare a local variable within the transform function in which you want to use it. The local variable is reinitialized each time the transform function is called, and it persists for one single evaluation of the transform function.

20) What Is a Package?

Ans) A package is a named collection of related DML objects. A package can hold types, transform functions, and variables, as well as other packages. Packages provide a means of locating in one place DML objects that are needed more than once in a given graph, or needed by multiple developers. Packages allow developers to avoid redundant code; this makes maintenance of DML objects more efficient.

Packages are very useful in these types of situations:

 The record formats of multiple ports use common record formats and/or type specifiers
 Multiple components use common transforms

21) Explain Multi-Stage transform Components?

Ans) The multi-stage transform components require packages because, unlike other transform components, they are driven by more than a single transform function. These components each take a package as a parameter and, in order to process data, look for particular variables, functions, and types in that package. For example, a multi-stage component might look for a type named temporary_type, a transform function named finalize, or a variable named count_items.

22) What is a Phase?

Ans) A phase is a stage of a graph that runs to completion before the start of the next stage. By dividing a graph into phases, you can save resources, avoid deadlock, and safeguard against failures. To protect a graph, all phases are checkpoints by default.

23) What is a Checkpoint?

Ans) A checkpoint is a phase that acts as an intermediate stopping point in a graph and saves status information to allow you to recover from failures. By assigning phases with checkpoints to a graph, you can recover completed stages of the graph if failure occurs.

24) How do you use a subgraph of graph A in graph B?

Ans) When you build a subgraph, it becomes a part of the graph in which you build it. If you want to use it in other graphs, or in other places in the original graph, save it in the Component Organizer of the GDE.

25) Is there a way to make my graph conditional, so that certain components may not run?

Ans) You can enter a Condition statement on the Condition tab of graph components. This is an expression that evaluates to the string value for true or false (see details below). The GDE then evaluates the expression at runtime. If the expression evaluates to true, the component or subgraph is executed. If it is false, then the component or subgraph is not executed, and is either removed completely or replaced with a flow between two user-designated ports. The correct syntax for if statements in the Korn shell is as follows:

$( if [[ condition ]]; then_statement; else_statement; fi)

26) How can you improve the GDE's performance when it is running slowly?

Ans) If the GDE is performing slowly, you can improve performance with one or more of these methods:

 Turn off Undo by choosing File > Autosave/Undo on the GDE menu bar and clearing the selection of Undo/Redo Enabled.

 Turn off Propagation by choosing Edit > Propagation on the GDE menu bar and clearing the selection of Record Format and Layout.

 Increase the Tracking Interval by choosing Run > Default Settings on the GDE menu bar, clicking the Code Generation tab, and increasing the Tracking Interval to 60 seconds.

27) What is a lookup file?

Ans) Lookup File represents one or more serial files or a multifile. The amount of data is small enough to be held in main memory. This allows a transform function to retrieve records much more quickly than it could retrieve them if they were stored on disk. Lookup File associates key values with corresponding data values to index records and retrieve them.

28) What is Two-stage routing?

Ans) When an all-to-all flow connects components with layouts containing a large number of partitions, the Co>Operating System uses many networking resources. If the number of partitions in the source and destination components is N, an all-to-all flow uses resources proportional to N*N (N squared).

To save network resources, you can mark an all-to-all flow as using two-stage routing. With two-stage routing, the all-to-all flow uses resources proportional to only 2*N*√N (2*N*root N).

For example, an all-to-all flow with 25 partitions uses 25*25 = 625 resources, but with two-stage routing uses only 2*25*5 = 250 resources.
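The arithmetic is easy to check with a few lines of Python. The function names are hypothetical, and the numbers are relative resource counts as used in the text above, not measured values.

```python
import math

def all_to_all_cost(n):
    """Relative resources for a plain all-to-all flow between n partitions."""
    return n * n

def two_stage_cost(n):
    """Relative resources for the same flow with two-stage routing."""
    return 2 * n * math.sqrt(n)

plain_25 = all_to_all_cost(25)      # 625
staged_25 = two_stage_cost(25)      # 250.0
plain_100 = all_to_all_cost(100)    # 10000
staged_100 = two_stage_cost(100)    # 2000.0
```

The saving grows with N: the ratio between the two costs is √N / 2, so it is 2.5 times at 25 partitions and 5 times at 100.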

29) What are the types of parallelism?

Ans) There are three types of parallelism employed by the Co>Operating System:

 Component Parallelism
 Pipeline Parallelism
 Data Parallelism

30) What is Component Parallelism?

Ans) Component parallelism occurs when program components execute simultaneously on different branches of a graph.

Component parallelism scales to the number of branches of a graph: the more branches a graph has, the greater the component parallelism. If a graph has only one branch, component parallelism cannot occur.

31) What is Pipeline Parallelism?

Ans) Pipeline parallelism occurs when a connected sequence of program components on the same branch of a graph execute simultaneously.

32) What is Data Parallelism?

Ans) Data parallelism occurs when you separate data into multiple divisions, allowing multiple copies of program components to operate on the data in all the divisions simultaneously.

33) What are Multifiles, Multifile Systems, and Multidirectories?

Ans) Ab Initio multifiles are parallel files composed of individual files, typically located on different disks and usually, but not necessarily, on different systems. These individual files are the partitions of the multifile.

Ab Initio multifiles reside in parallel directories called multidirectories, which are organized into multifile systems. An Ab Initio multifile system consists of multiple replications of a directory tree structure containing multidirectories and multifiles. Each replication constitutes a partition of the multifile system.

Each partition holds a subset of the data contained in the multifile system, and the system has one additional partition that contains control information. The partitions containing data are the data partitions of the system, and the additional partition is the control partition. The control partition contains no user data, only the information the Co>Operating System needs to manage the multifile system.

34) How to create multifile system?

Ans) To create a multifile system, issue the m_mkfs command, using as arguments the URLs of the partitions of the multifile system you want to create. The first URL creates the control partition, and each subsequent URL creates the next partition of the multifile system. Similarly use m_mkdir for multi directories.
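To make the control partition and data partition roles concrete, the Python sketch below builds a toy layout on the local file system. It does not call m_mkfs or reproduce its on-disk format: the directory names, the partitions.txt bookkeeping file, and the round-robin writer are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

def make_multifile_system(root, n_data_partitions):
    """Create one control directory plus n data partition directories."""
    control = Path(root) / "control"
    control.mkdir(parents=True)
    data_dirs = []
    for i in range(n_data_partitions):
        d = Path(root) / f"part{i}"
        d.mkdir()
        data_dirs.append(d)
    # The control partition holds only bookkeeping, never user data.
    (control / "partitions.txt").write_text("\n".join(str(d) for d in data_dirs))
    return control, data_dirs

def write_multifile(data_dirs, name, records):
    """Spread records round-robin across one file per data partition."""
    buckets = [[] for _ in data_dirs]
    for i, rec in enumerate(records):
        buckets[i % len(data_dirs)].append(rec)
    for d, bucket in zip(data_dirs, buckets):
        (d / name).write_text("\n".join(bucket))

root = tempfile.mkdtemp()
control, data_dirs = make_multifile_system(root, 3)
write_multifile(data_dirs, "customers.dat", ["r1", "r2", "r3", "r4", "r5"])
```

Each data partition ends up with a file of the same name holding a subset of the records, while the control directory only records where the partitions live.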

35) What is a layout?

Ans) A layout is one of the following:

 A URL that specifies the location of a serial file
 A URL that specifies the location of the control partition of a multifile
 A list of URLs that specifies the locations of:
o The partitions of an ad hoc multifile
o The working directories of a program component

36) What is dependency analysis?

Ans) Using the EME, you can conduct project analyses of the dependencies within and between graphs. The EME examines the project and develops an analytical survey of it in its entirety, tracing how data is transformed and transferred, field by field, from component to component.

37) What are the different kinds of analysis?

Ans) The Checkin Wizard offers the following analysis levels. Each choice is followed by the action the wizard takes:

None: Turns off all translation and dependency analysis during checkin.

Translation Only: Translates graphs from GDE format to datastore format, but does not do error checking and does not store results in the datastore.

Tip: We recommend that at minimum you do translation only, since it is required for analysis, which you can run anytime.

Translation with Checking: Translates graphs from GDE format to datastore format and checks for errors that will interfere with dependency analysis. See Checked-for Errors.

Full Dependency Analysis (Default): Performs full dependency analysis on the graph and saves the results in the datastore.

Tip: We recommend that you do not do analysis now, as it can greatly prolong checkin.

What to Analyze

The What to Analyze group of checkboxes allows you to specify which files are subjected to the level of analysis you specified in Analysis Level. Each of the four choices below is followed by what the Checkin Wizard analyzes:

All Files: All files in the project.

All Unanalyzed Files: All files in the project that have changed, or that are dependent on or required by files that have changed, since the last time they were analyzed, regardless of whether or not the files were checked in by you.

Only My Checked In Files: Only the files checked in by you. This group can include files you checked in earlier that are still on the analysis queue and have not yet been analyzed.

Only the File Specified (Default): Only the specified file(s).

Analysis Scope

The Analysis Scope group of checkboxes allows you to specify how far the specified level of analysis is extended to files dependent on those being analyzed, both in the current project and in other projects. Each of the three choices below is followed by what the Checkin Wizard analyzes:

Dependent Files from All Projects (Default): Files in other projects common to (included in) the one you are checking in, if they are dependent on the files being analyzed.

Dependent Files from Specified Project (Default): Only the dependent files that are in the same project as the file(s) being analyzed.

No Dependent Files: No dependent files.

38) What is a switch parameter?

Ans) A switch parameter has a fixed set of string values, which you define when you create the parameter. The purpose of a switch parameter is to allow you to change your sandbox's context: its value determines the values of various other parameters that you make dependent on that switch. For each switch value, each of the dependent parameters has a dependent value. Changing the switch's value thus changes the values of all its dependent parameters.

39) What are the types of project parameters?

Ans) There are four types of project parameters:

• Standard Parameters

• Switch Parameters

• Dependent Parameters

• Common Project Parameters

40) What is the max-core parameter?

Ans) The value for the max-core parameter determines the maximum amount of memory, in bytes, that the component can use. If the component is running in parallel, the value of max-core represents the maximum memory usage per partition, not the sum for all partitions. If you set the max-core value too low, the component runs more slowly than expected. If you set the max-core value too high, the component might use too many machine resources, slow the process drastically, and cause hard-to-diagnose failures.
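Because max-core applies per partition, the worst-case memory for one component grows with the degree of parallelism. The small Python calculation below illustrates this. The 64 MB setting and the 8-way layout are made-up numbers, not recommended values.

```python
def total_max_core_bytes(max_core_bytes, partitions):
    """Upper bound on memory one component may use across all its partitions."""
    if max_core_bytes <= 0 or partitions <= 0:
        raise ValueError("max_core_bytes and partitions must be positive")
    return max_core_bytes * partitions


max_core = 64 * 1024 * 1024   # hypothetical max-core setting: 64 MB, in bytes
ways = 8                      # hypothetical 8-way parallel layout

total = total_max_core_bytes(max_core, ways)
print(total)                  # 536870912 bytes, that is 512 MB
```

So a value that looks modest for one partition can still add up to a large amount of memory on a machine that hosts several partitions.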

41) What is an ordered flow?

Ans) The Ordered attribute is a port attribute. It determines whether the order in which you attach flows to a port, from top to bottom, is significant to the definition and purpose of the component. If a port is ordered, the order in which flows are attached determines the result of the processing the component does: if you change the order in which you attach the flows, you create a different result.

Note: GDE indicates the difference between a port that is ordered and one that is not by drawing them differently. If you inspect the ordered port of Concatenate in the graph, you see a line dividing the port between the two flows; that line is not present in the port of Gather, which is not ordered.
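The difference between an ordered and an unordered port can be modeled in plain Python. This is only an illustration, not Ab Initio code: concatenate stands in for a component with an ordered port, and gather stands in for one with an unordered port, modeled as a multiset because only the collection of records is defined, not their order.

```python
from collections import Counter
from itertools import chain


def concatenate(flows):
    """Ordered port: all records of the first flow, then the second, and so on."""
    return list(chain.from_iterable(flows))


def gather(flows):
    """Unordered port: model the result as a multiset of records."""
    return Counter(chain.from_iterable(flows))


flow_a = ["a1", "a2"]
flow_b = ["b1", "b2"]

print(concatenate([flow_a, flow_b]))   # ['a1', 'a2', 'b1', 'b2']
print(concatenate([flow_b, flow_a]))   # ['b1', 'b2', 'a1', 'a2']
print(gather([flow_a, flow_b]) == gather([flow_b, flow_a]))   # True
```

Swapping the attachment order changes the result for the ordered port but not for the unordered one.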

42) What will be the record order in the flows?

Ans) Components maintain the ordering of the input data records unless their explicit purpose is to reorder records. For most components, if record x appears before record y in an input flow partition, and if record x and record y are both in the same output flow partition, then record x appears before record y in that output flow partition.

For example, if you supply sorted input to a Partition component, it produces sorted output partitions.
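The example above can be checked with a short Python model of round-robin partitioning. The function name and the data are hypothetical; the sketch only shows that records keep their relative order inside each output partition.

```python
def partition_round_robin(records, ways):
    """Deal records to partitions in turn, keeping arrival order inside each one."""
    partitions = [[] for _ in range(ways)]
    for index, record in enumerate(records):
        partitions[index % ways].append(record)
    return partitions


sorted_input = [1, 3, 4, 7, 9, 12, 15]
parts = partition_round_robin(sorted_input, 3)

print(parts)                                  # [[1, 7, 15], [3, 9], [4, 12]]
print(all(p == sorted(p) for p in parts))     # True
```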

Exceptions are:

• The components that explicitly reorder records, such as Sort, Sort within Groups, and Partition by Key and Sort.

• The components that have fan-in flows, such as the Departition components. They each define their own record order.

43) What is the logging parameter?

Ans) The transform components and some other components have a logging parameter. This parameter specifies whether or not you want the component to generate log records for certain events. The value of the logging parameter is True or False. The default is False. If you set the logging parameter to True, you must also connect the component's log port to a component that collects the log records.

44) Explain multistage transform components.

Ans) A multistage transform is a Transform Component that modifies records in up to five stages: input selection, temporary initialization, processing, finalization, and output selection.
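The five stages can be pictured with a small Python model. This is not DML and not the actual component interface: the function names, the record fields, and the choice to group by a customer key are all assumptions made for illustration, loosely in the style of an aggregation.

```python
from itertools import groupby
from operator import itemgetter


def input_select(record):          # stage 1: input selection
    return record["amount"] > 0


def initialize():                  # stage 2: temporary initialization
    return {"total": 0, "count": 0}


def process(temp, record):         # stage 3: processing, once per record
    temp["total"] += record["amount"]
    temp["count"] += 1
    return temp


def finalize(key, temp):           # stage 4: finalization, once per group
    return {"customer": key, "total": temp["total"], "count": temp["count"]}


def output_select(out):            # stage 5: output selection
    return out["total"] >= 100


def run_multistage(records):
    """Apply the five stages to records grouped by the hypothetical customer key."""
    selected = sorted((r for r in records if input_select(r)),
                      key=itemgetter("customer"))
    results = []
    for key, group in groupby(selected, key=itemgetter("customer")):
        temp = initialize()
        for record in group:
            temp = process(temp, record)
        out = finalize(key, temp)
        if output_select(out):
            results.append(out)
    return results


records = [
    {"customer": "A", "amount": 60},
    {"customer": "B", "amount": 30},
    {"customer": "A", "amount": 50},
    {"customer": "B", "amount": -5},
    {"customer": "C", "amount": 120},
]
result = run_multistage(records)
print(result)
```

In this run, the negative record is dropped at input selection, customer B is dropped at output selection because its total is below 100, and customers A and C pass through with totals of 110 and 120.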