Figure 36: Basic Data Flow in BI 7.0
Before we look into the detail of the data flow, let's take a quick look at a very basic flow so that we can understand the journey the data follows.
The basic flow includes a source system connection, a DataSource followed by a Transformation and finally a Data Target (in our case an InfoCube). The data is first loaded by an InfoPackage into the Persistent Staging Area
(PSA) which is associated with the DataSource. Next a Data Transfer Process (DTP) is triggered and loads the data to the Data Target via a Transformation.
The Transformation supplies the mapping rules between the PSA and the target. Here the data could be modified, ignored or merged; the rules are very flexible.
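The three kinds of rule mentioned above (modify, ignore, merge) can be illustrated with a minimal Python sketch. This is not SAP code: real Transformations are maintained in the Data Warehousing Workbench and their routines are written in ABAP. The field names and the `apply_transformation` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

def apply_transformation(psa_rows):
    """Map hypothetical PSA rows to target rows: modify, ignore, merge."""
    totals = defaultdict(float)
    for row in psa_rows:
        # Ignore: skip rows flagged as deleted in the source.
        if row.get("deleted"):
            continue
        # Modify: normalise the material number (trim and upper-case).
        material = row["material"].strip().upper()
        # Merge: rows that share the same key are added together.
        totals[(row["order"], material)] += row["amount"]
    return [
        {"order": order, "material": material, "amount": amount}
        for (order, material), amount in sorted(totals.items())
    ]

psa = [
    {"order": "1001", "material": " m-01", "amount": 10.0},
    {"order": "1001", "material": "M-01", "amount": 5.0},
    {"order": "1002", "material": "m-02", "amount": 7.5, "deleted": True},
]
result = apply_transformation(psa)
```

Here the two rows for order 1001 end up as a single target row, and the row flagged as deleted never reaches the target.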
There are a number of different targets in BI; in our example we are using an InfoCube. Both the InfoPackage and the DTP are usually triggered in sequence by a background job known as a Process Chain. The Process Chain ensures that all job steps are executed in the correct order and at the correct time.
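The idea of a strict sequence can be sketched in a few lines of Python. This is only a conceptual model, not how a Process Chain is built in BI: the `run_process_chain` function, the step functions and the in-memory `psa` and `cube` lists are all assumptions made for the example.

```python
def run_process_chain(steps):
    """Run (name, callable) steps in order and stop at the first failure."""
    log = []
    for name, step in steps:
        succeeded = step()
        log.append((name, succeeded))
        if not succeeded:
            break  # later steps depend on earlier ones, so do not continue
    return log

source = [{"order": "1001", "amount": 15.0}]
psa, cube = [], []

def infopackage_step():
    # Stands in for the InfoPackage: copy source rows into the PSA.
    psa.extend(source)
    return True

def dtp_step():
    # Stands in for the DTP: fail if the PSA is empty, otherwise load the cube.
    if not psa:
        return False
    cube.extend(psa)
    return True

chain_log = run_process_chain([
    ("InfoPackage", infopackage_step),
    ("DTP", dtp_step),
])
```

Because the DTP step runs only after the InfoPackage step has succeeded, the cube is never loaded from an empty PSA.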
Now let's get to the detail. For the extraction of data from ECC and taking it to BI, there are a number of steps that need to be followed:
• First, the data structures need to be created that map on to the ECC database tables and the respective fields from which the data is to come. The data structure is known as the extraction structure and is a BI object which belongs to the DataSource. The DataSource is present on both the ECC system and BI. Initially the DataSource is created in the ECC system and is then automatically created on the BI system via a synchronisation mechanism known as replication.
• Secondly, the data targets, i.e. the places where the data is finally loaded in the BI system, are created. These are InfoCubes etc. Then the Transformation is created between the DataSource and the data targets.
• Then, data can be extracted into the data targets. There will be an initial upload followed by delta uploads from time to time to update the data targets with the most recent data.
The above graphic illustrates the data flow for the logistics data extraction by means of an application example from sales. When a sales order is created, it updates the respective tables in the ECC system, such as VBAK and VBAP. Through extraction structures, this data is taken to the DataSources.
The diagram above illustrates in detail how the data flows from the logistics transaction tables to BI. Notice there is a separate flow for the initial load of data versus the ongoing delta loads.
Note: SAP BI is capable of extracting data in FULL mode or DELTA mode. Delta mode provides only the data that is new or has been changed since the last load. This makes most sense in nearly all logistics scenarios, where large volumes of data can quickly grow and you only want to load the changes.
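The difference between the two modes can be shown with a short Python sketch. It assumes, purely to keep the example small, that each row carries a hypothetical `changed_on` date and that delta rows are found by comparing it with the date of the previous load; the logistics extractors track changes by their own mechanism, so this only illustrates the outcome, not the implementation.

```python
from datetime import date

def extract(source_rows, mode, last_load=None):
    """Return all rows in FULL mode, or only rows changed after last_load in DELTA mode."""
    if mode == "FULL":
        return list(source_rows)
    if mode == "DELTA":
        if last_load is None:
            raise ValueError("DELTA mode needs the date of the previous load")
        return [row for row in source_rows if row["changed_on"] > last_load]
    raise ValueError(f"unknown mode: {mode}")

orders = [
    {"order": "1001", "changed_on": date(2024, 1, 5)},
    {"order": "1002", "changed_on": date(2024, 1, 20)},
    {"order": "1003", "changed_on": date(2024, 2, 2)},
]

full_rows = extract(orders, "FULL")
delta_rows = extract(orders, "DELTA", last_load=date(2024, 1, 15))
```

The full load returns every order, while the delta load returns only the orders changed after the previous load date.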
The DataSource in the ECC system is replicated in the BI system so that the two systems agree on what is being sent and what is being received. The next object in the flow is the transformation rules that feed the data into targets such as InfoCubes. Once the data is loaded into the InfoCubes, you are ready to create BI Queries in order to build reports based on Excel and on the web.
Figure 38: Logistics Data Extraction Cockpit
The extraction structure and DataSource are maintained in the Logistics Extraction Structures Customizing Cockpit in the transaction SBIW. This is a task always carried out in the ECC system. The following functionality can be performed:
• Maintenance of extraction structures
Every extraction structure can be maintained by you and by SAP. The extraction structure is filled with the assigned communication structures. You can only use selected fields from the communication structures. SAP already delivers extraction structures. You can enhance them as well. After you set up the extraction structure, the system automatically generates it. This completes missing fields (associated units and compounded characteristics). The extraction structure is created hierarchically according to the communication structures. Every communication structure leads to the generation of a substructure that belongs to the actual extraction structure.
• Maintenance of DataSources
At this point, you call the general maintenance of DataSources, where you can define which fields are available for selection and whether a field can be negated.
• Activation of updating
When you activate, data is written to the extraction structures immediately - both online and when you fill the tables for restructuring.
• Job control
• Update mode
Here you can set which type of accrued data is to be updated during delta updates.
The various update methods are described in more detail later in the lesson.
Once the DataSource has been defined in the ECC system you must then replicate this to the BI system.
Note: For extraction from SAP ECC to BI using the standard SAP-supplied extractor (rather than flat file etc.), DataSources are always created in the ECC system and never in BI.
To replicate the metadata, go to Data Warehousing Workbench → DataSources. On executing Replicate Metadata, all of the DataSources for this application component from ECC that are not yet present in BI are transferred to BI. Replication of the entire source system can also be carried out but would take a large amount of time, so it is advisable to replicate only the DataSources concerned.
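The effect of replicating a single application component can be sketched as a simple set comparison. This is a conceptual illustration only: the `replicate_metadata` function, the DataSource names and the component labels are hypothetical, and the real replication is started from the Data Warehousing Workbench rather than from code like this.

```python
def replicate_metadata(ecc_datasources, bi_datasources, application_component):
    """Copy the DataSources of one component that BI does not yet know about."""
    missing = sorted(
        name
        for name, component in ecc_datasources.items()
        if component == application_component and name not in bi_datasources
    )
    bi_datasources.update(missing)
    return missing

# Hypothetical DataSource names mapped to hypothetical application components.
ecc = {
    "DS_ORDER_HEADER": "SALES",
    "DS_ORDER_ITEM": "SALES",
    "DS_INVOICE_ITEM": "BILLING",
}
bi = {"DS_ORDER_HEADER"}  # already replicated earlier

transferred = replicate_metadata(ecc, bi, "SALES")
```

Only the one sales DataSource that BI did not yet have is transferred; the billing DataSource is left alone because it belongs to a different component.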
Next the DataSource is assigned to the InfoCube. This is done by the Transformation.
The InfoPackage and the DTP are triggered using the BI Process Chain. Using the Process Chain means that you are able to define a strict sequence in which the loading tasks must be performed. The Process Chain is set to run at a
Figure 39: Executing the data load
The design of the data flow uses metadata objects such as DataSources, Transformations, InfoSources and InfoProviders. Once the data flow is designed, the InfoPackages and the Data Transfer Processes take over to manage the execution and scheduling of the actual data transfer. There are two processes that need to be scheduled.
• The first process is loading the data from the source system. This involves multiple steps that differ depending on which source system is involved.
For example, if it is an SAP source system, a function call must be made to the other system, and an extractor program associated with the DataSource might be initiated. An InfoPackage is the BI object that contains all the settings directing exactly how this data should be uploaded from the source system. The target of the InfoPackage is the DataSource associated with the InfoPackage. In a production environment, the same data in the same source system should only be extracted once, with one InfoPackage; from there, as many data transfer processes as necessary can push this data to as many InfoProviders as necessary.
• The second process identified in the figure is the Data Transfer Process (DTP). It is this object that controls the actual data flow (filters and update mode, delta or full) for a specific transformation. You might have more than one data transfer process if you have more than one transformation step or target in the ETL flow. If you include more than one InfoProvider, you need more than one DTP.
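The "extract once, distribute many times" pattern described in the two points above can be sketched in Python. This is a simplified model under stated assumptions: the `DataTransferProcess` class, its `row_filter` and `update_mode` parameters and the in-memory lists standing in for the PSA and the InfoProviders are all hypothetical, and delta handling is reduced to remembering how many PSA rows each DTP has already processed.

```python
class DataTransferProcess:
    """Hypothetical stand-in for a DTP: one filter, one update mode, one target."""

    def __init__(self, target, row_filter=None, update_mode="DELTA"):
        self.target = target
        self.row_filter = row_filter or (lambda row: True)
        self.update_mode = update_mode
        self._rows_seen = 0  # PSA rows already handled by this DTP

    def run(self, psa):
        # DELTA continues after the rows seen so far; FULL starts from the beginning.
        start = self._rows_seen if self.update_mode == "DELTA" else 0
        rows = [row for row in psa[start:] if self.row_filter(row)]
        self.target.extend(rows)
        self._rows_seen = len(psa)
        return len(rows)

psa = []
all_sales, emea_sales = [], []
dtp_all = DataTransferProcess(all_sales)
dtp_emea = DataTransferProcess(emea_sales, row_filter=lambda r: r["region"] == "EMEA")

# One InfoPackage-style load fills the PSA once.
psa.extend([{"order": "1001", "region": "EMEA"}, {"order": "1002", "region": "APAC"}])
first_run = (dtp_all.run(psa), dtp_emea.run(psa))

# A later load adds new rows; each DTP picks up only what it has not yet seen.
psa.extend([{"order": "1003", "region": "EMEA"}])
second_run = (dtp_all.run(psa), dtp_emea.run(psa))
```

The source is read into the PSA only once per load, while each target has its own DTP with its own filter and its own record of what it has already transferred.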