
Chapter 2. Advanced Datacap capabilities


2.3.3 Rule processing

One of the strengths of Datacap is its ability to perform operations in the background. This section explains how the Datacap workflow and rules are processed with the Document Hierarchy.

Rulerunner engine

To run background tasks, Datacap relies on the Rulerunner engine and an extensive library of rules and actions assembled into rulesets, which are functional blocks that run on the objects in the Document Hierarchy. Actions can be invoked manually, such as when a human operator validates field values.

In most cases, however, they are invoked automatically by the Rulerunner engine, which is set up to monitor a job queue and run tasks automatically as batches move forward through the Datacap process. Figure 2-3 illustrates the process.

Figure 2-3 Rulerunner executing the rulesets specified in each task

In addition to the Document Hierarchy, Datacap has a workflow hierarchy that describes the relationship between a job, task, task profile, ruleset, rule, function, and action. Creating a Datacap application entails defining these two hierarchies and the interplay between them.

Job, task, and task profile

A job is a particular combination and sequence of discrete tasks in the workflow of a given application to address a specific operational scenario. For example, we could set up a “mailroom scan job” with specific tasks:

• Process large scan runs of credit card applications from a production scanner.

• Classify and separate documents with separator sheets.

• Recognize, extract, and verify the data.

• Export the applications to FileNet Content Manager.

We could also set up another job called “MFP scan job” with similar tasks for capturing the credit card documents from an MFP. Its tasks would be modified to receive the documents from the MFP server rather than the scanner, and to classify and separate the documents without separator sheets, because each batch contains only the documents from a single application.

When a task is run, Datacap executes the rulesets that were defined in the corresponding task profile. This profile is a sort of template that Datacap uses as an entry point to invoke a task at run time.


Ruleset

A task profile is made up of several rulesets that are arranged in a particular sequence to produce the desired processing results. They can be thought of as “processing building blocks” that you apply to particular objects in the Document Hierarchy.

For example, a task profile called Extract can be set up to include all the functions to capture data from the batch in one high-level task. However, the capture process within that task must be performed in a logical order. To get good recognition results, you typically need to clean up the images first, so you assemble a ruleset called Enhance that works at page level. It applies image-processing rules to deskew and remove smears and borders, and might adjust contrast on all the images of the batch. You then set up a ruleset called Identify to run at batch level to determine the types of pages and how they should be separated into documents, and to drive recognition. Next, you set up a ruleset called Recognize that runs optical character recognition at page level and populates the fields associated with the pages. Finally, you need a Validate ruleset to apply validation rules at field level against the data that has been extracted.
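The Extract example above can be sketched as an ordered list of rulesets, each tagged with the Document Hierarchy level it works at. This is only an illustrative model in Python: the `Ruleset` class, the `EXTRACT_PROFILE` list, and the `rulesets_for_level` helper are hypothetical names and are not part of any Datacap API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Ruleset:
    """Hypothetical stand-in for a ruleset entry in a task profile."""
    name: str
    level: str  # Document Hierarchy level: "batch", "document", "page", or "field"


# The Extract task profile from the text, in its required logical order.
EXTRACT_PROFILE: List[Ruleset] = [
    Ruleset("Enhance", "page"),     # clean up images first
    Ruleset("Identify", "batch"),   # determine page types and document separation
    Ruleset("Recognize", "page"),   # run OCR and populate fields
    Ruleset("Validate", "field"),   # apply validation rules to extracted data
]


def rulesets_for_level(profile: List[Ruleset], level: str) -> List[str]:
    """Return the names of the rulesets that work at a level, in profile order."""
    return [r.name for r in profile if r.level == level]


if __name__ == "__main__":
    print([r.name for r in EXTRACT_PROFILE])
    print(rulesets_for_level(EXTRACT_PROFILE, "page"))
```

Keeping the profile as an ordered list reflects the point made in the text: the same rulesets arranged in a different sequence would not produce the same results.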

Compiled rulesets

Datacap includes a collection of preassembled rulesets, called compiled rulesets, which are self-contained building blocks of functions that can be easily assembled into an application and configured using FastDoc or Datacap Studio. They add the following benefits:

• Reduce the expertise needed to create applications.

• Reduce application complexity by standardizing how core functions are implemented.

• Reduce the occurrence of nonstandard or poorly designed capabilities.

• Make applications more consistent and easy to understand and support.

Each compiled ruleset is a full implementation of core Datacap functions and comes complete with its own user interface to display configuration parameters and options.

Compiled rulesets support inheritance and automatic binding to objects of the Datacap document hierarchy (batch, document, page, field).

The rulesets, in their uncompiled form, can be copied and edited using Datacap Studio to be customized and extended. Compiled rulesets are available for all major functions of Datacap, such as file import, page identification, image enhancement, data extraction, fingerprint matching, and export. If needed, additional ones can be developed using Datacap Studio and compiled using a Microsoft Visual Studio template project available in the Datacap Technical Mastery community of IBM developerWorks:

http://ibm.co/1MwxWxW

Rule

A ruleset groups one or more rules, or lower-level processing capabilities, that are bound together to the objects in the application’s Document Hierarchy. They are run on demand when Rulerunner opens or closes objects as it walks through the Document Hierarchy at run time.

For example, if they have been selected and configured as part of the application’s rulesets, the rules of the Enhance ruleset are run every time a page is opened to run deskewing and smear removal.

The rules within a ruleset run only when they are mapped to specific objects of the Document Hierarchy. In addition, they run only when the ruleset they belong to is included in the task profile being run. The execution order of rules in a ruleset is dictated first by the order in which the parent ruleset appears in the task profile and second by the processing sequence of the objects in the runtime Document Hierarchy.

Function and action

A rule is made up of one or more functions. A function consists of one or more actions. An action represents the code that runs a particular elemental operation on the objects of a document. Functions are started in the order in which they appear in the rule. If an action fails, the function that called it exits unsuccessfully, and the next function in the sequence is run. If the action succeeds, the next action in the function is run. If all actions of a function run successfully, the rule that called the function exits successfully.

By using this approach, you can construct efficient processing rules without coding. Additional information about actions, including how to create your own custom action, is provided in Chapter 13, “Datacap scripting” on page 295.

For example, in a rule that is used to identify the type of page (a “Page identification” rule), several functions can be assembled in a fallback sequence, ordered by how efficient or processing-intensive they are. Each function implements a specific recognition technology. We can set up the rule to call functions such as these:

• Identify using fingerprint

• Identify using text match

• Identify manually

Manual identification, which merely flags the page for a subsequent user-attended task, is called only after fingerprint and text matching fail. If the fingerprint matching function succeeds (all of the actions in it succeed), the “Page identification” rule exits, and the subsequent functions are not run.
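The fallback behavior of the “Page identification” rule can be sketched in a few lines of Python. This is a simplified model and not Datacap code: the page is represented as a plain dictionary, the three action functions and the `fp-001` fingerprint ID are invented for illustration, and returning `None` when no function succeeds is an assumption, because the text does not describe that case.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Callable[[Dict], bool]        # an action reports success or failure
Function = Tuple[str, List[Action]]    # a named sequence of actions


def run_function(actions: List[Action], page: Dict) -> bool:
    """Run actions in order; all() stops at the first action that fails."""
    return all(action(page) for action in actions)


def run_rule(functions: List[Function], page: Dict) -> Optional[str]:
    """Try each function in order; the rule exits on the first that succeeds."""
    for name, actions in functions:
        if run_function(actions, page):
            return name
    return None  # assumption: no function succeeded


# Hypothetical actions, invented for this sketch.
def match_fingerprint(page: Dict) -> bool:
    known = {"fp-001": "credit_card_application"}  # made-up fingerprint ID
    page_type = known.get(page.get("fingerprint"))
    if page_type is None:
        return False
    page["type"] = page_type
    return True


def match_text(page: Dict) -> bool:
    if "credit card application" in page.get("text", "").lower():
        page["type"] = "credit_card_application"
        return True
    return False


def flag_for_manual_id(page: Dict) -> bool:
    page["needs_manual_id"] = True  # leave the page for a user-attended task
    return True


PAGE_IDENTIFICATION: List[Function] = [
    ("Identify using fingerprint", [match_fingerprint]),
    ("Identify using text match", [match_text]),
    ("Identify manually", [flag_for_manual_id]),
]

if __name__ == "__main__":
    for page in ({"fingerprint": "fp-001"},
                 {"text": "Credit Card Application form"},
                 {}):
        print(run_rule(PAGE_IDENTIFICATION, page), page)
```

Because the manual function always succeeds in this sketch, it acts as the final fallback, which mirrors its role in the rule described above.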

Processing of the Document Hierarchy at run time

When a task is invoked, Datacap recursively processes each object in the runtime Document Hierarchy. It starts at batch level and proceeds to open the first document, then the first page within it, then all of the fields on that first page, and then on to the next page, and so on, as shown in Figure 2-4. It repeats this process with the next document. As it processes each object in this manner, it calls the rulesets that are bound to it. Rulesets can be configured to run on opening or on closing the object.

Figure 2-4 Workflow and document hierarchies and processing sequence
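The processing sequence described above amounts to a depth-first walk of the runtime Document Hierarchy, with rulesets firing as each object is opened or closed. The following Python sketch illustrates that walk under several assumptions: the `Node` and `Binding` classes are hypothetical, a single pass over the batch is modeled, and the choice of which ruleset fires on open or on close is arbitrary and made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical object in the runtime Document Hierarchy."""
    level: str  # "batch", "document", "page", or "field"
    name: str
    children: List["Node"] = field(default_factory=list)


@dataclass(frozen=True)
class Binding:
    """Hypothetical binding of a ruleset to a level and an open/close event."""
    ruleset: str
    level: str
    event: str  # "open" or "close"


def walk(node: Node, bindings: List[Binding], log: List[str]) -> List[str]:
    """Depth-first walk: fire open bindings, visit children, fire close bindings."""
    for b in bindings:  # bindings are kept in task-profile order
        if b.level == node.level and b.event == "open":
            log.append(f"{b.ruleset}:open:{node.name}")
    for child in node.children:
        walk(child, bindings, log)
    for b in bindings:
        if b.level == node.level and b.event == "close":
            log.append(f"{b.ruleset}:close:{node.name}")
    return log


# One batch, one document, two pages; the field names are made up.
BATCH = Node("batch", "B1", [
    Node("document", "D1", [
        Node("page", "P1", [Node("field", "F1"), Node("field", "F2")]),
        Node("page", "P2", [Node("field", "F3")]),
    ]),
])

# Illustrative bindings only; real applications configure these in Datacap Studio.
BINDINGS = [
    Binding("Identify", "batch", "open"),
    Binding("Enhance", "page", "open"),
    Binding("Recognize", "page", "close"),
    Binding("Validate", "field", "open"),
]

if __name__ == "__main__":
    for entry in walk(BATCH, BINDINGS, []):
        print(entry)
```

In this model, a ruleset bound to the closing of a page runs only after every field on that page has been processed, which is the practical difference between the two binding options mentioned in the text.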