Editing Extractors - 6.1. Data Governance Edition - Classification Module. User Guide

If your rules are not finding the results you require, the extractors you are using may need refining. Refining extractors can help pinpoint the patterns you want to match and reduce false positives. Extractors may be shared across multiple rules and categories, so when editing them, you should understand where they are in use. If you require similar versions of an extractor, copy the <TextExtractor> element within the XML file, provide a new name and unique ID for it, and then make your adjustments.
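Copying an extractor can be done by hand in a text editor, or scripted. The following is a minimal PowerShell sketch of the copy step only. The sample XML, the attribute names "id" and "name", and all values are assumptions made for illustration; check them against the XML in your own file before adapting the script.

```powershell
# Sketch only: the wrapper element, the attribute names "id" and "name",
# and every value below are hypothetical. Compare with your own XML.
$sample = @'
<Root>
  <TextExtractor id="extractor-1" name="Credit Card">
    <!-- properties and grammars omitted -->
  </TextExtractor>
</Root>
'@
$doc = [xml]$sample

# Find the extractor to copy. local-name() ignores any XML namespace.
$original = $doc.SelectSingleNode('//*[local-name()="TextExtractor"][@id="extractor-1"]')

# Deep-copy the whole element, then give the copy its own name and ID.
$copy = $original.CloneNode($true)
$copy.SetAttribute('id', 'extractor-1-copy')
$copy.SetAttribute('name', 'Credit Card (copy)')

# Place the copy directly below the original extractor.
[void]$original.ParentNode.InsertAfter($copy, $original)

# Show the result: two <TextExtractor> elements with different IDs.
$doc.OuterXml
```

To work on a real file, load it with [xml](Get-Content -Raw $path) instead of the sample string, and write it back with $doc.Save($path) after making your adjustments to the copy.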

To export and edit an extractor, the extractor must be referenced in a rule that is associated with a category. If needed, you can create a dummy rule and category for testing a new extractor you are introducing into your environment. Another way to do this is to work with the original sample extractors included in Quest One Identity Manager Data Governance Edition - Classification Module, and re-import them. Use caution when doing this, however, as you may accidentally overwrite changes that have been made to extractors in use in your environment.

A good starting point for understanding extractors is to view the extractors included in Quest One Identity Manager Data Governance Edition. You can see a variety of implementations in the XML. Typically, the file is located in the C:\Program Files\Quest Software\QCS\Templates folder. Note that for ease of import, the file uses a similar XML structure to a template, but the taxonomy and category tags are just placeholders. You can also export any taxonomy that references these extractors. When you export a taxonomy, the extractor XML is self-contained, and does not use the placeholder tags.
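To get a quick overview of the extractors in one of these XML files, you can list every <TextExtractor> element with its attributes. The following PowerShell sketch uses the standard Select-Xml cmdlet. The file path is a placeholder, and the script makes no assumption about attribute names: it prints whatever attributes each element carries.

```powershell
# Placeholder path: point this at your own template or exported taxonomy file.
$path = 'C:\Temp\taxonomy-export.xml'

# local-name() matches the element regardless of any XML namespace in the file.
Select-Xml -Path $path -XPath '//*[local-name()="TextExtractor"]' |
    ForEach-Object {
        # Print one line per extractor, listing all of its attributes.
        ($_.Node.Attributes | ForEach-Object { "$($_.LocalName)=$($_.Value)" }) -join '; '
    }
```

Each output line corresponds to one extractor, which makes it easier to find the ID that your rules reference.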

The following table outlines the basic XML structure used to define an extractor:

ELEMENT DESCRIPTION

<TextExtractor> Sets the type of extractor (Eduction or RegEx) and the ID. The ID will be referenced in any rule that uses the extractor.

<Property id> There are a number of different properties you can set on an extractor. Generally, these do not need to be edited. For an Eduction extractor, you can only match one entity (the eduction#match property id), and may need to use other entities as building blocks to get the desired results. This is done using a custom grammar (the eduction#grammarxml property id). For a RegEx extractor, the regular expression is defined using the regex#regular-expression property. There are no other elements required for a RegEx extractor.

<grammars> The grammars element allows you to define custom grammars. You can combine grammars in an extractor by adding multiple <grammar> tags. Only Eduction extractors use this tag.
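Before placing a regular expression in the regex#regular-expression property, it can help to try it against sample text. The following PowerShell sketch uses the .NET regex class. The pattern and the sample strings are hypothetical, and the regular expression engine used by the Classification Module may differ from .NET in some details, so treat this as a pre-check rather than a guarantee of how the extractor will behave.

```powershell
# Hypothetical pattern: four groups of four digits separated by the same
# delimiter (hyphen or space). The backreference \1 enforces "same delimiter".
$pattern = '\b\d{4}([- ])\d{4}\1\d{4}\1\d{4}\b'

# Hypothetical sample text, including cases that should not match.
$samples = @(
    'Card: 1234-5678-9012-3456'   # same delimiter throughout: expected to match
    'Card: 1234 5678-9012 3456'   # mixed delimiters: not expected to match
    'Order 1234-5678'             # too short: not expected to match
)

foreach ($text in $samples) {
    # Collect the matched substrings, if any, for this sample.
    $hits = [regex]::Matches($text, $pattern) | ForEach-Object { $_.Value }
    if ($hits) { "$text -> $($hits -join ', ')" } else { "$text -> (no match)" }
}
```

Including samples that should not match is a simple way to check for false positives before the extractor is used in a rule.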

To edit an extractor used in a taxonomy

1. Determine the ID of a taxonomy that has a rule that references the extractor you want to edit.

See Finding a Taxonomy or Category ID using PowerShell on page 35 for details.

If you are introducing a new extractor to your environment but want to edit it before using it in your production environment, you can create an unpublished dummy category or taxonomy, create a rule referencing the new extractor, and associate the rule with the category. Then export the dummy taxonomy.

2. Run the Export-QTaxonomy cmdlet with the following parameters:

a) ServerAddress

Provide the name of the computer hosting the Data Governance server, and the port. Enter it in the form computername:port number. The default port is 8723.

b) TaxonomyId

c) IncludeEntityExtractors

Set this to $true.

d) OutputFile

Provide the path to a file to store the template XML.

If you omit this parameter, the taxonomy is output to the screen.

3. Locate the desired extractor in the XML output.

Extractors are located at the top of the XML file.

4. Edit the XML as desired.

To copy an extractor to make a new version, copy the entire <TextExtractor> element, and paste it below the original extractor. Make sure you provide a new name and a unique ID for the copy.

5. Save the file.

6. To implement the change, run the Import-QTaxonomy cmdlet with the following mandatory parameters:

a) ServerAddress

Provide the name of the computer hosting the Data Governance server, and the port. Enter it in the form computername:port number. The default port is 8723.

b) TemplateXmlFile

Provide the full path and name of the file containing the template.

ELEMENT DESCRIPTION

<grammar name> Names the grammar you are using. This allows you to share entities between extractors. Entities have full path names that include the grammar name. For example, a grammar named “number” may contain a custom entity called “cc/delim”, which details a delimited credit card. The full name of the entity, if referenced from another extractor, is “number/cc/delim”. Within the same extractor, the full path is not necessary.

<entity name> Names the entity that is being built. You can use entities as building blocks for the entity that is referenced in the eduction#match property id. You can reference entities from other grammars by using the full path. You may want to edit the entities included in an extractor.

<pattern> Identifies the exact patterns that make up the entity. You can use multiple patterns within an entity. You can use patterns that are included with Quest One Identity Manager Data Governance Edition - Classification Module, regular expressions, or a combination. You can also use other entities: this is how you combine patterns and grammars into the single entity that you reference from your eduction#match property.
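The export and import steps of the procedure for editing an extractor can be sketched in PowerShell as follows. The cmdlet and parameter names are the ones listed in steps 2 and 6. The server name, taxonomy ID, and file path are placeholders, and the sketch assumes that the Classification Module cmdlets are already available in your PowerShell session.

```powershell
# Placeholder values: substitute your own server, taxonomy ID, and file path.
$server     = 'dgserver01:8723'              # computername:port (8723 is the default port)
$taxonomyId = '<taxonomy-id>'                # the ID determined in step 1
$file       = 'C:\Temp\taxonomy-export.xml'  # hypothetical working file

# Step 2: export the taxonomy together with the extractors it references.
Export-QTaxonomy -ServerAddress $server -TaxonomyId $taxonomyId `
    -IncludeEntityExtractors:$true -OutputFile $file

# Keep an untouched copy of the export before editing (optional precaution).
Copy-Item -Path $file -Destination "$file.bak"

# Steps 3 to 5: edit the <TextExtractor> elements in $file and save the file.

# Step 6: import the edited template to implement the change.
Import-QTaxonomy -ServerAddress $server -TemplateXmlFile $file
```

Keeping a copy of the original export is not part of the documented procedure, but it gives you a way to restore the previous extractor definitions if an edit produces unexpected results.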

