
Pervasive Data Parser for Unstructured Text Online Help - Table of Contents

Pervasive DataTools

Data Parser for Unstructured Text User's Guide

Pervasive Software Inc.
12365 Riata Trace Parkway, Building B
Austin, TX 78727 USA

About This Manual

This manual is currently a work in progress and therefore is incomplete. Documentation for the Data Parser for Unstructured Text is also available by clicking on the Help button on the right end of the button bar whenever the product is running.

This manual leads you through the operation of the Data Parser for Unstructured Text user interface. The Pervasive Data Parser for Unstructured Text allows you to extract useful data from report files and convert that data to a CSV Text file. You must have a non-expired Data Parser for Unstructured Text license to run this application.

Refer to the license.txt file in the default installation directory for disclaimers and information about trademarks and credits.

Table of Contents

Getting Started with Data Parser for Unstructured Text

• Introduction to Data Parser for Unstructured Text
• Data Extraction Basics
• Feature Segmentation

Tutorials

• About the Tutorials
• Tutorial 1 - The Basics
• Tutorial 2 - Tagged Data and Automatic Features
• Tutorial 3 - Columnar Data
• Tutorial 4 - Floating Tags
• Tutorial 5 - Columnar Data with a Footer
• Tutorial 6 - Variable Length Multi-Line Data Fields
• Tutorial 7 - Multiple Accept Records

Using Data Parser for Unstructured Text

• Introduction to Basic Elements
• Some Helpful Tips
• Open a Text File or URI
• Extract Tuning Tips

All About Line Styles

• Defining Line Styles
• Recognition Rules
• Suggested Approach - Defining Line Styles

All About Data Fields

• Defining Data Fields

Viewing the Extracted Data

• Internal Data Browser
• External Data Browser

Exporting the Extracted Data

• Exporting the Data

Saving and Reusing Extract Scripts

• How to Save an Extract Script
• How to ReUse a Saved Extract Script

Reference - User Interface

• Tool Bar Buttons
• Extract Script Manager Window
• Extract Script Designer
• Source Options Window
• Debug Extract Design Window
• ACCEPT Record Definition Window
• Accept Record Reorder
• Record Browser Window
• Multi-Record Browser
• Pattern Builder Window
• Line Order for Extract Window
• All Fields Window
• Edit Fields Window
• Export Field Order Window
• Find Text
• Pop-up Menu - Line Style Column
• Pop-up Menu - Data Panel
• Line Style Definition Window
• Data Field Definition Window

Appendix

• How to Create a Report File
• URI Support

Introduction to Data Parser for Unstructured Text

The Data Parser for Unstructured Text is a software product with the ability to read complex text files of many kinds. The amount of computer data grows vastly each year, and much of it is provided in raw text formats. Some examples of the many sources handled by the Data Parser for Unstructured Text follow:

• Printouts from programs captured as disk files
• Reports of any size or dimension
• ASCII or any type of EBCDIC text files
• Spooled print files
• Fixed length sequential files
• Complex multi-line files
• Downloaded text files (e.g., news retrieval, financial, real estate)
• HTML and other structured documents
• Internet text downloads
• Email header and body
• Online textual databases
• CD-ROM textbases
• Files with tagged data fields
• XML
• HL7
• Swift
• And many others...

Using Data Parser for Unstructured Text, you can extract the desired data fields from various lines in the text file, and assemble those fields into a flat record of data. Thus, whole records of structured data can be extracted and presented in a conventional tabular (row and column) format that is needed before mapping and converting the data to a popular target format. Some of the features that make the Data Parser for Unstructured Text so complete are:

• No practical limits on file size
• Reads almost any kind of report architecture as long as there are rules
• Support for large fields and records
• Handles floating headers, footers and details
• Can automatically detect and propose recognition patterns
• Handles tagged data fields
• Autoparses columnar and tagged data
• Powerful debugging tools
• Structured data browser to see results prior to export
• Built on an extensible, extremely rich scripting language

The extraction of desired fields from the source text file is accomplished by visually marking up the file in the Data Parser for Unstructured Text user interface. The mouse is employed to select the desired fields from various lines displayed on the screen. Dialog boxes on the screen allow you to express a rich set of pattern recognition rules and actions to assist in the extraction of clean data.

Several techniques are available to view samples of extracted data. Apart from scrolling the full text of the data, a debug window can be used to search for all lines satisfying certain extraction criteria. For details, see **Debug Extract Design Window**. In addition, users can pop up a data browser that assembles all the fields and records in a grid format to give the user an idea of how the data will export. For details, see **Record Browser Window**.

Data Extraction Basics

The Data Extractor is a tool for extracting data that would otherwise be inaccessible. Consider these scenarios:

• Your company is attempting to migrate several years’ worth of data from a legacy application. The data files for this application are stored in an unknown proprietary format, possibly with compressed or encrypted fields. Although the data cannot be accessed directly, your legacy application can generate reports.
• Your agency needs to merge data from several disparate sources into a single, easily accessible format. For example, you receive listings of real-estate properties from several different electronic sources that you want to combine into one standard listing format for your web site.
• One of your clients needs to extract specific data from many large log files and aggregate that data into a database for statistical analysis.

In each of these scenarios, the Data Extractor can extract valuable data from standard formatted text files that also contain a lot of irrelevant information, such as headers and comments.

The Data Extractor exports the extracted data to CSV (Comma Separated Values) Text file format. If you want to convert the data to another format, or you want to manipulate the data further after you have extracted it, the Data Loaders can accomplish this. The Data Loaders support over 100 different file types, allowing you to convert your data to the vast majority of databases used throughout the world.

To Use Data Extractor

First you need to have a report file. Most applications on nearly every type of platform give you the option of creating and printing reports. Have the program print the report in a text only format, either ASCII or any standard EBCDIC code page. For more information, see How to Create a Report File.
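If a report was produced on a system that uses EBCDIC, its bytes differ from ASCII even when the visible text is identical. The following Python sketch is not part of the product; it only illustrates what code page translation means. It assumes code page 037 (one common EBCDIC variant) and uses a made-up sample line.

```python
# Illustration only: decode an EBCDIC report line to ordinary text.
# Assumption: the report uses code page 037; the sample line is made up.

def decode_ebcdic(raw: bytes, code_page: str = "cp037") -> str:
    """Translate EBCDIC bytes into a Python string."""
    return raw.decode(code_page)

sample_text = "Problem No: 1001"
ebcdic_bytes = sample_text.encode("cp037")   # what the bytes on disk might look like

# The same text has different byte values in EBCDIC and ASCII.
print(ebcdic_bytes != sample_text.encode("ascii"))
print(decode_ebcdic(ebcdic_bytes))
```

If the decoded text looks garbled, the file probably uses a different code page than the one assumed here.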

1. Start a new script in the Data Extractor and select the report file.

2. Look at the report in the Data Extractor. Notice the overall pattern of the report, where it repeats, the page layout, and the style used to organize information. Locate the data that you want to extract.

3. Input the structural information. The Data Extractor needs patterns and structural rules to identify important data.

a. Define line styles by marking which lines have important information and how they can be recognized.

b. Define data fields by marking the data that you want collected, and where it can be found.

c. Specify line actions. While you are defining line styles and data fields, select options that specify how you want the data to be assembled into records and fields. The default action is to collect the fields. You must find the end of the first record, or the beginning of the second, and change the action for that line to Accept Record. This stops the collection process for the first record and begins the collection process for the second, thus setting exactly which fields are included in the eventual output for that record. If you want to define more than one type of record in a single report file, you can do that by defining more than one Accept Record line style.

d. Assign the fields to each record type, according to how you want the data to be exported.

4. Browse your data. Once you have entered all the information Data Extractor needs to find your data, and specified how you want it structured, the Data Extractor automatically builds that structure internally. You can open the data browser and see it in a grid. If the fields or records are not structured the way you want them, go back and adjust the data field and/or line style definitions.

5. Finally, save the script. By saving your script, you can use it again if you need to extract data from a report with the same style in the future.

Additional details about each of these steps are described in this documentation.
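The collect-and-accept flow described in step 3 can be pictured with a short Python sketch. This is not the product's engine or scripting language. The line style list, the sample report lines, and the function names are all hypothetical. The sketch assumes each recognised line carries one field, and that the Accept line's own field is included in the record.

```python
import csv
import io

# Illustration only: a tiny collect/accept loop, not the product's engine.
# Each hypothetical line style has a name, a tag to recognise, and an action.
LINE_STYLES = [
    ("Problem_No", "Problem No:", "COLLECT"),
    ("Techie", "Techie:", "COLLECT"),
    ("Category", "Category:", "ACCEPT"),   # last line of each record
]

def extract_records(lines):
    """Collect one field per recognised line; emit a record on ACCEPT."""
    records, current = [], {}
    for line in lines:
        for name, tag, action in LINE_STYLES:
            if tag in line:
                current[name] = line.split(tag, 1)[1].strip()
                if action == "ACCEPT":
                    records.append(current)
                    current = {}
                break
    return records

def to_csv(records):
    """Write the flat records as CSV text, one row per accepted record."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[s[0] for s in LINE_STYLES])
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

# Made-up report lines for the sketch.
report = [
    "WINTECH",                       # header line: matches no style, ignored
    "Problem No: 1",
    "Techie: John",
    "Category: Install",
    "Problem No: 2",
    "Techie: Mary",
    "Category: Export",
]
print(to_csv(extract_records(report)))
```

Lines that match no line style are skipped, which is how headers and comments drop out of the export.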

Feature Segmentation

The following list presents some of the features available in the Data Extractor:

• EBCDIC code page translation
• Recognizes special characters and invisible characters
• Multiple record Accepts
• Mailing Label Template autoparse
• Validates scripts automatically
• View source data in external applications
• Grid lines on Data Panel
• Extract/Mine data from irregular text files
• Auto New Line Style menu option
• Auto New Data Field menu option

Some additional automatic menu options:

• Parse Columnar Data w/ Heading
• Parse on Field Separator
• Parse Tagged Data
• Parse Standards Data
• Parse XML/HTML Data
• Parse HL7 Data
• Parse Swift Data
• Parse LDIF Data
• Parse EDI Data

About the Tutorials

Seven step-by-step tutorials are available to help you learn how to use the Data Extractor. We recommend that you complete the tutorials in the order in which they appear, as each tutorial builds upon the concepts covered in the previous tutorial.

Common Tasks

The Data Extractor tutorials all have several tasks in common. Those tasks are described here, and you may refer to them as needed.

• Select Correct Tutorial File and Set Basic Options
• Browse Data Records
• Rearrange Data Fields
• Save and Close Extract Design

Select Correct Tutorial File and Set Basic Options

Before starting each tutorial, you must select the matching tutorial file and set some basic options and file properties. Use the following Data Extractor tutorial files with the tutorials:

• Tutorial 1 - TUTOR1.REP
• Tutorial 2 - TUTOR1.REP
• Tutorial 3 - TUTOR3.REP
• Tutorial 4 - TUTOR4.REP
• Tutorial 5 - TUTOR5.REP
• Tutorial 6 - TUTOR6.REP
• Tutorial 7 - TUTOR7.TXT

To select a tutorial file and set basic options, do the following:

1. In the Data Extractor, click New Extract.

2. In the Select the Text File window, navigate to the desired tutorial file in your default installation directory (Common800).

3. Click Open. The report opens in the Data Extractor Data Panel.

4. Open the Source Options window.

5. In the Source Options window, select options that match the type and format of your text/report file.

6. Close the Source Options window.

7. Select Preferences from the menu and make sure "Close Definition Dialogs on Add/Update" is enabled.

Browse Data Records

Browse the data when you want to determine how your design choices have affected the data. To browse the data records:

1. Click Browse Data Record in the toolbar. If there is only one Accept record, a message window appears saying something similar to "Fields Assigned to Accept Record Category". If there are multiple Accept Records, you will be prompted to assign specific data fields to each Accept Record.

2. Click OK. All of the Data Fields appear in the Data Browser window for you to preview your data in a tabular (row and column) format and to verify you have defined everything correctly. If you wish, you may rearrange the data fields.

Rearrange Data Fields

If the fields in the Data Browser window are not in the order you want them to appear in the export data file, change the export field order. To rearrange data fields:

1. Select Field > Export Field Layout from the menu.

2. Click and drag a field name to the desired position. A special symbol displays while you are dragging.

3. Reopen the Data Browser window to view your export fields in the order they will appear in the export data file.

4. Once you are satisfied with the appearance of the data, save and close your extract script design.

Save and Close Extract Design

After you have completed your Extractor script, save and close it for later use. To save your script and close Data Extractor:

1. Save your script by clicking Save Extract in the toolbar. The Save Extract window appears. New extract scripts that have not been previously saved in Data Extractor display as "Extract: Extract1" in the title bar. Note: The extract file name defaults to your workspace; however, the extension has been changed to .cxl. This is consistent with standard naming for Data Extractor scripts. You can change the Extract File Name, but the extension must remain .cxl.

2. Navigate to your default installation directory (Common 800) and name the extract script file (for example, Tutor1.cxl).

3. Enter a description of the tutorial, if desired, and click OK.

4. Click the Close Extract icon in the toolbar.

5. Exit Data Extractor by selecting File > Exit.

The Data Extractor Tutorials

The following is a list and brief description of each of the Data Extractor tutorials.

• Tutorial 1 - The Basics
• Tutorial 2 - Tagged Data and Automatic Features
• Tutorial 3 - Columnar Data
• Tutorial 4 - Floating Tags
• Tutorial 5 - Columnar Data with a Footer
• Tutorial 6 - Variable Length Multi-Line Data Fields
• Tutorial 7 - Multiple Accept Records

Data Parser for Unstructured Text - Tutorial 1 - The Basics

Tutorial 1 guides you through the basic steps to create and save a script file in Data Parser for Unstructured Text. Later tutorials are more detailed.

This tutorial presents the fundamental concepts for using the Data Parser for Unstructured Text. It is recommended that you do this tutorial first. The example file is a tagged list, but the procedure is useful regardless of the type of report. The best way to use this tutorial is to print a hard copy so you can follow the sequential steps.

Tutorial Goals

In this tutorial, you will learn:

• The basic process of creating an extract script
• How to save the script design
• New terms located throughout the documentation

Procedure

This tutorial is divided into three sections that should be completed in the order shown.

Define the Line Style - Accept Record

After selecting the tutorial file and setting up basic options, the first step in defining many extract scripts is to determine the line of data that marks the end of a record. In this case, the line with the string "Category:" is the last line of the first record.

After you identify the end of the record, define the line style for that line by marking the information that makes that line unique. In every record in this data file, the last line contains the string "Category:".

1. Highlight the string "Category:" (including the colon following it).

2. Right-click anywhere in the Data Panel (the large white area of the screen) and select Define Line Style > New Line Style. The Line Style Definition window appears. Notice Data Parser has already formed line recognition rules based on the information you highlighted. It searches for all lines that contain the string "Category:" in columns 15 through 23.

3. To indicate that Data Parser should accept the record at this point, ending one record and beginning the next, click the Line Action tab.

4. Select ACCEPT Record.

5. Click Add and proceed to Define the Line Style - Collect Fields. The line style name, Category, now appears in the Line Style Column (the yellow column on the left of your screen) to mark that line as matching the Category: Line Style pattern. A bold green arrow displays, designating that this is the Accept Record line. Scroll down in the Data Panel and notice that each line that matches the pattern you defined was automatically marked with the "Category" Line Style.
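The recognition rule from step 2 (a string that must appear within a column range) can be pictured as a simple check. The Python sketch below is only an illustration of that idea, not the product's rule engine. The function name and sample lines are made up, and columns are treated as 1-based and inclusive, matching the way the tutorial counts them.

```python
# Illustration only: a "column range contains" recognition rule.
# Columns are 1-based and inclusive, as in the tutorial text.

def matches(line: str, text: str, begin: int, end: int) -> bool:
    """True if `text` occurs entirely within columns begin..end of `line`."""
    return text in line[begin - 1:end]

# Made-up sample line with "Category:" placed in columns 15 through 23.
sample = " " * 14 + "Category: Install"
print(matches(sample, "Category:", 15, 23))               # True
print(matches("Category: Install", "Category:", 15, 23))  # False: wrong columns
```

Because the rule is tied to columns, the same string elsewhere on a line does not trigger the line style.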

Define the Line Style - Collect Fields

In the TUTOR1 file, the first line of text that contains pertinent data is the line with the report date "13-Jul-95" (10th line). The dashes (and their positions) in this line make it unique and are likely to remain consistent even if the date changes in later reports.

1. Highlight the first dash.

2. Right-click in the Data Panel and select Define Line Style > New Line Style. The Line Style Definition window appears. Notice that a pattern was created based on what you highlighted. Data Parser looks for any line that contains a dash in column 13.

3. Type a more descriptive Line Style Name, such as "Report_Date".

4. Click the Line Action tab and leave the option set to COLLECT Field Contents. The COLLECT Field Contents option causes any fields defined on this line to be included in the final output. COLLECT Field Contents is the action you want for the majority of the lines in this type of report.

5. Click Add.

6. Locate and highlight the string "Problem No:".

7. Right-click in the Data Panel and select Define Line Style > New Line Style. The Line Style Definition window appears. Notice that Data Parser generated a Line Style recognition pattern based on the highlighted string "Problem No:". Data Parser also used the string "Problem_No:" to name the Line Style. You may rename the Line Style if you wish. Data Parser automatically selects COLLECT Field Contents as the line action. Since this is the option you want on most of the lines in this report, accept the default.


9. Click Add.

10. Repeat steps 6 through 9 for each of the remaining lines in the first record. Remember, the "Category" Line Style has already been defined, and it is the Accept Record line.

11. Proceed to Define Data Fields.

Define Data Fields

After defining line styles for 14 lines of the first record, define the Data Fields. You have given Data Parser the pattern information it needs to identify the lines in the report. Now define what part of each line you consider to be useful data.

1. Locate the line containing the date of the report. The line shows only the report date, so all of the text on that line is important.

2. Highlight the entire date. The highlighted text is 1 row by 9 columns. The column and row numbers show at the bottom right part of the screen. Columns 11 through 19 on row 10 contain the date.

3. Right-click in the Data Panel and select Define Data Field > New Data Field. The Field Definition window appears. Notice the Field Definition option is set to Fixed Column in both the Start Rule and End Rule tabs. The Data Field starts in column 11 and ends in column 19, exactly where you highlighted.

4. The Field Name defaults to "Report_Date_1" indicating that this is the first field on the Report_Date line. Change the default name to a more descriptive name, Report_Date, by typing it in the Field Name box.

5. Click Add.

6. Define the remaining Data Fields:

a. On the Problem_No line, highlight from column 25 to 30. This grabs enough space to include any larger numbers that might occur in later records.

b. Right-click in the Data Panel and select Define Data Field > New Data Field.

c. The default Field Name is Problem_No_1. Problem_No is a descriptive name, but there is only one field on this line so the "_1" is unnecessary.

d. Click in the Field Name box and backspace twice to delete the number and underscore. Notice the Field Definition defaults to Fixed Column in both the Start Rule and End Rule tabs, starting in column 25 and ending in column 30.

e. Click Add.

7. Repeat step 6 for each remaining line of text on page 1 in TUTOR1.REP containing tagged data. See Table 3-2 below.

8. Proceed to Browse Data Record in order to see how your data has changed.

9. Rearrange Data Fields as needed.

10. Save and Close your script.
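A Fixed Column data field is just a start and end column applied to every line that matches the line style. The Python sketch below illustrates this with two of the column ranges from Table 3-2. It is not product code: the function name and sample lines are made up, and trimming the padding is an assumption about how the value would be cleaned.

```python
# Illustration only: fixed-column data fields as (name, start, end) triples.
# Columns are 1-based and inclusive; the ranges come from Table 3-2.
FIELDS = [
    ("Problem_No", 25, 30),
    ("Techie", 25, 52),
]

def extract_field(line: str, start: int, end: int) -> str:
    """Return the text in columns start..end, trimmed of padding."""
    return line[start - 1:end].strip()

# Made-up line: the tag ends in column 23 and the data starts in column 25.
line = "Problem No:".rjust(23) + " " + "1001"
name, start, end = FIELDS[0]
print(name, "=", extract_field(line, start, end))

# A wider value in a later record is cut off at the end column.
print(extract_field(" " * 24 + "123456789", 25, 30))
```

The second call shows why the tutorial asks you to highlight extra space: anything beyond the end column is not collected.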

Table 3-2: Tutorial 1 - Data Field Start and End Rules

Data Field       Starting Column   Ending Column
Report_Date      11                19
Problem_No       25                30
Techie           25                52
Status           25                52
MMDDYY           25                32
Time             25                32
Serial_No        25                39
Version          25                52
Customer_Name    25                52
Company_Name     25                52
Phone_No         25                52
Source_Type      25                52
Target_Type      25                52
Category         25                52

Data Parser for Unstructured Text - Tutorial 2 - Tagged Data and Automatic Features

Tutorial 2 guides you through the steps to create and save a script file using Data Parser’s automatic processes. The source file for this tutorial is the same tagged list used in Data Parser for Unstructured Text Tutorial 1.

This tutorial introduces some of the useful timesaving features of Data Parser that read and flatten a data file that contains tagged data. It is useful to anyone ready to learn about more advanced Data Parser features. Tutorial 2 examines some quicker, more automatic ways to parse the same tagged data used in Data Parser Tutorial 1.

Things to remember when defining Data Fields and Line Styles in tagged data:

• When Data Parser automatically creates Data Fields, it uses the positions you have highlighted to determine the length of the Data Field. Be sure to allocate enough space for data in subsequent records that is wider than the text you are currently selecting. For example, the Techie Name in the first record is "John". In a subsequent record it could be "Alexander Graham Bell IV".
• For tagged data, everything in the selection to the left of the Tag Separator is the Field Tag and everything to the right of the Tag Separator is the Data Field.
• When a Line Style is created, it is not just for the line you are working on but also for any line that matches the Line Style definition. This means that when you create a Line Style that looks for "Techie:" in columns 17 to 24, and there is a Data Field defined for that Line Style in columns 26 to 55, all lines that have "Techie:" in columns 17 to 24 have a Data Field in columns 26 to 55.

Tutorial Goals

In this tutorial, you will learn:

• How to create an extract script using automatic processes
• How to save the extract design as a script file
• New terms used throughout the Data Parser documentation

Procedure

These steps should be completed in the order shown.

Define Data Fields

After selecting the tutorial file and setting up basic options, the first step in defining most extract scripts is to determine the line of data that marks the end of a record. In the TUTOR1 data file, the line of text that contains "Category:" marks the end of each record.

1. Highlight the line that contains the string "Category:", up to column 45. Check the indicator in the lower right corner of the screen for column locations.

2. Right-click anywhere in the Data Panel (the large white area of the screen) and select Define Data Field > Parse Tagged Data. Note: Data Parser automatically defines a Line Style with the string "Category:" in columns 15 through 23 as the recognition pattern, and a Line Action of Collect Fields, and names it "Category". It also creates a Data Field that collects any data on that line beginning at column 24, one space after the colon, and going to column 45, and names it "Category". The field is now defined, and the text turns red on the screen.

3. If you wish to check the Data Field definition, you can double-click on the field itself (the red text) and the Field Definition window opens. Make any necessary changes, then click Update.

4. Proceed to Define the Line Style - Accept Record.
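The split that Parse Tagged Data performs in step 2 (tag to the left of the separator, data to the right) can be sketched in a few lines of Python. This is an illustration only. It assumes the Tag Separator is a colon, as in the tutorial file, and the function name and sample values are made up.

```python
# Illustration only: split a tagged selection at the tag separator.
# Assumption: the separator is a colon, as in the tutorial file.

def parse_tagged(selection: str, separator: str = ":"):
    """Return (field_tag, data) split at the first separator."""
    tag, sep, data = selection.partition(separator)
    if not sep:
        raise ValueError("no tag separator in selection")
    return tag.strip(), data.strip()

print(parse_tagged("Category: Install"))
print(parse_tagged("Time: 10:45"))   # only the first colon separates tag and data
```

Splitting at the first separator only keeps values that themselves contain a colon, such as times, intact.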

Define the Line Style - Accept Record

Since the Category Line Style is the last line of the record, the Line Action should be Accept Record. When Data Parser creates a line style automatically, it makes the line style Collect Fields, so the line action needs to be changed.

1. Double-click on the Line Style Name, "Category" in this case, in the Line Style Column, the yellow column on the left part of your screen. The Line Style Definition window appears.

2. Click on the Line Action tab and select ACCEPT Record [Including] This Line’s Fields from the list of choices.

3. Click Update.

4. View the data record by clicking on the Browse Data Record button in the button bar.

5. Proceed to Adjust Data Field Definition.

Adjust Data Field Definition

1. Select the entire Problem No line by left clicking on that line in the Line Style Column (the left yellow column).

2. Right-click in the Data Panel (the large white part on the right) and select Define Data Field > Parse Tagged Data. The Line Style pattern that Data Parser automatically creates looks for Problem No: in positions 13 through 23.

3. Double-click on Problem_No if you want to check it.

4. Click Close to close the Line Style Definition window.

5. To display the Field Definition window to view the information for the Problem_No: Data Field that was automatically generated, double-click anywhere in the Data Field where the text is red.

6. Click the End Rule tab. Notice that the end rule is 52. This is larger than the Problem No: Data Field needs to be, because it is defining the size of the Data Field all the way to the right margin of the report.

7. Change the end rule of the Problem_No: field to 30.

8. Click Update. Notice the selected area on the Data Panel for the Problem_No: Data Field is much smaller after the update.

9. Proceed to Define the Header Information.

Define the Header Information

For this exercise, assume that the first line of the report contains information you want.

1. Highlight the report name WINTECH on line 8 in positions 11 through 17.

2. Right-click in the Data panel, and select Define Data Field > New Data Field. The Field Definition window appears.

3. The default Data Field Name is highlighted. Since there is no tag on this line, Data Parser used the data itself as the Line Style name and Data Field name. Change the field name to ReportName by typing it in the Field Name box.

4. Click Add.

5. To define the report date Data Field, repeat steps 1 through 4, except highlight from columns 11 to 19 and name the field ReportDate.

6. Proceed to Update Line Style.

Update Line Style

The purpose of this exercise is to update the automatically generated "Jul95" Line Style to make it more generic for different report dates.
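To make the goal concrete, the Python sketch below contrasts the literal pattern with the more generic rule that the following steps build: a dash in column 13 and a dash in column 17. It is an illustration only. The function names and the second sample date are made up, and columns are 1-based as in the tutorial.

```python
# Illustration only: literal pattern versus a generic two-rule pattern.
# Columns are 1-based; the sample lines are made up for the sketch.

def char_at(line: str, column: int) -> str:
    """Character in a 1-based column, or '' if the line is too short."""
    return line[column - 1:column]

def literal_rule(line: str) -> bool:
    # Matches only the one date the style was generated from (columns 11-19).
    return line[10:19] == "13-Jul-95"

def generic_rule(line: str) -> bool:
    # Dash in column 13 AND dash in column 17 (dd-mmm-yy starting in column 11).
    return char_at(line, 13) == "-" and char_at(line, 17) == "-"

july = " " * 10 + "13-Jul-95"
august = " " * 10 + "02-Aug-95"
print(literal_rule(july), literal_rule(august))   # True False
print(generic_rule(july), generic_rule(august))   # True True
```

The generic rule keeps working when the report is rerun on another date, which is the point of the exercise.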

1. To edit the "Jul95" Line Style, double-click on Jul95 in the Line Style Column. The Line Style Definition window displays. Notice that the Pattern for this Line Style looks for 13-Jul-95 in columns 11 to 19.

2. Size the cells in the grid to view the information better, by following these steps:

a. Position the mouse over the line in the header row of the grid where the column headings are. The mouse pointer becomes a bold vertical bar with arrows pointing to the left and right.

b. Hold down the mouse button and drag the edge of the column to the left or right.

c. Release the mouse button when the column is the desired size.

d. If desired, adjust the height the same way using the gray border to the left where the triangle and asterisk are located.

3. To change the pattern to look for a line with any date with the dd-mmm-yy format, click once in the Look For? cell on the first row of the grid where 13-Jul-95 is currently displayed. A down arrow appears on the right side of that cell.

4. Click on that arrow and the Pattern Builder window appears.

5. TAB to the Value cell, delete the original value, and type a dash (-).

6. Change the values of both the Begin and End cells to 13 by tabbing to them and typing in the correct number.

7. Click OK. Notice that the Look For?, Begin, and End values have changed in the Line Style Definition window to reflect the changes made in the Pattern Builder window.

8. Add a new row to the Line Style Definition grid by clicking in the And/Or cell in the second row. Accept the default value of And.

9. Click in the Search What? cell of the second row and click the down arrow.

10. Select Column Range (m-n) from the displayed list.

11. Select Contains from the list displayed in the Operator cell of the second row.

12. Click on the arrow in the Look For? cell of the second row to display the Pattern Builder window again.

13. TAB to the Value cell, delete the original value, and type in a dash (-).

14. Change the Begin and End values to 17. Be careful to only enter a dash in the Value cell and do not leave any spaces around it.

15. Click OK. The line style definition should now match any line with a dash in positions 13 and 17.

16. Click Update to save the changes to the ReportDate Line Style.

17. Proceed to Define Remaining Data Fields and Line Styles.

Define Remaining Data Fields and Line Styles

In this exercise, you will Define Data Fields and Line Styles for the Techie, Status, MM/DD/YY, Time, Ser #, Version, Customer Name, Company Name, Phone #, Source Type, and Target Type Tagged Data Fields.

1. Highlight the Field Tag, the Tag Separator, and the data by dragging the mouse with the left mouse button depressed from the beginning of the Tag to the end of the Data Field. Remember to extend out to the right to catch wider data in subsequent records.

2. Right-click in the Data Panel and select Define Data Field > Parse Tagged Data. Data Parser creates a Line Style Definition and a Data Field Definition for you.

OR

3. Click the line in the Line Style column to select it.

4. Select Parse Tagged Data.

5. Open the Field Definition window.

6. Adjust settings.

7. Click Update and Close.

Note: Data Parser named the MM/DD/YY, Ser #, and Phone # Data Fields and corresponding Line Styles MMDDYY, Ser, and Phone. Also, Data Fields with embedded spaces are named with the spaces removed. This was done because Field Names can only contain letters, digits and underscores. Scroll down in the Data Panel and see how the rest of the data is being defined.

8. Browse the data records to see how your data has changed.

9. If desired, rearrange the data fields as needed to meet your export file requirements.

10. Save and close your script.
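The naming rule in the note above (Field Names can only contain letters, digits and underscores) can be approximated by dropping every other character. The Python sketch below is only an approximation of that rule. The product's actual naming logic is not documented here and may differ, and the function name is made up.

```python
import re

# Illustration only: drop every character that is not a letter, digit,
# or underscore. The product's real naming logic may differ.

def sanitize_name(tag: str) -> str:
    return re.sub(r"[^A-Za-z0-9_]", "", tag)

for tag in ["MM/DD/YY", "Ser #", "Phone #", "Customer Name"]:
    print(tag, "->", sanitize_name(tag))
```

For the three tags named in the note, this sketch produces the same names the note reports: MMDDYY, Ser, and Phone.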

Tip: This file can be parsed even more automatically. If you wish to try it, follow these steps:

1. Click the Clear Line Styles icon in the button bar.

2. Highlight all the tagged data lines in the entire first record, beginning with the Problem No line and highlighting all the way down to and including the Category line. Be sure to catch all the field tags and data plus some extra space to the right.

3. Right-click in the Data Panel and select Define Data Field > Parse Tagged Data. The Data Parser creates several new line styles and data fields at once. This method only works in cases of highly structured and consistent data, but it can be a great time saver when conditions are ideal.

Data Parser for Unstructured Text - Tutorial 3 - Columnar Data

Tutorial 3 guides you through the steps to create and save a script file in Data Parser for Unstructured Text that reads and flattens a report containing columnar data. In Tutorial 3, you convert the data in a columnar report file to a flattened format, using the more automatic features of Data Parser.

This tutorial introduces more of the time-saving features of Extract Schema Designer. Since a great many report formats contain columnar data of some kind, it is highly useful to anyone who wants to use Data Parser.

By following the steps outlined below, you become familiar with both the process of creating an extract script and the terms used throughout the documentation. Unlike the previous tutorials, this file has multiple Accept Record line styles in a single page of the report. The primary data record information is in the table detail lines. Each line is essentially a record. Each of these is an Accept Record line.
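When each detail line is a record, flattening a columnar report amounts to splitting every such line into fields. The Python sketch below illustrates the idea by treating runs of spaces as the column separator and reporting each field's 1-based start and end column. It is not the product's Parse Columnar Data logic. The sample line and function name are made up, and the product may assign wider column ranges than the text actually occupies.

```python
import re

# Illustration only: split one detail line into fields on runs of spaces
# and report each field's 1-based start and end column.
# The sample line is made up; real reports may need fixed column ranges.

def parse_columnar(line: str):
    """Return a list of (start_col, end_col, text) for each non-space run."""
    return [(m.start() + 1, m.end(), m.group()) for m in re.finditer(r"\S+", line)]

detail = "SALES/MARKETING    75,249    80,100"
for start, end, text in parse_columnar(detail):
    print(start, end, text)
```

A value that itself contains spaces would be split into several fields by this approach, which is one reason a column-based field definition can be the safer choice.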

Tutorial Goals

In this tutorial, you will learn:

l How to create a script that reads and flattens a report with columnar data

l How to use more automatic features of Data Parser

l How to save the script file

Procedure

The following steps should be completed in the order shown.

Define Line Styles and Data Fields for Detail Lines

After selecting the tutorial file and setting up basic options, define line styles and data fields. Data Parser does the following when you complete this task:

l Divides the line into seven Data Fields using spaces as a column separator. The Data Fields are given default field names SALES/MARKETING_1 through SALES/MARKETING_7.

l Creates a Line Style for the line. The Line Style that is automatically created has a default Line Name of SALESMARKETING. It identifies all lines in the report that have the string SALES/MARKETING in positions 2 through 16.

To define the line styles and data fields for detail lines:

1. Select the first detail line (it begins with SALES/MARKETING) by clicking in the Line Style column (the narrow yellow stripe on the left) immediately to the left of the line. This highlights the entire line of text.

2. Right-click in the Line Style Column (the yellow part of the screen on the left) and select Parse Columnar Data.

3. From the menu, select Preferences and click once on Close Definition Dialogs on Add/Update to disable the option.

4. To view the definitions of the Data Fields created, double-click on the colored sections of the line. For example, double-clicking on the green numbers 75,249 in the Data Panel brings up the SALES/MARKETING_2 Data Field in the Field Definition window. SALES/MARKETING_2 is the default Data Field name given to the second Data Field in the SALESMARKETING line. It starts in position 20 and ends in position 27. Since it is defined for the Line Style SALESMARKETING, only lines that match that recognition pattern contain this Data Field in positions 20 through 27.

5. Proceed to Change Data Field Names.

Change Data Field Names

The Browse Data Record uses the Data Field names as column headings for the Data Fields, so it is a good idea to change the Data Field names for SALES/MARKETING_1 through SALES/MARKETING_7 to more descriptive field names.

To change Data Field names:

1. Double-click on one of the Data Fields in the SALESMARKETING line to open the Field Definition window.

2. In the Field Definition window, highlight the default Field Name and replace it with a corresponding descriptive name. See Table 3-3 below.

3. Click Update.

4. To select the next Data Field, click the Field Name arrow to display a drop-down list of Data Fields that have been defined for the current Line Style.

5. Select the next Data Field and continue until you have renamed all the fields. Close the Field Definition window when finished.

6. Proceed to Change Line Style Name and Definition.

Table 3-3: Tutorial 3 - Suggested Data Field Names

Default Name          Suggested Name
SALES/MARKETING_1     Department
SALES/MARKETING_2     Team1
SALES/MARKETING_3     Team2
SALES/MARKETING_4     Team3
SALES/MARKETING_5     Team4
SALES/MARKETING_6     Team5
SALES/MARKETING_7     DepartmentTotal

Change Line Style Name and Definition

To view the new Line Style SALESMARKETING, double-click on the name SALESMARKETING in the Line Style column (the yellow column on the left of your screen). The Line Style Definition window appears.

Notice the SALESMARKETING Line Style is recognized by a pattern where columns 2 to 16 contain the string SALES/MARKETING.

To change the Line Style Name and Line Action:

1. In the Line Style Definition window, change the Line Style Name by highlighting SALESMARKETING in the Line Style Name box and replacing it with Detail.

2. Also in the Line Style Definition window, click the Line Action tab and select the ACCEPT Record Including option.

3. Click Update.

4. Proceed to Define Line Recognition Rules.

Define Line Recognition Rules

The Detail Line Style only matches lines that have SALES/MARKETING in columns 2 through 16. That is the recognition pattern that Data Parser created automatically, but it is not the pattern that is needed in this case. The pattern needs to be general enough to match all of the detail lines in the text, but specific enough to match ONLY the detail lines. Update the Line Pattern so that the Line Style matches all of the detail lines but not the TEAM TOTALS line.

Analyze the detail lines to find what makes them unique in comparison to other lines in the text. Things to look for are position of the Data Fields, contents of the Data Fields, anything that is consistent for each of the detail lines but not contained in non-detail lines. For example, the detail lines contain:

l Commas in positions 24, 34 and 75 on every line

l Only letters, white space, and a "/" in columns 2 through 18

l Only digits, white space, and commas in columns 20 through 79

l A digit in position 78

l An upper case letter in each of the first 5 positions of the text (columns 2 through 6)

Of all of the above observations, creating a pattern to look for uppercase letters in the first five positions is the best way to go. Here are some reasons why:

l Defining a pattern that would check for commas in positions 24, 34, and 75 requires three pattern lines and probably would not match every detail line in subsequent reports. Suppose in this same report (created a week later) Team 2 of the Development department went to a pre-paid weeklong class and spent only 100 dollars on supplies for the class. A comma would then not be in position 34 of that detail line, so the line would not match the Line Style, and the essential data on that line would be lost.

l Defining a pattern to check for letters, white space, and a "/" in columns 2 through 18 would require three pattern lines and would also match the column heading line.

l Defining a pattern to match lines that contain at least one digit in positions 20 through 79 and do not contain letters or "/" would require three pattern lines and would match the detail lines. However, it would also match the Team Totals line.

l Defining a pattern to match lines that contain a digit in position 79 would match the detail lines and the Team Totals line.

To define a pattern that looks for upper case letters in positions 2 through 6:

1. Click the Line Recognition Rules tab in the Line Style Definition window.

2. Click once in the Look For? cell in the first row of the grid and click the down arrow. The Pattern Builder window appears.

3. In the Pattern Builder window, click in the Type cell and click the down arrow to display the allowable values for the Type field.

4. Select character class from the list. This tells Extract Schema Designer what kind of data it needs to match for that line style to be valid.

5. Tab to the Value cell and click the arrow to display the allowable values for the Value field.

6. Select upper case letters from the list. This tells Data Parser the specific data it needs to match for that line style to be valid.

7. Change the value in the Count cell to 5 by highlighting the value there and typing a 5.

8. Change the value of the Begin field to 2 and the End field to 6. This tells Data Parser where to look for the data you specified and how many of that particular data must be found for the line style to match that line.

9. Click OK.

10. Click Update to save the modified line style definition.

11. Proceed to Define Data Fields.
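The recognition rule built in the steps above (character class upper case letters, Count 5, Begin 2, End 6) can be sketched in Python to show why it separates detail lines from the other lines. This is an illustration only: the sample lines are hypothetical stand-ins for the tutorial report, the function names are invented for this sketch, and reading Count as "at least this many matches in the column range" is an assumption about how Data Parser evaluates the rule.

```python
def count_upper(text: str) -> int:
    """Count ASCII upper case letters in text."""
    return sum("A" <= ch <= "Z" for ch in text)


def matches_detail(line: str, begin: int = 2, end: int = 6, count: int = 5) -> bool:
    """True when columns begin..end (1-based, inclusive) hold `count` upper case letters."""
    window = line[begin - 1:end]
    return count_upper(window) >= count


# Hypothetical report lines; only the leading text matters for this rule.
SAMPLE_LINES = [
    " Department          Team 1    Team 2",   # column heading line
    " SALES/MARKETING     75,249    68,500",   # detail line
    " DEVELOPMENT         64,100     9,875",   # detail line
    " TEAM TOTALS        139,349    78,375",   # totals line
]

if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(matches_detail(line), repr(line))
```

In this sketch the heading line fails because columns 2 through 6 contain lower case letters, and the TEAM TOTALS line fails because column 6 is a space, so only four upper case letters are found.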

Define Data Fields

In this part of the exercise, you will define the rest of the data in the record, starting with the report title.

To define the ReportTitle Data Field:

1. Select the report title ABC CORPORATION BUDGET on line 1 by highlighting it in the Data Panel.

2. Right-click in the Data Panel and select Define Data Field > New Data Field. The Field Definition window appears.

3. Change the default name to ReportTitle.

4. Click Add. Data Parser uses the selected text and the Data Field name to automatically define a Data Field named ReportTitle and a Line Style named ABC_CORPORATION_BUDG.

5. Click Close.

6. Double-click on ABC_CORPORATION_BUDG in the Line Style Column to display the Line Style Definition window. Notice that Data Parser automatically creates a recognition pattern that looks for the literal ABC CORPORATION BUDGET in positions 27 through 48.

7. In the Line Style Name box, type ReportTitle.

8. Click Update and Close.

9. Proceed to Define Line Styles.

Define Line Styles

1. Select the report date 10/26/95 on line 2 by highlighting the text with the mouse.

2. Right-click in the Data Panel and select Define Data Field > New Data Field. The Field Definition window appears.

3. Change the default name to ReportDate.

4. Click Add and then Close. Data Parser uses the selected text and the Data Field name you entered to automatically define a Data Field and a Line Style.

5. Double-click Style1 in the Line Style Column to display the Line Style Definition window. Notice that Data Parser automatically created a recognition pattern that looks for the literal "/" in positions 35 and 38.

6. Rename the Line Style to ReportDate.

7. Click Update and Close.

8. Browse the data records to see how your data has changed.

9. If desired, rearrange the data fields as needed to meet your export file layout requirements.

10. Save and close your script.
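To make the idea of "flattening" concrete, the following Python sketch mimics what the finished Tutorial 3 script does conceptually: header values (ReportTitle, ReportDate) are collected once, and each Detail line emits one output record that carries those values along. It is not how Data Parser is implemented. The report text is a shortened, hypothetical stand-in for the tutorial file (two team columns instead of five), and the function names are invented for the sketch.

```python
import csv
import io

# Hypothetical, shortened stand-in for the tutorial report.
REPORT = [
    "                          ABC CORPORATION BUDGET",
    "                                  10/26/95",
    " Department          Team 1    Team 2",
    " SALES/MARKETING     75,249    68,500",
    " DEVELOPMENT         64,100     9,875",
    " TEAM TOTALS        139,349    78,375",
]

FIELDS = ["ReportTitle", "ReportDate", "Department", "Team1", "Team2"]


def is_detail(line: str) -> bool:
    """Detail lines have five upper case letters in columns 2 through 6."""
    window = line[1:6]
    return len(window) == 5 and all("A" <= ch <= "Z" for ch in window)


def flatten(lines):
    """Collect header fields, then emit one record per detail (accept) line."""
    collected = {}   # values gathered from header lines, reused for every record
    records = []
    for line in lines:
        text = line.strip()
        if text == "ABC CORPORATION BUDGET":
            collected["ReportTitle"] = text
        elif len(text) == 8 and text[2] == "/" and text[5] == "/":
            collected["ReportDate"] = text
        elif is_detail(line):
            department, team1, team2 = line.split()
            records.append({**collected, "Department": department,
                            "Team1": team1, "Team2": team2})
    return records


def to_csv(records) -> str:
    """Write the flattened records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(flatten(REPORT)))
```

Each output row repeats the title and date next to the department figures, which is the flattened layout you see when you browse the data records.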

Data Parser for Unstructured Text - Tutorial 4 - Floating Tags

Tutorial 4 guides you through the steps to create and save a script in Data Parser for Unstructured Text that reads and flattens a data file containing floating tag data in a variable-length ASCII report.

This tutorial is useful to anyone likely to be working with floating tag data. By following the steps outlined below, you become familiar with both the process of creating an extract script and the terms used throughout the documentation.

Tutorial Goals

In this tutorial, you will learn:

l How to create a script that reads and flattens an ASCII report with floating tag data

l How to save the script file

l New terms located throughout the documentation

Procedure

The steps in this tutorial should be completed in the order shown.

Define Line Styles

After selecting the tutorial file and setting up basic options, find the patterns in this file and build recognition patterns (Line Style Definitions) so that Data Parser can identify the lines with data.

The first characteristic of this report to notice is that each data record uses two lines of text. Another important characteristic is that several characters on each line are repeated consistently in the same position. These consistent patterns make it easy for you to build Line Style Definitions.

Data Parser automatically creates a Line Style using ATTDOC in columns 19 through 24 as the Recognition Pattern and ATTDOC as the Line Style Name when you complete this task. Each line of text that matches this Line Style now displays the Line Style Name ATTDOC and a green arrow in the Line Style Column to the left of the text line.

To define Line Styles:

1. Highlight the letters TRN in columns 15 through 17. The letters TRN are in the same position in the first line of every record in the report. We could use the slash ( / ) in the third column or the colon ( : ) in the ninth column or any of several other consistent characters to identify the line, but the TRN is fine. Data Parser needs only one consistent characteristic to identify a line.

2. Right-click in the Data Panel and select Define Line Style > Auto New Line Style > Action - Collect Fields.

3. In the second line of text, highlight the string ATTDOC in columns 19 through 24. These six letters appear in the same position in each second line of every record in the report.

4. Right-click in the Data Panel, and select Define Line Style > Auto New Line Style > Action-Accept Record since this is the last line of every record.

5. Proceed to Define Data Fields.

Define Data Fields

Notice that only the first couple of Data Fields in each of the TRN lines fall within the same columns from record to record. Define these fields first:

1. In the TRN line, highlight the logged date and time data from columns 1 through 13.

2. Right-click in the Data Panel, and select Define Data Field > New Data Field.

3. In the Field Definition window, overwrite the default by typing Log in the Field Name box.

4. Click Add.

5. Highlight the string TRN from columns 15 through 17.

6. Right-click and select Define Data Field > New Data Field. The Field Definition window appears.

7. Overwrite the default field name by typing Trans_Type in the box.

8. Click Add. The TRN text changes to green all through the report, indicating that it is the second data field defined on that line.

9. Highlight the 12-digit number from columns 19 through 30.

10. Right-click and select Define Data Field > New Data Field. The Field Definition window appears.

11. In the Field Definition window, type Trans_No in the Field Name box.

12. Click Add. The numeric text changes to blue in each of the TRN lines within the report, indicating that it is the third field defined on that line.

13. Proceed to Change Vertical Positioning.

Change Vertical Positioning

Because the patient and doctor names are different lengths, you cannot use Fixed Position to define the remainder of the Data Fields on the TRN lines. However, all of the fields other than the names have field labels with colons and spaces (Field Tags), so you can define those fields as Floating Tag. "Floating" means that the Field Tags are not in the same position on the line in every record. If there were no Field Tags, you could still define the fields using Relative Word Position.
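As a rough picture of what a Floating Tag definition means, the Python sketch below locates a field by the tags around it instead of by column position. It is not the product's implementation. The function name is hypothetical, the two sample lines are invented (they do not reproduce the tutorial file's exact columns), and falling back to the end of the line when the end tag is missing is an assumption made for the sketch.

```python
from typing import Optional


def floating_tag_field(line: str, start_tag: str, end_tag: str) -> Optional[str]:
    """Return the text between start_tag and end_tag, wherever they occur."""
    tag_at = line.find(start_tag)
    if tag_at == -1:
        return None                      # this line does not carry the field
    begin = tag_at + len(start_tag)
    end = line.find(end_tag, begin)
    if end == -1:
        end = len(line)                  # assumption: no end tag means end of line
    return line[begin:end].strip()


# Hypothetical TRN lines: the names differ in length, so the tags "float".
LINES = [
    "10/26/95 08:15 TRN 000000012345 JOHN Q PUBLIC TIM: 10/25/95 23:40 TYP: ER RATE: 12.50",
    "10/26/95 09:02 TRN 000000012346 MARY ANN SMITH-JONES TIM: 10/26/95 07:55 TYP: OP RATE: 9.75",
]

if __name__ == "__main__":
    for line in LINES:
        print(floating_tag_field(line, "TIM:", "TYP:"))
```

Although TIM: starts in a different column on each line, the same start and end tags recover the date and time from both.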

The fourth field starts in the same column in each of the TRN lines, so you can define the Start Rule as Fixed Position for this field.

To change the Vertical Positioning Bar:

1. Click Vertical Positioning Bar.

2. Click at the beginning of the field to confirm that the field does indeed start in the same position in every record.

3. Click Vertical Positioning Bar again to remove the red line. The End Rule is Floating Tag because TIM:, the tag for the next field, always occurs at the end of this field.

4. Proceed to Set Floating Tags - First Line of Text.

Set Floating Tags - First Line of Text

1. Highlight the patient’s name from columns 32 through 47.

2. Right-click and select Define Data Field > New Data Field. The Field Definition window appears.

3. Type Patient in the Field Name box.

4. Click the End Rule tab.

5. Click on the Floating Tag option. Notice that the cursor is now blinking in the box to the right of the option.

6. Type TIM: in the box. This tells Data Parser that this Data Field ends when the TIM: Field Tag is encountered.

7. To prevent truncation, click the End Rule tab and set the Default FldLength to 30 bytes.

8. Click Add. Notice that the patient’s name does not change to colored text in the report. Fields defined as Floating Tag or Relative Word Position do not appear in colored text, nor are they underlined even if you have Underline Fields enabled in the Preferences menu. This is because those field positions are not the same in all records.

9. Highlight the date and time data from columns 54 through 71.

10. Right-click and select Define Data Field > New Data Field.

11. Type Date_Time in the Field Name box.

12. At the Start Rule tab select the Floating Tag option.

14. Click the End Rule tab and select the Floating Tag option.

15. Type TYP: in the box. This tells Data Parser that this Data Field ends when the TYP: Field Tag is encountered.

16. Click Add.

17. Repeat the task for all except the last field (RATE).

18. Proceed to Set End of Line - First Line of Text.

Set End of Line - First Line of Text

The RATE field at the end of the TRN line starts with a Floating Tag, but ends at the end of the line of text. Define this Data Field accordingly.

1. Highlight the rate data from columns 93 through 97.

2. Right-click and select Define Data Field > New Data Field.

3. Type Rate in the Field Name box.

4. On the Start Rule tab, select the Floating Tag option. Type RATE: in the box.

5. Click the End Rule tab and click the End of Line option.

6. Click the Data Collection/Output tab and set the Default FldLength in bytes for the field.

7. Click Add.

8. Proceed to Set Floating Tags - Second Line of Text.

Set Floating Tags - Second Line of Text

Look at the ATTDOC line of text in the records. Notice that the Data Fields in this line are also Floating Tag data. Follow these steps to define all the Data Fields except the last field.

1. Highlight the attending doctor number from columns 29 through 34.

2. Right-click and select Define Data Field > New Data Field.

3. Type Attdoc_No in the Field Name box.

4. On the Start Rule tab, select the Floating Tag option. Type ATTDOC NO: in the box.

5. Click the End Rule tab.

6. Click the Floating Tag option. Type ATTDOC: in the box.

7. Click the Data Collection/Output tab and set the Default FldLength in bytes for the field.

8. Click Add.

9. Repeat the steps (using the appropriate field names and tags) for the remainder of the Data Fields on the ATTDOC line, except the last field (BY).

10. Proceed to Set End of Line - Second Line of Text.

Set End of Line - Second Line of Text

The BY field at the end of the ATTDOC line starts with a Floating Tag, but ends at the end of the line of text just like the RATE field in the first line. So, use the same steps as before, except use End of Line as the End Rule for that field.

1. Highlight the field.

2. Right-click and select Define Data Field > New Data Field.

3. Name the field.

4. Set the Start Rule.

5. Click the End Rule tab and click the End of Line option.

6. Click the Data Collection/Output tab and set the Default FldLength in bytes for the field.

7. Click Add.

8. Browse the data records to see how your data has changed.

9. If desired, rearrange the data fields as needed to meet the requirements of the export data file.

10. Save and close your script.

Data Parser for Unstructured Text - Tutorial 5 - Columnar Data with a Footer

Tutorial 5 guides you through the steps to create and save a script file in Data Parser for Unstructured Text that reads and flattens a data file containing both detail lines and a footer line with data to extract.

This tutorial is useful to anyone likely to be working with columnar data that has a footer. It is recommended that you complete Tutorial 3 - Columnar Data before starting this tutorial.

By following the steps outlined below, you become familiar with both the process of creating an extract script and the terms used throughout the documentation.

Tutorial Goals

In this tutorial, you will learn:

l How to create a script that reads and flattens a data file with detail and footer lines

l How to save the script file

l New terms located throughout the documentation

Procedure

The primary data record information is in the table detail lines. This data is highly structured in neat consistent columns. Data Parser can build recognition patterns for Line Styles and Data Fields with this type of data automatically, saving you a lot of time and effort.

Define Line Styles and Data Fields

After selecting the tutorial file and setting up basic options, define line styles and data fields. When you complete this task, Data Parser automatically creates a Line Style for the line and gives it a default Line Name of SALESMARKETING. Data Parser also automatically parses the line into 7 Data Fields using spaces as a column separator. The Data Fields are given default names of SALES/MARKETING_1 through SALES/MARKETING_7.
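The effect of Parse Columnar Data on a single line can be approximated with the Python sketch below: split the line on runs of spaces and number the resulting fields after the first value on the line. This only illustrates the behavior described above. The function name is hypothetical, the figures other than 75,249 are invented sample values, and the real product also records column positions for each field, which this sketch does not attempt.

```python
def parse_columnar(line: str) -> dict:
    """Split a line on white space and assign default names <first value>_1, _2, ..."""
    values = line.split()
    if not values:
        return {}
    base = values[0]
    return {f"{base}_{i}": value for i, value in enumerate(values, start=1)}


# Hypothetical detail line with seven space-separated columns.
DETAIL = " SALES/MARKETING     75,249    68,500    71,000    59,875    80,120   354,744"

if __name__ == "__main__":
    for name, value in parse_columnar(DETAIL).items():
        print(name, "=", value)
```

Because the split is on spaces, a value that itself contains a space would be divided into two fields in this sketch, which is one reason to review the default fields before renaming them.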

To define line styles and data fields:

1. Select the first detail line (it begins with SALES/MARKETING) by clicking in the Line Style column immediately to the left of the line to highlight the entire line of text.

2. Right-click in the Line Style Column (the yellow stripe on the left part of the screen), and select Parse Columnar Data.

3. Proceed to Change Data Field Names.

Change Data Field Names

Since the Data Field names are used in the Browse Data Record as column headings for the Data Fields, change the Data Field names for SALES/MARKETING_1 through SALES/MARKETING_7 to more descriptive field names.

See the new, more descriptive names for the Data Fields in Table 3-4 below.

Table 3-4 Tutorial 5 - Suggested Data Field Names

Default Name          Suggested Name
SALES/MARKETING_1     Department
SALES/MARKETING_2     Team1
SALES/MARKETING_3     Team2
SALES/MARKETING_4     Team3
SALES/MARKETING_5     Team4
SALES/MARKETING_6     Team5
SALES/MARKETING_7     DepartmentTotal

To change Data Field names:

1. In the Preferences menu, disable Close Definition Dialogs on Add/Update by unchecking it.

2. Double-click on one of the Data Fields in the SALESMARKETING line to open the Field Definition window.

3. In the Field Definition window, highlight the default field name and replace it with the corresponding descriptive name given above.

4. Click Update.

5. To select the next Data Field, click the Field Name arrow to display a list of the Data Fields that have been defined for the current Line Style. Select the next Data Field.

6. Continue until you have renamed all the fields.

7. Click Close.

8. Double-click on the name, SALESMARKETING, in the Line Style column on the left of the screen. The Line Style Definition window appears.

9. Type in a new name, Detail.

10. Click Update.

11. Proceed to Define Recognition Patterns.

Define Recognition Patterns

The SALESMARKETING Line Style is recognized by a pattern where columns 2 to 16 contain the text SALES/MARKETING. This pattern matches only the first detail line. It needs to be general enough to match all of the detail lines in the text, but specific enough to match only the detail lines, not the TEAM TOTALS line.

Analyze the detail lines to find what makes them unique in comparison to other lines in the text. Things to look for are position of the Data Fields, contents of the Data Fields, anything that is consistent for each of the detail lines but not contained in non-detail lines. For example, the detail lines contain:

l Commas in positions 24, 34 and 75 on every line.

l Only letters, white space, and a / in columns 2 through 18.

l Only digits, white space, and commas in columns 20 through 79.

l A digit in position 78.

l An upper case letter in each of the first 5 positions.

Of all of the above observations, creating a pattern to look for uppercase letters in the first 5 positions is the best way to go. Here are some reasons why:

l Defining a pattern that checks for commas in positions 24, 34, and 75 would require 3 pattern lines and probably would not match every detail line in subsequent reports. Suppose in this same report (created a week later) Team 2 of the Development department went to a pre-paid weeklong class and spent only 100 dollars on supplies for the class. A comma would then not be in position 34 of that detail line, so the line would not match the Line Style.

l Defining a pattern to check for letters, white space, and a / in columns 2 through 18 would require three pattern lines and would also match the column heading line.

l Defining a pattern to match lines that contain at least one digit in positions 20 through 79 and do not contain letters or / would require 3 pattern lines. It would also match the Team Totals line.

l Defining a pattern to match lines that contain a digit in position 79 would match the detail lines and the Team Totals line.

So, the best pattern to use is one that looks for capital letters in columns 2 through 6.
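The procedure that follows builds a two-part rule: upper case letters in columns 2 through 6, combined with And and a second condition that a column range does not contain a capital P. The Python sketch below shows one way to read that combination and why the second condition matters for this report. The sample lines are hypothetical, the function names are invented for the sketch, and interpreting Count, Begin, and End all set to 1 as "column 1 does not contain P" is an assumption.

```python
def upper_case_in_columns(line: str, begin: int, end: int, count: int) -> bool:
    """True when columns begin..end (1-based) hold at least `count` upper case letters."""
    window = line[begin - 1:end]
    return sum("A" <= ch <= "Z" for ch in window) >= count


def column_range_lacks(line: str, begin: int, end: int, literal: str) -> bool:
    """True when columns begin..end (1-based) do not contain `literal`."""
    return literal not in line[begin - 1:end]


def is_detail(line: str) -> bool:
    """Both conditions must hold, mirroring the And in the second grid row."""
    return (upper_case_in_columns(line, 2, 6, 5)
            and column_range_lacks(line, 1, 1, "P"))


# Hypothetical lines: a detail line, the totals line, and the footer line.
SAMPLE = [
    " SALES/MARKETING     75,249",
    " TEAM TOTALS        139,349",
    "PROCESSING DATE: 10/26/95",
]

if __name__ == "__main__":
    for line in SAMPLE:
        print(is_detail(line), repr(line))
```

In this sketch the footer line passes the upper case test on its own (columns 2 through 6 read ROCES), so the Does Not Contain condition is what keeps it from being treated as a detail line.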

To define a recognition pattern:

1. In the Line Style Definition window, click once in the Look For? cell in the first row of the grid, then click the arrow to display the Pattern Builder window.

2. Change the value of the Type field from literal to character class by clicking in the Type cell, then clicking the arrow to display the allowable values for the Type field. Select character class from the list.

3. Click in the Value cell, then click the arrow to display the allowable values for the Value field. Select upper case letters from the list.

4. Highlight the value in the Count field and change it to 5. The value of the Begin field should be 2. Change the value of the End field to 6 and click OK.

5. Click in the empty cell in the second row of the And/Or column. The string And automatically displays in that cell.

6. Click in the first empty cell in the Search What? column. Then click on the down arrow and select Column Range (m-n) from the list.

7. Click in the first empty cell in the Operator column. Then click on the down arrow and select Does Not Contain from the list.

8. Click in the first empty cell in the Look For? column. Then click on the down arrow. This opens the Pattern Builder window.

9. If the Value column is not empty, delete the contents of that cell. Place the mouse pointer in the cell in the Value column and click once to position the blinking cursor in that cell. Type a capital P.

10. Change the values in the Count, Begin and End cells to 1.

11. Click OK in the Pattern Builder window.

12. Click Update and Close in the Line Style Definition window. Notice that Detail now appears beside each of the detail lines in the Line Style column, and not next to the Processing Date line.

13. Proceed to Modify Data Fields.

Modify Data Fields

To modify the Data Fields in the Detail lines so the Data Parser can extract the data on the last line of the report:

1. Double-click in the Department Data Field, the red text at the beginning of each detail line. The Field Definition window opens.

2. Click on the Data Collection/Output tab, and click on the Array Field option to enable it.

3. Click Update.

4. Click the Data Field Name arrow and choose the next Data Field.

5. Repeat this process for each field in any one line of text.

6. Click Close.

7. Proceed to Define Line Style.

Define Line Style

To define the PROCESSING DATE line:

1. Highlight PROCESSING DATE:.

2. Right-click in the Data Panel and select Define Line Style > Auto New Line Style > Accept Record. Data Parser defines the Line Style using the first word of the highlighted text in that position as the recognition pattern and names the Line Style PROCESSING.

3. Proceed to Define Data Field.

Define Data Field

To define the Data Field on the PROCESSING DATE line:

1. Highlight the date from columns 17 through 24.

2. Right-click in the Data Panel and select Define Data Field > New Data Field.

3. When the Field Definition window opens, change the default Field Name to Date.

4. Click Add in the Field Definition window.

5. Browse the data records to see how your data has changed.

6. Rearrange the data fields as needed to meet the requirements of your export data file.

7. Save and close your script.

Data Parser for Unstructured Text - Tutorial 6 - Variable Length Multi-Line Data Fields

Tutorial 6 guides you through the steps to create a script that reads and flattens a data file containing data that extends across multiple lines of text and where the end of each record varies.

This tutorial is useful to anyone who has a report with fields that extend across more than one line, or has no consistent end of record line.

By following the steps outlined below, you become familiar with both the process of creating an extract script and the terms used throughout the documentation.

Tutorial Goals

In this tutorial, you will learn:

l How to create a script that reads and flattens a data file with varied record lengths

l How to save the script file

l New terms located throughout the documentation

Procedure

The steps in this tutorial should be completed in the order shown. First, identify and define the Line Styles in this report.

Define Line Styles

After selecting the tutorial file and setting up basic options, define your line styles.

1. In the third line, highlight DATE: from column 1 through column 5.

2. Right-click with the mouse positioned anywhere in the Data Panel (the white part of the screen) and select Define Line Style > Auto New Line Style > Action-Collect fields. Data Parser automatically defines the Line Style with a Recognition Pattern of DATE: in columns 1 through 5 and names the Line Style DATE.

3. Repeat the same basic procedure in step 2 for each of the following Field Tags in the first record. See Table 3-5 below.

Table 3-5 Tutorial 6-Field Tag Columns

Note: You may also, if you wish, go to the second record and define the LEGAL DESCRIPTION: field in columns 1-18.

4. In the 23rd line of the report use the mouse to highlight UNIT PRICE: from column 1 through column 11.

5. Right-click in the Data Panel and select Define Line Style > Auto New Line Style > Action-Accept Record. Data Parser automatically defines the Line Style with a Recognition Pattern of UNIT PRICE: in columns 1 through 11 and names the Line Style UNIT_PRICE. To verify that the Line Style Definitions match the appropriate lines of text throughout the report, scroll down and see that each of the lines that contain a Field Tag has the corresponding Line Style Name in the Line Style Column to the left of the text line.

6. Proceed to Define Data Fields.

Define Data Fields

1. Highlight November 12, 1993 from columns 26 through column 42.

2. Right-click and select Define Data Field > New Data Field. The Field Definition window appears. The Field Name defaults to DATE_1.

3. Type DATE to overwrite the default or click in the Field Name box and backspace over the _1.

4. Click Add.

5. Highlight Jeff County from columns 26 through 36.

6. Right-click and select Define Data Field > New Data Field. The Field Name defaults to RECORDATION_1.

7. Type in something else if you wish or click with the mouse and backspace twice to remove _1.

8. Click Add.

9. Proceed to Set Continuation Rule.

Set Continuation Rule

Notice that some of the data you want to extract resides within a single line of text in one record but continues across multiple lines of text in other records. For example, the data in the CONSIDERATION field in the first record is on a single line of text, but in the second record, the data in the CONSIDERATION field continues across nine lines of text. This is easily defined using the Data Parser feature called Continuation Rule.

1. Highlight $333,000 Cash from columns 26 through 38.

2. Right-click and select Define Data Field > New Data Field.

3. Change the field name, if you wish.

4. Click the End Rule tab and select the End of Line option.

5. Click the Continuation Rule tab and select the Until Next Line Style option. There is one extra step necessary for fields that are not fixed in length, which is setting the Default FldLength to prevent data truncation.

6. Set the Default FldLength to 500 bytes on either the End Rule tab or the Data Collection/Output tab.

7. Click Add.

8. Ensure that Data Parser is picking up all the data by clicking Browse Data Record, and widening the CONSIDERATION column.

9. Define all of the remaining Data Fields.

10. Browse the data records to see how your data has changed.

11. Rearrange the data fields as needed to meet the requirements of the export data file.

12. Save and close your script.
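The Continuation Rule used above (End of Line, Until Next Line Style, Default FldLength of 500) can be pictured with the following Python sketch: the field starts on its tagged line and keeps absorbing the lines that follow until a line matching another Line Style appears. This is only an illustration. The record text is invented, the function name is hypothetical, joining continued lines with a single space and truncating to the field length are assumptions, and the tag list simply reuses the Field Tags named in this tutorial.

```python
# Field Tags that start a Line Style in this tutorial (columns begin at 1).
LINE_STYLE_TAGS = ("DATE:", "RECORDATION:", "CONSIDERATION:", "SITE DIMENSIONS:",
                   "SITE AREA:", "ZONING:", "REMARKS:", "UNIT PRICE:")


def collect_until_next_line_style(lines, tag, value_column=26, max_length=500):
    """Gather a field that starts on the `tag` line and continues on later lines."""
    parts = []
    collecting = False
    for line in lines:
        if line.startswith(tag):
            collecting = True
            parts.append(line[value_column - 1:].strip())
        elif collecting:
            if line.startswith(LINE_STYLE_TAGS):
                break                    # next Line Style reached: field is complete
            if line.strip():
                parts.append(line.strip())
    # Assumption: continued lines are joined with spaces and cut at max_length.
    return " ".join(parts)[:max_length]


# Hypothetical record; values start in column 26 as in the tutorial.
RECORD = [
    f"{'RECORDATION:':<25}Jeff County",
    f"{'CONSIDERATION:':<25}$250,000 cash at closing, with the",
    f"{'':<25}balance carried by the seller",
    f"{'SITE AREA:':<25}2.5 acres",
]

if __name__ == "__main__":
    print(collect_until_next_line_style(RECORD, "CONSIDERATION:"))
```

Setting `max_length` too low in the sketch cuts the value short, which mirrors why the tutorial raises the Default FldLength for fields that are not fixed in length.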

Data Parser for Unstructured Text - Tutorial 7 - Multiple Accept Records

Tutorial 7 guides you through the steps to create and save an extract script file in Data Parser that uses multiple Accept Records.

You parse the data in a report file, TUTOR7 (supplied during installation), into a format suitable for exporting. By following the steps outlined below, you become familiar with both the process of creating an extract script and the terms used throughout the documentation.

Tutorial Goals

In this tutorial, you will learn:

l How to create a script that parses a report file into a multiple record file

l How to use multiple Accept Records in your script

l How to save the script file

l Terms used throughout the documentation

Table 3-5 Tutorial 6 - Field Tag Columns

Field Tag          Beginning Column   Ending Column
RECORDATION        1                  12
CONSIDERATION      1                  14
SITE DIMENSIONS    1                  16
SITE AREA          1                  10
ZONING             1                  7
REMARKS            1                  8

Procedure

The steps in this tutorial should be completed in the order shown.

Define a Line Style and Data Fields

After selecting the tutorial file and setting up basic options, begin creating line styles for the first Accept Record. To create a Line Style and Data Fields for these lines:

1. Select the first detail line (Parmer Lane Animal Hospital) by clicking in the Line Style column immediately to the left of the line. This highlights the entire line of text.

2. Right-click anywhere in the Line Style column and select New Line Style. The Line Style Name defaults to Parmer_Lane_Animal_H.

3. Rename it HospitalLine.

4. Click Add.

5. Highlight Parmer Lane Animal Hospital in the Data Panel.

6. Right-click in the Data Panel and select Define Data Field > New Data Field. The Field Name defaults to HospitalLine_1.

7. Change the name to Hospital.

8. Click Add.

9. Right-click in the Line Style column to the left of April 1, 1999, and select New Line Style.

10. Change the Line Style Name to ReportDateLine and click Add.

The data on this line is always centered beneath the HospitalLine. Depending on the month and day on which the report is run, the date may be longer or shorter than the one shown.

11. To make sure that your data field is wide enough, highlight the data from positions 31 through 57.

12. Right-click in the Data Panel and select Define Data Field > New Data Field.

13. Change the field name to ReportDate.

14. Click the Data Collection/Output tab.

15. Make sure that the Trim Leading and Trailing Spaces box under Other Collection Options is checked.

16. Click Add.

The first repeating Line Style that we want Extract Schema Designer to find is the Account Number.

17. Click in the Line Style column to the left of 1101-01, then right-click anywhere in the Line Style column and select New Line Style.

18. The Line Style Name defaults to STYLE1. Change it to AccountLine.

19. Proceed to Define Line Recognition Rules.

Define Line Recognition Rules

Notice the entry under the Look For? column. By default, the rule looks for the hyphen in position 5. While this catches all pertinent lines in our example, it might not catch all instances in a larger report file.

To update the Line Recognition Rules:

1. Click in the first cell under Look For?, then click the arrow.

2. In the Pattern Builder window, click in the first cell under Type and select Mask from the list.

3. Click in the first cell under Value. Keep the hyphen and type a pound sign (#) for each numeral, for example ####-##. This tells Data Parser that there are four digits followed by a hyphen, and then two more digits.

4. Change the Begin position from 5 to 1.

5. Change the End position from 5 to 7.

6. Click OK.

7. Click Add.

8. Highlight the account number (1101-01) and right-click in the Data Panel.

9. Select Define Data Field > New Data Field. A Field Definition window appears.

10. Change the Field Name from AccountLine_1 to AccountNo.

11. Click Add.

12. Before you continue, select Source Options from the tool bar and select the Flush Field Contents on Accept Default box under the Script Design Choices tab. This flushes the data from the remaining fields in your report, unless you manually change a specific field to propagate the data.

13. Click OK.

14. Proceed to Define Line Styles and Data Fields.

Define Line Styles and Data Fields

1. Select the first detail line under the first account number.

2. Right-click in the Line Style column to the left of Robertson and select New Line Style. A Line Style Definition window appears.

3. Rename the Line Style from Robertson to LastNameLine.

4. Click the Recognized By arrow and select Relative Position.

Note: The information under Line Recognition Rules changes. The default Base Line is AccountLine.

5. If it is not, click in the first cell under Base Line and select AccountLine from the drop-down list.

The default Line Count from AccountLine is 1. However, there is a blank line between the AccountLine and the LastNameLine.

6. Change the count from 1 to 2.

7. Click Add.

8. Highlight Robertson and continue out to position 35, in case someone further in the file has a very long last name.

9. Right-click in the Data Panel and select Define Data Field > New Data Field. A Field Definition window appears.

10. Change LastNameLine_1 to LastName and click Add.
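The two recognition rules used in this tutorial, a Mask of ####-## on columns 1 through 7 and a Relative Position of 2 lines from AccountLine, can be pictured with a short Python sketch. This is not how Data Parser is implemented. The function names (mask_to_regex, matches_mask, classify) and most of the sample report lines are hypothetical; only 1101-01 and Robertson come from the tutorial.

```python
import re

# Hypothetical sketch of Mask and Relative Position line recognition.
# Not Data Parser's actual code.


def mask_to_regex(mask):
    """Translate a mask: '#' stands for one digit, anything else is literal."""
    return re.compile("".join(r"\d" if ch == "#" else re.escape(ch) for ch in mask))


def matches_mask(line, mask, begin, end):
    """Test the mask against columns begin..end (1-based, inclusive)."""
    return mask_to_regex(mask).fullmatch(line[begin - 1:end]) is not None


def classify(lines):
    """Label each line as AccountLine, LastNameLine, or None."""
    labels = [None] * len(lines)
    for i, line in enumerate(lines):
        if matches_mask(line, "####-##", 1, 7):
            labels[i] = "AccountLine"
            # Relative Position: LastNameLine sits 2 lines below its base line,
            # because a blank line separates the two.
            if i + 2 < len(labels):
                labels[i + 2] = "LastNameLine"
    return labels


report = [
    "1101-01",
    "",
    "Robertson",
    "Phone 555-0100",
]
print(classify(report))
# ['AccountLine', None, 'LastNameLine', None]

# Columns 1 through 35 of the LastNameLine, with surrounding spaces trimmed.
print(report[2][0:35].strip())
# Robertson
```

Matching the whole ####-## pattern across columns 1 through 7 is stricter than testing a single character position, which is why the last sample line is not mistaken for an account number even though it contains a hyphen.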
