
Chapter 5. Records capture, creation, and retrieval

5.1 Why automation is the goal

5.1.1 Successfully automating record creation and capture

Successfully automating record creation and capture for an enterprise requires methods to automatically archive records and the ability to automatically identify and classify each record, because the records archived in an enterprise are generally not all of a single type. When the methods for auto-archiving are defined and configured, and the classification tools have been set up and configured, the enterprise can benefit from automated record creation and capture.

Auto-archiving

Successfully automating archiving in an enterprise requires understanding the enterprise’s records, and how or where they exist before capture into IBM Enterprise Records. Based on our experience, many customers have policies to define what a record is, under what conditions a document becomes a record, where it can be held in the organization, and when and how it should be captured into the records management system.

IBM understands that records might arrive in the enterprise through many mechanisms, including these examples:

• Hardcopy documents

• Faxes

• Multifunction devices and printers (MFDs or MFPs)

• Emails

• Files on a file system

• Files in a system of engagement, such as IBM Connections or Microsoft SharePoint

• Office productivity tools, such as Microsoft Office

• Document management systems

• Enterprise resource planning (ERP) systems, such as SAP

• Mobile devices

Given the diversity of channels that an enterprise might use to receive or generate records, it becomes apparent that an enterprise might require multiple methods for auto-archiving records. These tools often remove the requirement for users to nominate when an item should be auto-archived, but they are flexible enough to support use cases in which the level of automation depends on some user discretion.

The following are some examples of auto-archiving to IBM Enterprise Records:

• An enterprise might state that not all emails received are records, but that those emails that are records must be flagged by a user, or moved to a location from which they can be automatically archived into IBM Enterprise Records.

• Alternatively, an enterprise might state that all emails received in the enterprise, or in a specific mailbox or set of mailboxes, must be automatically archived into IBM Enterprise Records in a non-discretionary manner.

IBM supports both of these use case examples, automatically archiving emails into IBM Enterprise Records through use of IBM Content Collector for Email.

• An enterprise might stipulate that items stored on a file system for longer than six months should be captured and managed as records.

IBM supports auto-archiving of file system data into IBM Enterprise Records, based on metadata such as the date created, the date last modified, or the date last accessed, through the use of either IBM StoredIQ Policy Assessment and Compliance or IBM Content Collector for File Systems.
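The selection logic behind the example policies above can be sketched in a few lines of Python. This is only an illustration of the policy decisions, not how IBM Content Collector or IBM StoredIQ is configured or implemented. The names Email, should_archive_email, and files_due_for_capture are hypothetical, six months is approximated as 183 days, and the last-modified time is assumed to be the deciding date.

```python
import time
from dataclasses import dataclass
from pathlib import Path

SIX_MONTHS_SECONDS = 183 * 24 * 60 * 60  # assumed approximation of "six months"


@dataclass
class Email:
    """Hypothetical stand-in for a received email."""
    mailbox: str
    subject: str
    flagged_as_record: bool = False  # set by a user in the discretionary case


def should_archive_email(email, mode, monitored_mailboxes=None):
    """Decide whether an email is handed to the archiving step.

    mode="discretionary": archive only emails that a user flagged as records.
    mode="mandatory": archive every email, or every email in the
    monitored mailboxes when a set of mailboxes is given.
    """
    if mode == "discretionary":
        return email.flagged_as_record
    if mode == "mandatory":
        return monitored_mailboxes is None or email.mailbox in monitored_mailboxes
    raise ValueError(f"unknown mode: {mode!r}")


def files_due_for_capture(root, now=None, max_age_seconds=SIX_MONTHS_SECONDS):
    """Return the files under root last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds
    )
```

Keeping each policy as a small, explicit rule makes it easier to compare against the enterprise's written records policy before the equivalent rule is configured in the archiving tool.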

For further information about IBM-supported auto-archiving tools, see the IBM Value-Based Archiving website at the following web address:

http://www.ibm.com/software/products/en/value-based-archiving

Auto-classifying

Automatically classifying records into IBM Enterprise Records requires the ability to identify what the record is and then to appropriately classify the item according to the File Plan in IBM Enterprise Records. IBM Content Classification can provide this ability by analyzing the full text of documents and emails and applying rules that automate classification decisions.

Classification can also be used to determine whether a content item must be classified or not. By filtering out email about lunch appointments or documents that do not hold any business value, for example, you can reduce costs and ensure that only documents that must be retained are classified and archived. Embedded with natural language processing and semantic analysis capabilities, IBM Content Classification determines the true intent of words and then uses that knowledge to automate decision making. Unlike other classification systems that are based on rules only, IBM Content Classification combines rules and contextual analysis to incorporate real-time learning that adapts to changing business needs. As a result, classification becomes even more accurate over time.
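To make the idea of filtering before classification concrete, the following Python sketch combines a fixed rule list with a crude term-based score. It is a deliberately simplified stand-in: IBM Content Classification uses natural language processing and semantic analysis, which this example does not attempt to reproduce. The rule patterns, the term list, the threshold, and the function name triage are all assumptions made for illustration.

```python
import re

# Hypothetical rules that mark content as having no business value.
NON_BUSINESS_RULES = [
    re.compile(r"\blunch\b", re.IGNORECASE),
    re.compile(r"\bhappy birthday\b", re.IGNORECASE),
]

# Hypothetical terms that suggest business content.
BUSINESS_TERMS = {"contract", "invoice", "agreement", "policy", "audit"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def triage(text, min_score=0.1):
    """Return "classify", "discard", or "review" for a content item.

    The score is the fraction of tokens that are business terms. A high
    score takes precedence over the rules, so an email that mentions
    lunch but discusses a contract is still classified.
    """
    tokens = tokenize(text)
    score = sum(t in BUSINESS_TERMS for t in tokens) / len(tokens) if tokens else 0.0
    if score >= min_score:
        return "classify"
    if any(rule.search(text) for rule in NON_BUSINESS_RULES):
        return "discard"
    return "review"
```

For example, triage("Lunch on Friday at noon?") returns "discard", whereas triage("Signed contract and invoice attached") returns "classify". Items that match neither signal are returned as "review" rather than silently dropped, which is one possible design choice for content that needs a human decision.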

A content classification corpus

To automate the classification of records into IBM Enterprise Records, it is necessary to train IBM Content Classification with your enterprise's records and file plan. You do this by providing a predefined corpus of your enterprise's records and your enterprise file plan. IBM Content Classification reads the records in each category of the file plan and creates the associations between categories and records that will provide the most accurate results. After the results are tested and refined, a Content Classification knowledge base is created, which can be used to identify new records and classify them appropriately.

It is important to use the reports generated by IBM Content Classification to verify that the knowledge base can achieve optimal results. The reports identify overlapping categories, where the categories and their records are too similar and would produce suboptimal auto-classification results.
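The following Python sketch illustrates the general shape of this workflow: train on a corpus organized by file plan category, classify a new item, and report categories that are too similar. It is not the algorithm used by IBM Content Classification. It assumes a simple bag-of-words model with cosine similarity, and the names train, classify, and overlapping_categories, as well as the 0.8 threshold, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(corpus):
    """Build a toy knowledge base: one term-count vector per category.

    corpus maps a file plan category name to a list of sample record texts.
    """
    return {
        category: Counter(t for doc in docs for t in tokenize(doc))
        for category, docs in corpus.items()
    }


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(kb, text):
    """Return the category whose vector is most similar to the text."""
    vector = Counter(tokenize(text))
    return max(kb, key=lambda category: cosine(kb[category], vector))


def overlapping_categories(kb, threshold=0.8):
    """Report pairs of categories whose vectors are at least threshold similar."""
    names = sorted(kb)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if cosine(kb[a], kb[b]) >= threshold
    ]
```

If two categories are trained on nearly identical sample records, overlapping_categories returns that pair, which mirrors the kind of finding that should prompt you to merge the categories or supply more distinctive samples before the knowledge base is published.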

Using the classification knowledge base

After the IBM Content Classification knowledge base is published and released, it can be configured to be used within the following:

• IBM Content Collector: If you use IBM Content Collector, content can be classified by IBM Content Classification when it is captured by IBM Content Collector. IBM Content Classification returns information that helps IBM Content Collector determine the appropriate action to take.

• IBM Classification Center: You can use the Classification Center to manage the classification of content that is stored in the repository. You use this web application to select the content to be classified, configure classification options (such as the decision plan to use and various runtime preferences), monitor classification activity, and view the classification results. If necessary, you can also use the Classification Center to reclassify documents if you determine that different knowledge base categories or decision plan actions are more applicable.

• Custom applications: IBM Content Classification runs a set of server-side processes on one or more servers, and provides several client libraries for remote access that are designed for various development environments, including:

– C/C++ development

– Java development

– Visual Basic (ASP) development

– .NET development

Your choice of a client library is based on your application programming language and development environment. You might use more than one client library in the same application. For example, you can have server-side components written in C++ that interact with the system by using the C client. At the same time, Microsoft Internet Information Server (IIS) server-side scripts in ASP (VBScript) can communicate with the system by using the COM client, which is easily accessible from VBScript.

The software development kit (SDK) includes everything that is needed to develop applications for IBM Content Classification: client libraries, online reference guides, and sample applications.

Auto-creation

Applied together, the methods for auto-archiving and auto-classification provide a proven means for automating the creation and capture of records into IBM Enterprise Records.

5.1.2 The complexities of manual record creation and capture
