IBM SmartCloud Analytics - Log Analysis Version 1.1. Extending IBM SmartCloud Analytics - Log Analysis


Note

Before using this information and the product it supports, read the information in “Notices” on page 103.

Edition notice

This edition applies to IBM SmartCloud Analytics - Log Analysis and to all subsequent releases and modifications until otherwise indicated in new editions.

References in content to IBM products, software, programs, services or associated technologies do not imply that they will be available in all countries in which IBM operates. Content, including any plans contained in content, may change at any time at IBM's sole discretion, based on market opportunities or other factors, and is not intended to be a commitment to future content, including product or feature availability, in any way. Statements regarding IBM's future direction or intent are subject to change or withdrawal without notice and represent goals and objectives only. Please refer to the developerWorks terms of use for more information.


About this publication

This guide contains information about how to use IBM SmartCloud Analytics - Log Analysis.

Audience

This publication is for users of the IBM SmartCloud Analytics - Log Analysis product.

Publications

This section provides information about the IBM SmartCloud Analytics - Log Analysis publications. It describes how to access and order publications.

Accessing terminology online

The IBM Terminology Web site consolidates the terminology from IBM product libraries in one convenient location. You can access the Terminology Web site at the following Web address:

http://www.ibm.com/software/globalization/terminology.

Accessibility

Accessibility features help users with a physical disability, such as restricted mobility or limited vision, to use software products successfully. In this release, the IBM SmartCloud Analytics - Log Analysis user interface does not meet all accessibility requirements.

Accessibility features

This information center, and its related publications, are accessibility-enabled. To meet this requirement the user documentation in this information center is provided in HTML and PDF format and descriptive text is provided for all documentation images.

Related accessibility information

You can view the publications for IBM SmartCloud Analytics - Log Analysis in Adobe Portable Document Format (PDF) using the Adobe Reader.

IBM and accessibility

For more information about the commitment that IBM® has to accessibility, see the IBM Human Ability and Accessibility Center. The IBM Human Ability and Accessibility Center is at the following web address: http://www.ibm.com/able

Tivoli technical training

For Tivoli® technical training information, see the IBM Tivoli Education Web site at http://www.ibm.com/software/tivoli/education.

Providing feedback

We appreciate your comments and ask you to submit your feedback to the IBM SmartCloud Analytics - Log Analysis community.

Conventions used in this publication

This publication uses several conventions for special terms and actions, operating system-dependent commands and paths, and margin graphics.

Typeface conventions

This publication uses the following typeface conventions:

Bold

- Lowercase commands and mixed case commands that are otherwise difficult to distinguish from surrounding text
- Interface controls (check boxes, push buttons, radio buttons, spin buttons, fields, folders, icons, list boxes, items inside list boxes, multicolumn lists, containers, menu choices, menu names, tabs, property sheets), labels (such as Tip: and Operating system considerations:)
- Keywords and parameters in text

Italic

- Citations (examples: titles of publications, diskettes, and CDs)
- Words defined in text (example: a nonswitched line is called a point-to-point line)
- Emphasis of words and letters (words as words example: "Use the word that to introduce a restrictive clause."; letters as letters example: "The LUN address must start with the letter L.")
- New terms in text (except in a definition list): a view is a frame in a workspace that contains data.
- Variables and values you must provide: ... where myname represents....

Monospace

- Examples and code examples
- File names, programming keywords, and other elements that are difficult to distinguish from surrounding text
- Message text and prompts addressed to the user
- Text that the user must type

Extending IBM SmartCloud Analytics - Log Analysis

This section describes how to extend the features of IBM SmartCloud Analytics - Log Analysis using the guidance and tools provided.

Overview

This section provides an overview of IBM SmartCloud Analytics - Log Analysis and outlines how you can extend it using the tools and techniques described in this guide. You can extend IBM SmartCloud Analytics - Log Analysis to ingest new log data and to develop custom applications to visualize the indexed data. A set of related artifacts to ingest data or to develop applications is packaged together as an installable package called an Insight Pack.

The information contained in this section is intended for developers who want to understand how to extend IBM SmartCloud Analytics - Log Analysis to provide support for a new log file source, modify support for an existing log source, or develop a custom application. An Insight Pack is a set of artifacts packaged together to allow IBM SmartCloud Analytics - Log Analysis to ingest data or to support custom applications. An Insight Pack contains a complete set of artifacts required to process a log source. You can install, uninstall, or upgrade an Insight Pack as a stand-alone package. The Insight Pack defines:

- The type of log data that is to be consumed.
- How data is annotated. The data is annotated to highlight relevant information.
- How the annotated data is indexed. The indexing process allows you to manipulate search results for better problem determination and diagnostics.
- How to render the data in a chart.

IBM SmartCloud Analytics - Log Analysis includes the following Insight Packs:

WebSphere Application Server Insight Pack

This Insight Pack includes support for ingesting and performing metadata searches against WebSphere Application Server V7 and V8 log files.

DB2 Insight Pack

This Insight Pack includes support for ingesting and performing metadata searches against the DB2 version 9.7 and 10.1 db2diag.log files.

Generic Annotator Insight Pack

This Insight Pack is not specific to any particular log data type. It can be used to annotate tokens so that you can analyze log files for which a log-specific Insight Pack is not available.

Workflow for creating an Insight Pack


Before you begin

Create a Log Source using the IBM SmartCloud Analytics - Log Analysis Generic Annotation to determine whether the default annotations provided by IBM SmartCloud Analytics - Log Analysis are sufficient to process your log file data. If the results are not sufficient for your requirements, you can develop an Insight Pack for your log file type by completing these steps:

Procedure

1. Acquire a representative sample of log files. Choose log files with as many different log record patterns as possible.

2. If you are using the IBM Tivoli Monitoring Log File Agent to push data to IBM SmartCloud Analytics - Log Analysis, create IBM Tivoli Monitoring Log File Agent configuration artifacts for the new data source.

3. Identify the log file record boundaries, patterns, and so on.

4. Identify fields for annotation within logical record patterns.

5. Use the Insight Pack tools to:

a. Create and test Annotation Query Language (AQL) rules to split log file records and extract relevant pieces of data that you want to index.

b. (Optional) Create custom logic to perform the split and annotate functions.

c. Develop the index configuration which describes the characteristics of fields to be indexed.

d. Create the administrative configuration artifact definitions that are installed with the Insight Pack.

e. Generate the Insight Pack for testing.

6. Use IBM SmartCloud Analytics - Log Analysis to test that log records, from the log file type, are split, annotated, and indexed correctly.

7. Validate that the data is split, annotated, and indexed and perform some searches on the indexed fields to verify the results.

Prerequisite knowledge

To create an IBM SmartCloud Analytics - Log Analysis Insight Pack, you must have knowledge and experience in a number of areas. This topic describes the prerequisite skills and knowledge required to develop an Insight Pack.

Before you begin, ensure that you understand the use and workflows for IBM SmartCloud Analytics - Log Analysis. In particular, ensure that you understand how to:

- Configure IBM SmartCloud Analytics - Log Analysis using the Administrative Settings User Interface.
- Configure the IBM Tivoli Monitoring Log File Agent, including understanding how to create regular expressions to control the log file records sent to IBM SmartCloud Analytics - Log Analysis. Alternatively, configure the REST data collector client.

In addition to these topics, knowledge of one or more of the following might be required:

- IBM InfoSphere BigInsights 2.0 tools for Eclipse
- Annotation Query Language (AQL)
- JavaScript Object Notation (JSON)
- Java Database Connectivity (JDBC)
- Structured Query Language (SQL)
- Java
- Python
- Regular expressions

Note: You can use Java or Python as alternatives to AQL.

Overview of IBM SmartCloud Analytics - Log Analysis extension options

This section describes how data is consumed by IBM SmartCloud Analytics - Log Analysis, the processes that are used to consume the data, and the aspects of those processes that can be customized to create an Insight Pack.

Figure 1 illustrates the flow of data in IBM SmartCloud Analytics - Log Analysis and outlines the extension interfaces that you can use to develop an Insight Pack.

Data can be processed only after it has been consumed by IBM SmartCloud Analytics - Log Analysis. Data can be consumed using one of the following:

- IBM Tivoli Monitoring Log File Agent
- IBM SmartCloud Analytics - Log Analysis Data collector client

Note: The WebSphere Insight Pack, which is installed when you install IBM SmartCloud Analytics - Log Analysis, is used to illustrate the topics in this guide.

IBM Tivoli Monitoring Log File Agent

The IBM Tivoli Monitoring Log File Agent reads log records and sends them using an Event Integration Facility (EIF) event to the IBM SmartCloud Analytics - Log Analysis server. You can use multiple remote IBM Tivoli Monitoring Log File Agents to send data to an EIF receiver running on the same machine as the IBM SmartCloud Analytics - Log Analysis server, or you can consume data remotely using the IBM Tivoli Monitoring Log File Agent that is installed and running on the same machine as the IBM SmartCloud Analytics - Log Analysis server. In each scenario, the IBM Tivoli Monitoring Log File Agent formats the log data as an EIF event and sends it to the EIF Receiver. The EIF Receiver forwards this event to the Data collector on the IBM SmartCloud Analytics - Log Analysis server.

You must configure the IBM Tivoli Monitoring Log File Agent EIF record format to include the data required by the IBM SmartCloud Analytics - Log Analysis server.

IBM SmartCloud Analytics - Log Analysis Data collector client

The Data collector is a client application that reads log records and sends them directly to the Data collector REST API provided by the IBM SmartCloud Analytics - Log Analysis server. You can use the Data collector client application that is provided when you install IBM SmartCloud Analytics - Log Analysis. This application reads a log file and sends the data to the IBM SmartCloud Analytics - Log Analysis server in multiple batches. The batch size can be configured to meet your requirements. You can also create your own client that invokes the Data collector REST API.
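As a rough illustration of the batching behavior described above, the following Python sketch groups log lines into batches of a bounded size. It is not the client that is shipped with the product. The function name batch_lines, the max_batch_bytes parameter, and the choice of bytes as the unit of batch size are assumptions made for this example. The call to the Data collector REST API is deliberately left out because its details are not covered in this section.

```python
from typing import Iterable, Iterator, List


def batch_lines(lines: Iterable[str], max_batch_bytes: int) -> Iterator[List[str]]:
    """Group log lines into batches whose UTF-8 size stays within a limit.

    A single line that is larger than the limit is placed in a batch of
    its own rather than being dropped or truncated.
    """
    batch: List[str] = []
    size = 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        # Close the current batch if adding this line would exceed the limit.
        if batch and size + line_size > max_batch_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_size
    # Emit whatever is left after the last line.
    if batch:
        yield batch
```

A custom client could iterate over batch_lines(open_file, limit) and send each batch, together with metadata such as the host name and log path, to the server. Because a batch boundary can fall in the middle of a logical record, the splitter logic described later in this guide must be able to handle partial records.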

Annotation

As data is passed to IBM SmartCloud Analytics - Log Analysis for processing, it is annotated to extract information based on rules or other custom logic that has been specifically developed for the log Source Type. After the key information is extracted, the log record data is indexed using configuration attributes that you have provided. These attributes indicate to the indexer how the data can be used in subsequent retrievals. After the data has been indexed, you can then search the data to gain more insight into the data for better problem determination.

Customizing artifacts

There are a number of interfaces in IBM SmartCloud Analytics - Log Analysis for which you can create Insight Pack artifacts to provide support for a new log Source Type.

Adding data

There are two ways in which data can be consumed by IBM SmartCloud Analytics - Log Analysis:

IBM Tivoli Monitoring Log File Agent

When you are using the IBM Tivoli Monitoring Log File Agent to push log file data into IBM SmartCloud Analytics - Log Analysis, configuration files are required to format the Event Integration Facility (EIF) record sent to the EIF Receiver. These configuration files ensure that all of the data required by IBM SmartCloud Analytics - Log Analysis is present. The IBM Tivoli Monitoring Log File Agent format configuration can, if required, include a more restrictive expression that selectively passes log record data on to the EIF Receiver component. Include the default configurations for the IBM Tivoli Monitoring Log File Agent in your Insight Pack.

Annotation

Annotation is the extraction of key pieces of data from unstructured or semi-structured input text. When you develop annotations for IBM SmartCloud Analytics - Log Analysis, you can use Annotation Query Language (AQL) rules, or custom Java or Python logic.

Split/Annotate

There are two steps to the annotation process, split and annotate. During the split stage, specific logic that consists of rules or custom logic is invoked to determine the logical beginning and end of an input data record. For example, if the logic is written to split log records by timestamp, then all physical records without a timestamp that follow the first physical record with a timestamp are considered part of the current logical record until the next timestamp is detected. After a complete logical record has been established, it is forwarded to the annotate stage where additional logic is executed. This additional logic annotates or extracts the key pieces of information that are to be indexed. The fields annotated and subsequently indexed are those that provide the most insight for searches or other higher-level operations performed on the indexed data.

AQL

Annotation Query Language (AQL) rules can be used to split input data records based on some known boundary and to annotate data from each record so that the records can be indexed. AQL rules included in an Insight Pack are installed into the IBM SmartCloud Analytics - Log Analysis server when the Insight Pack is installed. Tools are provided to assist you with the development of AQL rules.

Custom

You can write custom logic, in Java or Python script, to perform the split and annotate functions. This is useful when you do not want to use or write AQL rules. You can include custom logic in an Insight Pack.

None

You can choose to exclude split and annotation logic from your Insight Pack. If you choose this option, any data records processed by Collections defined in the Insight Pack are indexed based on the indexing configuration only. In this case, only free form searches can be performed on the indexed data records.

Index configuration

To allow the fields extracted by the annotation logic to be indexed by IBM SmartCloud Analytics - Log Analysis, you must supply an indexing configuration. The index configuration determines what is indexed, and how indexed data can be used in subsequent retrievals. After the data has been indexed, you can perform searches and other higher-level operations to gain greater insight into the data for better problem determination. Tools are provided to allow you to develop an indexing configuration.

Administrative configuration

IBM SmartCloud Analytics - Log Analysis provides a REST API to allow you to create configuration artifacts. As an Insight Pack developer, you can include definitions for various configuration artifacts such as Source Types, Collections, Rule Sets and so on. These artifacts are created when the content Insight Pack is installed. Tools are available to assist you with creating the configuration artifacts.

Custom annotations and splitters

To control how the system processes incoming log file records, you can define custom annotations and splitters for your Insight Pack.

Before IBM SmartCloud Analytics - Log Analysis indexes any data, it can split and annotate the incoming log file records. You can use either the Annotation Query Language (AQL) rules or custom logic implemented using technologies such as Java™ or Python.

Splitting

Splitting describes how IBM SmartCloud Analytics - Log Analysis separates physical log file records into logical records using a logical boundary such as a timestamp or a new line. For example, when a timestamp is used as the logical boundary, all records after the beginning of the first detected timestamp are included in the logical record. The beginning of the next timestamp is used to end the logical record and to start the next logical record.

The logic used by a splitter to determine how to manage incoming data records must adhere to a schema that is required by IBM SmartCloud Analytics - Log Analysis. This is true for both AQL and custom logic splitters. Splitter logic is used to process batches of records when a complete set of logical log records might not be included in a record batch. The splitter must process partial records that can occur at the start of the batch as well as at the end of the batch.

A splitter must distinguish incoming data records that form a complete log record from records that it must buffer until additional records arrive and the log record can be marked as complete. It also must identify records that can be discarded, for example, records that the splitter determines are not going to be part of complete log records. The splitter logic can process a batch of incoming records and must split them on the defined boundary. It returns split records with a type that indicates to IBM SmartCloud Analytics - Log Analysis how each record is handled.

The general schema that is returned by the splitter contains the following attributes:

Log text

The text that is contained in the log record after it is split.

Timestamp

The timestamp, if there is one, that is associated with the log record.

Type

The type is a single character, A, B, or C, that indicates the type of this log record. The possible types are as follows:

- A: indicates a complete log record. The splitter logic determines that the associated record is complete. The record can be sent to the annotation and indexing processes. In the following example, the first record is a type A record and the second is of type B, because the second record indicates to the splitter that the first record is complete:

  [9/21/12 14:31:13:117 GMT+05:30] 0000003e InternalGener I DSRA8203I: Database product name : D2/LINUXX8664
  [9/21/12 14:31:13:119 GMT+05:30] 0000003e InternalGener I DSRA8204I: Database product version : SQL09070

- B: indicates that there is a partial log record at the end of the set. For example, the splitter detects the start of a new logical record but cannot determine if it is complete because the splitter cannot find the next logical record boundary that indicates the start of the next record. The splitter marks the record as type B to indicate to the IBM SmartCloud Analytics - Log Analysis server that this record is a partial record and it must be buffered until more incoming records are received to allow it to complete the logical record. The IBM SmartCloud Analytics - Log Analysis server sends all type A log records for annotation and indexing. It buffers type B records. The buffered type B records are then prefixed to the next batch of input that is sent to the splitter when it receives more input records. For example:

  [9/21/12 14:31:27:882 GMT+05:30] 00000051 servlet E com.ibm.ws.webcontainer.servlet.ServletWrapper service SRVE0068E: Uncaught exception created in one of the service methods of the servlet TradeAppServlet in application DayTrader2-EE5. Exception created : javax.servlet.ServletException: TradeServletAction.doLogout (...)exception logging out user uid:1
  at org.apache.geronimo.samples.daytrader.web.TradeServletAction.doLogout(TradeServletAction.java:458)
  at org.apache.geronimo.samples.daytrader.web.TradeAppServlet.performTask(TradeAppServlet.java:169)
  at org.apache.geronimo.samples.daytrader.web.TradeAppServlet.doGet(TradeAppServlet.java:78)

- C: indicates that the text can be discarded. The IBM SmartCloud Analytics - Log Analysis server discards this text. This type of record is not sent for annotation and indexing. It is not buffered. You must define the splitter so that it marks text as type C only if it is certain that the text cannot be part of an incomplete log record. For example, a partial log record is detected at the beginning of a batch of records. Then, a complete but unrelated logical log record is found. IBM SmartCloud Analytics - Log Analysis can never complete the partial record that was detected first. The record must be marked as type C and discarded. For example:

  ************ Start Display Current Environment ************
  WebSphere Platform 7.0.0.0 [ND 7.0.0.0 r0835.03] running with process name cldftp48Node01Cell\cldftp48Node01\server1 and process id 28811
  Host Operating System is Linux, version 2.6.18-194.el5
  Java version = 1.6.0, Java Compiler = j9jit24, Java VM name = IBM J9 VM
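The record types described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the splitter interface that the product requires: the names SplitRecord, RECORD_START, and split_batch are hypothetical, and the regular expression is an assumed pattern for the WebSphere-style timestamps shown in the examples. Text that appears before the first timestamp is marked as type C, every record that is followed by another record start is marked as type A, and the last record in the batch is marked as type B because its end cannot yet be confirmed.

```python
import re
from typing import List, NamedTuple

# Assumed pattern for the start of a WebSphere-style record, for example
# "[9/21/12 14:31:13:117 GMT+05:30]".
RECORD_START = re.compile(
    r"^\[(\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}:\d{2}:\d{3} [A-Z]+[+-]\d{2}:\d{2})\]"
)


class SplitRecord(NamedTuple):
    log_text: str     # text of the logical record
    timestamp: str    # empty string if there is no timestamp
    record_type: str  # "A" complete, "B" partial (buffer), "C" discard


def split_batch(batch: str) -> List[SplitRecord]:
    groups: List[List[str]] = []  # physical lines grouped per logical record
    orphan: List[str] = []        # lines seen before the first timestamp
    for line in batch.splitlines(keepends=True):
        if RECORD_START.match(line):
            groups.append([line])      # a timestamp starts a new logical record
        elif groups:
            groups[-1].append(line)    # continuation of the current record
        else:
            orphan.append(line)

    records: List[SplitRecord] = []
    if orphan:
        # Leading text can never be completed, so it is discarded.
        records.append(SplitRecord("".join(orphan), "", "C"))
    for index, group in enumerate(groups):
        text = "".join(group)
        stamp = RECORD_START.match(text).group(1)
        # Only a following record start proves that a record is complete.
        kind = "A" if index < len(groups) - 1 else "B"
        records.append(SplitRecord(text, stamp, kind))
    return records
```

Because the server prefixes buffered type B text to the next batch before it calls the splitter again, a sketch like this one does not need to keep state between calls. A production splitter would also need to decide how to treat a batch that contains no timestamp at all, which this example simply marks as type C.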

Annotating

After the log records are split, the logical records are sent to the annotation engine. The engine uses rules that are written in AQL or custom logic that is written in Java or Python to extract important pieces of information that are sent to the indexing engine. IBM SmartCloud Analytics - Log Analysis represents the results from the annotation process in a JavaScript Object Notation (JSON) data structure called annotations. The annotations JSON structure is part of a larger structure which also contains the original log record text (the content key) and the metadata passed into the Data Collector API (the metadata key). You can reference the annotations structure to access the actual values from the annotation result.

For more information, see the example. You can reference the annotation results in the source.paths attributes that are contained in the field definitions in the indexing configuration. You use dot notation to indicate where the values of the fields that are indexed are located in the annotations structure.

For example, the annotation engine in IBM SmartCloud Analytics generates the following JSON structure when it processes an AQL rule set against an incoming logical log record:

{
  "annotations" : {
    "annotatorCommon_EventTypeOutput" : [
      { "field_type" : "EventTypeWS",
        "span" : { "begin" : 57, "end" : 58, "text" : "E" },
        "text" : "E"
      }
    ],
    "annotatorCommon_LogTimestamp" : [
      { "span" : { "begin" : 1, "end" : 32, "text" : "03/24/13 07:16:28:103 GMT+05:30" } }
    ],
    "annotatorCommon_MsgIdOutput" : [
      { "field_type" : "MsgId",
        "span" : { "begin" : 59, "end" : 68, "text" : "DSRA1120E" },
        "text" : "DSRA1120E"
      }
    ],
    "annotatorCommon_ShortnameOutput" : [
      { "field_type" : "ShortnameWS",
        "span" : { "begin" : 43, "end" : 56, "text" : "TraceResponse" },
        "text" : "TraceResponse"
      }
    ],
    "annotatorCommon_ThreadIDOutput" : [
      { "field_type" : "ThreadIDWS",
        "span" : { "begin" : 34, "end" : 42, "text" : "00000010" },
        "text" : "00000010"
      }
    ],
    "annotatorCommon_msgText" : [
      { "fullMsg" : { "begin" : 59, "end" : 167,
          "text" : "DSRA1120E: Application did not explicitly close all handles to this Connection. Connection cannot be pooled."
        },
        "span" : { "begin" : 70, "end" : 167,
          "text" : "Application did not explicitly close all handles to this Connection. Connection cannot be pooled."
        }
      }
    ]
  },
  "content" : {
    "span" : { "begin" : 1, "end" : 169,
      "text" : "[03/24/13 07:16:28:103 GMT+05:30] 00000010 TraceResponse E DSRA1120E: Application did not explicitly close all handles to this Connection. Connection cannot be pooled.\n"
    },
    "text" : "[03/24/13 07:16:28:103 GMT+05:30] 00000010 TraceResponse E DSRA1120E: Application did not explicitly close all handles to this Connection. Connection cannot be pooled.\n"
  },
  "metadata" : {
    "batchsize" : "506",
    "flush" : true,
    "hostname" : "mylogfilehost",
    "inputType" : "logs",
    "logpath" : "/data/unityadm/IBM/LogAnalyticsWorkgroup/logsources/was/SystemOut.log",
    "logsource" : "WAS system out",
    "regex_class" : "AllRecords",
    "timestamp" : "03/24/13 07:16:28:103 GMT+05:30",
    "type" : "A"
  }
}

In the example, there are three main sections or keys that are defined in the JSON data structure:

- Annotations: provides access to the annotation results that are created by the annotations engine when it processes an incoming log record according to AQL rules or custom logic.
- Content: provides access to the raw logical log record.
- Metadata: provides access to some of the metadata that describes the file that the log record was obtained from, for example, the host name or log source. In general, the metadata section contains any name/value pairs sent to the IBM SmartCloud Analytics - Log Analysis server from a client along with the log data.

When you create the indexing configuration, you can set the value of the sourcepaths attribute for each field to a dot notation reference to an attribute within the input JSON data structure.

For example, to specify the text value for the annotated field MsgId from the previous example, use the following dot notation reference, which resolves to the actual value DSRA1120E:

annotations.annotatorCommon_MsgIdOutput.text

The following reference produces the same result:

annotations.annotatorCommon_MsgIdOutput.span.text

In a similar manner, you can use dot notation references to the content and metadata keys for the sourcepaths attribute value of each field to be indexed. For example:

content.text
metadata.hostname
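To make the dot notation references more concrete, the following Python sketch resolves a path against a structure shaped like the example JSON. It is an illustration only and does not reproduce the product's indexing code. The function name resolve_path is hypothetical, and the way that array values are traversed element by element is an assumption based on the fact that each annotator output in the example is a JSON array.

```python
from typing import Any, List


def resolve_path(document: Any, path: str) -> List[Any]:
    """Return every value reached by following a dot notation path.

    When a step lands on a list, the next key is applied to each element
    of that list. A path that does not exist resolves to an empty list.
    """
    current = [document]
    for key in path.split("."):
        next_level = []
        for node in current:
            # Treat a list as a set of candidate objects for this key.
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    return current


# A trimmed copy of the example structure.
record = {
    "annotations": {
        "annotatorCommon_MsgIdOutput": [
            {
                "field_type": "MsgId",
                "span": {"begin": 59, "end": 68, "text": "DSRA1120E"},
                "text": "DSRA1120E",
            }
        ]
    },
    "metadata": {"hostname": "mylogfilehost"},
}

print(resolve_path(record, "annotations.annotatorCommon_MsgIdOutput.text"))
print(resolve_path(record, "metadata.hostname"))
```

Both annotation references from the text resolve to the same value in this sketch, which matches the statement that the text and span.text references produce the same result.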

For more information about indexing configuration, see Indexing configuration.

Custom Annotation Query Language (AQL) rules

You can define custom rules for splitting and annotating log records in AQL. AQL is similar to Structured Query Language (SQL) where data generated by executing AQL statements is stored in tuples. A collection of tuples generated for a given statement forms a view which is the basic AQL data model. All tuples for a given view must have the same schema.

AQL is a feature of the IBM InfoSphere® BigInsights™ platform. For more information, see http://pic.dhe.ibm.com/infocenter/bigins/v2r0/index.jsp?topic=%2Fcom.ibm.swg.im.infosphere.biginsights.text.doc%2Fdoc%2Fbiginsights_aqlref_con_aql-overview.html.

You must be aware of the key concepts of AQL. Some of the key concepts are as follows:

- You must add a .aql extension to any file containing AQL statements. You can group related AQL files in the same directory on a file system. The directory then becomes an AQL module. Declare the module at the beginning of each .aql file. Then, when you want to reuse the same logic elsewhere, you can import the modules into other AQL files that are in a different directory.
- The text that is sent to the AQL engine in IBM SmartCloud Analytics - Log Analysis for annotation is represented in a specific view that is called Document. The Document view is populated by the engine when it runs. Each AQL statement can access this view and perform operations on it.
- The fields in an AQL tuple must belong to one of the built-in scalar types. The types are Boolean, Float, Integer, List, Span, String, and Text.
- The Span type represents a contiguous region of text in a text object that is identified by the beginning and ending positions. For examples, see “Custom annotations and splitters” on page 8.
- The following are some of the primary AQL language statements:
  – import, export, and module are used to create, share, and use modules.
  – create table is used to define static lookup tables to augment annotations with additional information.
  – create dictionary is used to define dictionaries that contain words or phrases. The dictionary is used to identify matching terms across input text through extract statements or predicate functions.
  – create view is used to create a view and to define the tuples inside that view.
  – create external view is used to specify additional metadata about a document as a new view. You can use this view alongside the predefined Document view that holds the textual and label content.
  – extract is used to extract useful data from text.
  – select is used to provide a powerful mechanism for constructing and combining sets of tuples that are based on various specifications.
- AQL also has the following built-in functions that you can use in extraction rules:
  – Predicate functions such as Contains, Equals, and Follows.
  – Scalar functions such as GetLength, GetString, and LeftContext.
  – Aggregate functions such as Avg, Count, Min, and Max.
- You can also add user-defined functions (UDFs) that you define to AQL. For more information, see http://pic.dhe.ibm.com/infocenter/bigins/v2r0/index.jsp?topic=%2Fcom.ibm.swg.im.infosphere.biginsights.text.doc%2Fdoc%2Fbiginsights_aqlref_ref_udfs.html.
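The Span concept in the list above can be illustrated outside AQL with a short Python sketch. The Span class and its text_of method are hypothetical names used only for this example. The offsets are taken from the sample annotation output in “Custom annotations and splitters”, and reading them as a zero-based beginning position with an exclusive ending position is an inference from that sample rather than a documented rule.

```python
from typing import NamedTuple


class Span(NamedTuple):
    begin: int  # position of the first character of the region
    end: int    # position just past the last character of the region

    def text_of(self, document: str) -> str:
        """Return the region of the document that this span covers."""
        return document[self.begin:self.end]


# The logical log record from the sample annotation output.
document = (
    "[03/24/13 07:16:28:103 GMT+05:30] 00000010 TraceResponse E "
    "DSRA1120E: Application did not explicitly close all handles to "
    "this Connection. Connection cannot be pooled.\n"
)

timestamp = Span(1, 32)
msg_id = Span(59, 68)

print(timestamp.text_of(document))  # 03/24/13 07:16:28:103 GMT+05:30
print(msg_id.text_of(document))     # DSRA1120E
```

A span therefore carries both the extracted text and its location in the source record, which is why the sample output stores begin, end, and text together.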

For examples of AQL statements, see the AQL files provided with each of the Insight Packs that are installed with IBM SmartCloud Analytics - Log Analysis.

ThreadID.aql contains the views for annotating the thread Id field from a WebSphere log file. The ThreadID.aql file is located in the $UNITY_HOME/unity_content/WAS/WASInsightPack_v1.1.0/extractors/ruleset/annotatorCommon directory.

Requirements for a custom splitter in AQL

If you define your own splitter in AQL, you must name the AQL view LogRecord.

You also must define the columns in the AQL view and the corresponding data types as outlined in the following table.

Table 1. LogRecord columns and data types

Column      Data type   Description
logSpan     Span        The span of the input document that this log record represents.
logText     String      The text of the log record.
timestamp   String      The time stamp, if there is any, that is associated with the log record. If the log record does not contain a time stamp, this entry contains an empty string.
type        String      A single character that denotes the type of the log record. The value for this entry is A, B or C. For more detailed information about these values, see “Custom annotations and splitters” on page 8.

Tooling for custom AQL rules

You use the Eclipse-based tools that are provided by the IBM InfoSphere BigInsights platform to help you to develop and test AQL rules. You can use the tools to import sample log file data, write AQL statements that extract the relevant information, and test the AQL statements before you install your custom Insight Pack on the IBM SmartCloud Analytics - Log Analysis Server.

For more information about how to install the tools, see “Tools for extending IBM SmartCloud Analytics - Log Analysis” on page 37.

Best practices

To help ensure that you write effective and reusable rules, read the best practices section of the documentation before you create your own AQL rules for IBM SmartCloud Analytics - Log Analysis. For more information, see “Best practices information” on page 99.

Reusable Insight Pack components

Common, reusable Annotation Query Language views and dictionaries are installed with the standard Insight Packs included with IBM SmartCloud Analytics - Log Analysis. You can save development time by copying and reusing these components in other Insight Packs.

Common AQL module

The Insight Packs for WebSphere Application Server, DB2®, and Generic Annotations each contain a common AQL module containing AQL views and dictionaries that you can use in any Insight Pack. These views contain logic for annotating general concepts such as time and date, IP addresses, hostname, and so on from incoming file data.

Some of the AQL files in the common module define functions that utilize User Defined Functions (UDFs), which are implemented in Java. JAR files that contain UDF classes are also included within the common module. The UDFs expose capabilities through AQL functions for:

• date and time manipulation
• pattern matching
• string utility functions

The common AQL module, including the views, dictionaries, and UDF JAR files, is installed as part of each standard Insight Pack. For example, within the WebSphere® Application Server Insight Pack, the common module is located at:

$UNITY_HOME/unity_content/WAS/WASInsightPack_v1.1.0/extractors/ruleset/common

Within the common module, all files ending with the extension .aql contain the AQL views and are located in the common directory.

Dictionaries

All of the dictionaries that are associated with the common module and referenced by the common module AQL views reside in the dicts subdirectory. All of the UDF JAR files that are used by the common module AQL views reside in the lib subdirectory.

Within the common module, the included dictionaries are the following:
• month.dict - dictionary of month names and abbreviations. See the file Date_BI.aql for an example of how the month dictionary is used within a view.
• timeZone.dict - dictionary of timezone and time-related abbreviations. See MacrosForTimeDates.aql for an example of how the timezone dictionary is used within a view.
• tlds-alpha-by-domain.dict - dictionary of top-level domains. See HostName.aql for an example of how the top-level domains dictionary is used within a view.
• wkday.dict - dictionary of weekday names and abbreviations. See MacrosForTimeDates.aql for an example of how the weekday dictionary is used within a view.


Views

Examples of some of the AQL views included within the common AQL module are the following:

• DateTimeOutput (see DateTime-consolidation_BI.aql) - a view that contains date and time stamps extracted from input data. This view can process many different date and time formats based on the underlying and related views on which it was built.
• HostnameOutput (see HostName.aql) - a view that extracts hostnames that are either fully qualified or followed by a top level domain name.
• IPAddressOutput (see IPAddress.aql) - a view that extracts IPv4 addresses.
• SingleLine (see logRecordSingleLine.aql) - a view that extracts single lines delimited by a newline character from the input document.
• URLOutput (see url_BI.aql) - a view that extracts URLs that begin with https or ftps or that have no protocol.

UDFs

Examples of some of the AQL functions (that utilize UDFs) included within the common AQL module are the following:

• StrCat (see StringUtils.aql) - concatenates a given list of input strings and returns a single string.
• Matches (see PaternMatcherUtils.aql) - determines if a given input string matches any of a given set of patterns.

Reusing views

To reuse views, dictionaries, and functions from the common AQL module, do the following:

1. Create a new Insight Pack project using the Eclipse-based tooling.

2. Copy the common directory and everything within it from one of the existing Insight Packs to the src-files/extractors/ruleset directory within your Insight Pack project.

After you copy the files, the common directory and its contents should reside under the ruleset directory as follows:

src-files/extractors/ruleset/common

3. Utilize the views defined within the common AQL module from within your own AQL files in your project-specific AQL module by doing the following:

a. Add an import statement at the top of your AQL file in your project-specific AQL module. For example, import module common;

b. Use a qualifier when referencing the common AQL module views from within your AQL file in your project-specific AQL module. For example, Select S.logSpan from common.SingleLine S;

4. Include the location for the common AQL module in your Insight Pack project ruleset definition.

For example, a rule set defined using the Eclipse tooling can have the following values:

Name: MyProjectRuleSet
Type: Annotate
Rule file directory:

Using Java to create annotators and splitters

You can use Java technology to split and annotate incoming log records.

About this task

You create Java classes that implement the IBM SmartCloud Analytics - Log Analysis interfaces used by the splitter and annotator functions. This method is an alternative to using Annotation Query Language (AQL) rules to create the log splitters and annotators.

Java interfaces for splitters and annotators

The Java interfaces that are included with IBM SmartCloud Analytics - Log Analysis are described here.

The implementation process for the Java-based splitters and annotators is:

1. Create Java classes that implement specific interfaces. You create one class to implement the splitter interface and you create one class to implement the annotator interface. The JAR file that contains the classes for each of these interfaces is installed with IBM SmartCloud Analytics - Log Analysis.

2. Import the interface JAR files into the Insight Pack project under the lib directory. The names of the JAR files required for compiling are unity-data-ingestion.jar and JSON4J.jar. After successful compilation, the Java splitter and annotator implementation class files are packaged in a JAR file, which is included within the Insight Pack when it is exported from the tooling.

3. Use the pkg_mgmt script utility to install the Insight Pack into the IBM SmartCloud Analytics server. During the installation, the pkg_mgmt utility copies the implementation JAR to the required location in the IBM SmartCloud Analytics server.

Splitter interface

The Java splitter interface is defined as follows:

package com.ibm.tivoli.unity.splitterannotator.splitter;

/************************************************************************
 * This interface defines the APIs for Java based Splitters and is used
 * by third party custom Java Splitter developers
 *
 ***********************************************************************/
public interface IJavaSplitter
{
    /******************************************************************
     * Split a batch of log records packaged in the input JSON
     *
     * @param batch
     * @return
     * @throws JavaSplitterException
     ******************************************************************/
    public ArrayList<JSONObject> split( JSONObject batch ) throws Exception ;

    /*****************************************************************
     * Data section
     * ***************************************************************/
    public static final String IBM_COPYRIGHT =
        "Licensed Materials - Property of IBM\n" + "LK3T-3580\n"
        + "US Government Users Restricted Rights - Use, duplication \n"
        + "or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.\n\n";
}

Input JSON

The input JSON is primarily a batch of raw log records that needs to be split into logical log records according to a particular criterion (for example, timestamp). The class implementing the IJavaSplitter interface provides the logic that performs the splitting for the given criterion.

The basic structure of the incoming JSON object is:

{
  "content": {
    "text": // raw text to be split
  },
  "metadata": {
    ...meta data fields, eg. hostname, logpath, other fields passed from client...
  }
}

Output JSON

The class implementing IJavaSplitter must return an ArrayList of JSONObjects. Each JSONObject represents either a complete logical log record or a partial log record (for cases where the splitter was unable to specifically determine that the record was complete) and meta-data to indicate whether the included record is complete or not.

Output JSON:

{
  "content": {
    "text": // text for this complete/partial log record
  },
  "metadata": {
    "type": // "A" = complete log record
            // "B" = partial log record at end
            // "C" = partial log record at beginning
  },
  "annotations": {
    "timestamp": // include the timestamp for the current record represented in this JSON object
  }
}

Annotator interface

The Java annotator interface is defined as follows:

package com.ibm.tivoli.unity.splitterannotator.annotator;

/************************************************************************
 * This interface defines the APIs for Java based Annotators and is used
 * by third party custom Java Annotator developers
 *
 ***********************************************************************/
public interface IJavaAnnotator
{
    /*****************************************************************
     * Annotate the input log record & return the output with annotations
     *
     * @param input
     * @return
     * @throws JavaAnnotatorException
     *****************************************************************/
    public JSONObject annotate( JSONObject input ) throws Exception ;

    /*****************************************************************
     * Data section
     * ***************************************************************/
    public static final String IBM_COPYRIGHT =
        "Licensed Materials - Property of IBM\n" + "LK3T-3580\n"
        + "US Government Users Restricted Rights - Use, duplication \n"
        + "or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.\n\n";
}

Input JSON

The input JSON includes a logical log record (formed by the splitter, or the raw record if no split was performed) that is now ready for annotation. The class implementing the IJavaAnnotator interface provides the logic that performs the annotation against the given input record and creates an output JSONObject representing the JSON structure containing the annotations.

The basic structure of the incoming JSON object is:

{
  "content": {
    "text": // logical record to be annotated
  },
  "metadata": {
    ...meta data fields, eg. hostname, logpath, other fields passed from client...
  }
}

Output JSON

The class implementing IJavaAnnotator must return a single JSONObject representing a JSON data structure containing the original data passed as input plus the annotated fields parsed from the incoming record. The following sample JSON structure depicts the format of the data that is expected to be returned in the object.

Output JSON:

{
  "content": {
    "text": // same text as passed in the input JSON object
  },
  "metadata": {
  },
  "annotations": {
    ...annotation fields and their values produced by IJavaAnnotator implementation
  }
}


Building splitters and annotators in Java

You can build custom splitter and annotator classes in Java.

About this task

To build custom splitter and annotator classes in Java, complete the following steps.

Procedure

1. Create an Insight Pack Project.

2. Import the interface JAR files into the Insight Pack project in the lib directory. The names of the JAR files required for compiling are unity-data-ingestion.jar and JSON4J.jar. These files are located at $UNITY_HOME/wlp/usr/servers/Unity/apps/Unity.war/WEB-INF/lib/unity-data-ingestion.jar and $UNITY_HOME/wlp/usr/servers/Unity/apps/Unity.war/WEB-INF/lib/JSON4J.jar.

3. Create your Java source files that implement the IJavaSplitter and IJavaAnnotator interfaces under the <project name>/src directory of your Insight Pack project.

4. Compile your class files and package them into a JAR file.

Restriction: The JAR file that contains the custom Java splitter and annotator classes must reside in the <project name>/src-files/extractors/fileset/java directory. Otherwise, the JAR file does not install successfully when you install the Insight Pack on the server.

Note: The JAR file containing the IJavaSplitter and IJavaAnnotator interfaces, as well as other JAR files containing classes needed for compilation, must be located within the project under the <project name>/lib directory. These JAR files must be on the classpath in order for compilation to be successful. To resolve any workspace compilation errors within your Eclipse development environment, you can edit the properties for the Insight Pack project and add the JARs residing under <project name>/lib to the Java Build Path.

To run the build file externally:

a. Set the ANT_HOME variable:

set ANT_HOME=<your home location for ANT>

The recommended version is Apache ANT version 1.7.1.

b. Set the JAVA_HOME variable:

set JAVA_HOME=<your java SDK - home location>

Use the recommended version of the IBM Java SDK, version 1.7.0, which is the JRE installed with Log Analysis.

c. From the directory in which the build file exists (for example, <workspace>/<project name>), issue the command:

ant all

5. Using the Insight Pack editor within the tooling, create two file set definitions: one for the custom splitter and one for the custom annotator. To create a file set using the file set editor, do the following:

a. Click Add to define a new file set.

b. Enter a name for the file set (for example, Custom Splitter).

c. Select the type (Split or Annotate).

e. Select the file name (you should see the name of the JAR file containing your custom Java splitter and annotator)

f. Enter the class name corresponding to the type of file set that it is (split or annotate). Include the full package name (for example, com.mycompany.splitter.MySplitter).

Repeat the above procedure twice - once for defining the splitter file set and once for defining the annotator file set.

6. Using the editors provided within the tooling, create other artifacts that you wish to include within your Insight Pack (Source Types, Collections, index configuration, and so on).

7. When you are ready to test your custom Java splitter and annotator functions, you can build an installable Insight Pack from the tooling and then transfer the generated archive file to an IBM SmartCloud Analytics server and install it.

Using Python to create annotators and splitters

You can use Python technology to split and annotate incoming log records.

About this task

You create Python scripts that implement the IBM SmartCloud Analytics - Log Analysis interfaces used by the splitter and annotator functions. This method is an alternative to using Annotation Query Language (AQL) rules to create the log splitters and annotators.

Python interfaces for splitters and annotators

You can create log splitters and annotators using Python scripts with IBM SmartCloud Analytics - Log Analysis.

The implementation process for the Python-based splitters and annotators is:

1. Create Python scripts that implement specific interfaces. You create separate scripts - one for the splitter and one for the annotator. Create or copy the splitter and annotator scripts to the specific directory for an Insight Pack. When the Insight Pack is packaged and exported from the Log Analysis Tooling, it contains the implementation scripts.

2. Use the pkg_mgmt script utility to install the Insight Pack into the IBM SmartCloud Analytics server. During the installation, the implementation scripts are copied to the required location within the IBM SmartCloud Analytics server.

Note: The Input JSON and Output JSON formats described here for the splitters and annotators are the same for both the Java and Python implementations. That is, the logical JSON format is the same for both Java and Python. The formats are included here for completeness. The key difference between Java and Python is how the input and output JSON is passed in and returned. For Java, the JSON data is passed in and returned using objects. For Python, the JSON data is passed in and returned using input and output files for splitters, and stdin and stdout for annotators.

Splitter interface

Use Python to define your log splitter.

Input JSON

The input JSON is primarily a batch of raw log records that needs to be split into logical log records according to a particular criterion (for example, timestamp). The log records are passed to the script using an input file. The basic structure of the incoming JSON data is:

{
  "content": {
    "text": // raw text to be split
  },
  "metadata": {
  }
}

Output JSON

The splitter script must return output files in the required JSON format. Each JSON record represents either a complete logical log record or a partial log record (for cases where the splitter was unable to specifically determine that the record was complete) and meta-data to indicate whether the included record is complete or not.

The basic structure of the output files is:

Output JSON:

{
  "content": {
    "text": // text for this complete/partial log record
  },
  "metadata": {
    "type": // "A" = complete log record
            // "B" = partial log record at end
            // "C" = partial log record at beginning
  },
  "annotations": {
    "timestamp": // include the timestamp for the current record represented in this JSON structure
  }
}

Example splitter script

IBM SmartCloud Analytics - Log Analysis includes a sample splitter script here:

$UNITY_HOME/DataCollector/annotators/scripts/DB2PythonSplitter.py

The DB2PythonSplitter.py script splits the data within the db2diag.log. Develop the Python splitter script to process the input JSON and transfer the output JSON records to a file. You specify the file names when you invoke the splitter script. For example, the splitter script DB2PythonSplitter.py is invoked with the command:

python DB2PythonSplitter.py -i /opt/UnityContent/db2LogBatch.json -o /opt/UnityContent/db2LogBatchSplitOut.json

Where db2LogBatch.json is the name of the input JSON and db2LogBatchSplitOut.json is the name of the output JSON.
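As a rough illustration of the splitter contract described above, the following sketch shows what a minimal Python splitter might look like. It is not the shipped DB2PythonSplitter.py. The timestamp pattern, the function names split_batch and main, the reading of type C as text that precedes the first timestamp in a batch, the choice to mark the last record of a batch as type B, and the decision to write the output as a single JSON array are all assumptions made for this example. Only the -i and -o options and the content, metadata, and annotations layout come from the text above.

```python
import argparse
import json
import re

# Assumed timestamp format for illustration: "2013-11-01 12:30:45" at line start.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def split_batch(batch):
    """Split the raw text of one input batch into logical log records."""
    pieces = []
    current = None
    for line in batch["content"]["text"].splitlines():
        match = TIMESTAMP.match(line)
        if match:
            # A line that starts with a timestamp opens a new logical record.
            if current is not None:
                pieces.append(current)
            current = {"lines": [line], "timestamp": match.group(0), "type": "A"}
        elif current is None:
            # Text before the first timestamp: treated as a partial record (C).
            current = {"lines": [line], "timestamp": "", "type": "C"}
        else:
            current["lines"].append(line)
    if current is not None:
        if current["type"] == "A":
            # The last record might continue in the next batch (assumption).
            current["type"] = "B"
        pieces.append(current)

    records = []
    for piece in pieces:
        metadata = dict(batch.get("metadata", {}))
        metadata["type"] = piece["type"]
        records.append({
            "content": {"text": "\n".join(piece["lines"])},
            "metadata": metadata,
            "annotations": {"timestamp": piece["timestamp"]},
        })
    return records


def main(argv):
    """Read the batch named by -i and write the split records to the -o file."""
    parser = argparse.ArgumentParser(description="Sample log splitter sketch")
    parser.add_argument("-i", dest="input_path", required=True)
    parser.add_argument("-o", dest="output_path", required=True)
    args = parser.parse_args(argv)
    with open(args.input_path) as src:
        batch = json.load(src)
    with open(args.output_path, "w") as dst:
        json.dump(split_batch(batch), dst)
```

To run the sketch as a script in the same way as the command shown above, you would import sys and end the file with main(sys.argv[1:]). Keeping the splitting logic in split_batch, separate from the file handling, makes it possible to test the logic without creating files.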

Annotator interface

Use Python to define your log annotator using the Input and Output JSON records.

Input JSON

The input JSON includes a logical log record (formed by the splitter, or the raw record if no split was performed) that is ready for annotation. The log record is passed to the script using stdin. The script creates a JSON data structure that contains the annotations and writes it to stdout.

The basic structure of the incoming JSON structure is:

{
  "content": {
    "text": // logical record to be annotated
  },
  "metadata": {
    ...meta data fields, eg. hostname, logpath, other fields passed from client...
  }
}

Output JSON

The script implementing the annotator writes a single JSON data structure to stdout that contains the original data passed as input plus the annotated fields parsed from the incoming record. The following sample JSON structure depicts the format of the data that is expected to be written to stdout.

Output JSON:

{
  "content": {
    "text": // same text as passed in the input JSON structure
  },
  "metadata": {
  },
  "annotations": {
    ...annotation fields and their values produced by Python script implementation
  }
}

Example annotator script

IBM SmartCloud Analytics - Log Analysis includes a sample annotator script here:

$UNITY_HOME/DataCollector/annotators/scripts/DB2PythonAnnotator.py

The DB2PythonAnnotator.py script annotates the data within the db2diag.log.

Develop the Python annotator script to process the input JSON from stdin and transfer the output JSON records to stdout. For example, the annotator script DB2PythonAnnotator.py is invoked with the command:

python DB2PythonAnnotator.py
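The stdin and stdout contract described above could be sketched as follows. This is a minimal example and not the shipped DB2PythonAnnotator.py. The host name pattern, the annotation key hosts, the function names annotate and main, and the use of an annotations section for the output are assumptions chosen for illustration.

```python
import json
import re
import sys

# Assumed pattern for illustration: dotted names such as host2.ibm.com.
HOSTNAME = re.compile(r"\b[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+\b")


def annotate(record):
    """Return the input record plus an annotations section with host names."""
    text = record["content"]["text"]
    hosts = [
        {"name": m.group(0), "begin": m.start(), "end": m.end()}
        for m in HOSTNAME.finditer(text)
    ]
    return {
        "content": {"text": text},               # same text as in the input
        "metadata": record.get("metadata", {}),  # passed through unchanged
        "annotations": {"hosts": hosts},
    }


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON record from stdin and write the annotated JSON to stdout."""
    json.dump(annotate(json.load(stdin)), stdout)
```

A script built this way would end with a call to main() so that it can be invoked without arguments, as in the command above.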


Building splitters and annotators in Python

You can build custom splitter and annotator scripts with Python.

Before you begin

Before you begin, install the tools for extending IBM SmartCloud Analytics - Log Analysis.

About this task

To build an Insight Pack that contains custom splitters and annotators implemented in Python:

Procedure

1. Create a Log Analysis Insight Pack Project.

2. Create your Python script files that implement the splitter and annotator functions under the <project name>/src-files/extractors/fileset/script directory of your Insight Pack project. The files must be located in this directory or they will not install successfully.

3. Using the Insight Pack editor within the tooling, create two File set definitions: one for the custom splitter and one for the custom annotator. To create a File set definition, open the File sets tab in the Insight Pack Editor and complete the steps:

a. Click Add to define a new File set.

b. Enter a name for the File set (for example, Custom Splitter).

c. Select the type (Split or Annotate).

d. Select the file type (Script)

e. Select the file name. The scripts containing your custom Python splitter and annotator are listed in the drop-down list.

Repeat the above procedure twice - once for defining the splitter File Set and once for defining the annotator File Set.

4. Using the editors provided within the Log Analysis tooling, create other artifacts that you wish to include within your Insight Pack (Source types, Collections, Index configuration, and so on).

5. When you are ready to test your custom Python splitter and annotator functions, you can build an installable Insight Pack from the tooling and then transfer the generated archive file to an IBM SmartCloud Analytics server and install it.

Indexing configuration

To control how IBM SmartCloud Analytics - Log Analysis indexes records from a log file, you can create indexing settings for your content Insight Pack.

The indexing configuration settings specify the data type for each field that is indexed. The settings also specify a set of indexing attributes for each field. The index processing engine uses these attributes to define how a field is processed. One configuration is defined for each Source Type that is contained in an Insight Pack. For more information about Source Types, see the topic about Source Types in the IBM SmartCloud Analytics - Log Analysis Administration Guide.

The index configuration settings are defined in the JavaScript Object Notation (JSON) format. To edit the index configuration settings, use the Eclipse-based tooling that is provided with IBM SmartCloud Analytics - Log Analysis. For more information about how to edit the index configuration settings, see “Editing the index configuration” on page 43.

The indexing configuration specification consists of the following attributes:
• indexConfigMeta contains some basic metadata information about the indexing configuration itself. This information includes the following attributes:
– name specifies the name of the indexing configuration. For example, WAS SystemOut Config.
– description specifies the description of the indexing configuration. For example, WAS SystemOut indexing config.
– version specifies the version of the indexing configuration. For example, 1.0.
– lastModified specifies the last modified date. For example, 01/11/2013.
• Fields are used to define field descriptions for each record to be indexed. IBM SmartCloud Analytics - Log Analysis uses the following field descriptions to define the data for each field that is indexed:
– fieldname specifies the name of the field to be indexed.
– dataType specifies the data type of the field to be indexed. The value can be TEXT, LONG, DOUBLE, or DATE.
– indexing attributes are five attributes that contain binary values. IBM SmartCloud Analytics - Log Analysis uses the five attributes to indicate how the field is processed. The five attributes are:

- retrievable

- retrieveByDefault

- sortable

- filterable

- searchable

For more information about field configuration, see “Field configuration” on page 26.

IBM SmartCloud Analytics - Log Analysis also uses an attribute that is called Source during indexing. The Source attribute is structured as follows:

indexConfigMeta
timeZone
fields:
  <list of indexing attributes such as sortable, searchable.>
  "source": {
    "paths": [json_path1, json_path2, ...., json_pathN],
    "dateFormats": [date_format1, date_format2],
    "combine": "one of two possible values – ALL or FIRST"
  }

The Source attribute consists of three other attributes:

paths

The paths attribute contains an array of one or more JSON path expressions.

dateFormats

The dateFormats attribute is only relevant for fields that use the DATE type. It is used to specify format strings that determine how date values that are entered in this field are parsed.

Attention: The number of elements in the array must be the same for both the paths and dateFormats attributes.

combine

The combine attribute determines how the values that are returned by the paths and dateFormats attributes are used. The combine attribute has two possible values, ALL or FIRST. ALL is the default value.

If combine is set to ALL, all the non-null values from all the paths are added to the content of the corresponding field. This setting allows an index field to be populated from multiple attributes in the JSON record that you specify.

For example, consider a scenario where you want to index all the host names that are associated with each record into a single indexed field. The host names can be part of the structured metadata that belongs to an incoming log record or they can be extracted by analytics from a log message. For example, IBM SmartCloud Analytics - Log Analysis generates the following JSON structure after the annotation is complete:

{
  "logRecordID": "3344564533",
  "hostname": "host1.ibm.com",
  "message": "Server failed to ping host2.ibm.com and host3.ibm.com",
  "Annotations": {
    "hosts": [
      {"name": "host2.ibm.com", "begin": 22, "end": 35},
      {"name": "host3.ibm.com", "begin": 40, "end": 53}
    ]
  }
}

To ensure that the value for the field that is indexed includes both of the host names that are related to the annotated record, you use the following source attribute definition in the indexing configuration:

"source": {
  "paths": ["hostname", "Annotations.hosts.name"],
  "combine": "ALL"
}

If combine is set to FIRST, the JSON path expressions are evaluated individually in the order that they are listed in the array. The first path expression that returns a non-null and non-empty string value is used and the subsequent expressions are ignored. If the first path expression that returns a non-null and non-empty string value returns multiple values, IBM SmartCloud Analytics - Log Analysis uses all the values to populate the indexed fields.

For example, imagine that you want to index a field that stores the host names that are included in the log message. However, IBM SmartCloud Analytics - Log Analysis cannot extract the host name from some log records. In this case, you want to use the host name that is associated with the overall log record as a substitute. You use the following source attribute to do this:

"source": {
  "paths": ["Annotations.hosts.name", "hostname"],
  "combine": "FIRST"
}
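The difference between ALL and FIRST can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the product's implementation. The function names resolve_path and resolve_source are hypothetical, and the sketch assumes that a path is a simple dotted list of keys in which arrays are expanded element by element.

```python
# Sample record taken from the host name scenario above.
RECORD = {
    "logRecordID": "3344564533",
    "hostname": "host1.ibm.com",
    "message": "Server failed to ping host2.ibm.com and host3.ibm.com",
    "Annotations": {
        "hosts": [
            {"name": "host2.ibm.com", "begin": 22, "end": 35},
            {"name": "host3.ibm.com", "begin": 40, "end": 53},
        ]
    },
}


def resolve_path(record, path):
    """Return all non-null values reached by following a dotted path."""
    nodes = [record]
    for key in path.split("."):
        next_nodes = []
        for node in nodes:
            # Expand arrays so that hosts.name visits every host entry.
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and item.get(key) is not None:
                    next_nodes.append(item[key])
        nodes = next_nodes
    values = []
    for node in nodes:
        values.extend(node if isinstance(node, list) else [node])
    return values


def resolve_source(record, source):
    """Collect field values according to the combine setting (default ALL)."""
    combine = source.get("combine", "ALL")
    collected = []
    for path in source["paths"]:
        values = resolve_path(record, path)
        if combine == "FIRST":
            non_empty = [v for v in values if v != ""]
            if non_empty:
                return non_empty  # later paths are ignored
        else:
            collected.extend(values)
    return collected


all_hosts = resolve_source(
    RECORD, {"paths": ["hostname", "Annotations.hosts.name"], "combine": "ALL"})
first_hosts = resolve_source(
    RECORD, {"paths": ["Annotations.hosts.name", "hostname"], "combine": "FIRST"})
```

With the sample record, all_hosts contains all three host names, while first_hosts contains only the two names found by the first path. For a record without an Annotations section, the FIRST definition falls back to the hostname value.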

Example

The following abbreviated example shows the indexing configuration for the WebSphere Insight Pack:

{
  "indexConfigMeta": {
    "description": "Index Mapping Configuration for WAS SystemOut logs",
    "lastModified": "11/01/2013",
    "name": "WAS SystemOut Config",
    "version": "0.4"
  },
  "timeZone": "UTC",
  "fields": {
    "className": {
      "dataType": "TEXT",
      "filterable": true,
      "retrievable": true,
      "retrieveByDefault": true,
      "searchable": true,
      "sortable": false,
      "source": {
        "paths": [ "annotations.annotatorCommon_ClassnameOutput.span.text" ]
      },
      "tokenizer": "literal"
    },
    "timestamp": {
      "dataType": "DATE",
      "filterable": true,
      "retrievable": true,
      "retrieveByDefault": true,
      "searchable": true,
      "sortable": true,
      "source": {
        "combine": "FIRST",
        "dateFormats": [ "MM/dd/yy HH:mm:ss:SSS Z", "MM/dd/yy HH:mm:ss:SSS Z" ],
        "paths": [ "annotations.annotatorCommon_LogTimestamp.span.text", "metadata.timestamp" ]
      },
      "tokenizer": "literal"
    }
  }
}
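Before you package an indexing configuration, it can be useful to check the rules stated earlier for the source attribute. The following sketch is one possible way to do that outside the tooling. The function name check_sources and the trimmed sample configuration are hypothetical, and the checks cover only two rules from this section: that paths and dateFormats have the same number of elements, and that combine is ALL or FIRST.

```python
import json

# Trimmed configuration modeled on the timestamp field in the example above.
SAMPLE_CONFIG = json.loads("""
{
  "fields": {
    "timestamp": {
      "dataType": "DATE",
      "source": {
        "combine": "FIRST",
        "dateFormats": ["MM/dd/yy HH:mm:ss:SSS Z", "MM/dd/yy HH:mm:ss:SSS Z"],
        "paths": ["annotations.annotatorCommon_LogTimestamp.span.text",
                  "metadata.timestamp"]
      }
    }
  }
}
""")


def check_sources(index_config):
    """Return a list of problems found in the source attribute of each field."""
    problems = []
    for name, field in index_config.get("fields", {}).items():
        source = field.get("source", {})
        paths = source.get("paths", [])
        formats = source.get("dateFormats")
        # paths and dateFormats must have the same number of elements.
        if formats is not None and len(formats) != len(paths):
            problems.append("%s: %d paths but %d dateFormats"
                            % (name, len(paths), len(formats)))
        # combine accepts only ALL or FIRST; ALL is the default.
        if source.get("combine", "ALL") not in ("ALL", "FIRST"):
            problems.append("%s: combine must be ALL or FIRST" % name)
    return problems


print(check_sources(SAMPLE_CONFIG))  # prints []
```

An empty list means that no problems were found by these two checks. It does not mean that the configuration is valid in every other respect.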

Field configuration

IBM SmartCloud Analytics - Log Analysis uses the attributes that are listed in the table to configure individual fields during indexing.

The indexing configuration is a file in the JavaScript Object Notation (JSON) format. The attributes are set up as the key-value pairs in the indexing configuration file and the resulting record is mapped to the appropriate field name. The JSON record key for each attribute is listed in the first column. The possible values that are associated with this key and default values that are used when the key is missing are shown in the second and third columns. The symbols true and false refer to the corresponding JSON Boolean values. All other values, unless otherwise specified, are JSON strings.

Table 2. Field configuration

Attribute key       Possible value                 Default   Description
dataType            TEXT, LONG, DOUBLE, and DATE   TEXT      Specifies the type of data that is stored in this field.
retrievable         true or false                  false     Determines whether the contents of this field are stored for retrieval. When set to false, the content is not stored in the index. When set to true, the content is stored and available for retrieval. The retrieveByDefault value controls how and when the content of this field is included in search results.
retrieveByDefault   true or false                  false     When set to true, the contents of the field are always returned as part of any search response. When set to false, the field is not part of the default response. However, when required, the content of the field can be explicitly requested using the appropriate parameters that are supported by the search run time. The retrievable flag must be set to true for this attribute to work.
sortable            true or false                  false     Enables or disables the field for sorting and range queries.
filterable          true or false                  false     Enables or disables facet counting and filtering on this field.
searchable          true or false                  true      Controls whether the field is enabled for searching and matching against it.
enableWildcard      true or false                  false     Controls whether the field is enabled for wildcard matching.
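The defaults in Table 2 can be summarized in code. The following sketch shows how the effective settings for a field could be computed when some keys are missing from the configuration. It is an illustration only: the names FIELD_DEFAULTS and effective_field_config are hypothetical, and raising an error when retrieveByDefault is set without retrievable is a choice made for this example, not documented product behavior.

```python
# Default values as listed in Table 2.
FIELD_DEFAULTS = {
    "dataType": "TEXT",
    "retrievable": False,
    "retrieveByDefault": False,
    "sortable": False,
    "filterable": False,
    "searchable": True,
    "enableWildcard": False,
}


def effective_field_config(field):
    """Return the field settings with the Table 2 defaults filled in."""
    config = dict(FIELD_DEFAULTS)
    config.update(field)
    # retrieveByDefault only works when retrievable is also true.
    if config["retrieveByDefault"] and not config["retrievable"]:
        raise ValueError("retrieveByDefault requires retrievable to be true")
    return config


timestamp_field = effective_field_config({"dataType": "DATE", "sortable": True})
```

In this example, timestamp_field is searchable but not retrievable, because those keys were not specified and the defaults apply.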


Data type configuration

You can include custom data type definitions in your custom Insight Pack. You can create data type configurations for each of the following entities:

Collections

You use a Collection to group log data from different log sources that have the same Source Type. The Collection definition depends on the Source Type definition that specifies how the IBM SmartCloud Analytics - Log Analysis Server splits, annotates, and indexes the incoming data records. You must define values for the following properties in the Collection definition:

Name
Specify a unique name that is used to identify the Collection.

Source Type
Specify the name of the Source Type that is associated with the log records in the Collection.

Source Types

A Source Type defines how data of a particular type is split, annotated, and indexed by IBM SmartCloud Analytics - Log Analysis.

The Source Type specifies the Rule Sets and, if you want to implement custom processing, the File Sets that the IBM SmartCloud Analytics - Log Analysis Server uses to split and annotate the log records for the particular log Source Type. The Source Type specifies the index configuration settings that the IBM SmartCloud Analytics - Log Analysis uses to index the log records for the particular log Source Type.

You must define values for the following properties in the Source Type definition:

Name Specify a unique name that is used to identify the Source Type. Enable splitter

Select this flag to enable the splitter function that splits the log records during processing.

Splitter Rule Set name

Specify the name of the Annotation Query Language (AQL) rule set that governs how log records are split.

Splitter File Set name

Specify the name of a file that you created that contains custom splitter logic that you defined, for example a Java or Python script, that governs how log records are split. This is an alternative to the Rule Sets.

Enable annotator

Select this flag to enable the annotator function that annotates the log records during processing.

Annotator Rule Set name

Specify the name of the AQL rule set that governs how log records are annotated.

Annotator File Set name

Specify the name of a file that you created that contains custom annotator logic that you defined, for example a Java Archive (JAR) or Python script, that governs how log records are annotated. This is an alternative to the Rule Sets.

Deliver data on annotator execution failure

Set this indicator to enable indexing even when the annotation fails. By default, indexing is stopped if the annotation fails.

Index configuration

Specify the name of the index configuration JSON file that you use in your custom Insight Pack.
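
The relationship between a Collection and its Source Type, and between the enable flags and the Rule Set or File Set names, can be sketched as a pair of simple records with a few checks. This is a minimal illustration under stated assumptions, not the format that the product stores or expects: the class, field, and function names are hypothetical, and the rule that an enabled splitter or annotator must name either a Rule Set or a File Set is an assumption drawn from the property descriptions above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceType:
    """Hypothetical record mirroring the Source Type properties above."""
    name: str
    index_configuration: str  # name of the index configuration JSON file
    enable_splitter: bool
    enable_annotator: bool
    splitter_rule_set: Optional[str] = None
    splitter_file_set: Optional[str] = None   # alternative to the Rule Set
    annotator_rule_set: Optional[str] = None
    annotator_file_set: Optional[str] = None  # alternative to the Rule Set
    # From the text: by default, indexing stops if the annotation fails.
    deliver_on_annotator_failure: bool = False

    def problems(self) -> List[str]:
        found = []
        if not self.name:
            found.append("Source Type needs a name")
        if not self.index_configuration:
            found.append("Source Type needs an index configuration")
        stages = (
            ("splitter", self.enable_splitter,
             self.splitter_rule_set, self.splitter_file_set),
            ("annotator", self.enable_annotator,
             self.annotator_rule_set, self.annotator_file_set),
        )
        for stage, enabled, rule_set, file_set in stages:
            # Assumption: an enabled stage names a Rule Set or a File Set.
            if enabled and not (rule_set or file_set):
                found.append(f"{stage} is enabled but names no Rule Set or File Set")
        return found


@dataclass
class Collection:
    """Hypothetical record mirroring the Collection properties above."""
    name: str
    source_type: str  # name of the associated Source Type


def check_collection(collection: Collection, source_types: List[SourceType]) -> List[str]:
    """Report a Collection whose Source Type name is not defined."""
    known = {st.name for st in source_types}
    if collection.source_type in known:
        return []
    return [f"Collection {collection.name!r} refers to unknown Source Type "
            f"{collection.source_type!r}"]


# All names below are made up for illustration.
st = SourceType(name="MyAppLog", index_configuration="myapp_index.json",
                enable_splitter=True, enable_annotator=True,
                splitter_rule_set="myapp-split")
print(st.problems())  # the annotator names neither a Rule Set nor a File Set
print(check_collection(Collection("MyAppLogs", "MyAppLog"), [st]))
```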

Rule Sets

A Rule Set is a collection of files that contain rules that are written in the Annotation Query Language (AQL). IBM SmartCloud Analytics - Log Analysis uses the AQL rules to split logical log records according to a known boundary or to extract the data from fields in log records that contain structured or semi-structured data.

You must define the following properties in the Rule Set definition:

Name

Specify a unique name that is used to identify the Rule Set.

Type

Specify whether you want the Rule Set to split or annotate the log records.

Rule file directory

Specify the paths for the directories that contain the AQL rule files that the Rule Set uses. The paths must be relative to the src-files directory path that is defined in your custom Insight Pack. For example, extractors/ruleset/common;extractors/ruleset/splitterSystemOut.
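
As the example value suggests, multiple rule file directories appear to be separated by semicolons. The following sketch splits such a value and resolves each entry against the src-files directory. It is an illustration only: the function name `resolve_rule_dirs` and the `MyPack/src-files` location are hypothetical, and rejecting `..` segments is an assumption about what "relative to the src-files directory" should allow.

```python
from pathlib import PurePosixPath


def resolve_rule_dirs(value, src_files_root):
    """Split a semicolon-separated directory list and resolve each entry
    against the src-files directory (hypothetical helper)."""
    resolved = []
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # ignore empty entries such as a trailing semicolon
        path = PurePosixPath(part)
        # Assumption: entries must stay inside the src-files directory.
        if path.is_absolute() or ".." in path.parts:
            raise ValueError(f"{part!r} must be relative to the src-files directory")
        resolved.append(str(PurePosixPath(src_files_root) / path))
    return resolved


dirs = resolve_rule_dirs(
    "extractors/ruleset/common;extractors/ruleset/splitterSystemOut",
    "MyPack/src-files",  # hypothetical Insight Pack location
)
for d in dirs:
    print(d)
```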

File Sets

A File Set is a collection of files that contain the custom logic that you defined to split or annotate log data. You can use either Java or Python to create the custom logic. You must define the following properties in the File Set definition:

Name

Specify a unique name that is used to identify the File Set.

Type

Specify whether the File Set is used to split or annotate data.

File type

Specify whether the file is Java or script.

File name

Specify the name of the file that contains the custom logic that you defined. For example, if you use Java, this file is a Java Archive (JAR) file.

Class name

If you use Java, specify the name of the main Java class.

Note:

Data sources, such as log source definitions, are not defined as part of a custom Insight Pack because data sources require specific information, such as host name, log path, and service topology information that is dependent on the server and environment. This information varies depending on where IBM SmartCloud Analytics - Log Analysis is installed. As a result, when you define a custom Insight Pack, you only need to define data types such as Collections, Source Types, and Rule and File Sets.
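
The File Set properties described earlier can be checked with a short routine before you package an Insight Pack. This is a sketch under assumptions, not product behavior: the function name is hypothetical, and the literal values `split`, `annotate`, `Java`, and `script` are placeholders for whatever values the tooling actually accepts.

```python
def check_file_set(name, set_type, file_type, file_name, class_name=None):
    """Return a list of problems; an empty list means none were found.

    Hypothetical helper. The accepted literals are assumptions.
    """
    problems = []
    if not name:
        problems.append("Name is required")
    if set_type not in ("split", "annotate"):
        problems.append("Type must be 'split' or 'annotate'")
    if file_type not in ("Java", "script"):
        problems.append("File type must be 'Java' or 'script'")
    if not file_name:
        problems.append("File name is required")
    if file_type == "Java":
        # From the text: with Java, the file is a JAR and a class name is given.
        if file_name and not file_name.lower().endswith(".jar"):
            problems.append("a Java File Set is expected to name a JAR file")
        if not class_name:
            problems.append("Class name is required when the file type is Java")
    return problems


# Hypothetical File Set definitions.
print(check_file_set("MySplitter", "split", "script", "splitter.py"))
print(check_file_set("MyAnnotator", "annotate", "Java", "annotator.jar"))
```

The second call reports the missing class name, because a Java File Set identifies its main class.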

After you install your custom Insight Pack, you must
