Semantic Logging

We define semantic logging as data or events that are written to log files explicitly for the purpose of gathering analytics. As mentioned earlier, most existing code already has logging statements. With semantic logging, we formalize the process by adding or modifying logging instructions throughout the code. In general, this is very easy to do; consider, for example, the following high-level snippet of pseudocode, which is called every time a commercial transaction is processed:

void submitPurchase(transactionID) {
    log.info("action=submitPurchaseStart, transactionID=%d, productId=%d, listPrice=%d\n",
             transactionID, productId, listPrice);

    // Each of the following calls throws an exception on error, which is also logged
    submitCreditCard(...);
    generateInvoice(...);
    fulfillOrder(...);

    log.info("action=submitPurchaseComplete, transactionID=%d\n", transactionID);
}
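For illustration, and assuming a typical log format with a leading timestamp, a successful purchase would leave a pair of events along these lines in the log file (the timestamp layout and the values shown here are made up for the example):

2011-08-29 09:15:02 action=submitPurchaseStart, transactionID=183947, productId=17, listPrice=59
2011-08-29 09:15:02 action=submitPurchaseComplete, transactionID=183947

Because the information is written as key=value pairs, Splunk can extract fields such as action, transactionID, productId, and listPrice automatically at search time.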

The logging statements in the previous pseudocode, the two calls to log.info, invoke a method that logs the corresponding information and also includes a timestamp. By adding just those two logging statements, we suddenly have enough information to answer the following questions:

•	What is the hourly, daily, or monthly purchase volume?

•	How long are the purchases taking during different times of the day and different days of the week?

•	Are the purchases taking any longer than they did last month?

•	Are the underlying systems getting slower over time, or are they stable?

•	How many purchases are failing? Graph these failures over time.

•	Which specific purchases are failing?
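For example, once the log file has been indexed (into the c7 index that we create later in this section), the first question can be answered with a search as simple as the following sketch, which counts completed purchases by hour:

index=c7 action=submitPurchaseComplete
| timechart count span=1h

Changing span to 1d or 1mon produces the daily and monthly volumes, respectively.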

A couple of years ago, we generated some fake data that would have been created by the code sample presented earlier. The file contains 5 weeks' worth of transactions, from Monday, August 29, 2011, until Sunday, October 2, 2011, with almost 4 million events. You can find this file in the download package of the book under the name c7sampledata.log.gz.

Before we load the data into Splunk, we create a new index called c7 following the steps we presented in Chapter 2. It is in this index that we will load the sample data following these steps:

•	From any screen in the Splunk user interface, click on the Manager link, which is located in the top right corner.

•	Select the "Data inputs" option on the Manager screen.

•	To the right of the "Files & directories" option, under the "Actions" column, click on "Add new".

•	Select the c7sampledata.log.gz file by clicking on the "Browse server" box under the "Preview data before indexing" option, and click the "Continue" button on the bottom right corner.

•	As Splunk is not familiar with this data source, it presents a pop-up window giving you the option to start a new source type or use an existing one. Choose to start a new source type.

•	At this point, Splunk presents a preview of the data, where it highlights the timestamp within each event.

In Figure 7-3, you can see that the timestamp assigned by Splunk is different from the timestamp that is highlighted in the actual event. The difference arises because the event timestamp is based on the Eastern Daylight Time zone, whereas the Splunk user profile is set to the Pacific time zone, so Splunk presents the timestamp accordingly.

Caution  Even though Splunk stores the correct timestamp, it displays it according to the time zone defined in the user profile. This also affects the time periods selected in searches.
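Note also that if the raw events did not carry an explicit time zone indicator, the time zone used to interpret their timestamps at indexing time could be pinned down in props.conf; the following is only a sketch, and it assumes the c7example source type that we define later in this section:

[c7example]
TZ = US/Eastern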

Note  As an aside, to see or change the time zone of the user profile you are working with, you can go to the Manager and select the "Your account" option. You can see in Figure 7-4 that the screen presented allows managing a number of items in the user profile, including the time zone.

Back to loading the sample data. Now that you understand the reason for the difference in the timestamps, and all the rest of the data looks correct, we click on the "Continue" link above the data preview. This action presents a pop-up window asking you to name the new source type. We choose to call it c7example and click on the "Save source type" button on the bottom right corner. You will get another pop-up window stating that c7example was successfully created, and then you click on the "Create input" button.

You are now presented with the screen shown in Figure 7-5, where you select “Index a file once from this Splunk server”. Not visible in Figure 7-5, but under the “More settings” section, select the index where Splunk will store this data from the corresponding pull-down menu, which for this example is c7.
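As an aside, the same one-time load can also be done from the command line; the following sketch assumes a local Splunk installation and the c7example source type defined earlier:

$SPLUNK_HOME/bin/splunk add oneshot c7sampledata.log.gz -index c7 -sourcetype c7example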

Once you save the new file definition, Splunk goes ahead and starts loading it. For this example, it takes a couple of minutes to upload. To know when loading has completed, you can go to the Search window in the user interface and select "Index activity" from the Status pull-down menu, and then choose "Index activity overview". This will show the information presented in Figure 7-6, where you can see that the c7 index has 3,815,666 events, which coincides with the number of events in the original file.

The next step is to verify that the data was loaded correctly. Before we look at the search, you must remember to select the correct time range for the data, as it is from the year 2011. To do this, select "Custom time" from the pull-down menu of the time picker. This presents the pop-up window shown in Figure 7-7. In it, you can see that we chose the earliest date to start and October 3 as the latest time. Even though the sample data finishes a few seconds before midnight of October 2, by choosing October 3 you don't have to type 23:59:59.999 in the latest time field. If you don't adjust the time range, the results of the search will be empty.

Now you can type the following search to verify the data: index=c7 | head. This displays the 10 most recent events, which you use to verify that the fields in the events are recognized and show up in the left side bar. You also look for the values of the default fields created by Splunk for each event: these are the host, the source type, and the source file. In Figure 7-8, you can see that the action, transactionID, productId and listPrice fields are present in the left side bar, and that the default fields of each event show the correct information.
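A quick way to double-check the total volume from the search bar is to count all the events in the index, which should match the 3,815,666 events reported in the index activity overview:

index=c7
| stats count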

Figure 7-7. Setting a custom time range

Figure 7-8. Verifying the loaded data

As the verification is complete, you can write the searches that will answer the questions we posed earlier. However, simply calculating the metrics by using the submitPurchaseStart event is not correct, because if any of the functions before the submitPurchaseComplete event fails, the metrics will be flawed. Given that a complete transaction is made up of both events, you will have to use the Splunk transaction command, which we already reviewed in Chapter 3, to correctly calculate the desired metrics.

The transaction command is pretty expensive because it keeps track of all of the ends of a transaction until it can match them to a beginning, assuming that they are paired, which might not always happen. Depending on the number of events and the time between the first and last, this can consume a lot of memory. The tracking is done in reverse order because Splunk sees events in reverse time order, going from the most recent to the oldest.
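As an aside, when the events of a transaction share a single identifier such as transactionID, much of the same information can usually be computed more cheaply with the stats command, which does not need to hold complete transactions in memory. This is only a sketch and not the approach followed in this chapter:

index=c7
| stats range(_time) as duration, count as eventcount, values(action) as actions by transactionID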

We start by finding out the maximum time between the beginning and end of the transactions in our sample data.
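One way to do this, sketched here with a deliberately generous one-minute maxspan and taking the maximum of the duration field produced by the transaction command, is:

index=c7
| transaction transactionID maxspan=1m | stats max(duration)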

In this search, you specify that the transaction is delimited by the field transactionID; that is, only events with the same value in that field will be grouped. You also specify that the maximum time between the first and last events in a transaction cannot be greater than one minute. We do this because otherwise some transactions would not be counted, as they would be evicted to make space for others. By specifying a relatively large time span, we can be sure that all the transactions are included in the count.

As you saw in Chapter 6, the transaction command produces two fields, duration and eventcount. The result of this search is 0.6 seconds, so from now on you can specify that the maximum amount of time for a transaction is 1 second to make the search more efficient. Next, you will find out which transactions are incomplete, that is, those where the purchase failed. One search to obtain this information is:

index=c7

| transaction transactionID maxspan=1s | search NOT *submitPurchaseComplete

This search is rather interesting, because we use the search command twice. The first time it is used to bring all the events from the c7 index. Then we use it again to find the events created by the transaction command that do not contain a string matching the wildcard pattern *submitPurchaseComplete. Note that the first line does not include the word "search", as it is always implicit that the first command is a search; the second time, the command has to be stated explicitly as search. It is worth noting that the results of this search would also include any transactions that take longer than 1 second, which we know is not the case here. For the sample data, this search produces four transactions, which are shown in Figure 7-9. This answers the question regarding failed purchases, and specifically which of them are failing.
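The related question of how these failures trend over time can be answered by appending a timechart to the same search; a sketch that counts the incomplete transactions per day is:

index=c7
| transaction transactionID maxspan=1s | search NOT *submitPurchaseComplete
| timechart count span=1d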

Figure 7-9. Incomplete transactions

You can now find out the average transaction times, which should help answer a few of the questions on our list. We start with how long the purchases are taking. In order to get good resolution on the column chart that we use to present the results, we chose to calculate the hourly average of the transaction duration over a period of one week; any longer period of time would produce an illegible chart. Remember to change the time period for this search.

index=c7

| transaction transactionID maxspan=1s | timechart avg(duration) span=1h

The results of this search can be found in Figure 7-10, which presents a column chart with a minimum value of 0.3 for the y-axis. We did this to get better resolution on the chart. As we hover over one of the columns, Splunk presents detailed information about that particular column. From this, we can deduce that the longest transaction times occur between 4:00 p.m. and 7:00 p.m., pretty much every day.

To find out if the underlying systems are getting slower, or if the purchases are taking any longer than they did the previous month, you can run a search similar to the previous one, but this time calculating the average on a daily basis; thus we change the span argument of the timechart command to span=1d. This will reflect any issues over a longer time span much better, and we use a line chart, which will clearly show any trend there might be. After changing the time range in the time picker to include all 5 weeks' worth of data, the results can be seen in Figure 7-11.
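Spelled out, the daily version of the search reads:

index=c7
| transaction transactionID maxspan=1s | timechart avg(duration) span=1d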

As can be seen in Figure 7-11, except for the skew present on the first day of the data, the transaction times seem pretty stable over the whole period covered by the sample data. If there had been any issues with the underlying systems, they would have shown up as an increasing trend over time on this line. Very likely, the number of transactions is about the same over the sample period. You can verify this, and partially answer the first question on the list, by creating a graph that shows the daily purchases of all the products with the following search. The results, in the form of a stacked column chart, can be seen in Figure 7-12. In it you can see that the product sales seem to be very stable over time, which is probably the reason why we don't see any degradation of the transaction time.

Figure 7-10. Hourly average transaction times

index=c7

| transaction transactionID maxspan=1s | timechart count by productId span=1day

The skew in the daily average times observed in Figure 7-11 can be explained by reviewing the number of sales for Sunday, August 28, 2011, the first day of the data in Figure 7-12. As there were so few sales that day, the transaction time was on average 0.1 seconds shorter than on the other days.
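If you want to check this correlation directly, a single search can overlay the daily purchase count and the daily average duration; a sketch is:

index=c7
| transaction transactionID maxspan=1s | timechart count, avg(duration) span=1d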

Admittedly, this is a simplistic example, but it clearly illustrates how easy it is to produce advanced analytics from log files, without having to mess around with data schemas or write special code to insert data into a data warehouse or database.
