getting the most From multiple proCessors

Bso wIll never dIe and here Is why 4

4.5 getting the most From multiple proCessors

using parallelism can have huge benefits, but the relationship between CALCPArALLEL and CALCtASkDImS is not always obvious. The first thing that needs to be under-stood is that you cannot have one without the other. There can be no parallelism with-out a CALCtASkDImS setting. In order for Essbase to utilize multiple processors, the outline must be evaluated to determine which sections of the outline can be processed in parallel. The output of the evaluation is called the task list. The CALCtASkDImS statement tells Essbase how many dimensions to consider when developing the task list.

The key concept here is how BSo determines the dimensions to consider when building the task list. The largest aggregating dimensions offer the best opportunities for paral-lelism. reviewing the statement CALCtASkDImS x, Essbase uses the last x dimen-sions in the outline to determine the task list. If a database having two large aggregating dimensions uses the hourglass organization, then CALCtASkDImS 2 is a good start-ing point for teststart-ing. on the other hand, if the hourglass on a stick format is used and the last two dimensions are nonaggregating, CALCtASkDImS must be increased until the largest dimensions are included in the scope. In the example, testing should start at three or four to ensure that the large dimensions are evaluated.

4.6 CaChe is king; know the BasiCs

Caches in BSo are a powerful yet poorly understood subject. Let me relay a story that sums up the state of cache tuning. A few years ago, I visited a customer as a member of a team that was tasked with addressing a BSo calculation issue. The customer had a cal-culation script or, more accurately, a Planning business rule that was taking three min-utes to complete. This was their primary rule that would get executed extensively during the forecasting period. The rule needed to complete in no more than 10 seconds. my associate, whom I had never met, was a self described Essbase tuning expert. I will call him Bill for the sake of the story. Bill immediately viewed the contents of the Essbase.

cfg file. With much confidence Bill declared that he had found the problem. There were missing entries in the configuration file. The customer was thrilled and I could not wait to see how he was going to reduce the calculation time from three minutes to a hand full of seconds. Just when I thought that it could not get stranger, Bill opened the configu-ration file in the editor. Striking a pose reminiscent of Joey tribbiani’s (tv’s Friends) smell-the-fart act, Bill added the following to the file:

CALCLOCKBLOCKHIGH 2000

In the calculation script header, Bill inserted the following:

Bill did not ask anything about the server memory or processors, the number of concur-rent users, or what the calculation was supposed to do. For Bill, testing was not a priority.

In any event, the customer just stared. They were mesmerized. you would have thought Bill was rain man. Did it decrease the calculation time? of course not. Sadly, this story defines the state of BSo tuning today. Somehow the bigger-is-better philosophy became the rule. The simple truth is that there is no one size fits all when it comes to BSo tuning.

to illustrate my point, let us start the tuning discussion by describing the caches that can make a difference. Consider this a quickie discussion for setting caches for those who do not have three weeks to dedicate to benchmarking. The object here is to make the system as fast as possible in a reasonable time frame, rather than make it worse.

keep in mind that improving calculation time may slow user reporting and vice versa;

the optimum configuration is likely to be a compromise. Finally, when the operating system is 32 bit, there is a hard limit of 4 gb of memory. Exceed this limit and bad things will happen. Whether the server is 64 or 32 bit, never allocate more resources than are available on the server.

most often, caches are tuned in a vacuum rather than real world conditions. unfortu-nately, the data, data file, index, and operating system caches are shared by everyone using the database. Each request competes for resources. testing processes in isolation might pinpoint best case performance, but probably not real-world results. For simplicity, I suggest that initial tests be done individually for expediency.

For final testing, it is important to determine that the test simulates the production environment complete with activity. Are calculations scripts and reports both to be tested? Is the purpose the test to determine best, average, or worse case performance?

how many concurrent calculations? Are the calculations batch or interactive? how many concurrent calculations and reports will be tested? If there is no stress testing software available, execute multiple maxL processes from a command file.

4.6.1 Index Cache

This is the easiest cache to set particularly in 64 bit. Place the entire index in memory. For those in 32 bit or, in a tight memory situation where the index is really large, test with 50 or 75% of the index in memory. For testing, aggregate the sparse dimensions increasing the index cache size with each test. Start the testing with the upper blocks calculated or not, but be consistent. If the upper blocks are cleared prior to the test, I have seen situations where a dense restructure will shrink the PAg file and speed the aggregation. Stopping and starting the database will flush the caches. under no circumstances set the cache larger than the index because Essbase will allocate everything that is specified. Figure 4.5 captures the physical index size for Sample.Basic is 8,216,576 bytes as shown in EAS.

Figure 4.6 shows that the index cache setting is 1,024,000 and the current value is also 1,024,000. As an experiment, I increased the index cache from 1024 kb to double the physical index size of 16,433 kb. Be certain to stop/start the database after changing the cache settings and then execute a calculation. Figure 4.7 shows the results.

notice that the current index cache value is now twice as large as the physical index size. BSo will claim whatever is specified up to the limit of 4,294,967,295 bytes.

If mEmSCALIng is specified in the configuration file, the limit is extended for 64 bit.

Periodically review the index cache size and update the cache accordingly.

4.6.2 Data Cache

Before discussing the data cache, it is important to understand compression and how it relates to the cache. Physical data blocks are stored in the ESS*. PAg file(s) in com-pressed form assuming that some lunatic has not turned off compression. There are three types of compression in BSo: bitmap, rLE, and zLib. zLib is primarily a zipped format that is slow relative to the other two. zLib is used when disk space is an issue.

Bitmap, as the name implies, places a bitmap in the header of each block. The bitmap is used to tell BSo which cells have data and which have missing values. As the block size increases, so does the bitmap size increasing the block header overhead. rLE stands

Figure 4.5 Sample.Basic index size. (From Oracle Essbase Administration Services. With permission.)

Figure 4.6 Sample.Basic cache statistics. (From Oracle Essbase Administration Services. With permission.)

for run length encoded. The header for rLE is fixed at approximately 72 bytes. rLE compression type stores the value and then a count indicating the consecutive blocks that have the value. rLE works best when there are many missing values or values that repeat in the data block. According to the documentation, if rLE is specified, Essbase will analyze the data and choose the appropriate compression. I tried very hard to prove the documentation correct or otherwise, but I did not see this behavior. There is no sub-stitute for testing. In general, larger blocks will probably produce smaller PAg files using rLE than Bitmap and vice versa.

now back to cache settings. As BSo writes a data block, the uncompressed block (stored cells × 8 bytes) is compressed and moved to the operating system cache. The block is then written to the PAg file by the operating system. If using direct I/o, the data is written to the PAg file directly from the Essbase data file cache bypassing the operating system cache altogether. The data and data file caches give BSo a place to save blocks in case it needs them again, saving a fetch from the relatively slow PAg disk file. The Essbase kernel will continue writing blocks to the cache until it is full. once the cache is full, older blocks are replaced by more recently retrieved data blocks. The object of sizing the caches is to increase the likelihood that the data block will be in the cache.

The BSo data cache default size is 3072 kb. Dividing 3,072,000 by the block size gives the number of blocks that fit in the data cache. In most cases, the default is too low. to size the cache, double it for test, stopping and starting the database after the change.

Then run a full aggregate and note the time. When aggregations times stop improving or improve only slightly, the cache is too large. The correct value is somewhere between the data cache setting of the last two tests. Perform several tests adjusting the cache between the two values until the optimum value is found.

4.6.3 Data File Cache

This cache is only used with Direct I/o. As stated earlier, for Direct I/o, the data cache and the operating system cache are not used. Direct I/o is better suited to 64-bit operat-ing systems because they are able to address larger amounts of memory. The data blocks stored in the data file cache are compressed. Let me say that again. The key concept to understand about Direct I/o is that the data file cache blocks are the same size as the

Figure 4.7 New Sample.Basic cache statistics. (From Oracle Essbase Administration Services. With permission.)

compressed blocks in the PAg file. It is feasible to place the entire database into mem-ory by setting the data file cache to the PAg size. Should you choose this route, oracle recommends:

1. Direct I/o should be set at the server level.

2. The Essbase server should be on a dedicated box and not be in a shared resources environment.

3. memory must be carefully monitored.

4. restart the server regularly. Do not exceed server memory.

4.6.4 Calculator Cache

Calculator cache is used to track the children of parent members during calculations.

If the database is small enough and multiple bitmaps are possible, the cache also tracks the parents. In short, the cache is used most commonly during aggregations. The DBAg has an excellent write up that covers calculator cache, so I am not going to repeat it here.

What the documentation does not say is that calculation cache operates differently than index and data caches. While the index and data caches are shared by all of the users of a given database, calculator cache is not shared. In fact each calculation creates its own calculator cache for each processor thread. Please go back and read that last sentence again. It means if CALCCAChE = 199m and a single calculation script logs the message

“Calculating in parallel with [5] threads,” then the Essbase kernel could use as much as 1 gb of memory for the calculator cache alone. Considering that each script has its own calculator cache, memory can be an issue during periods of high concurrency.

hence, if 199m is not the right answer, what is? That is easy, we test. For starters, when the database is small, use the guidelines in the documentation and try to achieve mul-tiple bitmaps. An hourglass organization may be required for mulmul-tiple bitmaps. When the database is large, as mine always seem to be, only single bitmap is available. In this situation, oracle offers one suggestion: Do not set the calculator cache higher than 50 mb. This is because the calculator cache has no index. The search method is sequential.

The thinking is that at some point it takes longer to search than to retrieve. Somewhere there must be a database that needs 199m calculator cache, but I have never seen one.

optimize calculator cache after the index and data caches have been tuned. Set the calculator cache in the Essbase.cfg file to 20m, 50m, and 100m for Low, Default, and high, respectively. recycle the Essbase server to reset the values. Clear the upper blocks of the database and then run a script that aggregates the sparse dimensions first with CALCCAChE set to Low. repeat the test for 50m and 100m. If there is not a measure-able improvement of say 5 to 10%, then stay with the lower value. only set the calcula-tor cache to 199m if there is a provable performance benefit. In general, scripts that aggregate large dimensions require larger calculator caches. Scripts that utilize multiple processors have smaller calculator cache needs because each thread has its own cache.

4.6.5 Dynamic Calc Cache

This cache is used for reporting. The formula is pretty straight forward, so do the math, but do not exceed your operating system resources. DynCALCCAChEmAxSIzE is the name of the Essbase.cfg file setting and the default value is 20 mb. The calculation using the variables in table 4.1 would be:

Dynamic Calc Cache in bytes = C * S * U

The lock block value is displayed in the log if the calc script contains “Set msg Summary” or “Set msg Detail.” The message looks as follows:

Maximum Number of Lock Blocks: [5000] Blocks

to calculate the expanded block, the total members in the dense dimension are used instead of stored members, as shown in Figure 4.8. using Sample.Basic as an example, the expanded block size is 19 * 17 = 323 cells or 2584 bytes (8 bytes per cell).

In this pretend world, 15 simultaneous users are expected; the Dynamic Calc Cache should be set to 5000 * 2584 * 15 or 193,800,000 bytes. A Dynamic Calc Cache of this size can hold 75,000 blocks.

going forward, the dynamic calc cache can be monitored using the application log.

As reports are executed, the kernel writes informational messages showing the number of blocks inserted into the cache.

Extractor Big Blocks Allocs -- Dyn.Calc.Cache:[9504] non-Dyn.Calc.

Cache:[0]

The first half of the message indicates that this report needed 9504 blocks and they all fit in the cache. If the non-Dyn.Calc.Cache value is consistently greater that zero, consider raising the size using the DynCALCCAChEmAxSIzE in the Essbase.cfg.

Figure 4.8 Members in dense dimensions. (From Oracle Essbase Administration Services. With permission.) Table 4.1 Calc Cache Variables

Variable Description

C Lock block value

S Expanded block size

u Estimated number of concurrent users

In document Developing Essbase Applications Advanced Techniques for Finance and IT Professionals (Page 140-146)