
Using Grid Control

In document Exadata (Page 70-88)

As you learned previously, you can administer the Exadata Database Machine in several ways: using the command-line tool CellCLI (or DCLI, to run commands on a number of cells at the same time), using SRVCTL and CRSCTL, and via plain old SQL commands. But the easiest approach, hands down, is to use Oracle Enterprise Manager Grid Control. The graphical interface presents the same information as the command-line tools in a form that is intuitive and easy to read. If you already have a Grid Control infrastructure, the decision is very easy: you just add the Database Machine to it. If you don’t have Grid Control, perhaps you should seriously think about it now.

Setup

To manage the Database Machine via Grid Control, you need:

• A plugin for storage server management for GC (download from http://www.oracle.com/technetwork/oem/grid-control/downloads/system-monitoring-connectors-082031.html)

• An agent installed in each of the compute nodes of DBM. No agent need be installed on the storage cells.

• Access to the cells configured from the Oracle Management Server console machine

Let’s examine each requirement in detail. The plugin is a jarfile you download and import into the OMS server. The Grid Control agents are installed on the database compute nodes the same way you would install them on a normal database server. You can either push the agents from the Grid Control console, or download the agent software on the database compute nodes and install it there. During the installation you will be asked for the address and port number of the OMS server, which will complete the installation.

The storage plug-in is different. This is not an agent on the storage cells; rather it’s installed as a part of the OMS server. So you need to download the plugin (which is a file with .jar extension) to the machine on which you launched the browser (typically your desktop), not the storage cells. After that, fire up the Grid Control browser and follow the steps shown below.

1. Go to the Setup screen, shown below:

2. From the menu on the left, choose Management Plug-Ins.

3. You need to import the plug-in to OMS. Click on the Import button.

4. It will ask for a file name. Click on the button Browse, which will open the file explorer.

5. Select the jarfile you just downloaded and click OK.

6. The jarfile is actually an archive of different plug-ins inside. Choose the Exadata Storage Server Plugin.

7. Press the OK button.

8. You will be asked to enter preferred credentials. Make sure the userids and passwords are correct.

9. Click on the icon marked Deploy.

10. Click on the button Add Agents.

11. Click on Next and then Finish.

12. From the top bar menu (shown below) click on the hyperlink Agents.

13. Select the agent where you deployed the system management plug-in.

14. Go to the Management Agents page.

15. Select Oracle Exadata Storage Server target type from the drop down list.

16. Click on Add.

17. The screen will ask for the management IP of the storage server. Enter the IPs.

18. Click on the Test Connection button to ensure everything is working.

19. Click on the OK button.

That’s it; the storage servers are now ready to be monitored via Grid Control.

Basic Information

Once configured, your Exadata Database Machine shows up in Grid Control. The database compute nodes show up as normal database servers and the cluster shows up as a normal cluster database; there is nothing Exadata-specific in nature on those screens.

You may already be familiar with the normal Oracle RAC database management features in Grid Control, which apply to Exadata compute nodes as well. In this article we will assume you are familiar with database management via Grid Control.

The real benefit comes in looking at the storage cells as an alternative to CellCLI. To check storage cells, go to Homepage > Targets > All Targets. In the Search dropdown box choose Oracle Exadata Storage Server and press Go. The resulting screen that comes up is similar to the picture shown below. Note the names have been erased.

This is a full rack Exadata so there are 14 cells, or storage servers named with a prefix _Cell01 through _Cell14.

Clicking on each of the hyperlinked cell names takes you to the information screen of the respective cell (also known as Storage Server). Let’s click on cell#2. It brings up the cell details similar to a screen shown below:

This screen shows you three basic but very important metrics:

• CPU Utilization %-age

• Memory Utilization %-age

• Temperature of the server

The screen gives only a glimpse of these three metrics for the last 24 hours or so. Note that memory utilization has been constant, which is pretty typical. Since this is a storage cell, the only software that runs here is Exadata Storage Server, not the database sessions, which tend to fluctuate in memory usage. The temperature has been pretty steady but spiked a little at around 6AM. The CPU, however, may show a wide fluctuation because of the demand on the Exadata Storage Server software as a result of cell offloading (described in detail in Part 1). If you click on CPU Utilization, you will see a dedicated screen for CPU utilization as shown below.

It shows a more detailed graph of how busy the CPU is, on a more granular scale. You can set the refresh frequency of the metric by choosing from the drop-down menu View Data near the top right corner. Here I have chosen "Real Time: 5 Minute Refresh." It gets the data in real time but refreshes only every 5 minutes, which is why the graph is not smooth. You can also choose a longer period such as “last 7 days” or even “last 31 days” but that information is no longer real time.

You can also get the same information from Reports (described later in this installment). Clicking on the Temperature and Memory graphs will provide similar data.

Cell Information

Since the cell contains hard drives, which are mechanical devices and generate a lot of heat, it will fail if the temperature is too high. The cells contain fans to force the hot air out and to reduce the temperature. There are 12 fans in each cell. Are they working? There are two power supplies in the cell for redundancy. Are both working? To know the answer to these questions and get other information on the cell, click on the hyperlink View Configuration to bring up a screen like the following.

The information shown here is pretty similar to what you would have seen in the list cell command in CellCLI.

Let’s examine some of the interesting information on this screen and why you would like to know about them:

• Status – of course you want to know if this cell is even online and servicing traffic.

• Fan Count – number of working fans

• Power Count – the number of working power supplies (there should be two)

• SMTP Server – the email server that is used to send emails. Very important for alerts.

• IP Address – IP of the cell

• Interconnect Count – the number of cell interconnects. Default is three.

• CPU Count – the number of CPU cores.
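If you would rather script the fan and power supply checks than read them off the screen, here is a minimal sketch. It assumes you have captured cell attributes as "name: value" text; the attribute names, the "working/installed" format such as 12/12, and all sample values are assumptions made for illustration, not output copied from a real cell.

```python
# Illustrative text in a "name: value" layout. The attribute names and
# values are assumptions for this sketch, not real cell output.
SAMPLE_CELL_DETAIL = """\
name:        cell02
status:      online
fanCount:    12/12
powerCount:  2/2
"""


def parse_detail(text):
    """Turn 'name: value' lines into a dictionary."""
    attrs = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        attrs[key.strip()] = value.strip()
    return attrs


def component_ok(value):
    """Assumed format 'working/installed', e.g. '12/12'."""
    working, installed = (int(part) for part in value.split("/"))
    return working == installed


def cell_problems(attrs):
    """List the problems found for one cell; empty list means healthy."""
    problems = []
    if attrs.get("status") != "online":
        problems.append("cell is not online")
    if not component_ok(attrs["fanCount"]):
        problems.append("fan failure")
    if not component_ok(attrs["powerCount"]):
        problems.append("power supply failure")
    return problems


print(cell_problems(parse_detail(SAMPLE_CELL_DETAIL)))
```

An empty list means the cell is online with all fans and both power supplies working.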

This is just a summary of the cell. The objective of the cell is to provide a storage platform for the Exadata Database Machine. So far, we have not seen anything about the actual storage in the cell. To know about the storage in the cells, look further down the page. There are five sections:

• Grid disks

• Cell disks

• LUNs

• Physical disks

• I/O Resource Manager

Note the set of 5 hyperlinks toward the top of the screen that will direct you to the appropriate section:

Instead of looking at the sections in the way they have been laid out, I suggest you look in the order they are created. For instance a cell contains physical disks, from which LUNs are created, from which cell disks are created, from which grid disks are created. Finally, IORM (I/O Resource Manager) is used on the disks. Let’s look at them in the same logical order.
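To keep that order straight, it can help to picture the layers as a small data model. The sketch below is only an illustration of the relationships described above; the class names, field names, and sample object names are hypothetical and do not come from any Oracle API.

```python
from dataclasses import dataclass, field


@dataclass
class GridDisk:
    name: str


@dataclass
class CellDisk:
    name: str
    # Grid disks are carved out of a cell disk.
    grid_disks: list = field(default_factory=list)


@dataclass
class Lun:
    name: str
    cell_disk: CellDisk  # the cell disk created on this LUN


@dataclass
class PhysicalDisk:
    name: str
    lun: Lun  # the LUN created on this physical disk


def storage_paths(disk):
    """Walk physical disk -> LUN -> cell disk -> grid disk."""
    cell_disk = disk.lun.cell_disk
    return [
        (disk.name, disk.lun.name, cell_disk.name, grid_disk.name)
        for grid_disk in cell_disk.grid_disks
    ]


# Hypothetical names, for illustration only.
cell_disk = CellDisk("CD_00_cell02", [GridDisk("DATA_CD_00_cell02")])
disk = PhysicalDisk("disk0", Lun("0_0", cell_disk))
print(storage_paths(disk))
```

Each tuple reads left to right in the same order the components are created.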

Physical Disks

To know about the physical disks available in the cell, this is the section to look at. Here is a partial screenshot.

If this looks familiar, it’s because this is formatted output from the CellCLI command LIST PHYSICALDISK DETAIL or LIST PHYSICALDISK ATTRIBUTES. The important columns here to note are:

• LUN/s – the LUN which is created on this physical disk

• Error Count – if there are any errors on these physical disks

• Physical Interface – SAS for hard disks, ATA for flash disks. Some older Exadata Database Machines may have SATA hard disks and this column will show “sas”.

• Size – The size in GB of the physical disks

• Insert Time – the date and timestamp when these records were recorded in the repository.

LUNs

Next, let’s look at the LUNs in the cell by clicking on the Cell LUN Configuration hyperlink.

Again, this is the graphical representation of the LIST LUN ATTRIBUTES command. Note that the flash drives are also listed here, toward the bottom of the output. Their cell disks are named with the prefix FD, unlike the “CD” prefix used for regular SAS hard disks. LUNs are created from hard disk partitions, which are presented to the host (the storage cell) as a device. The device ID shows up here, along with the cell disks created on the LUN.

The two columns that you should pay attention to are “Status” and “Error Count”.

If you see just 10 LUNs, click on the dropdown box and select all 28.
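With 28 rows per cell, scanning the “Status” and “Error Count” columns by eye gets tedious. Here is a rough sketch of the same check in code, assuming you have the rows available as dictionaries. The row layout, the LUN names, and the use of "normal" as the healthy status value are assumptions for illustration.

```python
# Hypothetical rows; treating "normal" as the healthy status is an assumption.
luns = [
    {"name": "0_0", "status": "normal", "error_count": 0},
    {"name": "0_1", "status": "normal", "error_count": 3},
    {"name": "0_2", "status": "warning", "error_count": 0},
]


def needs_attention(rows, healthy_status="normal"):
    """Return names of rows with an unhealthy status or any errors."""
    return [
        row["name"]
        for row in rows
        if row["status"] != healthy_status or row["error_count"] > 0
    ]


print(needs_attention(luns))
```

The same filter would work for cell disks and grid disks, since those screens carry the same two columns.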

Cell Disks

Cell disks are created from the physical disks. In CellCLI, you got the list by using the LIST CELLDISK command. In Grid Control, clicking on the hyperlink Celldisk Configuration will show the list of cell disks in a graphical manner, similar to the one shown below. As in the previous section, click on the drop-down list to choose “All 28” to show all 28 cell disks on the same page. Again, you should pay attention to the “Error Count” and “Status” columns.

Grid Disks

Grid Disks are created from the cell disks, and are used to build the ASM diskgroups. Obviously, these are the actual building blocks of the storage the ASM instance is aware of. Typically there is a one-to-one relationship between cell disk and grid disk, but if there is not one in your case, you may see less available space in your ASM diskgroup. Clicking on the Griddisk Configuration hyperlink brings up the screen similar to the following:

The key columns in this screen to look at are:

• Status – if the grid disk is active. If inactive, it’s offline and ASM can’t use it.

• Size – does the size sound right? Or was it created with less size from the cell disk? In that case the ASM instance will see that reduced size as well.

• Error Count – have there been errors on this grid disk.
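The Size check above boils down to simple arithmetic: whatever part of a cell disk is not covered by grid disks is space ASM never sees. A minimal sketch, with hypothetical disk names and sizes in GB:

```python
def unallocated_gb(cell_disk_gb, grid_disk_sizes_gb):
    """Space on a cell disk that is not covered by any grid disk."""
    return cell_disk_gb - sum(grid_disk_sizes_gb)


# Hypothetical sizes in GB, for illustration only.
cell_disks = {
    "CD_00_cell02": {"size_gb": 528.0, "grid_disks": [423.0, 105.0]},
    "CD_01_cell02": {"size_gb": 528.0, "grid_disks": [423.0]},
}


def unallocated_report(disks):
    """Map each cell disk name to its unallocated space in GB."""
    return {
        name: unallocated_gb(info["size_gb"], info["grid_disks"])
        for name, info in disks.items()
    }


print(unallocated_report(cell_disks))
```

In this made-up data the second cell disk has 105 GB that no grid disk uses, so the ASM diskgroup would come up that much short.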

This brings us to the end of the basic building blocks of the storage cells, or storage servers. Using these graphical screens or the CellCLI commands you can get most of the information on these building blocks.

Now let’s get on to some of the more complex data needs: to gather the performance and management metrics on these components. For example, is the cell disk performing optimally, or are there any issues with the grid disk that may result in data loss?

Metrics

The best way to examine the metrics for these components is to explore them in the dedicated Metrics page in Grid Control. Note the set of hyperlinks toward the bottom of the page and click on All Metrics. It will bring up a screen as shown below.

Remember, metrics are gathered and stored in the OMS server and then presented as needed. Since they are stored in the OMS server, they have little or no effect on the performance of the Exadata Database Machine. The screen above shows how frequently these metrics are collected (15 minutes, 1 minute, etc.). Some of the metrics are real time (flash cache statistics, for example); others do not have any schedule (such as error messages and alerts). The collection frequency is displayed under the column “Collection Schedule”.

When the metrics are collected, they are uploaded to the OMS server’s database. The screen above also shows how soon the metrics are uploaded after collection (the column “Upload Interval”), along with the date and time the last upload occurred (“Last Upload”).

Metrics can also be used to trigger alerts. For instance, if the cell offload efficiency falls below a certain threshold, you can trigger an alert. We will see how these alerts are set up later but the alerting threshold is shown in this screen as well, under the column “Threshold”.
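The logic behind a threshold alert is simple enough to sketch. The example below is not how Grid Control implements alerts; it only illustrates the idea of flagging collected samples that fall below a threshold. The sample times, values, and the threshold of 20 are made up.

```python
def below_threshold(samples, threshold):
    """Return the (time, value) samples that fall below the threshold."""
    return [(time, value) for time, value in samples if value < threshold]


# Hypothetical offload efficiency samples, collected every 15 minutes.
samples = [("09:00", 62.0), ("09:15", 18.5), ("09:30", 55.0)]

alerts = below_threshold(samples, threshold=20.0)
print(alerts)
```

Only the 09:15 sample would raise an alert here, because it is the only one under the threshold.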

The screen shows the metrics collected by Grid Control and the frequency of the collection. If you want, you can see all the sub categories and specific metric names by clicking on Expand All.

Let’s focus on only one category – Category Statistics. Click on the “+” before the category to show the various metrics; you will see a screen similar to the following:

Click on Category Small IO Requests/Sec.

Click on the eyeglass icon under the column “Details”.

Like all the other screens you saw earlier, you can choose a different refresh frequency by choosing the appropriate value from the drop-down menu “View Data”. Alternatively, you can choose historical records by selecting Last 24 hours, or Last 7 days, etc.

Reports

Reports are similar to screens reporting metrics but with some important differences. Some reports could be better for viewing certain metrics but the most valuable use of reports is to compare metrics across several components at a glance.

From the main Enterprise Manager page, choose the Reports tab from the ribbon of tabs at the top, then scroll down several pages to a collection of reports named “Storage”, as shown below:

This section shows all the metrics you might have seen earlier, but with a major difference: The reports are organized as independent entities, which can report the data on any component -- not driven from the details screen of the component itself. For instance, click on the hyperlink Celldisk Performance. It brings up a screen asking you to choose the cell disk you are interested in rather than choosing the cell disk screen first and then going to the performance metrics of that cell disk. You can choose a cell disk from the list by clicking on the flashlight icon (the list of values) and it brings up a screen as shown below.

If you click on one of the hyperlinked cell disks, you will see the data for that cell disk.

Grid Disk Performance

As you learned in Part 2 of this series, the grid disks are built from the cell disks and then used for ASM diskgroups. Thus the performance of the grid disks will directly affect the performance of the ASM diskgroups, which in turn will affect database performance. So it’s important to keep tabs on the grid disk performance and examine it if you see any issues in the database. From the report menu shown earlier, go to Storage > CELL > Grid Disk Performance. It will ask you to pick a cell from the list. Choosing Cell 02 will bring up a screen similar to this.

The other important metric to examine is host interconnect performance:

Doing Comparisons

Sometimes you wonder how one component such as a cell disk is doing compared to other cell disks. The comparison is a very neat feature in the Grid Control. First get to the page of the metric you are interested in:

Click on the hyperlink just below Related Links shown as Compare Objects Celldisk Name/Cell Name/Realm Name.

Then click on the OK button. The comparison will appear in the screen similar to the one shown below:

The comparison shows that cell disk CD_00_cell01 has a higher write throughput compared to the other two.

Metrics on Realms

A realm is a set of components enclosed within a logical boundary. Realms are used to separate systems for different usage – for instance you may create a realm each for database A, database B, etc. By default there is a single realm. Since all the cells belong to the realm, metrics for a realm go across the cells and are not limited to the current cell alone.

Let’s see an example. If you pull up the LUN Configuration report, you will see the following:

It shows 28 records since there are 28 LUNs in this cell. The scope of this report is the current cell, not the others. When you pull up the Realm LUN Configuration, it will show the LUNs of all the cells, as shown below.

Note the difference: 392 rows instead of 28. All the LUNs from all the cells have been shown here.
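The row count follows directly from the numbers given earlier: 14 cells in a full rack, each with 28 LUNs. As a quick sanity check (the function name is just for illustration):

```python
CELLS_IN_FULL_RACK = 14  # from the full rack described earlier
LUNS_PER_CELL = 28       # rows in the single-cell LUN Configuration report


def realm_row_count(cells, rows_per_cell):
    """Rows in a realm-wide report when every cell contributes equally."""
    return cells * rows_per_cell


print(realm_row_count(CELLS_IN_FULL_RACK, LUNS_PER_CELL))
```

If your realm-wide report shows fewer rows than this product, one or more cells may be missing from the realm.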

Choosing realms for reports can allow you to get the picture for a specific realm immediately. If there is only one realm, then all the cells are pulled together in the report. Consider a very important report – Realm Performance, shown below.

It shows you the metrics across all the cells in that realm. This is an excellent way to quickly compare various cells to see if they are uniformly loaded.

On the same line of thought, you may want to compare performance of the components of the cells, not just the cells. Choosing Realm Celldisk Performance from the View Report dropdown menu shows the performance of the cell disks across all the cells, not just that cell.

However, with the sheer amount of data in the report (392 rows), it may not be as useful. A better option may be to compare Cell Disk #1 of all the cells. You can do that by using a filter in the Celldisk field.

This shows the performance of the first cell disks of all the cells. From the above output, it is clear that Cell 05 is seeing some action while the others are practically idle. Since you don’t operate at the cell level, it can be assumed that the data distribution may not be uniform. Cell #5 may happen to have the largest number of storage blocks.
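The same filter-and-compare idea can be sketched in code. The example below assumes the report rows are available as dictionaries; the field names, the throughput figures, and the "five times the median" rule for calling a cell busy are all assumptions chosen for illustration, not something Grid Control provides.

```python
from statistics import median

# Hypothetical report rows; all values are made up.
rows = [
    {"cell": "cell01", "celldisk": "CD_00_cell01", "write_mb_s": 0.2},
    {"cell": "cell01", "celldisk": "CD_01_cell01", "write_mb_s": 0.4},
    {"cell": "cell02", "celldisk": "CD_00_cell02", "write_mb_s": 0.1},
    {"cell": "cell03", "celldisk": "CD_00_cell03", "write_mb_s": 0.3},
    {"cell": "cell05", "celldisk": "CD_00_cell05", "write_mb_s": 12.5},
]


def first_cell_disks(report_rows, prefix="CD_00_"):
    """Keep only the first cell disk of each cell (filter on the name)."""
    return [row for row in report_rows if row["celldisk"].startswith(prefix)]


def busy_cells(report_rows, factor=5.0, floor=0.1):
    """Cells whose throughput is well above the typical (median) value."""
    typical = max(median(row["write_mb_s"] for row in report_rows), floor)
    return [row["cell"] for row in report_rows
            if row["write_mb_s"] > factor * typical]


print(busy_cells(first_cell_disks(rows)))
```

The floor keeps the comparison meaningful when most cells are practically idle and the median is close to zero.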

Conclusion

In this installment you learned how to examine metrics from the storage cells using two different interfaces: the command-line tool CellCLI and Enterprise Manager Grid Control. Both have their own appeals. While the visual advantage of the Grid Control is undeniable, the command line tool may be useful in developing scripts and repeatable processes. The Exadata Database Machine comes with a lot of predefined alerts that can be configured via thresholds to make sure they trigger when the metric values fall outside the pre-specified boundary conditions.

This brings us to the end of this series. I hope you developed enough skills in this series to make the transition
