Using R to visualize Oracle metrics

Oracle 10g introduced the metrics views. Kyle Hailey has blogged about getting a quick view of the load on the system by using the wait class metrics. Using these ideas I started to work on the visualization of Oracle performance metrics using R, an open source programming language and software environment for data analysis and graphics.

Oracle Enterprise Manager and ASH are a nice toolset for performance monitoring. Unfortunately I am not allowed to use OEM at my current workplace and we have no license for the diagnostic and tuning packs either. So I had to find another way to visualize a history of database performance metrics.

For some of the metrics there is an associated history view. These views normally allow looking into the past for about an hour or so. This is obviously unusable for long term analysis but it is all I have and at least allows investigating recent incidents. So I am using the Oracle views V$SYSMETRIC_HISTORY and V$WAITCLASSMETRIC_HISTORY to get data about the recent system and wait class metrics.

The R package ggplot2 is used to generate graphs from the data while the package RJDBC handles the database connection to Oracle. Check my R script metric.R containing the functions I wrote to create the graphs shown below.

Database time and CPU time

My R function metric.time() was used for the first diagram. The graph contains two parts with the upper part showing the time used by the Oracle instance. It is a stacked graph where the violet color represents the CPU time used by the Oracle instance while the green color accounts for the time spent waiting.

The data is displayed as seconds per second. So in this case the violet color shows that every second Oracle uses about 10 seconds of CPU time, thus indicating a demand for ten CPU cores. The green color represents the time the database has accounted as wait time. Again this shows that every second there were about 50 or 60 waiting sessions on average.

The bottom part has a graph of the CPU utilization of the server. In this example the server has been busy between 45% and almost 80%. This includes not just the Oracle instance itself but all processes running on the server.

Both graphs share the time axis and therefore allow an easy correlation between CPU utilization and Oracle CPU demand.

diagram of database and CPU time

Database wait class time

My R function metric.waitclass() generates another graph. The example below (this time from another instance) shows the wait times broken down by wait classes. Wait classes are used by Oracle to group the large number of wait events and create understandable clusters. You can check V$EVENT_NAME for the mapping. The data here is also displayed as waited seconds per second. The wait class Idle has been omitted from the diagram altogether.

It is easy to see that user I/O contributes the biggest part. The wait class Network mostly contains access to remote databases over database links. Problems on the remote database would be visible here. The sudden peak in the Administrative class was caused by rman saving archived logs to tape.

diagram of database wait class time