Tuning seemed easy when I started working with Oracle databases. There were rules of thumb of the form “if you observe certain values for statistic X, then change parameter Y or Z”. Unfortunately, the people using the databases sometimes didn’t see any change in performance even when the statistic clearly showed an improvement.
Years later, Cary Millsap developed a scientific approach he called Method R. The approach is not specific to Oracle, to databases, or even to technical systems; it can be used almost anywhere you want to monitor performance and improve a process. This article is a short introduction that advocates the method.
According to the Method R FAQ there are four simple steps:
- Identify the task that’s the most important to you.
- Measure its response time (R). In detail.
- Optimize that response time in the most economically efficient way.
- Repeat until your system is economically optimal.
Let me illustrate the key aspects of this (apparently) easy description:
We apply the method by looking at the most important task. This seems obvious, but my experience shows that people tend to request an improvement of everything at once. Improving everything at the same time is like juggling: you need a lot of experience to do it, or things will be on the floor pretty soon. We will see shortly that there is another reason to focus on one issue at a time. It is also key that the center of the investigation is a business task defined by you or your users, not some performance statistic. See my blog entry on hit rates for an explanation of why focusing on a statistic may go down the drain.
The second step is to measure the response time of this business task in detail. In detail means you need an end-to-end analysis of the components that make up the response time. At a high level there might be a web server, an application server and a database connected via a network. At a low level you might want to look into the database and break the duration of individual SQL statements down into parse time, CPU execution time, CPU time for sorts, waiting on I/O or waiting on locks. Unfortunately, this is going to be the hardest part, because most software will probably not give you this low-level insight. Read more on the Psychology of Instrumentation.
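To make the idea of a detailed breakdown concrete, here is a minimal Python sketch of a response time profile. It assumes you already have timing data for one execution of the task as (component, seconds) pairs; the component names and numbers are invented, and `build_profile` is a hypothetical helper, not part of any Method R tool.

```python
from collections import defaultdict


def build_profile(events):
    """Aggregate (component, seconds) pairs into a profile, largest component first.

    Returns a list of (component, seconds, share_of_total) tuples.
    """
    totals = defaultdict(float)
    for component, seconds in events:
        totals[component] += seconds
    total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, secs, secs / total) for name, secs in ranked]


# Hypothetical timings for one execution of a business task (invented numbers).
events = [
    ("network", 0.15),
    ("app server CPU", 0.30),
    ("SQL parse", 0.10),
    ("SQL CPU execution", 0.45),
    ("I/O wait", 2.50),
    ("SQL parse", 0.10),
    ("lock wait", 0.40),
]

profile = build_profile(events)
for name, secs, share in profile:
    print(f"{name:<20} {secs:6.2f} s {share:7.1%}")
```

Sorting by time and showing each component's share of the total is what makes the profile useful for the following steps. It is also worth checking that the components add up to the response time the user actually experienced; otherwise part of the time is unaccounted for.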
Focusing on one business task at a time also has the side effect of ruling out the use of system-wide statistics. System-wide statistics might suggest an obvious performance problem when in reality you don’t know whether the current business task actually suffers from that apparent problem. This is an important difference from the conventional tuning method. Fixing the obvious problem might improve your system, but that does not help if your key business tasks are still running too slowly.
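The following Python sketch illustrates how a system-wide view can mislead. It uses two hypothetical tasks with invented timings: a nightly report that spends most of its time on I/O and an order entry task that mostly waits on locks. Summing everything per component, as a system-wide statistic does, points at I/O, while the profile of the task that matters points at locks.

```python
from collections import Counter

# Hypothetical seconds per component for two business tasks (invented numbers).
task_profiles = {
    "nightly_report": {"I/O wait": 900.0, "CPU": 120.0, "lock wait": 5.0},
    "order_entry": {"I/O wait": 2.0, "CPU": 1.5, "lock wait": 6.5},
}


def top_component(profile):
    """Return the component that accounts for the most time in a profile."""
    return max(profile, key=profile.get)


# System-wide view: add up every task's time per component, losing track of
# which task the time belongs to.
system_wide = Counter()
for profile in task_profiles.values():
    system_wide.update(profile)

print("system-wide top component:", top_component(system_wide))
print("order_entry top component:", top_component(task_profiles["order_entry"]))
```

Tuning I/O because the system-wide numbers say so would barely touch the order entry task in this example.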
With the detailed breakdown of the response time you have a critical piece of information that is missing most of the time: it quantifies the maximum gain you can expect from a certain change. If only ten percent of the response time is CPU time, then it is impossible to achieve even a ten percent improvement just by buying the fastest CPUs available. So, in contrast to conventional performance tuning projects, you know the upper bound of the impact before the money is actually spent.
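The argument can be written down as a short calculation. Let $R$ be the response time, $p$ the fraction of it spent in the component you improve (CPU in the example) and $s$ the factor by which that component becomes faster; these symbols are introduced only for this illustration.

$$R_{\text{new}} = R\left((1 - p) + \frac{p}{s}\right), \qquad \frac{R - R_{\text{new}}}{R} = p\left(1 - \frac{1}{s}\right) < p$$

With $p = 0.1$ and CPUs twice as fast ($s = 2$):

$$0.1 \cdot \left(1 - \tfrac{1}{2}\right) = 0.05$$

so the task gets five percent faster. Even as $s \to \infty$ the improvement only approaches ten percent and never reaches it.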
With the response time breakdown you can now prioritize your improvement actions. Amdahl’s Law tells us that, for a given speedup, improving the response time component with the largest share creates the biggest benefit. Therefore you should look at the components in descending order of their measured response time.
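A small Python sketch shows the prioritization at work. It assumes, purely for illustration, that every candidate fix makes its component twice as fast, and the per-component times are invented. `response_time_after` is a hypothetical helper that applies Amdahl's Law to a single component.

```python
def response_time_after(profile, component, speedup):
    """Response time of the task if only `component` becomes `speedup` times faster."""
    return sum(
        seconds / speedup if name == component else seconds
        for name, seconds in profile.items()
    )


# Hypothetical per-component times in seconds for one business task.
profile = {"I/O wait": 2.50, "SQL CPU execution": 0.45, "lock wait": 0.40, "network": 0.15}

speedup = 2.0  # assumption: each candidate fix halves the time of its component

# Rank components by the response time that would remain after improving them.
candidates = sorted(profile, key=lambda name: response_time_after(profile, name, speedup))
for name in candidates:
    print(f"halve {name:<18} -> {response_time_after(profile, name, speedup):.2f} s")
```

With equal speedups the ranking simply follows the size of each component, which is why the components are examined in descending order. In practice fixes differ in speedup and cost, and that is where the third step comes in.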
This leads us to the third step of the method. Now we can apply all our knowledge about system and database tuning. This is a good example of what I tried to make clear in my article on Effectiveness vs. efficiency: first you identify the component where an improvement will be most effective, and then you find an efficient way to achieve it.
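As a minimal sketch of effectiveness before efficiency, the Python below first picks the component with the largest share of response time and only then chooses, among the candidate fixes for that component, the one with the lowest cost per second saved. The fixes, savings and costs are hypothetical, and cost per second saved is just one simple way to express economic efficiency; your own criteria may differ.

```python
from typing import NamedTuple


class Fix(NamedTuple):
    """A hypothetical candidate improvement and its estimated effect."""
    name: str
    component: str
    seconds_saved: float  # estimated saving per execution of the task
    cost: float  # estimated cost, in whatever currency you budget in


# Invented per-component times (seconds) and candidate fixes.
profile = {"I/O wait": 2.50, "SQL CPU execution": 0.45, "lock wait": 0.40}
fixes = [
    Fix("faster storage array", "I/O wait", 1.00, 50_000),
    Fix("index that avoids a full table scan", "I/O wait", 1.80, 2_000),
    Fix("faster CPUs", "SQL CPU execution", 0.20, 30_000),
]

# Effectiveness: work on the component with the largest share of response time.
target = max(profile, key=profile.get)

# Efficiency: among the fixes for that component, take the cheapest per second saved.
options = [fix for fix in fixes if fix.component == target]
best = min(options, key=lambda fix: fix.cost / fix.seconds_saved)

print(f"target component: {target}")
print(f"most efficient fix: {best.name}")
```

If there is no candidate fix for the top component, `min` raises `ValueError`; the sketch deliberately does not fall back to a smaller component, because that would trade effectiveness for convenience.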