jLuger.de - Fixing bad performance numbers

The blog title may seem a little strange, but there is an interesting story behind it. Our application (a JEE app on a WebLogic application server, called by a rich client) has a built-in monitoring component. It was added at the request of our head of department. He wanted to look at some of the numbers it produced, and those numbers were bad. When I had some spare time I was told to improve them. As the only goal was to satisfy management (there were no user complaints), I don't want to call it performance tuning. But rest assured that this won't be an article about faking numbers.

So why do I refuse to call it performance tuning? First, you need a business goal that you want to reach with performance tuning: more transactions on the existing hardware, more productive employees, sparing your customers the purchase of new hardware, and so on. It doesn't help to reduce CPU usage when the users are annoyed by the time a screen transition takes.

Of course, a goal is not all you need. Sending random requests over the heavily used company network, from a computer whose antivirus regularly goes berserk, to a shared application/database server will not let you measure any improvement. On the other hand, this procedure is cheap and gets you results (not necessarily the right ones) really fast. Doing it right takes a lot of time and resources.

So where to start? If you are doing it right, you get a usage profile first. If you are doing it wrong, you start building the test client. Our numbers were so bad that I could go straight to investigating the monitoring component. Most execution times were OK, but too many were in the range of weeks to be ignored. On the other hand, no timeout in a JEE application would allow execution times in the range of weeks. So I went over the log output looking for error messages and found them in an "unimportant" log file: errors about measurement end calls without a matching start. The way we included the monitoring shouldn't have allowed this error, so I requested the source code of the component. Internally it used two lists that weren't synchronized correctly.
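A minimal sketch of the kind of race such a component can suffer from (the class and method names are my invention, not the original code): two shared lists, where an unsynchronized end() can scan the start list while another thread is still inserting its entry. With a common lock on both lists, the "end without start" error disappears:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a monitoring component with two shared lists.
// Without a common lock, a thread calling end() can miss a start that
// another thread is still inserting -- producing "measurement end
// without start" errors like the ones found in the log file.
class Monitor {
    private final List<String> started = new ArrayList<>();
    private final List<String> finished = new ArrayList<>();

    // The fix: both lists are guarded by the same lock (the instance).
    synchronized void start(String id) { started.add(id); }

    synchronized boolean end(String id) {
        if (!started.remove(id)) {
            return false;            // "measurement end without start"
        }
        finished.add(id);
        return true;
    }

    synchronized int finishedCount() { return finished.size(); }
}

public class MonitorDemo {
    public static void main(String[] args) {
        Monitor m = new Monitor();
        m.start("call-1");
        System.out.println(m.end("call-1"));  // true: matching start found
        System.out.println(m.end("call-2"));  // false: end without start
    }
}
```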

After I had fixed the synchronization I built a test client: a program that called some server methods several times in a loop (the cheap and wrong way to build a test client). The error messages from the monitoring component were gone and all execution times looked sane.
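The cheap client boiled down to something like this (a sketch; `timeMillis` and the dummy action are my own names, and in the real client the Runnable would be a remote EJB call):

```java
public class CrudeTestClient {
    // Times `calls` invocations of an action and returns the total in
    // milliseconds. A hypothetical helper, not the original client.
    static long timeMillis(int calls, Runnable action) {
        long t0 = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            action.run();
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in for a remote server method.
        long ms = timeMillis(1_000, () -> Math.sqrt(12345));
        System.out.println("1000 calls took " + ms + " ms");
    }
}
```

Note that such a tight loop keeps hitting the same code path and the same data, which is exactly why the caching caveat below applies.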

I was still unhappy with the monitoring component, as all the synchronization serialised the execution of the EJB methods. So I had a meeting with the maintainer of the component. We removed some code that was no longer needed, which allowed us to turn one list into a thread-local variable. That way we got rid of a lot of synchronization.
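The thread-local variant can be sketched like this (names are assumptions; the idea is that each thread keeps its own stack of start times, so no cross-thread locking is needed for it):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the fix: per-thread start times in a ThreadLocal instead of
// a shared, synchronized list. Each thread only ever touches its own
// deque, so these methods need no synchronization at all.
class ThreadLocalMonitor {
    private final ThreadLocal<Deque<Long>> starts =
            ThreadLocal.withInitial(ArrayDeque::new);

    void start() {
        starts.get().push(System.nanoTime());
    }

    // Returns the elapsed nanoseconds, or -1 for an end without a
    // matching start on this thread.
    long end() {
        Deque<Long> s = starts.get();
        if (s.isEmpty()) {
            return -1;
        }
        return System.nanoTime() - s.pop();
    }
}
```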

The execution time decreased by an order of magnitude. That sounds like a lot, but remember that I didn't use a proper test client. The for loop probably caused a lot of caching, so maybe the synchronization wasn't such a problem in the real world. But don't use these statements as an argument against testing your changes. The test showed that my changes didn't introduce a new bottleneck and that the overall direction was right.

As the next step I reviewed the code of the methods with the worst numbers. In the top one I immediately found a large batch of database queries. Most of them were the same query with different parameters, which means several network connections were opened and the database had to go over the same data several times: a perfect waste of resources. I changed the query so that the varying parameter became part of the result set. The review of the other methods showed some ignorance about databases: in order to count the number of rows, all the data was loaded and the size of the result list was taken. I changed this to a query with a count in it. The effect of these changes was measurable, but not dramatic, as the numbers were already sane.
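In code, the two fixes look roughly like this (table and column names are invented, and real code should of course use bind parameters instead of string concatenation):

```java
import java.util.List;
import java.util.stream.Collectors;

public class QueryBatching {
    // Before: the same query once per parameter value -- one round trip
    // and one scan of the data per id.
    static List<String> perIdQueries(List<Long> ids) {
        return ids.stream()
                .map(id -> "SELECT * FROM orders WHERE customer_id = " + id)
                .collect(Collectors.toList());
    }

    // After: one query where the varying parameter is part of the
    // result set, so the database scans the data only once.
    static String singleQuery(List<Long> ids) {
        String in = ids.stream().map(String::valueOf)
                .collect(Collectors.joining(", "));
        return "SELECT customer_id, order_date, amount FROM orders"
                + " WHERE customer_id IN (" + in + ")";
    }

    // Counting rows: let the database count instead of loading all the
    // data and taking the size of the result list.
    static String countQuery(String whereClause) {
        return "SELECT COUNT(*) FROM orders WHERE " + whereClause;
    }

    public static void main(String[] args) {
        System.out.println(singleQuery(List.of(1L, 2L, 3L)));
        System.out.println(countQuery("order_date > SYSDATE - 30"));
    }
}
```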

In the next step I added database indexes for all the columns that were part of the where clause. It had almost no effect. I wondered why, and how I could get more information. One way is to look at the execution plan of a query. A coworker told me that SQL Developer shows a nice view of the execution plan: just enter the SQL query and press F6. This showed me that the Oracle database didn't combine the single-column indexes. Creating one index with all the columns in it lowered the estimated cost, and tests showed another reduction in execution time.
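Sketched with invented table and column names, the change was roughly the following (whether the optimizer combines single-column indexes depends on the database version and the query, so always check the execution plan):

```
-- Separate single-column indexes that the optimizer did not combine:
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_status   ON orders (status);

-- One composite index covering the whole WHERE clause instead:
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
```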

Until now I had used a lot of tricks to speed up the application, but all of them were focused on some parts that I had declared hot spots. To get a more tangible speedup I needed a systematic check for hot spots. As all the tricks so far were about the database, it was natural to turn on Hibernate's SQL logging. That way I would see which statements are executed as I move through the GUI.
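For Hibernate this is just configuration; as properties (or as the equivalent `<property>` entries in persistence.xml or hibernate.cfg.xml) it is:

```
hibernate.show_sql=true
hibernate.format_sql=true
```

Leave this switched off in production, as it logs every statement.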

It didn't take long to get some good results. Guess how many times you have to load a customer to edit its data when the GUI has three tabs for it. Well, it was three times: once for the data itself and twice more for the detail data. Why load an object that you already have? Well, the loading of the details happened in other EJBs, and there was no object to transfer all the data for the view. (Note: we are talking about a rich client that runs on the user's computer, several hundred kilometers away from the server.) So a server call had to be made for each tab. And on the server nobody had bothered to write custom loaders for the details. Instead, the customer was loaded by customer id and the details were fetched as lists from it (imagine the customer has a list of addresses as an attribute; then they would load the customer just to call its getAddresses() method). That was fine object-oriented programming, but it caused a lot of network overhead. I didn't fix it, because the expensive part was the two additional EJB calls from the client to the server, and I would have had to change the architecture to fix that. Changing the architecture is far out of scope for cheap performance tuning.
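The missing piece was a transfer object. A sketch with invented names: one serializable class that carries everything the three tabs need, so the client could make one remote call instead of three:

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical view object: one remote call fills all three tabs.
public class CustomerEditView implements Serializable {
    private final String name;            // tab 1: the customer itself
    private final List<String> addresses; // tab 2: detail data
    private final List<String> contracts; // tab 3: detail data

    public CustomerEditView(String name, List<String> addresses,
                            List<String> contracts) {
        this.name = name;
        this.addresses = addresses;
        this.contracts = contracts;
    }

    public String getName() { return name; }
    public List<String> getAddresses() { return addresses; }
    public List<String> getContracts() { return contracts; }
}
```

On the server, a single EJB method would assemble this object from the customer and its details; but as said above, that is an architecture change, not a cheap tuning trick.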

I didn't introduce any new caches into the application, but in general they are another good tool. You may want to try them in your application.
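As a taste of how cheap a simple cache can be: a minimal LRU cache on top of LinkedHashMap's access order (a generic sketch, not code from the application above, and not thread-safe as written):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: LinkedHashMap in access order evicts the least
// recently used entry once the capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder = true
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```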

When you read about performance tuning on the web you will hear about choosing the right collection classes, the O notation of algorithms, and so on. When you develop a normal enterprise application, don't worry about these. Well, at least not until you have used all the tricks I've described above, and a "can't change the architecture" doesn't count as used. In 99.9% of cases the right collection class won't influence performance as much as one additional remote call over a WAN. Always remember http://en.wikipedia.org/wiki/Memory_hierarchy.