kjkoster
04-05-2009, 21:49
Dear All,
Single processor machines are fairly hard to come by these days. Even laptops come with dual core processors. For us running Java application servers, this is a good thing, right? Java comes with out of the box threading support and great libraries that help you make programs that scale out to use whatever processor resource you may have.
As a side note, check out Cliff Click's Google talk on lock-free maps for Java (http://video.google.com/videoplay?docid=2139967204534450862). Cliff devised a lock-free map that scales practically linearly to 750 or so processors. Cliff speaks of 30 processor cores as a 'moderate' number of processors. Well, a moderate number of processors for me is 2, not 30. Cliff and I not in the same league, I guess. But I digress.
So we have this great Java application running on the brand new server with a moderate number of processors. We named the server something fitting for its perceived prowess; 'beast', 'monster' or somesuch. But when we run the application, whoops, it is not all that fast. Grr. Now what?
The first step is to see what the machine is up to. Basic performance tuning says that a given machine can be limited by one of two things: CPU or I/O. The latter further breaks down into being limited by three things: disk I/O (which is further composed of file I/O and swapping memory pages), network I/O or inter-process communication.
To see if the machine is CPU-bound or I/O-bound, we can use (on FreeBSD) top -P or (on Linux) press the 1 (one) key in top. This will show the CPU consumption on each processor.
Let's assume that the machine is cpu-bound. Not only that, let's further assume that it is using only one processor and not load all of them. Typically, top(1) (http://www.freebsd.org/cgi/man.cgi?query=top) will show that one processor is loaded at 90% to 100%, while the other processors only run at a few percent. The processor being loaded may be a different one each time. This is normal behaviour as your OS moves processes around on the machine.
If you see that no processors are being loaded, you are most likely I/O-bound. Optimising I/O-bound processes is an art in itself, so we'll not discuss this in this post.
With the process diagnosed to be CPU-bound, we now have to determine what the process is doing: executing user code or scheduling threads to fight over locks.
On the processor that is eating 100%, look at the distribution of that time. It is (roughly) split up into idle time, user time and system time. Keep an eye out for high percentages of system time on your Java process.
Java processes that consume a lot of system time spend a lot of time doing operating system related tasks. That can be all kinds of things, like opening and closing files and sockets. It can also quite well be that the threads are fighting over locks in the JVM. Such locks are also managed by the operating system and the time spent putting threads to sleep and waking them back up shows as system time in top(1) (http://www.freebsd.org/cgi/man.cgi?query=top).
If the processor is shown to consume 80%-90% user time, it is most likely spending its time on regular Java code. That means that you have probably programmed something different than you intended to.
Notice that we are still in machine-dependent-land here, even though we are diagnosing a Java problem. I can hear some of you Java developers whailing: "but should the JVM not abstract from such gross details? I have not become a Java programmer to find myself back at the command line!". Well, think of this as walking the track. Motorcycle racers walk an unfamiliar race track on foot. While walking they spot details in the tarmac that are simply not visible when you are riding at speed. So you may be Java's equivalent of Valentino Rossi: you're still walking. :-)
One final task remains on the command line, before the Java developers among you may safely retreat into the Java code again. :) We have to take a thread dumps. Depending on this problem we are diagnosing, we may have to take a few thread dumps.
Java has long supported dumping its thread states to standard output. You can do this by sending it the QUIT signal using kill(1) (http://www.freebsd.org/cgi/man.cgi?query=kill). The command line to do this is "kill -3 <pid>", where <pid> is the UNIX process ID of the Java application.
Current versions of Java come with jstack, which makes it possible to dump thread states to a separate file. If you have jstack available, you're better off using it than using kill(1) (http://www.freebsd.org/cgi/man.cgi?query=kill). The command line for jstack is "jstack <pid>", where (again) <pid> is the UNIX process ID of your running Java application.
That was the easy part. For the next part, fill up your coffee jug and close the office door. You have some reading and scripting to do.
If you are unfamiliar with Java stack traces and thread dumps, have a look at the file and browse around. Get a feel for the information that is in that file.
The actual format of the file differs from platform to platform and from Java version to Java version. The rough layout is the same, though. For each thread, you see the thread name, its state and some extra information. If the thread is blocked, the dump describes what it is blocked on or waiting for. The fact that almost all threads are either blocked, waiting or in-native is quite normal. I daresay that in a typical thread dump of a healthy application server you will not find more than one or two threads actually running. Everything else should be sleeping and not get in the way. In a way threads are like kids: you get more work done when they're all off sleeping.
Ok, let's review what we have; our imaginary application is eating up 100% time on one processor, but not does not spread the load to the other processors. It may be leaning heavily on the operating system to schedule the threads, or it may be churning on the application logic. We have a thread dump, or a few thread dumps to look at.
Now you have to ask yourself: "What am I looking for in the thread dump". In the case of an application eating loads of CPU, that question is likely to be: "what is keeping my threads from reaching the CPU?" In other words: "what are the threads doing, if not processing the logic I crafted?"
If your application is scheduling itself to death, and using up a lot of system time, the question is more specific: "what lock or locks are my threads fighting over?"
In either case, we need to find 'our' threads first. This cuts back on the size of the stack dumps dramatically. As a rule of the thumb, I remove all threads from the dump that 1) do not have any of my application code in the stack trace and 2) have a name that suggests it is library or JVM related (e.g. "derby.antiGC", "CompilerThread1", or "java-monitor collector").
With the reduced stack dump, it is a matter of sitting down with the code and your wits. From this point on you are firmly on your own, as each application is different.
While hunting, take a few thread dumps at regular intervals. Reduce them using the same steps each time and compare the resulting stack traces.
Don't go overboard and take dozens of dumps. You lose sight of the forest for the trees if you do that. Start with three or four dumps and see what you learn from them.
Go back to that question you formulated earlier: "what is keeping my threads from reaching the processor cores?" Examine the locks that your threads are blocked on. Are they in places you expected? What can you do to remove those locks?
In the code, look for signs of shotgun synchronisation. Developers who are not used to working in multithreaded programming environments sometimes lose control and start littering the code with Java's 'synchronized' keyword. A good indication of shotgun synchronisation is one or more classes that have all or most of their methods marked as 'synchronized'. Comments in the code such as: "weird things happen if we don't synchronise this class" should also be regarded as fishy.
Another tell-tale sign of bad multithreading practice is that programmers are using Java's wait/notify mechanism. With the introduction of Java 5's concurrency classes, that mechanism should no longer be used in Enterprise software. There is no need for it any longer. All common patterns have more efficient and less buggy implementations in Java's standard libraries.
Just keep returning to that all-important question: "what is keeping my threads from getting to the processor cores they so desperately desire?"
If you get really stuck and find your hair between your fingers instead of on your head, post your (reduced) stack traces on Java-monitor for us to look at and discuss. Maybe we can help?
Kees Jan
PS. Oh, and awk(1) (http://www.freebsd.org/cgi/man.cgi?query=awk) is your friend when it comes to automatically stripping uninteresting threads from the dump. :-)
Single processor machines are fairly hard to come by these days. Even laptops come with dual core processors. For us running Java application servers, this is a good thing, right? Java comes with out of the box threading support and great libraries that help you make programs that scale out to use whatever processor resource you may have.
As a side note, check out Cliff Click's Google talk on lock-free maps for Java (http://video.google.com/videoplay?docid=2139967204534450862). Cliff devised a lock-free map that scales practically linearly to 750 or so processors. Cliff speaks of 30 processor cores as a 'moderate' number of processors. Well, a moderate number of processors for me is 2, not 30. Cliff and I not in the same league, I guess. But I digress.
So we have this great Java application running on the brand new server with a moderate number of processors. We named the server something fitting for its perceived prowess; 'beast', 'monster' or somesuch. But when we run the application, whoops, it is not all that fast. Grr. Now what?
The first step is to see what the machine is up to. Basic performance tuning says that a given machine can be limited by one of two things: CPU or I/O. The latter further breaks down into being limited by three things: disk I/O (which is further composed of file I/O and swapping memory pages), network I/O or inter-process communication.
To see if the machine is CPU-bound or I/O-bound, we can use (on FreeBSD) top -P or (on Linux) press the 1 (one) key in top. This will show the CPU consumption on each processor.
Let's assume that the machine is cpu-bound. Not only that, let's further assume that it is using only one processor and not load all of them. Typically, top(1) (http://www.freebsd.org/cgi/man.cgi?query=top) will show that one processor is loaded at 90% to 100%, while the other processors only run at a few percent. The processor being loaded may be a different one each time. This is normal behaviour as your OS moves processes around on the machine.
If you see that no processors are being loaded, you are most likely I/O-bound. Optimising I/O-bound processes is an art in itself, so we'll not discuss this in this post.
With the process diagnosed to be CPU-bound, we now have to determine what the process is doing: executing user code or scheduling threads to fight over locks.
On the processor that is eating 100%, look at the distribution of that time. It is (roughly) split up into idle time, user time and system time. Keep an eye out for high percentages of system time on your Java process.
Java processes that consume a lot of system time spend a lot of time doing operating system related tasks. That can be all kinds of things, like opening and closing files and sockets. It can also quite well be that the threads are fighting over locks in the JVM. Such locks are also managed by the operating system and the time spent putting threads to sleep and waking them back up shows as system time in top(1) (http://www.freebsd.org/cgi/man.cgi?query=top).
If the processor is shown to consume 80%-90% user time, it is most likely spending its time on regular Java code. That means that you have probably programmed something different than you intended to.
Notice that we are still in machine-dependent-land here, even though we are diagnosing a Java problem. I can hear some of you Java developers whailing: "but should the JVM not abstract from such gross details? I have not become a Java programmer to find myself back at the command line!". Well, think of this as walking the track. Motorcycle racers walk an unfamiliar race track on foot. While walking they spot details in the tarmac that are simply not visible when you are riding at speed. So you may be Java's equivalent of Valentino Rossi: you're still walking. :-)
One final task remains on the command line, before the Java developers among you may safely retreat into the Java code again. :) We have to take a thread dumps. Depending on this problem we are diagnosing, we may have to take a few thread dumps.
Java has long supported dumping its thread states to standard output. You can do this by sending it the QUIT signal using kill(1) (http://www.freebsd.org/cgi/man.cgi?query=kill). The command line to do this is "kill -3 <pid>", where <pid> is the UNIX process ID of the Java application.
Current versions of Java come with jstack, which makes it possible to dump thread states to a separate file. If you have jstack available, you're better off using it than using kill(1) (http://www.freebsd.org/cgi/man.cgi?query=kill). The command line for jstack is "jstack <pid>", where (again) <pid> is the UNIX process ID of your running Java application.
That was the easy part. For the next part, fill up your coffee jug and close the office door. You have some reading and scripting to do.
If you are unfamiliar with Java stack traces and thread dumps, have a look at the file and browse around. Get a feel for the information that is in that file.
The actual format of the file differs from platform to platform and from Java version to Java version. The rough layout is the same, though. For each thread, you see the thread name, its state and some extra information. If the thread is blocked, the dump describes what it is blocked on or waiting for. The fact that almost all threads are either blocked, waiting or in-native is quite normal. I daresay that in a typical thread dump of a healthy application server you will not find more than one or two threads actually running. Everything else should be sleeping and not get in the way. In a way threads are like kids: you get more work done when they're all off sleeping.
Ok, let's review what we have; our imaginary application is eating up 100% time on one processor, but not does not spread the load to the other processors. It may be leaning heavily on the operating system to schedule the threads, or it may be churning on the application logic. We have a thread dump, or a few thread dumps to look at.
Now you have to ask yourself: "What am I looking for in the thread dump". In the case of an application eating loads of CPU, that question is likely to be: "what is keeping my threads from reaching the CPU?" In other words: "what are the threads doing, if not processing the logic I crafted?"
If your application is scheduling itself to death, and using up a lot of system time, the question is more specific: "what lock or locks are my threads fighting over?"
In either case, we need to find 'our' threads first. This cuts back on the size of the stack dumps dramatically. As a rule of the thumb, I remove all threads from the dump that 1) do not have any of my application code in the stack trace and 2) have a name that suggests it is library or JVM related (e.g. "derby.antiGC", "CompilerThread1", or "java-monitor collector").
With the reduced stack dump, it is a matter of sitting down with the code and your wits. From this point on you are firmly on your own, as each application is different.
While hunting, take a few thread dumps at regular intervals. Reduce them using the same steps each time and compare the resulting stack traces.
Don't go overboard and take dozens of dumps. You lose sight of the forest for the trees if you do that. Start with three or four dumps and see what you learn from them.
Go back to that question you formulated earlier: "what is keeping my threads from reaching the processor cores?" Examine the locks that your threads are blocked on. Are they in places you expected? What can you do to remove those locks?
In the code, look for signs of shotgun synchronisation. Developers who are not used to working in multithreaded programming environments sometimes lose control and start littering the code with Java's 'synchronized' keyword. A good indication of shotgun synchronisation is one or more classes that have all or most of their methods marked as 'synchronized'. Comments in the code such as: "weird things happen if we don't synchronise this class" should also be regarded as fishy.
Another tell-tale sign of bad multithreading practice is that programmers are using Java's wait/notify mechanism. With the introduction of Java 5's concurrency classes, that mechanism should no longer be used in Enterprise software. There is no need for it any longer. All common patterns have more efficient and less buggy implementations in Java's standard libraries.
Just keep returning to that all-important question: "what is keeping my threads from getting to the processor cores they so desperately desire?"
If you get really stuck and find your hair between your fingers instead of on your head, post your (reduced) stack traces on Java-monitor for us to look at and discuss. Maybe we can help?
Kees Jan
PS. Oh, and awk(1) (http://www.freebsd.org/cgi/man.cgi?query=awk) is your friend when it comes to automatically stripping uninteresting threads from the dump. :-)