
Out of Memory


roundqube
05-01-2010, 16:17
http://java-monitor.com/postedimages/64faac28-226b-4d76-85ed-9b2d10dbecdf.png

I have to restart the Openfire service (3.6.3) every morning at 3am via a cron job, because it keeps eating up memory until it runs out. I have 489MB allocated to the Java heap and this is running on CentOS 5.3.

Can anyone tell me how to begin troubleshooting this? Graph attached.

roundqube
05-01-2010, 16:33
http://java-monitor.com/postedimages/8eb6a32b-a9ab-4aee-a732-ccbfd288c834.png

Is this graph depicting my ongoing issue of running out of memory every few days? We have about 200 people on Spark (Openfire) and I have to restart the service every few days.

kjkoster
05-01-2010, 20:05
Dear roundqube,

From this graph and from the GC times you posted, I am not worried at all. 400ms pause times are OK, and you only use about 15% of the memory in the graph below.

I think it would help if you would show us the graph from when things went wrong. Either when the memory is completely full, or the GC times run over (say) 1.5 seconds.
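For readers who want the same two numbers (heap occupancy and GC time) from code, here is a minimal sketch using the standard java.lang.management API. It reports on the JVM it runs in, not on a remote Openfire process, and the class name GcReport is only an illustration. Note that getCollectionTime() is cumulative elapsed collection time, so time per collection is just a rough stand-in for pause length.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Sketch: heap occupancy and cumulative GC statistics of the JVM this code runs in.
final class GcReport {
    // Percentage of the maximum heap currently in use, or -1 if the JVM reports no maximum.
    static double heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max > 0 ? 100.0 * heap.getUsed() / max : -1;
    }

    // Average accumulated collection time per collection for one collector.
    // Only a rough proxy for pause length: concurrent collectors do part of
    // their work without stopping the application.
    static double averageMillisPerCollection(GarbageCollectorMXBean gc) {
        long count = gc.getCollectionCount();
        long time = gc.getCollectionTime();
        return (count > 0 && time >= 0) ? (double) time / count : 0.0;
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%% of max%n", heapUsedPercent());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total, %.1f ms per collection%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime(),
                    averageMillisPerCollection(gc));
        }
    }
}
```

A steadily rising "heap used" figure across full collections is the pattern to watch for; a single high reading on its own says little.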

Also, what command line arguments are you using? Any memory or GC tuning settings in there?

Kees Jan

roundqube
05-01-2010, 20:08
command line arguments:
/opt/openfire/jre/bin/java -server -DopenfireHome=/opt/openfire -Dopenfire.lib.dir=/opt/openfire/lib -classpath /opt/openfire/lib/startup.jar -jar /opt/openfire/lib/startup.jar

No memory or gc tuning. Not sure where/what I need to tune.

kjkoster
05-01-2010, 20:09
Dear roundqube,

The command line looks sane to me too.

You say you have to restart the service. Why do you have to do that? Any errors or exceptions and stack traces in the logs?

Kees Jan

roundqube
05-01-2010, 20:10
I will disable the restart of Spark tonight so I can gather more statistics to view tomorrow.

roundqube
05-01-2010, 20:12
I have to restart the service; otherwise the memory keeps growing until the service stops responding. I have disabled the restart for tonight so I can gather more statistics for you to view tomorrow. Usually, with our user base, it takes about 2-3 days before the service stops due to lack of memory.

kjkoster
05-01-2010, 20:13
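Dear roundqube,

Please take a heap dump as part of restarting the service and then load that dump into a memory analyser. Check out jmap for that: jmap -dump:format=b,file=<file> <pid> writes the dump, whereas jmap -heap only prints a summary of the heap.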

You might want to consider adding -XX:+HeapDumpOnOutOfMemoryError (http://java-monitor.com/forum/showthread.php?t=273) to the command line options and then *not* restarting it automatically, but letting it run out of memory. That way you get a heap dump when the problem is largest.
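If taking a dump from outside with jmap is awkward, a HotSpot JVM can also write one from inside the process. The sketch below assumes a Sun/Oracle HotSpot VM (it uses com.sun.management.HotSpotDiagnosticMXBean, which other vendors may not provide) and dumps the JVM it runs in, so for Openfire it would only help if called from code running inside the server, such as a plugin. It also checks whether -XX:+HeapDumpOnOutOfMemoryError actually took effect. HeapDumper and its method names are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Sketch: write a heap dump of the current (HotSpot) JVM, e.g. just before a planned restart.
final class HeapDumper {
    private static final String HOTSPOT_BEAN = "com.sun.management:type=HotSpotDiagnostic";

    static HotSpotDiagnosticMXBean hotspot() throws IOException {
        return ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(), HOTSPOT_BEAN,
                HotSpotDiagnosticMXBean.class);
    }

    // The target file must not exist yet; some JDK versions also require a .hprof suffix.
    // live=true dumps only reachable objects, which triggers a full GC first.
    static File dump(String path, boolean live) throws IOException {
        hotspot().dumpHeap(path, live);
        return new File(path);
    }

    // True if -XX:+HeapDumpOnOutOfMemoryError is active in this JVM.
    static boolean dumpOnOomeEnabled() throws IOException {
        return Boolean.parseBoolean(
                hotspot().getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "heap-" + System.currentTimeMillis() + ".hprof";
        System.out.println("HeapDumpOnOutOfMemoryError enabled: " + dumpOnOomeEnabled());
        File f = dump(path, true);
        System.out.println("wrote " + f.getAbsolutePath() + " (" + f.length() + " bytes)");
    }
}
```

The resulting .hprof file can be opened in the same memory analyser as a jmap dump.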

Kees Jan

admin
05-01-2010, 20:17
Merged threads.

roundqube
05-01-2010, 20:39
I'm assuming that for my Openfire server I should edit /etc/sysconfig/openfire and add the following?

OPENFIRE_OPTS=-XX:+HeapDumpOnOutOfMemoryError

What does the -XX mean? Should I tweak the memory as well? There are only DNS (internal), DHCP and OpenFire running on this server, it has 2G of memory. 490MB allocated to the JVM.

kjkoster
05-01-2010, 20:58
Dear roundqube,

I am a FreeBSD dude, so I don't know where to add the flags on a Linux machine. At any rate, I usually use ps(1) (http://www.freebsd.org/cgi/man.cgi?query=ps) to see if the flags were added to the command line properly. On FreeBSD I would do the following to see if my Java app has the right flags. Check your ps(1) man page for the correct syntax on Linux.

$ ps -alxwww | grep java
1004 37727 37724 0 -8 0 3340 988 piperd S+ p0 0:00.00 grep java
1004 89302 1 0 44 0 139512 38264 ucond S p0- 2:49.33 /usr/local/jdk1.6.0/bin/java -Djava.util.logging.config.file=/home/collector/collector/apache-tomcat-6.0.20/conf/logging.properties -Djava.awt.headless=true -server -Xmx24M -XX:MaxPermSize=32M -XX:+HeapDumpOnOutOfMemoryError -Dsun.net.inetaddr.ttl=3600 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/home/collector/collector/apache-tomcat-6.0.20/endorsed -classpath :/home/collector/collector/apache-tomcat-6.0.20/bin/bootstrap.jar -Dcatalina.base=/home/collector/collector/apache-tomcat-6.0.20 -Dcatalina.home=/home/collector/collector/apache-tomcat-6.0.20 -Djava.io.tmpdir=/home/collector/collector/apache-tomcat-6.0.20/temp org.apache.catalina.startup.Bootstrap start
$ _


The -X marker means that this is a non-standard command line flag, so it is specific to the Sun JVM implementation. If you were to switch to an IBM or Oracle JVM, you'd have to review all of the -X flags to see how that VM is configured for the same thing.

The -XX flag means that Sun is not giving any guarantees that that flag will exist in future versions of their VM. So use them while they work. :-)
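The ps(1) check can also be done from inside a JVM. This sketch reads the arguments the running VM received via RuntimeMXBean.getInputArguments() and labels each one using the -X / -XX distinction described above. The labels are this example's own wording, not official categories, and getInputArguments() does not include the main class or the application's own arguments. JvmFlags is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the arguments this JVM was started with and label each one
// using the -X / -XX distinction. The labels are this example's own wording.
final class JvmFlags {
    static String family(String arg) {
        if (arg.startsWith("-XX:")) return "-XX (may disappear in future VMs)";
        if (arg.startsWith("-X")) return "-X (non-standard, VM-specific)";
        return "standard option or -D property";
    }

    static List<String> describe(List<String> args) {
        List<String> out = new ArrayList<String>();
        for (String arg : args) {
            out.add(family(arg) + ": " + arg);
        }
        return out;
    }

    public static void main(String[] ignored) {
        // Arguments passed to the JVM itself; excludes the main class and its arguments.
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String line : describe(args)) {
            System.out.println(line);
        }
    }
}
```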

Kees Jan

roundqube
06-01-2010, 16:17
I left my Spark server running and now it's showing 161MB used. This will continue to climb until the service is restarted. Can you take a look at the graphs now and help me figure out if there is an obvious issue?

Before my next restart, I'm going to add the options for heap dumping to my service before it starts.

roundqube
06-01-2010, 19:49
Threads continue to climb as well. We only have 40-50 active sessions during the day and between 20-30 at night. Image is attached.

kjkoster
06-01-2010, 22:53
Dear roundqube,

Ahhh, now that is an interesting data point. You have a thread leak somewhere. I got the advice from an Openfire admin to take a thread dump of this Openfire and then look for threads that are named "client-<something>".
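Spotting a leak by thread name comes down to counting threads per name pattern. A minimal sketch, assuming thread names end in a running number (as in "client-12"): it strips the trailing number and counts per remaining prefix. The main method inspects the JVM it runs in; for Openfire itself you would feed in names taken from a jstack dump instead. ThreadFamilies is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: count live threads by name "family", so hundreds of "client-..." threads stand out.
final class ThreadFamilies {
    // "client-12" -> "client", "pool-1-thread-3" -> "pool-1-thread"
    static String family(String threadName) {
        return threadName.replaceAll("[-_ #]*\\d+$", "");
    }

    static Map<String, Integer> countByFamily(Iterable<String> names) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String name : names) {
            String key = family(name);
            Integer c = counts.get(key);
            counts.put(key, c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        for (Map.Entry<String, Integer> e : countByFamily(names).entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```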

Do you have any custom components or plugins in the server? What plugins are you using?

If you have no experience with thread dumps, have a look at http://java-monitor.com/forum/showthread.php?t=317 and http://java-monitor.com/forum/showthread.php?t=616 where I discuss thread dumps in some detail. While not fully tuned to your situation, I'm sure they will put you on the right track.

Let us know what you find.

Kees Jan

PS. No, I cannot look at the graphs of your servers. They are private to you, as they should be (I think). If you see something interesting you can post snapshots on the forum, as you have been doing.

I appreciate it is a bit more work for you to work this way. The advantage of this is that future forum visitors will see a complete record of our conversation, with graphs, and compare that to their own situation.

kjkoster
10-01-2010, 20:20
Dear roundqube,

Did you find anything in the thread dumps?

Kees Jan

roundqube
10-01-2010, 20:57
I was unable to find any more information. I ran jstack against my Openfire pid, and its output contains nothing with "client-<something>" for me to look into further. I think the same output is dumped into nohup.out by default by the Openfire server, so you don't have to run jstack manually. In any case, I did run it just to be sure, and it had the same data as the nohup.out file, which did not contain any "client-<something>" info either.

I even set xmpp.pep.enabled = false because I read it might have something to do with this. When I disabled it, my server graphs changed a bit: now there is much less mark-sweep GC and a steady scavenge collection. However, my heap memory is still growing. It's currently at 300MB after 3 days of running. Thread count is at 2700! I followed the Java debugging tutorial but could not find out much.

As a last resort, I switched to the JVM installed on my machine rather than the one that comes with Openfire and still no luck.

I'll post some actual statistics when I get to work tomorrow.

roundqube
11-01-2010, 22:29
Here are the graphs I spoke about yesterday.

kjkoster
12-01-2010, 08:55
Dear roundqube,

Well, the graphs just underline clearly that you have a memory leak. Since the thread count is rising, that is a likely candidate as the root cause.
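Why would a rising thread count show up as memory growth? A hypothetical, much simplified model: every incoming packet gets a thread that waits for a reply that never arrives. This is not Smack's or Openfire's actual code, only an illustration of the pattern. Each stuck thread keeps everything it references reachable, so the heap cannot shrink, and it also holds on to its own native stack memory. LeakModel and handlePacket are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical model of a thread leak: one thread per "packet", each waiting for a
// reply that never comes. Not Smack or Openfire code; illustration only.
final class LeakModel {
    // A latch that is never counted down stands in for the missing reply.
    private static final CountDownLatch NEVER = new CountDownLatch(1);

    static Thread handlePacket(int id) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    NEVER.await(); // parks forever; the thread and what it references stay alive
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "listener-" + id);
        t.setDaemon(true); // daemon only so this demo JVM can exit
        t.start();
        return t;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 500; i++) {
            handlePacket(i);
            if (i % 100 == 0) {
                System.out.println(i + " packets, live threads (estimate): " + Thread.activeCount());
            }
        }
    }
}
```

In a thread dump, threads stuck like this pile up with identical stack traces, which is exactly what the next step looks for.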

If you make a thread dump, you should see a massive number of threads that are in the waiting state. Of the threads that are in waiting state, read through their stack traces. After a few, you will see that almost all of them have the same stack trace. If you could please post the stack trace that occurs the most number of times, we can perhaps help you analyse it.
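The procedure above (take all waiting threads, find the stack trace that repeats most) can be sketched in Java with ThreadMXBean.dumpAllThreads(). It inspects the JVM it runs in, so for a separate Openfire process jstack remains the tool; CommonStacks is a hypothetical name. Also note that jstack -F reports VM-internal states such as IN_NATIVE and BLOCKED, which do not map one-to-one onto java.lang.Thread.State.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: find the stack trace shared by the most threads in the given states.
final class CommonStacks {
    static Map.Entry<String, Integer> mostCommon(ThreadInfo[] infos, Thread.State... states) {
        List<Thread.State> wanted = Arrays.asList(states);
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (ThreadInfo info : infos) {
            if (info == null || !wanted.contains(info.getThreadState())) continue;
            StringBuilder sb = new StringBuilder();
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            String key = sb.toString();
            Integer c = counts.get(key);
            counts.put(key, c == null ? 1 : c + 1);
        }
        Map.Entry<String, Integer> best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) best = e;
        }
        return best; // null if no thread was in one of the requested states
    }

    public static void main(String[] args) {
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        Map.Entry<String, Integer> top = mostCommon(infos,
                Thread.State.WAITING, Thread.State.TIMED_WAITING, Thread.State.BLOCKED);
        if (top == null) {
            System.out.println("no waiting threads");
        } else {
            System.out.println(top.getValue() + " threads share this stack:\n" + top.getKey());
        }
    }
}
```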

Do you have any Openfire plugins installed?

Kees Jan

roundqube
12-01-2010, 16:17
How do I do a thread dump? I may already have this. I ran a jstack against the pid. I only have threads in IN_NATIVE or BLOCKED state. Attached are the logs.

Plugins installed:
Asterisk-IM Openfire Plugin
Broadcast
Client Control
Email Listener
Fastpath Service
Fastpath Webchat
Monitoring Service
MotD (Message of the Day)
SIP Phone Plugin
Search
java-monitor

kjkoster
19-01-2010, 21:19
Dear roundqube,

That thread dump does not show the 2700 threads you spoke of earlier. Take one when you are close to restarting.

And indeed, you can run jstack to get the thread dump. You should get a much bigger file if you have 2700 threads, though.

If there are any plugins that you can live without for a while, try running without them for a few days. Maybe one of them has a thread leak.

Kees Jan

roundqube
19-01-2010, 21:41
I will do that. However, the reason my graphs don't show those high thread counts anymore is that I've been restarting the service nightly; otherwise it dies every few days in the middle of the business day.

I've disabled restart for tonight so thread count can get high again and I'll run a jstack against the openfire pid tomorrow and post my results.

kjkoster
19-01-2010, 21:44
Dear roundqube,

Cool, awaiting your stack dump. :)

Kees Jan

roundqube
20-01-2010, 17:00
Attached is my stack dump.

kjkoster
20-01-2010, 22:52
Dear roundqube,

After a little experimenting I came to the selection below.


$ cat stackfile2.txt | fgrep -v 'java.lang.Object' | fgrep -A 1 'state =' | fgrep -v 'state =' | grep -v \^-- | sort | uniq -c
3 - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
1 - java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) @bci=0 (Interpreted frame)
5 - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
5 - java.util.TimerThread.mainLoop() @bci=201, line=509 (Compiled frame)
20 - org.jivesoftware.smack.PacketReader.processListeners(java.lang.Thread) @bci=82, line=302 (Compiled frame)
576 - org.jivesoftware.smack.PacketReader.processListeners(java.lang.Thread) @bci=82, line=302 (Interpreted frame)
4 - org.mortbay.thread.QueuedThreadPool$PoolThread.run() @bci=327, line=541 (Interpreted frame)
36 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
7 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Interpreted frame)
$ _


Basically, I check what most threads are blocked on, while ignoring Object.wait(). This shows that almost all of your threads are blocked in org.jivesoftware.smack.PacketReader.processListeners(). This means that something in your Openfire server is preventing it from processing all of the listeners that have registered for that packet. Or maybe a listener is supposed to reply and does not. I cannot tell from here.
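For anyone without fgrep and uniq at hand, the same counting can be sketched in Java. It assumes the jstack -F layout shown above, where each thread begins with a "(state = ...)" line followed by "- frame" lines, and it copies the pipeline's crude filter: any line mentioning java.lang.Object is dropped, then the first remaining line after each state line is counted (blank lines are skipped). TopFrames is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

// Sketch: Java version of the fgrep | sort | uniq -c pipeline for "jstack -F" output.
final class TopFrames {
    static Map<String, Integer> count(BufferedReader in) throws IOException {
        Map<String, Integer> counts = new TreeMap<String, Integer>(); // sorted, like sort | uniq -c
        boolean afterState = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.contains("java.lang.Object")) continue;   // fgrep -v 'java.lang.Object'
            if (line.contains("state =")) {                     // thread header line
                afterState = true;
                continue;
            }
            if (afterState) {                                   // fgrep -A 1 'state ='
                String frame = line.trim();
                if (!frame.isEmpty()) {
                    Integer c = counts.get(frame);
                    counts.put(frame, c == null ? 1 : c + 1);
                }
                afterState = false;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(args[0])); // e.g. stackfile2.txt
        try {
            for (Map.Entry<String, Integer> e : count(in).entrySet()) {
                System.out.printf("%6d %s%n", e.getValue(), e.getKey());
            }
        } finally {
            in.close();
        }
    }
}
```

Run as `java TopFrames stackfile2.txt`; each output line is a count followed by a distinct top frame, in the same spirit as the uniq -c listing above.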

This suggests that you are running into a bug somewhere in one of your plugins. (As I suggested before, minimize their number and try running without each of them for a while.)

Is there anything in the Openfire logs that might seem related? Any exceptions?

Kees Jan

guus
23-01-2010, 15:31
There appears to be a problem in your setup that involves code from the Smack library. Smack isn't used by Openfire itself, so I assume that you have a plugin that uses Smack internally. I have checked all of the plugins that ship with Openfire - none of them appear to use the Smack library.

One known plugin that uses Smack is the Kraken plugin - this is a third party plugin though.

Are you perhaps running code (either a plugin, or another type of extension) that you have written yourself and that uses Smack? If so, try disabling that code and see if your problem disappears.

On a side note: Smack is a client library; it is primarily intended to handle client-side connections. Typically, you wouldn't use this code on the server side (the Kraken plugin is rather exceptional in this case). I'm not sure why Smack is used in your environment, but chances are good that it can be replaced by something more effective.