We have had a deployment of OpenMRS running on an Ubuntu server for about 5 months. It has been running well.
We are using tomcat7.
This morning the CPU was maxed out and the web server was very sluggish. I logged in, checked top, and saw that Java was using 90-99% of the CPU on the server.
I restarted tomcat7, but this didn't solve the problem.
Rebooting the server did solve it.
Any thoughts on what could have caused this?
Here is the output of the top command.
Sifting through the logs from this morning, I found nothing of note before I tried to restart Tomcat, but I found these entries from after I restarted tomcat7:
SEVERE: IOException while loading persisted sessions: java.io.StreamCorruptedException: unexpected EOF in middle of data block
java.io.StreamCorruptedException: unexpected EOF in middle of data block
at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2539)
at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2615)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2820)
at java.io.ObjectInputStream.readInt(ObjectInputStream.java:971)
at java.util.HashMap.readObject(HashMap.java:1154)
at sun.reflect.GeneratedMethodAccessor504.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.catalina.session.StandardSession.readObject(StandardSession.java:1595)
at org.apache.catalina.session.StandardSession.readObjectData(StandardSession.java:1060)
at org.apache.catalina.session.StandardManager.doLoad(StandardManager.java:282)
at org.apache.catalina.session.StandardManager.load(StandardManager.java:202)
at org.apache.catalina.session.StandardManager.startInternal(StandardManager.java:489)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5476)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:632)
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1073)
at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1857)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-212" org.openmrs.api.APIException: contextDAO is null
at org.openmrs.api.context.Context.getContextDAO(Context.java:173)
at org.openmrs.api.context.Context.closeSession(Context.java:737)
at org.openmrs.api.context.Daemon$4.run(Daemon.java:215)
Apr 15, 2015 9:33:32 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/openmrs] appears to have started a thread named [Thread-2] but has failed to stop it. This is very likely to create a memory leak.
Apr 15, 2015 9:33:32 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/openmrs] created a ThreadLocal with key of type [org.springframework.core.NamedThreadLocal] (value [Transactional resources]) and a value of type [java.util.HashMap] (value [{org.hibernate.impl.SessionFactoryImpl@271e65fe=org.springframework.orm.hibernate3.SessionHolder@1594c6f6}]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
By analysing the logs, I suspect it could be a threading issue with HashMap usage.
Could you please mention the OpenMRS version and the modules you are using in your setup? That would help in tracking down the issue. Also, if possible, please submit the full Tomcat log.
I know it doesn't help you right now, but a few tricks have worked for me before to diagnose this kind of behaviour:
Take thread dumps of the Java process. If on Linux or Mac, send a kill -3 <pid>, say every 30 seconds. The thread dumps will appear in the Tomcat stdout log (catalina.out). A tool like https://java.net/projects/tda can help you investigate what the threads were actually doing or waiting for, and spot deadlocks. It's non-blocking, so it's quite alright to run it at any time, even while there's still a chance the server recovers.
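A minimal sketch of that loop, assuming Tomcat was started from the Ubuntu tomcat7 package and writes stdout to /var/log/tomcat7/catalina.out (adjust the PID lookup and path for your setup):

    PID=$(pgrep -f org.apache.catalina.startup.Bootstrap)
    for i in $(seq 1 10); do
        kill -3 "$PID"     # non-blocking: asks the JVM to print a thread dump
        sleep 30
    done

The dumps end up in catalina.out; load that file into TDA to compare them.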
It's also worth configuring the JVM to print information about its garbage collection. A useful set of options:
-XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintConcurrentLocks -verbose:gc. Every invocation of the garbage collector will print a line in the logs, and you can see whether it's doing a lot of full GCs without freeing any memory (I tend to call that 'GC death' or 'GC catalepsy').
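For reference, here is roughly how those options could be wired into an Ubuntu tomcat7 install; the file path is an assumption based on the stock package, and the -Xloggc and -XX:HeapDumpPath flags are my own additions so the output lands somewhere predictable:

    # /etc/default/tomcat7  (path assumes the stock Ubuntu tomcat7 package)
    JAVA_OPTS="${JAVA_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/tomcat7 \
        -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
        -Xloggc:/var/log/tomcat7/gc.log -XX:+PrintConcurrentLocks"
    # then restart Tomcat:  sudo service tomcat7 restart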
Hi Adam. I have seen similar behavior when reports are running. You might check whether any reports are running, and/or check the Tomcat log (catalina.out), should this happen again.
@cintiadr @arbaughj You can take a thread dump using jstack -l JAVA_PID > jstack.out. Also, if you are running JDK 1.7.0_50 or above, JFR dumps are very useful when analyzing these situations. If you can provide the thread dump or JFR recordings, it will be really helpful for analyzing the issue.
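For example (the PID lookup and file names are placeholders; the JFR part only works on Oracle JDK 7u40+ and only if the JVM was started with the commercial-features flags):

    PID=$(pgrep -f org.apache.catalina.startup.Bootstrap)   # or take the PID from top
    jstack -l "$PID" > jstack.out

    # JFR sketch: requires the JVM to have been started with
    #   -XX:+UnlockCommercialFeatures -XX:+FlightRecorder
    jcmd "$PID" JFR.start duration=120s filename=/tmp/openmrs.jfr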
Hi Adam,
It's a technical term and it just came to mind while looking at the log, so don't worry much about it. (Basically, HashMaps are used in the code.)
As suggested by everyone, what you could do now is follow these steps (before restarting the server or Tomcat) whenever this occurs again, so we can dig into the source of the problem.
In the stack dumps, look for the thread with the matching hex PID (see the sketch below).
After that you can report back with your findings or file a ticket on the OpenMRS JIRA. Otherwise it's very difficult to say exactly what is causing this problem. Besides an OpenMRS issue, it could also be an Ubuntu server configuration error, an issue with other installed applications, a Java/JRE problem, etc.
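A rough sketch of what matching the hex PID means in practice (the thread id 1337 and the file name jstack.out are just examples):

    top -H -p "$PID"                    # per-thread view; note the TID of the hottest thread
    printf '0x%x\n' 1337                # convert that TID to hex -> 0x539
    grep -A 20 'nid=0x539' jstack.out   # find the matching nid= entry in the stack dump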
Good job! But I think you have missed one step (the 4th step: getting a stack dump of the Java process).
What was the output of jstack PID — for example, in this case, jstack 1326?
UPDATE: Adam, it would be better to get multiple dumps at regular intervals. I have updated the instructions here: https://wiki.openmrs.org/display/docs/Troubleshooting+High+CPU+Utilization#Getting stack dump of Java process.
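Something like this gives a handful of dumps to compare (the count and interval are arbitrary):

    for i in 1 2 3 4 5; do
        jstack -l "$PID" > "jstack-$(date +%H%M%S).out"
        sleep 30
    done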
Looks like the heap is full, probably because of thrashing, or maybe the garbage collector cannot free objects fast enough and, instead of throwing an OutOfMemoryError, keeps trying to stay alive. If it's a production environment, 150 MB is very small; give it some GB. The more the merrier.
Heap
def new generation total 157248K, used 152685K [0x00000000d0000000, 0x00000000daaa0000, 0x00000000daaa0000)
eden space 139776K, 100% used [0x00000000d0000000, 0x00000000d8880000, 0x00000000d8880000)
Not anymore. Newer JVMs default to -server when they detect a 64-bit OS or more than a GB of RAM. I agree that what @lluismf suggested above is necessary for OpenMRS: a larger heap and more PermGen space. In addition, try clearing the logs. The StreamCorruptedException might be due to bad client requests, which could be another cause of the 100% CPU usage. Please come back and report what resolved the problem.
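For what it's worth, on the Ubuntu tomcat7 package the memory settings usually go in /etc/default/tomcat7; the numbers below are only a starting point and not an official OpenMRS recommendation, so size them to your server's RAM:

    # /etc/default/tomcat7  (example values, not a recommendation)
    JAVA_OPTS="${JAVA_OPTS} -Xms512m -Xmx2048m -XX:MaxPermSize=512m"
    # then: sudo service tomcat7 restart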