Encounter.getObs() leads to unresponsive OpenMRS in EncounterEventListener with bulk encounter POSTs


Hello everyone,

In our project, we are using the openmrs-event-module to listen for encounter creation events and to perform some work for a specific encounter type in a daemon thread. Below is all the code we have in the onMessage of the EventListener:

@Override
public void onMessage(Message message) {
	Daemon.runInDaemonThread(new Runnable() {
		@Override
		public void run() {
			try {
				String uuid = ((MapMessage) message).getString("uuid");
				Encounter encounter = encounterService.getEncounterByUuid(uuid);
				System.out.println(encounter.getObs().size());
			} catch (JMSException e) {
				// MapMessage.getString declares the checked JMSException
				throw new RuntimeException(e);
			}
		}
	}, daemonToken);
}

We run into issues when we make many POST requests to create encounters in bulk (we are migrating an existing system). Creation stops after 50 encounters and all subsequent requests fail with a SocketException. The refapp UI stops responding as well.

If we delete the last line of the method, i.e. the call to encounter.getObs(), everything goes smoothly. Any idea what might be going wrong? We have not been able to figure it out.

Versions of the platform and modules:

<openMRSVersion>2.3.0</openMRSVersion>
<eventVersion>2.7.0</eventVersion>

CC: @mksd @wyclif @dkayiwa @mogoodrich @rrameshbtech @premnisha

This looks to be a database connection leak. Once I have posted all the encounters, I see the number of connected threads is 51:

mysql> show status where `variable_name` = 'Threads_connected';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 51    |
+-------------------+-------+
1 row in set (0.00 sec)

The output of show processlist shows that a lot of them have the command Sleep:

mysql> show processlist;
+-----+---------+------------------+---------+---------+------+----------+------------------+
| Id  | User    | Host             | db      | Command | Time | State    | Info             |
+-----+---------+------------------+---------+---------+------+----------+------------------+
|  54 | root    | localhost        | openmrs | Query   |    0 | starting | show processlist |
|  55 | openmrs | 172.31.0.3:37402 | openmrs | Sleep   | 2185 |          | NULL             |
|  63 | openmrs | 172.31.0.3:37548 | openmrs | Sleep   |  166 |          | NULL             |
|  65 | openmrs | 172.31.0.3:37598 | openmrs | Sleep   |  165 |          | NULL             |
|  66 | openmrs | 172.31.0.3:37600 | openmrs | Sleep   |  165 |          | NULL             |
|  68 | openmrs | 172.31.0.3:37604 | openmrs | Sleep   |  165 |          | NULL             |
|  69 | openmrs | 172.31.0.3:37658 | openmrs | Sleep   |  165 |          | NULL             |
|  70 | openmrs | 172.31.0.3:37664 | openmrs | Sleep   |  165 |          | NULL             |
|  71 | openmrs | 172.31.0.3:37666 | openmrs | Sleep   |  165 |          | NULL             |
|  72 | openmrs | 172.31.0.3:37676 | openmrs | Sleep   |  165 |          | NULL             |
|  73 | openmrs | 172.31.0.3:37682 | openmrs | Sleep   |  165 |          | NULL             |
|  74 | openmrs | 172.31.0.3:37684 | openmrs | Sleep   |  165 |          | NULL             |
|  75 | openmrs | 172.31.0.3:37694 | openmrs | Sleep   |  165 |          | NULL             |
|  76 | openmrs | 172.31.0.3:37700 | openmrs | Sleep   |  165 |          | NULL             |
|  77 | openmrs | 172.31.0.3:37706 | openmrs | Sleep   |  164 |          | NULL             |
|  78 | openmrs | 172.31.0.3:37708 | openmrs | Sleep   |  164 |          | NULL             |
|  79 | openmrs | 172.31.0.3:37718 | openmrs | Sleep   |  165 |          | NULL             |
|  80 | openmrs | 172.31.0.3:37720 | openmrs | Sleep   |  164 |          | NULL             |
|  81 | openmrs | 172.31.0.3:37730 | openmrs | Sleep   |  164 |          | NULL             |
|  82 | openmrs | 172.31.0.3:37732 | openmrs | Sleep   |  164 |          | NULL             |
|  83 | openmrs | 172.31.0.3:37742 | openmrs | Sleep   |  164 |          | NULL             |
|  84 | openmrs | 172.31.0.3:37748 | openmrs | Sleep   |  164 |          | NULL             |
|  85 | openmrs | 172.31.0.3:37758 | openmrs | Sleep   |  164 |          | NULL             |
|  86 | openmrs | 172.31.0.3:37760 | openmrs | Sleep   |  164 |          | NULL             |
|  87 | openmrs | 172.31.0.3:37762 | openmrs | Sleep   |  164 |          | NULL             |
|  88 | openmrs | 172.31.0.3:37772 | openmrs | Sleep   |  164 |          | NULL             |
|  89 | openmrs | 172.31.0.3:37774 | openmrs | Sleep   |  163 |          | NULL             |
|  90 | openmrs | 172.31.0.3:37776 | openmrs | Sleep   |  163 |          | NULL             |
|  91 | openmrs | 172.31.0.3:37790 | openmrs | Sleep   |  163 |          | NULL             |
|  92 | openmrs | 172.31.0.3:37796 | openmrs | Sleep   |  163 |          | NULL             |
|  93 | openmrs | 172.31.0.3:37798 | openmrs | Sleep   |  163 |          | NULL             |
|  94 | openmrs | 172.31.0.3:37808 | openmrs | Sleep   |  163 |          | NULL             |
|  95 | openmrs | 172.31.0.3:37810 | openmrs | Sleep   |  163 |          | NULL             |
|  96 | openmrs | 172.31.0.3:37820 | openmrs | Sleep   |  163 |          | NULL             |
|  97 | openmrs | 172.31.0.3:37826 | openmrs | Sleep   |  163 |          | NULL             |
|  98 | openmrs | 172.31.0.3:37832 | openmrs | Sleep   |  163 |          | NULL             |
|  99 | openmrs | 172.31.0.3:37838 | openmrs | Sleep   |  163 |          | NULL             |
| 100 | openmrs | 172.31.0.3:37844 | openmrs | Sleep   |  163 |          | NULL             |
| 101 | openmrs | 172.31.0.3:37850 | openmrs | Sleep   |  163 |          | NULL             |
| 102 | openmrs | 172.31.0.3:37852 | openmrs | Sleep   |  163 |          | NULL             |
| 103 | openmrs | 172.31.0.3:37862 | openmrs | Sleep   |  163 |          | NULL             |
| 104 | openmrs | 172.31.0.3:37868 | openmrs | Sleep   |  162 |          | NULL             |
| 105 | openmrs | 172.31.0.3:37874 | openmrs | Sleep   |  162 |          | NULL             |
| 106 | openmrs | 172.31.0.3:37880 | openmrs | Sleep   |  162 |          | NULL             |
| 107 | openmrs | 172.31.0.3:37882 | openmrs | Sleep   |  162 |          | NULL             |
| 108 | openmrs | 172.31.0.3:37884 | openmrs | Sleep   |  162 |          | NULL             |
| 109 | openmrs | 172.31.0.3:37898 | openmrs | Sleep   |  162 |          | NULL             |
| 110 | openmrs | 172.31.0.3:37908 | openmrs | Sleep   |  162 |          | NULL             |
| 111 | openmrs | 172.31.0.3:37910 | openmrs | Sleep   |  162 |          | NULL             |
| 112 | openmrs | 172.31.0.3:37916 | openmrs | Sleep   |  162 |          | NULL             |
| 113 | openmrs | 172.31.0.3:37922 | openmrs | Sleep   |  162 |          | NULL             |
+-----+---------+------------------+---------+---------+------+----------+------------------+

Could the reason be that the daemon thread is still holding on to a database connection, or something like that?

Good Morning Everyone,

Here are some more updates. If I replace the line

encounter.getObs()

with the following

obsService.getObservations(asList(encounter.getPatient()), asList(encounter), null, null, null, null, null,
		    null, null, null, null, false)

This basically searches for the obs of the given patient and the given encounter.

This workaround seems to work fine. After posting 100 encounters, the number of connected threads is just 9:

mysql> show status where `variable_name` = 'Threads_connected';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 9     |
+-------------------+-------+
1 row in set (0.01 sec)

Any thoughts on what might be going wrong here? Do we know of any problems that might occur if we go ahead with the above fix?

As of now this workaround is working for us, but at a broader level we wanted to know what the issue is and why it does not work as expected with the first approach.

Thank you

Great troubleshooting @mddubey!

I don’t know for sure what happens. Clearly Encounter#getObs() relies on Hibernate’s lazy loading of the related Obs within the Encounter, while ObsService#getObservations(..) goes and fetches them afresh.

@dkayiwa @wyclif, any ideas?

@mddubey can you share a bit of your server side log?

Hello @dkayiwa,

This is what the logs look like for one of the encounters while the tool is posting encounters:

INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,227| In method EncounterService.saveEncounter. Arguments: Encounter=Encounter: [(no ID) Tue Jul 30 00:00:00 UTC 2019 encounter.basicServicePlanForm (no Location) 39 encounter.basicServicePlanForm num Obs: [obs id is null, obs id is null] num Orders: 0 ],
INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,238| In method VisitService.saveVisit. Arguments: Visit=Visit #null,
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,241| Exiting method saveVisit
INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,247| In method ObsService.saveObs. Arguments: Obs=obs id is null, String=null,
INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,250| In method ObsService.saveObs. Arguments: Obs=obs id is null, String=null,
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,253| Exiting method saveObs
INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,254| In method ObsService.saveObs. Arguments: Obs=obs id is null, String=null,
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,256| Exiting method saveObs
INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,257| In method ObsService.saveObs. Arguments: Obs=obs id is null, String=null,
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,259| Exiting method saveObs
INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,260| In method ObsService.saveObs. Arguments: Obs=obs id is null, String=null,
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,262| Exiting method saveObs
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,263| Exiting method saveObs
INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,265| In method ObsService.saveObs. Arguments: Obs=obs id is null, String=null,
INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,268| In method ObsService.saveObs. Arguments: Obs=obs id is null, String=null,
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,270| Exiting method saveObs
INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,272| In method ObsService.saveObs. Arguments: Obs=obs id is null, String=null,
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,274| Exiting method saveObs
INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,276| In method ObsService.saveObs. Arguments: Obs=obs id is null, String=null,
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,279| Exiting method saveObs
INFO - LoggingAdvice.invoke(116) |2020-03-23 14:57:01,280| In method ObsService.saveObs. Arguments: Obs=obs id is null, String=null,
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,282| Exiting method saveObs
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,283| Exiting method saveObs
INFO - LoggingAdvice.invoke(156) |2020-03-23 14:57:01,315| Exiting method saveEncounter
8

The above is one chunk from my console, where all lines barring the last one relate to saveEncounter/saveObs, and the last one is from my listener. Since it is a direct sysout, it just prints the number.

I am mostly interested in the bits of the log that contain the SocketException.

Hello Daniel,

As far as I understand the problem, the requests are not reaching the server at all, so there are no logs on the server.

Theoretically, it feels like the server does not get a socket to accept any new connection because all the existing ones are occupied. So a Java client reports a ‘SocketException’ after a timeout.

In a Node.js client I am getting something like ‘ERRCONNRCT’.

Let me know if that clarifies the issue.

Oh I see!

And the stopping at 50 encounters rhymes with https://github.com/openmrs/openmrs-core/blob/2.3.0/api/src/main/resources/hibernate.default.properties#L16
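For anyone following along: that line caps the Hibernate connection pool. Assuming the property at that line is the c3p0 pool size (an assumption — verify the exact name in the linked file), it can be raised via the runtime properties, e.g.:

```properties
# Assumed override in the OpenMRS runtime properties file; the property
# name is taken from the linked hibernate.default.properties and should
# be verified there before relying on it.
hibernate.c3p0.max_size=500
```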

What happens when you replace: System.out.println(encounter.getObs().size()); with just: encounter.getObs().size(); ?

That is, remove the System.out.println


Hello @dkayiwa,

We have actually already tried increasing that property to 500, and then all the encounters are created. By the end of the 100 encounters, the number of connected threads varies somewhere between 105 and 110.

What happens when you replace: System.out.println(encounter.getObs().size()); with just: encounter.getObs().size(); ?

Regarding the above: our actual use case is not doing a System.out.println but filtering the obs based on a certain concept. In this post I have put the simplest reproducible case.
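To illustrate, the filtering looks roughly like this (a sketch only, not our exact code; targetConceptUuid is a placeholder, not a real identifier):

```java
// Sketch of the real use case: filter the encounter's obs by concept.
// "targetConceptUuid" is a placeholder variable for illustration.
Set<Obs> matching = new HashSet<>();
for (Obs candidate : encounter.getObs()) {
	if (targetConceptUuid.equals(candidate.getConcept().getUuid())) {
		matching.add(candidate);
	}
}
```

The loop over encounter.getObs() is what triggers the lazy load, so the sysout in the simplified example reproduces the same behaviour.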

Note: After replacing encounter.getObs with ObsService.getObservations, we started getting the same issue for concept.getMapping. That also gets solved if we replace concept.getMapping with ConceptService.getConceptByMapping, so it feels like this problem occurs whenever there is a lazy load of a joined table.

Additionally, marking the method which does all this work as @Transactional also solves the issue.
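For clarity, the shape of that fix is roughly the following (a minimal sketch with made-up names; Spring's default propagation for @Transactional is REQUIRED):

```java
// Hypothetical Spring-managed service method; names are illustrative.
// With @Transactional, the whole method shares one transaction/session,
// so the lazy loads below find an open Hibernate session.
@Transactional
public void handleEncounter(String uuid) {
	Encounter encounter = encounterService.getEncounterByUuid(uuid);
	int obsCount = encounter.getObs().size(); // lazy load works here
}
```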


Does the problem still happen if you do not do this in a thread?

Hello @dkayiwa,

When I try running it without a daemon thread, the listener fails with the error below:

ERROR - ActiveMQMessageConsumer.dispatch(1233) |2020-03-24 02:32:23,140| ID:709fe0f2807d-36885-1585016958839-2:1:37:1 Exception while processing message: ID:709fe0f2807d-36885-1585016958839-2:1:165:12:1
org.openmrs.api.APIException: A user context must first be passed to setUserContext()...use Context.openSession() (and closeSession() to prevent memory leaks!) before using the API

This happens on the line where I call encounterService.getEncounterByUuid. I suppose this is expected, because there is no context set when we listen to the encounter-create event.

Let us know if we are missing anything. Every example I see for an EventListener seems to run using a daemon thread.
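For reference, the error message itself suggests the pattern for running outside a daemon thread: open and close a session around the work. A rough sketch (assumed and untested; authentication/privileges would still need to be handled, which is presumably why every example uses the daemon thread instead):

```java
@Override
public void onMessage(Message message) {
	try {
		// Assumed pattern taken from the error message: open a session so
		// the service layer has a context; privileges still need handling.
		Context.openSession();
		String uuid = ((MapMessage) message).getString("uuid");
		Encounter encounter = encounterService.getEncounterByUuid(uuid);
		System.out.println(encounter.getObs().size());
	} catch (JMSException e) {
		throw new RuntimeException(e);
	} finally {
		Context.closeSession(); // as the error message warns, prevents leaks
	}
}
```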

@mddubey the fact that splitting the queries into separate transactions seems to make the session issue go away is a great clue.

I wonder whether you would be able to work around the problem by setting the transaction propagation type accordingly at the top level of your logic.

If you find the “right” type and it solves your issue, great; even though that would not explain why the issue happens.

I’ll discuss this with @wyclif later today.


Hello @mksd

Thanks for the response. We have been able to work around it by putting @Transactional at the very start of our logic, with the default propagation type, which is REQUIRED.

Though we are still not sure what is happening under the hood.

@mddubey what is the easiest way of reproducing this problem locally?

Hello @dkayiwa

I am able to reproduce this problem with the steps below:

  • Pick an existing encounter_uuid from your database.
  • Write the code below at the very start of your activator (DaemonTokenAware):
EncounterService encounterService = Context.getEncounterService();
for (int i = 0; i < 100; i++) {
	System.out.println("Running for " + i);
	int finalI = i;
	Daemon.runInDaemonThread(new Runnable() {
		@Override
		public void run() {
			Encounter encounter = encounterService.getEncounterByUuid(<uuid-selected>);
			Set<Obs> obs = encounter.getObs();
			System.out.println("Done for " + finalI);
		}
	}, daemonToken);
}

  • Deploy the module and restart.
  • Watch the logs.

On my local machine it printed “Running for <i>” for all 100 threads and “Done for <finalI>” for some 50 of them, whichever it picks. After that there are no logs. There is no response from the UI. The number of Threads_connected to the DB is 51.

Please find the log file below.

logs.txt (2.0 KB)
