Multi-tenant architecture / feedback

I started working on a multi-tenant branch and was interested in feedback, but I’m not sure it’s ready for a pull request.

You can see the diff on GitHub, but it’s mostly just integrating Hibernate multi-tenancy.

It’s able to route to the correct database using the requested hostname. For example, if your domain is hospital.com and you point clinic1.hospital.com and clinic2.hospital.com at the same OpenMRS server, requests are routed to the same MySQL server but use the clinic1 or clinic2 database depending on the hostname.
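For illustration, the hostname-to-tenant mapping could be sketched in plain Java like this. This is a hypothetical helper, not the actual branch code; in the branch this logic would sit behind Hibernate’s `CurrentTenantIdentifierResolver`, and the default tenant name here is an assumption:

```java
// Hypothetical sketch of hostname-based tenant resolution.
public class TenantResolver {

    // Assumed name of the "primary" database (connection.database_name).
    static final String DEFAULT_TENANT = "openmrs";

    // Map "clinic1.hospital.com" -> "clinic1"; fall back to the primary
    // tenant when the host has no subdomain (e.g. "hospital.com").
    public static String tenantFromHost(String host) {
        if (host == null) {
            return DEFAULT_TENANT;
        }
        String[] labels = host.split("\\.");
        return labels.length > 2 ? labels[0] : DEFAULT_TENANT;
    }

    public static void main(String[] args) {
        System.out.println(tenantFromHost("clinic1.hospital.com")); // clinic1
        System.out.println(tenantFromHost("hospital.com"));         // openmrs
    }
}
```

Hibernate then uses the resolved tenant identifier to pick the JDBC connection for that database on each request.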

All “tenants” (clinic1 and clinic2 in the example) have to use the same modules, but can have different locations, concepts, patients, and users.

When the server starts up, one of the tenants is the “primary” (or preferred) tenant, for which you would create a fresh OpenMRS database. This “primary” database name is specified in ‘connection.database_name’ and is used while OpenMRS is starting, so it will read some of the global properties from that database.

Database updates are completely disabled. The thinking here is that since the core war & modules are shared by all tenants, upgrades would need to be coordinated across all tenants, so I expect a slower-moving release cycle. When an upgrade is needed, one solution would be to copy the “fresh” database to a new instance and run the new core war & modules against it on a staging server (with multi-tenancy disabled so the upgrades can run). Then use an offline tool like Liquibase diff to create an upgrade script that could be applied to each of the databases and run at the same time.
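That staging-diff workflow might look roughly like this on the command line. This is only a sketch using the standard Liquibase CLI; the database names, credentials, hostnames, and changelog file name are all placeholders:

```shell
# 1. Copy the current "fresh" database to a staging instance, then run the
#    upgraded core war + modules against it (multi-tenancy disabled) so that
#    Liquibase changesets and module activators apply their changes there.

# 2. Diff the upgraded staging schema against an untouched tenant database
#    to generate a changelog that captures just the upgrade delta:
liquibase \
  --url="jdbc:mysql://localhost:3306/clinic1" \
  --username=openmrs --password=secret \
  --referenceUrl="jdbc:mysql://staging:3306/openmrs_upgraded" \
  --referenceUsername=openmrs --referencePassword=secret \
  --changeLogFile=tenant-upgrade.xml \
  diffChangeLog

# 3. Apply the generated changelog to each tenant database in turn:
for db in clinic1 clinic2; do
  liquibase --url="jdbc:mysql://localhost:3306/$db" \
    --username=openmrs --password=secret \
    --changeLogFile=tenant-upgrade.xml \
    update
done
```

The loop in step 3 is where the per-tenant coordination happens; in practice you would schedule it during a maintenance window so all tenants upgrade together.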

While Liquibase is disabled for core & modules, some modules use their module activator to make data changes. The module activators still run at startup, but only against the “primary tenant” database. This is another reason to upgrade modules on a staging server, determine the changes, and create an update script for each tenant. This may seem like a lot of work, but again, my thinking is that these deployments would focus on stable long-term-support versions rather than the bleeding edge.

I tested most of the modules that come with the ref app and they seem to work in multi-tenant mode; i.e., they don’t cache things globally and they use Hibernate correctly.

My overall goal in working on multi-tenancy is to build a free hosting solution for hospitals and clinics that don’t want to put a big investment into hosting & operations. If we can host each OpenMRS distribution on a ‘beefy’ server in the cloud, then it should be rather easy/low-cost to add individual tenants.

I welcome any feedback / suggestions.

Hi @atlcto,

This is very interesting! Thanks for sharing your WIP. Btw the link was not pointing to your branch, so I’m pasting it here again for others.

Quite amazing, and super exciting to read that you have actually tested this!

I don’t really understand your remark here. Since all tenants run under the same artifacts, they should go through the Liquibase changesets that come with each artifact’s upgrade. Disabling Liquibase changesets will most likely just break things.

About the activators: they don’t play the same role at all. They don’t introduce schema changes; they may create/update metadata that is required by the module. IMO they should just run as well.

I can see this combining nicely with SSO. Based on who the user is, the OAuth 2 provider would redirect the user to the correct host.

Thanks so much for giving it a stab and sharing this!

@mksd: Thanks for your comments, let me see if more details help:

On Liquibase, the first issue is that in its current form the OpenMRS server doesn’t actually know how many tenants it has. I’m sure that issue could be solved, but if this goes into production you could have 100 to 300 tenants running on one server. If every Tomcat startup tried to run the Liquibase changesets against each database, the server would take forever to start. Let me be clear: while I don’t think 300 clinics are going to be using it at once, you would have people sign up to try it out, add a few patients, then maybe leave it for a while (maybe to come back and use it later, maybe not).

On the activators: yes, they are importing metadata, which inserts SQL data that would have to be run against all the databases, increasing start time again.

So I didn’t explicitly say it in my post, but another thing I would like to improve is OpenMRS startup time. Long term, I would like to have it run on AWS Elastic Beanstalk with the file-system data stored on a networked Elastic File System (EFS). If the startup time can be greatly reduced, then it becomes easy to add load-balanced servers and turn it into more of a cloud service and less of a lovingly maintained (and backed up) server instance.

I agree on the SSO; maybe there is a way to use OpenMRS ID as an SSO provider. I still think I would want multiple URLs, since a user may have access to multiple organizations (maybe the clinician works in a hospital plus an outside clinic, both on OpenMRS).

I’m also looking into doing some work with SAML to authenticate from one OpenMRS system to another. I wrote a module which, as part of its process, exports patient data to another server using CCDA (Telemedicine consultancy service & module). Long term, if you look at synchronizing the metadata at the application level through some sort of org-to-org slave/sync process, then you could leave the modules putting the metadata in the “primary” tenant and have it push down to the other tenants through an authenticated online process.

Thanks for the more detailed explanations, this makes more sense.

Correct me if I’m wrong but there are roughly three ways to achieve data multi-tenancy:

  1. One database per client.
  2. One database for all, one schema per client.
  3. One schema for all, but some inner logic to split the data.

Would the single database + different schema not relax those challenges with Liquibase migrations and activators?

I don’t know where the threshold is for actually splitting the app across separate databases. If things scale to such a point, then surely it will also require some work upstream, such as load balancing between Tomcat instances that cover regions of tenants.

If 2. does indeed relax those issues, then I’m wondering whether it should not be the path for enabling a first version of multi-tenancy with OpenMRS, one that would work more “out of the box” without introducing too many new deployment/upgrade considerations. Did you try it out, or what’s your thinking here?

Note that 3. will soon be achievable through using appropriate configurations of the new Data Filter module.

Yes, I did look at options 2 & 3.

The “one database for all, one schema per client” option quickly breaks down because in MySQL, database and schema are synonyms. Even where this works (Postgres, H2), you still have to create the tables one at a time for each schema, so it doesn’t seem to give much improvement.

The “one database for all, but some inner logic to split the data” option (a discriminator column) could be a better long-term solution, but it scares me from a data-security perspective: one bug and you are leaking data across organizations. It also seems like you are going to get competing answers on “should we share users/locations/concepts/patients across tenants?” if we open the door to sharing.

Finally, a discriminator wasn’t supported in Hibernate 4 (as a multi-tenancy strategy). It was supposedly planned for Hibernate 5, but some articles say it’s still not supported there. It could still be done with filters, but we already have a number of persistence attributes on the entities, and I worry that adding a bunch of FilterDef and Filter attributes could make them overly complex. I also worry that a data discriminator would break some of our core assumptions around unique uuid columns, etc. Would you be able to enable different encounter types per tenant?
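To make the discriminator idea concrete, here is a plain-Java sketch of what “inner logic to split the data” amounts to. No Hibernate here; with Hibernate, FilterDef/Filter annotations would push the same condition into the generated SQL. The entity shape and tenant names are made up for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java illustration of a discriminator column: every row carries a
// tenant_id, and a filter restricts each query to a single tenant.
public class DiscriminatorSketch {

    // Hypothetical row shape; the tenantId field is the discriminator.
    record Patient(String uuid, String tenantId, String name) {}

    // The in-memory equivalent of: ... WHERE tenant_id = :tenantId
    static List<Patient> forTenant(List<Patient> table, String tenantId) {
        return table.stream()
                .filter(p -> p.tenantId().equals(tenantId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Patient> table = List.of(
            new Patient("u1", "clinic1", "Alice"),
            new Patient("u2", "clinic2", "Bob"));
        // Only clinic1's row comes back; forgetting this filter anywhere
        // is exactly the cross-tenant leak described above.
        System.out.println(forTenant(table, "clinic1").size()); // 1
    }
}
```

The data-security concern above is visible in this sketch: every single query path has to remember to apply the filter, and any one that doesn’t leaks rows across tenants.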

If the goal of multi-tenancy is just to hide patients from users who are not supposed to see them, then I agree with you that a data filter module would be a MUCH better solution than dealing with a bunch of databases and configuration just so patients can be kept separate.

I thought I saw multi-tenancy on a roadmap somewhere, but maybe it would do some good to query the community about what they hope to achieve by putting it on the roadmap.

I would say that ‘hiding patients from users that are not supposed to see them’ is in fact the primary goal of the Data Filter module. It can therefore also be leveraged to achieve multi-tenancy, but that was not its primary intent; anyway, this is why I mentioned it. It uses Hibernate filters under the hood, btw.

Whether one can achieve the data segregation demanded by a given business case will be up to each implementation’s configuration that is fed to the module. It would be great if you could test it for multi-tenancy when it is a little more ready. So if the above seems to fit your end goal as well, I’d be more than happy to keep you in the loop. We’re planning a POC demo either next week or the week after.

Cc @snehabagri @wyclif @lilian


And I am not saying that we shouldn’t pursue other types of multi-tenancy strategies such as the spike that you’ve done (:+1:); I’m just trying to join forces wherever possible.

My quick thinking on this is that IF the ‘one database, multiple schemas’ strategy could work (and my understanding is that you’re not sure yet that it would improve the situation), then we should aim at providing support for Postgres. It hinges on your remark:

Even when this works (Postgres, H2), you still have to create the tables one at a time for each schema, so it doesn’t seem to give much improvement.

I have no idea, but if running Liquibase changesets etc. over multiple schemas comes with an acceptable performance hit on Postgres, then this is the way to go, IMO.

Thank you so much @atlcto for sharing your spike on this. :slight_smile:

Based on your target use case, what do you think of @angshuonline’s post in regard to this? Bahmni as Multi-Tenant SAAS solution

@dkayiwa I would completely agree with @angshuonline’s post! :100:

@gschmidt can you comment on what AMPATH was looking for in terms of multi-tenancy?

Let’s check with @jdick and @nkimaina for their thoughts on this.