In relational databases, the usual solution for huge tables is partitioning (physically splitting the table into multiple ones by some column, normally a date). Obviously this is a DBA task and can’t be done with Liquibase scripts. But I don’t know whether MySQL supports partitions.
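(For the record, MySQL has supported table partitioning since 5.1. A sketch of what range-partitioning obs by date might look like; the partition boundaries here are made up, and in practice the partitioning column must be part of every unique key, so the obs primary key would need changes too:)

```sql
-- Sketch only: range-partition obs by year of obs_datetime.
-- Note: MySQL requires the partitioning column to appear in every
-- unique key of the table, which the real obs schema does not satisfy as-is.
ALTER TABLE obs
  PARTITION BY RANGE (YEAR(obs_datetime)) (
    PARTITION p2014 VALUES LESS THAN (2015),
    PARTITION p2015 VALUES LESS THAN (2016),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
  );
```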
Another option would be switching to PostgreSQL, if it handles this better than MySQL.
As far as I can tell, our obs table is some day going to require something NoSQL-like.
Uber had a similar problem: their trip-store table was their fastest-growing one, just as obs is ours.
They modified MySQL to give it some NoSQL capabilities for storing this biggest table’s data. The rest of the not-so-big data remained in their original PostgreSQL database.
You can read details at: https://eng.uber.com/mezzanine-migration/
Of course we do not have the capacity to modify an RDBMS. So we may choose an existing NoSQL store for obs while leaving the rest intact, as we face the reality of polyglot persistence: using different data storage technologies to handle varying data storage needs!
Using a NoSQL data store will fix this particular problem (adding new columns fast) but will create others (complex queries, for instance).
It’s not a silver bullet.
Some people maintain two stores, one relational and one NoSQL that must be synchronized (queues, scheduled ETL processes …).
I’ve read the Uber article and it’s interesting; however, I don’t think they modified MySQL, they built something on top of it. I don’t know why they didn’t pick an already existing solution that provides sharding.
The Bahmni team is going to prioritize adding obs.interpretation (and we’ll also add obs.status, per @burke’s suggestion). We really want to do this in Platform 2.1.0, which should be released this month.
@jdick can you prioritize testing whether the Online DDL feature in modern MySQL will allow this change to happen smoothly for an AMPATH-sized database?
(I have also asked @teleivo if he might have time to test this, or perhaps @lluismf is interested. But really this is a huge issue for AMPATH, and we need AMPATH to be involved in testing the solution.)
I think a sufficient test is as easy as:
1. Install OpenMRS on MySQL 5.7.
2. Populate a large obs table (by script or something).
3. Manually execute “alter table obs add column interpretation varchar(50)” and check whether you’re still able to use the OpenMRS UI while this DDL is running.
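A rough sketch of steps 2 and 3 (the obs column list here is abbreviated and may not exactly match the real schema):

```sql
-- Step 2 (sketch): double the obs row count on each run by
-- re-inserting the existing rows with fresh UUIDs.
INSERT INTO obs (person_id, concept_id, obs_datetime, creator,
                 date_created, voided, uuid)
  SELECT person_id, concept_id, obs_datetime, creator,
         date_created, voided, UUID()
  FROM obs;

-- Step 3: in a separate session, run the DDL while exercising the UI.
ALTER TABLE obs ADD COLUMN interpretation VARCHAR(50);
```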
I am not sure why I didn’t see this sooner. I don’t work for AMPATH anymore, but when we were upgrading from 1.9.7 to 1.11.6 we used a technique similar to what you have asked about. It is found here. I believe the technique can be modified and included in a generic upgrade module, as suggested by @burke.
My understanding, then, is that it’s okay to make certain kinds of changes to the obs table in new releases, because there is a proven way for an implementation with a huge obs table to upgrade.
It would be nice to also automate the upgrade process for implementations with huge obs tables (and maybe the latest MySQL versions can handle this even more easily). Since there are only a few implementations this applies to, I propose that the next time one of them is doing an upgrade, they should collaborate with some OpenMRS community developers to build out a standard set of scripts for applying data model changes pre-upgrade.
Thus the Bahmni team can start to work on TRUNK-4976 because it’s not blocked by this particular concern.
(The ticket is still not ready-for-work for a different reason, because there’s an open question about the obs.status column.)
My thinking was that we don’t have to answer “exactly how much work will it be for AMPATH to do this next upgrade?” but instead “if AMPATH and other large implementations do a (possibly large) amount of work to get onto the latest version of MySQL, can they then rely on a built-in mechanism to add columns to the obs table without taking their system offline?”
This seems like a quicker-to-test question, whose answer tells us how highly the OpenMRS community should prioritize building a custom devops approach to this, or whether we can just call it a solved problem because of MySQL’s built-in support.
Of course, if you want to go further toward testing AMPATH’s exact upgrade path, don’t let me stop you!
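If it helps the test: with InnoDB online DDL (MySQL 5.6+), the non-blocking behavior can be requested explicitly, so the statement fails immediately rather than silently falling back to a long, locking table copy:

```sql
-- Request an in-place, non-blocking column add. If the storage engine
-- cannot perform this change online, the statement errors out instead
-- of copying the whole table under a lock.
ALTER TABLE obs
  ADD COLUMN interpretation VARCHAR(50),
  ALGORITHM = INPLACE,
  LOCK = NONE;
```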
I believe that building a custom solution is bad. The tricky part of the shadow-table approach is that, since you can’t lock the original table, you risk missing writes performed on it during the copy. GitHub’s solution reads from the binary log to capture those writes, which is brilliant, and I’m not sure we could do that without a big effort.
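For context, the trigger-based variant of the shadow-table approach (what pt-online-schema-change does; the column lists below are illustrative, not the real obs schema) looks roughly like:

```sql
-- 1. Create the shadow table with the new column.
CREATE TABLE obs_new LIKE obs;
ALTER TABLE obs_new ADD COLUMN interpretation VARCHAR(50);

-- 2. A trigger on obs mirrors ongoing writes into obs_new
--    (one trigger shown; updates and deletes need triggers too).
CREATE TRIGGER obs_mirror_ins AFTER INSERT ON obs FOR EACH ROW
  REPLACE INTO obs_new (obs_id, person_id, concept_id)
  VALUES (NEW.obs_id, NEW.person_id, NEW.concept_id);

-- 3. Copy existing rows in chunks (one chunk shown).
INSERT IGNORE INTO obs_new (obs_id, person_id, concept_id)
  SELECT obs_id, person_id, concept_id FROM obs
  WHERE obs_id BETWEEN 1 AND 100000;

-- 4. Once caught up, swap the tables atomically.
RENAME TABLE obs TO obs_old, obs_new TO obs;
```

gh-ost avoids the triggers by tailing the binary log instead, which is exactly the part that would be hard for us to replicate.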
Your narrative about what to include makes a lot of sense. It would have been better placed before the table, so that I wouldn’t have started going row by row.
I think we should also support < and >. These aren’t the same as LL and HH; rather, they are “Off Scale (Low/High)”, i.e. values that can’t be accurately measured by the lab machine. For HIV I remember it being an important use case (one we never got right) to record an undetectable viral load, for which our machine reported “<40”, its lower limit. (Though maybe in FHIR that should be stored in a different property? A viral load of “<40” is both “normal” and “off scale low”.)
You wrote that B and W can be replaced by U/D, but this isn’t strictly true: up/down imply knowing the direction of change, whereas better/worse can apply to a qualitative finding. (This is nitpicking, though; I have no use case for this and I’m not arguing for adding it.)