Suggestions to model marking an observation as "Abnormal"

In relational databases with huge tables the solution usually is to use partitions (physically splitting the table in multiple ones, by some column normally dates). Obviously this is a DBA task, can’t be done with liquibase scripts. But I don’t know if MySQL supports partitions.

Another option would be using PostgreSQL if it’s better than MySQL.

Thanks Darius. We have not but will look into it now and report back.

JJ

As far as i can tell, our obs table is some day going to require something NoSQL like. :slight_smile: Uber had a similar problem for their tripstore table which was growing the fastest, just as we have our obs table. They modified MySQL to give it some NoSQL capabilities to store data for this biggest table. The rest of the not so big data remained in their original PostgreSQL database. You can read details at: https://eng.uber.com/mezzanine-migration/

Of course we do not have the capacity to modify a RDBMS. So we may choose an existing NoSQL for obs while leaving the rest intact, as we face the reality of Polyglot Persistence - using different data storage technologies to handle varying data storage needs!

Using a NoSQL data store will fix this particular problem (adding new columns fast) but will create others (complex queries for instance). It’s not a silver bullet.

Some people maintain two stores, one relational and one NoSQL that must be synchronized (queues, scheduled ETL processes …).

I’ve read the Uber article and it’s interesting, however I don’t think they modified MySQL, they built something on top of it. I don’t know why they didn’t pick up an already existing solution that provides sharding.

@jdick have you had a chance to look into this?

(This is triggered by me trying to get to inbox 10000, not any specific activity on this topic.)

The Bahmni is going to prioritize adding obs.interpretation (and we’ll also add obs.status, per @burke’s suggestion). We really want to do this in Platform 2.1.0, which should be released this month.

@jdick can you prioritize testing whether the Online DDL feature in modern MySQL will allow this change to happen smoothly for an AMPATH-sized database?

(I have also asked @teleivo if he might have time to test this, or perhaps @lluismf is interested. But really this is a huge issue for AMPATH, and we need AMPATH to be involved in testing the solution.)

I think a sufficient test is as easy as:

  1. Install OpenMRS on MySQL 5.7
  2. Populate a large obs table (by script or something)
  3. Manually execute “alter table obs add column interpretation varchar(50)” and check whether you’re still able to use the OpenMRS UI while this DDL is running

I am not sure why I didn’t see this sooner. I don’t work for AMPATH anymore but when we were upgrading from 1.9.7 to 1.11.6 we used a technic similar to what you have asked about. It is found here . I believe the technic can be modified and included in a generic upgrade module as suggested by @burke

1 Like

Thanks for that context @willa.

My understanding, then, is that it’s okay to make certain kinds of changes to the obs table in new releases, because there is a proven way for an implementation with a huge obs table to upgrade.

It would be nice to also automate the upgrade process for implementations with huge obs tables (and maybe the latest MySQL versions can handle this even easier). Since there are only a few implementations this applies to, I propose that the next time one of them is doing an upgrade they should collaborate with some OpenMRS community developers to build out a standard set of scripts for applying data model changes pre-upgrade.

Thus the Bahmni team can start to work on TRUNK-4976 because it’s not blocked by this particular concern.

(The ticket is still not ready-for-work for a different reason, because there’s an open question about the obs.status column.)

Yes I can, I love DB stuff :slight_smile:

I on a demo DB (MySQL 5.7) with 500k obs, took 30 seconds ALTER TABLE obs ADD interpretation VARCHAR(50) NOT NULL ;

@darius, we will test this out in the coming week and get back to you.

If for some reason the ALTER TABLE doesn’t work, there’s this tool created and used by GitHub staff:

Please read the Readme, they use a pretty clever technique to achieve it. But IMHO to just add a new column it shouldn’t be necessary. I guess this is intended for big schema changes.

1 Like

can we get more details on the AMPATH implementation and the test

  • MySQL version?
  • the AMPATH wiki page says 90 million obs in 2011, how many do they have now?
  • I assume AMPATH has its own distro, what OpenMRS version/setup should we use to test it? Simply use openmrs sdk with reference application 2.3.1?
  • with ‘whether you’re still able to use the OpenMRS UI while the DDL is running’ do you mean if I can still add an observation like for example capture a patients vital signs?

:rolling_eyes: I don’t think my PC can handle that. My hard disk isn’t even SSD (10x slower).

I inserted 8m observations, and the alter table took 4 minutes:

My thinking was that we don’t have to answer “exactly how much work will it be for ampath to do this next upgrade” but instead “if ampath and other large implementations do a (possibly large) amount of work to get on the latest version of MySQL, can they then rely on a built in mechanism to add columns to the OBS table without taking their system offline?”

This seems like a quicker-to-test question whose answer tells us how much the OpenMRS community would prioritize building a custom devops approach to this, or if we can just say it’s a solved problem because MySQL.

Of course if you want to go further towards testing ampath’s exact upgrade path, don’t let me stop you!

I believe that building a custom solution is bad. The tricky part is, using the shadow table approach you can’t lock the original table and you can miss all the writes performed on it during the process. The github solution reads from the transaction log which is brilliant, and I’m not sure we can do that without a big effort.

@jdick, any update on this investigation?

@lluismf, were you able to continue using the OpenMRS UI during the 4 minutes that the alter table statement was running?

@jeffneiman, can we schedule for next Monday’s design forum:

TRUNK-4976 - Support obs.interpretation and obs.status, aligned with FHIR

(I’m hoping @burke will be back from vacation and able to join then. @angshuonline, @hesicong FYI.)

No, I wasn’t using the UI. For what I read the ALTER TABLE in the current version of MySQL does not lock the table.

Per today’s Design meeting, here’s my cut on paring down the very large list of flags in https://www.hl7.org/fhir/valueset-observation-interpretation.html .

Thanks for doing this @jteich. Some comments:

Your narrative about what to include makes a lot of sense. You should have put it before the table so I didn’t start going row by row. :slight_smile:

I think we should also support < and >. These aren’t the same as LL and HH, rather they are “Off Scale (Low/High)”, i.e. can’t be accurately measured by the lab machine. For HIV I remember it being an important use case (that we never got right) to record undetectable Viral Load for which the machine we had reported “<40”, its lower limit. (Though maybe in FHIR that should be stored in a different property? Viral Load “<40” is both “normal” and “off scale low”.)

You wrote that B and W can be replaced by U/D, but this isn’t strictly true, since up/down imply knowing the direction of change, whereas better/worse can apply to a qualitative finding. (This is nitpicking though. I have no use case for this and I’m not arguing for adding it.)