OpenMRS Architecture Proposal: Modules and Bundles

bistenes · January 10, 2019, 1:16am

I’ve written a proposal for OpenMRS architecture, looking towards 3.0. I present a brief overview of the existing systems and solutions, and make a concrete software proposal. I advocate defining a single, universal configuration mechanism. I believe that this will encourage better software practices, make OpenMRS easier to use, and strengthen the community’s ability to support it.

This is a draft, a work in progress. I hope that it looks like it’s on the right track to a lot of the community, and that it can evolve with your feedback to be something with enough force and buy-in behind it to make it happen.

dkayiwa · January 10, 2019, 3:26pm

Thanks for the very nice ideas!

Just to get more clarity, would that be simply building upon this? https://github.com/mekomsolutions/openmrs-module-initializer

cc @mksd

bistenes · January 10, 2019, 4:17pm

I wonder if it would make sense to do so or not. Certainly Initializer (Iniz) is on what I believe is the right track. But at the same time a lot of what I’m proposing would be new. Perhaps a roadmap for building Iniz to meet my specification might look like

Phase one – module development
1. add an in-memory Store which Iniz Readers can load data into
2. create a declarative API for defining schemata, which would at once define a (CSV? YAML? I can think of good arguments for starting with either) Reader and a data structure in the Store
3. create the schema -> Reader mapping for whichever of CSV or YAML wasn’t implemented first, to support both config file types
4. add runtimeMutable to schema definition API, add an Admin UI panel for changing runtime-mutable properties (this will eventually replace Settings)
Phase two – core integration
1. core: create the Config Directory
2. core: add support for config specification classes, which are executed before any module is activated
3. Iniz: break into phases — first all config schema are defined. Then all config files are read and resolved into a tree, which is validated against the complete schema. This is what allows “verticals,” which cut across many modules.
4. creation of a section of the wiki for sharing verticals
5. define core config schema
6. deprecation of module-specific Admin UI and Settings
7. define ref app schema

I guess most of that is independent of whether we develop off of Iniz or not. But looking at this, I don’t think it would be unreasonable to do so.

mksd · January 10, 2019, 4:45pm

Thanks @bistenes, I had already commented in private but that’s a very interesting topic. @dkayiwa I had suggested to plan such work to be integrated into Iniz indeed!

To a small extent Iniz already provides something like that through the ‘JSON keyvalues’ domain, see here. It wasn’t expanded much more than what’s on the README but the idea is there. This domain is currently used to activate/deactivate reports out of a reports repository module.

dkayiwa · January 10, 2019, 5:28pm

@mksd since you know Iniz much more than i do, which of @bistenes’s proposal is already implemented?

dkayiwa · January 13, 2019, 7:27am

@mksd ping

@bistenes do you think you can walk us through this proposal on a design call?

mksd · January 13, 2019, 10:25am

Sorry for my late reply: @bistenes’s proposal deserves a lot of credit. Furthermore he has shared further details with me in private that I would like to attach to this thread for reference. I’ll try to do this after this post.

I had concerns about how Iniz would fit in all this, I realise now that it all hangs to the question: what should be stored in memory and what should be persisted? Iniz should remain and keep being extended for the latter, at the end of the day it is only a fairly dumb (meta)data loader, and it does the job.

I would be fully supportive of deprecating the scattered admin UIs by transitioning to having modules require their config upon “booting”. It is clear that this is an easy and big winner.

Then there will be the debate of what should become in-memory configuration that is currently stored in database, and how implementations will transition to the new state of things.

Anyway, of course we should have a design call about this.

bistenes · January 6, 2019, 9:17pm

Hi Dimitri,

Thanks for the thoughtful response. That makes a lot of sense. I especially appreciate the note about CSV being useful for working with non-techs, that seems very salient.

I guess the tension between Iniz and my proposal illuminates a deeper tension in OpenMRS: that between DB-stored config (“metadata”) and in-memory config (“Settings”). Iniz is really good at the former, for things that have good reason to be DB-stored config, e.g. drugs. (I’m hoping OCL-for-OpenMRS takes over the Concepts space, and does so justly; I don’t think anyone has figured out Concepts yet). (Does AddressHierarchy use Iniz to pull in config from the App Data Dir? Seems like it should…)

But there are a lot of things that we presently store in-DB that really ought to be stored in-memory. So I guess at the heart of my proposal is a new in-memory store, a Config tree. Things that fit the bill for this include…

idgen
locations
personAttributeTypes
programs
encounters
forms/fields
logic rules
programs
providermanagement
relationships
roles

(Do you know how dashboards & patientSummaryWidgets are stored? I think these should also be in-memory, probably YAML-configurable…)

How have you been handling those things above that don’t have Readers in Iniz?

The idea of hierarchy and overriding is very interesting. I haven’t thought about it at all, really. I’ve been imagining that having inconsistent configurations should cause a fast failure. How will a client of Iniz define what is supposed to override what?

I imagine a declarative config-defining API that is, in theory, not format specific. It would define specifically the shape of the config tree. So, e.g., the code

import org.openmrs.config

// define a relationship config object
ConfigObject relationshipConf = new ConfigObject();
relationshipConf.addKey(required=True, auto=False)
relationshipConf.addStringField('aToB', runtimeMutable=True, required=True);
relationshipConf.addStringField('bToA', runtimeMutable=True, required=True);
relationshipConf.addStringField('description', runtimeMutable=True);
relationshipConf.addBooleanField('allowSelf', runtimeMutable=True, default=True);

ConfigEnum personTypes = new ConfigEnum<String>('patient', 'provider', 'nonProviderUser');
ConfigArray allowedPersonATypes = relationshipConf.addArray('allowedPersonATypes', personTypes);
ConfigArray allowedPersonBTypes = relationshipConf.addArray('allowedPersonBTypes', personTypes);

// tell config module to expect a top-level array of relationshipConfs
config.addArray('relationships', relationshipConf);

This might be satisfied by some YAML that looks like

relationships:
  - key: DOC_PT
    aToB: doctor
    bToA: patient 
    allowedPersonATypes:
      - provider
    allowedPersonBTypes:
      - patient
      - nonProvideruser
  - key: SIBLINGS
    aToB: sibling
    bToA: sibling

It could also be satisfied with a CSV that looks like

key,        aToB,    bToA,      allowedPersonATypes, allowedPersonBTypes
DOC_TO_PT,  doctor,  patient,   provider,            "patient; nonProviderUser"
SIBLINGS,   sibling, sibling,,

Let me know what you think of all that. I won’t be in Boston at that time; I’m based in Mexico. Looking forward to talking more about this though.

dkayiwa · January 13, 2019, 8:04pm

For things that fit the bill, why would you want to store them in memory instead of DB?

mogoodrich · January 14, 2019, 3:02pm

First off at a high level, this sounds great. Plus +1 for a design call on this…

@bistenes can you request a design call:

https://wiki.openmrs.org/display/RES/Design+Forum

To answer a few questions:

dashboards and patientSummaryWidgets are configured as apps and extensions (in memory):

https://wiki.openmrs.org/display/docs/App+Framework+Developer+Documentation

it would definitely be good to support a hierarchy and overriding… we have rudimentary support for this in our PIH Config and take advantage of it to define certain properties at the country-level shared across multiple instances, and then specific properties for different sites, staging vs production servers, etc; @bistenes, you are obviously already familiar with the PIH Config… worth keeping what works from than and jettisoning what doesn’t in a new design
Definitely interested more about the pluses and minuses of moving metadata that currently are first-order objects, like Locations, Programs, etc, from the DB to in-memory…

Take care, Mark

bistenes · January 14, 2019, 3:49pm

Having information stored in two places (e.g., config files and a database) means that they can come out of sync. For metadata this is acceptable, since the database confers performance and scalability benefits over working with CSVs. For config, this is just one more thing that can go wrong. It also introduces the complexity of having to map large configuration spaces into flat K-V stores.

bistenes · January 14, 2019, 4:18pm

My concern with hierarchy and overriding is that it would probably make configurations very hard to read and reason about. In any case, this is probably pretty far down the roadmap and we can think about it more later.

Thanks for the links @mogoodrich! I’ll go ahead and request a design call.

mseaton · January 14, 2019, 5:32pm

@bistenes, am I understanding that your proposal might be to move metadata out of the database? This would be a pretty radical change well beyond standardizing on an improved configuration mechanism, as the entire relational model is built upon foreign keys to metadata tables.

The approach we have used with metadatadeploy is that configuration is defined in code (or config files that can be interpreted by code), and the metadatadeploy module updates the metadata in the database to match it every time the system is started. I would assume we’d continue with this type of a model.

There remains some value in user interfaces that enable configuration of metadata, as these enable relatively quick addition and validation and are more user-friendly to a range of users than raw json or yaml. What we should make more explicit is that these interfaces are intended to be used on a development or test server, and there should be features added to export data from these into a compatible configuration-file format for sharing and version control. Where we have fallen short until now is that we have not provided a good model and set of tools for going from database configuration into configuration files that can be shared and version controlled, and so there are distributions in use in which the full configuration is only accessible on the running production server or backups of it. These administrative UIs could be moved into an entirely separate project. They could even be made agnostic of the underlying OpenMRS instance they are connecting to. This is basically what OCL is doing for Concepts.

I think there is still a conversation to be had as to where the code enabling configuration of various features should live. Should “iniz” be made “awareof” every module, and have the code built into it to configure each as needed? Should modules own their own configuration tools, and depend on “iniz” or implement certain interfaces, use certain annotations, or follow certain patterns to plug into a broader configuration mechanism?

Looking forward to more discussions and a design call around this.

Best, Mike

mksd · January 14, 2019, 6:15pm

I remember that you and I chatted about this back in 2017, and in the case of AH we settled on having it being standalone. It’s not a big deal (as everything works fine since AH 2.11.0 when this was released) but I kind of regret this decision. Having Iniz to be the one place has proven quite practical in the end. It’s sort of clear that this configuration folder is that one folder processed by Iniz. But again, the fact that the ‘addresshierarchy’ domain is handled by another module (AH then) is not much of a problem, it’s more a consistency issue.

Actually that’s remotely on my to do list as right now Iniz still requires other modules, which is wrong but hasn’t proven problematic enough in our use cases (because Iniz handles Core metadata for the most part, except for patient identifier types from the top of my head.)

bistenes · January 14, 2019, 8:45pm

I want to draw a line between “metadata” and “configuration.” If there’s a lot of it (e.g. concepts, AH), then it probably makes sense to store in a DB, and it’s metadata. If there’s not, then there’s no good reason to keep it in a DB, and it’s probably awkward or restricting to represent in the relational model anyway.

Metadatadeploy is fine for PIH because we have a bunch of engineers. I think that OpenMRS more broadly relying on in-code configuration isn’t compatible with a vision of it being the platform of choice for low-resource settings.

The Admin config UIs you describe would indeed be useful. But there’s a lot of hedging around that. Making config elements runtimeMutable would satisfy the quick addition/validation needs. If there are configs that could really use to be generated from a GUI, it would be simpler to generate YAML directly than to find some mapping from in-memory config objects --> relational model --> config files. Implementation-agnostic metadata & config management tools sound like a great idea.

Something Dimitri mentioned in a private thread was that CSV-based configuration is useful for very-non-tech folks, who can then edit config in Excel. This seems like a good reason to have the YAML-CSV polymorphism. This would be a much easier way to allow configuration by non-tech folks than writing and maintaining Admin UIs, for config schemata that are sufficiently simple to admit a reasonable CSV representation.

I had been imagining modules depending on a config library and managing their own schemata. I could certainly be convinced that we should start by building configurability for other modules into Iniz. Perhaps we could roadmap how we then distribute responsibility for configuration back out to the modules themselves.

Actually, building config for other modules into Iniz would allow for config tree resolution and “verticals” prior to having core support for a “config phase” (c.f. my roadmap above). I like this.

mseaton · January 14, 2019, 9:18pm

Just to highlight this point, metadatadeploy does not necessarily require engineers, it’s just that the initial implementation of most of the deployment bundles have been Java-programming-heavy. There are examples of CSV sources and the module has examples of these in use. That said, I’m not trying to advocate for MetadataDeploy vs. Iniz (though one could argue that we should not have both of these independent initiatives that aim to solve similar things).

Mike

bistenes · January 24, 2019, 6:10pm

A few asides before the main thing:

After the call yesterday and a good night’s sleep, I’ve become sympathetic to the idea that the config dir should be a subdir of the app data dir. The SDK setup step can create it and symlink it from the working directory into app data. Does that sound right?
I think the biggest technical hurdle in proliferating this is getting existing modules to use this new config mechanism. I don’t really understand Spring, but I think what would be necessary for the module to read config is to use a different implementation of the db/...DAO interfaces, which presently (mostly? universally?) use a db/Hibernate...DAO implementation, which is wired up with Spring. Can that be overridden from outside the module, or would we have to add these new implementations to the modules themselves? The new implementation would, for calls having to do with “config” rather than metadata or data, make calls against the config library’s in-memory config store, rather than against Hibernate. Does that sound like a good approach?
I might propose a config file tree like

config/
  |--HIV/                     # vertical
  |    |--htmlformentry/      # module
  |    |    |--intakeForm.xml
  |    |    `--config.yml
  |    `--laborderentry.yml
  |--htmlformentry.yml        # overrides
  |--laborderentry.yml
  `--production/              # overrides enabled at command line or by env var
      `--htmlformentry.yml

i.e. where at the top level in the config directory you have vertical packages containing module-namespaced files, module-namespaced files that override those in the verticals, and packages of module-namespaced files that will override both of the former two when enabled. Though maybe these three categories should each have a top-level directory, like

config/
  |--verticals/           # or "packages"
  |   `--hiv/
  |--main/                # or whatever
  |   `--htmlformentry.yml
  `--overrides/           # or "environments"
      `--production/

where order of precedence goes overrides -> main -> verticals. Verticals could be managed through the SDK.

bistenes · January 24, 2019, 6:29pm

The main thing, which none of the above points are relevant to (I think):

We should figure out what next steps are (again, apologies for not directing the meeting more efficiently around this goal).

While this is very spiritually aligned with Iniz, I don’t think there’s much of the machinery that Iniz provides that will actually be useful in implementing this. I would like to build toward the API being in alignment with both Iniz and App Framework, so that migration is as seamless as possible for people using those.

So I think the right thing to do is to start a new module. I’d definitely need to talk to Mike and Mark about this, but… I would first build it to replace PIH’s own JSON configuration mechanism, which just configures the PIH module. This would be sort of a proof-of-concept.

Then I’d use it to configure some other module. Open to suggestions about what that should be. I propose Address Hierarchy, as an option.

There’s a lot of places we can go from there, but I think those would be good first steps.

darius · January 25, 2019, 1:21am

Sorry for being very late to chime in, and sorry if this has already been covered. (And I have not actually read your architecture proposal, because I need to dash this off quickly.)

Two thoughts based on the talk thread:

There’s huge value in having config be defined in text files in a separate folder, which can be version-controlled also. Bahmni does this already. Can we align with what they’re doing?
“in-memory” has some advantages but it can be problematic to scale this horizontally to multiple web servers. Although nobody is doing this today, I think we should avoid any architecture design choices that will prevent us from scaling that way in the future. The main mitigation for this is to ensure that any in-memory config is just a cache of something more persistent and/or to ensure there’s a proper ConfigService API that would let us someday also run the config as a separate microservice, e.g. with multiple web servers talking to it.

bistenes · January 25, 2019, 3:12am

Oh interesting note about scaling in-memory across servers. I think Burke said something along these lines on the call but I didn’t fully understand him then. Two straightforward (I think) possibilities:

the configuration directory gets shared across the servers somehow. Since configuration is read-only (with the exception of “mutable” properties, which should not be used in production), there’s no distributed-systems type problems that would need a database to solve
have the config module load YAML strings into the database. This evades the significant technical pain of mapping the expressive space of YAML into SQL by accepting a little ugliness. The database is essentially treated as a dumb store for sharing data between servers.

I’m more in favor of the former, but not sure how I’d implement it (SDK tooling?), and figured I’d throw both ideas out there.