How concept names are breaking dbsync

mksd · October 1, 2021, 4:19pm

I need to borrow some brains… @burke @ibacher @mseaton, tagging you here because a while back we already spoke on a TAC call about the wider topic that I am bringing up here.

Here is the problem statement. Data is synced (with dbsync) across a network of OpenMRS instances. However, metadata are not synced; they are assumed to be installed by configuring each instance.

Each Diagnosis entity de facto points to a ConceptName entity via an embedded CodedOrFreeText, see here.

This is how the dbsync model handles it:

```java
public class EncounterDiagnosisModel extends BaseChangeableDataModel {

    // ...

    // UUID of the ConceptName chosen for a coded diagnosis
    private String diagnosisCodedNameUuid;

    // ...
}
```

This diagnosisCodedNameUuid is a reference to a concept name: it is a concept name UUID. The issue with concept name UUIDs is that, for the same intended concept name, they may differ from one OpenMRS instance to another. They are not really public universal keys to a concept name in the way that a concept UUID is a public universal key to a concept.

For instance, if this CSV definition of a concept is loaded with Iniz on two distinct OpenMRS instances:

| UUID | Fully specified name:en |
|---|---|
| `eae5d01b-987d-486a-9f37-a236f9800b82` | `Foobar` |

We will indeed end up with a concept whose UUID is eae5d01b-987d-486a-9f37-a236f9800b82 on both instances. On both instances, this concept will have a “Foobar” FSN in locale ‘en’. However, this “Foobar” concept name will not have the same UUID on both instances. And this is exactly where dbsync breaks :-/ because dbsync always assumes that UUIDs are reliable public references to entities.
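To make the failure concrete, here is a minimal Java sketch that simulates two instances loading the same CSV row. It assumes, for illustration only, that the loader takes the concept UUID from the CSV but gives each name a freshly generated random UUID. `NameUuidDrift`, `Instance` and `loadRow` are hypothetical names, not Iniz or dbsync APIs; the last step mimics dbsync looking up a synced `diagnosisCodedNameUuid` on the target instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical simulation: NOT Iniz or dbsync code.
class NameUuidDrift {

    record ConceptName(String uuid, String name, String locale) {}

    record Concept(String uuid, ConceptName fullySpecifiedName) {}

    /** One OpenMRS instance, reduced to lookups by concept UUID and by name UUID. */
    static class Instance {
        final Map<String, Concept> conceptsByUuid = new HashMap<>();
        final Map<String, ConceptName> namesByUuid = new HashMap<>();

        // Loads one CSV row: the concept UUID comes from the CSV,
        // but the name UUID is generated locally (assumed behaviour).
        void loadRow(String conceptUuid, String fsnEn) {
            ConceptName fsn = new ConceptName(UUID.randomUUID().toString(), fsnEn, "en");
            conceptsByUuid.put(conceptUuid, new Concept(conceptUuid, fsn));
            namesByUuid.put(fsn.uuid(), fsn);
        }
    }

    public static void main(String[] args) {
        String conceptUuid = "eae5d01b-987d-486a-9f37-a236f9800b82";
        Instance source = new Instance();
        Instance target = new Instance();
        source.loadRow(conceptUuid, "Foobar");
        target.loadRow(conceptUuid, "Foobar");

        // The concept resolves on both sides...
        System.out.println(target.conceptsByUuid.containsKey(conceptUuid)); // true
        // ...but the name UUID recorded on the source does not exist on the target.
        String diagnosisCodedNameUuid =
                source.conceptsByUuid.get(conceptUuid).fullySpecifiedName().uuid();
        System.out.println(target.namesByUuid.get(diagnosisCodedNameUuid)); // null
    }
}
```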

There are workarounds; at least it would be more or less straightforward to work around this within Iniz. But then what about OCL? Most likely OCL does not offer any guarantee as to what the concept name UUIDs will be when managing concepts, right?

My gut feeling is that it is wrong to have a direct relationship to a concept name, as is done implicitly in Diagnosis through CodedOrFreeText. That is because concept names are a sort of secondary metadata that are always managed via their encompassing concept, never on their own.

Cc @wyclif @frederic.deniger @mksrom @dkayiwa

ibacher · October 1, 2021, 4:35pm

I’d need to think a bit about this as a problem, but I can answer this simple question:

Actually we kind of do. More specifically, in the OCL backend all objects have an external_id field which we map to the OMRS UUID, so when loading concepts into OpenMRS, we should have assigned UUIDs to concept names. The trade-off that we make with the OCL module is that everything is identified by a UUID.
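A minimal sketch of the mapping described above, assuming each name in an OCL export carries an `external_id` that the importer copies into the OpenMRS concept name UUID. The `OclName` record and `nameUuidFor` method are hypothetical, not the OCL module's code, and the fallback to a random UUID when `external_id` is missing is an assumption; it is exactly the case that would reintroduce per-instance drift.

```java
import java.util.List;
import java.util.UUID;

// Hypothetical sketch, not the OCL module's actual code.
class OclNameUuids {

    /** A concept name as it might appear in an OCL export (fields assumed). */
    record OclName(String name, String locale, String externalId) {}

    /** Prefer the OCL external_id so every instance ends up with the same name UUID. */
    static String nameUuidFor(OclName n) {
        if (n.externalId() != null && !n.externalId().isBlank()) {
            return n.externalId();
        }
        // Assumed fallback: a random UUID, which differs on every import.
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        List<OclName> names = List.of(
                // made-up external_id value, for illustration only
                new OclName("Foobar", "en", "9a6ff4a2-4a5b-4e2e-9d6c-2f7a3c1b8e01"),
                new OclName("Foobar", "fr", null));
        for (OclName n : names) {
            System.out.println(n.locale() + " -> " + nameUuidFor(n));
        }
    }
}
```

Only names that actually carry an `external_id` get a stable UUID across instances under this scheme.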

akanter · October 1, 2021, 8:40pm

Good terminology management would have a unique ID for each concept name. This ensures that the name used to record a diagnosis is displayed back to the user (rather than a default). I don’t know how this is handled in the wild, but CIEL does send this info to OCL.

mksd · October 1, 2021, 10:10pm

@ibacher, does that mean that when importing an OCL export JSON file into an OpenMRS instance, the outcome is guaranteed to be the same on every instance? In particular, are all the imported concept names imported everywhere with the same UUIDs?


And I guess that using the subscription module doesn’t do anything different; it’s just re-importing an OCL export JSON file.

mksd · October 1, 2021, 10:35pm

In OpenMRS this means that each concept name should be assigned a UUID that could be used as a public reference to the concept name.

@akanter when you say that “CIEL sends this info to OCL”, what does that mean exactly in practice?

bistenes · October 1, 2021, 11:13pm

I agree with your gut feeling, FWIW.

mseaton · October 1, 2021, 11:44pm

I will likely have more to say on this subject, and happy to discuss more on the TAC call. To answer one specific question:

I think this means that CIEL has fixed UUIDs for its concept names, generated in the same way as its concept UUIDs. Whereas many are familiar with the CIEL “AAAAAAAAAAAA” concept UUIDs, CIEL’s concept names seem to follow a similar pattern with “BBBBBBBBBBBB”.
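A minimal sketch of the convention as described, assuming (this is inferred from the post, not from a CIEL specification) that a numeric id is kept as-is and right-padded to 36 characters with `A` for concepts and `B` for concept names. `CielStyleUuids` and its methods are hypothetical helpers; 161273 is the concept id from the link below.

```java
// Sketch of the CIEL-style UUID convention described above (assumed, not a CIEL spec):
// a numeric id right-padded to 36 characters with a fixed letter.
class CielStyleUuids {

    static final int UUID_LENGTH = 36;

    static String pad(long id, char filler) {
        String digits = Long.toString(id);
        if (digits.length() >= UUID_LENGTH) {
            throw new IllegalArgumentException("id too long: " + id);
        }
        return digits + String.valueOf(filler).repeat(UUID_LENGTH - digits.length());
    }

    /** Concept UUIDs: "AAAA..." padding. */
    static String conceptUuid(long conceptId) {
        return pad(conceptId, 'A');
    }

    /** Concept name UUIDs: "BBBB..." padding (assumed by analogy). */
    static String conceptNameUuid(long conceptNameId) {
        return pad(conceptNameId, 'B');
    }

    public static void main(String[] args) {
        System.out.println(conceptUuid(161273));
        System.out.println(conceptNameUuid(12345)); // hypothetical name id
    }
}
```

Because such UUIDs are derived from ids that CIEL itself manages, every instance importing the dictionary would see the same values.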

For example, see the External IDs on the concept names here: https://app.openconceptlab.org/#/orgs/CIEL/sources/CIEL/concepts/161273/

Mike

mksd · October 2, 2021, 8:04am

Interesting. But then, if concept names are universally identified as such, does that mean they are reusable? This design implies that a concept name could be hooked to multiple concepts, and I’m pretty sure that isn’t the case (at least in OpenMRS; I wouldn’t know about OCL yet).

mseaton · October 2, 2021, 11:40am

I don’t believe there is any intent that a concept name could be attached to more than one concept. The value that I see is in being able to uniquely identify a concept name attached to a concept for loading purposes; otherwise, assuming that changes to the actual name text are allowed, there is no way to determine when concept names are modified, unmodified, added, or removed when importing a dictionary. But dealing with UUIDs in this way can be incredibly unwieldy.
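The loading benefit can be sketched as a diff keyed on name UUID. This is a hypothetical illustration (`NameRow`, `Change` and `diff` are not from any OpenMRS module): with a stable UUID on each incoming name, a loader can tell a reworded name (same UUID, new text) from a brand new one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: classifying names when re-importing a concept, keyed on name UUID.
class NameDiffByUuid {

    record NameRow(String uuid, String text, String locale, String type) {}

    enum Change { UNMODIFIED, MODIFIED, ADDED, REMOVED }

    static Map<String, Change> diff(List<NameRow> existing, List<NameRow> incoming) {
        Map<String, NameRow> before = new LinkedHashMap<>();
        for (NameRow n : existing) before.put(n.uuid(), n);

        Map<String, Change> result = new LinkedHashMap<>();
        for (NameRow n : incoming) {
            NameRow old = before.remove(n.uuid());
            if (old == null) {
                result.put(n.uuid(), Change.ADDED);
            } else {
                // records compare all components, so any changed field counts as a modification
                result.put(n.uuid(), old.equals(n) ? Change.UNMODIFIED : Change.MODIFIED);
            }
        }
        // Existing names not matched by any incoming UUID were removed from the dictionary.
        for (String uuid : before.keySet()) result.put(uuid, Change.REMOVED);
        return result;
    }

    public static void main(String[] args) {
        List<NameRow> existing = List.of(
                new NameRow("n1", "Diabetes", "en", "FULLY_SPECIFIED"),
                new NameRow("n2", "DM", "en", "SHORT"));
        List<NameRow> incoming = List.of(
                new NameRow("n1", "Diabetes mellitus", "en", "FULLY_SPECIFIED"),
                new NameRow("n3", "Diabetes", "es", "FULLY_SPECIFIED"));
        System.out.println(diff(existing, incoming)); // {n1=MODIFIED, n3=ADDED, n2=REMOVED}
    }
}
```

Without the UUIDs, the same input would look like one name removed and one added, which is the ambiguity described above.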

Given the use case that I’ve heard @akanter describe, where concept names are first-class metadata that are directly associated with OpenMRS Data to record the specific name that a user chose, if we want to continue to support this, we clearly need a way to preserve existing concept names between concept import jobs.

Supporting the ability to attach a fixed UUID to a concept name in our loading tools (i.e. Initializer) would seem like a reasonable first step, given that it is relatively quick and easy to implement and would provide this flexibility to those who need it. This is among the few improvements suggested in a ticket around this. But I recognize that most existing implementations using Initializer for concepts won’t have this in their CSVs, and many won’t want to add them.

The other approach one might consider would be to assume that changes to the actual wording of a Concept Name represent the need for a new Concept Name, and do something like the following:

1. If an existing name has exactly the same name text as one you are loading, then modify it if other properties have changed (name type, locale, locale preferred), and make no change otherwise.
2. If no existing name matches the one that you are loading, create a new name.
3. If names exist on the existing Concept that don’t match the name text of any name being loaded, then for each of these determine whether the existing name is in use (i.e. attached to a diagnosis): if so, retire it; otherwise, purge it.
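The three steps can be sketched as a planning function. This is a hypothetical illustration, not Initializer code: `Name`, `Op`, `Action` and `plan` are made-up names, `isInUse` stands in for a real check such as "referenced by a diagnosis", and the sketch assumes name text is unique within a concept (duplicate texts would make `Collectors.toMap` throw).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of the text-matching load strategy; assumes name text is unique per concept.
class NameLoadByText {

    record Name(String text, String locale, String type, boolean localePreferred) {}

    enum Op { KEEP, UPDATE, CREATE, RETIRE, PURGE }

    record Action(Op op, String text) {}

    static List<Action> plan(List<Name> existing, List<Name> incoming, Predicate<Name> isInUse) {
        // toMap throws IllegalStateException on duplicate texts, per the uniqueness assumption
        Map<String, Name> existingByText = existing.stream()
                .collect(Collectors.toMap(Name::text, n -> n));
        Set<String> incomingTexts = incoming.stream().map(Name::text).collect(Collectors.toSet());
        List<Action> actions = new ArrayList<>();

        for (Name n : incoming) {
            Name match = existingByText.get(n.text());
            if (match == null) {
                actions.add(new Action(Op.CREATE, n.text()));   // step 2: no name with this text
            } else if (!match.equals(n)) {
                actions.add(new Action(Op.UPDATE, n.text()));   // step 1: same text, other properties changed
            } else {
                actions.add(new Action(Op.KEEP, n.text()));     // step 1: nothing changed
            }
        }
        for (Name old : existing) {
            if (!incomingTexts.contains(old.text())) {          // step 3: no longer in the dictionary
                actions.add(new Action(isInUse.test(old) ? Op.RETIRE : Op.PURGE, old.text()));
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        List<Name> existing = List.of(
                new Name("Diabetes", "en", "FULLY_SPECIFIED", true),
                new Name("Sugar", "en", "SYNONYM", false),
                new Name("DM", "en", "SHORT", false));
        List<Name> incoming = List.of(
                new Name("Diabetes", "en", "SYNONYM", false),
                new Name("Diabetes mellitus", "en", "FULLY_SPECIFIED", true),
                new Name("DM", "en", "SHORT", false));
        // Pretend "Sugar" was used to record a diagnosis.
        plan(existing, incoming, n -> n.text().equals("Sugar")).forEach(System.out::println);
    }
}
```

Since ConceptName is currently Voidable rather than Retireable, a `RETIRE` action here would have to map onto voiding today, which is the challenge raised just below.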

Unfortunately, in OpenMRS we have Concept Name as a Voidable entity, not a Retireable entity. I don’t know/recall if this was an intentional design decision or an oversight, but this would pose a challenge to the above strategy, if it were worth pursuing.

burke · October 4, 2021, 1:56pm

I think we’re paying the price now for refactoring that wasn’t done as the design around concept names evolved.

When we initially created concept names, we treated them like secondary metadata (that wouldn’t be referenced directly). Over time, however, we ran into cases where we wanted to preserve the specific name chosen. Naming a diagnosis in a condition list or as an encounter diagnosis is one example of this, where the provider wants to see “GERD” (a synonym) in their problem list instead of “Gastroesophageal Reflux Disease”. Another example is capturing the specific name used as an answer on a form (which led to obs.value_coded_name_id).

I believe this is a result of the history, where concept names were not initially designed to be referenced directly and, over time, features were added that refer to them directly. Ideally, we would have refactored concept names to be retired instead of voided at that time.

I don’t think we can (or want to) get away from the requirement to be able to refer to specific concept names in some circumstances, which means that they should probably be retired (not voided). That said, in these cases it’s the literal name that defines equality 99% of the time. I would suspect changing a synonym after it’s in use to be an exceptional situation.
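If the literal name defines equality in most cases, one hedged option for a sync tool is to avoid shipping name UUIDs at all: send the concept UUID together with the name text and locale, and resolve the name on the target by matching within that concept. This is an illustration, not dbsync's model; `NameRef` and `resolve` are hypothetical, and falling back to the locale-preferred name when no literal match exists is an assumption.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: resolving a synced name reference without relying on name UUIDs.
class ResolveNameByText {

    record ConceptName(String uuid, String text, String locale, boolean localePreferred) {}

    record Concept(String uuid, List<ConceptName> names) {}

    /** What a synced record might carry instead of diagnosisCodedNameUuid (assumed shape). */
    record NameRef(String conceptUuid, String text, String locale) {}

    static Optional<ConceptName> resolve(Concept target, NameRef ref) {
        if (!target.uuid().equals(ref.conceptUuid())) {
            return Optional.empty();
        }
        // 1) exact literal match in the same locale
        Optional<ConceptName> exact = target.names().stream()
                .filter(n -> n.locale().equals(ref.locale()) && n.text().equals(ref.text()))
                .findFirst();
        if (exact.isPresent()) {
            return exact;
        }
        // 2) assumed fallback: the locale-preferred name of that concept
        return target.names().stream()
                .filter(n -> n.locale().equals(ref.locale()) && n.localePreferred())
                .findFirst();
    }

    public static void main(String[] args) {
        // Name UUIDs are local to the target instance and need not match the source's.
        Concept gerd = new Concept("gerd-concept-uuid", List.of(
                new ConceptName("local-1", "Gastroesophageal Reflux Disease", "en", true),
                new ConceptName("local-2", "GERD", "en", false)));
        NameRef ref = new NameRef("gerd-concept-uuid", "GERD", "en");
        System.out.println(resolve(gerd, ref).map(ConceptName::uuid).orElse("unresolved")); // local-2
    }
}
```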

akanter · October 4, 2021, 2:57pm

Currently OCL receives the UUID from CIEL and puts it in the external ID field for concept_names.

mksd · October 4, 2021, 7:49pm

I still don’t fully understand why a name could not be hooked to multiple concepts. I am not saying we should go down that route, but I would like to understand why it is not a good practice.

If I take the simplified example of the name “Diabetes”. Typically this kind of name is that of a Misc/NA concept or maybe Diagnosis/NA concept. In other words it is most likely an answer to some question.

But then imagine the hypothetical case of a form where people would ask whether the patient is diabetic or not:

That one would be the name of a Question/Boolean concept.

Wouldn’t it make sense to reuse that name and hook it to both concepts?

ibacher · October 4, 2021, 8:31pm

It would if concept names were just strings, but they’re actually strings with metadata (e.g., locale, name type, whether or not it’s the preferred name, whether the name is active or voided, etc.)

In terms of your example, let’s say I decide down the line that the diagnosis of “diabetes” isn’t specific enough for the diagnosis, and I actually want it to say “diabetes mellitus”, so I retire the diagnosis name “diabetes”, add a new “diabetes mellitus” name, and add “diabetes” as an index term for my diagnosis concept. Do I also want to make that change to the display name on the questionnaire?
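A minimal Java illustration of the point above, using hypothetical classes rather than the OpenMRS `ConceptName` API: if one mutable name object were shared by the diagnosis concept and the Question/Boolean concept, retiring it for the diagnosis would silently change what the questionnaire displays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not org.openmrs.ConceptName: one name object shared by two concepts.
class SharedNameSideEffect {

    static final class Name {
        String text;
        boolean retired;
        Name(String text) { this.text = text; }
    }

    static final class Concept {
        final List<Name> names = new ArrayList<>();

        /** Display the first non-retired name, or a placeholder. */
        String display() {
            return names.stream().filter(n -> !n.retired).map(n -> n.text)
                    .findFirst().orElse("(no active name)");
        }
    }

    public static void main(String[] args) {
        Name diabetes = new Name("diabetes");
        Concept diagnosis = new Concept();   // Diagnosis/N/A
        Concept question = new Concept();    // Question/Boolean
        diagnosis.names.add(diabetes);
        question.names.add(diabetes);        // the reuse being discussed

        // The change intended only for the diagnosis concept:
        diabetes.retired = true;
        diagnosis.names.add(new Name("diabetes mellitus"));

        System.out.println(diagnosis.display()); // diabetes mellitus
        System.out.println(question.display());  // (no active name)
    }
}
```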

burke · October 5, 2021, 12:14am

This is one of the more frequently used anti-patterns when modeling a dictionary.

| Diagnosis as question | Diagnosis as answer |
|---|---|
| Diabetes Asthma ☐ Hypertension | What are the patient’s diagnoses? `Diabetes ⊗` `Asthma ⊗` |

It seems simple when starting out and people commonly choose the former (diagnosis as question); however, it becomes an n² problem as you begin managing question & answer forms of each concept for thousands of concepts. For those who are old enough to remember when dictionaries were books, I like to think of the analogy of a physical dictionary – i.e., you don’t find two separate entries for “blue” depending on whether someone asked “Is that blue?” or “What color is that?”… the definition of “blue” is distinct from how it is used. Likewise, “Diabetes” isn’t a question any more than “blue” is a question.

But you don’t really have to choose. Either/both of these approaches can be taken in the UI while using the same model (diagnoses as answers) to model & persist the data.
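One way to read "you don't really have to choose", sketched with hypothetical types: the UI may render a checklist (diagnosis as question), while what gets persisted is always one question concept with coded answers (diagnosis as answer). The `Obs` record, `fromChecklist` method and concept UUID strings are illustrative placeholders, not OpenMRS APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a checklist UI persisted with the "diagnosis as answer" model.
class ChecklistToObs {

    /** One observation: a question concept with a coded answer concept. */
    record Obs(String questionConceptUuid, String valueCodedConceptUuid) {}

    static final String PATIENT_DIAGNOSES = "patient-diagnoses-question-uuid"; // placeholder

    /** Each ticked checkbox becomes one coded answer to the same question. */
    static List<Obs> fromChecklist(Map<String, Boolean> checkboxes) {
        return checkboxes.entrySet().stream()
                .filter(Map.Entry::getValue)
                .map(e -> new Obs(PATIENT_DIAGNOSES, e.getKey()))
                .toList();
    }

    public static void main(String[] args) {
        Map<String, Boolean> ui = new LinkedHashMap<>();
        ui.put("diabetes-concept-uuid", true);
        ui.put("asthma-concept-uuid", true);
        ui.put("hypertension-concept-uuid", false);
        fromChecklist(ui).forEach(System.out::println);
    }
}
```

The dictionary only needs one "diagnosis" concept per condition, whichever widget the form uses.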

mksd · October 5, 2021, 8:02am

Sure, the way it is modelled right now, this wouldn’t work as such, but things could have been modelled in a way that would allow it.

If the user is offered preset coded choices when filling in a form, then I guess the user accepts that the management of the coded values is out of their hands. Alternatively, there could be situations (a bit like with address hierarchy) where the choices are coded but ultimately the chosen value is saved as text.

I don’t know whether one rule fits all or if a strategy or another should be implemented depending on the context in the EMR. Most likely the latter.

Believe me, I know. I personally came up with these rules of thumb when setting up the FSNs of concepts:

- Answers are short and reusable.
- Questions are long and context rich.
- This one is not so good (*): Questions can be overridden by using labels in the UI.

Admittedly, there will be plenty of times when those rules should be broken, but oftentimes they serve their purpose.

(*) Not so good because the actual question asked is not recorded, but maybe there are workarounds in the Obs model.

mksd · October 6, 2021, 9:10am

For the record, and in order to unlock the original issue with dbsync, I have created this Iniz issue:

#141 - Full specified names UUIDs to be seeded from their concept’s UUID.
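A minimal sketch of what seeding name UUIDs from their concept's UUID could look like, assuming a name-based (version 3, MD5) UUID computed with the JDK's `UUID.nameUUIDFromBytes` over the concept UUID plus locale and name type. The seed layout and the `SeededNameUuid` class are hypothetical; the scheme actually chosen for Iniz #141 may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical seeding scheme; Iniz #141 may use a different one.
class SeededNameUuid {

    /** Deterministic: same concept UUID + locale + name type gives the same name UUID everywhere. */
    static UUID fullySpecifiedNameUuid(String conceptUuid, String locale) {
        String seed = conceptUuid + "|" + locale + "|FULLY_SPECIFIED";
        return UUID.nameUUIDFromBytes(seed.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String concept = "eae5d01b-987d-486a-9f37-a236f9800b82";
        UUID onInstanceA = fullySpecifiedNameUuid(concept, "en");
        UUID onInstanceB = fullySpecifiedNameUuid(concept, "en");
        System.out.println(onInstanceA.equals(onInstanceB));                            // true
        System.out.println(onInstanceA.version());                                      // 3
        System.out.println(onInstanceA.equals(fullySpecifiedNameUuid(concept, "fr"))); // false
    }
}
```

Because the seed leaves out the name text, rewording an FSN keeps its UUID, so it reads as a modification rather than a new name; this relies on a concept having at most one fully specified name per locale.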