Assigning concept IDs when using the OpenMRS Dictionary Manager

We were looking to add the ability to clone concepts within the Dictionary Manager (OCLOMRS-1007) and ran into a question that I don’t believe has been clearly answered for concept creation within the OpenMRS Dictionary Manager, especially in cases where we are “automatically” creating concepts (like copying over a list of answer concepts):

What should be used for the concept ID when copying a concept from another dictionary/source in OCL?

There are two “simple” answers to this question:

| Approach | Pro | Con |
| --- | --- | --- |
| Auto-assign next available ID for local dictionary | Matches behavior of OpenMRS | OCL stores concept IDs as strings, and calculating the next available integer would require downloading all concept IDs and sorting them numerically to find the current (numerically) maximum ID value. |
| Use the same ID from the original source | Straightforward. If you clone CIEL 123 into your FOOBAR dictionary, you get FOOBAR 123 | Chance of conflicts. What do we do if concept ID 123 already exists in FOOBAR? |

Given the challenge of finding the “next available” ID within OCL, I think adopting the same ID from the original source might be the simplest & most intuitive piece. The challenge will be how to handle conflicts where an ID already exists in the local dictionary.
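To illustrate why “next available” is awkward when IDs are stored as strings, here is a minimal Python sketch. It assumes the full list of concept IDs has already been downloaded from the dictionary; the function name `next_available_id` is hypothetical and not part of OCL or the Dictionary Manager.

```python
def next_available_id(concept_ids):
    """Return one more than the numeric maximum of a list of string IDs.

    Non-numeric IDs (e.g. "ASTHMA") are ignored. Hypothetical helper,
    not an OCL API.
    """
    numeric_ids = [
        int(cid) for cid in concept_ids
        if cid.isascii() and cid.isdigit()
    ]
    return max(numeric_ids, default=0) + 1


# Sorting as strings would rank "99" above "1000",
# so the IDs have to be compared as integers.
existing = ["1", "99", "1000", "ASTHMA"]
print(max(existing[:3]))            # string comparison picks "99"
print(next_available_id(existing))  # numeric comparison gives 1001
```

The cost is not the calculation itself but having to fetch every ID in the dictionary before each assignment.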

Proposal

  • Cloning a concept (e.g., from CIEL) into your local dictionary would create concepts using the IDs that CIEL uses (clone CIEL 123456 and you get a concept 123456). Any answers or set members are cloned as well. Any concepts that already exist (e.g., mapped as SAME-AS to concept being cloned)

  • We start by testing all of the IDs of all concepts that will need to be created. If any conflicts are found (i.e., IDs that would be created already exist in the local dictionary), we present the user with a list of conflicts and they must manually provide alternative IDs for each case before the cloning proceeds. In other words, we anticipate conflicts and force users to manually address them before cloning can proceed.
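The pre-flight check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: concept IDs are plain strings, the IDs to create are unique, and `clone_plan` and its `overrides` argument are hypothetical names rather than anything in the Dictionary Manager.

```python
def clone_plan(ids_to_create, existing_ids, overrides=None):
    """Map each source ID to the local ID it would be created with.

    overrides holds the alternative IDs a user supplied for conflicts.
    Raises ValueError if any conflict is still unresolved, so nothing
    is cloned until every ID is free. Hypothetical sketch.
    """
    overrides = overrides or {}
    taken = set(existing_ids)
    plan = {}
    unresolved = []
    for source_id in ids_to_create:
        local_id = overrides.get(source_id, source_id)
        if local_id in taken:
            unresolved.append(source_id)
        else:
            plan[source_id] = local_id
            taken.add(local_id)  # also guards against duplicate overrides
    if unresolved:
        raise ValueError(f"Unresolved ID conflicts: {unresolved}")
    return plan


local_ids = ["123", "500"]
try:
    clone_plan(["123", "456"], local_ids)
except ValueError as err:
    print(err)  # the user is shown the conflicts first

# After the user supplies an alternative ID for 123:
print(clone_plan(["123", "456"], local_ids, overrides={"123": "9001"}))
```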

Does this make sense? Thoughts? Suggestions?

/cc @ball @akanter @ibacher

Burke’s proposal (auto assign next id) seems reasonable and definitely the safest choice. Especially about showing any conflicts before cloning.

If the CIEL concept had any mapping that would be wrong for PIH, could I remove that from the cloning, or would I just be notified?

/cc @burke @akanter @mogoodrich

@burke @akanter @ibacher @ball

My gut reaction is that there has to be support for autoassigning, don’t you think?

I would think that once you start to include multiple sources in your dictionary, you are going to run into conflicts frequently?

And resolving them manually will be difficult, because the end user would need to have a list in front of them of all the existing concept ids in their dictionary? Otherwise they’d just have to start manually “guessing” numbers and enter them until one doesn’t collide?

And just to be clear about what we are talking about when we mean “concept id” in OCL, in the following screenshot of OCL, the concept id of the concept in the CIEL source is 306?

I’m guessing this concept originally came from AMPATH, because it has an AMPATH mapping of 306 (but no CIEL mapping of 306, for what it’s worth)?

Take care, Mark

In a closely related discussion on importing the PIH dictionary into OCL, we discovered that PIH might not be using concept IDs for all concepts:

@mogoodrich I understand concept IDs can vary across servers, but always assumed PIH had a “gold” concept server with the official concept IDs. From your comment above on ocl_issues #45, I gather you treat concept.concept_id as the internal id it is and assign a PIH SAME-AS mapping when you need a human-friendly identifier (e.g., to reference the concept for a form, report, etc.). The interesting/new idea I didn’t see coming…

Assuming “code” refers to a human-friendly unique identifier (like PIH:5 for Asthma or PIH:5356 for Current WHO HIV Stage), is the following statement true?

Many PIH concepts do not have a PIH code

Looking more closely, in many cases where there isn’t a PIH code, there is a CIEL mapping. I suppose we could consider these cases to represent PIH using a CIEL concept. But there are still several hundred concepts that don’t have a mapping to PIH or CIEL. For example, if I ask you what concept does PIH use for Social Skills Evaluation (text), you could tell me you refer to it as 14f5a8c9-6db0-45c5-bf56-b30b62a5394f, but you couldn’t give me a PIH code (without adding a new mapping), right?

/cc @ball @paynejd @akanter

@burke yes, if I understand what you are saying, I think that is correct.

A PIH concept can have 0 to n mappings from the PIH Concept source. For example, here are 3 concepts from our “golden” server, one with 0 PIH mappings, one with 1 PIH mapping, and one with 2 PIH mappings.

Concept with two PIH mappings, one numeric and one text. The numeric matches the concept_id, but nothing strictly enforces this; it was just how that mapping was created back in the day:

Concept with one PIH mapping, text:

Concept with no PIH mappings:

Note that for PIH mappings that are numeric, the mapping most likely (and maybe always?) equals the concept_id on the “golden” server, but we haven’t been automatically creating that mapping when new concepts are created…

If the idea is that we want/need a 1-to-1 concept reference mapping on the concept that would be used to create the primary key in OCL, we probably should add a new source like “PIH_OCL_CODE” and just assign those to all concepts in the PIH concept dictionary (possibly based on the concept_id on the “golden” server).

We could instead just assign numeric PIH mappings to all concepts that don’t have them, but we’d also have to address the dozen or so concepts you found that already have two mappings, for one reason or another. (Which is completely legal from a mapping perspective, but not really from a primary key objective).

@burke, as a small add-on, @ball mentioned that you were only importing the numeric PIH codes (hence probably why you didn’t see a code on “Social Skills Evaluation (text)”). We definitely use those text-based codes to reference concepts on our forms, so they will need to be imported into OCL.

Yes. I know PIH uses mappings to PIH codes (both numeric like PIH:3 and text like PIH:ASTHMA) to find concepts. When @ball sent PIH concepts with random concept IDs (concept.concept_id in the database), I thought I might be able to reverse engineer the “gold” concept IDs for PIH by using the numeric PIH codes in mappings. But I soon discovered:

  • Not all concepts have a mapping with a numeric PIH code
  • Some (n=16) concepts have two mappings to numeric PIH codes

Then I thought to consider CIEL mappings… perhaps concepts didn’t have a PIH code because they were CIEL concepts (your forms could refer to them with the CIEL code like CIEL:5586, making a PIH code unnecessary). But there are hundreds of examples of concepts with neither a PIH (numeric) code nor CIEL code.

I’m assuming the handful of concepts with two numeric PIH codes are concepts where an original concept was retired – i.e., one PIH code reflects the retired concept’s “gold” ID and the other reflects the new/current “gold” ID for the concept. I believe some of these duplicate PIH codes have made their way into CIEL (CIEL has maintained mappings to legacy PIH codes).

This is close, but not quite the point. Also, we don’t want to have to create a special “X_OCL_CODE” source for every implementation. The goal is to allow humans to uniquely refer to PIH concepts. It’s not a PIH “OCL” code that we want… it’s a PIH code. While you might be able to program tools to use the UUIDs, today how could @akanter (for CIEL) or someone at AMPATH trying to map to the PIH dictionary map to your Social Skills Evaluation (text) concept? Just like LOINC, SNOMED, ICD, AMPATH, etc. have a unique numeric code for every concept, we’d like the same for PIH.

If I had a time machine, I would have added concept.code to the OpenMRS data model to hold an official organizational “gold” code for concepts that – unlike the primary key for the table – could be reliably persisted across multiple servers and used to find concepts. PIH found a workaround for the lack of a concept.code: mapping to a PIH code. The advantage of using mappings is the same mechanism can be used for a PIH concept or CIEL concept. The challenge is nothing ensures such a mapping exists or that there’s only one per concept.

What we’d like to find is an approach that any OpenMRS implementer could use. Here are some potential approaches:

  1. Use concept ID. Create a SQL dump using “gold” ids. To import into OCL, an implementation will need to create a SQL dump using their “official” concept IDs.
  2. Use org mapping to numeric code. Add a PIH code mapping for every concept – i.e., give every concept an official code to be used both internally & externally.
  3. New map-type. Create a special map type (like “IDENTIFIED-AS”) for official “gold” concept IDs and make sure there’s one for every concept.
  4. Concept Code. Add concept.code to the OpenMRS data model and populate it with the official organizational code (“gold” id) for each concept.

If I had a time machine, I’d favor the concept.code approach, since a required unique attribute on the concept table would ensure every concept has a reliable organizational “gold” ID associated with it that isn’t conflated with the table’s primary key. But that’s a big (giant) lift.

Using concept ID (insisting the SQL dump use gold IDs) is the easiest for me & for OCL, but might be a pain for an organization like PIH. Not only would you need to ensure every concept was assigned a gold ID, you’d need to make sure the SQL dump we use to import into OCL uses those IDs.

I don’t like the idea of introducing a new map type. Plus, these would duplicate many of your existing mappings.

Option #2 (Use org mapping to numeric code) might be the best option, since you already have these for many of your concepts. If you copied the CIEL code for those without a numeric PIH code, you’d only have a few hundred concepts without PIH codes. Assuming you still have a “gold” concept server, you could just use whatever concept ID it has for these. If we chose this option, we’d need to do two things: (1) use a different mapping – like “ASSOCIATED-WITH” – for mapping to codes of retired/replaced concepts and (2) add a validation rule to ensure OpenMRS concepts only have one numeric SAME-AS mapping to the source organization.
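As a rough idea of what that validation rule might look like, here is a small Python sketch. The dictionary keys (`source`, `map_type`, `code`) and the function name are assumptions made for illustration; they are not the OCL or OpenMRS data model.

```python
def numeric_same_as_errors(mappings, org_source):
    """Check the 'only one numeric SAME-AS mapping to the source
    organization' rule for a single concept.

    mappings: list of dicts with hypothetical keys
              "source", "map_type" and "code".
    Returns a list of error messages (empty when the rule holds).
    """
    codes = [
        m["code"] for m in mappings
        if m["source"] == org_source
        and m["map_type"] == "SAME-AS"
        and m["code"].isascii() and m["code"].isdigit()
    ]
    if len(codes) > 1:
        return [f"Multiple numeric SAME-AS {org_source} codes: {codes}"]
    return []


concept_mappings = [
    {"source": "PIH", "map_type": "SAME-AS", "code": "5"},
    {"source": "PIH", "map_type": "SAME-AS", "code": "ASTHMA"},
    {"source": "CIEL", "map_type": "SAME-AS", "code": "5586"},
]
print(numeric_same_as_errors(concept_mappings, "PIH"))  # no errors
```

Text codes like PIH:ASTHMA and mappings to other sources are deliberately ignored, so the rule only constrains the one mapping that acts as the official code.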

I think option #1 (using concept ID) – where PIH does the work – or option #2 (use numeric SAME-AS mappings) – where we share the work – are the most viable options.

Thanks for the detailed explanation @burke !

I do think that Option #2 (quoted above) makes the most sense, though it’s worth reviewing with @akanter and the rest of the community whether there would be issues with adding the restriction: “add a validation rule to ensure OpenMRS concepts only have one numeric SAME-AS mapping to the source organization.”

(Also, if I had a time machine I’d be doing much cooler things than changing the OpenMRS data model) :)

Thanks for suffering through my pleonasm, @mogoodrich. I’ve only come up with two uses for a numeric SAME-AS mapping to the same organization:

  • Identifying a concept’s official code
  • Linking to a concept that was replaced (a retired concept)

I haven’t been able to think of a scenario where a dictionary would link two non-retired concepts as SAME-AS, since having two or more active concepts for the exact same thing within a dictionary seems like an anti-pattern.

@akanter if we used “Associated with” to link concepts to prior (retired/replaced) concepts, can you think of any other case where you or an organization might use a SAME-AS mapping between two concepts within the same organization?

I think we’ll need a different relationship type to describe maps to retired codes. Associated with… is used in SNOMED for other things. As for multiple SAME-AS maps, I know this happens in SNOMED due to duplicate concepts. I think IMO has had to address these on a case-by-case basis. I know that AMPATH had actual duplicates and they might not have retired all but one… so we might have to address that with each source. For example, I might have to pick one as SAME-AS and arbitrarily assign the retired map code to others.

Okay. How about something that covers more use cases than retiring concepts? I’m thinking of something like “REPLACES”. We already have “MOVED FROM” and “WAS A”… maybe using one of those would be better (e.g., “MOVED FROM”) to avoid having to introduce a completely new map type. We’re going to need to ensure code used to look up concepts via SAME-AS mapping will still work for this new map type (e.g., finding a concept from a mapping in a form or report).
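To make the lookup concern concrete, here is a hedged Python sketch of a code lookup that accepts more than one map type. “REPLACES” is only the name floated in this thread, not an agreed map type, and the mapping dictionary keys and `find_concepts` are hypothetical.

```python
def find_concepts(mappings, source, code, map_types=("SAME-AS", "REPLACES")):
    """Return UUIDs of concepts mapped to source:code.

    Matches via any accepted map type, listing SAME-AS hits before
    hits on the (proposed, hypothetical) legacy map type.
    """
    hits = [
        m for m in mappings
        if m["source"] == source
        and m["code"] == code
        and m["map_type"] in map_types
    ]
    hits.sort(key=lambda m: map_types.index(m["map_type"]))
    return [m["concept_uuid"] for m in hits]


mappings = [
    {"concept_uuid": "uuid-new", "source": "PIH", "map_type": "SAME-AS", "code": "200"},
    {"concept_uuid": "uuid-new", "source": "PIH", "map_type": "REPLACES", "code": "100"},
]
# A form that still references the legacy code 100 finds the current concept.
print(find_concepts(mappings, "PIH", "100"))
```

A lookup that only checks SAME-AS would return nothing for the legacy code, which is the regression to guard against.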

@mogoodrich does PIH still maintain a “gold” dictionary server? If so, could you give me a dump of concept IDs + UUID + fully specified name from that server? I could use it to estimate the “official” PIH codes for most concepts while we’re testing & working through the details.

@burke yeah, I should be able to get you that from our concept server… the issue is that it will have a lot more concepts than the PIH EMR set @ball gave you… do you need it culled down to the ones we actually want in OCL? And although I think the concept ids should match the PIH mappings in all cases, I can’t confirm it.

It doesn’t matter. I won’t use it to create new concepts, just to try to get the concepts I have from Ellen closer to the actual PIH codes (e.g., when I have a PIH concept without a PIH code, I can default to the concept ID on the gold server, and when there’s a concept with two PIH codes, the gold list can help me figure out which is the active PIH code).

Just a list (csv) with concept ID, fully specified name, uuid, and retired (Y/N) would work.
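For what it’s worth, the way such a list could be used might look like the sketch below. The column names, the sample rows, and the placeholder UUIDs and IDs are made up for illustration (only the PIH:5 and PIH:5356 codes come from this thread), and treating “the code whose gold concept is not retired” as the active code is an assumption about how the gold list would help.

```python
import csv
import io

# Made-up sample in the requested shape: concept ID, name, uuid, retired.
GOLD_CSV = """concept_id,name,uuid,retired
5,Asthma,uuid-asthma,N
900,Retired staging concept,uuid-old,Y
5356,Current WHO HIV Stage,uuid-stage,N
7001,Social Skills Evaluation (text),uuid-social,N
"""


def load_gold(csv_text):
    """Index the gold list by UUID and by concept ID."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_uuid = {row["uuid"]: row for row in rows}
    by_id = {row["concept_id"]: row for row in rows}
    return by_uuid, by_id


def estimate_code(uuid, numeric_codes, by_uuid, by_id):
    """Guess the official code for one concept, or None if unclear."""
    if not numeric_codes:
        # No PIH code: default to the concept ID on the gold server.
        row = by_uuid.get(uuid)
        return row["concept_id"] if row else None
    if len(numeric_codes) == 1:
        return numeric_codes[0]
    # Several codes: keep the one whose gold concept is not retired.
    active = [
        c for c in numeric_codes
        if c in by_id and by_id[c]["retired"] == "N"
    ]
    return active[0] if len(active) == 1 else None


by_uuid, by_id = load_gold(GOLD_CSV)
print(estimate_code("uuid-social", [], by_uuid, by_id))
print(estimate_code("uuid-stage", ["900", "5356"], by_uuid, by_id))
```

Cases that return `None` would still need a human decision.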

This consideration makes me think that having a specific “REPLACES” map type, e.g., to capture a relationship where one concept should be treated as if it were a previously extant concept is something worth having. There’s a pretty low cost to adding new map types and a higher potential cost to re-purposing existing ones (i.e., now someone’s report contains unexpected data and they are unaware it’s because we decided that “MOVED FROM” means “should be mapped to”).

@burke just emailed you the CSV… three concepts were missing in my export, assumedly because they don’t have fully-qualified English names… however I assume those are invalid/old concepts and probably not in the set you got from @ball … let me know if you have any questions and/or need anything else.

Take care, Mark

The requirement to address remaps must also be in SNOMED, so perhaps there is already a best practice here.

@akanter what do you mean by “the requirement to address remaps”?

I am assuming that SNOMED has the same problem where a code has been retired and it gets remapped onto a new code. They use a REFSET system for this, but I am guessing they have a preferred relationship type for the remaps.