What approach to use when cleaning up an old concept dictionary with duplicates to baseline to CIEL

We have inherited an old legacy concept dictionary, which was (and still is) being continuously updated using SQL scripts.

Going forward we are looking into standardizing on CIEL as the source for concepts, and deploying concepts via MDS.

The challenges I see:

  1. Even within CIEL, which source dictionary mapping (PIH, AMPATH, SNOMED) was used for the WHO Express installation from 1.6.3? I see combinations of all of them. Does anybody know where this came from?

  2. What is the plan for custom concepts which are not in CIEL - are there concept Ids that are reserved for private implementations which will not be overwritten by CIEL?

  3. When replacing duplicates, how do I decide which duplicate concept to keep and which one to remove - is there another way around this?

  1. Even within CIEL, which source dictionary mapping (PIH, AMPATH, SNOMED) was used for the WHO Express installation from 1.6.3? I see combinations of all of them. Does anybody know where this came from?

AK> To

  2. What is the plan for custom concepts which are not in CIEL - are there concept Ids that are reserved for private implementations which will not be overwritten by CIEL?

AK> The idea is that OCL will allow you to manage custom “Local” concepts in the cloud and sync them down onto your server along with the CIEL content that you want. The local concept IDs are irrelevant and will be assigned by the module as they come down. @rafal or @paynejd might be able to provide more detail.

  3. When replacing duplicates, how do I decide which duplicate concept to keep and which one to remove - is there another way around this?

AK> I don’t understand what you mean. If you are referring to existing concepts which map from CIEL to your dictionary, by attaching the CIEL source and code to the reference map, you should be able to say that going forward, changes to CIEL will be proposed to that concept. They also will not be duplicated as OCL will assume that the concept already exists on your server.

@akanter Thank you for the information. What URL do I subscribe to in order to create the custom “local” concepts?

My question about duplicates concerns the ones I find in the legacy database, which has evolved over time. We are looking for ways of extracting the concepts and distributing them via MDS with validation.

I think you need to subscribe to OCL to do this (via the OpenMRS Module). @paynejd can explain. As for duplicates that pre-exist, you definitely want to retire all but one. You can use common SNOMED CT reference maps, or the same CIEL concept maps to identify which sets of terms need to be reviewed, but the final retiring step will have to be done manually.
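To make the "shared reference map" idea concrete, here is a minimal sketch in Python. It assumes you have already exported your reference maps as (concept_id, source, code) rows; the function name, the row layout, and the sample IDs and codes are all made up for illustration and are not part of OpenMRS or OCL. It only produces a review list, since the retiring itself still has to be done by hand.

```python
from collections import defaultdict


def group_by_shared_map(mappings, source):
    """Return {code: sorted concept_ids} for codes in `source` that are
    mapped by two or more local concepts (candidate duplicates)."""
    by_code = defaultdict(set)
    for concept_id, map_source, code in mappings:
        if map_source == source:
            by_code[code].add(concept_id)
    # Only codes shared by at least two concepts need a manual review.
    return {code: sorted(ids) for code, ids in by_code.items() if len(ids) > 1}


# Hypothetical export: (concept_id, mapping source, code in that source).
mappings = [
    (1001, "SNOMED CT", "code-A"),
    (2001, "SNOMED CT", "code-A"),
    (1002, "SNOMED CT", "code-B"),
    (1003, "SNOMED CT", "code-C"),
    (2003, "SNOMED CT", "code-C"),
    (2003, "PIH", "code-A"),  # other sources are ignored for this pass
]

candidates = group_by_shared_map(mappings, "SNOMED CT")
for code, concept_ids in sorted(candidates.items()):
    print(code, concept_ids)
```

Running the same function with "CIEL" as the source would give the equivalent list based on CIEL concept maps.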

By the way, it’s worth pointing out that MDS has scale issues – its running time seems to grow as O(n^2) with package size.

So, depending on your concept dictionary size, this may not be a good strategy. (Maybe @ball or someone can comment on how many concepts PIH has in its Mirebalais installation, and how long it takes to install the package.)

The correct approach in the future will be to use OCL, basically to upload all the concepts there, and use OCL to pick your relevant subsets, and then have implementations subscribe to the subset. However this functionality is not yet complete, and we don’t really have an expected date when it will be ready.

For the Mirebalais installation, we use 17 MDS packages for managing 3000+ concepts. (A small amount of other metadata, e.g. encounter_type, is included in those packages.) Some packages (e.g. oncology or mental health) are relatively small and can be imported/exported quickly. Other packages contain all our diagnoses (1400+); it can take 30+ minutes to create the MDS package and about 60 minutes to build the Mirebalais metadata module.

We look forward to using OCL when the functionality is added.

As for cleaning up an old concept dictionary, I have spent lots of time doing this. I have some mysql and groovy scripts which search for concepts without obs, programs, workflow, states, and void/delete unused concepts. This is challenging since we use htmlforms. It is possible that a question exists on a form but is never used/answered. It requires much testing. The validation module is also helpful for finding duplicate names and null descriptions. Let me know if you are interested in my home-grown scripts. They are not without hiccups.

Regards,

Ellen

@ball I am very interested in your home grown scripts as I have tried to do this twice without too much success.

@akanter I am finding active concepts like this:

  • 116344 (CIEL) and 890 (AMPATH) for leprosy
  • 117767 (CIEL) and 893 (AMPATH) for Gonorrhea

The question is do we keep CIEL and delete the AMPATH ones? We can go ahead and update the forms to use the correct concepts, so that is not an issue at this moment.

Please ensure you are using the concepts that are not retired - so use 116344

FYI - the CIEL team provides curation services to handle these issues on a pragmatic level. Talk to me or Andy

Judy

CIEL manages the maps to the AMPATH concepts. You should keep the CIEL concepts and retire the AMPATH ones.

@ssmusoke - I have many scripts for cleanup of everything with OpenMRS concepts.

There is one script which generates temporary tables with used/unused concepts. This can/should be manually modified to add any concepts that are used on htmlforms and that you need to keep. It does not consider any obs-less concepts which are included in modules, xforms, or InfoPath.

Enjoy,

Ellen
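For readers who want to see the general shape of the used/unused approach, below is a minimal sketch in Python. It is not one of the PIH scripts. It assumes a heavily simplified schema (a concept table and an obs table with concept_id and value_coded columns) and uses an in-memory SQLite database as a stand-in for MySQL; the function name, the temporary table name, and the sample rows are invented. A real cleanup would also have to look at programs, workflows, states, forms, and module metadata.

```python
import sqlite3


def find_unused_concepts(conn, keep_ids=()):
    """List concept_ids that never appear in obs (as question or coded
    answer) and are not in the manually supplied keep list."""
    conn.execute("DROP TABLE IF EXISTS temp_used_concepts")
    # A concept counts as used if it is an obs question or a coded answer.
    conn.execute("""
        CREATE TEMP TABLE temp_used_concepts AS
        SELECT concept_id FROM obs
        UNION
        SELECT value_coded FROM obs WHERE value_coded IS NOT NULL
    """)
    # Manual additions, e.g. concepts on htmlforms that have no obs yet.
    conn.executemany(
        "INSERT INTO temp_used_concepts (concept_id) VALUES (?)",
        [(concept_id,) for concept_id in keep_ids],
    )
    rows = conn.execute("""
        SELECT concept_id FROM concept
        WHERE concept_id NOT IN (SELECT concept_id FROM temp_used_concepts)
        ORDER BY concept_id
    """).fetchall()
    return [row[0] for row in rows]


# Tiny invented data set, only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY);
    CREATE TABLE obs (
        obs_id INTEGER PRIMARY KEY,
        concept_id INTEGER NOT NULL,
        value_coded INTEGER
    );
    INSERT INTO concept VALUES (1), (2), (3), (4), (5);
    INSERT INTO obs VALUES (10, 1, 2), (11, 1, NULL);
""")

unused = find_unused_concepts(conn, keep_ids=[4])
print(unused)
```

The result is only a candidate list. As noted above, it still needs manual review before anything is voided or deleted.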

Thanks @ball

@ball do any of the scripts help distinguish between CIEL and mapped concepts so that I can delete the non-CIEL ones?

You want to delete the non-CIEL mapped concepts? You’ll need to write that, but it shouldn’t be difficult. Search for all concepts without CIEL mappings? Or do you want to do this only for diagnoses?

Ellen

@ball For concepts with a CIEL mapping, I am looking to delete the non-CIEL mappings so that I can create an MDS package for distribution

@ball coming back for your help again. I have a couple of duplicate mappings with both CIEL and non-CIEL concepts inherited from a 1.6.3 base.

My question is how can I identify the CIEL mappings so that I can retire the rest for MDS package export?

Dear Stephen -

Not sure I understand your question. I had provided a link to some concept cleanup scripts which work with OpenMRS 1.9 (https://github.com/PIH/openmrs-concept-scripts). One of them is a groovy script which will remove maps from a specified source (e.g. “local” in https://github.com/PIH/openmrs-concept-scripts/blob/master/groovy/delete_concept_maps.groovy).

It’s been a long time since we used OpenMRS 1.6 and the concept tables have changed much after that release. Let me know exactly what you need to do.

Ellen

@ball let me look at the groovy script. I have duplicate concepts in a 1.11.6 format database with mappings from 1.6 so I am trying to get rid of non-CIEL versions of the concepts and keep only CIEL mapped concepts going forward

You need to write a script to check for duplicate names (or you can use the validation module to find them for you). Check for a CIEL mapping for the concept OR a CIEL uuid (ending in AAAAAAAAAAAAAAAAAAAAs). Keep those and retire the others. It’s not so easy.

e
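As a rough illustration of the rule just described, here is a small Python sketch. It assumes the concepts have already been exported into dictionaries with concept_id, name, uuid, and the set of mapping sources; that layout, the function names, the uuids, and the 9001/9002 rows are hypothetical. The check for a long run of trailing "A" characters is a heuristic taken from the description above, not an official test. Groups where exactly one CIEL concept can be identified get a retire list; everything else is left for manual review.

```python
import re
from collections import defaultdict

# Assumption: CIEL uuids end in a long run of capital 'A' characters.
CIEL_UUID = re.compile(r"A{20,}$")


def is_ciel(concept):
    """True if the concept has a CIEL mapping or a CIEL-style uuid."""
    return "CIEL" in concept["map_sources"] or bool(CIEL_UUID.search(concept["uuid"]))


def plan_retirements(concepts):
    """Group concepts by case-insensitive name and return
    (ids to retire, groups of ids that need manual review)."""
    by_name = defaultdict(list)
    for concept in concepts:
        by_name[concept["name"].strip().lower()].append(concept)

    retire, review = [], []
    for group in by_name.values():
        if len(group) < 2:
            continue  # no duplicate name, nothing to do
        ciel = [c for c in group if is_ciel(c)]
        if len(ciel) == 1:
            # Keep the single CIEL concept, retire the other duplicates.
            retire.extend(c["concept_id"] for c in group if c is not ciel[0])
        else:
            # Zero or several CIEL candidates: a person has to decide.
            review.append(sorted(c["concept_id"] for c in group))
    return sorted(retire), review


# Illustrative rows; uuids and map sources are invented for the example.
concepts = [
    {"concept_id": 116344, "name": "Leprosy", "uuid": "116344" + "A" * 30, "map_sources": {"CIEL"}},
    {"concept_id": 890, "name": "LEPROSY", "uuid": "00000000-0000-0000-0000-000000000890", "map_sources": {"AMPATH"}},
    {"concept_id": 117767, "name": "Gonorrhea", "uuid": "117767" + "A" * 30, "map_sources": set()},
    {"concept_id": 893, "name": "gonorrhea", "uuid": "00000000-0000-0000-0000-000000000893", "map_sources": {"AMPATH"}},
    {"concept_id": 9001, "name": "Clinic note", "uuid": "00000000-0000-0000-0000-000000009001", "map_sources": set()},
    {"concept_id": 9002, "name": "Clinic note", "uuid": "00000000-0000-0000-0000-000000009002", "map_sources": set()},
]

retire, review = plan_retirements(concepts)
print("retire:", retire)
print("review:", review)
```

Exact name matching will miss synonyms and spelling variants, which is part of why this is "not so easy".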

@akanter Sorry to revive this thread, but I was just checking to ensure that my understanding is correct. As we wait for OCL to become more mainstream, is there a private batch of numeric concept IDs that an implementation can use safely without worrying about overwrite by CIEL?

If it does not exist, can you at least make provision for one? Implementers could then safely add concepts while maintaining CIEL compatibility (running the latest dictionary SQL script as it becomes available).

I know that there are curation services available, but at times you need a really localized concept to solve an issue immediately

Obviously to try to scratch my own itch, I have found that there is a gap between 86771 (inclusive) and 103154 (inclusive) which is empty in the CIEL Concept dictionary. Can one safely use these numeric IDs?

Trivia history question to fill some empty space in my head: How did this gap come to be?

cc @judy

Remember that you’d need a gap not just in the concept table, but in all the other ones like concept_name and concept_description.

-Darius
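A quick way to picture this check is the Python sketch below. It assumes you have pulled the primary-key values of each relevant table into a set; the function name and the sample IDs are invented, and only the range boundaries come from the question above. It reports which tables already have IDs inside the candidate range. An empty result only says the range is free in the data you checked; it says nothing about whether future CIEL releases will use those IDs.

```python
def tables_blocking_range(used_ids_by_table, low, high):
    """Return {table: sorted ids} for every table that already has
    primary-key values inside the inclusive range [low, high]."""
    blocked = {}
    for table, ids in used_ids_by_table.items():
        hits = sorted(i for i in ids if low <= i <= high)
        if hits:
            blocked[table] = hits
    return blocked


# Invented primary-key values, one set per table.
used_ids_by_table = {
    "concept": {86770, 103155},
    "concept_name": {86000, 90000},
    "concept_description": {17000},
}

blocked = tables_blocking_range(used_ids_by_table, 86771, 103154)
print(blocked)
```

In this made-up data the concept table has the gap but concept_name does not, which is exactly the situation the warning is about.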