Defining our concept validation rules for OCL

burke · July 25, 2022, 2:00pm

@suruchi, from what I can see, OCL currently only validates that external_id is 36 characters in length or less. I created OCL issue #1340 to improve external_id validation.

burke · August 14, 2022, 7:58pm

FYI – I created a series of SQL queries for @akanter to check OpenMRS concept validation rules on the CIEL dictionary. These may be useful for others who are contemplating migrating their dictionary into OCL.

mseaton · September 2, 2022, 1:07pm

@burke - following up on this to clarify whether these same concept validation rules we are putting in place for OCL are expected to match the validation rules in OpenMRS. There are a few cases where I have questions:

Can you clarify what you mean here by “except short names”? Are you saying that for a given Concept, you can’t have exactly the same name in multiple locales, except for short names? Take “Tumor”, for example. Are you saying that this can’t be used as both the English and Spanish name for the same Concept? That would seem incorrect.

This is correct, but it leaves open a question for what should be enforced in a particular locale. I think there is some inconsistency around how we handle locales without a fully specified name. It seems like you are implying (and our existing ConceptValidator supports this) that as long as, say, there is an FSN in at least one locale, that other locales can have a variety of non-fully-specified names and not need to have an FSN themselves? Other aspects of the codebase contradict this (see my “Other questions and comments” below).

Specifically, should we enforce that if any names exist in a given locale for a concept that at least one of these names must be fully specified?

Can you clarify what you mean by (except short names and index terms)? I thought in the discussion it was agreed that FSNs need to be totally unique? Was this not updated?

Per my above point, shouldn’t we allow a single concept to have the exact same FSN in multiple locales, since there is no ambiguity around what Concept that FSN refers to?

Other questions and comments:

The OpenMRS Concept logic is spread across a variety of service methods, validators, and the Concept and ConceptName domain objects. This can lead, and has led, to a lot of inconsistencies. Locale plays a very prominent role in some unexpected ways - for example, searching for a concept by name will often exclude matches from locales that are not your current locale, which is not always desirable. It would be helpful to clearly lay out expected rules - not just for concept name validation, but also for how the API should handle searching for concepts across locales.
There is a lot of mutation happening in the Concept and ConceptName domain objects themselves, which is inconsistent with some of the points made in this thread. For example, there is logic in the Concept class that, when adding a name to a Concept, will deliberately set it as the FSN if no FSN already exists in that locale. This leads to situations where the order in which names are added to a Concept can change those names in meaningful ways, which is not good. We actually run into this and have to work around it in the OCL subscription module, which attempts to populate a concept with names and needs to sort them and add them in a specific order for things to work correctly. But this logic implies that if any names are in a locale for a Concept, at least one must be fully specified.
Can you explain what the difference is between an “Index Term” and a “Synonym”? Are these the same thing? Or is a synonym different, and represented by a name with ConceptNameType = null? Because our ConceptValidator prevents marking Index Terms and Short Names as preferred.
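
To make the order dependence in the second point concrete, here is a minimal Python sketch. It is a toy model of the behavior as described in this post, not the actual OpenMRS Concept code: `add_name`, `add_in_order`, `fsn_first`, and the dict layout are all hypothetical names chosen for illustration.

```python
FSN = "FULLY_SPECIFIED"

def add_name(names, name, locale, name_type=None):
    # Toy model of the described behavior (an assumption, not OpenMRS code):
    # promote the incoming name to FSN when the locale has no FSN yet.
    has_fsn = any(n["locale"] == locale and n["name_type"] == FSN for n in names)
    if not has_fsn:
        name_type = FSN
    names.append({"name": name, "locale": locale, "name_type": name_type})

def add_in_order(incoming):
    # Add names one at a time, in the order given.
    names = []
    for n in incoming:
        add_name(names, n["name"], n["locale"], n["name_type"])
    return names

def fsn_first(incoming):
    # Stable sort: FSNs (key False) come before all other names (key True).
    return sorted(incoming, key=lambda n: n["name_type"] != FSN)

incoming = [
    {"name": "GERD", "locale": "en", "name_type": None},  # synonym
    {"name": "Gastroesophageal reflux disease", "locale": "en", "name_type": FSN},
]

naive = add_in_order(incoming)               # synonym arrives first and is promoted
ordered = add_in_order(fsn_first(incoming))  # FSN arrives first, synonym is untouched
```

In this model, `naive` ends up with two names typed as FSN in the same locale, while `ordered` keeps "GERD" as a plain synonym. That is the kind of sorting workaround described above for the OCL subscription module.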

Sorry for the long set of comments and questions…

burke · September 2, 2022, 2:16pm

Short names are used for a special purpose – to specify the name to use when labelling a row or column on a flowsheet spreadsheet – which might match an existing name for the concept. So, we do not require that short names are different from other names.

I don’t think so, since other locales are commonly used only for display purposes – i.e., to communicate the concept in another language – and not for concept management within that other locale. In many cases, we don’t have a terminologist who is fluent in every locale, which I would suspect would be important in creating a proper fully specified name.

The primary purpose of a FSN is to ensure there’s a way to uniquely refer to the concept by name. While it’s helpful to have a FSN in every locale, using FSN in another locale (e.g., English) is a convenient and reasonable fallback.

Short names aren’t required to be unique for the reasons mentioned earlier. Index terms also are not required to be unique because they are solely used to help users find the concept through searches, which means they could include misspellings or even the exact name of another concept (e.g., if you want anyone searching for “Thyroid 2nd generation test” to also see there’s a “Thyroid 3rd generation test”, you could use “Thyroid 2nd generation test” as a lookup term for the 3rd generation test).

This hasn’t come up because concepts are generally managed in English and we haven’t required a FSN in every locale (based on reasoning above). It’s also rare that FSNs would exactly match in different dialects. If we did require every locale used to have a FSN, then we would have to allow duplicates within a concept, since we’d often have duplicates across similar dialects (en-US and en-GB).

Agreed

Agreed. The assumption of FSN was likely an artifact of wanting to ensure we had a FSN for each concept without having to always manually remember to identify the FSN, combined with a mistaken assumption that every locale must have a FSN. If a new concept is being created with one name, I could understand assuming that name is the FSN, but I’d agree we shouldn’t be automatically setting names to FSN, especially if the concept already has a FSN in another locale.

I don’t think this is viable for the reasons I mentioned earlier. Otherwise, we’ll undoubtedly end up labelling names as fully specified when they are not fully specified.

Synonym

An alternative name for a concept. May include colloquialisms or common abbreviations. For example, “GERD” for gastroesophageal reflux disease.

Index Term

A term used to find a concept when searching. These should never be displayed to users. May include common misspellings or the name of another or retired concept to help redirect users to a new/recommended concept. For example, “symtom” (sic) for “symptom” (in cases where the common misspelling isn’t captured by fuzzy searching algorithms). Could also be used to find a vaccine by barcode value without forcing users to see the barcode value as a valid “name” for the concept in any reports or searches.

burke · December 15, 2022, 2:55pm

@akanter, @raff brought up a question in an OCL call today that seems like it might be an important validation rule:

As we were walking through the cloning process, we realized this could create ambiguity. For example, if we are adding a CIEL concept to our local dictionary that has an answer of “No” (CIEL:1066) and we already have a local concept mapped to CIEL’s “No” (i.e., SAME-AS CIEL:1066), then we map the local copy of the CIEL concept we’re creating to our local “No” concept. But what if we had two or more local concepts mapped SAME-AS to CIEL:1066? Which one to use would be ambiguous.

This would not be saying all SAME-AS mappings must be unique; rather, saying that you can’t have more than one concept mapped SAME-AS to the same source & code. I suppose it could get troublesome if applied to mapping to reference terminologies (e.g., where a SNOMED-CT code effectively matches more than one local concept)… but those cases could be managed by more precise mappings (e.g., using map types other than SAME-AS for inexact mappings).

I scanned the version of the PIH dictionary we loaded into OCL’s staging environment and found only one example of a duplicate SAME-AS mapping to CIEL: both HIV Staging - Coccidiodomycosis, disseminated (5032) and HIV Staging - Child HSV infection (5037) are mapped SAME-AS to CIEL’s HIV staging - coccidiodomycosis, disseminated (5032), but this looks like it might be a mistake (i.e., Child HSV infection is not disseminated coccidiodomycosis). But there are multiple examples of duplicate SAME-AS mapping to other reference terminologies (SNOMED-CT, ProblemIT, ProcedureIT, LiberiaMOH, LOINC, ICD-10-WHO, ICPC2).

So, I suppose we would either need to enforce unique SAME-AS to CIEL (which seems hacky) or come up with a way to handle the ambiguity if you are cloning a CIEL concept and have more than one concept marked as equivalent to an answer or set member of that concept or one of its descendants (e.g., pick the one with lowest ID and emit a warning).
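
A minimal sketch of the second option (pick the lowest ID and emit a warning), written in Python purely for illustration. The function name `pick_local_equivalent`, its parameters, and the use of integer IDs are assumptions for this example and do not come from OCL or OpenMRS.

```python
import warnings

def pick_local_equivalent(target, candidate_ids):
    """Choose which local concept stands in for `target` during cloning.

    `target` is a label such as "CIEL:1066"; `candidate_ids` are the IDs of
    local concepts mapped SAME-AS to it (hypothetical inputs).
    Returns None when there is no local equivalent.
    """
    if not candidate_ids:
        return None
    chosen = min(candidate_ids)
    if len(candidate_ids) > 1:
        # Ambiguous: more than one local concept claims equivalence.
        warnings.warn(
            f"Ambiguous SAME-AS mapping to {target}: "
            f"candidates {sorted(candidate_ids)}; using {chosen}"
        )
    return chosen
```

With a single candidate the function returns it silently; with several, the caller still gets a deterministic answer but the warning leaves a trail for a terminologist to review.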

/cc @ball @ibacher

akanter · December 15, 2022, 3:55pm

This is a great question, and should only apply to non-retired concepts. Clearly there can be two different concepts mapped same-as to the same reference code if one is retired.

But in the example, I can see why someone might want to have two local terms mapped to the same CIEL concept to capture reference mappings, etc., with the local terms having greater specificity than the CIEL term or required by the reference codes. However, these should not be SAME-AS maps. I can’t think of a reason why granular concepts like CIEL or SNOMED should ever have duplicate SAME-AS maps into a dictionary. I am a little worried that categorical terms like ICD might have redundancy, but I really need to do a data-informed look. As for the HSV example, that seems like an error.
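
For anyone wanting to do that kind of data-informed look, the check discussed above (no two non-retired concepts mapped SAME-AS to the same source and code) could be sketched roughly as follows. This is Python for illustration only: the mapping dict keys, the `retired_ids` parameter, and the `local-no` concept ID are assumptions, not OCL's export format. The first two sample rows mirror the PIH example mentioned earlier in the thread.

```python
from collections import defaultdict

def duplicate_same_as(mappings, retired_ids=frozenset()):
    """Return {(source, code): [concept ids]} for SAME-AS targets that are
    claimed by more than one non-retired concept."""
    by_target = defaultdict(set)
    for m in mappings:
        # Only SAME-AS maps from non-retired concepts count toward the rule.
        if m["map_type"] != "SAME-AS" or m["from_concept"] in retired_ids:
            continue
        by_target[(m["to_source"], m["to_code"])].add(m["from_concept"])
    return {t: sorted(ids) for t, ids in by_target.items() if len(ids) > 1}

mappings = [
    {"from_concept": "5032", "map_type": "SAME-AS", "to_source": "CIEL", "to_code": "5032"},
    {"from_concept": "5037", "map_type": "SAME-AS", "to_source": "CIEL", "to_code": "5032"},
    # Hypothetical local concept with a single, unambiguous SAME-AS map.
    {"from_concept": "local-no", "map_type": "SAME-AS", "to_source": "CIEL", "to_code": "1066"},
]

duplicates = duplicate_same_as(mappings)
after_retiring = duplicate_same_as(mappings, retired_ids={"5037"})
```

Here `duplicates` flags CIEL 5032 with both local concepts, and the flag disappears once one of them is treated as retired.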

burke · September 13, 2023, 4:12pm

While working through a release for CIEL v2023-09-11, we stumbled on an issue with one of our validation rules for dictionaries:

Fully specified names must be unique across all names (except short names and index terms) in a locale

As we discussed a couple of years ago, there are cases where FSNs aren’t truly fully-specified (e.g., “Correct” or “Lactic Acid”) and are used as synonyms on other concepts.

From discussing with @akanter, I think the proposed change to this rule would be:

Fully specified names must be unique across all fully-specified names and preferred synonyms in the same locale.
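
As a rough illustration of how the proposed rule could be checked, here is a Python sketch. It makes several assumptions that are not stated in this thread: names are compared case-insensitively after trimming whitespace, a synonym is a name with no name type, conflicts are only counted across different concepts, and the field names and concept IDs are hypothetical.

```python
from collections import defaultdict

FSN = "FULLY_SPECIFIED"

def fsn_conflicts(names):
    """Return sorted (locale, normalized name, [concept ids]) tuples where a
    fully specified name collides with another concept's FSN or preferred
    synonym in the same locale."""
    competing = defaultdict(set)  # key -> concepts holding it as FSN or preferred synonym
    fsn_keys = set()              # keys used as an FSN by at least one concept
    for n in names:
        key = (n["locale"], n["name"].strip().casefold())
        is_fsn = n["name_type"] == FSN
        is_preferred_synonym = n["name_type"] is None and n["locale_preferred"]
        if is_fsn or is_preferred_synonym:
            competing[key].add(n["concept_id"])
        if is_fsn:
            fsn_keys.add(key)
    return sorted(
        (locale, name, sorted(competing[(locale, name)]))
        for (locale, name) in fsn_keys
        if len(competing[(locale, name)]) > 1
    )

names = [
    {"concept_id": 1, "name": "Lactic Acid", "locale": "en", "name_type": FSN, "locale_preferred": False},
    # Non-preferred synonym on another concept: allowed under the proposed rule.
    {"concept_id": 2, "name": "Lactic Acid", "locale": "en", "name_type": None, "locale_preferred": False},
]
allowed = fsn_conflicts(names)

# A preferred synonym on a third concept would violate the proposed rule.
names_with_preferred = names + [
    {"concept_id": 3, "name": "lactic acid", "locale": "en", "name_type": None, "locale_preferred": True},
]
violations = fsn_conflicts(names_with_preferred)
```

Under these assumptions, `allowed` is empty (the case that tripped the old rule now passes), while `violations` reports the clash between concepts 1 and 3.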

FYI – @akanter, @ball, @mseaton

p.s. I created OCL issue #1665 to update the OpenMRS Custom Validation Schema rules in OCL to reflect this change.

burke · December 1, 2023, 4:41pm

As folks have started trying out creating new content within OCL, we discovered that names & descriptions of new concepts created within OCL are not automatically assigned their own UUIDs (i.e., as an “external id” in OCL). This leads to concepts failing to successfully import into OpenMRS.

This issue is being addressed in OCL #1683. I also added this to our list of validation requirements (at the top of this thread) and filed OCL issue #1705 to include it in the OpenMRS custom validation schema rules.
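
Until that fix lands, a pre-import cleanup step along these lines might help. This is only a sketch in Python under stated assumptions: the concept is a plain dict with `names` and `descriptions` lists whose items carry an `external_id` key, which is a hypothetical shape for illustration and not necessarily OCL's export format. `ensure_external_ids` is likewise a made-up helper name.

```python
import uuid

def ensure_external_ids(concept):
    """Give every name and description lacking an external_id a fresh UUID.

    Mutates `concept` in place and returns how many IDs were assigned.
    """
    assigned = 0
    for item in concept.get("names", []) + concept.get("descriptions", []):
        if not item.get("external_id"):
            # str(uuid4()) is 36 characters, within the length limit noted
            # at the start of this thread.
            item["external_id"] = str(uuid.uuid4())
            assigned += 1
    return assigned

concept = {
    "names": [
        {"name": "Symptom", "external_id": None},
        {"name": "Sign", "external_id": "existing-id"},
    ],
    "descriptions": [{"description": "Example description"}],
}
assigned_count = ensure_external_ids(concept)
```

Existing IDs are left untouched, so running the step repeatedly is harmless.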