Defining explicit conventions for concept source name equivalency

burke · January 29, 2022, 6:23am

While working on steps to support migrating organizations like PIH to OCL, we discovered that we need to define explicit conventions for source name equivalency. These conventions will be used when both importing and exporting concepts to/from OCL (e.g., in any import processes and within the OCL subscription module).

Initially, I thought we could remove everything but letters and numbers before comparing names.

When I tested this against all current OCL sources, I found two cases where this approach would conflate two differently written source names (i.e., both names would normalize to the same string). The first is “ICD-10” and “ICD 10”, which I think we can all agree both refer to ICD-10. However, the second case is “KenyaEMR” and “KenyaEMR+”.

I suspect “KenyaEMR” and “KenyaEMR+” are not the same source.
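
To make the collision concrete, here is a minimal sketch of that first approach. The function name `stripToAlphanumeric` and the exact regex are assumptions for illustration, not the code that was actually run against the OCL sources.

```javascript
// Hypothetical helper: lower-case the name, then drop every character
// that is not an ASCII letter or digit.
function stripToAlphanumeric(name) {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Desired: hyphen and space variants collapse to the same key.
console.log(stripToAlphanumeric("ICD-10") === stripToAlphanumeric("ICD 10")); // true

// Undesired: the "+" is stripped as well, so two distinct sources collide.
console.log(stripToAlphanumeric("KenyaEMR") === stripToAlphanumeric("KenyaEMR+")); // true
```

Both comparisons come out equal, which is fine for the ICD-10 pair but wrong for the KenyaEMR pair.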

This makes me think we should be more selective in what we remove. For example, just remove whitespace, hyphens, and underscores (s/[ \-_]//g) before doing a case-insensitive comparison. This would treat the hyphen and space variants of names like “SNOMED-CT” and “ICD-10” as equivalent, but would still treat “KenyaEMR” and “KenyaEMR+” as distinct sources.

Proposed convention for source name equivalency:

Case-insensitive comparison of source names, ignoring whitespace, hyphens, and underscores.

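// Lower-case the name, then strip spaces, hyphens, and underscores.
function normalize(s) {
  return s.toLowerCase().replace(/[ \-_]/g, "");
}
// Passes silently when both names normalize to the same key.
function assertEquals(a, b) {
  console.assert( normalize(a) === normalize(b) );
}
// Passes silently when the names normalize to different keys.
function assertNotEquals(a, b) {
  console.assert( normalize(a) !== normalize(b) );
}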

assertEquals("ICD-10", "ICD 10");
assertEquals("ICD-10", "ICD10");
assertNotEquals("KenyaEMR", "KenyaEMR+");

Thoughts?

/cc @mseaton @paynejd

mseaton · January 29, 2022, 1:53pm

Thanks @burke - yes, this will work well for us at PIH. The code we currently have in place to normalize source names does exactly what you propose:

Converts to UPPER CASE
Removes dashes, underscores, and spaces

With this in place, all of our existing sources match how sources are currently named in OCL staging.
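
For comparison, here is a rough sketch of the upper-case variant next to the lower-case one from the proposal. This is not PIH's actual code; `normalizeUpper`, `normalizeLower`, and `sameSource` are hypothetical names used only to show that, for ASCII source names, the two directions should produce the same equivalences.

```javascript
// Hypothetical stand-in for the upper-case normalization described above.
function normalizeUpper(name) {
  return name.toUpperCase().replace(/[ \-_]/g, "");
}

// Lower-case variant, mirroring the proposed convention.
function normalizeLower(name) {
  return name.toLowerCase().replace(/[ \-_]/g, "");
}

// Two names count as the same source when their normalized keys match.
function sameSource(normalizer, a, b) {
  return normalizer(a) === normalizer(b);
}

const pairs = [
  ["ICD-10", "ICD 10"],
  ["SNOMED-CT", "snomed_ct"],
  ["KenyaEMR", "KenyaEMR+"],
];

// Print each pair with the verdict from both normalizers.
for (const [a, b] of pairs) {
  console.log(a, "|", b, sameSource(normalizeUpper, a, b), sameSource(normalizeLower, a, b));
}
```

In this sketch the first two pairs match under both normalizers and the KenyaEMR pair stays distinct under both.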

Thanks, Mike

grace · February 2, 2022, 2:41pm

FWIW, yes, these are two separate distributions managed by separate organizations. KenyaEMR = PEPFAR-funded, HIV-related care, led by Palladium. KenyaEMR+ = everything else not HIV-related, led by Uni of Nairobi (I believe).