As we work on steps to support migrating organizations like PIH to OCL, we discovered that we need to define explicit conventions for source name equivalency. These conventions will be used when both importing and exporting concepts to/from OCL (e.g., in any import processes and within the OCL subscription module).
Initially, I thought we could remove everything but letters and numbers before comparing names.
When I tested this against all current OCL sources, I found two cases where this approach might mistakenly conflate two different sources (i.e., two different source names would become identical). The first is “ICD-10” and “ICD 10”, which I think we can all agree both refer to ICD-10. The second case, however, is “KenyaEMR” and “KenyaEMR+”.
I suspect “KenyaEMR” and “KenyaEMR+” are not the same source.
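The check described above can be sketched roughly as follows. This is only an illustration: `stripToAlphanumeric` and `findCollisions` are hypothetical helper names, and the sample list contains only names mentioned in this post, not the full set of OCL sources.

```javascript
// Hypothetical helper: lowercase, then drop everything except ASCII letters and digits.
function stripToAlphanumeric(name) {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Group names by their normalized key and return only the groups
// in which more than one original name collapsed to the same key.
function findCollisions(names, normalizeFn) {
  const groups = new Map();
  for (const name of names) {
    const key = normalizeFn(name);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(name);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}

// Sample names taken from this post (an assumption, not real OCL export data).
const sampleNames = ["ICD-10", "ICD 10", "KenyaEMR", "KenyaEMR+", "SNOMED-CT"];
const collisions = findCollisions(sampleNames, stripToAlphanumeric);
console.log(collisions);
```

Passing a different normalization function to `findCollisions` makes it easy to compare candidate rules against the same list of names.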
This makes me think we should be more selective in what we remove. For example, just remove whitespace, hyphens, and underscores (s/[\s\-_]//g) before doing a case-insensitive comparison. This would treat the use of a hyphen or a space in “SNOMED-CT” and “ICD-10” as equivalent, but would treat “KenyaEMR” and “KenyaEMR+” as distinct sources.
Proposed convention for source name equivalency:
Case-insensitive comparison of source names, ignoring whitespace, hyphens, and underscores.
    // Lowercase the name, then strip whitespace, hyphens, and underscores.
    function normalize(s) {
      return s.toLowerCase().replace(/[\s\-_]/g, "");
    }

    function assertEquals(a, b) {
      console.assert(normalize(a) === normalize(b));
    }

    function assertNotEquals(a, b) {
      console.assert(normalize(a) !== normalize(b));
    }

    assertEquals("ICD-10", "ICD 10");
    assertEquals("ICD-10", "ICD10");
    assertNotEquals("KenyaEMR", "KenyaEMR+");
Thoughts?