OCL Concept Identity

While this might sound a bit like this thread, this is actually intended as a discussion of a different point about concept identity in the OCL module, specifically, how we relate mappings to concepts.

There are two different things that the OCL module uses to determine concept identity:

  • The external_id (in OCL) refers to the UUID of the concept.
  • The “URL” of the concept from OCL prepended with the name of the OCL server the OCL module is synchronised with.

The first of these is more important for determining how the module interacts with existing entries in the concept dictionary: essentially, if a concept exists with the same UUID as the OCL concept, the OCL concept will overwrite it.

The second of these, however, is important for determining how the module deals with OCL’s internal references.

Each object in OCL has a few system identifiers:

  • a UUID field (which is so named for legacy purposes… it actually stores an auto-incrementing database identifier)
  • an ID field (which stores the “mnemonic” for the concept, e.g. “1065” for CIEL:1065 or “95941-1” for LOINC:95941-1)
  • a URL (which identifies the object in the REST API)
  • a version URL (which identifies the specific version of the object in the REST API).

For the current version of CIEL:1065, the relevant fields look like this:

{
    "external_id": "1065AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "id": "1065",
    "name": "1065",
    "owner": "CIEL",
    "owner_type": "Organization",
    "owner_url": "/orgs/CIEL/",
    "source": "CIEL",
    "type": "Concept",
    "url": "/orgs/CIEL/sources/CIEL/concepts/1065/",
    "uuid": "3678",
    "version": "3678",
    "version_url": "/orgs/CIEL/sources/CIEL/concepts/1065/470101/",
    "versions_url": "/orgs/CIEL/sources/CIEL/concepts/1065/versions/"
}

We use the URL and version URL fields in the following ways:

  1. When importing a concept or mapping, we check whether it was already imported and, if so, whether this version was already imported. If this version is already present, we simply skip the item.
  2. When importing a mapping, the concept URL is used to determine which two concepts the mapping pertains to.
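To make the first use concrete, here is a minimal Python sketch of the "skip if this version is already imported" check. It is an illustration only, not the module's actual code: `imported_items`, `should_import` and `import_item` are hypothetical names, and a plain dict stands in for wherever the module records what it has imported.

```python
# Hypothetical stand-in for the module's record of imported items:
# maps an OCL URL to the version URL that was last imported.
imported_items = {}


def should_import(item):
    """Return True if this version of the item has not been imported yet."""
    return imported_items.get(item["url"]) != item["version_url"]


def import_item(item):
    """Import the item unless this exact version is already present."""
    if not should_import(item):
        return "skipped"
    imported_items[item["url"]] = item["version_url"]
    return "imported"


concept = {
    "url": "/orgs/CIEL/sources/CIEL/concepts/1065/",
    "version_url": "/orgs/CIEL/sources/CIEL/concepts/1065/470101/",
}
first = import_item(concept)
second = import_item(concept)
```

Importing the same version twice is a no-op the second time, while a payload with a different version URL would be imported again.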

Point 2 above comes about because the mappings we receive from the API look like this:

{
    "external_id": "7180CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC",
    "from_concept_code": "155569",
    "from_concept_url": "/orgs/CIEL/sources/CIEL/concepts/155569/",
    "from_source_name": "CIEL",
    "from_source_owner": "CIEL",
    "from_source_owner_type": "Organization",
    "from_source_url": "/orgs/CIEL/sources/CIEL/",
    "from_source_version": null,
    "id": "303992",
    "map_type": "Q-AND-A",
    "owner": "CIEL",
    "owner_type": "Organization",
    "source": "CIEL",
    "to_concept_code": "1065",
    "to_concept_url": "/orgs/CIEL/sources/CIEL/concepts/1065/",
    "to_source_name": "CIEL",
    "to_source_owner": "CIEL",
    "to_source_owner_type": "Organization",
    "to_source_url": "/orgs/CIEL/sources/CIEL/",
    "to_source_version": null,
    "type": "Mapping",
    "url": "/orgs/CIEL/sources/CIEL/mappings/303992/",
    "uuid": "303992",
    "version": "303992",
    "version_url": "/orgs/CIEL/sources/CIEL/mappings/303992/1226376/",
    "versioned_object_id": 303992,
    "versioned_object_url": "/orgs/CIEL/sources/CIEL/mappings/303992/"
}

Basically, we just leverage the from_concept_url and to_concept_url properties to work out the concepts being referred to.
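As a rough sketch of that lookup in Python (again, not the module's real code): `resolve_mapping` and `uuid_by_url` are hypothetical names, and the UUID given for concept 155569 is a made-up placeholder.

```python
def resolve_mapping(mapping, uuid_by_url):
    """Return the (from, to) local concept UUIDs for a mapping, or None
    if either concept URL has not been imported yet."""
    from_uuid = uuid_by_url.get(mapping["from_concept_url"])
    to_uuid = uuid_by_url.get(mapping["to_concept_url"])
    if from_uuid is None or to_uuid is None:
        return None
    return from_uuid, to_uuid


# Hypothetical lookup built while importing concepts: OCL URL -> local UUID.
uuid_by_url = {
    # Placeholder UUID, invented for this example.
    "/orgs/CIEL/sources/CIEL/concepts/155569/": "placeholder-uuid-155569",
    "/orgs/CIEL/sources/CIEL/concepts/1065/": "1065AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
}
mapping = {
    "from_concept_url": "/orgs/CIEL/sources/CIEL/concepts/155569/",
    "to_concept_url": "/orgs/CIEL/sources/CIEL/concepts/1065/",
    "map_type": "Q-AND-A",
}
resolved = resolve_mapping(mapping, uuid_by_url)
```

The important property is that resolution only works when the URL in the mapping is character-for-character the same as the URL stored for the concept.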

So why does this matter?

Currently in the OCL module we always prepend a server URL to the URLs so that we can distinguish between a concept from api.openconceptlab.org (the OCL server), api.staging.openconceptlab.org (the OCL staging server), or something like api.ocl.brown.edu (e.g. if, for whatever reason, I were to stand up my own OCL instance). The part we prepend to the URL is derived from the global property that the subscription URL (if any) is saved in.
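A minimal Python sketch of that derivation, assuming the part that gets prepended is just the scheme and host of the subscription URL. `base_url` and `qualify` are hypothetical helpers, not functions from the module, and the sketch does not try to reproduce what happens for uploaded files.

```python
from urllib.parse import urlsplit


def base_url(subscription_url):
    """Return scheme and host of the subscription URL, or "" if unset."""
    if not subscription_url:
        return ""
    parts = urlsplit(subscription_url)
    return f"{parts.scheme}://{parts.netloc}"


def qualify(relative_url, subscription_url):
    """Prepend the server part of the subscription URL to an OCL URL."""
    return base_url(subscription_url) + relative_url


subscription = (
    "https://api.openconceptlab.org"
    "/orgs/CIEL/collections/COVID-19-Starter-Set/v1.9/"
)
qualified = qualify("/orgs/CIEL/sources/CIEL/concepts/1065/", subscription)
unqualified = qualify("/orgs/CIEL/sources/CIEL/concepts/1065/", None)
```

With a subscription URL set, the concept URL gains a server prefix. Without one, it stays as it appears in the OCL data.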

This works fine in the case where the subscription module is used to subscribe to a live OCL instance and would work for the offline case if someone entered a subscription URL and then uploaded a zip file using the UI (the “offline mode” feature). Where this begins to fall down is in a few of the cases that we’re currently contemplating. For instance, if we import the zip file on start-up without setting the subscription URL (as can happen today or when we get things working with Initializer), then it’s possible to end up with concepts with inconsistent URLs. Likewise, @mseaton relatively recently opened a PR to add the ability to upload a single concept from OCL as a file downloaded from the Term Browser, which seems like a useful feature.

So, the question is, how should we support these use-cases where an actual subscription may not be present while still maintaining the ability to import mappings correctly?

Option 1: We stop prepending anything to the URLs and just use the URLs as they are in the files we get from OCL. This is by far the simplest and least disruptive option, but it would mean that we can’t distinguish between concepts loaded from two different instances of OCL, which may be a problem down the road if countries and implementers opt to host their own OCL instances rather than using the central server.

Option 2: We continue with things as they are but where the subscription URL isn’t filled in, we assume that the content came from https://api.openconceptlab.org, i.e., the OCL server that the OCL team at Regenstrief supports.

Option 3: We continue things as they are today and let implementers deal with potential inconsistencies where data was both loaded from a zip file and a live subscription.
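To compare Options 1 and 2 side by side, here is a small Python sketch of how the lookup key for a concept could be built under each. The function names are hypothetical, and the staging subscription URL is invented for the example (only the staging host name comes from the discussion above).

```python
from urllib.parse import urlsplit

DEFAULT_SERVER = "https://api.openconceptlab.org"


def key_option_1(url, subscription_url=None):
    """Option 1: use the URL exactly as it appears in the OCL data."""
    return url


def key_option_2(url, subscription_url=None):
    """Option 2: prepend the subscription's server, defaulting to the
    central OCL server when no subscription URL is set."""
    if subscription_url:
        parts = urlsplit(subscription_url)
        server = f"{parts.scheme}://{parts.netloc}"
    else:
        server = DEFAULT_SERVER
    return server + url


url = "/orgs/CIEL/sources/CIEL/concepts/1065/"
live = "https://api.openconceptlab.org/orgs/CIEL/collections/COVID-19-Starter-Set/v1.9/"
# Hypothetical subscription to the same collection on the staging server.
staging = "https://api.staging.openconceptlab.org/orgs/CIEL/collections/COVID-19-Starter-Set/v1.9/"
```

Under Option 2 a zip import with no subscription URL and a live subscription to the central server produce the same key, but staging content stays distinct. Under Option 1 all three collapse to the same key.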

@ibacher, great post, really well described. One point to clarify if you can: how is the subscription URL used? i.e. when a server URL is prepended in the OCL module, what impact does this have on the actual operation of the module? What uses this, and how does it impact a particular import?

So the subscription URL has three uses:

  • It’s used as the URL to download the actual concepts from OCL
  • It’s used to determine whether the subscription is to a “snapshot” version of a collection (basically meaning the HEAD version) or an actual release. For cases where it’s subscribed to an actual release, the release is only downloaded once.
  • It’s used to construct the base URL, although here we only use the first part of the URL.

I’ll ignore the snapshot / non-snapshot parts for now. Basically, the base URL gets prepended to any of the URLs when they are used. So, for example, if I have my subscription URL set to https://api.openconceptlab.org/orgs/CIEL/collections/COVID-19-Starter-Set/v1.9/ (i.e., the most recent release of the CIEL COVID starter set), then any concepts I import as part of the subscription will have https://api.openconceptlab.org prepended to their URLs. So the “url” property for the 1065 concept I used as a demo above would become https://api.openconceptlab.org/orgs/CIEL/sources/CIEL/concepts/1065/. This gets saved in a record in the “openconceptlab_item” table and is used by the module when it encounters a new concept to see if it has already imported a version of the concept.

Now, if the concept mapping I gave an example of above were part of that starter set, when it was imported, the module would change the from_concept_url to https://api.openconceptlab.org/orgs/CIEL/sources/CIEL/concepts/155569/ and the to_concept_url to https://api.openconceptlab.org/orgs/CIEL/sources/CIEL/concepts/1065/. It uses that URL to look up the corresponding items in the “openconceptlab_item” table. One of the fields in the “openconceptlab_item” table is the UUID of the concept when it was imported, so essentially it’s used to map an OCL URL to the UUID of the concept in the local concept dictionary.

So, my particular concern is that with the way things are set up, if, say, you were to upload the CIEL:1065 concept into your dictionary at one time and then a different CIEL concept that used CIEL:1065 as an answer later, the module might not be able to create the appropriate concept_answer because it wouldn’t recognise that, e.g., file:///tmp/file.json/orgs/CIEL/sources/CIEL/concepts/1065/ was the same as file:///tmp/file2.json/orgs/CIEL/sources/CIEL/concepts/1065/.

Similarly, if I imported a bunch of concepts using Iniz and an OCL export zip, I might end up with URLs like /orgs/CIEL/sources/CIEL/concepts/1065/ which then wouldn’t match https://api.openconceptlab.org/orgs/CIEL/sources/CIEL/concepts/1065/ (which I might get a reference to once I actually subscribed).
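The failure mode can be reproduced in a few lines of Python. This is only a sketch: `items` is a hypothetical dict standing in for the “openconceptlab_item” table, and stripping everything before `/orgs/` is an assumption that only suits organisation-owned content like CIEL.

```python
ANSWER_PATH = "/orgs/CIEL/sources/CIEL/concepts/1065/"
ANSWER_UUID = "1065AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"

# First upload: CIEL:1065 is stored under a key prefixed with its file name.
items = {"file:///tmp/file.json" + ANSWER_PATH: ANSWER_UUID}

# Second upload: a question concept refers to CIEL:1065 as an answer, but
# the reference is prefixed with the second file's name.
reference = "file:///tmp/file2.json" + ANSWER_PATH

# The exact-match lookup misses, so no concept_answer could be created.
found_with_prefix = items.get(reference)

# Comparing only the OCL-relative part of each key does find the concept.
relative_items = {
    key[key.index("/orgs/"):]: uuid for key, uuid in items.items()
}
found_relative = relative_items.get(ANSWER_PATH)
```

The same mismatch appears if the stored key has no prefix at all and the reference carries https://api.openconceptlab.org.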

Does that answer your questions?

Option 1 would be my vote. I don’t personally think that supporting a world in which multiple OCL instances exist and each contribute to the same OpenMRS instance, while also containing organizations with the same name and same concepts but which have different meanings and definitions, is something we need to code around up front.

If we ever need this down the line (is this likely?) it would seem straightforward to add an additional column at that time into the openconceptlab_item table to store whatever information we need to make this distinction. For now this just seems like added and unnecessary complexity.

My 2 cents.

I like Mike’s perspective. I also thought we were only allowing subscription to versioned releases of a dictionary and so HEAD would not be allowed. I think we need a global OCL view (@paynejd) where there could be more than one OCL server. Since the goal is shared, curated content, I think sharing of non-OCL servers for more than an individual organization’s use case would be really discouraged.

Technically, the module was initially built to support only subscribing to HEAD, and support for released versions was added in 2016. We haven’t removed support for HEAD versions from the module, though it’s definitely not the recommended way to use it. I think the only thing that was done on this front is that in the Dictionary Manager the only subscription links exposed are those to specific versions.

Fundamentally, there are two ways to uniquely identify a concept:

  1. UUID
  2. Authority + Code ( + Version)

The URLs are less than perfect substitutes and it shouldn’t matter where (which host) or how (online or offline) a concept or mapping is obtained if the identity is the same (i.e., a matching UUID or authority+code should be sufficient for identity).
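As a sketch of what host-independent identity could look like in Python: the code below treats the `owner` and `source` fields of the API payload as the “authority” and the `id` field as the code. That reading is an assumption made for illustration (there is no canonical source URI to use yet), and `ConceptKey`, `key_from_concept` and `same_concept` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ConceptKey(NamedTuple):
    """Authority + code (+ version) identity, independent of host."""
    owner: str
    source: str
    code: str
    version: Optional[str] = None


def key_from_concept(concept, with_version=False):
    """Build an identity key from the fields of an OCL concept payload."""
    return ConceptKey(
        owner=concept["owner"],
        source=concept["source"],
        code=concept["id"],
        version=concept["version"] if with_version else None,
    )


def same_concept(a, b):
    """Two payloads are the same concept if their UUIDs match or, failing
    that, if authority and code match."""
    if a.get("external_id") and a.get("external_id") == b.get("external_id"):
        return True
    return key_from_concept(a) == key_from_concept(b)


# The same concept as it might arrive from a live server and from a file.
from_server = {"external_id": "1065AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
               "owner": "CIEL", "source": "CIEL", "id": "1065", "version": "3678"}
from_file = {"owner": "CIEL", "source": "CIEL", "id": "1065", "version": "3678"}
other = {"owner": "CIEL", "source": "CIEL", "id": "155569", "version": "1"}
```

Nothing in the key depends on which host served the payload or whether it came from a file, which is the property being argued for here.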

It would help if we got a canonical URI for the source (authority) along with the code. The OCL team is working toward that approach. In the meantime, we’re left trying to infer these from what we can get from the API.

So, I would lean toward option #1 and put the responsibility on sources/hosts not to present the same concepts or mappings differently in different locations (e.g., if CIEL v2021-10-12 concept 123 is hosted in 17 different places, I would expect it to be the same across all 17 hosts).