OCL for OpenMRS: Sources and Collections

karuhanga · January 9, 2020, 3:13pm

I’d like your thoughts on this. I shared it on the Google doc but never quite got feedback on it.

Proposed Design Change

This document refers to a ‘dictionary’, the assumption being that sources and collections are abstracted into this single concept (a dictionary). Our current approach to keeping the source and collection in sync for custom concepts is that whenever items are added to or updated in the source, we update the collection reference so that it points to the latest version. Over the past few months this has been a major source of confusion for devs on the project and has proven very bug-prone.
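To make the current approach concrete, here is a minimal in-memory sketch in Python. It is not the OCL API or the client code: the `Source`, `Collection`, and `update_concept` names are hypothetical, and a reference is modeled as a plain mapping from concept id to a pinned version number.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical stand-in for a source: concept id -> list of versions."""
    versions: dict = field(default_factory=dict)

    def save(self, concept_id, data):
        """Store a new version of a concept and return its version number."""
        history = self.versions.setdefault(concept_id, [])
        history.append(data)
        return len(history)


@dataclass
class Collection:
    """Hypothetical stand-in for a collection holding static references."""
    references: dict = field(default_factory=dict)  # concept id -> pinned version


def update_concept(source, collection, concept_id, data):
    """Current approach: two separate steps issued by the client."""
    version = source.save(concept_id, data)      # step 1: write to the source
    collection.references[concept_id] = version  # step 2: repoint the reference
    return version


source, dictionary = Source(), Collection()
update_concept(source, dictionary, "fever", {"name": "Fever"})
update_concept(source, dictionary, "fever", {"name": "Fever (symptom)"})
print(dictionary.references)  # the reference now points at version 2
```

In this model, if step 2 is skipped or fails, the collection keeps pointing at the older version, which is one way the two can drift apart.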
I am proposing a small rethink of this process.
When a user creates a dictionary

| Currently | Proposed |
| --- | --- |
| We create a source (source) and a collection (dictionary). The source holds custom concepts and the dictionary is kept up to date as described above. | We create a source (source) and two collections (collection and dictionary). One of the collections acts as the ‘dictionary’ and the other remains a collection, holding references to concepts and mappings from other sources, e.g. CIEL, that are not custom. |
| Permissions for these are kept in sync as well (updating dictionary visibility updates permissions on both the source and collection). | The source and the collection are private. The permissions set by the user only apply to the dictionary. |
| Creating a release creates a new version of the dictionary. | Creating a release deletes all references from the dictionary, then copies into it all references from the collection and all concepts and mappings from the source. |
| Updating concepts and mappings requires an immediate update of each updated item in the dictionary. | Updating concepts and mappings does not require immediate updates to the dictionary, as this will be done at a single point when releasing a new dictionary version. |
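The proposed release step could be sketched roughly as follows. This is an illustration under stated assumptions, not OCL API code: the `release` function is hypothetical, references are modeled as a mapping from id to version, the ids are made-up placeholders, and "delete all references, then copy" is modeled as building each snapshot from an empty mapping.

```python
def release(source_items, collection_refs, dictionary_versions):
    """Build a new dictionary version from scratch (proposed approach).

    source_items: custom concept/mapping id -> current version
    collection_refs: non-custom concept/mapping id -> pinned version
    dictionary_versions: list of previously released snapshots
    """
    snapshot = {}                     # start from an empty reference set
    snapshot.update(collection_refs)  # copy references to non-custom items
    snapshot.update(source_items)     # copy custom concepts and mappings
    dictionary_versions.append(snapshot)
    return snapshot


# Placeholder ids, for illustration only.
custom = {"MySource/fever": 3, "MySource/fever-map": 1}
shared = {"CIEL/1234": 7}
releases = []

v1 = release(custom, shared, releases)
custom["MySource/fever"] = 4  # edits between releases do not touch v1
v2 = release(custom, shared, releases)

print(v1["MySource/fever"], v2["MySource/fever"])
```

The point of the sketch is that edits to the source between releases never touch the dictionary; they only show up in the next snapshot.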

The goal of this is to:

* Make it conceptually clear to whoever is working with the application what happens when, and make it clear to the user where their concepts live.
* Decouple sources and collections, as the API does. This might be important going forward.
* Make it considerably easier to work with the API, since the application would work within the API’s design.

None of this requires any significant changes to the API and, for v1.0.0, it won’t change anything from the user’s perspective.

burke · January 9, 2020, 4:04pm

I thought an OCL “source” was a versioned repository of original concepts[ref] and a “collection” was a set of references to concepts from any combination of sources and/or other collections. I also assumed collection references were either static (referencing a source’s concept at a specific point in time, to support versioning) or dynamic (always returning the head/current definition of the concept from the source).
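A minimal sketch of the static/dynamic distinction described above, assuming a toy in-memory source. The `SOURCE` dictionary and the `resolve` function are hypothetical and are not part of the OCL API; a reference is modeled as a (concept id, version) pair where a version of `None` means "follow the head".

```python
from typing import Optional

# Hypothetical source: concept id -> list of versions, oldest first.
SOURCE = {
    "fever": [{"name": "Fever"}, {"name": "Fever (symptom)"}],
}


def resolve(concept_id: str, version: Optional[int] = None) -> dict:
    """Resolve a reference against SOURCE.

    version=None models a dynamic reference (always the head);
    an integer models a static reference pinned to that 1-based version.
    """
    history = SOURCE[concept_id]
    if version is None:
        return history[-1]
    return history[version - 1]


static_ref = ("fever", 1)
dynamic_ref = ("fever", None)

# A new version is saved in the source after both references were created.
SOURCE["fever"].append({"name": "Pyrexia"})

print(resolve(*static_ref))   # still the pinned first version
print(resolve(*dynamic_ref))  # follows the new head
```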

If that were true, the only reason to create a collection when creating a source would be for releasing a version, right? Is the reason for automatically creating a collection when a source is created because there is not a way for collections to make a dynamic reference to a source concept?

Perhaps my assumptions are incorrect and the reason you’re suggesting two collections be created is to have one “dynamic” collection (the private collection) and one “static” collection (representing a released version of the source).

karuhanga · January 10, 2020, 4:02pm

At the moment, we have no way of adding ‘dynamic’ refs. Only static ones. Everything

The primary goal of this change is to stop coupling source updates to dictionary updates and to let us start thinking of a ‘dictionary’ on its own. In the future, that would allow us to do dictionary-only and collection-only things that at the moment are coupled together and not consistent with the API structure.

akanter · January 10, 2020, 7:22pm

Burke, I think you are correct. The idea of creating a collection for PIH’s source was a way to get around the bug which restricted collections to the CIEL source, or some limitation of the subscription module which required a URL from a collection. I don’t see why you’d want to create a collection for an entire source.

The opposite is true, however: if you add custom concepts to a collection (not coming from a published source), you need to create a source for those custom concepts, since the collection cannot also be the source.

Not sure if this answered your question. Andy

burke · January 10, 2020, 10:26pm

Okay, I think I’m getting there.

Unfortunately, repository versioning didn’t get documented in ocl_web. I’m assuming sources (and perhaps collections too) are versioned, which would act like tagging in GitHub, i.e., publish a reference to each concept in a source (or collection) at that point in time. Is that correct? From @karuhanga’s post, it sounds like there’s more cloning than tagging going on with versions (i.e., instead of publishing a reference to some point in the history of a concept, we’re making a copy of the concept).

I can see why anyone wanting to combine another source with their own concepts would require, at a minimum, their own source (in which to create & edit their own concepts) and at least one collection to combine concepts from their source with the other source.

From @karuhanga’s post, it sounds like we’re auto-updating the collection so that edits to concepts in the custom source are immediately reflected in the collection. I presume this was done to avoid making people publish versions of their custom concepts for their own use. In other words, if you want to use CIEL and add a few concepts of your own, you don’t want to have to publish version 1.0 of your custom concepts and then update your collection to use this new published version; rather, you just want to see your custom concepts appearing in the collection.

So, @karuhanga is suggesting we keep this “dynamic” collection private (only for the owner’s use) and force them to make a second collection if/when they decide to publish a version of their dictionary.

Do we need to create the second collection up front? I’d assume it only need be created if/when the person wants to publish a version of their dictionary for others to consume or for their own concept management purposes.

karuhanga · January 11, 2020, 7:58am

The versioning is still going on, but since the references are not dynamic, we need to update the dictionary with a new concept reference after an update.

Everyone that uses the client will eventually need to do this. We might as well create it upfront.

darius · January 12, 2020, 6:32am

I’ve just had a chance to skim this. It’s an interesting idea, though I’m not sure I fully understand. It would help if you named the two proposed collections and referred to them by those names in the bullet points.

It sounds like the biggest issue is having to simultaneously update the source and the collection from the front end UI, which can never really do this atomically. The idea was always that this synchronization should really be handled by the back end. Could we pursue getting this implemented in the OCL API? Would that be the right way to address the underlying issue?

In my original design I said we’d need to automatically/immediately update the collection so that when you display a dictionary (an aggregated summary, or else paging through the concepts) there is just one thing on the backend that represents this. With your proposed change, won’t you need to constantly be querying the source (for custom concepts) and one of the collections, and merging the results, for each UI display screen? And if you have thousands of concepts, won’t this break down?

akanter · January 13, 2020, 3:26pm

I think this discussion about sources and collections, and custom dictionaries with their custom sources needs its own thread. I don’t think I have permission to do that. @Burke can you?

I believe we should resurrect the graphic which lays this out, since it seems that there are behaviors specific to the custom dictionary/source that do not apply to regular collections built from sources published by others.

There is also the public/private designations on sources/dictionaries(collections).

In principle, I do not think that collections created from public sources should auto-update when the sources publish new versions. This should be done through a notification and reconciliation process, to ensure the user wants all the changes in a new published source. For custom dictionaries where the source is being created automatically, neither the custom collection/dictionary nor its underlying custom source should probably be usable in another collection unless they are published. The user’s starter collection, which includes the custom sources, can be used by them without publishing, but I do think we want to enforce control if any custom work is to be used by others. Since I don’t think the subscription module will allow subscribing to a dictionary/collection that is not published, we probably should automatically publish a version of the custom source when a custom collection is published.
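One possible shape for the reconciliation step mentioned above, as a rough sketch only. The `pending_changes` and `reconcile` functions are hypothetical, nothing here is an existing OCL feature, and both the collection and the newly published source version are modeled as mappings from concept id to version number.

```python
def pending_changes(pinned, new_release):
    """Compare a collection's pinned versions with a newly published release.

    Both arguments map concept id -> version number. Returns the concepts
    already in the collection whose version differs in the new release,
    so the user can accept or reject each change.
    """
    changes = {}
    for concept_id, old_version in pinned.items():
        new_version = new_release.get(concept_id)
        if new_version is not None and new_version != old_version:
            changes[concept_id] = (old_version, new_version)
    return changes


def reconcile(pinned, changes, accepted):
    """Return a new mapping with only the accepted changes applied."""
    updated = dict(pinned)
    for concept_id in accepted:
        updated[concept_id] = changes[concept_id][1]
    return updated


pinned = {"A": 1, "B": 2, "C": 5}
new_release = {"A": 1, "B": 3, "C": 6, "D": 1}

changes = pending_changes(pinned, new_release)
print(changes)
print(reconcile(pinned, changes, accepted=["B"]))
```

Here the collection changes only for the concepts the user explicitly accepts; concepts that are new in the release (such as "D") are not pulled in automatically.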

akanter · January 13, 2020, 8:30pm

Please see drawing. In my mind, the custom source should be associated with the owner/author and not the dictionary since the owner might want to use the custom concept in other dictionaries. However, there are complications based on the following. The Abc dictionary requires publishing before it can be used by others or as a subscription. At the time of publishing the Abc dictionary, the source for the corresponding custom concepts should be auto published. This becomes more complicated if the source is based on the user and not the custom dictionary (since any of the user’s dictionaries being published would force publishing of the custom source).

Comments?

karuhanga · January 14, 2020, 12:42pm

Yes. I also consider the fact that we are hiding the existence of a source to be potentially problematic in the future. I think it would be better to be clear to the user about where everything is being kept, i.e. custom concepts live in the source, and other people’s concepts that you choose to bring in live in this collection. We helpfully combine the two for you in the dictionary, which you can subscribe to.

No, they’d view custom concepts and non custom ones on separate screens.

If we can get dynamic references implemented, a lot of this would be simplified but we’d need to add mapping references on creation.

I’d also like to add a caveat: I did some testing over the weekend, and the batch copying process after deletion might take prohibitively long to use.

karuhanga · January 14, 2020, 12:47pm

I don’t think this would be a problem, since users can query their own private sources. I don’t know if keeping all of a person’s concepts in a single source is the right approach.

Ahhh! On second thought, you do raise a good point @akanter: if sources are tied to the owner, as they currently are, we’d have to allow the user to query from all sources they have access to, including those owned by their orgs.

darius · January 14, 2020, 6:12pm

My original goal was to drastically simplify the mental model required for users, i.e. an entry-level OpenMRS admin, to do concept management correctly. I.e. they don’t really need to know about source/collection. They just need to know that they’re creating a Dictionary that combines their custom concepts and shared concepts from others. (This may or may not be right to stick to at this point. I’d just say that the PIH and AMPATH folks who I assume are acting as MVP users are much more advanced and experienced than the average user at scale.)

Okay, this is exactly what I was going to suggest. If it significantly simplifies the implementation and de-risks things, then modifying the UI to show “Custom concepts” and “Shared concepts” separately should be totally doable. I would still want the summary page to combine statistics (e.g. # of diagnosis concepts) across both of them.
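As a rough illustration of combining summary statistics across the two lists, here is a small Python sketch. The sample records and the `concept_class` field name are assumptions made for the example, and `class_counts` is a hypothetical helper rather than anything in the client.

```python
from collections import Counter

# Made-up sample data standing in for the two separate screens.
custom_concepts = [
    {"id": "c1", "concept_class": "Diagnosis"},
    {"id": "c2", "concept_class": "Test"},
]
shared_concepts = [
    {"id": "s1", "concept_class": "Diagnosis"},
    {"id": "s2", "concept_class": "Diagnosis"},
    {"id": "s3", "concept_class": "Drug"},
]


def class_counts(concepts):
    """Count concepts per class for one list."""
    return Counter(c["concept_class"] for c in concepts)


# Adding two Counters sums the per-class counts across both lists.
summary = class_counts(custom_concepts) + class_counts(shared_concepts)
print(summary["Diagnosis"])  # 3
```

The two lists stay separate for display, and only the counts are merged for the summary page.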

I was specifically thinking of a backend feature where whenever you make a change to a source, it automatically updates exactly one collection with this change. I thought we had discussed that with @paynejd and it just needed someone to work on it.

100% agree with this.

Good points.

My position is that for the first implementation we should do things in the simplest possible conceptual way, which is to have the source tied to the dictionary, and not reusable in other dictionaries.

For letting an org/user have a source that they share across multiple dictionaries, I prefer to implement this by (1) having a “create my new dictionary based on this source” function, and (2) supporting multiple preferred sources in a dictionary, e.g. CIEL and PIH-Shared. #2 simplifies the lifecycle, at the expense of the user having to do more work to merge in updated concepts because e.g. PIH-Hospital-X wouldn’t automatically get updates from PIH-Shared (just like it wouldn’t get automatic updates from CIEL).

Stepping back, I feel like I must have missed a key discussion, because I don’t understand the underlying motivation of some of these proposed changes. If someone can fill me in, that would be helpful, but maybe impractical.

Also, just to be explicit, you all are completely welcome to take over the design and evolve it as needed. I wrote down a vision for this years ago. But I’m not well positioned to know how to get things to completion today.

I’m also willing to engage more if you want me to keep (co-?)owning design aspects. Let me know.

akanter · January 14, 2020, 6:22pm

Darius, I would agree with this. I do want us to understand the visibility and use of dictionaries/sources depending on published/unpublished and private/public status in all parts of the lifecycle of a dictionary/source.

As for your involvement… I would love for you to stay involved, but I think we have turned over ownership of the design to the squad being led by PIH and AMPATH. Please keep in the discussion though, since your perspective is incredibly valuable and desired.

jennifer · January 15, 2020, 12:41am

@darius I think everyone would welcome your ongoing involvement! We’ve kept the Wednesday 7am PST OCL for OpenMRS time slot for the weekly meetings and set up a dedicated Talk thread for the squad. It seems to me that most of the Talk discussions are happening on that thread, here, and on the Developing the “OCL for OpenMRS” Application thread.

ball · January 15, 2020, 2:29pm

Absolutely, @darius. Your experience is invaluable. I’m sure it will be satisfying to continue – at least until this is in production and fully appreciated at PIH and AMPATH.

@darius Is your design document included as a reference on the OCL for OpenMRS squad page?

darius · January 21, 2020, 10:44pm

It’s the second bullet point under Reference Docs on that page.

I’ll try to join the squad call tomorrow.

karuhanga · February 13, 2020, 11:31am

You didn’t. It was something I thought about a while ago and wrote down in the doc to try to understand whether it would simplify the flow and development. After talking to you guys, I’m convinced that, with the current backend, the existing approach is indeed the better one.

Dropping this.