We are investigating moving UgandaEMR concept management into OCL. To reap the benefits, however, we would like to be able to download the concepts as an Iniz-formatted CSV file so that they are easy to load into our distributions.
Is this a feature that can be added to ease the migration path for adopters?
The zip download from the OCL subscription module does not work for us because the zip is a binary file and cannot be version controlled. This is the same challenge that forced us to move from Metadata Sharing to Data Exchange based DbUnit XML files.
I’m a big fan of this idea. I think the Iniz Concept CSV format is preferable to the OCL ZIP format as a transport mechanism for deploying concepts to EMR servers. When deploying to an actual EMR server, none of the “concept metadata you can represent in OCL ZIP but not Iniz CSV” actually matters. Iniz CSV is simply much easier to work with.
Do we need to choose between CSV or OCL JSON? Why not support both formats? If you’re happy with the features you can get from CSVs, then use CSVs. If you want features that CSVs don’t support, then use the OCL JSON format. This would allow us to use CSVs for the simple cases and avoid having to choose between abandoning support for features CSVs don’t handle well and creating hacky or gruesome solutions to jam every feature into CSVs (and losing their simplicity in the process). I believe we have run into some limitations of CSVs (e.g., having a preferred name per locale or some other synonym or name-related features).
It wouldn’t be hard to make a tool that extracts the zipped OCL JSON to a CSV in a deterministic way (with predictable ordering to make diffs easier). So, if we support either format, then there are multiple benefits:
Use CSVs for the simple cases.
Use zip files (OCL JSON) when you need features that aren’t supported by the CSV format.
If you need features supported by OCL JSON and not CSV, you could use a tool to automatically generate corresponding CSV views of the zipped OCL JSON files in your CI pipeline to have human-friendly & github-diff-friendly views into the zipped contents.
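To make the "deterministic CSV view" idea concrete, here is a minimal Python sketch, not a finished tool. It assumes the zip contains a single JSON document with a top-level `concepts` list, that each concept carries `external_id`, `concept_class`, `datatype`, and a `names` list with `name`, `locale`, and `name_type`, and that `FULLY_SPECIFIED` marks fully specified names. The Iniz-style headers (`Uuid`, `Data class`, `Data type`, `Fully specified name:<locale>`) are also written from memory, so check all of these against a real export and the Initializer docs before relying on it. The function name `ocl_zip_to_csv` is made up for illustration.

```python
import csv
import io
import json
import zipfile


def ocl_zip_to_csv(zip_bytes: bytes) -> str:
    """Render the concepts in a zipped OCL JSON export as deterministic CSV text."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Assumption: one JSON document per archive; pick the first by name.
        member = sorted(n for n in archive.namelist() if n.endswith(".json"))[0]
        export = json.loads(archive.read(member))

    rows = []
    for concept in export.get("concepts", []):
        row = {
            # Assumption: external_id holds the OpenMRS concept UUID.
            "Uuid": concept.get("external_id", ""),
            "Data class": concept.get("concept_class", ""),
            "Data type": concept.get("datatype", ""),
        }
        for name in concept.get("names", []):
            # Assumption: fully specified names are tagged "FULLY_SPECIFIED".
            if name.get("name_type") == "FULLY_SPECIFIED":
                row["Fully specified name:" + name["locale"]] = name["name"]
        rows.append(row)

    # Determinism: fixed leading columns, remaining columns sorted by header,
    # rows sorted by UUID, and a fixed line terminator.
    fixed = ["Uuid", "Data class", "Data type"]
    extra = sorted({key for row in rows for key in row} - set(fixed))
    rows.sort(key=lambda r: r["Uuid"])

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fixed + extra, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because rows and columns are sorted before writing, two exports with the same content produce byte-identical CSV regardless of the order in which OCL emitted the concepts, which is what keeps the diff in a PR limited to real changes.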
While it’s certainly possible to push the work to OCL, it would create tight coupling between OCL and Iniz (i.e., the different groups managing OCL and Iniz would have to coordinate any changes to the CSV format and release them in lock step). And we’d still have no easy & maintained way to get from one form to the other.
There already is an option to download as CSV from the Term Browser, though I believe it currently only handles up to 1000 entries for performance purposes. The JSON export is built asynchronously and stored in AWS in order to accommodate any size export (e.g., CIEL). I imagine they would have to implement the same thing for CSV as well.
If people are exporting a CSV from OCL, don’t you think they’re going to want to be able to simply drop it into OpenMRS (instead of annoyingly having to flip the ordering of some columns and change the casing of some column names)?
I had long pushed to make this an essential component of moving to OCL, but have since put this initiative on the back burner in order to prioritize just enabling PIH to transition from MDS packages to OCL packages so that we can more effectively start utilizing OCL for concept management. At the end of the day, distributing an OCL JSON zip is not a vast improvement over doing the same with an MDS XML zip, but it’s no worse either. So at PIH we have tabled that to a later phase.
I still believe that committing large, impossible-to-compare zip files of metadata to a distribution is non-ideal, for the various reasons @ssmusoke shared. It’s not 100% accurate to say they “can’t be version controlled” - rather, the issue is that they can’t be easily diffed or reviewed in a change log, and so it is impossible to confirm in a particular commit that form X had these questions added and the corresponding concepts were added at the same time, and had the correct configuration. And it’s impossible to do a code review in a PR and review the nature of concept changes taking place in OpenMRS.
I do think it’s worth continuing to explore options for converting from OCL exports to Iniz CSV for these reasons. However, in my short spike on this several months ago, I found that it’s not as simple as I would have liked, as there are a lot of options for how one configures concept CSVs, and a lot that is in an OCL export that one might not choose to add if they are doing so manually (every mapping, every translation, etc).
If one were to pursue this, my assumption is that this code would live in one of these places (or likely in several of these places in some capacity):
In the Initializer module
In the OCL import module
In the SDK
Happy to be part of any discussions and design around this moving forward.
The default CSV export is based on the OpenMRS tables
A custom Iniz export, which could be called Iniz CSV
IMO the amount of effort to keep up with changes in Iniz is so much lower than the value that the CSV export brings. As an implementor, we are not stuck at our adoption of OCL due to such seemingly small issues.
Why not try adding this as a custom export format, which is common in enterprise applications, and see what the uptake is?
@burke Would there be a possibility to extend OCL’s CSV export to:
Support exporting the full set of concepts and mappings (presumably cached on S3 the same way the JSON zips are).
Add (somehow) the concept names and descriptions to the CSV export (these could be separate files).
With those in place, we could look into modifying the OCL import module to support importing from CSV files rather than just the JSON files. That would at least get us to a point where we could export CSVs from OCL, check them into source control and load them into an OpenMRS instance.
This wouldn’t solve all the problems raised in this thread (the CSVs would likely still not be suitable for manual human review) but it might be some concrete steps in that direction.
There are, I think, two fundamental challenges here:
The OpenMRS Concept Dictionary itself is fairly complex (it’s comprised of 24 different tables) and needs to be simplified to be in a format suitable for human review. Iniz is a solid proposal for what such a simplified format could look like.
The OCL data model is very different from OpenMRS.
The second challenge is more or less the problem that the OCL subscription / import module aims to solve. The first is a harder problem, but it becomes tractable if we can agree on some heuristics to simplify things. With Iniz, and especially @mseaton’s contributions over the past year, we’re getting closer to getting the simplifications to the right point (at this point the only major simplification we make is that Iniz only supports SAME-AS mappings; I’m also not quite so clear on how Iniz would support more exotic things like complex concepts).
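As an illustration of the SAME-AS simplification, here is a small Python sketch of how a converter might collapse OCL mappings into a single Iniz-style cell. The mapping field names (`map_type`, `from_concept_code`, `to_source_name`, `to_concept_code`) and the `source:code` pairs joined with `;` are assumptions based on my reading of the two formats, and `same_as_column` is a hypothetical helper, not something that exists in either project.

```python
def same_as_column(concept_id, mappings):
    """Build an Iniz-style 'Same as mappings' cell for one concept.

    Any mapping that is not SAME-AS is dropped, which is exactly the
    information loss this simplification accepts.
    """
    refs = set()
    for m in mappings:
        if m.get("from_concept_code") != concept_id:
            continue
        # Normalize "SAME-AS", "Same As", and "same_as" to one comparable form.
        map_type = m.get("map_type", "").upper().replace("_", "-").replace(" ", "-")
        if map_type != "SAME-AS":
            continue  # e.g. NARROWER-THAN or BROADER-THAN are not representable
        refs.add(f'{m["to_source_name"]}:{m["to_concept_code"]}')
    # Sort so the cell is stable between exports.
    return ";".join(sorted(refs))


# Illustrative data only; the codes are placeholders, not a real export.
example = [
    {"map_type": "SAME-AS", "from_concept_code": "C1",
     "to_source_name": "SNOMED-CT", "to_concept_code": "111"},
    {"map_type": "NARROWER-THAN", "from_concept_code": "C1",
     "to_source_name": "ICD-10-WHO", "to_concept_code": "R00"},
]
print(same_as_column("C1", example))
```

A reviewer comparing the CSV with the OCL source would need to know that the NARROWER-THAN row above disappears on purpose.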
I’ve done a little experimentation on converting from OCL to Iniz CSV format. Here’s my Python Notebook using CIEL’s COVID-19 Starter Set as an example and the resulting CSV (with its 86 columns).
Some observations from my experimentation:
The CSV format is customized with special rules for column names
There are limitations on what can be represented in CSV format (e.g., controlling preferred names doubles the number of name columns)
The columns for the CSV format vary depending on which concepts are exported
The CSV can be optimized for humans (column ordering) and for Iniz (row ordering to minimize missing dependencies)
We could use a script like the one in my experiment to make it simple to generate a CSV from an OCL export wherever the full detail of the OCL export isn’t needed and a CSV format is preferred.
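Two of the observations above (columns that vary with the exported concepts, and row ordering to minimize missing dependencies) can be sketched in a few lines of Python. This is not the code from the notebook, just an illustration under stated assumptions: each row is a plain dict with hypothetical keys `id`, `names` (locale to name), `answers`, and `members`, and the Iniz-style headers are written from memory. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def csv_header(rows):
    """Derive the header from the locales actually present in the export."""
    locales = sorted({loc for row in rows for loc in row.get("names", {})})
    name_columns = [f"Fully specified name:{loc}" for loc in locales]
    # Human-friendly ordering: identifier and names first, details after.
    return ["Uuid"] + name_columns + ["Data class", "Data type", "Answers", "Members"]


def order_rows(rows):
    """Order rows so that answers and set members precede the concepts using them."""
    by_id = {row["id"]: row for row in rows}
    graph = {}
    for concept_id in sorted(by_id):
        row = by_id[concept_id]
        deps = row.get("answers", []) + row.get("members", [])
        # References to concepts outside this export cannot be ordered; skip them.
        graph[concept_id] = sorted(d for d in deps if d in by_id)
    # static_order() yields dependencies before dependents and raises
    # graphlib.CycleError if concepts reference each other in a loop.
    return [by_id[cid] for cid in TopologicalSorter(graph).static_order()]
```

With this ordering, a question concept is always written after its answer concepts, so a loader that processes the file top to bottom does not hit a reference to a concept it has not created yet. A cycle between concepts would need separate handling, which this sketch leaves to the caller.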