JSON representation of OpenMRS data

@darius, in a previous thread (see below, pre-openmrs.talk), you discussed throwing REST representations of encounters + obs into elasticsearch. We are currently designing how to move data into elasticsearch, primarily for analytics purposes. In the future, however, we’ll be working on an offline version that I’d like to use pouchdb to support. Ideally, we’d come up with a single nosql representation that we could use across nosql dbs (perhaps with some minor configuration changes).

Presently, as you mention, the representation conflicts with how elasticsearch wants one to do things. My preference for solving the “value” problem mentioned below would be to represent obs the same way the mysql table does. That is, we do not reduce the value to a single “value” field, but keep “valueNumeric”, “valueBoolean”, etc. as separate fields in the document. On the elasticsearch side, we could then define our mapping to handle each data type properly and thus query correctly.

@darius, did you make any decision on how to do this? Ideally we would change the rest ws module to reflect these changes. I suspect that anyone using rest may have already built models using the current form, so it would be important to find a way of supporting both. I’m thinking we keep the current representation (and deprecate it later) and add this new representation alongside it.

Would love to hear the community’s thoughts/experiences with this.


From Feb 20, 2015

Related to this thread, I’ve been playing around with putting OpenMRS REST representations verbatim in ElasticSearch (ES). The hope is to be able to simultaneously store OpenMRS data so that it might be imported into another OpenMRS server (e.g. for patient transfers), while also taking advantage of some of the cool search features that ES can do.

ES is built on Lucene, and they pitch the idea that “Toss it a JSON document and it will try to detect the data structure, index the data and make it searchable.”

However, I hit a problem related to the way we do our REST representation of Obs: because we have a “value” property whose datatype may vary, ES chokes (basically it infers the datatype from the first obs you put in, and fails when you try to put in an obs of a different datatype). I assume the underlying issue would hold for anything based on Lucene, but I haven’t verified that.

This is easy enough to work around, but I’m wondering whether it’s telling us something, and whether we’d be better off with our Obs representation having specific fields like valueNumeric, valueText, etc.

Anyone have thoughts on this?


The quick answer is: that was a spike and I didn’t take it any further.

Aside: I’m not sure that you really want identical representations in all nosql databases. E.g. transactional usage of the data vs analytic usage of the data won’t prefer the same representation.

My offhand opinion is that having the REST representation go back to mirroring the MySQL table would be a step backwards, and undesirable. I prefer the idea that whatever process is populating your elasticsearch db takes care of the transformation of adding those raw fields if it’s what you want.
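To make the “transform while populating elasticsearch” idea concrete, here is a minimal Python sketch. It is not part of the REST module or of elasticsearch; `to_typed_obs` is a hypothetical helper, and the input shape (a dict with a “value” key that may hold a number, boolean, string, or a nested object with a “uuid”) is an assumption about what the REST representation looks like. Guessing datetimes from the string alone is fragile; a real loader would more likely use the concept datatype.

```python
from datetime import datetime
from numbers import Number


def _looks_like_datetime(text: str) -> bool:
    """Return True if the string parses as an ISO-style date or datetime."""
    try:
        datetime.fromisoformat(text)
    except ValueError:
        return False
    return True


def to_typed_obs(rest_obs: dict) -> dict:
    """Copy a REST-style obs, replacing "value" with one typed valueX field."""
    doc = {key: val for key, val in rest_obs.items() if key != "value"}
    value = rest_obs.get("value")

    if value is None:
        return doc
    # bool must be tested before Number, because bool is a subclass of int.
    if isinstance(value, bool):
        doc["valueBoolean"] = value
    elif isinstance(value, Number):
        doc["valueNumeric"] = float(value)
    elif isinstance(value, dict):
        # Assumption: a coded answer arrives as a nested object with a "uuid".
        # Storing the uuid here would need a string-typed valueCoded mapping.
        doc["valueCoded"] = value.get("uuid")
    elif isinstance(value, str) and _looks_like_datetime(value):
        doc["valueDatetime"] = value
    else:
        doc["valueText"] = str(value)
    return doc
```

With this kind of step in the loading process, each indexed document carries at most one valueX field, so a fixed mapping can give every field a single datatype while the REST representation itself stays unchanged.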

I would prefer to stick with a single “value” field, but I’d want that field to be an object with descriptive metadata attached (data type, mime type, or uri link, maybe, and perhaps server-provided formatting).

I’m curious what real on-the-ground users of the REST API prefer!

I think I was thinking about a mapping like the one shown below. The idea is not so much that every obsSet object would have every valueX, but that it would allow for any valueX. The different fields are necessary for elastic. I suspect this isn’t so different from what you (@darius) were recommending.

Also interested to hear others’ thoughts on this.

A note on this mapping: I am calling any obs an obsSet, and an obsSet is permitted to contain obsSets. An elastic mapping would require you to declare explicitly how many levels of sets within sets are permitted. I suspect that 2-3 levels would be sufficient for most users.
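Since the nesting depth has to be spelled out in the mapping, one option is to generate the obsSet part rather than write each level by hand. The Python sketch below is only an illustration: `obs_set_mapping` and `OBS_FIELDS` are hypothetical names, and the field names and types are copied from the obsSet mapping in this post.

```python
import copy
import json

# Field names and types copied from the obsSet mapping in this post.
OBS_FIELDS = {
    "dateCreated": {"type": "date"},
    "obsDatetime": {"type": "date"},
    "conceptId": {"type": "integer"},
    "conceptUuid": {"type": "string"},
    "valueCoded": {"type": "integer"},
    "valueBoolean": {"type": "boolean"},
    "valueNumeric": {"type": "float"},
    "valueDatetime": {"type": "date"},
    "valueText": {"type": "string"},
    "locationUuid": {"type": "string"},
    "locationName": {"type": "string"},
    "voided": {"type": "boolean"},
    "dateVoided": {"type": "date"},
}


def obs_set_mapping(depth: int) -> dict:
    """Build an obsSet mapping that permits `depth` levels of obsSet nesting."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    properties = copy.deepcopy(OBS_FIELDS)
    if depth > 1:
        # Each level embeds a mapping for one fewer remaining level.
        properties["obsSet"] = obs_set_mapping(depth - 1)
    return {"type": "nested", "properties": properties}


if __name__ == "__main__":
    # Print a three-level mapping that could be pasted under "obsSet".
    print(json.dumps(obs_set_mapping(3), indent=4))
```

Here `obs_set_mapping(1)` yields an obsSet with no child sets, and `obs_set_mapping(3)` yields sets within sets within sets, which matches the 2-3 levels suggested as sufficient.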

"mappings": {
    "encounter": {
        "properties": {
            "encounterDatetime": {"type": "date"},
            "encounterTypeUuid": {"type": "string"},
            "encounterTypeName": {"type": "string"},
            "formUuid": {"type": "string"},
            "formName": {"type": "string"},
            "locationUuid": {"type": "string"},
            "locationName": {"type": "string"},
            "encounterId": {"type": "integer"},
            "encounterUuid": {"type": "string"},
            "providers": {"type": "string"},
            "dateCreated": {"type": "date"},
            "voided": {"type": "boolean"},
            "dateVoided": {"type": "date"},
            "obsSet": {
                "type": "nested",
                "properties": {
                    "dateCreated": {"type": "date"},
                    "obsDatetime": {"type": "date"},
                    "conceptId": {"type": "integer"},
                    "conceptUuid": {"type": "string"},
                    "valueCoded": {"type": "integer"},
                    "valueBoolean": {"type": "boolean"},
                    "valueNumeric": {"type": "float"},
                    "valueDatetime": {"type": "date"},
                    "valueText": {"type": "string"},
                    "locationUuid": {"type": "string"},
                    "locationName": {"type": "string"},
                    "voided": {"type": "boolean"},
                    "dateVoided": {"type": "date"},
                    "obsSet": {...}